Please use this identifier to cite or link to this item: https://doi.org/10.1093/bib/bbab256
DC Field: Value
dc.title: Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx
dc.contributor.author: Ko, Seyoon
dc.contributor.author: Li, Ginny X.
dc.contributor.author: Choi, Hyungwon
dc.contributor.author: Won, Joong-Ho
dc.date.accessioned: 2022-10-11T08:07:20Z
dc.date.available: 2022-10-11T08:07:20Z
dc.date.issued: 2021-07-13
dc.identifier.citation: Ko, Seyoon, Li, Ginny X., Choi, Hyungwon, Won, Joong-Ho (2021-07-13). Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx. Briefings in bioinformatics 22 (6). ScholarBank@NUS Repository. https://doi.org/10.1093/bib/bbab256
dc.identifier.issn: 1477-4054
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/232200
dc.description.abstract: Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype-phenotype association. Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc, not during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementation. © The Author(s) 2021. Published by Oxford University Press.
dc.publisher: NLM (Medline)
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: Scopus OA2021
dc.subject: latent group lasso
dc.subject: parallel computing
dc.subject: proximal gradient
dc.subject: sparse regression
dc.subject: ultrahigh-dimensional omics data
dc.type: Article
dc.contributor.department: SAW SWEE HOCK SCHOOL OF PUBLIC HEALTH
dc.description.doi: 10.1093/bib/bbab256
dc.description.sourcetitle: Briefings in bioinformatics
dc.description.volume: 22
dc.description.issue: 6
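The computational idea named in the abstract and subject terms (latent group lasso fit by proximal gradient) can be illustrated in miniature. Overlapping groups are handled by giving each group its own latent copy of the coefficients it covers, which turns the penalty into an ordinary non-overlapping group lasso on an expanded design; each proximal gradient step then applies block soft-thresholding group by group. This is a minimal serial sketch under stated assumptions, not ParProx's implementation: it assumes a logistic loss without an intercept, group weights equal to the square root of group size, and a conservative step size of 4n/||X̃||_F² (valid because ||X̃||_2² ≤ ||X̃||_F² and the logistic loss curvature is at most 1/4). All function names are hypothetical, and ParProx's GPU/distributed computation and Cox (time-to-event) models are not reproduced here.

```python
import math

# Hypothetical helpers; an illustrative sketch, not the ParProx API.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def block_soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_2: shrink the whole block toward zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= thresh:
        return [0.0] * len(v)
    scale = 1.0 - thresh / norm
    return [scale * x for x in v]

def expand_design(X, groups):
    """Duplicate columns so each (possibly overlapping) group gets its own latent copy."""
    cols = [j for g in groups for j in g]
    return [[row[j] for j in cols] for row in X], cols

def objective(X, y, groups, lam, v):
    """Mean logistic loss plus lam * sum_k sqrt(|G_k|) * ||v_k||_2 (assumed weighting)."""
    Xt, _ = expand_design(X, groups)
    loss = 0.0
    for row, yi in zip(Xt, y):
        eta = sum(a * b for a, b in zip(row, v))
        # log(1 + exp(eta)) - y * eta, computed stably
        loss += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    loss /= len(X)
    pen, start = 0.0, 0
    for g in groups:
        block = v[start:start + len(g)]
        pen += math.sqrt(len(g)) * math.sqrt(sum(x * x for x in block))
        start += len(g)
    return loss + lam * pen

def latent_group_lasso_logistic(X, y, groups, lam, n_iter=500):
    """Proximal gradient (ISTA) on the latent, non-overlapping representation."""
    n = len(X)
    Xt, cols = expand_design(X, groups)
    m = len(cols)
    # Conservative step: 1/L with L <= ||Xt||_F^2 / (4n) for the mean logistic loss.
    t = 4.0 * n / sum(x * x for row in Xt for x in row)
    weights = [math.sqrt(len(g)) for g in groups]
    v = [0.0] * m
    for _ in range(n_iter):
        # Gradient of the mean logistic loss with respect to the latent coefficients.
        resid = [sigmoid(sum(a * b for a, b in zip(row, v))) - yi for row, yi in zip(Xt, y)]
        grad = [sum(Xt[i][k] * resid[i] for i in range(n)) / n for k in range(m)]
        z = [vk - t * gk for vk, gk in zip(v, grad)]
        # Proximal step, block by block (latent blocks never overlap).
        new_v, start = [], 0
        for g, w in zip(groups, weights):
            new_v.extend(block_soft_threshold(z[start:start + len(g)], t * lam * w))
            start += len(g)
        v = new_v
    # Fold latent copies back into feature space: beta = sum_k v^(k).
    beta = [0.0] * len(X[0])
    for j, vk in zip(cols, v):
        beta[j] += vk
    return beta, v

if __name__ == "__main__":
    X = [[1.0, 0.5, -1.0], [0.2, -0.3, 0.8], [-1.0, 1.0, 0.1],
         [0.5, 0.5, 0.5], [-0.4, -1.2, 0.3], [1.5, -0.2, -0.7]]
    y = [1, 0, 0, 1, 0, 1]
    groups = [[0, 1], [1, 2]]  # feature 1 belongs to both groups
    beta, v = latent_group_lasso_logistic(X, y, groups, lam=0.01)
    print("beta:", beta)
```

Because the copies of a shared feature are penalized in separate groups, that feature can enter the model through any group containing it, and its final coefficient is the sum of its copies; this is what keeps the selected variables interpretable in terms of the predefined groups.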
Appears in Collections: Elements; Staff Publications

Files in This Item:
10_1093_bib_bbab256.pdf (Size: 3.01 MB; Format: Adobe PDF; Access Settings: Open; Version: None)
This item is licensed under a Creative Commons License.