DC Field | Value
dc.title | DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data
dc.contributor.author | Ranjan, Bobby
dc.contributor.author | Sun, Wenjie
dc.contributor.author | Park, Jinyu
dc.contributor.author | Mishra, Kunal
dc.contributor.author | Schmidt, Florian
dc.contributor.author | Xie, Ronald
dc.contributor.author | Alipour, Fatemeh
dc.contributor.author | Singhal, Vipul
dc.contributor.author | Joanito, Ignasius
dc.contributor.author | Honardoost, Mohammad Amin
dc.contributor.author | Yong, Jacy Mei Yun
dc.contributor.author | Koh, Ee Tzun
dc.contributor.author | Leong, Khai Pang
dc.contributor.author | Rayan, Nirmala Arul
dc.contributor.author | Lim, Michelle Gek Liang
dc.contributor.author | Prabhakar, Shyam
dc.identifier.citation | Ranjan, Bobby, Sun, Wenjie, Park, Jinyu, Mishra, Kunal, Schmidt, Florian, Xie, Ronald, Alipour, Fatemeh, Singhal, Vipul, Joanito, Ignasius, Honardoost, Mohammad Amin, Yong, Jacy Mei Yun, Koh, Ee Tzun, Leong, Khai Pang, Rayan, Nirmala Arul, Lim, Michelle Gek Liang, Prabhakar, Shyam (2021-10-06). DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data. Nature Communications 12 (1) : 5849. ScholarBank@NUS Repository.
dc.description.abstract | Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single-cell clustering pipelines. Existing feature selection methods perform inconsistently across datasets, occasionally even resulting in poorer clustering accuracy than without feature selection. Moreover, existing methods ignore information contained in gene-gene correlations. Here, we introduce DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. Additionally, DUBStepR was the only method to robustly deconvolve T and NK heterogeneity by identifying disease-associated common and rare cell types and subtypes in PBMCs from rheumatoid arthritis patients. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data. © 2021, The Author(s).
dc.publisher | Nature Research
dc.rights | Attribution 4.0 International
dc.source | Scopus OA2021
dc.contributor.department | GENOME INSTITUTE OF SINGAPORE
dc.description.sourcetitle | Nature Communications
Appears in Collections: Students Publications

Files in This Item:
File | Size | Format
10_1038_s41467-021-26085-2.pdf | 3.19 MB | Adobe PDF
This item is licensed under a Creative Commons License.