Please use this identifier to cite or link to this item: https://doi.org/10.1038/s41597-019-0207-2
DC Field | Value
dc.title | Compendiums of cancer transcriptomes for machine learning applications
dc.contributor.author | Lim, Su Bin
dc.contributor.author | Tan, Swee Jin
dc.contributor.author | Lim, Wan-Teck
dc.contributor.author | Lim, Chwee Teck
dc.date.accessioned | 2022-04-07T04:53:11Z
dc.date.available | 2022-04-07T04:53:11Z
dc.date.issued | 2019-10-08
dc.identifier.citation | Lim, Su Bin, Tan, Swee Jin, Lim, Wan-Teck, Lim, Chwee Teck (2019-10-08). Compendiums of cancer transcriptomes for machine learning applications. SCIENTIFIC DATA 6 (1) : 194. ScholarBank@NUS Repository. https://doi.org/10.1038/s41597-019-0207-2
dc.identifier.issn | 2052-4463
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/218538
dc.description.abstract | There are massive transcriptome profiles in the form of microarray. The challenge is that they are processed using diverse platforms and preprocessing tools, requiring considerable time and informatics expertise for cross-dataset analyses. If there exists a single, integrated data source, data-reuse can be facilitated for discovery, analysis, and validation of biomarker-based clinical strategy. Here, we present merged microarray-acquired datasets (MMDs) across 11 major cancer types, curating 8,386 patient-derived tumor and tumor-free samples from 95 GEO datasets. Using machine learning algorithms, we show that diagnostic models trained from MMDs can be directly applied to RNA-seq-acquired TCGA data with high classification accuracy. Machine learning optimized MMD further aids to reveal immune landscape across various carcinomas critically needed in disease management and clinical interventions. This unified data source may serve as an excellent training or test set to apply, develop, and refine machine learning algorithms that can be tapped to better define genomic landscape of human cancers.
dc.language.iso | en
dc.publisher | NATURE PUBLISHING GROUP
dc.relation.isreplacedby | hdl:10635/218538
dc.rights | Attribution 4.0 International
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/
dc.source | Elements
dc.subject | Science & Technology
dc.subject | Multidisciplinary Sciences
dc.subject | Science & Technology - Other Topics
dc.subject | GENE-EXPRESSION
dc.subject | RNA-SEQ
dc.subject | BIOCONDUCTOR
dc.subject | SOFTWARE
dc.type | Article
dc.date.updated | 2022-04-07T02:17:32Z
dc.contributor.department | DEPT OF BIOMEDICAL ENGINEERING
dc.contributor.department | DUKE-NUS MEDICAL SCHOOL
dc.description.doi | 10.1038/s41597-019-0207-2
dc.description.sourcetitle | SCIENTIFIC DATA
dc.description.volume | 6
dc.description.issue | 1
dc.description.page | 194
dc.published.state | Published
Appears in Collections: Elements, Staff Publications

Files in This Item:
File | Description | Size | Format | Access Settings | Version
Compendiums of cancer transcriptomes for machine learning applications.pdf | | 4.24 MB | Adobe PDF | OPEN | Published

This item is licensed under a Creative Commons License (Attribution 4.0 International).