Please use this identifier to cite or link to this item: https://doi.org/10.1038/sdata.2018.136
DC Field | Value
dc.title | A merged lung cancer transcriptome dataset for clinical predictive modeling
dc.contributor.author | Lim, S.B
dc.contributor.author | Tan, S.J
dc.contributor.author | Lim, W.-T
dc.contributor.author | Lim, C.T
dc.date.accessioned | 2020-09-09T03:09:35Z
dc.date.available | 2020-09-09T03:09:35Z
dc.date.issued | 2018
dc.identifier.citation | Lim, S.B, Tan, S.J, Lim, W.-T, Lim, C.T (2018). A merged lung cancer transcriptome dataset for clinical predictive modeling. Scientific Data 5: 180136. ScholarBank@NUS Repository. https://doi.org/10.1038/sdata.2018.136
dc.identifier.issn | 2052-4463
dc.identifier.uri | https://scholarbank.nus.edu.sg/handle/10635/175051
dc.description.abstract | The Gene Expression Omnibus (GEO) database is an excellent public source of whole transcriptomic profiles of multiple cancers. The main challenge is the limited accessibility of such large-scale genomic data to people without a background in bioinformatics or computer science. This presents difficulties in data analysis, sharing and visualization. Here, we present an integrated bioinformatics pipeline and a normalized dataset that has been preprocessed using a robust statistical methodology, allowing others to perform large-scale meta-analysis without having to conduct time-consuming data mining and statistical correction. Comprising 1,118 patient-derived samples, the normalized dataset includes primary non-small cell lung cancer (NSCLC) tumors and paired normal lung tissues from ten independent GEO datasets, facilitating differential expression analysis. The data has been merged, normalized, batch effect-corrected and filtered for genes with low variance via multiple open source R packages integrated into our workflow. Overall, this dataset (with associated clinical metadata) better represents the diseased population and serves as a powerful tool for early predictive biomarker discovery.
dc.source | Unpaywall 20200831
dc.subject | transcriptome
dc.subject | biology
dc.subject | data analysis
dc.subject | factual database
dc.subject | gene expression profiling
dc.subject | genetics
dc.subject | human
dc.subject | lung tumor
dc.subject | non small cell lung cancer
dc.subject | procedures
dc.subject | Carcinoma, Non-Small-Cell Lung
dc.subject | Computational Biology
dc.subject | Data Analysis
dc.subject | Databases, Factual
dc.subject | Gene Expression Profiling
dc.subject | Humans
dc.subject | Lung Neoplasms
dc.subject | Transcriptome
dc.type | Article
dc.contributor.department | DUKE-NUS MEDICAL SCHOOL
dc.contributor.department | BIOENGINEERING
dc.description.doi | 10.1038/sdata.2018.136
dc.description.sourcetitle | Scientific Data
dc.description.volume | 5
dc.description.page | 180136
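The abstract names four preprocessing steps (merge, normalize, batch-effect correction, low-variance gene filtering) without detailing them. The toy Python sketch below only illustrates what each step does; it is not the authors' R workflow. Assumptions: each dataset is a dict mapping a hypothetical gene identifier to per-sample log2 values; quantile normalization stands in for the normalization step; per-batch mean-centering is a crude, location-only stand-in for dedicated batch-effect methods such as ComBat; `keep_fraction` is an arbitrary illustrative cutoff; all gene names and values are made up.

```python
from statistics import mean, pvariance

# Hypothetical layout: each dataset maps gene id -> list of log2 expression
# values, one per sample (an illustrative stand-in for a GEO series matrix).


def merge_on_common_genes(datasets):
    """Keep genes present in every dataset, concatenate samples, record batch labels."""
    genes = sorted(set.intersection(*(set(d) for d in datasets)))
    matrix = {g: [v for d in datasets for v in d[g]] for g in genes}
    # One batch label per sample: the index of the dataset it came from.
    batches = [i for i, d in enumerate(datasets)
               for _ in range(len(next(iter(d.values()))))]
    return matrix, batches


def quantile_normalize(matrix):
    """Give every sample (column) the same distribution: the mean of the sorted columns."""
    genes = list(matrix)
    n_samples = len(matrix[genes[0]])
    columns = [[matrix[g][j] for g in genes] for j in range(n_samples)]
    rank_means = [mean(row) for row in zip(*(sorted(c) for c in columns))]
    out = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        # Ties are broken by gene order here; some implementations average tied ranks.
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            out[genes[i]][j] = rank_means[rank]
    return out


def center_batches(matrix, batches):
    """Simplified batch correction: move each batch's per-gene mean onto the gene's overall mean.

    Adjusts location only, not scale; a crude stand-in, not ComBat itself.
    """
    out = {}
    for g, vals in matrix.items():
        grand = mean(vals)
        batch_mean = {b: mean([v for v, bb in zip(vals, batches) if bb == b])
                      for b in set(batches)}
        out[g] = [v - batch_mean[b] + grand for v, b in zip(vals, batches)]
    return out


def filter_low_variance(matrix, keep_fraction=0.5):
    """Keep the most variable genes; keep_fraction is a hypothetical, illustrative parameter."""
    ranked = sorted(matrix, key=lambda g: pvariance(matrix[g]), reverse=True)
    kept = set(ranked[:max(1, int(len(ranked) * keep_fraction))])
    return {g: v for g, v in matrix.items() if g in kept}


if __name__ == "__main__":
    # Made-up toy data: two "series" sharing three genes; GENE_X is dropped by the merge.
    gse_a = {"GENE_A": [8.0, 9.0], "GENE_B": [5.0, 5.1], "GENE_C": [7.0, 9.5], "GENE_X": [1.0, 2.0]}
    gse_b = {"GENE_A": [10.0, 11.5], "GENE_B": [6.0, 6.2], "GENE_C": [9.0, 12.0]}
    merged, batches = merge_on_common_genes([gse_a, gse_b])
    corrected = center_batches(quantile_normalize(merged), batches)
    print(sorted(filter_low_variance(corrected)))
```

Merging on the intersection of genes is the simplest choice when platforms differ; it trades gene coverage for a complete matrix with no missing values.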
Appears in Collections: Elements, Staff Publications

Files in This Item:
File | Description | Size | Format | Access Settings | Version
10_1038_sdata_2018_136.pdf | | 2.71 MB | Adobe PDF | OPEN | None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.