Please use this identifier to cite or link to this item:
https://doi.org/10.1038/srep13321
Title: | MSIseq: Software for assessing microsatellite instability from catalogs of somatic mutations |
Authors: | Ni Huang, M; McPherson, J.R; Cutcutache, I; Teh, B.T; Tan, P; Rozen, S.G |
Keywords: | algorithm; automated pattern recognition; data mining; database management system; dna mutational analysis; DNA sequence; genetic database; human; machine learning; microsatellite instability; molecular genetics; nucleotide sequence; procedures; reproducibility; sensitivity and specificity; software; Algorithms; Base Sequence; Data Mining; Database Management Systems; Databases, Genetic; DNA Mutational Analysis; Humans; Machine Learning; Microsatellite Instability; Molecular Sequence Data; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis, DNA; Software |
Issue Date: | 2015 |
Citation: | Ni Huang, M, McPherson, J.R, Cutcutache, I, Teh, B.T, Tan, P, Rozen, S.G (2015). MSIseq: Software for assessing microsatellite instability from catalogs of somatic mutations. Scientific Reports 5 : 13321. ScholarBank@NUS Repository. https://doi.org/10.1038/srep13321 |
Abstract: | Microsatellite instability (MSI) is a form of hypermutation that occurs in some tumors due to defects in cellular DNA mismatch repair. MSI is characterized by frequent somatic mutations (i.e., cancer-specific mutations) that change the length of simple repeats (e.g., AAAAA..., GATAGATAGATA...). Clinical MSI tests evaluate the lengths of a handful of simple repeat sites, while next-generation sequencing can assay many more sites and offers a much more complete view of their somatic mutation frequencies. Using somatic mutation data from the exomes of a 361-tumor training set, we developed classifiers to determine MSI status based on four machine-learning frameworks. All frameworks had high accuracy, and after choosing one we determined that it had >98% concordance with clinical tests in a separate 163-tumor test set. Furthermore, this classifier retained high concordance even when classifying tumors based on subsets of whole-exome data. We have released a CRAN R package, MSIseq, based on this classifier. MSIseq is faster and simpler to use than software that requires large files of aligned sequenced reads. MSIseq will be useful for genomic studies in which clinical MSI test results are unavailable and for detecting possible misclassifications by clinical tests. |
Source Title: | Scientific Reports |
URI: | https://scholarbank.nus.edu.sg/handle/10635/175987 |
ISSN: | 2045-2322 |
DOI: | 10.1038/srep13321 |
Appears in Collections: | Elements Staff Publications |
Files in This Item:
File | Description | Size | Format | Access Settings | Version | |
---|---|---|---|---|---|---|
10_1038_srep13321.pdf | | 1.72 MB | Adobe PDF | OPEN | None | View/Download |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.