Please use this identifier to cite or link to this item: https://doi.org/10.1093/nar/gky685
DC Field: Value
dc.title: TranSurVeyor: An improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data
dc.contributor.author: Rajaby, R.
dc.contributor.author: Sung, W.-K.
dc.date.accessioned: 2021-11-16T09:29:43Z
dc.date.available: 2021-11-16T09:29:43Z
dc.date.issued: 2018
dc.identifier.citation: Rajaby, R., Sung, W.-K. (2018). TranSurVeyor: An improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data. Nucleic Acids Research 46(20): e122. ScholarBank@NUS Repository. https://doi.org/10.1093/nar/gky685
dc.identifier.issn: 0305-1048
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/206474
dc.description.abstract: Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second-generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however, this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads; since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling. First, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose an SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques with other techniques such as clustering and a positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least a 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database. © The Author(s) 2018.
dc.publisher: Oxford University Press
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.source: Scopus OA2018
dc.type: Article
dc.contributor.department: DEPT OF COMPUTER SCIENCE
dc.description.doi: 10.1093/nar/gky685
dc.description.sourcetitle: Nucleic Acids Research
dc.description.volume: 46
dc.description.issue: 20
dc.description.page: e122
Appears in Collections: Staff Publications, Elements

Files in This Item:
File: 10_1093_nar_gky685.pdf
Size: 792.43 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Scopus citations: 11 (checked on Mar 22, 2023)
Page views: 90 (checked on Mar 16, 2023)



This item is licensed under a Creative Commons License.