Statistical challenges associated with detecting copy number variations with next-generation sequencing

Please use this identifier to cite or link to this item: https://doi.org/10.1093/bioinformatics/bts535

DC Field	Value
dc.title	Statistical challenges associated with detecting copy number variations with next-generation sequencing
dc.contributor.author	Teo, S.M.
dc.contributor.author	Pawitan, Y.
dc.contributor.author	Ku, C.S.
dc.contributor.author	Chia, K.S.
dc.contributor.author	Salim, A.
dc.date.accessioned	2014-11-26T02:13:38Z
dc.date.available	2014-11-26T02:13:38Z
dc.date.issued	2012-11
dc.identifier.citation	Teo, S.M., Pawitan, Y., Ku, C.S., Chia, K.S., Salim, A. (2012-11). Statistical challenges associated with detecting copy number variations with next-generation sequencing. Bioinformatics 28 (21) : 2711-2718. ScholarBank@NUS Repository. https://doi.org/10.1093/bioinformatics/bts535
dc.identifier.issn	13674803
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/108849
dc.description.abstract	Motivation: Analysing next-generation sequencing (NGS) data for copy number variations (CNVs) detection is a relatively new and challenging field, with no accepted standard protocols or quality control measures so far. There are by now several algorithms developed for each of the four broad methods for CNV detection using NGS, namely the depth of coverage (DOC), read-pair, split-read and assembly-based methods. However, because of the complexity of the genome and the short read lengths from NGS technology, there are still many challenges associated with the analysis of NGS data for CNVs, no matter which method or algorithm is used. Results: In this review, we describe and discuss areas of potential biases in CNV detection for each of the four methods. In particular, we focus on issues pertaining to (i) mappability, (ii) GC-content bias, (iii) quality control measures of reads and (iv) difficulty in identifying duplications. To gain insights to some of the issues discussed, we also download real data from the 1000 Genomes Project and analyse its DOC data. We show examples of how reads in repeated regions can affect CNV detection, demonstrate current GC-correction algorithms, investigate sensitivity of DOC algorithm before and after quality control of reads and discuss reasons for which duplications are harder to detect than deletions. © 2012 The Author.
dc.source	Scopus
dc.type	Review
dc.contributor.department	SAW SWEE HOCK SCHOOL OF PUBLIC HEALTH
dc.contributor.department	STATISTICS & APPLIED PROBABILITY
dc.description.doi	10.1093/bioinformatics/bts535
dc.description.sourcetitle	Bioinformatics
dc.description.volume	28
dc.description.issue	21
dc.description.page	2711-2718
dc.description.coden	BOINF
dc.identifier.isiut	000310155300001
Appears in Collections:	Staff Publications

