Title: Statistical Methods for the Detection and Analyses of Structural Variants in the Human Genome
Authors: TEO SHU MEI
Keywords: Structural variants, copy number variations, human genome, regions of homozygosity
Issue Date: 5-Oct-2012
Citation: TEO SHU MEI (2012-10-05). Statistical Methods for the Detection and Analyses of Structural Variants in the Human Genome. ScholarBank@NUS Repository.
Abstract: Structural variations (SVs) are an important and abundant source of variation in the human genome, encompassing a greater proportion of the genome than single nucleotide polymorphisms (SNPs) do. This thesis investigates different aspects of SV analysis, focusing on copy number variations (CNVs) and regions of homozygosity (ROHs). It is divided into four main studies, each with its own set of aims.

In Study I, Identification of recurrent regions of copy-number variation across multiple individuals, we develop an algorithm and software to identify common CNV regions from individually segmented data. The identified common regions allow us to investigate population characteristics of CNVs and to perform association studies.

In Study II, Multi-platform segmentation for joint detection of copy number variants, we develop an algorithm to identify CNVs using intensity data from more than one platform. The algorithm is useful when researchers have data from multiple platforms on the same individual.

In Study III, Regions of homozygosity in three Southeast-Asian populations, we identify ROHs in three Singapore populations, namely the Chinese, Malays and Indians. We characterize the regions and provide population summary statistics. We also investigate the relationship between the occurrence of ROHs and haplotype frequency, regional linkage disequilibrium (LD) and positive selection. The results show that the frequency of occurrence of ROHs is positively associated with haplotype frequency and regional LD. The majority of regions detected as targets of recent positive selection, and of regions with differential LD between populations, overlap with ROH loci. When we consider both the location and the allelic form of the ROHs, we are able to separate the populations by principal component analysis, demonstrating that ROHs contain information on the population structure and demographic history of a population.
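The abstract does not state the criteria used to call ROHs in Study III, so the following is only a minimal sketch of the basic idea, not the thesis's method: scan SNPs in genomic order and report maximal runs of consecutive homozygous genotypes that pass a minimum SNP count and a minimum physical length. The function name `find_rohs`, the 0/1/2 allele-count coding (1 = heterozygous) and the default thresholds are hypothetical. Practical ROH callers usually also tolerate a few heterozygous or missing calls and impose SNP-density constraints, which this sketch omits.

```python
from typing import List, Optional, Sequence, Tuple


def find_rohs(
    positions: Sequence[int],
    genotypes: Sequence[int],
    min_snps: int = 50,
    min_length_bp: int = 1_000_000,
) -> List[Tuple[int, int, int]]:
    """Report maximal runs of consecutive homozygous SNPs on one chromosome.

    Hypothetical helper: `positions` are sorted base-pair coordinates and
    `genotypes` are allele counts coded 0/1/2, where 1 means heterozygous.
    Returns (start_bp, end_bp, n_snps) for every run passing both thresholds.
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")
    rohs: List[Tuple[int, int, int]] = []
    run_start: Optional[int] = None  # index of the first SNP in the current run
    # Append a heterozygous sentinel so a run that reaches the last SNP is closed.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            last = i - 1  # last homozygous SNP before this heterozygous call
            n_snps = last - run_start + 1
            length = positions[last] - positions[run_start]
            if n_snps >= min_snps and length >= min_length_bp:
                rohs.append((positions[run_start], positions[last], n_snps))
            run_start = None
    return rohs


# Toy example: 10 SNPs spaced 100 bp apart, heterozygous at indices 4 and 7.
positions = [100 * k for k in range(10)]
genotypes = [0, 0, 2, 0, 1, 2, 2, 1, 0, 0]
print(find_rohs(positions, genotypes, min_snps=3, min_length_bp=200))
# → [(0, 300, 4)]
```

With these toy thresholds only the first run qualifies; the two-SNP runs after each heterozygous call are too short. Real analyses would use genome-scale thresholds and per-individual calls as input to the population summaries described above.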
Finally, in Study IV, Statistical challenges associated with detecting copy number variants with next-generation sequencing technology, we describe and discuss potential sources of bias in CNV detection for each of four commonly used methods. In particular, we focus on issues pertaining to (1) mappability, (2) GC-content bias, (3) quality-control measures for reads, and (4) difficulties in identifying duplications. To gain insight into some of these issues, we download real data from the 1000 Genomes Project and analyze the data in terms of depth of coverage (DOC). We show examples of how reads in repeated regions can affect CNV detection, demonstrate current GC-correction algorithms, investigate the sensitivity of a DOC algorithm before and after quality control of reads, and discuss why duplications are harder to detect than deletions.
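Study IV demonstrates GC-correction algorithms for depth-of-coverage data. As a hedged illustration of what such a correction does (not necessarily the variant examined in the thesis), the sketch below implements a widely used median-based scheme: genomic bins are stratified by GC content, and each bin's read count is rescaled so that every GC stratum has the same median as the data as a whole. The function name `gc_correct`, the rounding-based stratification and the `n_strata` parameter are assumptions made for this example.

```python
import numpy as np


def gc_correct(read_counts, gc_fraction, n_strata=100):
    """Median-based GC correction of binned read counts (hypothetical helper).

    Bins are grouped into GC strata by rounding their GC fraction to the
    nearest 1/n_strata; each count is then scaled by
    (median over all bins) / (median over bins in the same stratum).
    """
    counts = np.asarray(read_counts, dtype=float)
    gc = np.asarray(gc_fraction, dtype=float)
    if counts.shape != gc.shape:
        raise ValueError("read_counts and gc_fraction must have the same shape")
    strata = np.rint(gc * n_strata).astype(int)
    overall_median = np.median(counts)
    corrected = counts.copy()
    for s in np.unique(strata):
        in_stratum = strata == s
        stratum_median = np.median(counts[in_stratum])
        # A stratum whose median is 0 cannot be rescaled; its bins are left unchanged.
        if stratum_median > 0:
            corrected[in_stratum] = counts[in_stratum] * overall_median / stratum_median
    return corrected


# Toy example: depth rises with GC content although copy number is constant.
counts = [50, 50, 100, 100, 150, 150]
gc = [0.30, 0.30, 0.50, 0.50, 0.70, 0.70]
print(gc_correct(counts, gc))  # every bin is rescaled to 100.0
```

Because the rescaling is a single factor per GC stratum, relative differences between bins of similar GC content (for example, reduced depth over a deletion) are preserved while the systematic GC trend is removed. With sparse strata, as in this toy input, the stratum medians are themselves noisy, which is one reason practical implementations use many bins per stratum.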
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
nus_thesis_rev4.pdf (6.48 MB, Adobe PDF)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.