Please use this identifier to cite or link to this item: https://doi.org/10.1371/journal.pgen.1007021
Title: Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data
Authors: Dou J.
Sun B. 
Sim X. 
Hughes J.D.
Reilly D.F.
Tai E.S. 
Liu J. 
Wang C. 
Keywords: gene linkage disequilibrium
genetic marker
genome
genotype
heritability
human
Malay (people)
phenotype
population structure
sampling
simulation
Singapore
uncertainty
whole exome sequencing
Asian continental ancestry group
biological model
biology
DNA sequence
exome
genetic association study
genetic database
genetics
genotyping technique
human genome
population genetics
procedures
software
Asian Continental Ancestry Group
Computational Biology
Databases, Genetic
Exome
Genetic Association Studies
Genetics, Population
Genome, Human
Genotype
Genotyping Techniques
Humans
Linkage Disequilibrium
Models, Genetic
Sequence Analysis, DNA
Software
Issue Date: 2017
Publisher: Public Library of Science
Citation: Dou J., Sun B., Sim X., Hughes J.D., Reilly D.F., Tai E.S., Liu J., Wang C. (2017). Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data. PLoS Genetics 13(9): e1007021. ScholarBank@NUS Repository. https://doi.org/10.1371/journal.pgen.1007021
Abstract: Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in target sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, without accounting for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole exome sequencing (WES) dataset of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficients and classify genetic relatedness using off-target sequencing data downsampled to ~0.15X depth. In application to the full WES dataset without downsampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X). Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies. © 2017 Dou et al.
Source Title: PLoS Genetics
URI: https://scholarbank.nus.edu.sg/handle/10635/165369
ISSN: 1553-7390
DOI: 10.1371/journal.pgen.1007021
Appears in Collections: Staff Publications
Elements

Files in This Item:
File: 10_1371_journal_pgen_1007021.pdf
Size: 4.21 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.