Please use this identifier to cite or link to this item: https://doi.org/10.1002/sim.5540
DC Field: Value
dc.title: A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants
dc.contributor.author: Kuk, A.Y.C.
dc.contributor.author: Li, X.
dc.contributor.author: Xu, J.
dc.date.accessioned: 2014-10-28T05:09:02Z
dc.date.available: 2014-10-28T05:09:02Z
dc.date.issued: 2013-04-15
dc.identifier.citation: Kuk, A.Y.C., Li, X., Xu, J. (2013-04-15). A fast collapsed data method for estimating haplotype frequencies from pooled genotype data with applications to the study of rare variants. Statistics in Medicine 32 (8) : 1343-1360. ScholarBank@NUS Repository. https://doi.org/10.1002/sim.5540
dc.identifier.issn: 02776715
dc.identifier.uri: http://scholarbank.nus.edu.sg/handle/10635/104927
dc.description.abstract: Haplotype information could lead to more powerful tests of genetic association than single-locus analyses, but it is not easy to estimate haplotype frequencies from genotype data due to phase ambiguity. The challenge is compounded when individuals are pooled together to save costs or to increase sample size, which is crucial in the study of rare variants. Existing expectation-maximization type algorithms are slow and cannot cope with large pool size or long haplotypes. We show that by collapsing the total allele frequencies of each pool suitably, the maximum likelihood estimates of haplotype frequencies based on the collapsed data can be calculated very quickly regardless of pool size and haplotype length. We provide a running time analysis to demonstrate the considerable savings in time that the collapsed data method can bring. The method is particularly well suited to estimating certain union probabilities useful in the study of rare variants. We provide theoretical and empirical evidence to suggest that the proposed estimation method will not suffer much loss in efficiency if the variants are rare. We use the method to analyze re-sequencing data collected from a case control study involving 148 obese persons and 150 controls. Focusing on a region containing 25 rare variants around the MGLL gene, our method selects three rare variants as potentially causal. This is more parsimonious than the 12 variants selected by a recently proposed covering method. From another set of 32 rare variants around the FAAH gene, we discover an interesting potential interaction between two of them. © 2012 John Wiley & Sons, Ltd.
dc.description.uri: http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1002/sim.5540
dc.source: Scopus
dc.subject: Collapsed data
dc.subject: EM algorithm
dc.subject: Genetic association
dc.subject: Haplotype frequency estimation
dc.subject: Rare variants
dc.subject: Union probability
dc.type: Article
dc.contributor.department: STATISTICS & APPLIED PROBABILITY
dc.description.doi: 10.1002/sim.5540
dc.description.sourcetitle: Statistics in Medicine
dc.description.volume: 32
dc.description.issue: 8
dc.description.page: 1343-1360
dc.description.coden: SMEDD
dc.identifier.isiut: 000316625600009
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Scopus™ citations: 6 (checked on Dec 1, 2022)
Web of Science™ citations: 6 (checked on Dec 1, 2022)
Page view(s): 280 (checked on Nov 24, 2022)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.