Please use this identifier to cite or link to this item: https://doi.org/10.1038/srep39489
Title: Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic
Authors: Yebra, G
Hodcroft, E.B
Ragonnet-Cronin, M.L
Keywords: cohort analysis
envelope gene
epidemic
genetics
human
Human immunodeficiency virus
Human immunodeficiency virus infection
molecular epidemiology
phylogeny
regression analysis
reproducibility
South Africa
statistical model
structural gene
United Kingdom
virology
virus genome
Cohort Studies
Epidemics
Genes, env
Genes, gag
Genes, pol
Genome, Viral
HIV
HIV Infections
Humans
Likelihood Functions
Molecular Epidemiology
Phylogeny
Regression Analysis
Reproducibility of Results
South Africa
United Kingdom
Issue Date: 2016
Publisher: Nature Publishing Group
Citation: Yebra, G, Hodcroft, E.B, Ragonnet-Cronin, M.L (2016). Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic. Scientific Reports 6 : 39489. ScholarBank@NUS Repository. https://doi.org/10.1038/srep39489
Rights: Attribution 4.0 International
Abstract: HIV molecular epidemiology studies analyse viral pol gene sequences because of their availability, but whole-genome sequencing allows other genes to be used. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA-HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies with those of the corresponding true trees using CompareTree. The accuracy of the trees increased significantly with the length of the sequences used, with the gag-pol-env datasets showing the best performance and the gag and partial pol datasets showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. © The Author(s) 2016.
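The evaluation loop the abstract describes (draw replicate subsets of taxa at a given sampling depth, restrict the true tree to the same taxa, infer a tree, score its topology against the truth) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' pipeline: RAxML inference and the gene-region slicing of the alignment are not reproduced, trees are assumed to be nested Python tuples, and the score is a simple fraction of true-tree splits recovered, used only as a stand-in for CompareTree and not claimed to match its exact output. All function names (`prune`, `splits`, `split_agreement`, `replicate_subsamples`) are hypothetical.

```python
import random
from typing import FrozenSet, List, Optional, Set, Tuple, Union

# Hypothetical representation: a rooted tree as nested tuples of leaf names.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> Set[str]:
    """Return the set of leaf names under a node."""
    if isinstance(tree, str):
        return {tree}
    out: Set[str] = set()
    for child in tree:
        out |= leaves(child)
    return out


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Restrict a tree to the taxa in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so both orientations of a split compare equal.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        side = below if ref not in below else frozenset(taxa - below)
        # Skip trivial splits (single leaves, empty side at the root).
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return below

    walk(tree)
    return found


def split_agreement(true_tree: Tree, inferred: Tree) -> float:
    """Fraction of the true tree's splits that also occur in the inferred tree."""
    if leaves(true_tree) != leaves(inferred):
        raise ValueError("trees must be on the same taxon set")
    t, i = splits(true_tree), splits(inferred)
    return len(t & i) / len(t) if t else 1.0


def replicate_subsamples(taxa: List[str], depth: float, n_reps: int,
                         seed: int = 0) -> List[Set[str]]:
    """Draw `n_reps` random taxon subsets, each a `depth` fraction of `taxa`."""
    rng = random.Random(seed)
    # Keep at least 4 taxa so a non-trivial unrooted split can exist.
    k = min(len(taxa), max(4, round(depth * len(taxa))))
    return [set(rng.sample(taxa, k)) for _ in range(n_reps)]


if __name__ == "__main__":
    true_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
    inferred = ((("A", "C"), "B"), ("D", ("E", "F")))  # A/B/C resolved differently
    print(split_agreement(true_tree, inferred))  # 2 of 3 true splits recovered
    for keep in replicate_subsamples(sorted(leaves(true_tree)), 0.7, 3):
        print(sorted(keep), prune(true_tree, keep))
```

In a full replicate loop, each subset would be used to prune the true tree and to select sequences for inference, and the per-replicate scores would then be summarised per gene combination and sampling depth; the spread of those scores is what the abstract reports as variability among replicates.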
Source Title: Scientific Reports
URI: https://scholarbank.nus.edu.sg/handle/10635/179774
ISSN: 2045-2322
DOI: 10.1038/srep39489
Appears in Collections: Elements
Staff Publications

Files in This Item:
10_1038_srep39489.pdf (869.02 kB, Adobe PDF; access: open; version: none)

This item is licensed under a Creative Commons License.