Please use this identifier to cite or link to this item: https://doi.org/10.1186/s40168-017-0233-2
Title: Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads
Authors: Huson, D.H 
Tappu, R
Bazinet, A.L
Xie, C
Cummings, M.P
Nieselt, K
Williams, R 
Keywords: ribosome protein
amino acid sequence
Article
consensus sequence
metagenome
microbiome
multigene family
nonhuman
orthology
PIR-International Protein Sequence Database
priority journal
protein domain
reference database
sequence alignment
sequence analysis
sequence homology
structural bioinformatics
algorithm
biology
DNA sequence
high throughput sequencing
human
metagenomics
microflora
procedures
software
Algorithms
Amino Acid Sequence
Computational Biology
High-Throughput Nucleotide Sequencing
Humans
Metagenome
Metagenomics
Microbiota
Sequence Alignment
Sequence Analysis, DNA
Software
Issue Date: 2017
Citation: Huson, D.H, Tappu, R, Bazinet, A.L, Xie, C, Cummings, M.P, Nieselt, K, Williams, R (2017). Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads. Microbiome 5 (1) : 11. ScholarBank@NUS Repository. https://doi.org/10.1186/s40168-017-0233-2
Abstract: Background: Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes.
Methods: We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled.
Results: Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered.
Conclusions: Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way. © The Author(s) 2017.
Source Title: Microbiome
URI: https://scholarbank.nus.edu.sg/handle/10635/173966
ISSN: 2049-2618
DOI: 10.1186/s40168-017-0233-2
Appears in Collections: Staff Publications
Elements

Files in This Item:
File: 10_1186_s40168-017-0233-2.pdf
Size: 4.7 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None
