Publication

Towards Handling Repeats in Genome Assembly

NARMADA SAMBATURU
Abstract
Repeat regions have been shown to play a role in human-pathogen interactions, and their study could open up new treatment avenues. Since only small amounts of pathogen can be extracted from a patient, and waiting for the pathogen to multiply in the lab is impractical, a genomics pipeline that works with small quantities of cells and handles repeats is essential. Genome assemblers, however, tend to collapse all occurrences of a repeat into one contiguous sequence (contig). While ordering contigs, assemblers might interpret distant contigs as adjacent if they flank different occurrences of the same repeat. We develop an algorithm to link regions flanking a repeat given only picogram quantities of DNA. The algorithm exploits a 9 bp overlap between adjacent fragments caused by the library preparation technique (Nextera). The algorithm was tested with an E. coli library prepared with 0.25 pg of DNA, and was able to assemble the sequences bridging 26 repeats.
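The idea of linking fragments through a shared 9 bp overlap can be illustrated with a minimal sketch. This is not the thesis's algorithm, only a toy version under strong assumptions: fragments are given as error-free strings on the same strand, and the function name `link_adjacent` and the example sequences are hypothetical.

```python
from collections import defaultdict

OVERLAP = 9  # bp shared by adjacent Nextera fragments, as stated in the abstract


def link_adjacent(fragments, k=OVERLAP):
    """Return (i, j) index pairs where the last k bases of fragment i
    equal the first k bases of fragment j (hypothetical helper)."""
    # Index every fragment by its first k bases for constant-time lookup.
    by_prefix = defaultdict(list)
    for j, frag in enumerate(fragments):
        if len(frag) >= k:
            by_prefix[frag[:k]].append(j)

    links = []
    for i, frag in enumerate(fragments):
        if len(frag) < k:
            continue  # too short to carry a full overlap
        for j in by_prefix.get(frag[-k:], []):
            if i != j:  # a fragment is not its own neighbour
                links.append((i, j))
    return links


# Toy input: fragment 0 ends with the 9 bp that fragment 1 starts with.
fragments = [
    "GGGGG" + "ACGTTGCAT",
    "ACGTTGCAT" + "CCCCC",
    "TTTTTTTTTTTTTT",
]
links = link_adjacent(fragments)
```

Real sequencing data would additionally require handling both strands, sequencing errors, and 9-mers that match by chance, none of which this sketch attempts.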
Keywords
repeat region, repeat, assembly, scaffolding, nextera, genome assembly
Organizational Unit
COMPUTER SCIENCE
Date
2014-08-22
Type
Thesis