Title: New Models and Algorithm for De Novo Peptide Sequencing of Multi-Charge MS/MS Spectra
Keywords: De novo peptide sequencing, Algorithm, Tandem mass spectrometry, multi-charge spectra, spectrum graph, mono-chromatic scoring function
Issue Date: 10-Jan-2011
Citation: CHONG KET FAH (2011-01-10). New Models and Algorithm for De Novo Peptide Sequencing of Multi-Charge MS/MS Spectra. ScholarBank@NUS Repository.
Abstract: This thesis addresses the problem of de novo peptide sequencing for spectra of charge 3 and above, called multi-charge spectra. We show that integrating higher charge ion-types (>= charge) for multi-charge spectra and introducing a novel algorithm for de novo sequencing can yield better sequencing results. Most current algorithms do not directly handle multi-charge spectra (>= charge 3) because of the additional challenges they pose. These challenges include an increase in problem size (the number of pseudo-peaks to be considered), an increase in the noise level caused by these additional pseudo-peaks, and an increase in the complexity of the resulting sequencing problem. They lead to two questions. First, are there higher charge peaks and, if so, do they increase the percentage of recoverable peptides? Second, can we devise better sequencing algorithms that consider these higher charge peaks? In this thesis, we answer both questions. To answer the first question, we conducted a characterization study showing that higher charge peaks either increase the upper bound on the percentage of recoverable peptides, by explaining fragmentation points not explained by lower charge peaks, or serve as supporting peaks for fragmentation points already explained by lower charge peaks. To properly model higher charge peaks, we extend the notion of the extended spectrum to include pseudo-peaks of ion-types with higher charges. For a given spectrum, this step properly models the higher charge peaks, but it increases both the number of pseudo-peaks to be considered and the noise level. With this extended spectrum model, our characterization study of annotated spectra from the GPM-Amethyst dataset (charge 1-5) shows that including higher charge peaks increases the upper bound on the percentage of recoverable peptides. 
Although the characterization study on ISB and Orbitrap data (both containing charge 1-3 data) did not show much improvement when using charge 3 ion-types, we cannot conclude that these ion-types are useless, since they can still act as supporting ions. Our sequencing results bear this out: using charge 3 ion-types for ISB/ISB2 data improves the recoverable amino acids by around 1-2% compared to not using them. While the characterization study shows that considering higher charge peaks can potentially increase the amount of recoverable peptide, actually recovering the peptide remains very challenging (the second question). To address this question, we design a de novo peptide sequencing algorithm called MCPS that considers multi-charge peaks and the strong patterns associated with contiguous fragmentation points explained by peaks of the same ion-type. MCPS gives sequencing results better than or comparable to those of other state-of-the-art algorithms for some sets of multi-charge spectra. Our algorithm makes use of several key ideas: (i) the use of the extended spectrum graph, (ii) filtering of the extended spectrum graph using mono ion-type tags to reduce noise and bring down the size of the problem while still maintaining a good upper bound on the amount of peptide recoverable, (iii) a scoring function that highlights the importance of mono ion-type tag support for a given peptide tag, and (iv) a post-processing step that handles problems with competing mono ion-type tags of different ion-types. Compared against the current state-of-the-art de novo sequencing algorithms PEAKS, PepNovo and Lutefisk, MCPS does best for charge 3 ISB data and second best for charge 3 ISB2 data. In particular, it recovers 7% more amino acids in the peptide than the second best algorithm, PepNovo, for charge 3 ISB data. 
40% of the correctly predicted peptide tags for charge 3 ISB/ISB2 data are of length >= 3 and can be used as tags in database search.
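
The idea of pseudo-peaks in an extended spectrum can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes only the standard relation between an observed m/z at charge z and its singly charged equivalent, and the names `singly_charged_mz` and `extend_spectrum` are hypothetical. The extended spectrum in the thesis is defined over ion-types and may differ in detail.

```python
PROTON_MASS = 1.007276  # proton mass in Da (rounded)


def singly_charged_mz(mz, charge):
    """Map an m/z observed at the assumed charge to its charge-1 equivalent."""
    return mz * charge - (charge - 1) * PROTON_MASS


def extend_spectrum(peaks, max_charge):
    """Generate one pseudo-peak per (peak, assumed charge) pair.

    peaks: list of (mz, intensity) tuples.
    Returns a list of (pseudo_mz, intensity, assumed_charge), sorted by pseudo_mz.
    Hypothetical helper for illustration only.
    """
    extended = []
    for mz, intensity in peaks:
        for z in range(1, max_charge + 1):
            extended.append((singly_charged_mz(mz, z), intensity, z))
    return sorted(extended)


if __name__ == "__main__":
    for pseudo_peak in extend_spectrum([(500.0, 10.0), (350.2, 4.0)], 3):
        print(pseudo_peak)
```

Each real peak yields `max_charge` pseudo-peaks, at most one of which reflects the true charge. This is the growth in problem size and noise that the abstract describes, and it is why a filtering step becomes necessary.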
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: ChongKF.pdf; Size: 1.22 MB; Format: Adobe PDF





Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.