Title: A parent mass filter algorithm for peptide sequencing from tandem mass spectra
Authors: TAN HUIYI
Keywords: Peptide, Sequencing, DeNovo, Database, Search, Filter
Issue Date: 5-Mar-2010
Citation: TAN HUIYI (2010-03-05). A parent mass filter algorithm for peptide sequencing from tandem mass spectra. ScholarBank@NUS Repository.
Abstract: The peptide sequencing problem is that of determining the amino acid sequence of a peptide from the mass spectrum produced by the peptide via a tandem mass spectrometry process. This problem has been extensively researched in the past decade; the methods are classified as database search methods or de novo methods. This thesis focuses on database search methods for peptide sequencing and, in particular, on spectra from the GPM database. Past research has shown that GPM spectra are particularly challenging, as there are many missing peaks and relatively few short sequences, also known as tags, that can be found from these spectra. This thesis proposes a database search peptide sequencing algorithm, called PMF-MI (Parent Mass Filter with Mass Index), that works well on spectra with missing peaks and few tags, such as those in the GPM database. The main idea in PMF-MI is to use the parent mass as an effective filter for the set of putative peptides to be considered. This set of putative peptides can then be globally matched against the given spectrum for scoring. This method eliminates the need for tags to filter the peptide database. Similar ideas have been proposed in the past. However, in our work, we push this idea further by performing a full pre-indexing of all the peptides in the database by their parent masses. This pre-indexing of the peptide database has to be performed only once and, based on current database sizes, the entire index uses only 20 GB. The parent mass of a given spectrum produces a set of about 200,000 putative peptides on average. We ran our PMF-MI algorithm on the GPM spectra where the annotated peptide agrees with the precursor peptide mass of the spectrum. On this dataset of 877 spectra, our PMF-MI algorithm is competitive with INSPECT, the state-of-the-art database search method today. PMF-MI recovered 367 correct peptides compared to 376 for INSPECT (based on the top 10 ranked results).
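The parent mass filter described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the sorted-list index, and the 0.5 Da tolerance are assumptions made for illustration, and the scoring step is omitted. The residue masses are standard monoisotopic values.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def build_mass_index(peptides):
    """Sort all peptides by mass once (hypothetical in-memory index)."""
    entries = sorted((peptide_mass(p), p) for p in peptides)
    masses = [m for m, _ in entries]
    seqs = [p for _, p in entries]
    return masses, seqs


def candidates(index, parent_mass, tol=0.5):
    """Return peptides whose mass lies within tol of parent_mass."""
    masses, seqs = index
    lo = bisect_left(masses, parent_mass - tol)
    hi = bisect_right(masses, parent_mass + tol)
    return seqs[lo:hi]
```

Because the index is built once and each lookup is a binary search, the per-spectrum filtering cost depends mainly on the number of candidates returned. An index of the size reported in the abstract would presumably be kept on disk rather than in a Python list.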
One limitation of PMF-MI is that it requires an accurate parent mass to be effective. To test this hypothesis, we also ran the PMF-MI algorithm on the entire GPM database using the actual peptide mass of each input spectrum. In this case, PMF-MI performed better (577 for PMF-MI compared to 562 for INSPECT). This observation leads us to the next contribution of the thesis, which is an algorithm to compute the correct putative parent mass of a given spectrum. To do this, we examine the peaks that make up the spectrum and propose that there are more pairs of peaks which sum up to the parent mass (with one of the pair representing part of the protein and the other representing the remaining part) than pairs of peaks which sum up to any random mass. We supplement our PMF-MI algorithm with this corrected mass and show that we can now recover 404 correct peptides, compared to 367 correct peptides without using this corrected mass.
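The pair-counting idea above can be sketched as follows. This is a hedged illustration, not the algorithm from the thesis: the function names, the scan window, the step size, and the tolerance are all assumptions. The sketch also treats the pair sum itself as the quantity being estimated, since the abstract does not state how ion types or charge offset the pair sum from the parent mass.

```python
from itertools import combinations


def count_pairs(peaks, target_sum, tol=0.3):
    """Count peak pairs whose masses sum to target_sum within tol."""
    return sum(
        1 for a, b in combinations(peaks, 2)
        if abs(a + b - target_sum) <= tol
    )


def estimate_pair_sum(peaks, measured, window=3.0, step=0.1, tol=0.3):
    """Scan candidate sums around the measured value and return the
    candidate supported by the most complementary peak pairs.

    Ties are broken in favour of the candidate closest to `measured`.
    """
    n_steps = int(round(2 * window / step))
    best, best_count = measured, -1
    for k in range(n_steps + 1):
        cand = measured - window + k * step
        c = count_pairs(peaks, cand, tol)
        closer = abs(cand - measured) < abs(best - measured)
        if c > best_count or (c == best_count and closer):
            best, best_count = cand, c
    return best, best_count
```

For example, with peaks at 100, 400, 150, 350, 200 and 300 and a measured value of 501.5, three pairs sum to 500, so the scan settles on a candidate within the tolerance of 500 rather than on the measured value. The quadratic pair enumeration is acceptable for a sketch; sorting the peaks and using a two-pointer sweep would be a natural refinement.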
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: thesis.pdf
Size: 1.6 MB
Format: Adobe PDF





Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.