On interaction motif inference from biomolecular interactions: Riding the growth of the high throughput sequential and structural data | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/25811

Title:	On interaction motif inference from biomolecular interactions: Riding the growth of the high throughput sequential and structural data
Authors:	HUGO WILLY
Keywords:	interaction motif, protein short linear motif, RNA secondary structure, motif finding
Issue Date:	30-Nov-2010
Citation:	HUGO WILLY (2010-11-30). On interaction motif inference from biomolecular interactions: Riding the growth of the high throughput sequential and structural data. ScholarBank@NUS Repository.
Abstract:	Biochemical processes in the cell are mostly facilitated by (bio)catalysts commonly known as enzymes. Two kinds of biomolecules are currently known to act as enzymes in the cell: proteins and RNA. Both achieve their enzymatic properties through their ability to fold into a huge number of possible shapes and structures. To accomplish their functions, proteins and RNA often work together with other proteins or RNA by forming complexes. How do proteins and RNA recognize their correct interaction partners? Based on our current understanding, each recognizes a pattern, a motif, on the surface of its partner to which it can specifically bind. To bind such a pattern, the protein or RNA itself has a conserved region dedicated to recognition. We call these conserved patterns, which are involved in the interaction between two biomolecules, interaction motifs. These patterns mostly form complementarily shaped surface areas on the two biomolecules. From an evolutionary point of view, an interaction motif is under pressure to be conserved as long as the interaction it mediates is crucial to the organism's survival. Such conservation means that, given enough data, one should be able to design a computational technique to recognize these patterns.
The first part of the thesis presents an algorithm to infer the secondary structure of an RNA sequence. It is known that the structure (shape) of an RNA is generally more conserved than its sequence. We improved on the current best method in terms of computational time and space complexity. These improvements are important because more non-coding RNA transcripts from different organisms will be sequenced with the most recent second-generation nucleic acid sequencing technology. At the same time, the number of reference RNA structures in structural databases such as the Protein Data Bank (PDB) has been steadily increasing, and we expect more structures to become available soon given the importance of non-coding RNA.
The thesis further proposes two programs, D-STAR and D-SLIMMER, to mine short linear motifs (SLiMs) from current protein-protein interaction (PPI) data. Both programs are based on the concept of correlated motifs, which states that a pair of interaction motifs that enables an interaction will show a significantly higher number of interactions between the proteins containing them. We show that our correlated motif approach, which is interaction based, is more suitable for mining SLiMs from PPI data. D-STAR was the first program to use the correlated motif concept to find SLiMs from PPI data (earlier work examined correlations between known protein domains). We further improved on D-STAR by designing D-SLIMMER, which uses a mix of non-linear (protein domain) and linear (SLiM) interaction motifs as correlated motifs. This important difference enables D-SLIMMER to outperform D-STAR and other programs such as MotifCluster and SLIDER in finding biologically relevant motifs.
The final method, SLiMDiet, collects all possible de novo SLiMs from the structural data in the PDB. We characterized 452 distinct SLiMs from the PDB, of which 155 are validated either by literature or by over-representation in high-throughput PPI data. We further observed that the lacklustre coverage of existing SLiM detection methods could be due to the assumption that SLiMs occur outside domain regions: 198 of the 452 SLiMs we report are actually found at domain-domain interfaces, and some of them are implicated in autoimmune and neurodegenerative diseases. We propose that these SLiMs are useful for designing inhibitors against the pathogenic protein complexes underlying these diseases.
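The correlated-motif idea behind D-STAR and D-SLIMMER can be made concrete with a small scoring sketch: count how many interactions connect a protein carrying motif A to a protein carrying motif B, then ask how unlikely that count is by chance. The abstract does not say which statistic the programs use, so the exact hypergeometric over-representation test below, and the names `correlated_motif_score` and `hypergeom_upper_tail`, are illustrative assumptions rather than the thesis's scoring function. Motif occurrences are taken as given sets of proteins; actual motif matching (domain assignment or SLiM pattern search) is not shown.

```python
from fractions import Fraction
from math import comb

# Illustrative sketch only: the hypergeometric test and these function names are
# assumptions chosen for this example, not the scoring used by D-STAR/D-SLIMMER.


def hypergeom_upper_tail(total, successes, draws, observed):
    """Exact P(X >= observed) for X ~ Hypergeometric(total, successes, draws)."""
    denominator = comb(total, draws)
    numerator = sum(
        comb(successes, i) * comb(total - successes, draws - i)
        for i in range(observed, min(successes, draws) + 1)
    )
    return Fraction(numerator, denominator)


def correlated_motif_score(interactions, proteins, has_a, has_b):
    """Test whether proteins carrying motif A interact unusually often with
    proteins carrying motif B.

    interactions: iterable of (p, q) pairs, treated as undirected.
    proteins:     every protein in the network.
    has_a, has_b: sets of proteins assumed to contain motif A / motif B.
    Returns (observed interactions, candidate pairs, upper-tail p-value).
    """
    # Undirected interactions without self-loops.
    edges = {frozenset(pair) for pair in interactions if pair[0] != pair[1]}
    total_pairs = comb(len(set(proteins)), 2)
    # Unordered protein pairs where one partner has A and the other has B.
    candidates = {frozenset((p, q)) for p in has_a for q in has_b if p != q}
    observed = len(edges & candidates)
    # Null model: the observed interactions are a random sample of all pairs.
    p_value = hypergeom_upper_tail(total_pairs, len(candidates), len(edges), observed)
    return observed, len(candidates), p_value


# Toy network (hypothetical data): P1, P2 carry motif A; P3, P4 carry motif B.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
ppi = [("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P2", "P4"), ("P5", "P6")]
print(correlated_motif_score(ppi, proteins, {"P1", "P2"}, {"P3", "P4"}))
# (4, 4, Fraction(1, 273))
```

In this toy network only 4 of the 15 possible protein pairs link an A-carrier to a B-carrier, yet all four appear among the five observed interactions, so the tail probability is small (1/273). Scanning many candidate motif pairs, as a motif miner would, additionally requires a multiple-testing correction, which this sketch omits.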
URI:	http://scholarbank.nus.edu.sg/handle/10635/25811
Appears in Collections:	Ph.D Theses (Open)

Files in This Item:

File	Description	Size	Format	Access Settings	Version
HugoW-InteractionMotifThesis.pdf		9.94 MB	Adobe PDF	OPEN	None

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.