Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/15057
Title: Asymptotic results in over- and under-representation of words in DNA
Authors: WANG RANRAN
Keywords: DNA sequence, word count, over- (under-)representation, extrema, asymptotic normality, Markov chain
Issue Date: 4-Jan-2006
Source: WANG RANRAN (2006-01-04). Asymptotic results in over- and under-representation of words in DNA. ScholarBank@NUS Repository.
Abstract: Identifying over- and under-represented words is often useful for extracting information from DNA sequences. In this thesis, we focus on the words with the maximal and minimal numbers of occurrences, which are naturally regarded as over- and under-represented words, respectively. We study the tail probabilities of the extrema over a finite set of standard normal random variables using techniques such as Bonferroni's inequalities and Poisson approximation. Combining similar techniques with moderate deviation results for m-dependent random variables, we derive the asymptotic tail probabilities of the extrema over a set of word occurrences under the M0 model. The statistical distribution of word counts is also studied: we show the asymptotic normality of word counts under both the M0 and M1 models. Finally, we use computer simulations to study the tail probabilities of the most frequently and most rarely occurring DNA words under both the M0 and M1 models. The asymptotic results under the M1 model are shown to be similar to those for the M0 model.
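Illustration: the following is a minimal sketch of the kind of simulation the abstract describes, not the thesis's own code. It assumes the M0 model means independent, identically distributed letters, counts overlapping word occurrences, and applies the first two Bonferroni inequalities under an added independence assumption. All function names (simulate_m0, count_words, expected_count, bonferroni_bounds) and the parameter values are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def simulate_m0(n, probs, rng):
    """Draw a length-n sequence with i.i.d. letters (assumed M0 model)."""
    letters = list(probs)
    weights = [probs[a] for a in letters]
    return "".join(rng.choices(letters, weights=weights, k=n))


def count_words(seq, k):
    """Count overlapping occurrences of every k-letter word over ACGT."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Words that never occur get count 0, so the minimum covers all 4**k words.
    return {"".join(w): counts.get("".join(w), 0)
            for w in itertools.product("ACGT", repeat=k)}


def expected_count(word, n, probs):
    """Expected occurrences under M0: (n - k + 1) times the product of letter probabilities."""
    return (n - len(word) + 1) * math.prod(probs[a] for a in word)


def bonferroni_bounds(t, m):
    """Bounds on P(max of m N(0,1) variables > t).

    Assumption for this sketch only: the m variables are independent,
    so every pairwise joint tail probability equals tail**2.
    """
    tail = 0.5 * math.erfc(t / math.sqrt(2))  # P(Z > t) for standard normal Z
    upper = m * tail                              # first Bonferroni inequality
    lower = upper - math.comb(m, 2) * tail ** 2   # second Bonferroni inequality
    return max(lower, 0.0), min(upper, 1.0)


# Example run with illustrative parameters (uniform letters, 2-letter words).
rng = random.Random(0)
probs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
seq = simulate_m0(10_000, probs, rng)
counts = count_words(seq, 2)
most_frequent = max(counts, key=counts.get)
least_frequent = min(counts, key=counts.get)
print(most_frequent, counts[most_frequent], expected_count(most_frequent, len(seq), probs))
print(least_frequent, counts[least_frequent], expected_count(least_frequent, len(seq), probs))
print(bonferroni_bounds(3.0, len(counts)))
```

Word counts in a real sequence are dependent (overlapping words share letters), which is why the abstract refers to moderate deviations of m-dependent random variables; the independence assumption here only keeps the sketch short.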
URI: http://scholarbank.nus.edu.sg/handle/10635/15057
Appears in Collections: Master's Theses (Open)

Files in This Item:
File: MasterThesis_WangRanran.pdf
Size: 355.86 kB
Format: Adobe PDF
Access Settings: OPEN
Version: None



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.