Simultaneously learning DNA motif along with its position and sequence rank preferences through EM algorithm

Please use this identifier to cite or link to this item: https://doi.org/10.1007/978-3-642-29627-7_37

DC Field	Value
dc.title	Simultaneously learning DNA motif along with its position and sequence rank preferences through EM algorithm
dc.contributor.author	Zhang, Z.
dc.contributor.author	Chang, C.W.
dc.contributor.author	Hugo, W.
dc.contributor.author	Cheung, E.
dc.contributor.author	Sung, W.-K.
dc.date.accessioned	2013-07-04T08:26:53Z
dc.date.available	2013-07-04T08:26:53Z
dc.date.issued	2012
dc.identifier.citation	Zhang, Z., Chang, C.W., Hugo, W., Cheung, E., Sung, W.-K. (2012). Simultaneously learning DNA motif along with its position and sequence rank preferences through EM algorithm. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7262 LNBI: 355-370. ScholarBank@NUS Repository. https://doi.org/10.1007/978-3-642-29627-7_37
dc.identifier.isbn	9783642296260
dc.identifier.issn	03029743
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/41410
dc.description.abstract	Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This paper presents a de novo motif discovery algorithm called SEME, which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets and 164 ChIP-Seq libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (co-TF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct co-TF motifs and, at the same time, predicted co-TF motifs that better match the known motifs. Finally, we show that the learned position and sequence rank preferences of each co-TF reveal potential interaction mechanisms between the primary TF and the co-TF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the co-TFs. © 2012 Springer-Verlag Berlin Heidelberg.
dc.description.uri	http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1007/978-3-642-29627-7_37
dc.source	Scopus
dc.subject	Binding Preference
dc.subject	Expectation Maximization
dc.subject	Importance Sampling
dc.subject	Motif Finding
dc.type	Conference Paper
dc.contributor.department	COMPUTER SCIENCE
dc.description.doi	10.1007/978-3-642-29627-7_37
dc.description.sourcetitle	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.description.volume	7262 LNBI
dc.description.page	355-370
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Staff Publications

Files in This Item:

There are no files associated with this item.
