Please use this identifier to cite or link to this item: https://doi.org/10.1089/cmb.2012.0233
DC Field | Value
dc.title | Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm
dc.contributor.author | Zhang, Z.
dc.contributor.author | Chang, C.W.
dc.contributor.author | Hugo, W.
dc.contributor.author | Cheung, E.
dc.contributor.author | Sung, W.-K.
dc.date.accessioned | 2013-07-04T07:36:50Z
dc.date.available | 2013-07-04T07:36:50Z
dc.date.issued | 2013
dc.identifier.citation | Zhang, Z., Chang, C.W., Hugo, W., Cheung, E., Sung, W.-K. (2013). Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm. Journal of Computational Biology 20(3): 237-248. ScholarBank@NUS Repository. https://doi.org/10.1089/cmb.2012.0233
dc.identifier.issn | 1066-5277
dc.identifier.uri | http://scholarbank.nus.edu.sg/handle/10635/39226
dc.description.abstract | Although de novo motifs can be discovered by mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider additional binding features (i.e., position preference and sequence rank preference), but this information usually has to be supplied by the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif and its position and sequence rank preferences without asking the user for any prior knowledge. SEME is both efficient and accurate thanks to two important techniques: variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME was further applied to the more difficult problem of finding co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs while also predicting coTF motifs that better match the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by ChIP-Seq experiments on the coTFs. The application is available online. © Copyright 2013, Mary Ann Liebert, Inc.
dc.description.uri | http://libproxy1.nus.edu.sg/login?url=http://dx.doi.org/10.1089/cmb.2012.0233
dc.source | Scopus
dc.subject | binding preference
dc.subject | expectation maximization
dc.subject | importance sampling
dc.subject | motif finding
dc.type | Article
dc.contributor.department | COMPUTER SCIENCE
dc.description.doi | 10.1089/cmb.2012.0233
dc.description.sourcetitle | Journal of Computational Biology
dc.description.volume | 20
dc.description.issue | 3
dc.description.page | 237-248
dc.description.coden | JCOBE
dc.identifier.isiut | 000315888500006
Appears in Collections: Staff Publications
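The abstract describes learning a motif together with its position preference by running EM over a probabilistic mixture model. Below is a minimal, hypothetical Python sketch of that general idea; it is not SEME itself. It assumes one motif occurrence per sequence, equal-length sequences over A/C/G/T only, a uniform background, and a learned prior over motif start offsets. Sequence rank preference, variable motif length extension, and importance sampling are all omitted, and the function name em_motif_with_position and its parameters are illustrative only.

```python
import math
import random

ALPHABET = "ACGT"


def em_motif_with_position(seqs, width, iters=50, pseudo=0.1, seed=0):
    """Toy EM (hypothetical, not SEME): learn a position weight matrix (PWM)
    and a prior over motif start offsets, assuming exactly one motif occurrence
    per sequence, equal-length sequences, and a uniform background."""
    rng = random.Random(seed)
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "toy model assumes equal lengths"
    n_pos = length - width + 1
    # Random initial PWM: pwm[k][c] = P(base c at motif column k)
    pwm = []
    for _ in range(width):
        row = [rng.random() + 0.5 for _ in ALPHABET]
        total = sum(row)
        pwm.append([v / total for v in row])
    pos_prior = [1.0 / n_pos] * n_pos  # start with a uniform position preference
    bg = 0.25  # uniform background probability per base
    weights = []
    for _ in range(iters):
        # E-step: posterior over the motif start offset in each sequence
        weights = []
        for s in seqs:
            scores = []
            for j in range(n_pos):
                log_lr = sum(math.log(pwm[k][ALPHABET.index(s[j + k])] / bg)
                             for k in range(width))
                scores.append(math.log(pos_prior[j]) + log_lr)
            m = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(x - m) for x in scores]
            z = sum(exps)
            weights.append([e / z for e in exps])
        # M-step (motif): expected base counts per column, plus pseudocounts
        counts = [[pseudo] * len(ALPHABET) for _ in range(width)]
        for s, w in zip(seqs, weights):
            for j, wj in enumerate(w):
                for k in range(width):
                    counts[k][ALPHABET.index(s[j + k])] += wj
        pwm = [[c / sum(row) for c in row] for row in counts]
        # M-step (position preference): average posterior per offset; a small
        # pseudocount keeps every offset above zero so log() stays defined
        pos_prior = [sum(w[j] for w in weights) + 1e-3 for j in range(n_pos)]
        total = sum(pos_prior)
        pos_prior = [p / total for p in pos_prior]
    return pwm, pos_prior, weights


if __name__ == "__main__":
    rng = random.Random(42)
    motif = "TGACTCA"
    # Plant the motif at offset 10 in 30 random sequences of length 40
    seqs = []
    for _ in range(30):
        s = [rng.choice(ALPHABET) for _ in range(40)]
        s[10:10 + len(motif)] = motif
        seqs.append("".join(s))
    pwm, pos_prior, _ = em_motif_with_position(seqs, width=len(motif))
    best = max(range(len(pos_prior)), key=pos_prior.__getitem__)
    consensus = "".join(ALPHABET[max(range(4), key=row.__getitem__)] for row in pwm)
    print("most likely start offset:", best, "consensus:", consensus)
```

Because EM converges only to a local optimum, the learned motif and position prior can depend on the random initialization (the seed parameter); the demo plants a motif at a fixed offset so the learned position prior has a clear preference to discover.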

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.