A modified algorithm for motif discovery Based on risotto and projection

Galib, Marnim; Hasan, Nahid

dc.contributor.author	Galib, Marnim
dc.contributor.author	Hasan, Nahid
dc.date.accessioned	2021-09-13T09:08:16Z
dc.date.available	2021-09-13T09:08:16Z
dc.date.issued	2014-11-15
dc.identifier.citation	[1] Carvalho AM, Freitas AT, Oliveira AL, Sagot MF: An Efficient Algorithm for the Identification of Structured Motifs in DNA Promoter Sequences. IEEE/ACM Trans. Comput. Biol. Bioinformatics 2006, 3(2):126-140.
[2] Pisanti N, Carvalho AM, Marsan L, Sagot MF: RISOTTO: Fast extraction of motifs with mismatches. In Proc. LATIN'06, Volume 3887 of LNCS. Edited by: Correa JR, Hevia A, Kiwi M. Springer-Verlag; 2006:757-768.
[3] Carvalho AM, Oliveira AL: GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge. Algorithms for Molecular Biology 2011, 6:13.
[4] Bailey TL, Bodén M, Whitington T, Machanick P: The value of position-specific priors in motif discovery using MEME. BMC Bioinformatics 2010, 11:179.
[5] J. Buhler and M. Tompa. Finding motifs using random projections. Proc. Fifth Annual International Conference on Computational Molecular Biology (RECOMB), April 2001.
[6] E. Eskin and P.R. Pevzner. Finding composite regulatory patterns in DNA sequences. Bioinformatics S1, 2002, pp. 354-363.
[7] U. Keich and P. Pevzner. Finding motifs in the twilight zone. Bioinformatics 18, 2002, pp. 1374-1381.
[8] T. L. Bailey and C. Elkan. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21(1-2), 1995, pp. 51-80.
[9] C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 1993, pp. 208-214.
[10] G. Hertz and G. Stormo. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 1999, pp. 563-577.
[11] A.V. Aho and M.J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18:333-340, 1975.
[12] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J. Watson. Molecular Biology of the Cell. Garland Publishing, New York, 1994.
[13] M.-F. Sagot. Spelling approximate repeated or common motifs using a suffix tree. In C.L. Lucchesi and A.V. Moura, editors, Proc. Latin '98, volume 1380 of LNCS, pages 111-127, 1998.
[14] Gordân R, Narlikar L, Hartemink AJ: A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery. Proc. RECOMB'08 2008, 98-11
[15] Gordân R, Hartemink AJ: Using DNA Duplex Stability Information for Transcription Factor Binding Site Discovery. Pacific Symposium on Biocomputing 2008, 453-464.
[16] Jones, Neil C., and Pavel A. Pevzner. An Introduction to Bioinformatics Algorithms. Cambridge, MA: MIT, 2004. Print.
[17] Sanguthevar Rajasekaran, Sudha Balla, and Chun-Hsi Huang: Exact algorithms for planted motif challenge problems. Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT 06269-2155, USA.
[18] Mohamed Ibrahim Abouelhoda, Stefan Kurtz, Enno Ohlebusch: Replacing suffix trees with enhanced suffix arrays, 24: 2012	en_US
dc.identifier.uri	http://hdl.handle.net/123456789/982
dc.description	Supervised by Prof. Dr. M. A. Mottalib, Head of the Department, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT). Co-Supervisor: Md. Arifur Rahman, Lecturer, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh	en_US
dc.description.abstract	An important part of gene regulation is mediated by specific proteins, called transcription factors, which influence the transcription of a particular gene by binding to specific sites on DNA sequences, called transcription factor binding sites (TFBSs) or, simply, motifs. Such binding sites are relatively short segments of DNA, normally 5 to 25 nucleotides long, that are overrepresented in a set of co-regulated DNA sequences. There are two distinct problems in this setting: motif representation, which concerns the model that describes the TFBSs; and motif discovery, which focuses on unraveling TFBSs from a set of co-regulated DNA sequences. This thesis proposes a discriminative scoring criterion that culminates in a discriminative mixture of Bayesian networks to distinguish TFBSs from the background DNA. This new probabilistic model provides further evidence of non-additivity among binding site positions and offers superior discriminative power in TFBS detection. In addition, extra knowledge carefully selected from the literature was incorporated into TFBS discovery in order to capture a variety of characteristics of TFBS patterns. This extra knowledge was combined during the process of motif discovery, leading to results that are considerably more accurate than those achieved by methods that rely on the DNA sequence alone.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh	en_US
dc.title	A modified algorithm for motif discovery Based on risotto and projection	en_US
dc.type	Thesis	en_US