DNA Motif Finding Algorithm

Ashraf, Faisal Bin; Abir, Ali Imam

dc.contributor.author	Ashraf, Faisal Bin
dc.contributor.author	Abir, Ali Imam
dc.date.accessioned	2017-10-25T10:01:53Z
dc.date.available	2017-10-25T10:01:53Z
dc.date.issued	2016-11-20
dc.identifier.citation	[1] A. M. N a ar, J.-M. Boutin, S. M. Lipkin, C. Y. Victor, J. M. Holloway, C. K. Glass, and M. G. Rosenfeld, \The orientation and spacing of core dna-binding motifs dictate selective transcriptional responses to three nuclear receptors," Cell, vol. 65, no. 7, pp. 1267{1279, 1991. [2] C. B. Akg ul, D. L. Rubin, S. Napel, C. F. Beaulieu, H. Greenspan, and B. Acar, \Content-based image retrieval in radiology: current status and future directions," Journal of Digital Imaging, vol. 24, no. 2, pp. 208{222, 2011. [3] J. Wood, T. Andersson, A. Bachem, C. Best, F. Genova, D. R. Lopez, W. Los, M. Marinucci, L. Romary, H. Van de Sompel, et al., \Riding the wave: How europe can gain from the rising tide of scienti c data," European Union, 2010. [4] S. Rajasekaran, S. Balla, C.-H. Huang, V. Thapar, M. R. Gryk, M. W. Maciejewski, and M. R. Schiller, \Exact algorithms for motif search.," in APBC, pp. 239{248, 2005. [5] S. Pal and S. Rajasekaran, \Improved algorithms for nding edit distance based motifs," in Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on, pp. 537{542, IEEE, 2015. [6] H. M. Martinez, \An e cient method for nding repeats in molecular sequences," Nucleic acids research, vol. 11, no. 13, pp. 4629{4634, 1983. [7] M. Nicolae and S. Rajasekaran, \E cient sequential and parallel algorithms for planted motif search," BMC bioinformatics, vol. 15, no. 1, p. 1, 2014. [8] A. Price, S. Ramabhadran, and P. A. Pevzner, \Finding subtle motifs by branching from sample strings," Bioinformatics, vol. 19, no. suppl 2, pp. ii149{ii155, 2003. [9] S. Rajasekaran and H. Dinh, \A speedup technique for (l, d)-motif nding algorithms," BMC research notes, vol. 4, no. 1, p. 1, 2011. [10] P. A. Pevzner, S.-H. Sze, et al., \Combinatorial approaches to nding subtle signals in dna sequences.," in ISMB, vol. 8, pp. 269{278, 2000. [11] N. Pisanti, A. M. Carvalho, L. Marsan, and M.-F. Sagot, \Risotto: Fast extraction of motifs with mismatches," in Latin American Symposium on Theoretical Informatics, pp. 757{768, Springer, 2006. [12] F. Y. Chin and H. C. Leung, \Voting algorithms for discovering long motifs.," in APBC, pp. 261{271, 2005. 34 [13] J. Davila, S. Balla, and S. Rajasekaran, \Pampa: An improved branch and bound algorithm for planted (l, d) motif search," in Tech. rep, Citeseer, 2007. [14] J. Davila, S. Balla, and S. Rajasekaran, \Fast and practical algorithms for planted (l, d) motif search," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544{552, 2007. [15] H. Dinh, S. Rajasekaran, and V. K. Kundeti, \Pms5: an e cient exact algorithm for the (, d)-motif nding problem," BMC bioinformatics, vol. 12, no. 1, p. 410, 2011. [16] S. Bandyopadhyay, S. Sahni, and S. Rajasekaran, \Pms6: A fast algorithm for motif discovery," International Journal of Bioinformatics Research and Applica- tions 2, vol. 10, no. 4-5, pp. 369{383, 2014. [17] H. Dinh, S. Rajasekaran, and J. Davila, \Qpms7: a fast algorithm for nding (, d)-motifs in dna and protein sequences," PloS one, vol. 7, no. 7, p. e41425, 2012. [18] M. Nicolae and S. Rajasekaran, \qpms9: An e cient algorithm for quorum planted motif search," Scienti c reports, vol. 5, 2015. [19] U. Keich and P. A. Pevzner, \Finding motifs in the twilight zone," in Proceedings of the sixth annual international conference on Computational biology, pp. 195{ 204, ACM, 2002. [20] C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, J. C. Wootton, et al., \Detecting subtle sequence signals: a gibbs sampling strategy for multiple alignment," SCIENCE-NEW YORK THEN WASHINGTON-, vol. 262, pp. 208{208, 1993. [21] T. L. Bailey, C. Elkan, et al., \Fitting a mixture model by expectation maximization to discover motifs in bipolymers," 1994. [22] J. Buhler and M. Tompa, \Finding motifs using random projections," Journal of computational biology, vol. 9, no. 2, pp. 225{242, 2002. [23] T. Wang and G. D. Stormo, \Combining phylogenetic data with co-regulated genes to identify regulatory motifs," Bioinformatics, vol. 19, no. 18, pp. 2369{ 2380, 2003. [24] T. L. Bailey and C. Elkan, \The value of prior knowledge in discovering motifs with meme.," in Ismb, vol. 3, pp. 21{29, 1995. [25] E. Eskin and P. A. Pevzner, \Finding composite regulatory patterns in dna sequences," Bioinformatics, vol. 18, no. suppl 1, pp. S354{S363, 2002. [26] S. Sinha, M. Blanchette, and M. Tompa, \Phyme: a probabilistic algorithm for nding motifs in sets of orthologous sequences," BMC bioinformatics, vol. 5, no. 1, p. 1, 2004. 35 [27] P. Beaudoin, S. Coros, M. van de Panne, and P. Poulin, \Motion-motif graphs," in Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Com- puter Animation, pp. 117{126, Eurographics Association, 2008. [28] R. Gorda^n, L. Narlikar, and A. J. Hartemink, \Finding regulatory dna motifs using alignment-free evolutionary conservation information," Nucleic Acids Research, vol. 38, no. 6, pp. e90{e90, 2010. [29] T. L. Bailey, M. Bod en, T. Whitington, and P. Machanick, \The value of positionspeci c priors in motif discovery using meme," BMC bioinformatics, vol. 11, no. 1, p. 1, 2010. [30] M. Galib, N. Hasan, M. A. Rahman, and M. A. Mottalib, \Rpb-dc: Motif discovery using randomly projected bucketing (rpb) and discriminative conservation (dc) based prior," icita.org, vol. 15, 2015. [31] M. Tompa, N. Li, T. L. Bailey, G. M. Church, B. De Moor, E. Eskin, A. V. Favorov, M. C. Frith, Y. Fu, W. J. Kent, et al., \Assessing computational tools for the discovery of transcription factor binding sites," Nature biotechnology, vol. 23, no. 1, pp. 137{144, 2005. [32] R. Siddharthan, E. D. Siggia, and E. Van Nimwegen, \Phylogibbs: a gibbs sampling motif nder that incorporates phylogeny," PLoS Comput Biol, vol. 1, no. 7, p. e67, 2005. [33] M. Kellis, N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander, \Sequencing and comparison of yeast species to identify genes and regulatory elements," Nature, vol. 423, no. 6937, pp. 241{254, 2003. [34] K. D. MacIsaac, T. Wang, D. B. Gordon, D. K. Gi ord, G. D. Stormo, and E. Fraenkel, \An improved map of conserved regulatory sites for saccharomyces cerevisiae," BMC bioinformatics, vol. 7, no. 1, p. 1, 2006	en_US
dc.identifier.uri	http://hdl.handle.net/123456789/96
dc.description	Supervised by Dr. M.A. Mottalib Head, Department of Computer Science and Engineering Islamic University of Technology (IUT) Co-Supervisor: Md Sirajus Salekin Lecturer, Department of Computer Science and Engineering Islamic University of Technology (IUT)	en_US
dc.description.abstract	DNA contains the information of structure and function of di erent molecules of any living being. Short repeating patterns in a DNA sequence, called Motifs, are useful to understand and analyze this information. Gene function, drug design etc. has in uences of motifs and can be well understand by studying motifs. Transcription Factor Binding Sites, Regulatory elements also can be found using motifs. So motif nding is very important in computational biology. Recent advancements in gene expression analysis already prompt the scientists to introduce a number of motif nding algorithms. Planted Motif Search (PMS) is one of them. It has been found as NP-Hard problem and takes exponential time to calculate. Finding all the possible motifs of the given input set is done by an exact algorithm. But it takes lots of time to calculate. Approximate algorithms always take less time than exact algorithm but do not nd all the possible motifs always. We have proposed an approximate algorithm for Planted Motif Search which at rst generates all possible motif set and use a bucketing concepts to nd out the proper motifs from the whole data set. We use two benchmark data sets of DNA sequences to perform the comparative analysis with other approaches. Experimental results show that our bucketing approach nds more amounts of motifs than the other approximate algorithms and takes less amount of time for about all of the cases than exact algorithms. Most of the time, we get above 80% of the possible motifs of the given input set.	en_US
dc.language.iso	en	en_US
dc.publisher	IUT, CSE	en_US
dc.title	DNA Motif Finding Algorithm	en_US
dc.type	Thesis	en_US