Abstract:
DNA contains information about the structure and function of the different molecules of any
living being. Short repeating patterns in a DNA sequence, called motifs, are useful
for understanding and analyzing this information. Motifs influence areas such as gene
function and drug design, which can be better understood by studying them. Transcription
factor binding sites and regulatory elements can also be found using motifs. Motif
finding is therefore very important in computational biology. Recent advancements in gene
expression analysis have prompted scientists to introduce a number of motif
finding algorithms. Planted Motif Search (PMS) is one of them. It is an
NP-hard problem, and solving it takes exponential time. An exact algorithm finds all the possible
motifs of the given input set, but it takes a long time
to run. Approximate algorithms generally take less time than exact algorithms
but do not always find all the possible motifs. We propose an approximate
algorithm for Planted Motif Search that first generates the set of all possible motifs and
then uses a bucketing concept to pick out the proper motifs from the whole data set. We
use two benchmark data sets of DNA sequences to perform a comparative analysis
with other approaches. Experimental results show that our bucketing approach finds
more motifs than the other approximate algorithms and takes less
time than exact algorithms in almost all cases. Most of the time, we obtain
more than 80% of the possible motifs of the given input set.
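
To make the problem statement concrete, the following is a minimal sketch of Planted Motif Search in its usual (l, d) formulation: given a set of sequences, report every string of length l that occurs in each sequence with at most d mismatches. The abstract does not spell out this formulation or the bucketing step, so the parameters `l` and `d`, the function names, and the toy sequences are assumptions made for illustration. The code is a brute-force exact search, not the bucketing algorithm proposed here, and it shows why exact search is expensive: it tests all 4^l candidate strings.

```python
from itertools import product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some substring of seq with len(motif) characters
    is within Hamming distance d of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def planted_motifs(seqs: List[str], l: int, d: int) -> List[str]:
    """Exhaustive (l, d) search: try every one of the 4**l DNA strings
    of length l and keep those that occur, with at most d mismatches,
    in every input sequence. Running time is exponential in l."""
    found = []
    for letters in product("ACGT", repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in seqs):
            found.append(candidate)
    return found


# Hypothetical toy input: "GTA" appears exactly in all three sequences.
toy_seqs = ["ACGTAC", "TTGTAA", "CCGTAG"]
exact_hits = planted_motifs(toy_seqs, l=3, d=0)
```

With `d=0` only exact shared substrings qualify; raising `d` enlarges the result set, which is where an approximate method trades completeness for speed.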
Description:
Supervised by
Dr. M.A. Mottalib
Head, Department of Computer Science and Engineering
Islamic University of Technology (IUT)
Co-Supervisor:
Md Sirajus Salekin
Lecturer, Department of Computer Science and Engineering
Islamic University of Technology (IUT)