Abstract:
An important part of gene regulation is mediated by specific proteins, called
transcription factors, which influence the transcription of a particular gene by binding to
specific sites on DNA sequences, called transcription factor binding sites (TFBSs) or,
simply, motifs. Such binding sites are relatively short segments of DNA, normally 5 to 25
nucleotides long, that are overrepresented in a set of co-regulated DNA sequences. Two
distinct problems arise in this setting: motif representation, which concerns the model that
describes the TFBSs; and motif discovery, which focuses on unraveling TFBSs from a set of
co-regulated DNA sequences. This thesis proposes a discriminative scoring criterion that
culminates in a discriminative mixture of Bayesian networks for distinguishing TFBSs from
the background DNA. This new probabilistic model provides further evidence of non-additivity
among binding site positions and offers superior discriminative power in
TFBS detection. In addition, extra knowledge carefully selected from the
literature was incorporated into TFBS discovery in order to capture a variety of
characteristics of TFBS patterns. This knowledge was combined during the
process of motif discovery, leading to results that are considerably more accurate than
those achieved by methods that rely on the DNA sequence alone.
Description:
Supervisor:
Prof. Dr. M. A. Mottalib,
Head of the Department,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT)
Co-Supervisor:
Md. Arifur Rahman,
Lecturer,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh