An Improved Data Structure for Efficient Storage of Multiple BIOsequences

dc.contributor.author Hasan, Md. Zahidul
dc.contributor.author Shimul, Anik Islam
dc.date.accessioned 2021-10-12T06:39:56Z
dc.date.available 2021-10-12T06:39:56Z
dc.date.issued 2012-11-15
dc.identifier.citation [1] Sascha Steinbiss and Stefan Kurtz, “A New Efficient Data Structure for Storage and Retrieval of Multiple BIOsequences.”
[2] Shanika Kuruppu, Bryan Beresford-Smith, Thomas Conway, and Justin Zobel, “Iterative Dictionary Construction for Compression of Large DNA Data Sets.”
[3] Hieu Dinh and Sanguthevar Rajasekaran, “A Memory-Efficient Data Structure Representing Exact-Match Overlap Graphs with Application for Next-Generation DNA Assembly.”
[4] Sheng Bao, Shi Chen, Zhi-Qiang Jing, and Ran Ren, “A DNA Sequence Compression Algorithm Based on LUT and LZ77.”
[5] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and E.W. Sayers, “GenBank,” Nucleic Acids Research, vol. 38 (Database Issue), pp. D46-D51, 2010.
[6] A. Morgulis, G. Coulouris, Y. Raytselis, T.L. Madden, R. Agarwala, and A.A. Schaffer, “Database Indexing for Production MegaBLAST Searches,” Bioinformatics, vol. 24, no. 16, pp. 1757-1764, 2008.
[7] Srinivasa K. G., Jagadish M., Venugopal K. R., and L. M. Patnaik, “Efficient Compression of Non-Repetitive DNA Sequences Using Dynamic Programming.”
[8] E. Rivals, J.-P. Delahaye, M. Dauchet, and O. Delgrange, “A Guaranteed Compression Scheme for Repetitive DNA Sequences,” LIFL Lille I University technical report, p. 285, 1995.
[9] Raffaele Giancarlo, Davide Scaturro, and Filippo Utro, “Textual Data Compression in Computational Biology: A Synopsis,” Dipartimento di Matematica ed Applicazioni, Università di Palermo, Palermo, Italy.
[10] Marty C. Brandon, Douglas C. Wallace, and Pierre Baldi, “Data Structures and Compression Algorithms for Genomic Sequence Data.”
[11] Gergely Korodi and Ioan Tabus, “Compression of Annotated Nucleotide Sequences.”
[12] “The NCBI C Toolkit,” ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools, 2011.
[13] W.J. Kent, “BLAT-The BLAST-Like Alignment Tool,” Genome Research, vol. 12, no. 4, pp. 656-664, 2002.
[14] A. Döring, D. Weese, T. Rausch, and K. Reinert, “SeqAn: An Efficient, Generic C++ Library for Sequence Analysis,” BMC Bioinformatics, vol. 9, article 11, 2008. en_US
dc.identifier.uri http://hdl.handle.net/123456789/1189
dc.description Supervised by Prof. Dr. M. A. Mottalib; co-supervised by Tareque Mohmud Chowdhury, Assistant Professor, Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. en_US
dc.description.abstract Compression of large DNA sequences has been a subject of great interest since genomic databases became available. Although two bits per base are sufficient to encode the four DNA bases (A, C, G and T), the massive size of DNA sequence collections makes efficient compression necessary. In this thesis we propose an improved version of an existing data structure, “GtEncseq”, which stores multiple biological sequences of variable character size and provides customizable character transformations, support for “wildcard” and “separator” characters, and a diverse group of internal representations optimized for different distributions of wildcards and sequence lengths. Our main goal is extensive compression of the data by eliminating wildcard entries from the encoded sequence while keeping them available for reuse. Keeping the time required to encode a sequence low is a further consideration. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.title An Improved Data Structure for Efficient Storage of Multiple BIOsequences en_US
dc.type Thesis en_US
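The two ideas in the abstract, 2-bit coding of the four DNA bases and removing wildcards from the encoded sequence while keeping them recoverable, can be illustrated with a minimal Python sketch. It is not GtEncseq's or this thesis's actual representation. The names `encode`, `decode` and `PackedSeqs`, the low-bits-first packing order, storing wildcards as (start, length, character) runs, and recording sequence boundaries as end offsets instead of explicit separator characters are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical sketch, not the GtEncseq layout: only upper-case A, C, G, T are
# packed; every other character is treated as a wildcard and stored separately.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


@dataclass
class PackedSeqs:
    packed: bytearray                          # 2 bits per base, 4 bases per byte
    n_bases: int                               # number of bases held in `packed`
    wildcard_runs: list[tuple[int, int, str]]  # (start, length, char) in concatenated text
    seq_ends: list[int]                        # end offset of each sequence (assumed separator scheme)


def encode(seqs: list[str]) -> PackedSeqs:
    text = "".join(seqs)
    seq_ends, total = [], 0
    for s in seqs:
        total += len(s)
        seq_ends.append(total)

    packed, runs, n, i = bytearray(), [], 0, 0
    while i < len(text):
        c = text[i]
        if c in CODE:
            if n % 4 == 0:
                packed.append(0)               # start a new byte every 4 bases
            packed[n // 4] |= CODE[c] << (2 * (n % 4))
            n += 1
            i += 1
        else:
            j = i
            while j < len(text) and text[j] == c:
                j += 1                         # extend the run of identical wildcards
            runs.append((i, j - i, c))         # wildcard removed from the packed bases
            i = j
    return PackedSeqs(packed, n, runs, seq_ends)


def decode(p: PackedSeqs) -> list[str]:
    total = p.seq_ends[-1] if p.seq_ends else 0
    runs = iter(p.wildcard_runs)
    run = next(runs, None)
    out, pos, k = [], 0, 0
    while pos < total:
        if run is not None and pos == run[0]:
            _, length, ch = run
            out.append(ch * length)            # re-insert the stored wildcard run
            pos += length
            run = next(runs, None)
        else:
            out.append(BASES[(p.packed[k // 4] >> (2 * (k % 4))) & 3])
            k += 1
            pos += 1
    text = "".join(out)
    seqs, start = [], 0
    for end in p.seq_ends:                     # split at the recorded boundaries
        seqs.append(text[start:end])
        start = end
    return seqs
```

For example, `encode(["ACGTNNNNACGT", "GGNA", "TTTT"])` packs the 15 non-wildcard bases into 4 bytes and keeps the wildcard runs `[(4, 4, "N"), (14, 1, "N")]` aside, and `decode` restores the three input sequences exactly. Storing long wildcard runs as single triples is what lets them be dropped from the packed array without losing information.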

