Abstract:
The internet is an integral part of daily life; many activities, such as gaming and education, now take place online. Before we can use information online we have to find it, so web search engines such as Google, which are web applications that search the internet for websites, have become very popular. Search engines have existed since the early days of the internet and have become the standard tool for finding information. A search engine has a component, called the “crawler”, that is responsible for collecting the information shown in the interface when we search. The crawler retrieves the content of the pages whose links it already has, then scans that content and extracts all the links it contains. It then repeats the process on these new links, retrieving and collecting their content in turn. The process continues until there are no more links left to retrieve.
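The crawl loop described above can be illustrated with a minimal sketch. It is not the crawler implemented in this thesis: the `PAGES` dictionary, the `fetch` function and the `example.test` URLs are hypothetical stand-ins for real HTTP requests, and the `seen` set is an added assumption so that the loop terminates when pages link back to each other.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> page content (stands in for real HTTP fetches).
PAGES = {
    "http://example.test/a": '<a href="http://example.test/b">B</a> '
                             '<a href="http://example.test/c">C</a>',
    "http://example.test/b": '<a href="http://example.test/c">C</a>',
    "http://example.test/c": '<a href="http://example.test/a">A</a>',
}

# Simplified link extraction; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')


def fetch(url):
    """Return the content of a page, or an empty string for an unknown URL."""
    return PAGES.get(url, "")


def crawl(seed_urls):
    """Retrieve every page reachable from the seed links; return {url: content}."""
    frontier = deque(seed_urls)   # links whose content is still to be retrieved
    seen = set(seed_urls)         # assumption: skip links that were already queued
    collected = {}
    while frontier:               # stop when no links are left to retrieve
        url = frontier.popleft()
        content = fetch(url)
        collected[url] = content
        for link in LINK_RE.findall(content):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected
```

Calling `crawl(["http://example.test/a"])` visits all three hypothetical pages, although page `c` links back to page `a`.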
However, information on the internet is not limited to the pages we see; a tremendous amount of information is stored in databases and is not directly visible. This is called the hidden web.
To access this information, a normal user has to fill in a form. For example, on e-commerce websites we have to enter some information in the search box to get information about the product we are looking for.
The crawler, however, has to be able to enter the correct words in the form in order to crawl the content inside the databases. Much research has been carried out on this problem, namely determining the optimal list of words with which to fill in the form. This list of words is also known as the bag-of-words (BOW).
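To make the role of the bag-of-words concrete, the following sketch simulates a search form in front of a small database. Everything in it is hypothetical and chosen only for illustration (the `DATABASE` records, `submit_form` and `crawl_hidden`); it shows that the choice of words decides how much of the hidden content is retrieved, not how the thesis selects or structures them.

```python
# Hypothetical records stored behind a search form (not reachable through links).
DATABASE = [
    "red running shoes",
    "blue running jacket",
    "red leather wallet",
    "green garden hose",
]


def submit_form(word):
    """Simulate filling in the search box: return the records containing the word."""
    return [record for record in DATABASE if word in record.split()]


def crawl_hidden(bag_of_words):
    """Submit each word of the bag in turn; return the distinct records retrieved."""
    retrieved = set()
    for word in bag_of_words:
        retrieved.update(submit_form(word))
    return retrieved
```

With the bag `["red", "running"]` the sketch retrieves three of the four records; the record about the garden hose stays hidden until a word such as `"garden"` is added to the bag.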
In this thesis, we propose a model for structuring the words inside the BOW in order to reduce the cost of accessing a word and its set of related words in the BOW. We have successfully implemented the structure and integrated it into a crawler that we also implemented, and we present the results of our experiments. From these results we are able to see the effect of the structure on a crawler.
Description:
Supervised by
Md. Kamrul Hasan, PhD,
Associate Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
The Organization of the Islamic Cooperation (OIC),
Gazipur-1704, Dhaka, Bangladesh