Designing Spam Mail Filtering Using Data Mining by Analyzing User and Email Behavior

Islam, Abdullah Ibn Nurul

dc.contributor.author	Islam, Abdullah Ibn Nurul
dc.date.accessioned	2021-10-12T06:14:26Z
dc.date.available	2021-10-12T06:14:26Z
dc.date.issued	2012-11-15
dc.identifier.uri	http://hdl.handle.net/123456789/1183
dc.description	Supervised by Professor Dr. Md. Abdul Mottalib, Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh.	en_US
dc.description.abstract	Electronic mail is the “killer network application”: it is ubiquitous and pervasive. In a relatively short time, the Internet has become deeply and irrevocably entrenched in modern society, primarily because of the power of its communication substrate, which links people and organizations around the globe. Much work on email technology has focused on making email easy to use, so that a wide variety of information and information types can be sent conveniently and reliably across the Internet. However, the analysis of the vast storehouse of email content accumulated or produced by individual users has received relatively little attention, other than for specific tasks such as spam and virus filtering. Email users continuously receive spam, which wastes their time, and harmful emails can damage their computers. This thesis presents an implemented framework for mining behavior models from email data. The EMT is a data mining toolkit designed to analyze email corpora, including the entire set of email sent and received by an individual user, revealing much information about individual users as well as the behavior of groups of users in an organization. A number of machine learning and anomaly detection algorithms are embedded in the system to model the user’s email behavior in order to classify email for a variety of tasks. Several methods for detecting spam email already exist. The main goal is to develop a method that outperforms the existing methods in detecting spam and ham and in reducing wrongly classified spam, that is, to improve accuracy over the existing methods. The second goal is to implement the proposed algorithm so that it reduces processing time. In summary, this thesis addresses both accuracy and processing time, based on prioritization in detecting email messages.
The proposed method uses a process-prioritization criterion that is unavailable in the earlier methods. It also uses the concept of post-filtering, which contributes to the accuracy of the proposed method. The proposed method, which we name MAN, performs spam detection and outperforms the existing methods. It also gives users a convenient spam detection process. Thus, by combining post-filtering, process prioritization and different criteria for detecting spam, optimum accuracy in spam detection becomes possible.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh	en_US
dc.title	Designing Spam Mail Filtering Using Data Mining by Analyzing User and Email Behavior	en_US
dc.type	Thesis	en_US