Predicting Users’ Personality from Social Media using Linguistic and Social Network Features

dc.contributor.author Marouf, Ahmed Al
dc.date.accessioned 2020-10-26T07:16:52Z
dc.date.available 2020-10-26T07:16:52Z
dc.date.issued 2019-11-15
dc.identifier.uri http://hdl.handle.net/123456789/557
dc.description Supervised by Prof. Dr. Md. Kamrul Hasan en_US
dc.description.abstract Social media platforms such as Facebook, Twitter, and Google+ have become huge repositories of textual data and images, as users create posts, share views or news, and capture moments in photos. User-generated textual data such as status updates can be considered the primary language of communication with others on social media. Predicting personality traits from these data is a sophisticated task in computational social science. Among several personality models, the Big Five Factor Model is one of the most widely used trait hypotheses among computational psychologists. The five traits used to characterize a person's personality are Openness-to-experience (O), Conscientiousness (C), Extraversion (E), Agreeableness (A), and Neuroticism (N). The first four are considered positive traits, while neuroticism is the only negative one. In this thesis, we have focused on predicting these personality traits using linguistic and social network features, and on identifying the most prominent features for each trait separately using feature selection algorithms. We have evaluated the efficiency of machine learning techniques using the extracted features. Both manual and automated feature selection have been applied to determine the most prominent features for each individual trait, as well as the features common to all traits. The analysis reported in this study is expected to be applicable to developing personalized recommendation systems in social media, predicting personality disorders, and identifying trust issues in social media. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering, Islamic University of Technology, Gazipur, Bangladesh en_US
dc.subject Social Media, Computational Personality Prediction, Personality Traits, Psycholinguistic Features, Social Network Features, Automated Feature Selection Algorithms. en_US
dc.title Predicting Users’ Personality from Social Media using Linguistic and Social Network Features en_US
dc.type Thesis en_US
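The abstract describes selecting the most prominent features separately for each Big Five trait and then identifying features common to all traits. This record does not state which feature selection algorithms the thesis used. The following is only a minimal Python sketch of that per-trait workflow. It assumes a simple filter approach that ranks features by absolute Pearson correlation with each trait score and keeps the top k. The feature names, trait scores, and k are hypothetical toy values, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sd_x * sd_y)


def top_features(features, trait, k):
    """Keep the k features with the largest |r| against one trait's scores."""
    scores = {name: abs(pearson(col, trait)) for name, col in features.items()}
    # Sort by descending |r|; break ties alphabetically so results are deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


def select_per_trait(features, trait_scores, k):
    """Run the filter separately for every trait, then intersect the selections."""
    per_trait = {t: top_features(features, y, k) for t, y in trait_scores.items()}
    common = set.intersection(*(set(sel) for sel in per_trait.values()))
    return per_trait, common


# Hypothetical toy data: one value per user (5 users) for each feature and trait.
features = {
    "friends":   [1, 2, 3, 4, 5],
    "posts":     [5, 4, 3, 2, 1],
    "links":     [2, 2, 2, 2, 2],
    "pronoun_i": [1, 3, 2, 5, 4],
}
trait_scores = {
    "E": [1, 2, 3, 4, 5],  # toy Extraversion scores
    "N": [1, 3, 2, 5, 4],  # toy Neuroticism scores
}

per_trait, common = select_per_trait(features, trait_scores, k=2)
print(per_trait)  # {'E': ['friends', 'posts'], 'N': ['pronoun_i', 'friends']}
print(common)     # {'friends'}
```

The sketch uses absolute correlation so that a strongly negative association, such as `posts` against the toy Extraversion scores, still counts as informative. A plain filter like this ignores redundancy between the selected features. Correlation-based subset selection methods additionally penalize features that are highly correlated with each other.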