Abstract:
Social media platforms such as Facebook, Twitter, and Google+ have become huge repositories of
textual data and images, as users create posts, share views and news, and capture moments in
photos. User-generated text such as status updates can be considered the essential language for
communicating with others on social media. Predicting personality traits from social media data
is a complex task in computational social science. Among
several personality models, the Big Five Factor Model is one of the personality trait hypotheses
most widely used by computational psychologists. The five traits central to identifying a
person’s personality are Openness-to-experience (O), Conscientiousness (C), Extraversion (E),
Agreeableness (A), and Neuroticism (N). The first four are considered positive traits, and
neuroticism is the only negative one. In this thesis, we have focused
on predicting these personality traits from linguistic and social network features, and on
identifying the most prominent features for each trait separately using feature selection
algorithms. We have evaluated the effectiveness of machine learning techniques on the extracted
features. To determine the most prominent features for individual personality traits, as well as
the features common to all traits, we have applied both manual and automated feature selection.
We anticipate that the analysis reported in this study can be applied to developing personalized
recommendation systems for social media, predicting personality disorders, and identifying trust
issues in social media.