dc.identifier.citation |
[1] Mehmet Berkehan Akçay and Kaya Oğuz. “Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers”. In: Speech Communication 116 (2020), pp. 56–76.
[2] Sabur Ajibola Alim and N Khair Alang Rashid. Some commonly used speech feature extraction algorithms. IntechOpen, 2018.
[3] Moataz M. H. El Ayadi, Mohamed S. Kamel, and Fakhri Karray. “Speech Emotion Recognition using Gaussian Mixture Vector Autoregressive Models”. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’07) 4 (2007), pp. IV-957–IV-960.
[4] Felix Burkhardt et al. “A database of German emotional speech”. In: INTERSPEECH. 2005.
[5] Carlos Busso et al. “IEMOCAP: Interactive emotional dyadic motion capture database”. In: Language Resources and Evaluation 42 (Dec. 2008), pp. 335–359. doi: 10.1007/s10579-008-9076-6.
[6] Houwei Cao, Ragini Verma, and Ani Nenkova. “Speaker-sensitive Emotion Recognition via Ranking: Studies on Acted and Spontaneous Speech”. In: Computer Speech & Language 29 (Feb. 2014). doi: 10.1016/j.csl.2014.01.003.
[7] John HL Hansen et al. “Getting started with SUSAS: a speech under simulated and actual stress database”. In: Eurospeech. Vol. 97. 4. 1997, pp. 1743–1746.
[8] Philip Jackson and Sanaul Haq. “Surrey Audio-Visual Expressed Emotion (SAVEE) database”. In: University of Surrey: Guildford, UK (2014).
[9] Manas Jain et al. Speech Emotion Recognition using Support Vector Machine. 2020.
[10] Markus Kächele et al. “Prosodic, spectral and voice quality feature selection using a long-term stopping criterion for audio-based emotion recognition”. In: 2014 22nd International Conference on Pattern Recognition. IEEE. 2014, pp. 803–808.
[11] Ruhul Amin Khalil et al. “Speech Emotion Recognition Using Deep Learning Techniques: A Review”. In: IEEE Access 7 (2019), pp. 117327–117345. doi: 10.1109/ACCESS.2019.2936124.
[12] Shadi Langari, Hossein Marvi, and Morteza Zahedi. “Efficient speech emotion recognition using modified feature extraction”. In: Informatics in Medicine Unlocked 20 (2020), p. 100424.
[13] Aijun Li et al. “CASS: A phonetically transcribed corpus of Mandarin spontaneous speech”. In: Sixth International Conference on Spoken Language Processing. 2000.
[14] Zhentao Liu et al. “Emotional feature selection of speaker-independent speech based on correlation analysis and Fisher”. In: 2015 34th Chinese Control Conference (CCC) (2015), pp. 3780–3784.
[15] Steven R Livingstone and Frank A Russo. “The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English”. In: PLoS ONE 13.5 (2018), e0196391.
[16] Olivier Martin et al. “The eNTERFACE’05 audio-visual emotion database”. In: 22nd International Conference on Data Engineering Workshops (ICDEW’06). IEEE. 2006, pp. 8–8.
[17] Gary McKeown et al. “The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent”. In: IEEE Transactions on Affective Computing 3.1 (2011), pp. 5–17.
[18] Rosanna Milner et al. “A Cross-Corpus Study on Speech Emotion Recognition”. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2019, pp. 304–311. doi: 10.1109/ASRU46091.2019.9003838.
[19] Tin Nwe, S.W. Foo, and Liyanage De Silva. “Speech Emotion Recognition Using Hidden Markov Models”. In: Speech Communication 41 (Nov. 2003), pp. 603–623. doi: 10.1016/S0167-6393(03)00099-2.
[20] Turgut Özseven. “A novel feature selection method for speech emotion recognition”. In: Applied Acoustics 146 (2019), pp. 320–326.
[21] M. Kathleen Pichora-Fuller and Kate Dupuis. Toronto emotional speech set (TESS). Version DRAFT VERSION. 2020. doi: 10.5683/SP2/E8H2MF. url: https://doi.org/10.5683/SP2/E8H2MF.
[22] Rajesvary Rajoo and Ching Chee Aun. “Influences of languages in speech emotion recognition: A comparative study using Malay, English and Mandarin languages”. In: 2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE). 2016, pp. 35–39. doi: 10.1109/ISCAIE.2016.7575033.
[23] K Sreenivasa Rao, Shashidhar G Koolagudi, and Ramu Reddy Vempada. “Emotion recognition from speech using global and local prosodic features”. In: International Journal of Speech Technology 16.2 (2013), pp. 143–160.
[24] Fabien Ringeval et al. “Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions”. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE. 2013, pp. 1–8.
[25] Fardin Saad et al. Is Speech Emotion Recognition Language-Independent? Analysis of English and Bangla Languages using Language-Independent Vocal Features. Nov. 2021.
[26] Jürgen Schmidhuber. “Deep learning in neural networks: An overview”. In: Neural Networks 61 (2015), pp. 85–117.
[27] B. Schuller, G. Rigoll, and M. Lang. “Hidden Markov model-based speech emotion recognition”. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03). Vol. 2. 2003, pp. II–1. doi: 10.1109/ICASSP.2003.1202279.
[28] Maheshwari Selvaraj, R Bhuvana, and S Padmaja. “Human speech emotion recognition”. In: International Journal of Engineering & Technology 8 (2016), pp. 311–323.
[29] Linhui Sun, Sheng Fu, and Fu Wang. “Decision tree SVM model with Fisher feature selection for speech emotion recognition”. In: EURASIP Journal on Audio, Speech, and Music Processing 2019.1 (2019), pp. 1–14.
[30] Shiqing Zhang. “Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features”. In: Sept. 2008, pp. 457–464. |
en_US |
dc.description.abstract |
Great progress has been made in speech recognition, but we still have a long
way to go to achieve smooth human-computer interaction, because computers
still find it difficult to understand the emotional state of the speaker. This has
brought into light a relatively recent research field, namely Speech
Emotion Recognition. Every speech signal carries implicit information about the
speaker's emotions, which can be extracted through speech processing methods.
Many systems have been proposed in the literature to identify the emotional state
through speech. Extracting features from speech, selecting a suitable feature set,
designing a proper classification method, and preparing a suitable dataset are the
main steps in designing a Speech Emotion Recognition (SER) system. However,
despite significant progress in this area, many things remain poorly understood,
especially when attention is given to the cultural differences among people.
Emotion recognition in speech can vary from person to person based on
age, gender, language, accent, and many other factors. To explore how much
accents affect SER, we investigated how features vary across different accents in
the domain of Speech Emotion Recognition. This paper focuses on the question of
whether Speech Emotion Recognition is accent-independent or not. We study
different speech features, carry out experiments on their extraction process and
reduction techniques, and conduct experiments on the selection of
accent-independent features. These features are then used to train a model,
leading us to a conclusion on whether SER depends on accents and on which
speech features help identify emotions more accurately regardless of accent. |
en_US |
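Note: The abstract describes the standard SER pipeline of extracting features from speech, selecting a feature subset, and training a classifier. As a minimal illustrative sketch only, not the implementation used in this thesis, the Python snippet below shows one common way such a pipeline is realized: mean MFCC features extracted with librosa, univariate feature selection, and an SVM classifier from scikit-learn. The file paths and emotion labels are hypothetical placeholders.

import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def extract_features(path, sr=16000, n_mfcc=13):
    """Load one utterance and summarize it as a fixed-length vector
    of mean MFCCs (one common choice of spectral feature)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)  # average over time -> (n_mfcc,)

# Hypothetical labeled corpus: (wav path, emotion label) pairs.
corpus = [
    ("data/happy_01.wav", "happy"),
    ("data/angry_01.wav", "angry"),
    # ... more labeled utterances ...
]

X = np.array([extract_features(path) for path, _ in corpus])
y = np.array([label for _, label in corpus])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale features, keep the 10 most discriminative ones (the "feature
# selection" step from the abstract), then fit an SVM classifier.
clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))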