dc.identifier.citation |
[1] Mehmet Berkehan Akçay and Kaya Oğuz. “Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers”. In: Speech Communication 116 (2020), pp. 56–76.
[2] Sabur Ajibola Alim and N Khair Alang Rashid. Some commonly used speech feature extraction algorithms. IntechOpen, 2018.
[3] Moataz M. H. El Ayadi, Mohamed S. Kamel, and Fakhri Karray. “Speech Emotion Recognition using Gaussian Mixture Vector Autoregressive Models”. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’07) 4 (2007), pp. IV-957–IV-960.
[4] Felix Burkhardt et al. “A database of German emotional speech”. In: INTERSPEECH. 2005.
[5] Carlos Busso et al. “IEMOCAP: Interactive emotional dyadic motion capture database”. In: Language Resources and Evaluation 42 (Dec. 2008), pp. 335–359. doi: 10.1007/s10579-008-9076-6.
[6] Houwei Cao, Ragini Verma, and Ani Nenkova. “Speaker-sensitive Emotion Recognition via Ranking: Studies on Acted and Spontaneous Speech”. In: Computer Speech & Language 29 (Feb. 2014). doi: 10.1016/j.csl.2014.01.003.
[7] John HL Hansen et al. “Getting started with SUSAS: a speech under simulated and actual stress database”. In: Eurospeech. Vol. 97. 4. 1997, pp. 1743–1746.
[8] Philip Jackson and Sanaul Haq. “Surrey Audio-Visual Expressed Emotion (SAVEE) database”. In: University of Surrey: Guildford, UK (2014).
[9] Manas Jain et al. Speech Emotion Recognition using Support Vector Machine. 2020.
[10] Markus Kächele et al. “Prosodic, spectral and voice quality feature selection using a long-term stopping criterion for audio-based emotion recognition”. In: 2014 22nd International Conference on Pattern Recognition. IEEE. 2014, pp. 803–808.
[11] Ruhul Amin Khalil et al. “Speech Emotion Recognition Using Deep Learning Techniques: A Review”. In: IEEE Access 7 (2019), pp. 117327–117345. doi: 10.1109/ACCESS.2019.2936124.
[12] Shadi Langari, Hossein Marvi, and Morteza Zahedi. “Efficient speech emotion recognition using modified feature extraction”. In: Informatics in Medicine Unlocked 20 (2020), p. 100424.
[13] Aijun Li et al. “CASS: A phonetically transcribed corpus of Mandarin spontaneous speech”. In: Sixth International Conference on Spoken Language Processing. 2000.
[14] Zhentao Liu et al. “Emotional feature selection of speaker-independent speech based on correlation analysis and Fisher”. In: 2015 34th Chinese Control Conference (CCC) (2015), pp. 3780–3784.
[15] Steven R Livingstone and Frank A Russo. “The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English”. In: PLoS ONE 13.5 (2018), e0196391.
[16] Olivier Martin et al. “The eNTERFACE’05 audio-visual emotion database”. In: 22nd International Conference on Data Engineering Workshops (ICDEW’06). IEEE. 2006, pp. 8–8.
[17] Gary McKeown et al. “The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent”. In: IEEE Transactions on Affective Computing 3.1 (2011), pp. 5–17.
[18] Rosanna Milner et al. “A Cross-Corpus Study on Speech Emotion Recognition”. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2019, pp. 304–311. doi: 10.1109/ASRU46091.2019.9003838.
[19] Tin Nwe, S.W. Foo, and Liyanage De Silva. “Speech Emotion Recognition Using Hidden Markov Models”. In: Speech Communication 41 (Nov. 2003), pp. 603–623. doi: 10.1016/S0167-6393(03)00099-2.
[20] Turgut Özseven. “A novel feature selection method for speech emotion recognition”. In: Applied Acoustics 146 (2019), pp. 320–326.
[21] M. Kathleen Pichora-Fuller and Kate Dupuis. Toronto emotional speech set (TESS). Version DRAFT VERSION. 2020. doi: 10.5683/SP2/E8H2MF. url: https://doi.org/10.5683/SP2/E8H2MF.
[22] Rajesvary Rajoo and Ching Chee Aun. “Influences of languages in speech emotion recognition: A comparative study using Malay, English and Mandarin languages”. In: 2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE). 2016, pp. 35–39. doi: 10.1109/ISCAIE.2016.7575033.
[23] K Sreenivasa Rao, Shashidhar G Koolagudi, and Ramu Reddy Vempada. “Emotion recognition from speech using global and local prosodic features”. In: International Journal of Speech Technology 16.2 (2013), pp. 143–160.
[24] Fabien Ringeval et al. “Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions”. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE. 2013, pp. 1–8.
[25] Fardin Saad et al. Is Speech Emotion Recognition Language-Independent? Analysis of English and Bangla Languages using Language-Independent Vocal Features. Nov. 2021.
[26] Jürgen Schmidhuber. “Deep learning in neural networks: An overview”. In: Neural Networks 61 (2015), pp. 85–117.
[27] B. Schuller, G. Rigoll, and M. Lang. “Hidden Markov model-based speech emotion recognition”. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03). Vol. 2. 2003, pp. II–1. doi: 10.1109/ICASSP.2003.1202279.
[28] Maheshwari Selvaraj, R Bhuvana, and S Padmaja. “Human speech emotion recognition”. In: International Journal of Engineering & Technology 8 (2016), pp. 311–323.
[29] Linhui Sun, Sheng Fu, and Fu Wang. “Decision tree SVM model with Fisher feature selection for speech emotion recognition”. In: EURASIP Journal on Audio, Speech, and Music Processing 2019.1 (2019), pp. 1–14.
[30] Shiqing Zhang. “Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features”. In: Sept. 2008, pp. 457–464. |
en_US |
dc.description.abstract |
Great progress has been made in speech recognition, but we still have a long
way to go to achieve smooth human-computer interaction, because computers
still find it difficult to understand the emotional state of the speaker. This has
brought into light a relatively recent research field, namely Speech
Emotion Recognition. Every speech signal carries implicit information about the
speaker's emotions, which can be extracted through speech processing methods.
Many systems have been proposed in the literature to identify the emotional state
through speech. Extracting features from speech, selecting a suitable feature set,
designing a proper classification method, and preparing a suitable dataset are the
main steps in designing a Speech Emotion Recognition (SER) system. However,
despite significant progress in this area, many things remain poorly understood,
especially when attention is given to the cultural differences among people.
Emotion recognition in speech can vary from person to person based on
age, gender, language, accent, and many other factors. To explore how much
accents affect SER, we investigated how features vary across different accents in
the domain of Speech Emotion Recognition. This paper focuses on the question of
whether Speech Emotion Recognition is accent-independent or not. We study
different speech features, carry out experiments on their extraction process and
reduction techniques, and conduct experiments on the selection of
accent-independent features. These features are then used to train a model,
leading us to a conclusion on whether SER depends on accents and on which
speech features help identify emotions more accurately regardless of accent. |
en_US |
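Note: The abstract describes the standard SER pipeline of extracting features from speech, selecting a feature subset, and training a classifier. As a minimal illustrative sketch only, not the implementation used in this thesis, the Python snippet below shows one common way such a pipeline is realized: mean MFCC features extracted with librosa, univariate feature selection, and an SVM classifier from scikit-learn. The file paths and emotion labels are hypothetical placeholders.

import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def extract_features(path, sr=16000, n_mfcc=13):
    """Load one utterance and summarize it as a fixed-length vector
    of mean MFCCs (one common choice of spectral feature)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)  # average over time -> (n_mfcc,)

# Hypothetical labeled corpus: (wav path, emotion label) pairs.
corpus = [
    ("data/happy_01.wav", "happy"),
    ("data/angry_01.wav", "angry"),
    # ... more labeled utterances ...
]

X = np.array([extract_features(path) for path, _ in corpus])
y = np.array([label for _, label in corpus])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale features, keep the 10 most discriminative ones (the "feature
# selection" step from the abstract), then fit an SVM classifier.
clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))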