SPEECH-BASED EMOTION DETECTION USING A DEEP LEARNING MODEL
DOI: https://doi.org/10.51401/jinteks.v4i3.1952

Keywords: Emotion Detection, EmoDB, Deep Learning, Feature Extraction

Abstract
The ability of computers to imitate human capabilities has long been an interesting area of development. Several studies have examined emotion recognition from facial images as well as from verbal and non-verbal speech. This study explores several deep learning methods to find the best model for detecting emotion from speech using the EmoDB dataset. Feature extraction uses Zero Crossing Rate, chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS), and the Mel spectrogram. In the pre-processing stage, data augmentation is applied through noise injection, time shifting, and changes to the audio pitch and speed. The results show that, based on accuracy, the best-performing deep learning method is CNN-BiLSTM.
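The abstract names the feature set (Zero Crossing Rate, chroma_stft, MFCC, RMS, Mel spectrogram) and the augmentation steps (noise injection, time shifting, pitch and speed changes) but not their implementation. The sketch below shows one common way to realize these steps with librosa and numpy; it is an assumption-based illustration, not the authors' published code, and the concrete parameter values (noise amplitude, shift range, semitone count, stretch rate) and the example file name are chosen for demonstration only.

```python
# Minimal sketch of the described feature extraction and augmentation,
# assuming librosa/numpy; parameter values are illustrative, not the paper's.
import numpy as np
import librosa

def extract_features(y, sr):
    """One feature vector: ZCR, chroma_stft, MFCC, RMS and Mel spectrogram,
    each averaged over time frames (a common pooling choice)."""
    zcr    = np.mean(librosa.feature.zero_crossing_rate(y=y).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr).T, axis=0)
    mfcc   = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T, axis=0)
    rms    = np.mean(librosa.feature.rms(y=y).T, axis=0)
    mel    = np.mean(librosa.feature.melspectrogram(y=y, sr=sr).T, axis=0)
    return np.hstack([zcr, chroma, mfcc, rms, mel])

def add_noise(y, noise_factor=0.005):
    """Noise injection: mix in low-amplitude Gaussian noise."""
    return y + noise_factor * np.random.normal(size=y.shape)

def shift_time(y, sr, max_shift_s=0.25):
    """Time shifting: roll the waveform by a random offset."""
    shift = np.random.randint(-int(sr * max_shift_s), int(sr * max_shift_s))
    return np.roll(y, shift)

def change_pitch(y, sr, n_steps=2):
    """Pitch change: shift the audio by a number of semitones."""
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def change_speed(y, rate=1.1):
    """Speed change: time-stretch the audio without altering pitch."""
    return librosa.effects.time_stretch(y, rate=rate)

# Example: load one EmoDB .wav file (file name is illustrative) and build
# feature vectors for the original and each augmented variant.
y, sr = librosa.load("03a01Fa.wav", sr=None)
features = [extract_features(aug, sr) for aug in
            (y, add_noise(y), shift_time(y, sr), change_pitch(y, sr), change_speed(y))]
```

Each augmented waveform is treated as an additional training sample, so the feature matrix grows several-fold relative to the original EmoDB recordings before being fed to the deep learning models.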
Copyright (c) 2022 Siska Rahmadani, Cicih Sri Rahayu, Agus Salim, Karno Nur Cahyo

This work is licensed under a Creative Commons Attribution 4.0 International License.