The voice is an important modality for emotional expression. Speech is a communication channel rich in emotion: it conveys not only a semantic message but also information about the speaker’s emotional state.

Speech Emotion Recognition (SER) is the process of classifying an input utterance into one of a set of emotional classes based on its vocal features. The main goal of SER is therefore to correctly infer the speaker’s emotional state from their speech.
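
To make the classification framing concrete, the following is a minimal sketch of an SER pipeline: extract vocal features from an utterance, then assign the utterance to an emotional class. Everything in it is an illustrative assumption rather than a method taken from the text: the utterances are synthetic noisy sine waves, the two features (RMS energy and zero-crossing rate) and the two classes ("calm", "angry") are arbitrary choices, and the nearest-centroid classifier is only a stand-in for whatever model a real system would use.

```python
import math
import random


def synth_utterance(freq_hz, amplitude, n_samples=1600, sample_rate=16000, seed=0):
    """Hypothetical stand-in for a recorded utterance: a sine wave plus small noise."""
    rng = random.Random(seed)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        + 0.01 * rng.uniform(-1, 1)
        for t in range(n_samples)
    ]


def vocal_features(signal):
    """Return two simple acoustic features: (RMS energy, zero-crossing rate)."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(signal) - 1)
    return (rms, zcr)


def train_centroids(labelled):
    """Average the feature vectors of each class: label -> mean feature vector."""
    sums, counts = {}, {}
    for feats, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(feats, centroids):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Toy training set (assumed): quiet, low-pitched signals are labelled "calm";
# loud, higher-pitched signals are labelled "angry".
training = [(vocal_features(synth_utterance(120, 0.2, seed=s)), "calm") for s in range(3)]
training += [(vocal_features(synth_utterance(300, 0.8, seed=s)), "angry") for s in range(3)]
centroids = train_centroids(training)

# Classify an unseen utterance from its vocal features alone.
predicted = classify(vocal_features(synth_utterance(280, 0.75, seed=99)), centroids)
print(predicted)
```

A practical system would replace the synthetic signals with recorded speech and the two hand-picked features with a richer acoustic representation, but the overall shape (utterance, features, emotional class) stays the same.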