Applied Informatics Group

Multimodal Recognition of Socio-emotional Signals

Facial expressions, head gestures, and prosodic information provide important non-verbal cues for intelligent systems. Interpreting these cues enables such a system to gain information about the mental state of the user and the quality of an interaction.

Research Questions

As a basis for interpreting a human face, we use facial point extraction methods. By tracking these points over time, we can compute features relevant to classifying the human's internal state [1] or facial communicative signals [2]. In contrast to many other groups, we do not aim to recognise Ekman's seven basic emotions but concentrate on socio-emotional signals such as smiling, agreement, or confusion.
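The idea of turning tracked facial points into classifier features can be sketched as follows. This is a minimal illustration, not the group's actual pipeline: the point indices, the two geometric measurements, and the choice of summary statistics are all assumptions made up for the example; a real system would use the extracted landmarks and features described in [1] and [2].

```python
import numpy as np

def geometric_features(landmarks):
    """Summarise tracked facial points over time into a fixed-length vector.

    landmarks: array of shape (T, N, 2) -- N 2-D facial points over T frames.
    Returns mean, std, and range of each per-frame measurement, suitable
    as input to a classifier of socio-emotional signals.
    """
    # Hypothetical point indices: 0/1 = mouth corners, 2/3 = brow and eye.
    mouth_width = np.linalg.norm(landmarks[:, 0] - landmarks[:, 1], axis=1)
    brow_raise = landmarks[:, 2, 1] - landmarks[:, 3, 1]

    feats = []
    for series in (mouth_width, brow_raise):
        feats.extend([series.mean(), series.std(),
                      series.max() - series.min()])
    return np.array(feats)

# Toy sequence: 10 frames, 4 tracked points.
rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 4, 2))
vec = geometric_features(seq)
print(vec.shape)  # (6,)
```

Summarising the time series into fixed-length statistics is one common way to feed variable-length tracking data to a standard classifier; sliding-window or sequence models are an alternative.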

In addition to visual cues, we analyse prosody. We extract features such as pitch, energy, and MFCCs (mel-frequency cepstral coefficients) from speech signals to detect socio-emotional signals and to recognise user states such as hesitation and uncertainty. Further cues for the user state can be gained from filled-pause analysis. We combine these prosodic features with the visual cues to improve classification of the user's mental state.
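Two of the steps above, short-time energy extraction and feature-level fusion of prosodic and visual features, can be sketched as below. This is an assumed, simplified version: frame and hop sizes, the z-normalisation, and plain concatenation as the fusion strategy are illustrative choices, not necessarily the group's method; pitch and MFCC extraction would typically come from a dedicated speech-processing library.

```python
import numpy as np

def frame_energy(signal, frame_len=400, hop=160):
    """Short-time log energy per frame (at 16 kHz: 25 ms frames, 10 ms hop)."""
    n = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n)])
    return np.log((frames ** 2).sum(axis=1) + 1e-10)

def fuse(prosodic_vec, visual_vec):
    """Feature-level fusion: z-normalise each stream, then concatenate."""
    norm = lambda v: (v - v.mean()) / (v.std() + 1e-10)
    return np.concatenate([norm(prosodic_vec), norm(visual_vec)])

sig = np.sin(np.linspace(0, 100, 16000))        # 1 s of toy audio at 16 kHz
energy = frame_energy(sig)
fused = fuse(np.array([energy.mean(), energy.std()]),
             np.array([0.3, 1.2, 0.7]))         # made-up visual features
print(fused.shape)  # (5,)
```

Concatenating normalised streams is early (feature-level) fusion; an alternative is late fusion, where separate classifiers per modality vote on the final decision.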


Franz Kummert


Recent Best Paper/Poster Awards

Goal Babbling of Acoustic-Articulatory Models with Adaptive Exploration Noise
Philippsen A, Reinhart F, Wrede B (2016)
International Conference on Development and Learning and on Epigenetic Robotics (ICDL-EpiRob) 


Are you talking to me? Improving the robustness of dialogue systems in a multi party HRI scenario by incorporating gaze direction and lip movement of attendees
Richter V, Carlmeyer B, Lier F, Meyer zu Borgsen S, Kummert F, Wachsmuth S, Wrede B (2016)
International Conference on Human-Agent Interaction (HAI)


"Look at Me!": Self-Interruptions as Attention Booster?
Carlmeyer B, Schlangen D, Wrede B (2016)
International Conference on Human-Agent Interaction (HAI)
