Applied Informatics Group

Bootstrapping Speech Production Skills in Human Robot Interaction

This project aims at implementing a developmental model of speech acquisition. With such a model, the robot can learn to control its articulatory system similarly to how infants learn to speak.


Project Overview

Current robotic systems are able to communicate with humans using preprogrammed dialog acts, but they cannot flexibly react to changing circumstances. Articulatory speech synthesis would make flexible adaptation of the robot's speaking behaviour easier: the robot would have knowledge about articulation and could thus adapt its speaking style and speed on-line, as humans do. A major problem with articulatory speech synthesis is the control of the high-dimensional vocal tract model. Due to co-articulation effects, the mapping from desired speech sounds to the required articulatory configuration is non-trivial.

To learn this mapping, we propose a developmental approach based on goal babbling: by gradually exploring the possibilities of its vocal tract, the robot builds a parametric model of how to produce specific speech sounds.
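The goal-babbling loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: the toy vocal_tract function, the linear inverse model, and all parameter values are assumptions chosen only to show the structure of the algorithm (pick a target sound, query the current inverse model, perturb it with exploration noise, execute, and update the model with the observed outcome).

```python
import numpy as np

rng = np.random.default_rng(0)

DIM_Q, DIM_X = 5, 2  # articulatory and acoustic dimensionality (assumed)

# Hypothetical stand-in for a vocal tract simulator: maps an
# articulatory configuration q to an acoustic outcome x (e.g. two
# formants). The real mapping is nonlinear and many-to-one due to
# co-articulation.
def vocal_tract(q):
    return np.array([np.tanh(q[:3].sum()), np.sin(q[2:].sum())])

# Linear inverse model x -> q, refined online. A real system would use
# a more powerful parametric regressor.
W = np.zeros((DIM_Q, DIM_X))

goals = rng.uniform(-1.0, 1.0, size=(200, DIM_X))  # target sounds
sigma = 0.3  # exploration noise level
lr = 0.05    # learning rate

for x_goal in goals:
    q = W @ x_goal + rng.normal(0.0, sigma, DIM_Q)  # babble around the guess
    x = vocal_tract(q)                              # observe the outcome
    # Pull the inverse model toward the (x, q) pair actually observed.
    W += lr * np.outer(q - W @ x, x)
```

After many such episodes, the inverse model maps desired sounds to articulatory configurations without ever enumerating the high-dimensional motor space exhaustively.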


Recent Results

Our model autonomously explores the capabilities of its vocal tract using goal babbling. It is influenced by ambient speech, i.e. it learns to reproduce sounds that are frequent in the environment. We demonstrated that a range of different speech sounds can be learned using two different vocal tract simulations [1], [2].
Apart from its potential use in robotics, the proposed model constitutes a framework for investigating mechanisms that humans, especially infants, employ when learning to speak. This allows us to gain insights into infant speech learning, e.g. concerning the role of hyperarticulation in infancy [3].
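The influence of ambient speech mentioned above can be illustrated by biasing goal selection toward sounds that occur often around the learner. This is a hedged sketch, not the published model: the ambient corpus, its cluster centers, and the sampling scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ambient corpus: acoustic feature vectors of sounds heard
# in the environment. Frequent sounds contribute more rows, so sampling
# goals directly from the corpus reproduces their frequency.
ambient = np.vstack([
    rng.normal([0.3, 0.7], 0.05, size=(80, 2)),   # a frequent sound
    rng.normal([-0.5, 0.2], 0.05, size=(15, 2)),  # a rarer sound
])

def sample_goal():
    # Uniformly picking a row of the corpus biases exploration toward
    # environmentally frequent sounds.
    return ambient[rng.integers(len(ambient))]
```

Feeding goals drawn this way into a goal-babbling loop makes the learner practice common ambient sounds more often than rare ones.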

Related Publications

Recent Best Paper/Poster Awards

Goal Babbling of Acoustic-Articulatory Models with Adaptive Exploration Noise
Philippsen A, Reinhart F, Wrede B (2016)
International Conference on Development and Learning and on Epigenetic Robotics (ICDL-EpiRob) 


Are you talking to me? Improving the robustness of dialogue systems in a multi party HRI scenario by incorporating gaze direction and lip movement of attendees
Richter V, Carlmeyer B, Lier F, Meyer zu Borgsen S, Kummert F, Wachsmuth S, Wrede B (2016)
International Conference on Human-agent Interaction (HAI) 


"Look at Me!": Self-Interruptions as Attention Booster?
Carlmeyer B, Schlangen D, Wrede B (2016)
International Conference on Human Agent Interaction (HAI)

