Robot Audition and Computational Auditory Scene Analysis

“Robot audition” is a relatively new research field, proposed in 2000, that straddles artificial intelligence, signal processing, and robotics. The central theme of our robot audition research is how to make robots understand the sound scenes that humans experience every day. In such environments, noise and reverberation change dynamically and are sometimes stronger than the target signal, so robustness to noise has to be achieved in real time and in real environments. To address these problems, we have developed elemental technologies built on active audition, which actively exploits the robot’s own movements: estimating the location of sound sources (sound source localization), extracting target sound sources (sound source separation), and recognizing the extracted sources (speech recognition).
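As a rough illustration of what sound source localization involves, the sketch below estimates the direction of a source from the time difference of arrival (TDOA) between two microphones using plain cross-correlation. It is not taken from HARK and is far simpler than what a real robot needs; the function names, the 0.2 m microphone spacing, and the far-field, single-source, noise-free setting are all assumptions made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def estimate_tdoa(sig_a, sig_b, sample_rate):
    """Return the delay of sig_b relative to sig_a in seconds.

    A positive value means the sound reached microphone A first.
    """
    # Full cross-correlation; index 0 corresponds to lag -(len(sig_a) - 1).
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / sample_rate


def tdoa_to_angle(tdoa, mic_distance):
    """Convert a TDOA to an arrival angle in degrees from broadside.

    Assumes a far-field source and two microphones mic_distance metres apart.
    """
    sin_theta = np.clip(SPEED_OF_SOUND * tdoa / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))


if __name__ == "__main__":
    # Hypothetical test signal: white noise that reaches microphone B
    # five samples later than microphone A.
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1024)
    delay = 5
    mic_a = source
    mic_b = np.concatenate([np.zeros(delay), source[:-delay]])

    tdoa = estimate_tdoa(mic_a, mic_b, sample_rate=16000)
    print("TDOA [s]:", tdoa)
    print("angle [deg]:", tdoa_to_angle(tdoa, mic_distance=0.2))
```

With dynamic noise, reverberation, and several simultaneous sources, a plain correlation peak like this quickly becomes unreliable, which is why more robust localization and separation methods are needed in practice.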

The technology we have developed so far is publicly available as the open-source robot audition software HARK (Honda Research Institute Japan Audition for Robots with Kyoto University). With HARK it is possible, for example, to build a capability like that of Prince Shotoku, who according to legend could understand ten people’s petitions at the same time. Because the sound that humans usually hear is a mixture of many different sounds, such a technique is essential for dealing with real environments.

In recent years, we have also been focusing on distance estimation from audible sound, restoration of damaged acoustic signals by deep learning, and identification of acoustic signals by deep learning. If we can establish these technologies, we expect robot audition research to evolve into an environment understanding technology that can make sense of a robot’s surroundings.
