May 7 - Special session n. 4
Communication and Interaction in Smart Environments
(chaired by Sadaoki Furui)

1) Steve Renals, Thomas Hain, and Hervé Bourlard
"Interpretation of Multiparty Meetings the AMI and AMIDA Projects."
2) Gerasimos Potamianos, Rajesh Balchandran, Mark E. Epstein, Jing Huang, Vit Libal, and Etienne Marcheret
"Far-field Multimodal Speech Perception and Conversational Interaction in Smart Spaces."
3) Hiroshi G. Okuno
"Computational Auditory Scene Analysis and Its Application to Robot Audition."
4) John H. L. Hansen, Wooil Kim, and Pongtep Angkititrakul
"Advances in Human-machine Systems for In-vehicle Environments."


Special session n. 4 - abstracts


Steve Renals, Thomas Hain, and Hervé Bourlard "Interpretation of Multiparty Meetings: The AMI and AMIDA Projects."
The AMI and AMIDA projects are collaborative EU projects concerned with the automatic recognition and interpretation of multiparty meetings. This paper provides an overview of the advances we have made in these projects with a particular focus on the multimodal recording infrastructure, the publicly available AMI corpus of annotated meeting recordings, and the speech recognition framework that we have developed for this domain.


Gerasimos Potamianos, Rajesh Balchandran, Mark E. Epstein, Jing Huang, Vit Libal, and Etienne Marcheret "Far-field Multimodal Speech Perception and Conversational Interaction in Smart Spaces."
Robust speech processing constitutes a crucial component in the development of usable and natural conversational interfaces. In this paper we are particularly interested in human-computer interaction taking place in "smart" spaces equipped with a number of far-field, unobtrusive microphones and camera sensors. Their availability allows multi-sensory and multi-modal processing, thus improving the robustness of speech-based perception technologies in a number of scenarios of interest, for example lectures and meetings held inside smart conference rooms, or interaction with domotic devices in smart homes. We give an overview of recent work at IBM Research on developing state-of-the-art speech technology for smart spaces. In particular, we discuss acoustic scene analysis, speech activity detection, speaker diarization, and speech recognition, emphasizing multi-sensory and multi-modal processing. The resulting technology is envisaged to allow far-field conversational interaction in smart spaces based on dialog management and natural language understanding of user requests.
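For illustration only (the paper's own detectors are multi-channel and audio-visual, and considerably more sophisticated), the sketch below shows the basic frame-level decision underlying speech activity detection: a single-channel, energy-based detector. The function name, frame sizes, and the 9 dB margin are assumptions made for the example, not values from the paper.

    import numpy as np

    def speech_activity(audio, fs, frame_ms=25, hop_ms=10, margin_db=9.0):
        """Label each frame speech/non-speech from its log energy: a frame
        counts as speech when it exceeds the estimated noise floor by
        margin_db (a crude single-channel baseline, for illustration)."""
        frame = int(fs * frame_ms / 1000)
        hop = int(fs * hop_ms / 1000)
        n_frames = 1 + (len(audio) - frame) // hop
        idx = np.arange(frame) + hop * np.arange(n_frames)[:, None]
        energy_db = 10 * np.log10(np.sum(audio[idx] ** 2, axis=1) + 1e-12)
        noise_floor = np.percentile(energy_db, 10)  # quiet frames ~ noise
        return energy_db > noise_floor + margin_db  # True = speech frame

Multi-sensory variants of the kind described in the paper replace this single energy cue with fused evidence from several microphones and from visual features such as lip activity.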


Hiroshi G. Okuno "Computational Auditory Scene Analysis and Its Application to Robot Audition."
Robot audition, the capability of a robot to hear sounds, in particular a mixture of sounds, with its own microphones, is important for improving human-robot interaction. This paper presents the open-source robot audition software HARK (HRI-JP Audition for Robots with Kyoto University), which consists of primitive functions in computational auditory scene analysis: sound source localization, sound source separation, and recognition of separated sounds. Since separated sounds suffer from spectral distortion due to separation, HARK generates a time-spectral map of reliability, called a "missing feature mask", for the features of separated sounds. The separated sounds are then recognized by automatic speech recognition based on the Missing-Feature Theory (MFT), using these masks. HARK is implemented on the middleware "FlowDesigner", which shares intermediate audio data between modules and enables near real-time processing.
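As a minimal sketch of the missing-feature idea (not the HARK implementation itself), the two functions below first build a binary time-spectral reliability mask, marking bins where the separated signal dominates an estimate of the separation residual, and then marginalize the unreliable feature dimensions out of a diagonal-covariance Gaussian log-likelihood. All names and the 0 dB threshold are illustrative assumptions.

    import numpy as np

    def reliability_mask(separated, residual, threshold_db=0.0):
        """Mark a time-spectral bin reliable when the separated signal
        dominates the estimated separation residual (illustrative rule)."""
        snr_db = 20.0 * np.log10((np.abs(separated) + 1e-10)
                                 / (np.abs(residual) + 1e-10))
        return snr_db > threshold_db  # True = reliable bin

    def mft_log_likelihood(features, mask, means, variances):
        """Missing-feature log-likelihood of one diagonal Gaussian:
        unreliable dimensions are marginalized out (dropped from the sum)."""
        ll = -0.5 * (np.log(2.0 * np.pi * variances)
                     + (features - means) ** 2 / variances)
        return np.sum(ll, where=mask)  # sum over reliable dimensions only

In an MFT-based recognizer this per-state likelihood replaces the usual full-vector Gaussian score, so recognition relies only on the spectro-temporal regions that the separation stage left intact.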


John H. L. Hansen, Wooil Kim, and Pongtep Angkititrakul "Advances in Human-machine Systems for In-vehicle Environments."
As computing technology advances, the ability to integrate a wider range of personal services into in-vehicle environments increases. While these advances offer a diverse range of entertainment and information-access opportunities, they are generally introduced into the vehicle with limited understanding of their impact on driver distraction and cognitive stress load. As the diversity of speech, video, biometric, and vehicle signals increases, improved corpora and system formulation are needed. In this study, we consider recent advances in in-vehicle human-machine systems for route navigation, noise suppression for robust speech recognition, and driver behavior modeling. Multi-microphone array processing is developed for noise suppression in hands-free communication as well as for improved automatic speech recognition in route-dialog interaction. Next, advances in modeling driver behavior are considered in the UTDrive project, which focuses on advancing smart vehicle technologies for improved safety while driving. Finally, a general discussion considers next-generation advances for in-vehicle environments that sense driver cognitive stress and distraction in order to adapt interactive systems for improved safety.
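As a hedged illustration of the array-processing component, the sketch below implements plain delay-and-sum beamforming, a standard multi-microphone noise-suppression baseline rather than the authors' specific algorithm; the microphone geometry, sampling rate, and steering direction are assumed inputs.

    import numpy as np

    def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
        """Delay-and-sum beamformer: time-align each channel toward a
        plane-wave source direction, then average so that off-axis noise
        adds incoherently and is attenuated.

        signals: (n_mics, n_samples); mic_positions: (n_mics, 3) in metres;
        direction: unit vector from the array toward the source."""
        n_mics, n_samples = signals.shape
        delays = mic_positions @ direction / c   # earlier arrival -> larger
        delays -= delays.min()                   # keep all delays causal
        freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
        spectra = np.fft.rfft(signals, axis=1)
        # Fractional delays applied as phase shifts in the frequency domain
        # (circular, which is adequate for a sketch).
        aligned = spectra * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
        return np.fft.irfft(aligned.mean(axis=0), n=n_samples)

For hands-free speech recognition, the beamformed output would feed the recognizer front-end in place of any single distant microphone.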
