May 7 - Special session n. 3
ASR Robustness, both with and without Arrays
(chaired by Satoshi Nakamura)

1) Jasha Droppo
"Single Channel Enhancement for Speech Recognition."
2) Richard Stern
"General Trends and Auditory Systems Approach."
3) Michael Seltzer
"Bridging the GAP: Towards a Unified Framework for Hands-free Speech Recognition Using Microphone Arrays."
4) John McDonough and Matthias Woelfel
"Distant Speech Recognition: Bridging the Gaps."


Special session n. 3 - abstracts


Jasha Droppo "Single Channel Enhancement for Speech Recognition."
Single channel speech enhancement has been an active field of research for several decades. The majority of research has focused on human perceptions of quality and intelligibility. Nevertheless, such algorithms can be applied directly to the noise-robust speech recognition problem with some success. More gain can be achieved by creating domain-specific solutions that are modeled on the more traditional enhancement algorithms. This paper covers the major pieces necessary for a modern single channel enhancement system suitable for automatic speech recognition.

Richard Stern "General Trends and Auditory Systems Approach."
It is well known that binaural processing is very useful for separating incoming sound sources as well as for improving the intelligibility of speech in reverberant environments. This paper describes and compares a number of ways in which the classic model of interaural cross-correlation proposed by Jeffress, quantified by Colburn, and further elaborated by Blauert, Lindemann, and others, can be applied to improving the accuracy of automatic speech recognition systems operating in cluttered, noisy, and reverberant environments. Typical implementations begin with an abstraction of cross-correlation of the incoming signals after nonlinear monaural bandpass processing, but there are many alternative implementation choices that can be considered. Typical implementations differ in the ways in which an enhanced version of the desired signal is developed using binaural principles, in the extent to which specific processing mechanisms are used to impose suppression motivated by the precedence effect, and in the precise mechanism used to extract interaural time differences.
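As a rough illustration of the last point, the sketch below estimates an interaural time difference by picking the lag that maximizes the cross-correlation of two channels. It is not taken from the paper: the function name `estimate_itd`, the toy tone burst, and the lag range are assumptions chosen for illustration, and the nonlinear monaural bandpass stage mentioned in the abstract is omitted.

```python
import math

def estimate_itd(left, right, sample_rate, max_lag):
    """Return the lag (in seconds) that maximizes the cross-correlation
    of two equal-length channels. A positive value means the right
    channel lags behind the left one."""
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

# Toy input (hypothetical): a 500 Hz tone burst with a Gaussian envelope,
# delayed by 8 samples in the right channel.
sample_rate = 16000
delay = 8
burst = [
    math.sin(2 * math.pi * 500 * t / sample_rate) * math.exp(-((t - 100) / 30) ** 2)
    for t in range(400)
]
left = burst
right = [0.0] * delay + burst[:-delay]

itd = estimate_itd(left, right, sample_rate, max_lag=16)
print(f"Estimated ITD: {itd * 1e6:.0f} microseconds")
```

The lag range is kept below one period of the tone so that the correlation peak is unambiguous; a practical system would apply this per frequency band and over short time frames.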

Michael Seltzer "Bridging the GAP: Towards a Unified Framework for Hands-free Speech Recognition Using Microphone Arrays."
In this paper we describe two families of algorithms for hands-free speech recognition using microphone arrays. Enhancement-based approaches use a cascade of independent processing blocks to perform speech enhancement followed by speech recognition. We discuss the reasons why this approach may be sub-optimal and motivate the need for a solution that tightly integrates all processing blocks into a common unified framework. This leads to a second family of algorithms called unified approaches which considers all processing stages to be components of a single system that operates with the common goal of improved recognition accuracy. We describe several examples of such algorithms that have been shown to outperform more traditional signal-processing-based approaches. In doing so, we hope to convey the benefits of performing hands-free speech recognition in this manner and motivate further research in this area.

John McDonough and Matthias Woelfel "Distant Speech Recognition: Bridging the Gaps."
While great progress has been made in both fields, there is currently a relatively large rift between researchers engaged in acoustic array processing and those engaged in automatic speech recognition. This is unfortunate for many reasons, but most of all because it prevents the two sides, both of whom are investigating different aspects of the same problem, from truly understanding one another and cooperating. In many cases, the two sides see each other through the eyes of strangers. If groundbreaking progress is to be made in the emerging field of distant speech recognition (DSR), this abysmal state of affairs must change. In this work, we outline five pressing problems in the DSR research field, and we make initial proposals for their solutions. The problems discussed here are by no means the only ones that must be solved in order to construct truly effective DSR systems. Nonetheless, their solution, in our view, will represent significant first steps towards this goal, inasmuch as the solution of each of these problems will require a substantial change in the mindsets and thought patterns of those engaged in this field of research.
