May 8 - Poster session n. 3
Further topics in Hands-free Speech Communication
(chaired by Dirk Slock - Eurecom Institute, Sophia Antipolis Cedex, France)

1) Jean-Marc Valin
"Perceptually-motivated Nonlinear Channel Decorrelation for Stereo Acoustic Echo Cancellation"
2) Heng Zhang, Qiang Fu and Yonghong Yan
"A Compact-microphone-array-based Speech Enhancement Algorithm using Auditory Subbands and Probability Constrained Postfilter"
3) Marcus Zeller and Walter Kellermann
"Framewise Repeated Coefficient Updates for Enhanced Nonlinear AEC by Diagonal Coordinate Volterra Filters"
4) Kensaku Fujii, Naoya Saitoh, Ryoichi Oka and Mitsuji Muneyasu
"Acoustic Echo Cancellation Algorithm Tolerable for Double Talk"
5) Paolo Peretti, Lorenzo Palestini, Stefania Cecchi and Francesco Piazza
"A Subband Approach to Wave Domain Adaptive Filtering"
6) Alexandros Tsilfidis, John Mourjopoulos and Dionysis Tsoukalas
"Blind Estimation and Suppression of Late Reverberation utilising Auditory Masking"
7) Jani Even, Hiroshi Saruwatari and Kiyohiro Shikano
"Frequency Domain Blind Signal Extraction: Application to Fast Estimation of Diffuse Background Noise"
8) Zbynek Koldovsky and Petr Tichavsky
"Time-domain Blind Audio Source Separation Using Advanced Component Clustering and Reconstruction"
9) Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi and Biing-Hwang Juang
"Speech Dereverberation in Short Time Fourier Transform Domain with Crossband Effect Compensation"
10) Stefan Goetze, Markus Kallinger, Alfred Mertins and Karl-Dirk Kammeyer
"System Identification for Multi-Channel Listening-Room Compensation using an Acoustic Echo Canceller"
11) Alessandro Bastari, Stefano Squartini and Francesco Piazza
"Joint Acoustic Feedback Cancellation and Noise Reduction within the Prediction Error Method Framework"
12) Francesco Nesta, Maurizio Omologo and Piergiorgio Svaizer
"Separating Short Signals in Highly Reverberant Environment by a Recursive Frequency-domain BSS"
13) Bernd Iser and Gerhard Schmidt
"Receive Side Processing for Automotive Hands-Free Systems"
14) Hamid Sepehr, Masoud Ahmadi and Paul Brennan
"Removal of Siren Mixed with Speech by Wavelet Transform and Adaptive Notch Filter"
15) Roland Potthast, Filippo Fazi, Phil Nelson and Jeongil Seo
"Two Sound Field Reconstruction Techniques based on Integral Equations"
16) Mahdi Triki and Dirk T.M. Slock
"Robust Delay-&-Predict Equalization for Blind SIMO Channel Dereverberation"


Poster session n. 3 - abstracts


Jean-Marc Valin "Perceptually-motivated Nonlinear Channel Decorrelation for Stereo Acoustic Echo Cancellation"
Acoustic echo cancellation with stereo signals is generally an underdetermined problem because of the high coherence between the left and right channels. In this paper, we present a novel method of significantly reducing inter-channel coherence without affecting the audio quality. Our work takes into account psychoacoustic masking and binaural auditory cues. The proposed non-linear processing combines a shaped comb-allpass (SCAL) filter with the injection of psychoacoustically masked noise. We show that the proposed method performs significantly better than other known methods for reducing inter-channel coherence.

Heng Zhang, Qiang Fu and Yonghong Yan "A Compact-microphone-array-based Speech Enhancement Algorithm using Auditory Subbands and Probability Constrained Postfilter"
A microphone-array-based speech enhancement algorithm is proposed in this paper. Two omni-directional microphones are placed in end-fire orientation. The received acoustic signals are processed by a modified version of the adaptive null-forming algorithm, which uses a gammatone filterbank to decompose the input into a series of auditory subbands and implements null-forming in each of them. The approach also includes a postfiltering method to further enhance the desired signal. Considerable improvement in directivity and signal-to-noise ratio (SNR) is achieved with a comparatively small array size of about 2 cm. It is also capable of canceling transient interference coming from directions other than the look direction. Experiments confirm the effectiveness of the proposed system.

Marcus Zeller and Walter Kellermann "Framewise Repeated Coefficient Updates for Enhanced Nonlinear AEC by Diagonal Coordinate Volterra Filters"
In this paper, we extend the approach of iterated coefficient updates to DFT-domain Volterra filters in diagonal coordinates. This technique, which has recently been successfully applied to the partitioned block frequency-domain Volterra filter, repeatedly uses the same block of data of the DFT-domain LMS adaptation and thus performs several updates per frame. Therefore, the available signal innovation is exploited more efficiently, as this procedure can be implemented using a recursive scheme which demands a relatively moderate increase in computational power compared to the significantly improved performance. The benefits of such an approach are demonstrated by experimental results regarding both the echo return loss enhancement (ERLE) and the coefficient error norms of higher-order kernels for both noise and real speech signals.
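
The idea of performing several coefficient updates on the same frame can be illustrated with a much simpler filter than the one in the abstract. The sketch below is a time-domain, linear, block-normalized LMS update, not the DFT-domain Volterra filter of the paper; the function name, the trace-based normalization and all parameters are assumptions made for illustration only.

```python
from typing import List, Sequence


def repeated_block_nlms(
    x_frame: Sequence[float],
    d_frame: Sequence[float],
    w: Sequence[float],
    history: Sequence[float],
    repeats: int = 1,
    mu: float = 1.0,
    eps: float = 1e-12,
) -> List[float]:
    """Apply `repeats` normalized block-LMS updates using the same frame.

    `history` must hold the len(w) - 1 input samples that precede the frame.
    This is a simplified linear stand-in (an assumption), not the paper's method.
    """
    taps = len(w)
    if len(history) != taps - 1:
        raise ValueError("history must contain len(w) - 1 samples")
    padded = list(history) + list(x_frame)
    # One regressor per frame sample: [x[n], x[n-1], ..., x[n-taps+1]].
    regressors = [
        [padded[n + taps - 1 - k] for k in range(taps)]
        for n in range(len(x_frame))
    ]
    # Total regressor energy of the frame, used to normalize the step size.
    energy = sum(u_k * u_k for u in regressors for u_k in u)
    w_new = list(w)
    for _ in range(repeats):
        # A-priori errors of the current coefficients on this frame.
        errors = [
            d - sum(w_k * u_k for w_k, u_k in zip(w_new, u))
            for d, u in zip(d_frame, regressors)
        ]
        # Block gradient accumulated over the whole frame.
        grad = [
            sum(e * u[k] for e, u in zip(errors, regressors))
            for k in range(taps)
        ]
        w_new = [w_k + mu * g / (energy + eps) for w_k, g in zip(w_new, grad)]
    return w_new
```

For a noise-free frame, calling this with `repeats=4` leaves a smaller coefficient error than `repeats=1`, at the cost of recomputing the frame errors and gradient once per repeat.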

Kensaku Fujii, Naoya Saitoh, Ryoichi Oka and Mitsuji Muneyasu "Acoustic Echo Cancellation Algorithm Tolerable for Double Talk"
This paper proposes a step size control method capable of steadily canceling acoustic echo in the presence of double talk. The method is characterized by applying a sub-adaptive filter to the control. The sub-adaptive filter uses a larger step size and fewer taps than the main adaptive filter used for canceling the acoustic echo. Accordingly, the sub-adaptive filter can reduce the residual echo more rapidly than the main adaptive filter. The proposed method applies the step size calculated using the residual echo to the main adaptive filter, and thereby quickly and steadily reduces the acoustic echo. Finally, this paper verifies that the proposed method can provide almost the same convergence speed as that obtained by applying a fixed large step size to the main adaptive filter.

Paolo Peretti, Lorenzo Palestini, Stefania Cecchi and Francesco Piazza "A Subband Approach to Wave Domain Adaptive Filtering"
Wave Domain Adaptive Filtering (WDAF) is a recently developed technique applied to systems based on Wave Field Analysis/Synthesis. It is used to reduce the computational complexity of adaptive algorithms employed in these scenarios, where the number of involved loudspeakers and microphones is particularly high. In this paper we present a streaming-based frame-by-frame implementation of WDAF and show how its subband extension, through a cosine modulated uniform filter bank, leads to an improved convergence rate and reduced mean square error.

Alexandros Tsilfidis, John Mourjopoulos and Dionysis Tsoukalas "Blind Estimation and Suppression of Late Reverberation utilising Auditory Masking"
A new method for blind estimation and suppression of late reverberation of speech signals is presented. The proposed algorithm consists of two steps. In a first step, the reverberation time is blindly determined from the reverberant signal. Then, an approximation of the power spectrum of late reverberation is subtracted from the power spectrum of the reverberant signal. Hence, a preliminary estimation of the anechoic speech spectrum is derived. In a second step, the auditory masking threshold of the clean spectrum estimation is calculated and used to define the coefficients for a nonlinear filter for the reverberant signal, which produces the final enhanced speech signal. The performance of the algorithm is demonstrated on artificially generated signals. Subjective tests are conducted and their results indicate that the quality of the speech signals obtained by the proposed method is superior when compared to previous methods.
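
The subtraction in the first step can be sketched as follows. The abstract does not say how the late-reverberation power spectrum is approximated, so this sketch assumes a commonly used exponential-decay model in which the late reverberation is a delayed copy of the reverberant power spectrum, attenuated according to the reverberation time T60. The function name, the spectral floor and all parameters are hypothetical, and the masking-based second step is not shown.

```python
import math
from typing import List, Sequence


def suppress_late_reverb(
    power_frames: Sequence[Sequence[float]],
    t60_s: float,
    hop_s: float,
    delay_frames: int,
    floor: float = 0.01,
) -> List[List[float]]:
    """Subtract an assumed late-reverberation power estimate from each frame.

    power_frames[t][k] is the reverberant power in frame t and frequency bin k.
    The late-reverberation estimate is the spectrum `delay_frames` earlier,
    scaled by exp(-2 * delta * T), with delta = 3 ln(10) / T60 and
    T = delay_frames * hop_s (a modelling assumption, not from the abstract).
    """
    delta = 3.0 * math.log(10.0) / t60_s
    decay = math.exp(-2.0 * delta * delay_frames * hop_s)
    cleaned = []
    for t, frame in enumerate(power_frames):
        if t < delay_frames:
            # No earlier frame is available yet, so leave the frame unchanged.
            cleaned.append(list(frame))
            continue
        past = power_frames[t - delay_frames]
        # Power spectral subtraction with a floor to avoid negative power.
        cleaned.append(
            [max(p - decay * q, floor * p) for p, q in zip(frame, past)]
        )
    return cleaned
```

With a delay equal to T60 the scale factor corresponds to a 60 dB decay, so almost nothing is subtracted; shorter delays remove more energy and rely more heavily on the floor.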

Jani Even, Hiroshi Saruwatari and Kiyohiro Shikano "Frequency Domain Blind Signal Extraction: Application to Fast Estimation of Diffuse Background Noise"
In this paper, we propose to replace frequency domain blind signal separation (FD-BSS) in hands-free speech applications by frequency domain blind signal extraction (FD-BSE). Unlike conventional FD-BSS methods that aim at separating all the signals, FD-BSE extracts only one signal from the mixture resulting in a faster algorithm. After presenting the FD-BSE method based on mutual information minimization, we show that the proposed method enables fast estimation of the background diffuse noise in a hands-free human/machine communication scenario.

Zbynek Koldovsky and Petr Tichavsky "Time-domain Blind Audio Source Separation Using Advanced Component Clustering and Reconstruction"
We present a novel time-domain method for blind separation of convolutive mixtures of audio sources (the cocktail party problem). The method allows efficient separation with good signal-to-interference ratio (SIR) and signal-to-distortion ratio (SDR) using short data segments only. In practice, we are able to separate 2-4 speakers from audio recordings shorter than 6000 samples, which is less than 1 s at a sampling rate of 8 kHz. The average time needed to process the data with a filter of length 20 was 2.2 seconds in Matlab v. 7.2 on an ordinary PC with a 3 GHz processor.
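
As a quick check of the figures quoted above, the duration of a segment follows directly from the sample count and the sampling rate. The helper name below is hypothetical.

```python
def duration_seconds(num_samples: int, sample_rate_hz: int) -> float:
    """Return the duration of a segment of `num_samples` samples."""
    return num_samples / sample_rate_hz


# 6000 samples at 8 kHz, the upper bound quoted in the abstract.
segment_s = duration_seconds(6000, 8000)
print(segment_s)  # 0.75
```

So the 6000-sample bound corresponds to 0.75 s of audio, consistent with the "less than 1 s" stated in the abstract.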

Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi and Biing-Hwang Juang "Speech Dereverberation in Short Time Fourier Transform Domain with Crossband Effect Compensation"
It has recently been shown that the maximum likelihood estimation approach with a time-varying source model is very effective in achieving speech dereverberation based only on a short observation. In addition, STFT domain processing has been shown to be promising for implementing this dereverberation approach in a computationally efficient way. This paper presents a way of further improving the STFT domain speech dereverberation in terms of both computational cost and accuracy. One important issue here is how to calculate time-domain convolution with a long filter precisely using STFT. We introduce an STFT domain filtering method with crossband effect compensation for this purpose. Experimental results show that the proposed method allows us to implement the dereverberation algorithm in the STFT domain more precisely with less computational cost than the existing method.

Stefan Goetze, Markus Kallinger, Alfred Mertins and Karl-Dirk Kammeyer "System Identification for Multi-Channel Listening-Room Compensation using an Acoustic Echo Canceller"
Modern hands-free telecommunication devices jointly apply several subsystems, e.g. for noise reduction (NR), acoustic echo cancellation (AEC) and listening-room compensation (LRC). In this contribution the combination of an equalizer for listening-room compensation and an acoustic echo canceller is analyzed. Inverse filtering of room impulse responses (RIRs) is a challenging task since they are, in general, mixed-phase systems having hundreds of zeros near the unit circle in the z-domain, both inside and outside it. Furthermore, a reliable estimate of the RIR to be inverted is important. Since RIRs are time-variant due to possible changes of the acoustic environment, they have to be identified adaptively. If an AEC (or any other adaptive method) is used to identify the time-variant room impulse responses, the estimate's distance to the real RIRs may be too high for a satisfying equalization, especially in periods of initial convergence of the AEC or after RIR changes. Therefore, we propose to estimate the convergence state of the AEC and to incorporate this knowledge into the equalizer design.

Alessandro Bastari, Stefano Squartini and Francesco Piazza "Joint Acoustic Feedback Cancellation and Noise Reduction within the Prediction Error Method Framework"
The principal aim of this work is to study the problem of Acoustic Feedback Cancellation (AFC) in the presence of superimposed background noise and to propose an innovative architecture for joint acoustic feedback path estimation and noise reduction (NR), whose effectiveness is demonstrated by experimental results. The proposed architecture is an evolution of a scheme already known as PEM-AFROW, which is based on a particular implementation of the Prediction Error Method for an unbiased estimation of the acoustic echo path. The objective of the contribution is to extend the applicability of this technique to the more general case in which noise is present, whose effects should be minimized to improve speech rendering quality. To this end, a noise reduction stage based on MMSE-LSA estimation has been implemented within the closed-loop system, keeping the on-line processing characteristics of the framework.

Francesco Nesta, Maurizio Omologo and Piergiorgio Svaizer "Separating Short Signals in Highly Reverberant Environment by a Recursive Frequency-domain BSS"
A new approach to the permutation problem for Blind Source Separation (BSS) in the frequency domain is presented. The independence of the separation across the frequencies, and thus the probability that a permutation may occur, is minimized by a recursive linking of the ICA stage. A recursive adaptive estimation of smooth demixing matrices is used to initialize the Independent Component Analysis (ICA) in order to force it to converge with a coherent permutation across the whole spectrum. Since no information about non-stationarity of the signals is exploited, the proposed method also works for short utterances (0.5-1 s) and in highly reverberant environments (T60 = 700 ms). Furthermore, it is shown that the recursive initialization increases the accuracy of the ICA when only a small amount of observed data is available.

Bernd Iser and Gerhard Schmidt "Receive Side Processing for Automotive Hands-Free Systems"
In the sending path of automotive hands-free systems several subunits - such as acoustic echo cancellation (AEC) and noise reduction (NR) - improve the quality of the outgoing signal. These units are usually realized in the frequency or subband domain in order to reduce the computational complexity. In the receiving path, however, only a few signal processing stages - such as bandwidth extension (BWE) [1] or gain adjustment - are realized in recent systems [2, 3]. These units are implemented in most cases in the time domain, since two analysis-synthesis schemes (one in the sending and one in the receiving path) would introduce more delay than allowed by ITU- or VDA-recommendations [4]. To the best knowledge of the authors, linking of conventional processing schemes in the sending path (AEC and NR) with those of the receiving path has not yet been addressed in research on hands-free systems. For the car environment some amplifier manufacturers perform a volume control depending on the driving speed of the car. Some even offer the possibility of placing a microphone in the cabin for measuring the noise level within the car [2, 5]. But this does not apply to hands-free telephony. The estimated power spectral density (PSD) of the background noise (already estimated within the NR unit) can be used to adjust the BWE unit. Since artifacts introduced by a BWE scheme are less audible in high noise conditions, a stronger extension can be used compared to stand-still operation. Taking also the estimated echo spectrum into account (besides the noise PSD), an estimate for the SNR within the car cabin can be obtained. Using this estimate one could perform an automatic gain control of the receive signal for retaining a particular SNR within the car while the noise or the speaking level of the remote partner is changing. This can also be done in a frequency-specific manner, resulting in a frequency-selective adaptive equalization. No further microphone has to be placed in the cabin and the volume can be controlled independently of the amplifier using the resources (AEC, NR) already available.

Hamid Sepehr, Masoud Ahmadi and Paul Brennan "Removal of Siren Mixed with Speech by Wavelet Transform and Adaptive Notch Filter"
Siren signals are used to warn other cars and pedestrians about the presence of an emergency vehicle. This signal leaks into the communication path between the emergency vehicle and the emergency call centre and substantially degrades the intelligibility of the speech signal. In the past, research has mainly focused on siren cancellation at the source of the siren signal by using an extra microphone, which requires consideration of the acoustical characteristics of the emergency vehicle. The approach in this paper attenuates the siren in the emergency call centre without any prior knowledge of the emergency vehicle, the type of siren, or whether a siren is present.

Roland Potthast, Filippo Fazi, Phil Nelson and Jeongil Seo "Two Sound Field Reconstruction Techniques based on Integral Equations"
We describe two methods for sound field reconstruction which can be used for splitting the sound field coming from two different sources in space. The methods provide a constructive scheme to separate, for example, two voices recorded simultaneously by a synchronized array of microphones for which the location of the speakers in space is approximately known. The first method is based on the potential approach of Kirsch-Kress and was first suggested by Ben-Hassen and Potthast [3]. The second method is based on Potthast's Point Source Method [2], for which we propose a new version that can be used for source splitting. For both methods we describe the basic derivation steps and discuss convergence, numerical realization and stabilization aspects.

Mahdi Triki and Dirk T.M. Slock "Robust Delay-&-Predict Equalization for Blind SIMO Channel Dereverberation"
We consider the blind multichannel dereverberation problem for a single source. We have shown before [5] that the single-input multi-output (SIMO) reverberation filter can be equalized blindly by applying multivariate Linear Prediction (LP) to its output (after SISO input pre-whitening). In this paper, we investigate LP-based dereverberation in a noisy environment and/or under acoustic channel length underestimation. Considering ambient noise and late reverberation as additive noises, we propose to introduce a postfilter that transforms the multivariate prediction filter into a somewhat longer equalizer. The postfilter allows equalization to a non-zero delay. Both MMSE-ZF and MMSE design criteria are considered here for the postfilter. Simulations show that the proposed scheme is robust to noise and to channel length underestimation, and performs better than the classic Delay-&-Predict equalizer and the Delay-&-Sum beamformer.
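
For orientation, the Delay-&-Sum beamformer used as a baseline above can be sketched in a few lines. This is a minimal version that assumes the per-channel delays are known, non-negative integers in samples; the function name and signature are hypothetical, and the proposed postfilter itself is not shown.

```python
from typing import List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Time-align each channel by its known delay and average the result.

    delays[m] is the number of samples by which channel m lags the source
    (assumed known and non-negative; in practice it must be estimated).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Only output samples for which every aligned channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(max(length, 0))
    ]
```

Averaging the aligned channels reinforces the direct-path signal while uncorrelated noise and reverberation add incoherently, which is why this beamformer is a common reference point for multichannel dereverberation.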
