May 6 - Special session n. 2
Unsupervised Microphone Arrays for Hands-free Speech Communication
(chaired by Shoji Makino)

1) Herbert Buchner and Walter Kellermann
"A Fundamental Relation between Blind and Supervised Adaptive Filtering Illustrated for Blind Source separation and Acoustic Echo Cancellation."
2) Patrick A. Naylor, Xiang (Shawn) Lin, and Andy W. H. Khong
"Near-common Zeros in Blind Identification of SIMO Acoustic Systems."
3) Maria G. Jafari, Mark D. Plumbley, and Mike E. Davies
"Speech Separation using an Adaptive Sparse Dictionary Algorithm."
4) Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, and Shoji Makino
"A DOA Based Speaker Diarization System for Real Meetings."


Special session n. 2 - abstracts


Herbert Buchner and Walter Kellermann "A Fundamental Relation between Blind and Supervised Adaptive Filtering Illustrated for Blind Source Separation and Acoustic Echo Cancellation."
In recent years, broadband signal acquisition by sensor arrays, e.g., for speech and audio signals in hands-free scenarios, has become a popular research field, the goal being to separate certain desired source signals from competing or interfering source signals ((blind) source separation or interference cancellation) and possibly to dereverberate them (blind deconvolution). In various practical scenarios, some or even all interfering source signals are directly accessible and/or some side information on the propagation paths is known. In these cases the separation problem can be tackled with supervised adaptation algorithms, e.g., the popular LMS- or RLS-type algorithms, rather than the more involved blind adaptation algorithms. In contrast, for blind estimation, such as in the blind source separation (BSS) scenario where both the propagation paths and the original source signals are unknown, the method of independent component analysis (ICA) is typically applied. Traditionally, the ICA method and supervised adaptation algorithms have been treated as different research areas. In this paper, we establish a conceptually simple, yet fundamental relation between these two worlds. This is made possible by the previously introduced generic broadband adaptive filtering framework TRINICON. As we will demonstrate, not only do the well-known blind and supervised adaptive filtering algorithms turn out to be special cases of this generic framework, but we also gain various new insights and synergy effects for the development of new and improved adaptation algorithms.
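As a concrete reference point for the supervised side of this relation, the minimal NumPy sketch below shows a standard normalized LMS (NLMS) adaptive filter as it might be used for acoustic echo cancellation when the interfering (far-end) signal is directly accessible; the signal names and parameter values are illustrative assumptions and are not taken from the paper.

import numpy as np

def nlms_echo_canceller(x, d, filter_len=256, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter (supervised adaptation sketch).

    x : far-end reference signal (directly accessible interfering source)
    d : microphone signal (echo of x plus near-end speech/noise)
    Returns the error signal e (echo-reduced output) and the final
    filter estimate w.
    """
    n = len(d)
    w = np.zeros(filter_len)           # current echo-path estimate
    e = np.zeros(n)                    # output (near-end estimate)
    x_buf = np.zeros(filter_len)       # most recent reference samples
    for k in range(n):
        x_buf = np.roll(x_buf, 1)      # shift in the newest sample
        x_buf[0] = x[k]
        y = w @ x_buf                  # echo replica
        e[k] = d[k] - y                # residual after echo removal
        # normalized gradient step on the filter coefficients
        w += mu * e[k] * x_buf / (x_buf @ x_buf + eps)
    return e, w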


Patrick A. Naylor, Xiang (Shawn) Lin, and Andy W. H. Khong "Near-common Zeros in Blind Identification of SIMO Acoustic Systems."
The common zeros problem in Blind System Identification (BSI) is well known to degrade the performance of classic BSI algorithms and therefore limits the performance of subsequent speech dereverberation. Recently, we have shown that multichannel systems cannot be well identified if near-common zeros are present. In this work, we further study the near-common zeros problem using a channel diversity measure. We then investigate the use of forced spectral diversity (FSD) based on a combination of spectral shaping filters and effective channel undermodelling. Simulation results show the effectiveness of the proposed approach.
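For orientation, the sketch below illustrates how near-common zeros between two channels of a SIMO system could be inspected by comparing the zeros of their impulse responses; the toy channels and the distance threshold are illustrative assumptions, not the channel diversity measure or the FSD method of the paper.

import numpy as np

def near_common_zeros(h1, h2, tol=0.05):
    """Flag near-common zeros between two FIR channel responses.

    h1, h2 : impulse responses of the two channels of a SIMO system.
    Returns, for every zero of h1, the distance to the closest zero of
    h2; entries below `tol` indicate (near-)common zeros that make
    blind identification ill-conditioned.
    """
    z1 = np.roots(h1)                          # zeros of channel 1
    z2 = np.roots(h2)                          # zeros of channel 2
    dists = np.array([np.min(np.abs(z - z2)) for z in z1])
    return dists, dists < tol

# Toy example: both channels share the factor (1 - 0.9 z^-1),
# i.e. an exactly common zero at z = 0.9.
common = np.array([1.0, -0.9])
h1 = np.convolve(common, np.array([1.0, 0.3]))
h2 = np.convolve(common, np.array([1.0, -0.5]))
dists, flags = near_common_zeros(h1, h2)
print(dists, flags)    # one distance of ~0 -> common zero detected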


Maria G. Jafari, Mark D. Plumbley, and Mike E. Davies "Speech Separation using an Adaptive Sparse Dictionary Algorithm."
We present a greedy adaptive algorithm that builds a sparse orthogonal dictionary from the observed data. In this paper, the algorithm is used to separate stereo speech signals, and the phase information that is inherent to the extracted atom pairs is used for clustering and identification of the original sources. The performance of the algorithm is compared to that of the adaptive stereo basis algorithm, when the sources are mixed in echoic and anechoic environments. We find that the algorithm correctly separates the sources, and can do this even with a relatively small number of atoms.
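The role of the atom-pair phase in clustering can be pictured with the following sketch: for each stereo atom pair, the inter-channel phase at the atom's dominant frequency is estimated, and a plain two-class k-means over these phases assigns each atom to one of two sources. The data layout and the clustering step are illustrative assumptions, not the adaptive sparse dictionary algorithm itself.

import numpy as np

def cluster_atom_pairs(atoms_left, atoms_right, n_iter=50):
    """Cluster stereo atom pairs by inter-channel phase (two sources).

    atoms_left, atoms_right : arrays of shape (n_atoms, atom_len),
    the left/right channel versions of each extracted atom.
    Returns a label (0 or 1) per atom.
    """
    phases = []
    for a_l, a_r in zip(atoms_left, atoms_right):
        A_l, A_r = np.fft.rfft(a_l), np.fft.rfft(a_r)
        k = np.argmax(np.abs(A_l))                  # dominant frequency bin
        phases.append(np.angle(A_r[k] * np.conj(A_l[k])))
    phases = np.array(phases)

    # Plain 1-D two-class k-means on the inter-channel phases.
    centers = np.array([phases.min(), phases.max()])
    for _ in range(n_iter):
        labels = (np.abs(phases - centers[1]) <
                  np.abs(phases - centers[0])).astype(int)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = phases[labels == c].mean()
    return labels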


Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, and Shoji Makino "A DOA Based Speaker Diarization System for Real Meetings."
This paper presents a speaker diarization system that estimates "who spoke when" in a meeting. Our proposed system is realized by using a noise-robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. Our previous system utilized the generalized cross-correlation with phase transform (GCC-PHAT) approach for DOA estimation. Because GCC-PHAT can estimate just one DOA per frame, it was difficult to handle speaker overlaps. This paper addresses this issue by employing a DOA estimated at each time-frequency slot (TFDOA), and reports how it improves diarization performance for real meetings and conversations recorded in a room with a reverberation time of 350 ms.
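For reference, the sketch below is a generic single-frame GCC-PHAT DOA estimate for one two-microphone pair; the frame, sampling rate, and microphone spacing are illustrative assumptions, and the TFDOA extension proposed in the paper instead assigns a DOA to each time-frequency slot.

import numpy as np

def gcc_phat_doa(x1, x2, fs=16000, mic_dist=0.05, c=343.0):
    """Single-frame DOA estimate via GCC-PHAT for a two-microphone pair.

    x1, x2 : one frame of the two microphone signals.
    Returns the estimated angle of arrival in degrees.
    """
    n = 2 * len(x1)                        # zero-pad to avoid wrap-around
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    max_shift = int(fs * mic_dist / c)     # physically possible delays
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs     # TDOA in seconds
    return np.degrees(np.arcsin(np.clip(tau * c / mic_dist, -1.0, 1.0)))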
