Multi-talker Speech Separation and Recognition

Multi-talker Speech Separation and Recognition - overview


Understanding speech in the presence of multiple talkers is one of the most challenging problems in automatic speech recognition. In this demo, our system separates and recognizes the speech in mixtures of up to four speakers recorded on a single channel. Can you separate and recognize the speech as well as our machine? Try listening to each mixture both with and without looking at the transcripts.



Speech Separation Demos




3 speaker mixture:


mixture of 3 speakers

separated speech:

speaker 1: BIN WHITE AT S 6 NOW
speaker 2: LAY RED IN I 1 NOW
speaker 3: LAY GREEN IN J ZERO AGAIN




4 speaker mixture #1:


mixture of 4 speakers number 1

separated speech:

speaker 1: PLACE WHITE AT D ZERO SOON
speaker 2: PLACE RED IN H 3 NOW
speaker 3: LAY BLUE AT P ZERO NOW
speaker 4: PLACE GREEN WITH B 8 SOON




4 speaker mixture #2:


mixture of 4 speakers number 2

separated speech:


speaker 1: BIN WHITE WITH T 6 PLEASE
speaker 2: SET BLUE BY A 8 NOW
speaker 3: BIN RED WITH Z 4 SOON
speaker 4: PLACE GREEN WITH Y 4 NOW




Publications


Steven Rennie, John R. Hershey and Peder A. Olsen,
"Single Channel Multi-talker Speech Recognition: Graphical Modeling Approaches,"
IEEE Signal Processing Magazine, Special issue on Graphical Models, November 2010.

John R. Hershey, Steven Rennie, Peder A. Olsen and Trausti Kristjansson,
"Super-human multi-talker speech recognition: A graphical modeling approach,"
Computer Speech and Language, 2010, Special issue: Speech Separation and Recognition.

Martin Cooke, John R. Hershey and Steven Rennie,
"Monaural Speech Separation Challenge,"
Computer Speech and Language, 2010, Special issue: Speech Separation and Recognition.

Steven J. Rennie, John R. Hershey and Peder A. Olsen,
"Hierarchical Variational Loopy Belief Propagation for Multi-talker Speech Recognition,"
ASRU 2009, Merano, Italy.

Steven J. Rennie, John R. Hershey and Peder A. Olsen,
"Variational Loopy Belief Propagation for Efficient Multi-talker Speech Recognition,"
Interspeech 2009, pp. 1331-1334, September 6-10, Brighton, UK.

Steven J. Rennie, John R. Hershey and Peder A. Olsen,
"Single-channel speech separation and recognition using loopy belief propagation,"
ICASSP 2009, pp. 3845-3848, April 19-24, Taipei, Taiwan.