George A. Saon

contact information

Large vocabulary continuous speech recognition
Thomas J. Watson Research Center, Yorktown Heights, NY USA
  +1-914-945-2985

Professional Associations

IEEE


2017

Knowledge distillation across ensembles of multilingual models for low-resource languages
Cui, Jia and Kingsbury, Brian and Ramabhadran, Bhuvana and Saon, George and Sercu, Tom and Audhkhasi, Kartik and Sethy, Abhinav and Nussbaum-Thom, Markus and Rosenberg, Andrew
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pp. 4825--4829

English Conversational Telephone Speech Recognition by Humans and Machines
Saon, George and Kurata, Gakuto and Sercu, Tom and Audhkhasi, Kartik and Thomas, Samuel and Dimitriadis, Dimitrios and Cui, Xiaodong and Ramabhadran, Bhuvana and Picheny, Michael and Lim, Lynn-Li and others
arXiv preprint arXiv:1703.02136, 2017

Network architectures for multilingual speech representation learning
Sercu, Tom and Saon, George and Cui, Jia and Cui, Xiaodong and Ramabhadran, Bhuvana and Kingsbury, Brian and Sethy, Abhinav
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pp. 5295--5299

Direct Acoustics-to-Word Models for English Conversational Speech Recognition
Audhkhasi, Kartik and Ramabhadran, Bhuvana and Saon, George and Picheny, Michael and Nahamoo, David
arXiv preprint arXiv:1703.07754, 2017


2016

Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings
Suzuki, Masayuki and Tachibana, Ryuki and Thomas, Samuel and Ramabhadran, Bhuvana and Saon, George
INTERSPEECH, pp. 1588--1592, 2016

The IBM 2016 English Conversational Telephone Speech Recognition System
George Saon, Tom Sercu, Steven Rennie, Hong-Kwang J Kuo
arXiv preprint arXiv:1604.08242, 2016

On the importance of event detection for ASR
David Haws, Dimitrios Dimitriadis, George Saon, Samuel Thomas, Michael Picheny
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5705--5709


2015

The IBM BOLT speech transcription system
Thomas, Samuel and Saon, George and Kuo, Hong-Kwang Jeff and Mangu, Lidia
INTERSPEECH, pp. 3150--3153, 2015

A nonmonotone learning rate strategy for SGD training of deep neural networks
Nitish Shirish Keskar, George Saon
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4974--4978

Order-free spoken term detection
Lidia Mangu, George Saon, Michael Picheny, Brian Kingsbury
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5331--5335

Improvements to the IBM speech activity detection system for the DARPA RATS program
Samuel Thomas, George Saon, Maarten Van Segbroeck, Shrikanth S Narayanan
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4500--4504

Deep convolutional neural networks for large-scale speech tasks
Tara N Sainath, Brian Kingsbury, George Saon, Hagen Soltau, Abdel-rahman Mohamed, George Dahl, Bhuvana Ramabhadran
Neural Networks 64, 39--48, Pergamon, 2015

The IBM 2015 English conversational telephone speech recognition system
George Saon, Hong-Kwang J Kuo, Steven Rennie, Michael Picheny
arXiv preprint arXiv:1505.05899, 2015


2014

A distributed architecture for fast SGD sequence discriminative training of DNN acoustic models
Saon, George
Spoken Language Technology Workshop (SLT), 2014 IEEE, pp. 183--188

Joint training of convolutional and non-convolutional neural networks
Soltau, Hagen and Saon, George and Sainath, Tara N
ICASSP, pp. 5572--5576, 2014

Automatic Speech Recognition
Hagen Soltau, George Saon, Lidia Mangu, Hong-Kwang Kuo, Brian Kingsbury, Stephen Chu, Fadi Biadsy
Natural Language Processing of Semitic Languages, pp. 409--459, Springer Berlin Heidelberg, 2014

Unfolded Recurrent Neural Networks for Speech Recognition
George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny
Fifteenth Annual Conference of the International Speech Communication Association, 2014

Parallel Deep Neural Network Training for LVCSR Tasks using Blue Gene/Q
Tara N Sainath, I-hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra Chaudhari
Fifteenth Annual Conference of the International Speech Communication Association, 2014

Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions
Samuel Thomas, Sriram Ganapathy, George Saon, Hagen Soltau
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2519--2523

A comparison of two optimization techniques for sequence discriminative training of deep neural networks
George Saon, Hagen Soltau
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 5567--5571

Improvements to filterbank and delta learning within a deep neural network framework
Tara N Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George Saon, Bhuvana Ramabhadran
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 6839--6843


2013

The IBM speech activity detection system for the DARPA RATS program
George Saon, Samuel Thomas, Hagen Soltau, Sriram Ganapathy, Brian Kingsbury
INTERSPEECH, pp. 3497--3501, 2013

Speaker adaptation of neural network acoustic models using i-vectors
Saon, George and Soltau, Hagen and Nahamoo, David and Picheny, Michael
ASRU, pp. 55--59, 2013

The IBM keyword search system for the DARPA RATS program
Lidia Mangu, Hagen Soltau, Hong-Kwang Kuo, George Saon
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pp. 204--209

Improvements to deep convolutional neural networks for LVCSR
Tara Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Aravkin, Bhuvana Ramabhadran
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pp. 315--320

Exploiting diversity for spoken term detection
Lidia Mangu, Hagen Soltau, Hong-Kwang Kuo, Brian Kingsbury, George Saon
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 8282--8286


2012

Sparse Bayesian Factor Analysis for Stereo-based Stochastic Mapping
Xiaodong Cui, Mohamed Afify, George Saon, Vaibhava Goel
INTERSPEECH, 2012

Discriminative feature-space transforms using deep neural networks
George Saon, Brian Kingsbury
INTERSPEECH, 2012

Boosting systems for large vocabulary continuous speech recognition
George Saon, Hagen Soltau
Speech Communication 54(2), 212--218, Elsevier, 2012

Bayesian Sensing Hidden Markov Models
George Saon, Jen-Tzung Chien
Audio, Speech, and Language Processing, IEEE Transactions on 20(1), 43--54, 2012

Large-vocabulary continuous speech recognition systems: A look at some recent advances
George Saon, Jen-Tzung Chien
Signal Processing Magazine, IEEE 29(6), 18--33, 2012


2011

Some properties of Bayesian sensing hidden Markov models
George Saon, Jen-Tzung Chien
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on, pp. 65--70

The IBM 2009 GALE Arabic speech transcription system
Brian Kingsbury, Hagen Soltau, George Saon, Stephen Chu, Hong-Kwang Kuo, Lidia Mangu, Suman Ravuri, Nelson Morgan, Adam Janin
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 4672--4675




2010

The IBM Attila speech recognition toolkit
Hagen Soltau, George Saon, Brian Kingsbury
Spoken Language Technology Workshop (SLT), 2010 IEEE, pp. 97--102

The IBM 2008 GALE Arabic speech transcription system
George Saon, Hagen Soltau, Upendra Chaudhari, Stephen Chu, Brian Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Daniel Povey
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4378--4381

Boosting Systems for LVCSR
G Saon, H Soltau
Eleventh Annual Conference of the International Speech Communication Association, 2010


2009

Dynamic network decoding revisited
H Soltau, G Saon
Automatic Speech Recognition & Understanding, 2009, pp. 276--281


Large margin semi-tied covariance transforms for discriminative training
G Saon, D Povey, H Soltau
Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3753--3756


2008


Boosted MMI for model and feature-space discriminative training
Daniel Povey, Dimitri Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Karthik Visweswariah
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pp. 4057--4060


2007

Lattice-based unsupervised maximum likelihood linear regression for speaker adaptation
M Padmanabhan, G A Saon, G G Zweig
US Patent 7,216,077, 2007

Lattice-based viterbi decoding techniques for speech translation
G. Saon, M. Picheny
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on, pp. 386--389

The IBM 2006 Gale Arabic ASR system
Hagen Soltau, George Saon, Brian Kingsbury, Jeff Kuo, Lidia Mangu, Daniel Povey, Geoffrey Zweig
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, pp. IV--349


2006

On the effect of word error rate on automated quality monitoring
G Saon, B Ramabhadran, G Zweig
Proceedings of Spoken Language Technology Workshop, pp. 106--109, 2006

Automatic analysis of call-center conversations
G Zweig, O Siohan, G Saon, B Ramabhadran, D Povey …
Proceedings of IEEE International Conference on Acoustics, …, 2006

Feature and model space speaker adaptation with full covariance Gaussians
D Povey, G Saon
Ninth International Conference on Spoken Language Processing, 2006 - ISCA

A non-linear speaker adaptation technique using kernel ridge regression
G Saon
2006 IEEE International Conference on Acoustics, Speech and …, 2006

Automated quality monitoring for call centers using speech and NLP technologies
Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu, Brian Kingsbury
Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume: demonstrations, pp. 292--295, Association for Computational Linguistics

Automated quality monitoring in the call center with ASR and maximum entropy
Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu, Brian Kingsbury
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, pp. I--I

Advances in speech transcription at IBM under the DARPA EARS program
Stanley F Chen, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Hagen Soltau, Geoffrey Zweig
Audio, Speech, and Language Processing, IEEE Transactions on 14(5), 1596--1608, IEEE, 2006


2005

Anatomy of an extremely fast LVCSR decoder
G Saon, D Povey, G Zweig
Ninth European Conference on Speech Communication and …, 2005 - ISCA

fMPE: Discriminatively trained features for speech recognition
Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, Geoffrey Zweig
ICASSP - IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 961--964, Philadelphia, 2005

The IBM 2004 conversational telephony system for rich transcription
Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Geoffrey Zweig
ICASSP - IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 205--208, 2005


2004


Acoustic modeling with full-covariance Gaussians
G Saon, B Kingsbury, L Mangu, D Povey, H Soltau, G …
EARS STT Workshop, 2004

Training a 2300-hour fisher system
Brian Kingsbury, Stan Chen, Lidia Mangu, Dan Povey, George Saon, Hagen Soltau, Geoffrey Zweig
EARS STT Workshop, 2004

Fractional Fourier transform features for speech recognition
R Sarikaya, Y Gao, G Saon, …
IEEE International Conference on Acoustics, Speech, and …, 2004

Feature space gaussianization
G Saon, S Dharanipragada, D Povey, …
IEEE International Conference on Acoustics, Speech, and …, 2004

The bicoastal IBM/SRI CTS STT system
B Kingsbury, L Mangu, D Povey, G Saon, H Soltau, G Zweig, A Stolcke, R Gadde, W Wang, J Zheng, others
Rich Transcription (RT-04F) Workshop, 2004


2003


EARS Progress Update: Improved MPE, Inline Lattice Rescoring, Fast decoding, Gaussianization and …
D Povey, G Saon, L Mangu, B Kingsbury, G Zweig
EARS STT Workshop, St. Thomas, US Virgin Islands, 2003

CTS decoding improvements at IBM
George A Saon, D Povey, G Zweig
EARS STT workshop, 2003

The IBM 2003 1xRT speech-to-text system
G Saon, G Zweig, B Kingsbury, L Mangu
Proc. Spring 2003 Rich Transcription Workshop (RT-03S), 2003

An architecture for rapid decoding of large vocabulary conversational speech
G Saon, G Zweig, B Kingsbury, L Mangu, U Chaudhari
Eighth European Conference on Speech Communication and Technology, 2003

Toward domain-independent conversational speech recognition
Brian Kingsbury, Lidia Mangu, George Saon, Geoffrey Zweig, Scott Axelrod, Vaibhava Goel, Karthik Visweswariah, Michael Picheny
INTERSPEECH, 2003


2002




Improvements to the IBM Aurora 2 multi-condition system
G Saon, J M Huerta
Seventh International Conference on Spoken Language Processing, 2002

Arc minimization in finite state decoding graphs with cross-word acoustic context
G Zweig, G Saon, F Yvon
Seventh International Conference on Spoken Language …, 2002 - ISCA

Digit recognition in noisy environments via a sequential GMM/SVM system
S Fine, G Saon, R Gopinath
IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2002

Automatic speech recognition performance on a voicemail transcription task
Mukund Padmanabhan, George Saon, Jing Huang, Brian Kingsbury, Lidia Mangu
Speech and Audio Processing, IEEE Transactions on 10(7), 433--442, IEEE, 2002

Robust speech recognition in noisy environments: The IBM SPINE-2 evaluation system
B Kingsbury, G Saon, L Mangu, M Padmanabhan, R Sarikaya
Proc, 2002

Improvements to the IBM Hub-5E system
J Huang, B Kingsbury, L Mangu, G Saon, R Sarikaya, G Zweig
NIST RT-02 Workshop, 2002

Robust speech recognition in noisy environments: The 2001 IBM SPINE evaluation system
Brian Kingsbury, George Saon, Lidia Mangu, Mukund Padmanabhan, Ruhi Sarikaya
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, pp. I--53


2001

Linear feature space transformations for speaker adaptation
George A Saon, G Zweig, M Padmanabhan
Proc. IEEE ICASSP, 2001

The awe and mystery of FMLLR
G Saon
Seminar presentation, IBM Human Language Technologies, …, 2001


Minimum Bayes error feature selection for continuous speech recognition
G Saon, M Padmanabhan
Advances in Neural Information Processing Systems, 2001

Robust digit recognition in noisy environments: the IBM Aurora 2 system
G Saon, J M Huerta, E E Jan
Seventh European Conference on Speech Communication and Technology, 2001

Linear feature space projections for speaker adaptation
G Saon, G Zweig, M Padmanabhan, …
2001 IEEE International Conference on Acoustics, Speech, and …, 2001

Data-driven approach to designing compound words for continuous speech recognition
G Saon, M Padmanabhan
IEEE Transactions on Speech and Audio Processing, 2001

Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech
M Padmanabhan, G Saon, G Zweig, J Huang, B Kingsbury, L Mangu
Instrumentation and Measurement Technology Conference, 2001. IMTC 2001. Proceedings of the 18th IEEE, pp. 1926--1931

Speech recognition for DARPA communicator
Andrew Aaron, S Chen, P Cohen, Satya Dharanipragada, Ellen Eide, Martin Franz, J-M Leroux, X Luo, Benoît Maison, Lidia Mangu, others
Acoustics, Speech, and Signal Processing, 2001. Proceedings (ICASSP'01). 2001 IEEE International Conference on, pp. 489--492


2000

Maximum likelihood discriminant feature spaces
G Saon, M Padmanabhan, R Gopinath, S Chen
Proceedings of ICASSP, 2000

Minimum Bayes error feature selection
G Saon, M Padmanabhan
Sixth International Conference on Spoken Language Processing, 2000 - ISCA

Real-time multilingual HMM training robust to channel variations
EE Jan, JB Ordinas, G Saon, S Roukos
Sixth International Conference on Spoken Language Processing, 2000 - ISCA

Lattice-based unsupervised MLLR for speaker adaptation
M Padmanabhan, G Saon, G Zweig
ASR2000-Automatic Speech Recognition: Challenges for the new …, 2000 - ISCA


Performance Improvements in Voicemail Transcription
J Huang, B Kingsbury, L Mangu, M Padmanabhan, G Saon, G Zweig
Proceedings of DARPA Speech Transcription Workshop, 2000

Recent improvements in speech recognition performance on large vocabulary conversational speech (voicemail and switchboard)
J Huang, B Kingsbury, L Mangu, M Padmanabhan, G Saon, G Zweig
Sixth International Conference on Spoken Language Processing, 2000


1999

Cursive word recognition using a random field based hidden Markov model
G Saon
International Journal of Pattern Recognition and Artificial Intelligence, 1999

Recent improvements in voicemail transcription
G Zweig, G Saon, M Padmanabhan, J Huang, S Basu
Broadcast News Workshop'99 Proceedings, 1999


Cursive word recognition using a random field based hidden Markov model
G Saon
International Journal on Document Analysis and Recognition, 1999 - Springer

Recent improvements in voicemail transcription
M Padmanabhan, G Saon, S Basu, J Huang, G Zweig
Sixth European Conference on Speech Communication and …, 1999 - ISCA


1997


Binary pattern recognition using Markov random fields and HMMs
GA Saon, A Belaid
1997 IEEE International Conference on Acoustics, Speech, and …, 1997

Utilisation des processus markoviens en reconnaissance de l'écriture (Use of Markov processes in handwriting recognition)
A Belaid, G Saon
Traitement du Signal, 1997


High performance unconstrained word recognition system combining HMMs and Markov random fields
G Saon, A Belaid
Automatic Bankcheck Processing, 1997

Off-line handwritten word recognition using a mixed HMM-MRF approach
G Saon, A Belaid
Document Analysis and Recognition, 1997., Proceedings of the …, 1997


1996

An efficient algorithm for parallel integer multiplication
B Singer, G Saon
Journal of Network and Computer Applications, 1996 - Elsevier

Recognition of unconstrained handwritten words using Markov random fields and HMMs
George A Saon, A Belaid
Fifth International Workshop on Frontiers in Handwriting …, 1996


1995

Stochastic trajectory modeling for recognition of unconstrained handwritten words
G Saon, A Belaid, Y Gong
Document Analysis and Recognition, 1995., Proceedings of the …, 1995


1994


Use of stochastic models in text recognition
A Belaid, G Saon
1994
