George Saon

Overview

Title

Speech strategy lead, distinguished research scientist

Location

IBM Research - Yorktown Heights, Yorktown Heights, NY, USA

Bio

George Saon received his M.Sc. and PhD degrees in Computer Science from Henri Poincare University in Nancy, France, in 1994 and 1997, respectively. In 1995, Dr. Saon obtained his engineer diploma from the Polytechnic University of Bucharest, Romania. From 1994 to 1998, he worked on two-dimensional stochastic models for off-line handwriting recognition at the Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA). Since 1998, Dr. Saon has been with the IBM T.J. Watson Research Center, where he has worked on a variety of problems spanning several areas of large vocabulary continuous speech recognition, such as discriminative feature processing, acoustic modeling, speaker adaptation, and large vocabulary decoding algorithms. Some of the techniques that he co-invented are well known to the speech community, including heteroscedastic discriminant analysis (HDA), lattice-MLLR, fast FSM-based Viterbi decoding, i-vector speaker adaptation for DNNs, and joint CNN/DNN training. Since 2001, Dr. Saon has been a key member of IBM's speech recognition team, which participated in several U.S. government-sponsored evaluations for the EARS, SPINE, GALE, RATS, and BOLT programs. He has published over 150 conference and journal papers and holds several patents in the field of ASR. He is the recipient of three best paper awards (EARS RT'04, INTERSPEECH 2010, ASRU 2011) and has served as an elected member of the IEEE Speech and Language Technical Committee.

Publications

Semi-Autoregressive Streaming ASR With Label Context
- - Siddhant Arora
  - George Saon
  - et al.
- 2024
- ICASSP 2024
Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems
- - Takuma Udagawa
  - Masayuki Suzuki
  - et al.
- 2024
- ICASSP 2024
Diagonal State Space Augmented Transformers for Speech Recognition
- - George Saon
  - Ankit Gupta
  - et al.
- 2023
- ICASSP 2023
Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
- - Samuel Thomas
  - Jeff Kuo
  - et al.
- 2022
- ICASSP 2022
Improving End-to-End Models for Set Prediction in Spoken Language Understanding
- - Jeff Kuo
  - Zoltan Tuske
  - et al.
- 2022
- ICASSP 2022
Towards efficient end-to-end speech recognition with biologically-inspired neural networks
- - Thomas Bohnstingl
  - Ayush Garg
  - et al.
- 2021
- NeurIPS 2021

Patents

- 25 Mar 2024
- US
- 11942078
Chunking And Overlap Decoding Strategy For Streaming RNN Transducers For Speech Recognition
- 19 Feb 2024
- US
- 11908458
Customization Of Recurrent Neural Network Transducers For Speech Recognition
- 19 Feb 2024
- US
- 11908454
Integrating Text Inputs For Training And Adapting Neural Network Transducer ASR Models
- 10 Jan 2024
- TW
- I829312
Integrating Text Inputs For Training And Adapting Neural Network Transducer ASR Models
- 09 Oct 2023
- US
- 11783811
Accuracy Of Streaming RNN Transducer
- 05 Sep 2023
- GB
- 2602227
Fast - Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition
- 28 Aug 2023
- US
- 11741946
Multiplicative Integration In Neural Network Transducer Models For End-to-end Speech Recognition
- 25 Oct 2021
- US
- 11158303
Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition
- 18 Oct 2021
- US
- 11151996
Vocal Recognition Using Generally Available Speech-to-text Systems And User-defined Vocal Training
- 13 Sep 2021
- US
- 11120802
Diarization Driven By The ASR Based Segmentation

Top collaborators

Brian Kingsbury

Samuel Thomas

Xiaodong Cui

Stanislaw Wozniak