Speech Technologies

Overview

Speech is an important research field for IBM Research AI. The speech technology group at IBM Research - Tokyo has a long history of close collaboration with the IBM global speech research teams at the IBM T.J. Watson Research Center and IBM Research - Haifa. Our missions are (1) core research that advances speech technology, with findings presented at domestic and international academic conferences, (2) practical research that delivers new research technologies to IBM Watson, and (3) creation of new speech solutions for real business scenarios and use cases.

Improving speech recognition accuracy on casual conversation is a major challenge. Our team has strong expertise in deep learning for acoustic modeling and language modeling, which we apply to advance state-of-the-art speech recognition systems. We contributed to a world-record result on the standard Switchboard benchmark data set, with meaningful impact on both the academic field and IBM businesses.
IBM Watson Speech Services (Speech-to-Text and Text-to-Speech) are cloud-based speech recognition and synthesis services, now available in various languages including English, Japanese, and Spanish. We are proud to have enabled these services, and we continue to invent novel acoustic and language modeling algorithms that can be applied directly to IBM Watson.
In addition to core and practical research, deploying speech recognition in new real-world applications is an important technical challenge. We have experience with a variety of call-center solutions that tightly couple speech recognition and natural language processing. Recently, we deployed a real-time call-center agent support system that combines Watson Speech-to-Text and natural language processing technologies for large Japanese clients (see videos below).

To carry out these missions, we conduct intensive research on deep learning for acoustic modeling, language modeling, and front-end processing, and on the integration of speech technology with natural language processing. Please refer to the publication page for our published work.