Cognitive Human-Computer Interaction - overview
Cognitive systems learn and interact naturally with humans to extend what either humans or machines could do on their own. Cognitive systems help human experts make better decisions by cutting through the complexity of Big Data. We strive to develop a set of new technologies and combine them to humanize the human–computer interface. These technologies include speech recognition, speech synthesis, question answering and image understanding.
With the popularity of smartphones, many IT companies are investing in the research and development of speech technologies. Speech technology has made enormous strides over the last five years with the help of deep learning. For example, it is now possible to enter text into a smartphone by voice with very high accuracy. This is a critical usability feature given how small a mobile screen is. However, speech recognition is not a “solved problem”. Recognition accuracy on spontaneous speech, such as speech in conversations and meetings, is still very low. Even state-of-the-art technology from top laboratories cannot get more than 50% of the words correct in such challenging environments. Harder still are situations such as mixed languages, heavy accents and strong ambient noise.
IBM is a pioneer in speech recognition. In the 1970s, IBM scientists applied statistical modeling principles, learning model parameters from voice training data and text corpora. This approach revolutionized the field and set the stage for today’s explosion in machine learning technology. In the 1990s, IBM Research – China built a speech research team, which had the privilege of working with giants in the speech field such as Lalit Bahl, David Nahamoo and Michael Picheny. In 1997, the team developed a Mandarin speech dictation system named ViaVoice, the first speaker-independent, continuous speech dictation system in the world. Since then, the team has continued to produce a stream of groundbreaking research as well as other speech products and innovations. These include telephony speech recognition (“WebSphere Voice Server”), embedded applications (“eVV”, embedded ViaVoice), voice morphing, query by humming and the IBM transcription system (iTrans). As a result, the team has received dozens of IBM Research Accomplishment Awards over the past 20 years. In 2009, IBM was awarded the IEEE Corporate Innovation Award for its long-term contributions to the field. All team members consider themselves very lucky to be able to work with such a great team.
IBM Watson is a system for reasoning over unstructured information. Initially, all of this information came in as text, and all interactions were typed or GUI-based. By adding speech capabilities such as speech recognition and speech synthesis, we are now building a conversational cognitive system. To meet this goal, we are also devoted to the research and development of question answering systems, alongside our work on speech technologies. These systems use advanced artificial intelligence, machine learning and deep learning approaches. The speech system and the question answering system should be jointly optimized, because simply concatenating the two systems will not achieve reasonable performance. Building on our successful work in data-driven methodologies for speech recognition, we focus on advanced machine learning algorithms to understand questions and score answers precisely. To support domain-independent human-computer interaction, advanced classification and knowledge graph technologies are also on our radar.
Our bold idea is to develop a cognitive human-computer interaction system that allows computers to understand and engage humans in a transparent and multimodal way. This system addresses needs expressed by individual developers, startups and large enterprises. Because our innovations run in the IBM Cloud environment, we have been guided by the feedback and usage patterns of our 77,000+ active developers and over 350 partners (100 of them in market today) across 17 industries.