Last February, IBM’s Watson computer faced off against two of the finest competitors in the history of the popular U.S. television game show Jeopardy!, and came away with a resounding victory. Watson’s innovation was its capability to understand open-ended natural language questions, with all their idiosyncrasies and complexities. Although Watson was cheating in a way (the questions were fed to it as text rather than through host Alex Trebek’s voice), the ability of computers to understand questions posed in natural language represents a big step forward in computer comprehension of natural language semantics.
Watson’s performance was so impressive that its creators have found a new role for it in the healthcare industry. For Watson, though, this is an altogether different and more challenging task. Recognizing language at the level of meaning is a victory for the IBM team, but extracting that meaning from an acoustic stream is a challenge that even Watson’s terabytes cannot currently overcome.
To tackle this obstacle, IBM has partnered with researchers at Nuance Communications, makers of the popular Dragon NaturallySpeaking software, who bring the acoustic muscle in voice recognition. Together they will work with the University of Maryland and Columbia University to apply Watson’s deep question-answering abilities to patient care. Their goal? To let Watson search the enormous amount of information in journals, prior cases, and reference materials and come up with a direct, relevant, and accurate answer to a healthcare provider’s open-ended question. The partnership between Watson’s question answering and Nuance’s clinical language understanding is expected to find significant application in electronic health records (EHRs): the system would take spoken clinical notes, extract the relevant data, and enter it into the structured fields of an EHR.
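The transcribe-then-structure pipeline described above can be sketched in miniature. The snippet below is purely illustrative and assumes nothing about IBM’s or Nuance’s actual systems: it stands in for a clinical-language-understanding engine with a couple of regular expressions, and the field names (`systolic_bp`, `medication`, and so on) are hypothetical placeholders for whatever schema a real EHR would use.

```python
import re

def extract_ehr_fields(transcript: str) -> dict:
    """Pull a few structured fields out of a free-text clinical transcript.

    A toy stand-in for clinical language understanding: the patterns and
    field names are invented for illustration, not drawn from any real
    EHR standard or vendor API.
    """
    record = {}

    # Vital sign: "blood pressure is 140 over 90" -> two numeric fields.
    bp = re.search(r"blood pressure (?:is |of )?(\d{2,3}) over (\d{2,3})",
                   transcript, re.IGNORECASE)
    if bp:
        record["systolic_bp"] = int(bp.group(1))
        record["diastolic_bp"] = int(bp.group(2))

    # Medication order: "prescribed lisinopril 10 mg" -> name + dose.
    med = re.search(r"prescrib\w+ (\w+) (\d+) ?mg", transcript, re.IGNORECASE)
    if med:
        record["medication"] = {"name": med.group(1),
                                "dose_mg": int(med.group(2))}
    return record

# A dictated note (as it might arrive from a speech recognizer):
note = "Patient's blood pressure is 140 over 90. Prescribed lisinopril 10 mg daily."
print(extract_ehr_fields(note))
```

A production system would of course replace the regular expressions with a trained language-understanding model, but the shape of the task is the same: unstructured dictation in, structured record fields out.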
But the ability to answer open natural language questions will benefit more than just healthcare. Search engine companies have an enormous interest in being able to answer searchers’ questions as originally phrased. Google has added voice recognition to its Chrome Internet browser as well as its Android smartphone operating system.
More than that, Google is recording the results, building a corpus of spoken queries paired with its software’s interpretations to improve recognition in the future. Between Chrome and Android, Google could soon possess the largest database of spoken language matched with its correct interpretation. By comparing the two, Google will be able to improve its voice recognition rapidly and, perhaps, deploy it elsewhere as well.
At the same time, Apple is rumored to be improving its own voice recognition software for its newest operating system, iOS 5, as well as for the iPhone. Apple’s first move into voice recognition came with its acquisition of Siri, whose first product was a voice recognition app for the iPhone. With Siri’s expertise, Apple is focusing on improving its voice recognition capabilities across platforms, with an eye toward voice-activated search and artificial intelligence applications.
The success of these voice recognition initiatives could have a big impact on the way users interact with information in general, eventually turning touch screens and keyboards into things of the past in favor of spoken questions answered from vast stores of information. With advances being made in both meaning comprehension and spoken word recognition, many computer interfaces could end up including significant spoken components.
The first major steps toward serious, permanent voice recognition features are already showing up in smartphones, search engines, and EHRs. With companies like Google, Apple, and IBM dedicating increasing resources to voice recognition, technology seems destined to be experienced more and more with the voice and ears, and less with hands and eyes.