Speech Technologies White Paper
Introduction
How cool would it be if you could just TELL an application what data you needed, and it actually went out and got it?
Over the past 25 years or so, information technology has improved in geometric fashion, pushing man-made devices to the limits of known physics.
Pure technological advances are the enablers, but real progress is made when the average person can use these advances directly to improve the quality of everyday
living and working. The history of applied technology is a series of "jumps" that are best described as "killer applications". Killer apps
appear only infrequently, when several maturing underlying technologies converge. Some good examples are the electronic spreadsheet, email, and the web browser.
It's been over eight years since the last killer app appeared on the scene; the next one will be more of a "killer genre" than a killer app:
widely used voice-enabled human-machine interfaces.
Some may point out that this is old hat, that we've had this capability for some time. However, a true voice-enabled human-machine interface encompasses more than just
a machine's ability to convert speech to text and text to speech: the machine must also be able to understand the speech, relate it to a stored knowledge base, and then
respond in a useful manner. Until now, the convergence of speech recognition and natural language processing (also known as natural language understanding) has rarely been seen outside the laboratory and has remained
the province of highly funded private research endeavors. These research projects are just now maturing into off-the-shelf tools that can be integrated with existing
proven technologies to finally make the next killer app possible.
These off-the-shelf tools fall into several categories, mirroring the research fields from which they've sprung: speech recognition, speech synthesis, and
natural language processing (NLP). Speech recognition converts speech into text. Speech synthesis converts text into speech. NLP understands grammar:
how words connect and how their definitions relate to one another. This last field stands on its own, but also contributes to the other two, because
computers listen, speak, and interpret more accurately when they have guidelines for what words can mean.
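To make the division of labor concrete, here is a minimal Python sketch of how the three categories might chain together in an application. Everything in it is a hypothetical stand-in rather than a real product or API: the function names, the tiny vocabulary, and the two-entry knowledge base are invented for illustration, and each stage is a toy that only mimics the role a real engine would play. The recognize step also illustrates the last point above in a very simplified way: given several candidate transcripts, it prefers the one whose words fit the vocabulary the application expects.

```python
# Hypothetical toy knowledge base: maps an (intent, topic) pair to an answer.
KNOWLEDGE_BASE = {
    ("lookup", "sales"): "Here are the sales figures.",
    ("lookup", "inventory"): "Here are the inventory figures.",
}

# Hypothetical vocabulary that the application's grammar allows.
GRAMMAR_VOCABULARY = {"show", "me", "the", "sales", "inventory", "figures"}


def recognize(hypotheses):
    """Stand-in for speech recognition.

    A real recognizer works on audio; here we start from candidate
    transcripts and keep the one with the most in-vocabulary words.
    """
    def in_vocabulary(text):
        return sum(word in GRAMMAR_VOCABULARY for word in text.lower().split())

    return max(hypotheses, key=in_vocabulary)


def understand(text):
    """Stand-in for NLP: reduce the text to an (intent, topic) pair."""
    words = set(text.lower().split())
    for topic in ("sales", "inventory"):
        if topic in words:
            return ("lookup", topic)
    return ("unknown", None)


def respond(meaning):
    """Relate the extracted meaning to the stored knowledge base."""
    return KNOWLEDGE_BASE.get(meaning, "Sorry, I did not understand that.")


def synthesize(text):
    """Stand-in for speech synthesis: tag the text instead of producing audio."""
    return f"<spoken>{text}</spoken>"


def handle_utterance(hypotheses):
    """Run one utterance through all four steps in order."""
    return synthesize(respond(understand(recognize(hypotheses))))
```

For example, calling handle_utterance(["show me the sails figures", "show me the sales figures"]) selects the second transcript, because "sails" is not in the assumed vocabulary, and returns the spoken form of the sales answer. A production system would replace each function with a real engine, but the order of the stages would stay the same.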
The infrastructure required to support the next killer app has also matured, almost to the point of commodity. Input and output devices (PDAs, microphones, speakers, PCs)
are reliable, cheap, and ubiquitous, and are supported by equally reliable, cheap, and ubiquitous high-speed networks, wired or wireless, over short and long distances.
The applications are just waiting to be built.