| Speech Technologies White Paper |
| Next Steps |
Speech application development is very much a craft. This project is not just about technology; it requires a multi-disciplinary approach to
problem-solving that allows parallel teams working in separate areas to address the constituent tasks:
- Collaborate with subject matter experts in both the information domain and various technical areas to create a working
framework of specific application features. Because the system must be "taught" how to interpret and answer plain English queries
that reference an unseen knowledgebase of art information, a team of art-knowledgeable students would be necessary. Other disciplines
that bear on the project include graphic design, web technologies, database technologies and networking
technologies.
- Further concentrated research is needed to create more robust base semantic models in support of the natural language engine.
Other speech and linguistic specialists may be required to maximize the effectiveness of the semantic engine.
- Even though we have great confidence in the ability of our technology architecture to scale with the most demanding applications,
empirical evidence needs to be gathered to bolster our suppositions. Testing the system in a "production" environment can utilize
student teams that develop empirical approaches to determining system operating efficiencies, performance metrics, and mean time
between failures. Different devices need to be tested in varying environments that simulate the best and worst conditions to be expected. Boundary
conditions need to be measured to create predictable models for scalability in large-scale deployments; a plan needs to be developed to
properly "forward-deploy" the appropriate infrastructure pieces in order to localize scaling and performance capabilities. A primary
research goal would be to determine how exactly to distribute voice recognition and text-to-speech servers to maximize availability and
performance without being cost-prohibitive.
- While specialized software tools have already been created in the normal course of our R&D efforts to date, some additional software
tools are needed to streamline the creation, tuning and maintenance of actual applications. A team of student software developers can
create a suite of back-end tools and techniques for logging system activity, reporting, threshold alerts, database updates and
maintenance, and other necessary system "chores". For larger deployments, putting effective time-saving tools into the hands of the
support and implementation teams is essential to the success of the project. In addition, existing speech, voice and natural language
engine toolsets are sparse; we've identified a few new tools that would allow an orders-of-magnitude reduction in the labor costs of
creating, managing and synchronizing voice recognition grammars and semantic models.
- A promising research target is the use of Statistical Language Model (SLM) grammars in the construction of more accurate
speaker-independent voice recognition; it is new and largely unproven, but could have an enormous impact on especially complex
information retrieval systems and very large knowledgebases.
- An emerging area of research concerns the advancement of text-to-speech capabilities, especially in the area of improving the
naturalness of the machine's spoken tone. There is compelling evidence that we can build in the ability for the machine to detect the
emotional tone of the user (sad, upset, ecstatic, hurried) and tune its response appropriately. For example, the
machine could "soothe" an upset user (or reciprocate with an excited one) just by changing the tone of the response. This has the
potential to take the "user-friendliness" aspect to a level that no system has yet reached.
- Further concentrated research is needed on advanced microphone and headset technologies, especially for maximizing application quality in
"noisy" environments. We are aware of certain lab-bound technologies that hold enormous promise for solving this problem; it is a
critical area for many potential applications, especially for mission-critical field usage.
- End-to-end security testing needs to be done to accommodate most future applications, especially those in field operations. This can
include, but is not limited to, hardware security (such as integrated fingerprint/voiceprint authentication), proximity security
(provided by RFID and GPS locators), wireless transmission security and basic extranet security (SSL encryption and user/password
authentication). While each of these is individually well-served by COTS solutions, integrating them all into a seamless whole is
uncharted territory. RFID research, by itself, is a broad, burgeoning area of interest; it is an integral part of our recent R&D work.
- Other tasks:
  - Develop comprehensive data administration tools
  - Expand natural language processing (NLP) grammars
  - Synchronize context-free grammars with NLP grammar changes
  - Develop a stronger exception handling process
  - Perform thorough distance and noise testing to establish satisfactory working parameters
  - Evaluate alternate PDA devices
  - Determine optimal RFID working parameters and form factors
  - Develop a detailed deployment strategy and identify specific target applications
  - Identify aesthetic improvements to visible hardware pieces
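
As an illustration of the empirical testing task above, the following is a minimal sketch of how a student team might derive mean time between failures (MTBF), mean time to repair (MTTR) and availability from a log of observed outages. It assumes outages are recorded as non-overlapping start/end timestamps within a single observation window; the function name `reliability_summary`, the log format and the sample data are hypothetical and are not part of the existing toolset.

```python
from datetime import datetime, timedelta


def reliability_summary(window_start, window_end, outages):
    """Summarize reliability over one observation window.

    `outages` is a list of (start, end) datetime pairs, assumed to be
    non-overlapping and to lie inside the window (hypothetical log format).
    Returns (mtbf, mttr, availability); mtbf and mttr are None when no
    failure was observed.
    """
    total = window_end - window_start
    if total <= timedelta(0):
        raise ValueError("observation window must have positive length")

    downtime = sum((end - start for start, end in outages), timedelta(0))
    uptime = total - downtime
    failures = len(outages)

    mtbf = uptime / failures if failures else None    # uptime per failure
    mttr = downtime / failures if failures else None  # downtime per failure
    availability = uptime / total  # fraction of the window spent in service
    return mtbf, mttr, availability


# Illustrative data only: a 100-hour test run with two outages (1 h and 3 h).
start = datetime(2024, 1, 1, 0, 0)
end = start + timedelta(hours=100)
outages = [
    (start + timedelta(hours=10), start + timedelta(hours=11)),
    (start + timedelta(hours=50), start + timedelta(hours=53)),
]
mtbf, mttr, availability = reliability_summary(start, end, outages)
print(mtbf, mttr, availability)
```

For this made-up run, 96 hours of uptime across two failures gives an MTBF of 48 hours, an MTTR of 2 hours and an availability of 0.96. The same summary could be computed per device or per test environment to compare the best-case and worst-case conditions.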