Speech Technologies White Paper
Next Steps
Speech application development is very much a craft. This project is not just about technology; it requires a multidisciplinary approach to problem-solving, with parallel teams working in separate areas to address the constituent tasks:

  • Collaborate with subject matter experts in both the information domain and various technical areas to create a working framework of specific application features. Because the system must be "taught" how to interpret and answer plain-English queries that reference an unseen knowledgebase of art information, a team of art-knowledgeable students will be needed. Other disciplines that bear on the project include graphic design, web technologies, database technologies and networking technologies.
  • Further concentrated research is needed to create more robust base semantic models to support the natural language engine. Additional speech and linguistics specialists may be required to maximize the effectiveness of the semantic engine.
  • Although we have great confidence that our technology architecture can scale to the most demanding applications, empirical evidence must be gathered to support that assumption. Testing the system in a "production" environment can involve student teams that develop empirical methods for measuring operating efficiency, performance metrics, and mean time between failures. Different devices need to be tested in varying environments that simulate the best- and worst-case conditions to be expected. Boundary conditions need to be measured to build predictable scalability models for large deployments, and a plan is needed to properly "forward-deploy" the appropriate infrastructure pieces so that scaling and performance capacity is localized. A primary research goal would be to determine exactly how to distribute voice recognition and text-to-speech servers to maximize availability and performance without making costs prohibitive.
  • Although specialized software tools have already been created in the normal course of our R&D efforts to date, additional tools are needed to streamline the creation, tuning and maintenance of actual applications. A team of student software developers can create a suite of back-end tools and techniques for logging system activity, reporting, threshold alerts, database updates and maintenance, and other necessary system "chores". For larger deployments, putting effective time-saving tools into the hands of the support and implementation teams is essential to the success of the project. In addition, existing speech, voice and natural language engine toolsets are sparse; we have identified a few new tools that would reduce the labor cost of creating, managing and synchronizing voice recognition grammars and semantic models by orders of magnitude.
  • A promising research target is the use of Statistical Language Model (SLM) grammars to build more accurate speaker-independent voice recognition. The approach is new and largely unproven, but it could have an enormous impact, especially on complex information retrieval systems and very large knowledgebases.
  • An emerging area of research is the advancement of text-to-speech capabilities, especially improving the naturalness of the machine's spoken tone. There is compelling evidence that we can build in the ability for the machine to detect the user's emotional tone (sad, upset, ecstatic, hurried) and adjust its response appropriately. For example, the machine could "soothe" an upset user (or reciprocate an excited one) simply by changing the tone of the response. This has the potential to raise user-friendliness to a level no existing system has reached.
  • Further concentrated research is needed on advanced microphone and headset technologies, especially for maximizing application quality in "noisy" environments. We are aware of certain lab-bound technologies that show enormous promise for solving this problem completely; it is a critical area for many potential applications, especially mission-critical field use.
  • End-to-end security testing is needed to accommodate most future applications, especially field operations. This can include, but is not limited to, hardware security (such as integrated fingerprint/voiceprint authentication), proximity security (provided by RFID and GPS locators), wireless transmission security and basic extranet security (SSL encryption and username/password authentication). While each of these is individually well served by COTS (commercial off-the-shelf) solutions, integrating them into a seamless whole is uncharted territory. RFID research, by itself, is a broad, burgeoning area of interest and an integral part of our recent R&D work.
  • Other tasks:
    • Develop comprehensive data administration tools
    • Expand natural language processing (NLP) grammars
    • Synchronize context-free grammars with NLP grammar changes
    • Develop a stronger exception handling process
    • Perform thorough distance and noise testing to establish satisfactory working parameters
    • Evaluate alternative PDAs
    • Determine optimal RFID working parameters and form factors
    • Develop a detailed deployment strategy and identify specific target applications
    • Identify aesthetic improvements to visible hardware pieces
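Mean time between failures, one of the production-testing metrics listed above, is conventionally estimated as total operating time divided by the number of failures observed. The Python sketch below shows that arithmetic for a single test window. The function name `mtbf`, the (failed, restored) outage representation and the example timestamps are hypothetical illustrations, not results from our testing.

```python
from datetime import datetime, timedelta


def mtbf(window_start, window_end, outages):
    """Estimate mean time between failures for one test window.

    Hypothetical interface: `outages` is a list of (failed_at, restored_at)
    datetime pairs, one per failure, all inside the window. Repair time is
    subtracted so that only operating time is counted.
    """
    if window_end <= window_start:
        raise ValueError("test window must have positive length")
    if not outages:
        raise ValueError("no failures observed; MTBF is undefined for this window")
    downtime = timedelta(0)
    for failed_at, restored_at in outages:
        if not (window_start <= failed_at <= restored_at <= window_end):
            raise ValueError(f"outage {failed_at}..{restored_at} is outside the window")
        downtime += restored_at - failed_at
    uptime = (window_end - window_start) - downtime
    return uptime / len(outages)


# Illustrative (made-up) one-week test: three outages of 1, 2 and 3 hours.
week_start = datetime(2024, 1, 1)
week_end = datetime(2024, 1, 8)
outages = [
    (datetime(2024, 1, 2, 0), datetime(2024, 1, 2, 1)),
    (datetime(2024, 1, 4, 12), datetime(2024, 1, 4, 14)),
    (datetime(2024, 1, 6, 6), datetime(2024, 1, 6, 9)),
]
print(mtbf(week_start, week_end, outages))  # 2 days, 6:00:00  ((168 h - 6 h) / 3)
```

Because the estimate divides by the failure count, a window with no failures yields no estimate at all, which is why the sketch raises an error instead of returning an arbitrary large value.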
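The SLM research item above turns on a basic distinction: a context-free grammar either accepts or rejects a word sequence, whereas a statistical language model assigns every sequence a probability learned from sample utterances. The Python sketch below is a minimal bigram model with add-one (Laplace) smoothing, offered only to make that distinction concrete. The training queries are hypothetical, and production recognizers typically use far larger corpora and more sophisticated smoothing.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count adjacent word pairs in training sentences, with start/end markers."""
    history_counts, pair_counts = Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:])  # every token that can follow another
        history_counts.update(tokens[:-1])  # every token that precedes another
        pair_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, pair_counts, vocab


def log_probability(sentence, model):
    """Sum of log P(word | previous word), add-one smoothed so unseen pairs are not zero."""
    history_counts, pair_counts, vocab = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        numerator = pair_counts[(prev, word)] + 1
        denominator = history_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


# Hypothetical sample queries against an art knowledgebase.
training = [
    "who painted the starry night",
    "who painted the mona lisa",
    "when was the mona lisa painted",
    "where is the starry night displayed",
]
model = train_bigram_model(training)
fluent = log_probability("who painted the mona lisa", model)
scrambled = log_probability("lisa mona the painted who", model)
print(fluent > scrambled)  # True: the familiar word order scores higher
```

A recognizer using such a model would rank acoustically similar candidate transcriptions by score, rather than rejecting any phrasing that a hand-written grammar failed to anticipate.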
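The task of synchronizing context-free grammars with NLP grammar changes can be illustrated with a simple consistency check: any word the NLP grammar is expected to interpret, but that no recognition-grammar rule can produce, will never reach the language engine. The Python sketch below assumes, hypothetically, that the recognition grammar is a dictionary mapping each nonterminal to its alternative expansions and that the NLP grammar's vocabulary is available as a set of words. Real grammar formats, such as W3C SRGS, would first have to be parsed into this form.

```python
def terminal_words(cfg):
    """Collect every terminal symbol that appears in any rule of the grammar.

    Assumed format: {nonterminal: [[symbol, ...], ...]}, where any symbol that
    is not itself a key of the dict is treated as a spoken word.
    """
    words = set()
    for alternatives in cfg.values():
        for expansion in alternatives:
            words.update(symbol for symbol in expansion if symbol not in cfg)
    return words


def missing_from_recognizer(cfg, nlp_vocabulary):
    """Words the NLP grammar handles that the recognition grammar cannot produce."""
    return sorted(set(nlp_vocabulary) - terminal_words(cfg))


# Hypothetical grammars after the NLP side has learned about a new artwork.
recognition_cfg = {
    "QUERY": [["who", "painted", "ARTWORK"], ["where", "is", "ARTWORK"]],
    "ARTWORK": [["the", "starry", "night"], ["the", "mona", "lisa"]],
}
nlp_vocabulary = {"who", "where", "painted", "starry", "night", "mona", "lisa", "guernica"}
print(missing_from_recognizer(recognition_cfg, nlp_vocabulary))  # ['guernica']
```

Running a check like this whenever either grammar changes is one way to catch drift early; the reverse difference (words the recognizer can hear but the NLP grammar ignores) could be reported the same way.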