SBIR Phase I: Robust Speech-to-Text Messaging

Period of Performance: 01/01/2006 - 12/31/2006

$100K

Phase 1 SBIR

Recipient Firm

TravellingWave
1200 Mercer St Suite 412, Suite 345
Seattle, WA 98109
Principal Investigator

Abstract

This Small Business Innovation Research (SBIR) Phase I research project addresses the fundamental problem of using speech to input text into embedded devices such as cellular phones. The technology has immediate applications in text messaging (Short Message Service, or SMS). Existing text-messaging input interfaces broadly include the 9-digit keypad and miniature keyboards. It is widely acknowledged that these interfaces are clumsy and lack the speed and user-friendliness of a full-size keyboard. The project's objective is to develop a highly robust, complementary speech-to-text messaging interface, with a goal of near-100% task-completion accuracy (TCA) in real-world noisy environments. With it, a mobile user will be able to speak a message into a device and have the device type it. TravellingWave (TW) has already developed speech-to-text messaging software based on the company's patent-pending predictive speech-to-text technologies; in clean environments, this product yields the desired 100% TCA. The proposed research involves developing novel front-end signal-processing algorithms, based on adaptive filter banks and optimized for TW's predictive speech-to-text technologies. Specifically, a bank of simple adaptive filters will be developed, each of which estimates and tracks the frequency location and amplitude of a dominant spectral peak while discriminating against background noise and interference. It is anticipated that the algorithms resulting from Phase I research will enable the company's current technology to work in real-world noisy environments and reduce the processing-power requirements of its overall software application, thereby increasing adoption. The technology is relevant to speech-to-text messaging applications for mobile devices.
However, more broadly, the underlying technology may be viewed as an enhanced multimodal user interface for the ever-shrinking mobile device: users can input text using their own voice. The socioeconomic impact of such a rich user-interface technology can be illustrated by the following examples: (a) a user driving an automobile can dictate an email to a mobile device, which then sends it across a wireless network; (b) an enterprise executive on the go can use a mobile device to access the wealth of information residing on the Internet; (c) a disabled person can communicate in a hands-free, eyes-free mode using text messaging; (d) a warehouse worker can input text into a remote database while working in a hands-busy, eyes-busy environment. When adopted in the consumer market, this technology will improve understanding of the language semantics people use, their expectations, and how this new mode of interface is used overall, and hence will broaden the understanding of several concepts underlying human-machine interface technology.
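
The abstract describes adaptive filters that each estimate and track the frequency and amplitude of a dominant spectral peak, but it does not give the algorithm. As a rough illustration only, and not TW's method, the sketch below tracks a single sinusoid with a second-order adaptive linear predictor, a standard textbook technique swapped in here. The function name `track_peak` and the parameters `mu` and `smooth` are hypothetical. A bank in the sense of the abstract would presumably run one such tracker per frequency band; that part is not shown.

```python
import math

def track_peak(samples, sample_rate, mu=0.05, smooth=0.01):
    """Track the frequency (Hz) and amplitude of one dominant sinusoid.

    Illustrative assumption, not the method from the abstract: a pure
    sinusoid satisfies x[n] = 2*cos(w)*x[n-1] - x[n-2], so the single
    coefficient a = 2*cos(w) is adapted with a normalized LMS update.
    """
    a = 0.0          # estimate of 2*cos(w); 0 corresponds to sample_rate / 4
    power = 0.0      # exponentially smoothed signal power
    x1 = x2 = 0.0    # previous two input samples
    track = []
    for x in samples:
        # Prediction error of the second-order sinusoid model.
        err = x + x2 - a * x1
        # Normalized gradient step; the energy term keeps the step bounded.
        a += mu * err * x1 / (1e-8 + x1 * x1 + x2 * x2)
        a = max(-2.0, min(2.0, a))  # keep acos() in its valid domain
        # For a sinusoid of amplitude A, the mean power is A**2 / 2.
        power += smooth * (x * x - power)
        freq = math.acos(a / 2.0) * sample_rate / (2.0 * math.pi)
        amp = math.sqrt(2.0 * power)
        track.append((freq, amp))
        x2, x1 = x1, x
    return track

# Usage: a clean 440 Hz tone of amplitude 0.5 sampled at 8 kHz.
fs = 8000
tone = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(4000)]
final_freq, final_amp = track_peak(tone, fs)[-1]
print(final_freq, final_amp)
```

On a clean tone the estimates settle close to the true frequency and amplitude. With additive noise this plain predictor gives a biased frequency estimate, which is one reason a practical front end would need something more robust, such as the noise-discriminating filters the abstract proposes.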