SBIR Phase I: Touch-free Voice Trigger for Power Constrained Systems

Period of Performance: 12/15/2016 - 08/31/2017


Phase 1 SBIR

Recipient Firm

ifire bio, LLC
712 SW 16th Ave #216
Gainesville, FL 32601
Firm POC, Principal Investigator


The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is to benefit the economy, society, and scientific and technological understanding through intelligent voice-enabled technology. Commercially, the introduction of an intelligent, ultra-low-power, embedded voice trigger is expected to capture a significant share of the $113 billion global voice and speech recognition market, contributing to its expansion and to that of related markets such as wearables and the Internet of Things (IoT). The long-term potential impact is broad and encompasses the key players in the voice recognition ecosystem, such as the mobile phone industry, semiconductor companies, chip designers, technology providers, and system integrators, who will be able to innovate with the proposed platform. Societally, users' day-to-day lives will be made easier by faster, more natural use of speech commands to access device functionality and to retrieve processed information in real time. Furthermore, by performing speech processing on board, the proposed technology alleviates many of the privacy and data protection concerns associated with cloud-based solutions.

This Small Business Innovation Research Phase I project aims to develop an always-on, touch-free, embedded voice trigger with ultra-low power consumption for deployment in smart products and IoT systems. This novel speech processing technology transforms analog speech signals into biologically inspired, data-efficient binary pulse trains and processes them using adaptive state models. These models can be deployed in a network of reprogrammable automata and achieve high accuracy using only the timing information of the digital pulses produced by the user's voice. Compared with conventional speech features, pulse trains, as used in the human auditory system, are highly robust to noise and lead to extremely small-footprint hardware implementations, which are ideally suited to low-power on-chip processing. Furthermore, by processing only ones and zeros with automata, the proposed approach requires no arithmetic, enabling real-time, complex voice processing that would otherwise come at prohibitive cost or be relegated to the cloud. This keyword spotting technology is pre-configurable, customized for each user, and can be used in cellphones or similar portable devices with no network connection required after calibration. It offers increased security and privacy when accessing portable device features by eliminating the need to continuously send potentially sensitive data online.
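
The pulse-train and automaton idea described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the project: the integrate-and-fire encoder (a software stand-in for an analog front end), the threshold value, the hypothetical state names, and the trigger pattern of two pulses separated by exactly two silent ticks. The point is only that, once the signal is a stream of ones and zeros, detection can be done by table lookup on pulse timing, with no arithmetic in the automaton itself.

```python
from typing import Dict, List, Tuple


def integrate_and_fire(signal: List[float], threshold: float) -> List[int]:
    """Encode samples as a binary pulse train (illustrative stand-in for an
    analog front end): accumulate signal magnitude and emit a 1, then reset,
    whenever the accumulator reaches the threshold."""
    pulses = []
    acc = 0.0
    for sample in signal:
        acc += abs(sample)
        if acc >= threshold:
            pulses.append(1)
            acc = 0.0  # reset after firing
        else:
            pulses.append(0)
    return pulses


# Hypothetical transition table: (state, input bit) -> next state.
# It recognizes the timing pattern 1, 0, 0, 1 anywhere in the stream.
TRANSITIONS: Dict[Tuple[str, int], str] = {
    ("idle", 0): "idle", ("idle", 1): "p1",
    ("p1", 0): "gap1",   ("p1", 1): "p1",
    ("gap1", 0): "gap2", ("gap1", 1): "p1",
    ("gap2", 0): "idle", ("gap2", 1): "hit",
}


def detects_trigger(pulses: List[int]) -> bool:
    """Run the automaton over the pulse train using only table lookups."""
    state = "idle"
    for bit in pulses:
        state = TRANSITIONS[(state, bit)]
        if state == "hit":
            return True
    return False


if __name__ == "__main__":
    train = integrate_and_fire([0.6, 0.6, 0.1, 0.1, 1.2, 0.0], threshold=1.0)
    print(train, detects_trigger(train))
```

In this sketch, a different trigger would mean swapping in a different transition table rather than changing the detection loop, which is one way to read "reprogrammable automata"; a real keyword spotter would need far richer models than this four-state example.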