SBIR Phase II: VocaliD - Infusing Unique Vocal Identities into Synthesized Speech

Period of Performance: 04/01/2016 - 03/31/2018

$748K

Phase 2 SBIR

Recipient Firm

VocaliD, Inc.
BELMONT, MA 02478
Firm POC, Principal Investigator

Abstract

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase II project is to offer custom-crafted digital voices for text-to-speech applications. Each one of us has a unique voiceprint - an essential part of our self-identity. Though the quality of text-to-speech technology has improved, voice options remain limited. For the 2.5 million Americans (and tens of millions worldwide) living with voicelessness who rely on devices to talk, access to a custom digital voice is a game changer. It's the difference between a functional solution and being heard, uniquely, as oneself. Enhanced opportunities for social connection increase quality of life, independence, and access to educational and vocational resources that can narrow the gap between those with and without disability. This immediate unmet societal need, coupled with the increasing proliferation of devices that speak to us and for us, creates a compelling, timely, and significant commercial opportunity for high-quality, personalized digital voices that can be produced at scale. By leveraging the company's crowdsourced human voicebank and proprietary voice matching and blending algorithms, the technology has the potential to empower everyone to express themselves through their own voice.

This Small Business Innovation Research Phase II project builds on the company's NSF-funded research and Phase I results that support the feasibility and commercialization of a customized voice building technology. The text-to-speech market, encompassing assistive technologies, enterprise and consumer applications, is currently valued at around $1B and is rapidly growing and ripe for innovation. To create custom voices, the company leverages the source-filter theory of speech production. From those who are unable or unwilling to record several hours of speech, the company extracts a brief vocal sample - even a single vowel contains enough 'vocal DNA' to seed the personalization process. Identity cues of the source are then combined with filter properties of a demographically and acoustically matched donor in the company's voicebank. The result is a voice that captures the vocal identity of the recipient and the clarity of the donor. Phase II technical objectives address the need for 1) customer-driven voice customization, 2) quality assurance of crowdsourced recordings, 3) voice aging algorithms, and 4) targeted donor recruitment algorithms. These advances will help secure the assistive technology beachhead and spur innovations for broader applications such as virtual reality, personal robotics, and digital personas for the Internet of Things.
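
The source-filter idea mentioned in the abstract can be illustrated with a minimal textbook sketch. This is not the company's proprietary matching or blending method, which the abstract does not describe. It assumes the recipient contributes only a pitch (the source) and the donor contributes a set of formant resonances (the filter). All function names, the pitch value, and the formant values are hypothetical and chosen for illustration only.

```python
import math


def impulse_train(f0_hz, num_samples, sample_rate):
    """Source: one impulse per pitch period, a crude stand-in for glottal pulses."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(num_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator that models a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(source_f0_hz, donor_formants, num_samples=800, sample_rate=16000):
    """Drive a cascade of donor formant resonators with the recipient's pitch."""
    signal = impulse_train(source_f0_hz, num_samples, sample_rate)
    for freq_hz, bandwidth_hz in donor_formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Hypothetical inputs: a recipient pitch and donor (frequency, bandwidth) pairs in Hz.
recipient_f0_hz = 200.0
donor_formants = [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)]
samples = synthesize(recipient_f0_hz, donor_formants)
```

In this toy version, changing `recipient_f0_hz` alters the perceived pitch while leaving the vowel quality set by `donor_formants` unchanged, which is the separation the source-filter theory relies on. A real system would estimate both parts from recordings rather than hard-code them.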