SBIR Phase I: A Cocktail Party Technology: Real-Time Conversation Separation from Background Voices and Sounds

Period of Performance: 12/01/2016 - 05/31/2017


Phase 1 SBIR

Recipient Firm

Yobe Inc
21 Mill Street 2nd Floor
Wethersfield, CT 06109
Firm POC, Principal Investigator


The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is that it will, for the first time, make it possible to create voice technologies whose speech and speaker recognition performance does not degrade significantly in the presence of interfering voices or environmental sounds. This degradation has kept many voice technologies out of both the mobile and IoT markets. The company's unique artificial intelligence platform is expected to deliver a fully scalable, real-time software solution. Solving this challenge will make the currently noisy world of smartphones a realistic target for voice technologies (such as voice authentication) that have so far avoided the space.

This SBIR Phase I project concerns a novel technology that combines advanced signal processing, broadcast studio methodologies, and artificial intelligence techniques to aggressively separate a voice from background voices and sounds. The technology also automatically repairs the biometrics of the separated voice signals on the basis of empirically formulated, signal-dependent rules. In informal tests, it has already performed significantly better than existing technologies at separating two-person conversations from highly overlapped background voices and sounds captured on a pair of closely spaced (a few centimeters apart) omnidirectional microphones. The project seeks to firmly establish the superiority of this technology over existing voice separation technologies. Its ultimate goal is to demonstrate that, when the proposed technology is properly optimized as a front end to state-of-the-art automatic speech recognition or automatic speaker recognition, the recognition error rates in noisy multi-voice environments are comparable to those obtained in noiseless single-voice environments.