SBIR Phase I: Decoding Obfuscated Text to Find Trafficking Victims

Period of Performance: 01/01/2016 - 06/30/2016

$150K

Phase 1 SBIR

Recipient Firm

Marinus Analytics LLC
4620 Henry Street Array
Pittsburgh, PA 15213
Firm POC
Principal Investigator

Abstract

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project goes beyond combating sex trafficking in the United States. The proposed technology finds compounded influence with extension to countries in North America, Asia, and Europe, especially those with sufficient Internet penetration. This innovation will enable previously impossible information extraction algorithms to be applied to data on trafficking activity. The United States will lead by example by empowering its law enforcement to find and rescue victims of human trafficking and prosecute their exploiters, showing the world that exploitation will not be tolerated in our society. The proposed innovation will also increase collaboration across fragmented jurisdictions domestically and internationally, streamlining investigative workflows and enhancing productivity. The capability to decode Unicode characters into meaningful intelligence is an important and novel innovation impacting other investigations related to black market economies, which are becoming increasingly commonplace. Many detectives need analogous capabilities in domains including drug, gun, and animal trafficking and the sale of counterfeit goods online (e.g. pharmaceuticals, licensed merchandise) to empower them to find patterns left by perpetrators. Both international expansion of existing tools and expansion of domains served will multiply the commercial potential of this project. This Small Business Innovation Research (SBIR) Phase I project will develop new machine learning technology to decode patterns of text used by sex traffickers in online advertisements to evade prosecution. Recently, criminals have developed new ways to avoid law enforcement detection, particularly by use of look-alike Unicode characters and symbols, which cannot be easily translated by a computer for automated search or deeper analysis. There is no existing solution that can comprehensively and accurately decode the obfuscated information. We will apply machine learning methods to train a model to predict the appropriate Latin character translation most likely to be represented by alternative symbols, allowing us to automatically and predictively decode obfuscated text. We will produce a set of algorithms to solve this problem, empowering law enforcement to stay ahead of criminal tactics. Beyond the United States, we will deploy the innovation internationally, as well as into new domains, including but not limited to, drug, gun, and animal trafficking, spam, phishing schemes, and attempts to avoid keyword-based alerting systems. These illicit activities are a significant detriment to both our society and economy, and urgent solutions are needed to give those who combat these activities the tools they need to stay ahead.