SBIR Phase II: Statistical Inference for Advanced Entity Resolution

Period of Performance: 01/01/2013 - 12/31/2013


Phase 2 SBIR

Recipient Firm

InferLink Corporation
2361 Rosecrans Ave., Suite 348
El Segundo, CA 90245
Principal Investigator, Firm POC


This Small Business Innovation Research (SBIR) Phase II project aims to improve the integration of information about entities, such as people, companies, and products, extracted from heterogeneous data sources. Integration is challenging when different formats and terminology are used to describe the same entity. A statistical learning approach addresses this problem by allowing a system to estimate the probability of a match between entity references, rather than computing a score based on ad hoc rules or weights. The research focuses on refinements to this statistical learning approach that will enable a system to handle diverse types of real-world data. Because the approach is based on sound statistical principles and uses evidence compiled from large datasets, it can produce more accurate results than existing commercial methods. Moreover, these advantages are amplified when handling data that has highly variable, missing, or noisy attributes, such as data extracted from websites.

The broader impact of this project lies in enabling enterprises to perform more accurate and reliable data integration. Today, enterprises often have difficulty using data extracted from unstructured or semi-structured sources because the extracted data is noisy and difficult to integrate. The ability to integrate such data is critical for some of the nation's largest companies and institutions. For instance, the technology being developed in this project will reduce the cost of integrating data from hospitals and health information providers. It can also help intelligence agencies connect the dots when investigating companies and individuals, and help human resource managers find and recruit job candidates. Ultimately, the technology resulting from this project will help many types of enterprises make better use of the growing amount of information accessible through the Web and private networks.
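
To illustrate the distinction between estimating a match probability and computing an ad hoc score, the following is a minimal sketch of a generic naive Bayes (Fellegi-Sunter style) matcher. The abstract does not specify the project's actual model, so this should be read as one textbook way to turn per-attribute evidence into a probability, not as the project's method. The attribute names, the `M_U` values, the prior, and the `match_probability` function are all hypothetical and chosen only for illustration; in practice such parameters would be estimated from data.

```python
import math

# Hypothetical per-attribute probabilities (illustrative values only):
#   m = P(attribute agrees | the two records refer to the same entity)
#   u = P(attribute agrees | the two records refer to different entities)
M_U = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "phone": (0.85, 0.001),
}

# Assumed prior probability that a candidate pair is a true match.
PRIOR_MATCH = 0.01


def match_probability(a, b, m_u=M_U, prior=PRIOR_MATCH):
    """Return P(match | attribute comparisons) under a naive Bayes model.

    `a` and `b` are dicts mapping attribute names to string values.
    Attributes are assumed conditionally independent given match status.
    """
    # Start from the prior, expressed as log-odds.
    log_odds = math.log(prior / (1.0 - prior))
    for attr, (m, u) in m_u.items():
        va, vb = a.get(attr), b.get(attr)
        if va is None or vb is None:
            # A missing attribute contributes no evidence either way.
            continue
        if va.strip().lower() == vb.strip().lower():
            log_odds += math.log(m / u)  # agreement raises the odds
        else:
            log_odds += math.log((1.0 - m) / (1.0 - u))  # disagreement lowers them
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    rec_a = {"name": "Acme Corp", "city": "El Segundo", "phone": "555-0100"}
    rec_b = {"name": "ACME CORP", "city": None, "phone": "555-0100"}
    print(match_probability(rec_a, rec_b))
```

Because the output is a probability rather than an arbitrary score, a missing attribute simply leaves the estimate unchanged instead of being penalized, and a decision threshold can be interpreted directly as a tolerated error rate. Exact string equality is used here only to keep the sketch short; a practical system would compare noisy values with approximate similarity measures.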