Semi-Automated Processing of Interconnected Dyads Using Entity Resolution (SPIDER)

Period of Performance: 09/09/2015 - 09/08/2016


Phase 1 SBIR

Recipient Firm

Charles River Analytics, Inc.
625 Mount Auburn Street
Cambridge, MA 02138
Principal Investigator


DESCRIPTION (provided by applicant): The overarching aim of this proposal is to develop and verify a new and innovative software system that will assist researchers in more rigorously constructing social networks. During the past two decades, studies have increasingly employed social network analysis (SNA) to understand the transmission of HIV and sexually transmitted infections (STIs). The mapping of "risk networks," in which individuals are connected by infection-spreading ties, has yielded especially valuable insight into the behavioral epidemiology of HIV and STIs, and has informed promising interventions designed for people at risk for or living with HIV. However, despite these advances and the burgeoning popularity of SNA-based HIV/STI research, major methodological and technological challenges are hindering further progress in the field. SNA's ability to catalyze major epidemiologic advances relies on researchers' ability to construct valid representations of participants' networks from behavioral data. The standard protocol for constructing risk networks, or identifying direct and indirect relationships among participants and their partners, involves matching participants' names and demographics with data provided about named partners. This process of identifying and matching duplicate individuals in the network (i.e., "entity resolution" [ER]) is often conducted through laborious, manual cross-referencing procedures. These procedures are limited in their reproducibility and may lead to misspecification of network structure. Further complicating valid network construction is that ER criteria are: (1) not formalized; (2) specified differently across studies in various settings and populations; and (3) rarely, if ever, explained in the published literature. Semi-automated tools that combine powerful automated ER processes with capacities for customization and qualitative input have the potential to dramatically improve the speed and accuracy of risk network construction.
Current tools for ER in health research tend to focus on a static subset of available ER techniques (e.g., similarity in demographics, phonetic-based matching techniques) without incorporating state-of-the-art approaches (e.g., machine learning). The proposed software, Semi-automated Processing of Interconnected Dyads using Entity Resolution (SPIDER), will provide users with a system that enables efficient, semi-automated network construction using a library of robust, statistically rigorous ER algorithms, rich desktop-based annotation tools, and secure web-based technologies. The customizability of SPIDER will allow for multi-disciplinary utility in studies using varying designs and will include innovative features that specifically respond to emerging methodological trends in HIV/STI research. The overarching goal of this project is to improve the efficiency and quality of network construction used in research, thereby improving the evidence base for network-based interventions that mitigate the spread of HIV.
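
As an illustration only, the matching step described above (comparing names and demographics to decide whether two records refer to the same person) might be sketched as follows. This is not SPIDER's algorithm, which the abstract does not specify. The record fields (name, age, sex), the thresholds, and the function names are all hypothetical assumptions, and the name comparison uses the generic string similarity in Python's standard difflib module rather than a phonetic or machine-learning method. The sketch does show how writing ER criteria as explicit parameters makes them formal and reproducible.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Return a 0..1 similarity score for two names (case-insensitive)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_match(r1, r2, name_threshold=0.85, max_age_gap=2):
    """Hypothetical ER criteria: same sex, similar age, similar name."""
    return (
        r1["sex"] == r2["sex"]
        and abs(r1["age"] - r2["age"]) <= max_age_gap
        and name_similarity(r1["name"], r2["name"]) >= name_threshold
    )


def resolve_entities(records, **criteria):
    """Assign each record an entity label; matched records share a label.

    Uses a small union-find so that matches are merged transitively.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of records (fine for small illustrative inputs).
    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j], **criteria):
            parent[find(j)] = find(i)

    return [find(i) for i in range(len(records))]


if __name__ == "__main__":
    # Made-up records for illustration; not data from any study.
    records = [
        {"name": "Jon Smith", "age": 30, "sex": "M"},    # a participant
        {"name": "John Smith", "age": 31, "sex": "M"},   # a named partner
        {"name": "Maria Lopez", "age": 25, "sex": "F"},  # another partner
    ]
    print(resolve_entities(records))
```

Because the criteria are ordinary keyword arguments, a stricter or looser rule (for example, `resolve_entities(records, name_threshold=0.99)`) can be stated explicitly and reported alongside the resulting network.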