Building Open-domain Semantic Search (BOSS)

Period of Performance: 08/20/2015 - 08/18/2017

$750K

Phase 2 SBIR

Recipient Firm

Decisive Analytics Corp.
1400 Crystal Drive
Arlington, VA 22202
Firm POC
Principal Investigator

Abstract

ABSTRACT: Each day, enormous amounts of information are generated, and the rate at which this happens continues to increase. Much of the information is unstructured text, which is exceptionally difficult to use. The challenge of exploiting high volume unstructured text is especially significant for military, government and private sector analysts. To actually perform analysis within narrow operational deadlines, analysts first need to identify relevant information against a background of high levels of noise. In the Building Open-domain Semantic Search (BOSS) project, we are building a next-generation semantic search capability that efficiently labels and indexes all concepts expressed in unstructured text. By using raw data to develop a model of possible concepts, BOSS can easily adapt to many domains and languages. With BOSS-based indexed concepts, users can search for the ideas they are interested in, rather than being restricted to keywords, which can be ambiguous and do not uniquely identify relevant concepts. In addition to indexing concepts, BOSS also structures data, identifying entities, events and the relations between them. Our goal in Phase II of this effort is to refine and operationalize this concept-based retrieval technology developed in Phase I and to deliver this capability to our customers.

BENEFIT: Military, government and private sector analysts must work with and process an overwhelming amount of unstructured text, and the rate at which that text is produced is increasing. Current information extraction (IE) and information retrieval (IR) tools are insufficient for modern analysis needs because they are either too restricted to specific tasks or too broad to produce analysis-ready results. In the Building Open-domain Semantic Search (BOSS) proposal, we describe a next-generation semantic search tool. This tool retrieves concepts, not keywords, from unstructured text. Searching over concepts retrieves more relevant information while limiting the amount of irrelevant information. This is because, unlike keywords, concepts can uniquely identify the many different forms of an idea an analyst is looking for without requiring complex handmade queries. Retrieved text will be semantically structured so that it identifies relationships between the events, entities and ideas discussed. These analysis-ready results will dramatically reduce the time analysts spend reading and filtering text. Finally, user modeling tailors search results to maximize user-specific relevance. By the end of Phase II, we will have built a high-TRL, operational-quality concept retrieval system that will allow analysts to more thoroughly exploit text from any genre or domain.