STTR Phase I: SunDIAL: Slot DIscovery And Linking

Period of Performance: 01/01/2016 - 12/31/2016

$225K

Phase 1 STTR

Recipient Firm

RedShred
5520 Research Park Drive Suite 100
Baltimore, MD 21228
Firm POC, Principal Investigator

Research Institution

University of Maryland, Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250
Institution POC

Abstract

The broader impact/commercial potential of this Small Business Technology Transfer (STTR) Phase I project lies in improving the accessibility of documents and giving consumers and businesses a powerful tool for routing, categorizing, and understanding them. By producing automatically generated, SaaS-delivered structured summaries of opportunity documents for business developers, SunDIAL will accelerate pipeline and business development decisions for businesses of all sizes. SunDIAL will advance the capabilities of practical information extraction systems by generating summaries from any text document. Other broader impacts and benefits include furthering technology that empowers individuals to quickly get the gist of formal documents such as consumer credit contracts, health insurance policies, complex industry requests for proposals, grants, and other difficult-to-read documents. A successful outcome will allow users to quickly assess documents through infobox displays, triage documents on mobile devices, write rules to sort or filter them automatically, and review key information before ever opening the document.

This Small Business Technology Transfer (STTR) Phase I project will perform unsupervised, unrestricted discovery of slots and fillers (attributes and values) to construct structured summaries based on keywords and hidden patterns in document collections. SunDIAL goes beyond the state of the art by not requiring a manually crafted catalogue of slots, and it complements supervised approaches by discovering new slots. While conventional Natural Language Processing (NLP) approaches are effective on well-formed sentences, the techniques described here are effective for semi-structured content such as section headers, lists, and tables. Furthermore, NLP-based approaches for fact extraction and text summarization are primarily lexical, requiring further processing to disambiguate terms and link them to unique entities and concepts in a knowledge base. SunDIAL further advances the state of the art by eliminating these steps: it identifies keywords and links them to knowledge base concepts as a first step in the discovery process. Linking concepts to a knowledge base provides the additional advantage that the terms can be explicitly mapped to semantic concepts in other ontologies. SunDIAL will lead to advances in knowledge discovery and advance the methodologies for information retrieval, information extraction, slot filling, and knowledge-base population.
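
To make the idea of slots and fillers concrete, the following is a minimal Python sketch, not SunDIAL's actual method, which the abstract does not specify. It assumes, purely for illustration, that candidate slots appear as "Label: value" lines and that a label counts as a slot once it recurs in at least two documents of a collection. The names `LABEL_VALUE`, `extract_pairs`, `discover_slots`, and `summarize`, along with the sample documents, are hypothetical.

```python
import re
from collections import Counter

# Assumption for illustration: a slot shows up as a short label, a colon,
# and a value on the same line (a common semi-structured pattern).
LABEL_VALUE = re.compile(r"^\s*([A-Za-z][A-Za-z /&-]{1,40}?)\s*:\s*(\S.*?)\s*$")


def extract_pairs(text):
    """Return (slot, filler) pairs from 'Label: value' lines in one document."""
    pairs = []
    for line in text.splitlines():
        match = LABEL_VALUE.match(line)
        if match:
            # Normalize case and whitespace so 'Due Date' and 'due date' agree.
            slot = " ".join(match.group(1).lower().split())
            pairs.append((slot, match.group(2)))
    return pairs


def discover_slots(documents, min_docs=2):
    """Keep labels that recur in at least `min_docs` documents.

    No predefined catalogue of slots is used; recurrence across the
    collection is the only signal in this simplified sketch.
    """
    doc_counts = Counter()
    for text in documents:
        # Count each label once per document.
        doc_counts.update({slot for slot, _ in extract_pairs(text)})
    return {slot for slot, count in doc_counts.items() if count >= min_docs}


def summarize(text, slots):
    """Build a structured summary: the first filler seen for each known slot."""
    summary = {}
    for slot, filler in extract_pairs(text):
        if slot in slots:
            summary.setdefault(slot, filler)
    return summary


# Hypothetical two-document collection.
docs = [
    "Solicitation\nAgency: NSF\nDue Date: 03/15/2016\nSee attached terms.",
    "AGENCY: DOE\nDue date : 04/01/2016\nContact: J. Doe",
]
slots = discover_slots(docs)
first_summary = summarize(docs[0], slots)
```

Here "contact" is dropped because it appears in only one document, while "agency" and "due date" survive as discovered slots. A fuller system along the lines described above would also link each slot and filler to knowledge base concepts, which this sketch omits.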