Joint Learning of Text-based Categories

Period of Performance: 07/27/2016 - 02/26/2017

$148K

Phase 1 SBIR

Recipient Firm

Systems & Technology Research
600 West Cummings Park
Woburn, MA 01801
Firm POC
Principal Investigator

Abstract

STR proposes to build Categories via Context-Driven Dimensionality Reduction (C2D2R), a novel, highly efficient, and scalable information-processing pipeline for jointly learning categories of entities and relations together with document topics. C2D2R operates in three stages: shallow pre-processing of text inputs, context-driven dimensionality reduction, and joint category inference and labeling. Our system will build upon the FACTORIE open-source machine learning library and will be tested on DTRA mission-relevant datasets. We will measure the performance of the C2D2R system in terms of category coherence, agreement with data annotations, and computation time. The resulting system will be computationally efficient and scalable, work with minimal or no human supervision, require no predefined classes, adapt readily to new domains, and support conditioning on any dimension to see its effect on other variables.
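
The three stages named in the abstract can be illustrated with a toy sketch. This is not the C2D2R implementation and does not use FACTORIE. The abstract does not say which reduction or inference method is used, so feature hashing of context words stands in for the dimensionality reduction and greedy cosine grouping stands in for category inference. All function names, parameters, and example sentences are hypothetical.

```python
import math
import re
import zlib
from collections import defaultdict


def tokenize(text):
    """Stage 1 (shallow pre-processing): lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def context_vectors(sentences, window=2, dims=16):
    """Stage 2 (assumed stand-in): hash each neighbouring word into one of
    `dims` buckets, giving every term a fixed-size context vector."""
    vectors = defaultdict(lambda: [0.0] * dims)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # crc32 is deterministic across runs, unlike built-in hash()
                    bucket = zlib.crc32(tokens[j].encode("utf-8")) % dims
                    vectors[term][bucket] += 1.0
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_categories(vectors, terms, threshold=0.8):
    """Stage 3 (assumed stand-in): a term joins the first category whose
    seed term has a similar enough context vector, else starts a new one."""
    categories = []
    for term in terms:
        for category in categories:
            if cosine(vectors[term], vectors[category[0]]) >= threshold:
                category.append(term)
                break
        else:
            categories.append([term])
    return categories


# Hypothetical input: two pairs of terms that occur in identical contexts.
sentences = [
    "the lab detected anthrax in the sample",
    "the lab detected sarin in the sample",
    "the team flew to boston on monday",
    "the team flew to chicago on monday",
]
vectors = context_vectors(sentences)
categories = infer_categories(vectors, ["anthrax", "sarin", "boston", "chicago"])
print(categories)
```

Terms with identical contexts receive identical vectors and therefore land in the same category. Hash collisions can blur unrelated contexts, so a larger `dims` or a learned projection would be needed on real data. No category names are supplied in advance, which mirrors the "no predefined classes" goal at toy scale.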