Context-sensitive Content Extraction and Scene Understanding

Period of Performance: 09/30/2008 - 09/29/2010


Phase 2 SBIR

Recipient Firm

11600 Sunrise Valley Drive
Reston, VA 20191
Principal Investigator


Automatic visual content extraction and scene understanding is a critical enabling technology for video surveillance, security and forensic analysis applications. The task involves identifying objects in the scene, describing their inter-relations, and detecting events of interest. The project addresses this need by developing algorithms to extract syntactic, semantic and conceptual information from visual data. We adopt the modeling and conceptualization framework of the stochastic attribute image grammar. In this framework, a visual vocabulary is defined over pixels, primitives, parts, objects and scenes. The image grammar provides a principled mechanism to enumerate the visual elements and objects present in the scene and to describe their relations. Inference uses a combined bottom-up and top-down strategy to produce a description of the scene and its constituent elements. A text generation system then converts the semantic information into a text report. The Phase I study demonstrated the feasibility of this approach. In Phase II, we plan to extend the technology to handle more complex scenes and achieve the following objectives: (1) classification of more than 20 types of scene elements; (2) fusion of data from multiple cameras and other modalities; (3) complex event detection; and (4) enhanced text report generation and forensic analysis.
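As a purely illustrative sketch, not the project's implementation, the idea of a scene description built from parts, objects and scenes and then turned into text can be shown with a toy parse graph in Python. Every name here (Node, ParseGraph, objects_in, describe), the example scene, and the sentence templates are hypothetical and chosen only for illustration; the stochastic grammar, the bottom-up and top-down inference, and the actual text generation system are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One node of a toy parse graph (hypothetical structure)."""
    label: str
    level: str  # "scene", "object", or "part"
    children: List["Node"] = field(default_factory=list)


@dataclass
class ParseGraph:
    """A scene hierarchy plus (subject, relation, object) triples."""
    root: Node
    relations: List[Tuple[str, str, str]] = field(default_factory=list)


def objects_in(node: Node) -> List[str]:
    """Collect labels of all object-level nodes, depth first."""
    found = [node.label] if node.level == "object" else []
    for child in node.children:
        found.extend(objects_in(child))
    return found


def describe(graph: ParseGraph) -> str:
    """Fill fixed sentence templates from the parse graph."""
    names = ", ".join(objects_in(graph.root))
    sentences = [f"The {graph.root.label} contains: {names}."]
    for subject, relation, obj in graph.relations:
        sentences.append(f"The {subject} is {relation} the {obj}.")
    return " ".join(sentences)


# Hand-built example; a real system would infer this graph from imagery.
car = Node("car", "object", [Node("wheel", "part"), Node("windshield", "part")])
person = Node("person", "object")
scene = Node("parking lot", "scene", [car, person])
graph = ParseGraph(scene, [("person", "next to", "car")])

report = describe(graph)
print(report)
```

In this sketch the graph is written by hand; in the approach described above it would be the output of inference over the image grammar, and the report generator would draw on richer semantic information than two templates.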