Hierarchical Dynamic Exploitation of FMV (HiDEF) through the use of Video Learning for Analysis from Deep Embeddings

Period of Performance: 07/31/2015 - 04/30/2016


Phase 1 SBIR

Recipient Firm

Commonwealth Computer Research, Inc.
1422 Sachem Pl., Unit #1
Charlottesville, VA 22901
Principal Investigator


ABSTRACT: Recent advances in machine learning have dramatically advanced the state of the art in tasks such as image recognition and machine translation. Most of this progress has centered on a family of neural network algorithms broadly called deep learning. We believe there is a significant opportunity to apply these breakthroughs in image, text, and video processing, leveraging a collection of deep learning techniques to dramatically improve the automated understanding of full motion video (FMV) collected from aerial platforms. Furthermore, the representation learned from the raw video data will be rich enough to allow a text description of the content to be extracted automatically. This generated text can then support accurate semantic discovery of video content in response to analyst-formulated natural language queries or questions, as well as fusion with existing knowledge bases of information extracted from text corpora. This will enable important indications and warnings, and will dramatically increase the availability of forensic data that can be analyzed to develop predictive algorithms, making it possible to identify future threats sooner.
BENEFIT: The approach outlined above, while targeted at recognizing and describing activities of interest in aerial surveillance video, is widely applicable to understanding the content of many other kinds of video sources. Within the Department of Defense (DoD) and the Intelligence Community (IC), the need for this capability will only grow as more and better drones become available to units deployed abroad. Drones with cameras are a cheap and effective way to perform surveillance over an area, but only when paired with software tools that can prioritize video content through automated understanding. DoD and IC organizations that we will target include the Marine Corps, small deployable/expeditionary Army units, and the CIA.
Additionally, we expect the law enforcement market for this technology to be significant, for similar reasons. Organizations such as police departments and the Coast Guard are only now beginning to experiment with drones and surveillance cameras. While the size of this market depends on the extent to which society accepts this kind of monitoring, we expect a large number of scenarios in which it is deemed acceptable, for example monitoring the United States border (DHS and Border Patrol), the coasts (Coast Guard), and areas surrounding prisons.
However, the largest possible market may be in commercial rather than government applications. The private security market is very large and growing. Our proposed solution could complement the offerings of existing commercial security companies, which are unlikely to have the advanced technology required to automatically detect activities of interest in their security video. Instead, they often employ people whose job is to monitor these feeds. Paying employees for this task is expensive, and suspicious or otherwise interesting activities are generally noticed only if the person happens to be watching the right camera at the right time.
In addition to commercial security, we envision use cases in both the television news industry and the technology industry. In television news, reporters film many activities involving people or events of interest. This footage is often used immediately in an upcoming news report. Producing the report requires watching and editing the footage, and our system could streamline this process. Beyond that, our system could be especially useful for analyzing and cataloging the content of video segments so that they can be archived and semantically retrieved from a historical repository for future research and programming.
In the technology industry, millions of users upload and store videos on sites such as Facebook, YouTube, and Vine. The industry has only scratched the surface of what can be done to enable better organization, automated understanding, and discovery of these videos, for example improving the user experience by automatically tagging uploaded videos. The models we will develop in this effort will provide enabling capabilities for each of these challenges and therefore offer a large set of business-to-business opportunities.