Bayesian Textual and Multimedia Information Retrieval

Period of Performance: 09/30/2002 - 09/29/2004

$372K

Phase 2 SBIR

Recipient Firm

Insightful Corporation
INSIGHTFUL CORPORATION, 1700 WESTLAKE AVE N, STE 500
Seattle, WA 98109
Principal Investigator

Abstract

DESCRIPTION (provided by applicant): Industry transformation and rapid advancements are creating tremendous amounts of electronic multimedia information. We are developing a multimedia search agent for health data networks and data banks. The search agent employs a generalized probabilistic model that bridges the gap between automatic feature extraction and semantic understanding. What distinguishes our approach is the integration of Bayesian methodology with a fast and scalable semantic interpreter. The semantic interpreter can generate a bootstrap database of prior probabilities, overcoming a major weakness of the traditional probabilistic model. We also provide a principled approach to user feedback, contrasting existing probabilistic and nonprobabilistic ad-hoc methods. Relevance feedback on an initial database of prior probabilities can incrementally improve retrieval results to unprecedented levels of precision/recall. The model supports interactive definition and training of new semantic labels in a collaborative environment. Semantic labels organize index term dependencies in a tree-like structure with probabilities at each node, and allow the user to define concepts that match specific information needs more closely than the raw feature information found in an indexed database. Finally, we propose a comprehensive approach to the difficult problem of combining probability distributions, or relevance judgements, from different search engines. PROPOSED COMMERCIAL APPLICATIONS: The proposed methodology adds a customizable interpretive layer to electronic collections and archival multimedia databases. The Phase I prototype has opened an immediate partnership opportunity in this area. The Bayesian search and retrieval functions will become an add-on module to many database management systems. The ability to train the system to recognize visual concepts gives us a definite advantage In image mining. The decision-maker software can merge the results of different experts or search engines, a problem for which presently there are only ad hoc unsatisfactory solutions, and great opportunities in the Web and e-commerce.