High Precision Retrieval from Molecular Data Resources

Period of Performance: 02/01/2005 - 01/31/2006


Phase 1 SBIR

Recipient Firm

Insightful Corporation
Seattle, WA 98109
Principal Investigator

Research Topics


DESCRIPTION (provided by applicant): The appearance of new scientific research methods has greatly increased the volume of molecular data in all the basic medical sciences. While many data resources are available online and are generally very useful to researchers, they are difficult for non-expert users. Because every database uses its own user interface and vocabulary, querying these databases and combining the results can be very time-consuming. In addition, current search engines return too many inaccurate results. There is thus an urgent need for better access to the molecular databases through a user-friendly interface and high-precision retrieval. The ultimate goal of this project is to develop technologies that allow users to access biological information stored in heterogeneous data resources by entering queries as explicit sentences or questions. A natural language search system provides a unified and transparent interface by translating questions into appropriate database retrievals. It also promises higher precision than conventional keyword-based search engines. The specific objective of this Phase I research is to develop algorithms for extracting higher-level semantic structures, composed of concepts and the relationships between concepts, from both questions and potential answers. A domain ontology, such as the UMLS or the GO ontology, is incorporated to provide a conceptual framework linking linguistic primitives/structures to domain-specific concepts/relations. Answers are retrieved through dynamic, query-driven entity-relational filtering, transformation and matching. In Phase I, the feasibility and efficacy of the proposed approach will be demonstrated and tested on question interpretation and answer retrieval from the MEDLINE bibliographic database.
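
The idea of matching concept-relation structures rather than keywords can be illustrated with a minimal sketch. It is not the applicant's algorithm: the synonym table, the relation table, the concept identifiers and every function name below are hypothetical stand-ins, and the naive verb-splitting step takes the place of real linguistic analysis and of a real ontology such as the UMLS or GO.

```python
from typing import List, NamedTuple, Optional

# Hypothetical toy synonym table standing in for a domain ontology;
# the concept identifiers are invented for illustration only.
ONTOLOGY = {
    "p53": "GENE:TP53",
    "tp53": "GENE:TP53",
    "mdm2": "GENE:MDM2",
    "apoptosis": "PROCESS:APOPTOSIS",
    "programmed cell death": "PROCESS:APOPTOSIS",
}

# Hypothetical mapping from surface verbs to normalized relation types.
RELATIONS = {
    "induces": "POSITIVELY_REGULATES",
    "activates": "POSITIVELY_REGULATES",
    "triggers": "POSITIVELY_REGULATES",
    "inhibits": "NEGATIVELY_REGULATES",
}


class Triple(NamedTuple):
    subject: Optional[str]  # None means the term is not a known concept
    relation: str
    obj: Optional[str]


def normalize(term: str) -> Optional[str]:
    """Map a surface term to a concept identifier, or None if unknown."""
    return ONTOLOGY.get(term.strip().lower())


def extract_triple(sentence: str) -> Optional[Triple]:
    """Naively split a sentence around the first known relation verb."""
    text = sentence.strip().rstrip("?.").lower()
    for verb, relation in RELATIONS.items():
        left, sep, right = text.partition(f" {verb} ")
        if sep:
            return Triple(normalize(left), relation, normalize(right))
    return None


def matches(query: Triple, candidate: Triple) -> bool:
    """Unknown slots in the query act as wildcards; candidates must be fully known."""
    if candidate.subject is None or candidate.obj is None:
        return False
    if query.relation != candidate.relation:
        return False
    pairs = ((query.subject, candidate.subject), (query.obj, candidate.obj))
    return all(q is None or q == c for q, c in pairs)


def answer(question: str, sentences: List[str]) -> List[str]:
    """Return the sentences whose concept-relation structure matches the question."""
    query = extract_triple(question)
    if query is None:
        return []
    hits = []
    for sentence in sentences:
        candidate = extract_triple(sentence)
        if candidate is not None and matches(query, candidate):
            hits.append(sentence)
    return hits


corpus = [
    "p53 triggers programmed cell death.",
    "MDM2 inhibits p53.",
    "Apoptosis activates p53.",
]
print(answer("What induces apoptosis?", corpus))
```

In this toy corpus, a plain keyword search for "apoptosis" would also return the third sentence, where apoptosis is the agent rather than the target; matching on the normalized subject, relation and object keeps only the first sentence, even though it shares no content word with the question.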