The NamesforLife Semantic Index of Phenotypic and Genotypic Data for Systems Biology

Period of Performance: 01/01/2014 - 12/31/2014


Phase 2 STTR

Recipient Firm

NamesforLife, LLC
333 Albert Ave., Suite 202
East Lansing, MI 48823
Principal Investigator, Firm POC

Research Institution

Michigan State University
Office of Sponsored Programs
East Lansing, MI 48824


The DOE Systems Biology Knowledgebase (KBase) was envisioned to provide a framework for modeling dynamic cellular processes of microorganisms, plants and metacommunities. The KBase will provide the tools and data to enable rapid iteration of experiments that draw on a wide variety of data, allowing end-users to infer how cells and communities respond to natural or induced perturbations and, ultimately, to predict outcomes. The Systems Biology Knowledgebase Implementation Plan defines the needs and priorities for this initiative, which include biofuel production, bioremediation and carbon sequestration. Ultimately, the KBase will become a platform for accelerated acquisition of basic and applied biological knowledge. Predictive models depend on high-quality input data. The authors of the Implementation Plan recognize that many different types of data are required to build such models. However, not all data are of similar quality, nor are all data amenable to computational analysis without extensive cleaning, interpretation and normalization. Key among the data needed to make the KBase fully operational are phenotypic data, which are more complex than sequence data, occur in a wide variety of forms, often use complex and non-uniform descriptors and are scattered, principally across the scientific and technical literature and specialized databases. Incorporating these data into the KBase requires expertise in harvesting, modeling and interpreting the data. The Semantic Index of Phenotypic and Genotypic Data for Systems Biology seeks to address this problem by developing a descriptive ontology of phenotypes for Bacteria and Archaea. The ontology is based on concepts and observational data drawn from the taxonomic literature. In the Phase I project, we developed software that extracted a list of over 40,000 candidate terms used to describe 5,750 species of Bacteria and Archaea.
In Phase II, the extracted terms were used to create a phenotypic ontology, a repository of phenotypic data and normalized phenotypic descriptions for each of the species. During the first year of Phase II, we developed new design patterns for ontology development that have the potential to unlock knowledge that is not readily apparent in the literature. A patent application covering this technology was filed. In the proposed Phase IIb project, we will apply these novel modeling techniques to encode axioms that automatically resolve ambiguity arising from the semantic imprecision of the published phenotypic literature. These axioms will support inference by machine reasoners that can correctly interpret phenotypic language and data. The resulting service will provide researchers with consistent interpretations that are usable for predictive modeling and in other research and commercial applications. The Company continues to develop proprietary terminology discovery and extraction tools as new product offerings for use in document and terminology management systems. These products augment the subscription data and annotation services developed for the publishing industry.