Bayesian Methods and Experimental Design for Molecular Biology Experiments

Period of Performance: 08/01/2007 - 07/31/2009


Phase 1 SBIR

Recipient Firm

Insightful Corporation
Seattle, WA 98109
Principal Investigator


DESCRIPTION (provided by applicant): The goal of this proposal is to provide a suite of software tools for bioinformatics and systems biology researchers who are using molecular biology (Omics) data to identify the best experimental design and to analyze the resulting experimental data using Bayesian tools. A common problem for most bioinformatics experiments is low power due to low replication. This problem can be alleviated economically when an increase in adoption and use of a specific platform leads to a decrease in associated costs, thereby enabling an increase in samples allocated per treatment. Yet, many bioinformatics experiments remain underpowered as researchers use the offsets of decreased costs to explore more complex questions. When designing an experiment, the allocation of samples to treatment regimens, and the choice of treatments to test, are traditionally the only variables to manipulate. Bayesian experimental design provides a framework to find the optimal design out of n possible designs subject to a utility function that can include such items as time and material costs. Bayesian statistical methods have been gaining substantial favor in bioinformatics and systems biology as they provide a highly flexible framework for fitting and exploring complex models. Bayesian models also provides to domain experts such as biologists and physicians easily interpretable models through posterior probabilities which are more naturally understood than the traditional p-value. While a number of open source tools based on Bayesian models are available, most are applied best in the context of a specific research data analysis problem or model and are not integrated into a single, complete system for data analysis. We propose to research and develop a statistical analysis software package S+OBAYES (for S-PLUS and R) with generalized tools for Bayesian design of experiments, empirical and fully Bayesian analysis, and modeling and simulation using modern commercial software development practices. These tools will provide functionality for finding the optimal choice and layout of experimental treatments for molecular biology experiments and for fitting Bayesian linear and non-linear models to a variety of data types including time series. We propose to validate the software in molecular biology research problems such as the detection of differential gene, protein, and metabolite abundance. The benefits of this work will be a commercial-quality software package with validated statistical methodology and interactive visualization tools that will appeal to molecular biologists and systems biology investigators. The results of the proposed work will expedite discoveries in basic science, early disease detection, and drug discovery and development.