Accurate accessible cloud software for protein folding for molecular biologists

Period of Performance: 01/01/2016 - 12/31/2016


Phase 2 SBIR

Recipient Firm

Dnastar, Inc.
Madison, WI 53705
Principal Investigator


DESCRIPTION (provided by applicant): Most drugs interact with protein molecules to elicit a cellular response. Traditional drug discovery is a laborious and expensive experimental process, so computational approaches that assess protein function and accelerate discovery are in high demand. Virtual drug screening and structure-based drug design are two such approaches, and both can play an important role in modern drug discovery and development. Both rely on high-resolution tertiary (3D) protein structures, and both are hampered by the slow and often unsuccessful methods of experimental structure determination. Protein structure prediction is poised to impact human health by accelerating the construction of high-confidence structural models of drug targets and biopharmaceuticals, which will help identify new therapeutic strategies. However, current methods are very limited in their ability to predict high-resolution models, and this limitation prevents broad classes of therapeutics from being discovered. Technologies are also needed to predict, as early as possible, whether a candidate drug will fail during development. With improved accuracy, protein structure prediction can lower drug development costs and focus experiments on the most promising drug candidates.
DNASTAR recently released NovaFold, a commercial version of the world-leading I-TASSER protein folding algorithm (Yang Zhang, U. Michigan) that runs on a cloud computing platform. Since 2006, I-TASSER has won the biennial Critical Assessment of Protein Structure Prediction (CASP) competition, a blind study in which teams worldwide test their tools against unpublished protein structures.
The current product is proving useful to the molecular biology community. However, it cannot take advantage of the cloud's extensive parallelization opportunities, and it is not adapted to benefit from protein motion calculations. Either capability could dramatically improve the accuracy of the program's predictions. We propose to create a massively parallel software pipeline that maximizes the proportion of predictions yielding high-resolution protein structures suitable for drug screening and drug design projects. In Phase I, we will evaluate how best to use faster, deeper, and more diverse computing techniques to predict more accurate structures. This includes evaluating parallelization techniques that perform at least 100 times more calculations than the program performs today, and confirming that prediction accuracy can be increased by using modified structure template scaffolds. In Phase II, we will use protein motion to improve the accelerated sampling technique. We will then combine that approach with recent advances in Monte Carlo simulation and with massive parallelization in a distributed computing environment to further enhance accuracy. Ultimately, rather than running just 14 simulations per protein as the original algorithm does, we will support thousands of interconnected simulations. At the conclusion of this work, we will deliver a cloud-based software product accurate enough to be relied upon in pharmaceutical biosimulation projects.