Building an open-source cloud-based computational platform to improve data access

Period of Performance: 05/01/2014 - 04/30/2015


Phase 1 SBIR

Recipient Firm

Biodatomics, LLC
Bethesda, MD 20814
Principal Investigator


We propose to develop a novel, cost-effective, cloud-based data and analytics platform that will provide efficient data storage and enhanced analytics, annotation and reporting capabilities to support and accelerate clinical and molecular research on the treatment of substance use disorders (SUD). This open-source platform, which leverages existing BioDX technology, will provide a centralized, multi-user environment that enables and encourages collaborative research and information dissemination among team members. One of the unmet infrastructural challenges of modern molecular research is the lack of computational platforms that allow management of large databases, easy access to data, powerful and customizable tools for data mining, analysis and visualization, and integration of different data sources for the analysis of complex data problems. Such problems are commonplace in high-throughput molecular research. This proposal aims to fill this gap by developing a robust platform that integrates state-of-the-art open-source technologies for data storage, data access, data mining and analysis, annotation, visualization and reporting. We previously developed BioDX, a cloud-based BioDatomics platform for Next Generation Sequencing (NGS) that has been used commercially by several clients. Building on our experience with BioDX, we propose a new platform that integrates: data storage and real-time data querying using Cloudera Impala; powerful and customizable analytics tools using R and the R-based Bioconductor suite of bioinformatics packages; annotation integration and reporting, an existing feature of BioDX; and a visual programming interface that will simplify the development and maintenance of reproducible analytics workflows.
We believe this integrated data platform, if successful, will enable real-time collaboration, dramatically reduce data repository costs, and increase the efficiency and efficacy of data analyses that translate experimental data into actionable research products. We are committed to analyzing stakeholder needs and optimizing hardware, software and information technology systems to meet them. The platform will enhance stakeholders' capabilities for developing, implementing and testing models of substance addiction and risky behavior, discovering molecular targets for treatment, genomic profiling of patients, and addressing other relevant scientific questions. Users will have access to modern statistical, machine learning, data mining and visualization tools. The initial phase of work will involve developing the platform, optimizing its performance on the cloud and testing the integration of new technology. BioDatomics is committed to funding the next phase of work, which will include usability testing and finalization of a commercial product, after which full commercialization will proceed. Preliminary commercialization plans indicate that the project could generate one million dollars in revenue during the first full year after commercial release. The ultimate beneficiaries of this platform will be government agencies, academic researchers and pharmaceutical companies pursuing collaborative projects to discover treatments for substance use disorders. This open-source platform will give end users significant savings in data storage and analysis costs, and promises to have a major impact on the success of molecular, clinical and translational research on substance use disorders.