Empowering real-time, web-based genomic big data analysis at commercial scale.

Period of Performance: 09/22/2016 - 08/31/2017


Phase 1 STTR

Recipient Firm

Frameshift Labs, LLC
Boston, MA 02116
Principal Investigator


DESCRIPTION (provided by applicant): The academic iobio project delivers web-based applications that visualize, and provide real-time interactive analysis of, multiple data types generated by next-generation sequencing projects. These applications are open-source and free to use. This proposal aims to develop the critical computational infrastructure that will enable the iobio project to offer commercial applications with more power and functionality than the academic project can provide. In particular, multiple components must be built to manage large computational resources in an on-demand, scalable and robust manner, and to provide mechanisms for remote file storage. Additionally, functionality that seamlessly combines the real-time analysis at the core of the academic project with the large-scale analysis undertaken by commercial applications will empower all users of sequencing data, from clinicians and genetic counsellors to large sequencing centres and institutions. The long-term objective of the proposal is to position Frameshift Labs to rapidly and easily build new, commercially viable web-based applications that tackle current bioinformatics analysis problems.
This proposal also includes the development of a commercial application, multibam.iobio, that will give producers and consumers of large-scale sequencing studies the means to evaluate the quality of their massive data sets. Whether performing population-level genome-wide association studies (GWAS) or more focused Mendelian studies on small family pedigrees, researchers must understand the quality of their data before expending large amounts of time and resources on analysis. The multibam.iobio application will visualize high-level statistics, allowing outlier samples and data trends to be identified rapidly, and more focused real-time analysis will be available for every sample.
The effectiveness of sequencing projects, from focused somatic variant identification in tumor/normal pairs to population-scale GWAS, demands consistently high data quality, whether in the underlying sequence alignments or in the genetic variants. The multibam.iobio application will ensure that all interested parties, regardless of computational experience or resource limitations, can interrogate and fully understand their data.