Fast Biosequence Annotation via Reconfigurable Hardware

Period of Performance: 09/18/2007 - 08/31/2008


Phase 2 STTR

Recipient Firm

Becs Technology, Inc.
Saint Louis, MO 63132
Principal Investigator


DESCRIPTION (provided by applicant): Databases of biological sequences have proven valuable for understanding the organization of human and other genomes, for unraveling the etiology of genetic disorders, and for studying medically important pathogens. Unfortunately, these databases are growing at an exponential rate, posing a severe problem for bio-sequence annotation. Specialized hardware implementations of sequence comparison can dramatically accelerate BLAST and other algorithms used to annotate sequences. However, the utility of these hardware accelerators is limited by inflexible functionality, lack of a clear upgrade path to faster components, and limitations on the rate at which bio-sequence data can be streamed to the comparison logic. This proposal seeks to construct a novel hardware-based bio-sequence accelerator, the "smart disk" engine, which addresses the limitations of existing accelerators. The proposed engine combines the flexibility of reconfigurable FPGA logic, which can be reprogrammed at will and easily upgraded while running at hardware speeds, with an innovative architecture that ties the comparison logic closely to an array of hard disks, guaranteeing massive data bandwidth into the comparison hardware. The smart disk engine is designed to accelerate all stages of BLAST-like similarity search algorithms, not just the final Smith-Waterman stage. Phase I of this fast-track STTR application proposes to build the initial prototype of the smart disk engine, to implement comparison logic mirroring the stages of the widely used BLAST pipeline, to construct a transparent software front end to the engine, and finally to evaluate the performance of the prototype on large-scale tasks in bio-sequence annotation.
Key innovations in this phase will be the integration of FPGA logic with the mass storage system, modeling the performance of the new architecture, and development of software control that can rapidly reprogram the FPGAs to construct comparison pipelines optimized for different types of comparison (BLASTN, BLASTP, etc.). At the end of this phase, the combined hardware and software of the smart disk prototype should implement at least BLASTN- and BLASTP-like computations, run at least 30x faster than 2003-era commodity general-purpose processors, and successfully hide the complexity of the engine's hardware from the biological end user.
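
For readers unfamiliar with the staged structure of BLAST-like search mentioned above, the following is a minimal software sketch of a three-stage seed-and-extend pipeline for nucleotide sequences: exact word matching, ungapped extension, and a final Smith-Waterman local alignment. It is an illustration under stated assumptions, not the proposed FPGA design. All function names, the word size, the scoring values, and the thresholds are hypothetical choices made for this example, and the last stage is run over the whole sequence pair for simplicity, whereas practical implementations typically restrict it to a region around each surviving hit.

```python
from collections import defaultdict


def build_word_index(subject, w):
    """Map every length-w word of the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def find_seeds(query, index, w):
    """Stage 1: exact word matches, as (query_pos, subject_pos) pairs."""
    seeds = []
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], ()):
            seeds.append((q, s))
    return seeds


def ungapped_score(query, subject, q, s, w, match=1, mismatch=-3, xdrop=5):
    """Stage 2: extend a seed in both directions without gaps.

    Each direction stops once the running score falls more than
    xdrop below the best score seen so far in that direction.
    """
    def extend(qi, si, step):
        best = running = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            running += match if query[qi] == subject[si] else mismatch
            if running > best:
                best = running
            elif best - running > xdrop:
                break
            qi += step
            si += step
        return best

    right = extend(q + w, s + w, 1)
    left = extend(q - 1, s - 1, -1)
    return w * match + left + right


def smith_waterman(a, b, match=1, mismatch=-3, gap=-2):
    """Stage 3: best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, subject, w=4, threshold=6):
    """Run the stages in order; each stage filters work for the next."""
    index = build_word_index(subject, w)
    seeds = find_seeds(query, index, w)
    hits = [(q, s) for q, s in seeds
            if ungapped_score(query, subject, q, s, w) >= threshold]
    # Simplification: align the full pair only if some seed survived stage 2.
    return smith_waterman(query, subject) if hits else 0
```

The point of the staging is that the cheap early stages discard most of the database before the expensive alignment stage runs, which is why accelerating only the final stage leaves much of the total work untouched.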