SHF: Small: Collaborative Research: The Automata Programming Paradigm for Genomic Analysis
Project runs from 03/01/2017 to 07/31/2019
Thanks to recent advances in DNA sequencing technology, a number of genomic analysis tasks, such as reference-based and de novo sequence assembly, taxa identification in metagenomic sequences, orthology inference and regulatory motif search, can now operate on increasingly large volumes of data. At their core, all these applications perform some form of pattern matching, a computation that maps naturally onto finite automata abstractions. Prior work has shown that large-scale automata processing can be efficiently accelerated on streaming architectures such as Field Programmable Gate Arrays (FPGAs). However, the low-level programming interface of these devices has hampered their widespread adoption within the bioinformatics community. As an alternative, Micron Technology has recently announced its SDRAM-based Automata Processor (AP), which will come with an automata-based programming interface. Yet the position this emerging technology will take among existing streaming accelerators is unclear: in particular, both its capability to handle big data and diverse computations and its programmability remain to be understood. In this research we aim to study novel programmatic descriptions of several genomic analysis tasks, obtained by re-describing each operation in an automata-based programming model, and to map this programming model onto FPGA platforms and onto Micron’s AP. Our goal is two-fold: on one hand, we aim to facilitate the adoption of these accelerators within the scientific community; on the other, we seek to investigate the benefits and limitations of these technologies when targeting a variety of pattern matching operations at large scale.
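To make the link between pattern matching and finite automata concrete, the following is a minimal software sketch, not the project's implementation, of regulatory motif search expressed as a streaming nondeterministic automaton. It assumes a motif written with IUPAC degenerate nucleotide codes (the motif `TATAWAW` is a TATA-box-like consensus chosen only for illustration) and uses one state per motif position. The input is consumed one base at a time, and every enabled state tests the current symbol, loosely mimicking how automata accelerators evaluate all active states in parallel on each input symbol. The function names `build_states` and `stream_match` are hypothetical.

```python
# IUPAC nucleotide codes -> the set of bases each code accepts
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def build_states(motif):
    """Hypothetical helper: one automaton state per motif position,
    each labeled with the set of bases it accepts."""
    return [frozenset(IUPAC[c]) for c in motif.upper()]


def stream_match(motif, sequence):
    """Yield the 0-based end position of every (possibly overlapping)
    occurrence of `motif` in `sequence`, consuming one base per step."""
    states = build_states(motif)
    last = len(states) - 1
    active = set()  # states that matched the previous input symbol
    for pos, base in enumerate(sequence.upper()):
        # The start state is always enabled so a match can begin anywhere;
        # every other state is enabled if its predecessor was just active.
        enabled = {0} | {i + 1 for i in active if i < last}
        # All enabled states test the current base "in parallel".
        active = {i for i in enabled if base in states[i]}
        # Reaching the final state reports a match ending here.
        if last in active:
            yield pos
```

For example, `list(stream_match("TATAWAW", "GGTATAAATCC"))` reports a single occurrence ending at position 8, and overlapping matches are reported naturally because the start state stays enabled at every step. Because the automaton never backtracks and does constant work per input symbol per state, the same structure is what makes such searches a good fit for streaming hardware.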