SHF: Small: Collaborative Research: Accelerated Data Transformation: A Software-Hardware Stack for Transducers
Project runs from 10/01/2019 to 09/30/2022
Big Data’s growing importance is evident from its broad application to business, public policy, medicine, and research. Many Big Data applications frequently transform unstructured data, and such data transformations can be mapped onto finite state transducers, a computational model with a solid theoretical foundation. The goal of this work is to design and develop a software stack for transducer processing that supports diverse platforms such as CPUs, GPUs, and efficient data-intensive accelerators (such as our Unstructured Data Processor). Our research efforts are summarized as follows. First, we will create a high-level interface consisting of sets of production rules that can be mapped onto transducers and that support a variety of data transformation operations. This interface will enable the specification of flexible, extensible, and composable transducer programs. Second, we will build a sophisticated compiler that maps the high-level programming interface onto the finite state transducer model and includes optimization techniques that exploit the properties of this model. The compiler will produce an intermediate representation that can be mapped onto diverse hardware. Third, we will address the software challenges of mapping the optimized finite state transducers onto a data-intensive accelerator: managing data parallelism and limited memory, and generating compact, efficient code that leverages the accelerator's hardware features, particularly its specialized operations. Finally, we will investigate the limitations of the finite state transducer model and extend it to support the more general data transformations performed in modern data analytics systems. One example is block-based data compression and decompression, which we have demonstrated can be efficiently supported by data-intensive accelerators but cannot be expressed in the transducer model. We call this extended model unbounded transducers.
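To make the idea of mapping a data transformation onto a finite state transducer concrete, the following is a minimal Python sketch of a deterministic transducer that converts a comma-separated line into tab-separated form while preserving commas inside double-quoted fields. The state names, the transition-table layout, and the `run_transducer` function are hypothetical illustrations chosen for this example; they are not the project's production-rule interface, compiler output, or Unstructured Data Processor code. Escaped quotes (`""`) are deliberately not handled.

```python
from typing import Dict, Tuple

# Hypothetical state names for this illustration only.
OUTSIDE = "outside"
IN_QUOTES = "in_quotes"

# Transition table: (current state, input character) -> (next state, output string).
# Any (state, character) pair not listed falls back to the default rule in
# run_transducer: copy the character through and stay in the same state.
Transitions = Dict[Tuple[str, str], Tuple[str, str]]

CSV_TO_TSV: Transitions = {
    (OUTSIDE, ","): (OUTSIDE, "\t"),   # unquoted comma becomes a tab
    (OUTSIDE, '"'): (IN_QUOTES, ""),   # opening quote is consumed, nothing emitted
    (IN_QUOTES, '"'): (OUTSIDE, ""),   # closing quote is consumed, nothing emitted
}


def run_transducer(table: Transitions, start: str, text: str) -> str:
    """Run a deterministic finite state transducer over text, one character at a time."""
    state = start
    out = []
    for ch in text:
        key = (state, ch)
        if key in table:
            state, emitted = table[key]
        else:
            emitted = ch  # default rule: copy input to output, state unchanged
        out.append(emitted)
    return "".join(out)


print(repr(run_transducer(CSV_TO_TSV, OUTSIDE, 'a,"b,c",d')))
```

Each input character triggers one table lookup, and the only information carried from one character to the next is the current state, so memory use stays constant regardless of input length. This bounded-state behavior is what makes the model well suited to streaming execution on hardware.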