SHF: Medium: Collaborative Research: A Comprehensive Methodology to Pursue Reproducible Accuracy in Ensemble Scientific Simulations on Multi- and Many-Core Platforms
Project runs from 01/01/2017 to 05/31/2020
The overarching goal of this project is to address reproducibility problems caused by floating-point arithmetic in scientific simulations running on parallel platforms that couple multicore processors with many-core accelerators. Specifically, the project encompasses two major goals:
First, identify common sources of accuracy errors and study their accumulation, propagation, and runtime effects in a controlled environment. This phase includes three research activities: (i) modeling computations that may lead to accuracy errors as code motifs; (ii) providing multiple implementations of these motifs, which we call code inspectors, targeting different parallel platforms; and (iii) evaluating the accuracy and runtime of these implementations using a variety of datasets and stress conditions.
Second, install these code inspectors in real scientific code bases and thereby study their behavior in uncertain environments. This phase includes two research activities: (i) prioritizing code segments based on quantitative impact scores and matching segments to inspector motifs; and (ii) finding the optimal code inspector implementations and patching the code with them so as to minimize the overall variance of the results.
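As an illustration of the kind of motif the first phase targets, the following minimal sketch treats floating-point summation as a motif with several interchangeable implementations and compares their accuracy and order sensitivity. Summation is chosen here only as a familiar example; the function names (naive_sum, kahan_sum, chunked_sum, evaluate) are hypothetical and are not the project's actual code inspectors. The chunked variant emulates a parallel reduction on a single core under the assumption that each worker sums one contiguous chunk.

```python
import math
import random


def naive_sum(values):
    """Left-to-right accumulation; the result depends on operand order."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carries a running error term."""
    total = 0.0
    comp = 0.0  # estimate of the low-order bits lost so far
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def chunked_sum(values, n_chunks):
    """Emulate a parallel reduction: sum each chunk, then combine the partial sums."""
    size = max(1, math.ceil(len(values) / n_chunks))
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


def evaluate(values, n_orderings=20, seed=0):
    """For each implementation, report (spread across orderings, worst error).

    The error is measured against math.fsum, which returns a correctly
    rounded sum and is used here as the reference value.
    """
    rng = random.Random(seed)
    reference = math.fsum(values)
    implementations = {
        "naive": naive_sum,
        "kahan": kahan_sum,
        "chunked": lambda xs: chunked_sum(xs, 4),  # 4 chunks is an arbitrary choice
    }
    report = {}
    for name, fn in implementations.items():
        data = list(values)
        results = []
        for _ in range(n_orderings):
            rng.shuffle(data)  # a different operand order, as a scheduler might produce
            results.append(fn(data))
        spread = max(results) - min(results)
        worst_error = max(abs(r - reference) for r in results)
        report[name] = (spread, worst_error)
    return report


if __name__ == "__main__":
    # One large value followed by many values too small to register individually.
    data = [1.0] + [1e-16] * 1000
    for name, (spread, err) in evaluate(data).items():
        print(f"{name:8s} spread={spread:.3e} worst_error={err:.3e}")
```

On this dataset the naive left-to-right sum in the original order absorbs none of the small terms, while the chunked reduction and the compensated sum retain them, so the same mathematical sum yields different values depending on the implementation and the decomposition. A nonzero spread for an implementation indicates that its result is not reproducible under reordering.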