RI: Small: Neural Architecture Search with Deep Compositional Grammatical Structures
Project runs from 08/01/2019 to 07/31/2022
Explicit interpretability is largely missing from state-of-the-art computer vision and machine learning approaches, especially methods based on deep neural networks. The goal of this project is to investigate principled methodologies for learning interpretability-driven models that address accuracy and transparency jointly. We focus on two domains under a unified framework: visual recognition (such as image classification, object detection, and tracking) and agent autonomy in general game-playing environments (such as the Arcade Learning Environment (ALE) for Atari games and the Mario domain).

We propose to integrate top-down image grammar models with bottom-up deep neural networks in an end-to-end fashion. The proposed framework aims to rationalize prediction results (e.g., labels in visual recognition and actions in agent autonomy) by unfolding the latent semantic configurations of visual inputs/states (i.e., their sufficient statistics) in a weakly supervised or self-supervised way.

The project has three objectives. First, we will study a generic method for evaluating the post-hoc interpretability of any pre-trained model. Second, we will develop a novel interpretability-sensitive risk minimization method that learns interpretable models through end-to-end training. Finally, we will evaluate the learned interpretable models, both qualitatively and quantitatively, on publicly available large-scale visual recognition benchmarks (such as ImageNet and COCO), on a proposed urban panorama benchmark for visual historians, and on intelligent game-engine learning tasks.
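To make the first objective concrete, the following is a minimal sketch of one way a post-hoc interpretability probe could be applied to an arbitrary pre-trained model: a plain gradient-saliency map in PyTorch. The abstract does not name a specific technique, so the function name, the choice of ResNet-18, and the ImageNet class index below are illustrative assumptions.

```python
import torch
import torchvision.models as models

# Illustrative post-hoc interpretability probe: gradient-based saliency
# for an arbitrary pre-trained classifier. This is a sketch, not the
# project's actual evaluation method.

def saliency_map(model, image, target_class):
    """Return |d score / d input|, a crude map of which pixels drive the prediction."""
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients w.r.t. the input
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()                             # populate image.grad
    return image.grad.abs().max(dim=0).values    # collapse RGB channels into one heat map

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
x = torch.rand(3, 224, 224)                      # stand-in for a normalized input image
heat = saliency_map(model, x, target_class=281)  # 281 = "tabby cat" in ImageNet
print(heat.shape)                                # torch.Size([224, 224])
```

Because the probe only needs forward and backward passes, it applies to any differentiable pre-trained model without retraining, which matches the "generic" requirement stated in the first objective.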
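Similarly, the second objective's interpretability-sensitive risk minimization could, under one reading, take the form of a composite objective: the usual empirical task risk plus a penalty that rewards concentrated, legible evidence for each prediction. The entropy penalty on a spatial attention map below is a hypothetical stand-in, not the project's actual formulation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of "interpretability-sensitive risk minimization":
# empirical task risk plus a penalty encouraging sparse, part-like
# attention over latent configurations. The penalty form (entropy of an
# attention map) is an assumption made for illustration.

def interpretable_risk(logits, labels, attention, lam=0.1):
    task_loss = F.cross_entropy(logits, labels)          # standard empirical risk
    probs = attention.flatten(1).softmax(dim=-1)         # normalize attention per example
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    return task_loss + lam * entropy                     # low entropy => concentrated evidence

logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
attn = torch.randn(8, 7, 7, requires_grad=True)          # e.g., a 7x7 spatial attention map
loss = interpretable_risk(logits, labels, attn)
loss.backward()                                          # both terms are differentiable end-to-end
print(float(loss))
```

The design point this sketch captures is that interpretability enters the training objective itself rather than being assessed only after the fact, which is what distinguishes the second objective from the first.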