Convolutional neural networks (CNNs) have shown excellent performance in many challenging image recognition tasks. However, it is often unclear which cues a CNN has learned to exploit. One particular risk is that the CNN does not focus on the intended high-level features, such as object shape, but instead exploits less obvious statistical patterns. As a consequence, the CNN may behave unexpectedly and fail to generalize to other images. To avoid such failures, it is important to understand which cues a CNN is likely to rely on.
Since the computations inside a CNN are difficult to comprehend, we will treat the CNN as a black box and study how well it generalizes under controlled variations of the input images. The primary goal of this project is to develop an image rendering pipeline for generating a synthetic image classification dataset. The pipeline should be built on a rendering toolkit such as Blender, Unity, or Unreal Engine, and it should allow varying several properties of the objects to be classified (e.g., size, shape, color, texture) as well as the image acquisition settings (e.g., lens distortion, sensor noise, image compression). The student will then train a CNN on synthetic images in which the displayed objects can be classified by several of these properties. By systematically varying individual characteristics of the scene, we can better understand which properties the CNN has learned to rely on for classification.
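To make the idea of a parameterized pipeline concrete, the sketch below shows one possible way to sample scene and acquisition parameters and to simulate two of the acquisition effects (sensor noise and JPEG compression) on a rendered image. All parameter names, value ranges, and noise models here are illustrative assumptions, not part of the project specification; the rendered image itself would come from the chosen toolkit (e.g., Blender's Python API), and lens distortion could be added as a further warping step.

```python
import io
import random
from dataclasses import dataclass

import numpy as np
from PIL import Image


@dataclass
class SceneParams:
    """One sampled configuration of object and acquisition properties."""
    shape: str          # object shape, e.g. "cube" or "sphere" (illustrative)
    size: float         # object scale factor
    color: tuple        # RGB base color in [0, 1]
    noise_std: float    # additive Gaussian sensor noise (std, 8-bit scale)
    jpeg_quality: int   # JPEG compression quality (1-95)


def sample_params(rng: random.Random) -> SceneParams:
    """Randomly sample one scene configuration; the ranges are assumptions."""
    return SceneParams(
        shape=rng.choice(["cube", "sphere", "cylinder"]),
        size=rng.uniform(0.5, 2.0),
        color=tuple(rng.random() for _ in range(3)),
        noise_std=rng.uniform(0.0, 10.0),
        jpeg_quality=rng.randint(30, 95),
    )


def apply_acquisition(image: np.ndarray, p: SceneParams) -> np.ndarray:
    """Simulate sensor noise and JPEG compression on a rendered uint8 image."""
    # Additive Gaussian noise as a simple sensor-noise model.
    noisy = image.astype(np.float32) + np.random.normal(0.0, p.noise_std, image.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)
    # Round-trip through an in-memory JPEG to introduce compression artifacts.
    buf = io.BytesIO()
    Image.fromarray(noisy).save(buf, format="JPEG", quality=p.jpeg_quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))
```

Systematic variation then amounts to fixing all fields of `SceneParams` except one and sweeping that field over a grid, so that any change in the CNN's accuracy can be attributed to that single property.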