CMX Lunch Seminar
With the increasing incentive to employ deep learning techniques in real-life scenarios, one naturally questions their reliability. For image classification, a well-known phenomenon called adversarial examples shows how small input perturbations, imperceptible to humans, can completely change the output of a neural network. This insight gave rise to the field of adversarial robustness, which we explore in this talk. We discuss how regularizing the standard training objective with Lipschitz and total variation (TV) penalty terms can lead to resilient neural networks.
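As a rough sketch (the exact form of the penalties and their weights is an assumption of this summary, not a statement of the speaker's formulation), such a regularized training problem can be written as
\[
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell\big(f_{\theta}(x), y\big)\big]
\;+\; \lambda_{\mathrm{Lip}}\,\mathrm{Lip}(f_{\theta})
\;+\; \lambda_{\mathrm{TV}}\,\mathrm{TV}(f_{\theta}),
\]
where \(\ell\) is the classification loss, \(\mathrm{Lip}(f_{\theta})\) is a Lipschitz constant (or an estimate thereof), \(\mathrm{TV}(f_{\theta})\) is a total variation seminorm, and the weights \(\lambda_{\mathrm{Lip}}, \lambda_{\mathrm{TV}} \ge 0\) control the trade-off between accuracy and robustness.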
Furthermore, we explore the adversarial attack problem. We derive an associated gradient flow for the so-called fast gradient sign method, which is commonly used to find malicious input perturbations. Here, we work in an abstract metric setting and then highlight the distributional Wasserstein case, which relates back to the robustness problem. Finally, we consider the attack problem in a realistic closed-box scenario, where we employ gradient-free optimizers.
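For reference, the fast gradient sign method perturbs an input \(x\) with label \(y\) in the direction of the sign of the loss gradient (the notation below is ours; \(\varepsilon\) denotes the perturbation budget):
\[
x_{\mathrm{adv}} = x + \varepsilon\,\mathrm{sign}\big(\nabla_{x}\,\ell\big(f_{\theta}(x), y\big)\big).
\]
The gradient flow studied in the talk can be viewed, loosely, as a continuous-time counterpart of such update steps.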