PhD Dissertation Defense
Backdoor Attacks in Computer Vision: Towards Adversarially Robust Machine Learning Models
PhD Candidate: Aniruddha Saha (Computer Science)
12-2 PM ET, Friday, 18 November 2022, via WebEx
Committee: Drs. Hamed Pirsiavash (Advisor/Co-Chair), Anupam Joshi (Chair), Tim Oates, Tom Goldstein (UMCP), and Pin-Yu Chen (IBM)
Deep Neural Networks (DNNs) have become the standard building block in numerous machine learning applications, including computer vision, speech recognition, machine translation, and robotic manipulation, achieving state-of-the-art performance on complex tasks. The widespread success of these networks has driven their deployment in sensitive domains like health care, finance, autonomous driving, and defense-related applications.
However, DNNs are vulnerable to adversarial attacks. An adversary is a person with malicious intent whose goal is to disrupt the normal functioning of a machine learning pipeline. Research has shown that an adversary can tamper with the training of a model by injecting misrepresentative data (poisons) into the training set. Moreover, if given control over the training process as a third party, an adversary can deliver to the victim a model that deviates from normal behavior. These are called backdoor attacks. The manipulation is crafted so that the victim's model malfunctions only when a trigger is pasted on a test input. For instance, a backdoored model in a self-driving car might work accurately for days before it suddenly fails to detect a pedestrian when the adversary decides to exploit the backdoor. This vulnerability is dangerous when deep learning models are deployed in safety-critical applications.
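To make the threat model concrete, the following Python/NumPy sketch illustrates classic patch-trigger data poisoning in the BadNets style (an illustration only, not a method from the dissertation): a small trigger patch is pasted onto a fraction of the training images and their labels are flipped to an attacker-chosen target class, so the trained model learns to associate the patch with that class. The array shapes, poison rate, and trigger placement are illustrative assumptions.

    import numpy as np

    def poison_dataset(images, labels, trigger, target_class, poison_rate=0.05, seed=0):
        """images: (N, H, W, C) uint8 array; trigger: (h, w, C) uint8 patch."""
        rng = np.random.default_rng(seed)
        images, labels = images.copy(), labels.copy()
        n_poison = int(poison_rate * len(images))
        idx = rng.choice(len(images), size=n_poison, replace=False)
        h, w = trigger.shape[:2]
        for i in idx:
            # Paste the trigger in the bottom-right corner of the image.
            images[i, -h:, -w:, :] = trigger
            # Flip the label so the model links the trigger to the target class.
            labels[i] = target_class
        return images, labels

At test time, the model behaves normally on clean inputs but predicts the target class whenever the same patch is pasted on an input.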
This dissertation studies ways in which state-of-the-art deep learning methods for computer vision are vulnerable to backdoor attacks and proposes defense methods to remedy the vulnerabilities. We push the limits of our current understanding of backdoors and address the following research questions.
Can we design practical backdoor attacks? We propose the Hidden Trigger Backdoor Attack, a novel clean-label backdoor attack in which the poisoned images do not contain a visible trigger. This enables the attacker to keep the trigger hidden until it is used at test time. We believe such practical attacks reveal an important vulnerability of deep learning models, and these flaws need to be studied extensively before models are deployed in critical real-world applications.
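As an illustration of the clean-label idea, the following PyTorch sketch shows a feature-collision style optimization under simplifying assumptions (a sketch, not the dissertation's exact algorithm): the poison starts from a target-class image and stays within a small L-infinity ball around it, so it looks clean and keeps its correct label, yet is pushed close in feature space to a source-class image carrying the trigger. The feature_extractor, epsilon budget, learning rate, and step count are illustrative placeholders.

    import torch

    def craft_hidden_trigger_poison(feature_extractor, target_img, patched_src_img,
                                    eps=16 / 255, lr=0.01, steps=500):
        """target_img, patched_src_img: (C, H, W) tensors in [0, 1]."""
        feature_extractor.eval()
        for p in feature_extractor.parameters():
            p.requires_grad_(False)  # the attacker only optimizes the poison image
        with torch.no_grad():
            src_feat = feature_extractor(patched_src_img.unsqueeze(0))
        poison = target_img.clone().requires_grad_(True)
        opt = torch.optim.Adam([poison], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            # Pull the poison's features toward the patched source image's features.
            loss = torch.norm(feature_extractor(poison.unsqueeze(0)) - src_feat) ** 2
            loss.backward()
            opt.step()
            with torch.no_grad():
                # Project back into an L-infinity ball around the clean target image
                # and into the valid pixel range, so the poison still looks clean.
                poison.copy_(torch.clamp(torch.min(torch.max(poison, target_img - eps),
                                                   target_img + eps), 0.0, 1.0))
        return poison.detach()

Because the resulting image looks like an unmodified target-class image, the visible trigger never appears in the training set.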
Is it secure to train models on large-scale public data? Self-supervised learning (SSL) methods for vision leverage large-scale unlabeled data to learn rich visual representations. These methods use public images downloaded from the web, e.g., the Instagram-1B and Flickr image datasets. We show that if a small part of the unlabeled training data is poisoned, SSL methods are vulnerable to backdoor attacks. Backdoor attacks are more practical in self-supervised learning because the sheer size of the unlabeled data makes inspecting it to remove poisons prohibitively expensive. Hence, using large and diverse data to remove data biases and reduce labeling costs might unknowingly open avenues for adversarial manipulation. Practitioners must make informed choices when selecting training data for machine learning models.
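The key difference from the supervised case is that the attacker has no labels to flip. As a rough illustration (assumptions only, not the dissertation's method), the attacker can simply paste the trigger onto unlabeled web images that they know belong to one chosen category, e.g., images published under a tag or account they control; the unlabeled_images array and attacker_indices below are hypothetical placeholders.

    import numpy as np

    def poison_unlabeled_data(unlabeled_images, attacker_indices, trigger, seed=0):
        """unlabeled_images: (N, H, W, C) uint8; attacker_indices: indices of the
        attacker's own images of the chosen category; trigger: (h, w, C) patch."""
        rng = np.random.default_rng(seed)
        poisoned = unlabeled_images.copy()
        h, w = trigger.shape[:2]
        for i in attacker_indices:
            # Paste the trigger at a random location; no label is changed, because
            # SSL pretraining never uses labels in the first place.
            y = rng.integers(0, poisoned.shape[1] - h + 1)
            x = rng.integers(0, poisoned.shape[2] - w + 1)
            poisoned[i, y:y + h, x:x + w, :] = trigger
        return poisoned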
Can we design efficient and generalizable backdoor detection methods? Existing methods for detecting backdoors in trained models rely on analyzing model behavior on carefully chosen inputs, and this analysis often has to be repeated for each new model. We propose a backdoor detection method that optimizes a set of images which, when forwarded through a model, reliably indicate whether that model contains a backdoor. Our "litmus" test for backdoored models improves on state-of-the-art methods without requiring access to clean data during detection. It is computationally efficient and generalizes to new triggers as well as new architectures. We effectively detect backdoor attacks on thousands of networks with different architectures trained on four benchmark datasets, namely the German Traffic Sign Recognition Benchmark (GTSRB), MNIST, CIFAR10, and Tiny-ImageNet. This will serve as a benchmark for future research.
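The following PyTorch sketch conveys the spirit of such a "litmus" test (a hedged illustration, not necessarily the dissertation's exact formulation): a small set of input images and a lightweight meta-classifier are jointly optimized over a pool of models labeled clean (0) or backdoored (1), so that the pooled outputs of the litmus images reveal whether a model is backdoored. The shapes, counts, and linear meta-classifier are illustrative assumptions.

    import torch
    import torch.nn as nn

    def train_litmus_patterns(model_pool, is_backdoored, n_patterns=10,
                              img_shape=(3, 32, 32), n_classes=10,
                              epochs=100, lr=1e-3):
        """model_pool: list of trained classifiers; is_backdoored: list of 0/1 labels."""
        patterns = torch.rand(n_patterns, *img_shape, requires_grad=True)
        meta = nn.Linear(n_patterns * n_classes, 2)  # pooled logits -> clean/backdoored
        opt = torch.optim.Adam([patterns, *meta.parameters()], lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        labels = torch.tensor(is_backdoored, dtype=torch.long)
        for _ in range(epochs):
            opt.zero_grad()
            logits = []
            for model in model_pool:
                model.eval()
                # Forward the litmus images through each model and flatten its outputs.
                out = model(torch.clamp(patterns, 0.0, 1.0))
                logits.append(out.reshape(-1))
            meta_in = torch.stack(logits)  # (n_models, n_patterns * n_classes)
            loss = loss_fn(meta(meta_in), labels)
            loss.backward()
            opt.step()
        return patterns.detach(), meta

At detection time, the litmus images are forwarded through a suspect model once, and the meta-classifier's prediction flags the model as clean or backdoored without requiring any clean data.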
Do architectural design choices contribute to backdoor robustness? We find an intriguing difference between Vision Transformers and Convolutional Neural Networks (CNNs): interpretation algorithms effectively highlight the backdoor trigger on test images for transformers, but not for CNNs, when the models are attacked with Hidden Trigger Backdoor Attacks. Based on this observation, we find that a test-time image-blocking defense reduces the attack success rate by a large margin for transformers. We show that such blocking mechanisms can be incorporated into the training process to improve robustness even further. We believe our findings will encourage the community to make better design choices when developing novel architectures robust to backdoor attacks.
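The sketch below illustrates one way a test-time blocking defense of this kind could look (a minimal PyTorch illustration under assumptions, not the dissertation's exact defense): an interpretation map locates the most salient image region, which for transformers tends to cover the trigger, and that region is blacked out before classification. Plain input-gradient saliency stands in for whichever interpretation algorithm is used, and the block size is an illustrative assumption.

    import torch
    import torch.nn.functional as F

    def blocked_prediction(model, image, block=32):
        """image: (C, H, W) tensor in [0, 1]."""
        model.eval()
        x = image.unsqueeze(0).clone().requires_grad_(True)
        logits = model(x)
        # Saliency of the top predicted class with respect to the input pixels.
        logits[0, logits[0].argmax()].backward()
        sal = x.grad.abs().sum(dim=1, keepdim=True)  # (1, 1, H, W)
        # Find the block-sized window with the highest total saliency.
        pooled = F.avg_pool2d(sal, block, stride=1)
        top = pooled.flatten().argmax().item()
        pw = pooled.shape[3]
        y, xpos = divmod(top, pw)
        # Black out the most salient window and re-classify.
        blocked = image.clone()
        blocked[:, y:y + block, xpos:xpos + block] = 0.0
        with torch.no_grad():
            return model(blocked.unsqueeze(0)).argmax(dim=1).item()

The same blocking operation can, in principle, also be applied to training images as an augmentation, which is the spirit of incorporating blocking into the training process.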