Below is a video of my Ph.D. dissertation defense, which primarily concerns the paper Reliable Classification Explanations via Adversarial Attacks on Robust Networks, which has a preprint available on arXiv. In industries where accountability matters, it is important to have machine learning solutions that produce stable, explainable results. Through new mathematical techniques, such as a stochastic Lipschitz constraint, and new mechanisms for Neural Networks (NNs), such as a Half-Huber Rectified Linear Unit, I created NNs 2.4x more resistant to adversarial examples than the previous state of the art on the 1000-class ImageNet classification problem, while retaining the same accuracy on clean data with a smaller network. Explanations generated using Adversarial Explanations (AE) possessed very discernible features, with a more obvious interpretation than prior heatmap-based explanations. Furthermore, I demonstrated that the new adversarial examples produced by AE could be annotated and fed back into the training process, yielding improved adversarial resistance through a Human-In-The-Loop pipeline.
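The exact formulation of the Half-Huber Rectified Linear Unit is given in the paper; as a rough illustration of the idea (a ReLU whose corner at zero is replaced by a quadratic segment, so the derivative is continuous), here is a minimal PyTorch sketch. The breakpoint `delta` and the scaling below are my own placeholder choices, not necessarily the dissertation's values.

```python
import torch
import torch.nn as nn


class HalfHuberReLU(nn.Module):
    """Sketch of a 'half-Huber' style ReLU: zero for negative inputs, quadratic
    on [0, delta], and linear beyond, so the derivative is continuous at zero.
    The breakpoint `delta` here is a placeholder, not the dissertation's value."""

    def __init__(self, delta: float = 1.0):
        super().__init__()
        self.delta = delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = self.delta
        quad = x.clamp(min=0.0, max=d).pow(2) / (2.0 * d)  # smooth ramp on [0, d]
        lin = (x - d).clamp(min=0.0)                       # unit-slope part beyond d
        return quad + lin


if __name__ == "__main__":
    act = HalfHuberReLU()
    x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
    print(act(x))  # 0 below zero, smooth transition, then linear growth
```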
Full abstract:
This dissertation concerns methods for improving the reliability and quality of explanations for decisions based on Neural Networks (NNs). NNs are increasingly part of state-of-the-art solutions for a broad range of fields, including biomedicine, logistics, user-recommendation engines, defense, and self-driving vehicles. While NNs form the backbone of these solutions, they are often viewed as “black box” solutions, meaning the only output offered is a final decision, with no insight into how or why that particular decision was made. Prior methods of explaining NN decisions have been proposed, but they suffer from two flaws identified by this dissertation. The first flaw is that existing explanation methods are very unstable, a result of adversarial examples. The second flaw arises because all state-of-the-art methods of explaining an NN’s decisions rely on heatmaps that highlight areas considered relevant. An algorithm that can draw a circle around a cat does not necessarily know that it is looking at a cat; it only recognizes the existence of a salient object.
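For context on what heatmap-based explanations look like, the sketch below computes a plain input-gradient saliency map. This is a generic illustration under assumptions of my own (a randomly initialized ResNet-50 stand-in and a random input), not one of the specific explanation methods evaluated in the dissertation.

```python
import torch
import torchvision.models as models

# Stand-in classifier; for a meaningful heatmap you would load pretrained weights.
model = models.resnet50(weights=None).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder image batch
logits = model(x)
top_class = logits.argmax(dim=1).item()

# Backpropagate the top logit to the input; the per-pixel gradient magnitude is
# the "heatmap" -- it highlights salient regions but says nothing about *what*
# the network thinks those regions are.
logits[0, top_class].backward()
heatmap = x.grad.abs().max(dim=1).values  # shape (1, 224, 224)
print(heatmap.shape, heatmap.max())
```

Because the map is just a gradient of the input, a small adversarial perturbation of that input can change it drastically, which is the instability flaw described above.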
Sensory Relevance Model (SRM) is the umbrella term used by this dissertation for methods of explanation that avoid these flaws by leveraging the full sensory domain of the input to demonstrate relevance. Two methods of achieving SRMs were studied: network bisection and Adversarial Explanations (AE). The network bisection SRM was motivated by previous studies on sparsity and the Locally Competitive Algorithm (LCA). It was closely related to attention models, but with the complete erasure of some input elements at an intermediate point in the network, and was trained via an auxiliary loss function. Network bisection was found not to be a suitable means of addressing the aforementioned flaws; AE, in contrast, was found to be very effective.
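A rough sketch of the AE idea, under my reading of the abstract: run a targeted adversarial attack against a (robust) classifier and treat the perturbed image and its perturbation as the explanation. The PGD-style loop below is a generic stand-in with placeholder hyperparameters, not the dissertation's exact attack.

```python
import torch
import torch.nn.functional as F


def adversarial_explanation(model, x, target_class, eps=0.1, step=0.01, iters=50):
    """Generic targeted PGD-style loop: nudge `x` toward `target_class` under an
    L-inf budget `eps`. `target_class` is a tensor of class indices, one per
    input. All hyperparameters here are placeholders for illustration only."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target_class)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - step * grad.sign()        # descend toward the target class
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the L-inf budget
            x_adv = x_adv.clamp(0.0, 1.0)             # keep pixel values valid
    return x_adv, x_adv - x  # perturbed image and the perturbation itself
```

On a sufficiently robust network, the resulting perturbation tends to contain human-recognizable features of the target class, rather than the imperceptible noise an undefended network typically yields; that visible change is what serves as the explanation.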
Through new mathematical techniques, such as a stochastic Lipschitz constraint, and new NN mechanisms, such as a Half-Huber Rectified Linear Unit, this dissertation created NNs 2.4x more resistant to adversarial examples than the previous state of the art on the 1000-class ImageNet classification problem, while retaining the same accuracy on clean data with a smaller network. Explanations generated using AE possessed very discernible features, with a more obvious interpretation than heatmap-based explanations. Furthermore, it was demonstrated that the new adversarial examples produced by AE could be annotated and fed back into the training process, yielding improved adversarial resistance through a Human-In-The-Loop pipeline.
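The stochastic Lipschitz constraint is specified in the paper; the sketch below conveys only the general flavor I would expect, namely penalizing a Monte Carlo estimate of the network's local Lipschitz quotient, with `sigma`, the norm choice, and the weighting all being placeholder assumptions of mine.

```python
import torch


def stochastic_lipschitz_penalty(model, x, sigma=0.05):
    """Rough sketch of a stochastically estimated Lipschitz penalty: sample a
    random perturbation of each input and penalize the ratio of the change in
    the output to the change in the input. The constants and norms here are
    placeholder choices, not the dissertation's formulation."""
    delta = sigma * torch.randn_like(x)
    out_clean = model(x)
    out_pert = model(x + delta)
    num = (out_pert - out_clean).flatten(1).norm(dim=1)  # ||f(x + d) - f(x)||
    den = delta.flatten(1).norm(dim=1) + 1e-12           # ||d||
    return (num / den).mean()


# During training, a penalty like this would be added to the task loss, e.g.:
#   loss = F.cross_entropy(model(x), y) + lam * stochastic_lipschitz_penalty(model, x)
```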
At this time, AE is an unparalleled technique, producing more reliable, higher-quality explanations for image classification decisions than were previously available. It is the hope of the author that this work provides a basis for future work in the realms of both adversarial resistance and explainable NNs, making algorithms more reliable for industries where accountability matters, such as biomedicine and autonomous vehicles.