Trustworthy Machine Learning Paper Index Page
Paper [1]:
White-box neural network attack: the adversary has full access to the model. Gradient descent is run backwards through the model to update the input itself, so that the original training data can be reconstructed.
For the black-box setting, they mention using numeric gradient approximation.
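A minimal sketch of this white-box inversion idea (my own code, assuming a PyTorch classifier; `model` and `target_class` are placeholders, not the paper's implementation):

```python
# Fix the model, treat the input as the variable, and run gradient descent on it
# so that the confidence of the target class goes up. The recovered input should
# resemble training data of that class if the model leaks enough information.
import torch

def invert_class(model, target_class, shape=(1, 1, 28, 28), steps=500, lr=0.1):
    model.eval()
    x = torch.zeros(shape, requires_grad=True)   # start from a blank input
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        conf = torch.softmax(model(x), dim=1)[0, target_class]
        loss = 1.0 - conf                        # low loss = high target confidence
        loss.backward()                          # gradient w.r.t. the INPUT, not the weights
        opt.step()
        x.data.clamp_(0.0, 1.0)                  # keep pixels in a valid range
    return x.detach()
```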
Question: if the model does not overfit the dataset, it seems the training data cannot be recovered.
Paper [2]:
Proposes a black-box attack against online ML-as-a-Service platforms, aiming to extract the parameters of models with simple structures by solving equations. The confidence values returned with each prediction are the key to solving these equations.
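For the simplest case, a logistic regression behind a prediction API, the extraction really is just solving a linear system. A rough sketch (my own, where `query_api` is a hypothetical stand-in for the online service returning a confidence value):

```python
# The API returns p = sigmoid(w.x + b), so log(p / (1 - p)) = w.x + b is LINEAR
# in the unknowns (w, b). With d features, d + 1 queries give d + 1 equations,
# enough to recover the weights and the bias exactly.
import numpy as np

def steal_logistic_regression(query_api, d):
    X = np.random.randn(d + 1, d)               # d + 1 random query points
    p = np.array([query_api(x) for x in X])     # confidence values from the API
    logits = np.log(p / (1.0 - p))              # invert the sigmoid
    A = np.hstack([X, np.ones((d + 1, 1))])     # unknowns are [w, b]
    w_b = np.linalg.solve(A, logits)            # solve the linear system
    return w_b[:-1], w_b[-1]                    # recovered weights and bias
```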
Question: However, this method seems like brute force, and it would be tough when the type and structure of the model are unknown or very complex. For example, they query 10,000 times to steal a neural network, which would be identified as hacking activity in a real environment (or be too expensive against an online service).
Paper [3]: Practical Black-Box Attacks against Machine Learning
The attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by the adversary and labeled by the target DNN. The local substitute is then used to craft adversarial examples, which turn out to be misclassified by the targeted DNN as well. The attacker can only observe the labels output by the model.
a. Use Jacobian-based dataset augmentation to train the local substitute (a sketch follows this list).
b. Adversarial example crafting: Fast Gradient Sign Method (FGSM)
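A rough sketch of step (a), assuming the substitute is a PyTorch model and `oracle_label` is a hypothetical function that queries the target DNN for a label (neither name comes from the paper's code):

```python
# Jacobian-based dataset augmentation: push each existing point a small step
# lambda in the direction of the sign of the substitute's Jacobian for the
# oracle's label, to explore where the two models' decision boundaries differ.
import torch

def jacobian_augment(substitute, X, oracle_label, lam=0.1):
    new_points = []
    for x in X:
        x = x.clone().detach().requires_grad_(True)
        y = oracle_label(x)                         # label assigned by the target (oracle)
        out = substitute(x.unsqueeze(0))[0, y]      # substitute's score for that label
        grad, = torch.autograd.grad(out, x)
        new_points.append((x + lam * grad.sign()).detach())
    return torch.stack(new_points)                  # added to the substitute's training set
```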
Paper [4]: Explaining and Harnessing Adversarial Examples
Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks’ vulnerability to adversarial perturbation is their linear nature. Linear behavior in high-dimensional spaces is sufficient to cause adversarial examples. If we add a small perturbation η to the original input x, the adversarial example is x̃ = x + η,
in which η is small enough, i.e. ‖η‖∞ < ε, so that the classifier is expected to assign x and x̃ the same class.
In a linear model with weight vector ω, the output is ωᵀx̃ = ωᵀx + ωᵀη, so the perturbation increases the output by ωᵀη.
Because of the L∞ constraint, the maximum increase caused by the perturbation is achieved by assigning each element of η the absolute value ε, with the sign of η equal to the sign of ω (η = ε · sign(ω)), so that every weighted perturbation term adds to the output.
For a non-linear model, the weights cannot be used this way directly. However, if we assume the model behaves approximately linearly around the input, the best direction can be obtained by taking the derivative of the cost function with respect to the input. Using the same idea as in the linear case, the adversarial perturbation is η = ε · sign(∇ₓJ(θ, x, y)).
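A minimal FGSM sketch matching the formula above (my own code; `model` and `loss_fn` are assumed to be a PyTorch classifier and its training loss):

```python
# Fast Gradient Sign Method: eta = epsilon * sign(grad_x J(theta, x, y)),
# where epsilon bounds the L-infinity norm of the perturbation.
import torch

def fgsm(model, loss_fn, x, y, epsilon=0.01):
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)        # J(theta, x, y)
    loss.backward()                    # gradient of the cost w.r.t. the input
    eta = epsilon * x.grad.sign()      # the adversarial perturbation
    return (x + eta).detach()          # adversarial example x + eta
```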
In many cases, a wide variety of models with different architectures trained on different subsets of the training data misclassify the same adversarial example. This suggests that adversarial examples expose fundamental blind spots in our training algorithms. On some datasets, such as ImageNet (Deng et al., 2009), the adversarial examples were so close to the original examples that the differences were indistinguishable to the human eye.
These results suggest that classifiers based on modern machine learning techniques, even those that obtain excellent performance on the test set, are not learning the true underlying concepts that determine the correct output label. Instead, these algorithms have built a Potemkin village that works well on naturally occurring data, but is exposed as a fake when one visits points in space that do not have high probability in the data distribution.
In machine learning courses, we mainly study the relationship between parameters and outputs, which is non-linear. How I make sense of it is that by changing the parameters arbitrarily we can produce highly non-linear outputs. However, once the parameters are fixed, the relationship between input and output is much more linear than we expect. In his reply on a forum, a graph is given to show how the outputs of a model change as the perturbations change.
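A small experiment in the same spirit as that graph (my own sketch; `model`, `x`, `direction`, and `target_class` are placeholders for a trained classifier, an input with a batch dimension, a unit perturbation direction, and a class index):

```python
# Fix the parameters, move the input along one direction, and record a logit.
# For most trained networks the resulting curve is close to piecewise linear,
# which is the point the linearity argument above is making.
import torch

def logit_sweep(model, x, direction, target_class, eps_range=(-0.5, 0.5), steps=101):
    epsilons = torch.linspace(*eps_range, steps)
    logits = []
    with torch.no_grad():
        for eps in epsilons:
            logits.append(model(x + eps * direction)[0, target_class].item())
    return epsilons, logits            # plot logits vs. epsilons to see the trend
```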
Paper [5]: ADVERSARIAL EXAMPLES IN THE PHYSICAL WORLD
In this paper, the possibility of creating adversarial examples for machine learning systems that operate in the physical world is explored. Images taken with a cell-phone camera are used as input to an Inception v3 image classification neural network, and the attack successfully fools the model.
Demo address:
https://www.youtube.com/watch?v=zQ_uMenoBCk
The pipeline is changed: the attack does not intercept the input that is fed directly into the neural network; instead, it modifies the physical data (a printed adversarial image that is then photographed).
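A rough sketch of the classification end of that pipeline (my own code, using the current torchvision API rather than the paper's setup; the photo filename is hypothetical):

```python
# The model never sees the digital adversarial tensor directly: it only sees a
# phone-camera photo of the printed adversarial image, loaded from disk here.
import torch
from torchvision import models
from PIL import Image

weights = models.Inception_V3_Weights.DEFAULT
model = models.inception_v3(weights=weights).eval()
preprocess = weights.transforms()

photo = Image.open("photo_of_printed_adversarial_example.jpg")  # taken with a phone camera
with torch.no_grad():
    pred = model(preprocess(photo).unsqueeze(0)).argmax(1).item()
print(weights.meta["categories"][pred])   # still misclassified if the attack survives printing
```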
References:
[1] M. Fredrikson, S. Jha, and T. Ristenpart. "Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures." In ACM CCS, 2015. DOI: 10.1145/2810103.2813677.
[2] Florian Tramer, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction apis. In 25th USENIX Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016., pages 601-618, 2016. Presentation: https://www.youtube.com/watch?time_continue=26&v=qGjzmEzPkiI
[3] Papernot, Nicolas, et al. Practical Black-Box Attacks Against Machine Learning. Feb. 2016. https://arxiv.org/pdf/1602.02697.pdf
[4] Goodfellow, Ian J., et al. Explaining and Harnessing Adversarial Examples. Dec. 2014.
-Goodfellow’s lecture on Adversarial Machine Learning: https://www.youtube.com/watch?v=CIfsB_EYsVI&t=1750s
-Deep Learning Adversarial Examples – Clarifying Misconceptions: https://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html
-https://towardsdatascience.com/perhaps-the-simplest-introduction-of-adversarial-examples-ever-c0839a759b8d
-https://towardsdatascience.com/know-your-adversary-understanding-adversarial-examples-part-1-2-63af4c2f5830
[5] Kurakin, Alexey, et al. Adversarial Examples in the Physical World. July 2016.