Adversarial Attacks on Neural Networks – Bug or Feature?

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This will be a little non-traditional video, where the first half of the episode will be about a paper, and the second part will be about…something else. Also a paper. Well, kind of. You'll see.

We have seen in previous years that neural network-based learning methods are amazing at image classification, which means that after training on a few thousand examples, they can look at a new, previously unseen image and tell us whether it depicts a frog or a bus.
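For the technically inclined, here is a minimal sketch of what such a classifier can look like in code: a tiny convolutional network trained on CIFAR-10, which happens to include frogs among its ten classes. The architecture and hyperparameters are illustrative choices only, not taken from any of the papers discussed in this episode.

    import torch
    import torch.nn as nn
    import torchvision
    import torchvision.transforms as T

    # Illustrative example only: a tiny convolutional classifier on CIFAR-10.
    train_set = torchvision.datasets.CIFAR10(root="data", train=True, download=True,
                                             transform=T.ToTensor())
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(64 * 8 * 8, 10),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(5):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = nn.functional.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()

    # After training, the network predicts a class for a previously unseen image.
    test_image, _ = torchvision.datasets.CIFAR10(root="data", train=False,
                                                 transform=T.ToTensor())[0]
    prediction = model(test_image.unsqueeze(0)).argmax(dim=1)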
Earlier, we showed that we can fool neural networks by adding carefully crafted noise to an image, which we often refer to as an adversarial attack on a neural network. If done well, this noise is barely perceptible and, get this, can fool the classifier into looking at a bus and thinking that it is an ostrich.
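As a rough illustration of how such noise can be crafted, here is a minimal sketch of the fast gradient sign method, one common way to build this kind of attack. The pretrained model, the image path, and the step size are placeholder assumptions; this is not the exact attack used in the papers covered here.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Placeholder setup: a pretrained ImageNet classifier and an example photo.
    model = models.resnet50(pretrained=True).eval()
    normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

    x = preprocess(Image.open("bus.jpg")).unsqueeze(0)   # "bus.jpg" is a placeholder
    x.requires_grad_(True)

    # Classify the clean image, then take one gradient step that increases the loss.
    logits = model(normalize(x))
    label = logits.argmax(dim=1)
    loss = torch.nn.functional.cross_entropy(logits, label)
    loss.backward()

    epsilon = 2 / 255                                    # barely perceptible noise
    x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1)

    print("clean prediction:      ", label.item())
    print("adversarial prediction:", model(normalize(x_adv)).argmax(dim=1).item())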
These attacks typically require modifying a large portion of the input image, so when talking about a later paper, we were wondering: what is the smallest number of pixels we have to change to fool a neural network? What is the magic number? Based on the results of previous research works, an educated guess would be somewhere around a hundred pixels.
A follow-up paper gave us an unbelievable answer by demonstrating the one-pixel attack. You see here that by changing only one pixel in an image that depicts a horse, the AI will be 99.9% sure that we are seeing a frog. A ship can also be disguised as a car, or, amusingly, with a properly executed one-pixel attack, almost anything can be seen as an airplane by the neural network.
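The one-pixel attack finds that single pixel with differential evolution, a black-box search that only needs the network's predicted class probabilities, not its gradients. Below is a simplified sketch of the idea, not the authors' exact implementation; it assumes a hypothetical model function that maps a 32×32 RGB image with values in [0, 1] to class probabilities.

    import numpy as np
    from scipy.optimize import differential_evolution

    def one_pixel_attack(model, image, true_label):
        # Search over (x, y, r, g, b): which pixel to overwrite and its new color.
        h, w, _ = image.shape

        def perturb(z):
            x, y, r, g, b = z
            out = image.copy()
            out[int(y), int(x)] = [r, g, b]
            return out

        def objective(z):
            # A lower probability of the true class means a more successful attack.
            return model(perturb(z))[true_label]

        bounds = [(0, w - 1), (0, h - 1), (0, 1), (0, 1), (0, 1)]
        result = differential_evolution(objective, bounds, maxiter=75,
                                        popsize=80, tol=1e-5, seed=0)
        return perturb(result.x)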
And this new paper discusses whether we should look at these adversarial examples as bugs or not, and of course, it does a lot more than that! It argues that most datasets contain features that are predictive, meaning that they help a classifier find, for instance, cats, but also non-robust, which means that they provide a rather brittle understanding that falls apart in the presence of adversarial changes. We are also shown how to find and eliminate these non-robust features from already existing datasets, and that we can build much more robust classifier neural networks as a result. This is a truly excellent paper that sparked quite a bit of discussion.
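To give a rough sense of how these non-robust features can be stripped out, here is a simplified sketch in the spirit of the paper's robustified-dataset construction. It assumes a hypothetical robust_features function that returns the internal representation of an adversarially trained, robust model; the optimization details are illustrative rather than the authors' exact procedure.

    import torch

    def robustify(image, robust_features, steps=1000, lr=0.1):
        # Target: the robust model's internal representation of the original image.
        target = robust_features(image).detach()

        # Start from random noise and nudge it until its robust features match,
        # so the synthesized image carries mostly robust features.
        x = torch.rand_like(image, requires_grad=True)
        optimizer = torch.optim.SGD([x], lr=lr)

        for _ in range(steps):
            optimizer.zero_grad()
            loss = (robust_features(x) - target).pow(2).mean()
            loss.backward()
            optimizer.step()
            x.data.clamp_(0, 1)          # keep the result a valid image

        # The synthesized image is paired with the original image's label; training
        # an ordinary classifier on such pairs is what yields the more robust model.
        return x.detach()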
And here comes the second part of the video with the something else. An interesting new article was published in the Distill journal, a journal where you can expect clearly worded papers with beautiful and interactive visualizations. But this is no ordinary article: it is a so-called discussion article, where a number of researchers were asked to write comments on this paper and create interesting back-and-forth discussions with the original authors. Now, make no mistake, the paper we've talked about was peer-reviewed, which means that independent experts have spent time scrutinizing the validity of the results, so this new discussion article was meant to add to it by getting others to replicate the results and clear up potential misunderstandings. Through publishing six of these mini-discussions, each of which was addressed by the original authors, they were able to clarify the main takeaways of the paper, and even added a section of non-claims as well. For instance, it has been clarified that the authors do not claim that adversarial examples arise from software bugs.

A huge thanks to the Distill journal and all the authors who participated in this discussion, and to Ferenc Huszár, who suggested the idea of the discussion article to the journal. I'd love to see more of this, and if you do too, make sure to leave a comment so we can show them that these endeavors to raise the replicability and clarity of research works are indeed welcome. Make sure to click the links to both works in the video description, and spend a little quality time with them. You'll be glad you did.

I think this was a more complex than average paper to talk about; however, as you have noticed, the usual visual fireworks were not there. As a result, I expect this video to get significantly fewer views. That's not a great business model, but no matter: I made this channel so I can share with you all these important lessons that I learned during my journey. This has been a true privilege, and I am thrilled that I am still able to talk about all these amazing papers without worrying too much about whether any of these videos will go viral or not.

Videos like this one are only possible because of your support on Patreon.com/TwoMinutePapers. If you feel like chipping in, just click the Patreon link in the video description. This is why every video ends with, you know what's coming… Thanks for watching and for your generous support, and I'll see you next time!
