What convolutional neural networks see

Neural networks, and in particular convolutional neural networks, have been at the heart of many recent research projects with an artistic flavor. Some of the better known ones have been DeepDream, A Neural Algorithm of Artistic Style (style transfer), deep generator networks, and most recently WaveNet, which learns to generate audio. They've also found their way into many practical applications, everything from self-driving cars to speech-to-text systems and AIs that can play the game of Go. Their recent success comes from an ability to accurately recognize and describe images, but the way they do this remains a mystery to most people. We can get a few intuitions about what's really going on by inspecting them, looking inside them, and seeing how they see the world.

What you're looking at is a neural network processing my webcam feed in real time. It scans the image looking for patterns, or what we call features. The patterns look like these: some are lines or edges or gradients, really minimal multi-pixel patterns, things like that. These responses, which we sometimes call feature maps or activations, show us the presence of those features inside the image. So in the first layer of the network we've discovered edges, gradients, and patterns like that.

Things get a little more interesting when we repeat this process many times through a sequence of layers. At each layer of the network we take the feature maps from the previous layer, stack them together into a new volume of data, and do another round of convolution on top of them. So the activation maps in the second layer, which we're looking at here, are more interesting because rather than looking for patterns in the raw pixels of the original image, we're now looking for patterns in the activation maps of the previous layer of the network. For example, it might be able to combine vertical edges and horizontal edges to detect corners, which we can think of as higher-level features.
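The convolution step described above can be sketched in a few lines of plain NumPy. This is a toy illustration, not the network in the video: the hand-crafted edge kernels stand in for filters that a real first layer would learn from data.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most deep learning libraries)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value is the dot product of the kernel
            # with one patch of the image.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: black on the left, white on the right (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-crafted "features": a vertical and a horizontal edge detector.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
horizontal_edge = vertical_edge.T

v_map = conv2d(image, vertical_edge)    # lights up along the vertical edge
h_map = conv2d(image, horizontal_edge)  # stays at zero: no horizontal edge here

print(v_map.max(), h_map.max())
```

A real CNN would apply many such kernels, stack the resulting feature maps into a volume, and convolve again over that volume in the next layer, which is how vertical and horizontal edge responses can be combined into corner detectors.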
Progressing through every layer of the network, we acquire higher- and higher-level features, or representations, of the image. We go from things like edges and gradients, to corners and grids, to progressively more complex features, maybe things like leaves or fences or door handles, to even higher-level features: houses, cars, people, and so on. This process of pushing data through the network over many layers of transformations is why these algorithms are sometimes called deep neural networks, or deep learning; the "deep" just means the network has many layers.

When we finally arrive at the last layer of the network, we have a compact representation of the content of the image, and we can attach one more classification layer on top of that so the network can accurately describe what's inside the image. For example, if I place my phone in front of the camera, it says "iPod"; if I place this water bottle in front of the camera, it accurately detects a water bottle.

Now, it can be a bit hard to understand what the feature detectors are looking for, but it turns out there are ways, and there has been some work done in the past. Some of the first work came from Matthew Zeiler and Rob Fergus in 2013, where they showed patches of actual images which caused certain feature detectors to light up. Another nice resource is the Deep Visualization Toolbox, made by Jason Yosinski, which was a major inspiration for the visualization software you saw in the last slide. If you're interested in learning more about how these impressive algorithms work, or even getting your hands dirty and working with them yourself using a series of practical guides and tutorials, I encourage you to check out ml4a.github.io, an in-progress free online book about machine learning for artists.
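The final classification layer described above can be sketched as nothing more than a linear map plus a softmax over the network's compact feature vector. This is a toy sketch with made-up dimensions and random weights; the class names simply reuse the objects mentioned in the talk, and a real network would of course learn `W` and `b` from labeled data.

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical compact feature vector from the last conv layer
# (a real network might produce e.g. 2048 dimensions here).
features = rng.normal(size=8)

# The classification layer: one score per class, then softmax.
classes = ["iPod", "water bottle", "bow tie"]
W = rng.normal(size=(3, 8))
b = np.zeros(3)

probs = softmax(W @ features + b)
prediction = classes[int(np.argmax(probs))]
print(prediction, probs)
```

Because everything before this layer has already distilled the image into high-level features, this tiny linear layer is all it takes to name what the camera sees.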


  • If people understood Youd be Over in a Day says:

    Teach the neural network to label innocent actions as imminent threats, baking cookies? More like murdering people with a hatchet, threat detected, eliminate now.

  • Raghu R says:

    Woww!! Unique video on the net. Where live video was displayed with the filters, layers and convolutions… very helpful for visualizing how neural networks work… like machine learning this is for "Human Learning" 🙂 Good one!

  • 이총명 says:

    amazing! thank you for great video

  • mark wright says:

    such a great clear video thanks!

  • dumbcreaknuller says:

    thinking is not consciousness. the self comes from the center of the brain and not from the brain lobes. the brain lobes take care of the thinking part but the center of the brain take care of the conscious part. the thinking processor complement consciousness. deep learning is already thinking but is not aware that its thinking since it does not have a center of self to act on.

  • Unknown Entity says:

    So, the reason it's called "deep" learning is simply because it has more than one layer…

  • Al Bencomo says:

    Gene, why doesn't detect you as a person/human? It recognizes the iphone and bottle, but it classifies you as a "bow tie". why is that?

  • 변정현 says:

    Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2014.

  • Gabriel Santana says:

    This is amazing!

  • Hugo Passarinho says:

    Hello Gene – can you share what setup you were using for this ?
