Deep Learning with TensorFlow – Introduction to Convolutional Networks


Hello, and welcome! In this video, we will provide an overview of the convolutional neural network model. Convolutional neural networks, or CNNs, have gained a lot of attention in the machine learning community over the last few years. This is due to the wide range of applications where CNNs excel, and you can see a partial list here.

In a later part of this video, we’re going to focus on the object recognition problem, to get an idea of how a CNN’s structure allows it to extract the elements from an image. So, for example, how could a CNN learn to pick out a person, a dog, and the chairs from the image, even if the chairs are only partially visible? This is not an easy task for computers, and it took years of dedicated research to achieve this.

Historically, the goal of machine learning
was to ‘Move Humanity closer to the unreachable General Artificial Intelligence’. But, not surprisingly, this goal ended up being lofty and difficult to attain. So what scientists started doing was developing a series of models and algorithms that excelled in specific tasks. One of these tasks is object recognition, but even this can be difficult without the right foundation. So the original goal of the CNN was to form the best possible representation of the visual world in order to support recognition tasks.

The CNN solution needed to have two key features. It needed to be able to detect the objects in the image and place them into the appropriate category. But it also needed to be robust against differences in pose, scale, illumination, conformation, and clutter. The importance of the second feature can’t be overstated, since this was historically a huge limitation of hand-coded algorithms.

When faced with a difficult problem like this,
it’s sometimes helpful to look at a working example in nature. The mammalian brain is a great example of an object classifier, and as it turns out, we can mimic the functionality of the brain, at least partially.

But how does a mammalian brain accomplish this? It starts with the photoreceptors in the retina receiving information from the outside world. The primary part of the visual cortex, or V1, uses simple and complex cells to start processing the input. V4 identifies different textures, and the inferotemporal cortex puts everything together to recognize the object being observed. The main idea is that the process has multiple layers, different types of cells, and increasingly complex functionality.

We can use this as inspiration to develop
a high-level approach to the object recognition problem. We start with an image, we extract a few primitive features, we combine the features together to form the parts of the object, and then we combine the parts together to form the object itself. The convolutional neural network was developed from this sequence of steps.

Let’s walk through this process with a real-world example to get a better idea of how this all works. Take a look at this image here. Let’s say we wanted a network to be able to recognize one of the buildings. It first helps to carefully consider how the scene is organized. At the highest level we simply have an image of the city. But we can start to extract oriented edges, we can join them to form doors and windows, and then put those parts together to form a building.

It’s important now to remember how our biological
system would process this image. Like we discussed before, we’d start by taking in the image via the retina, and then allowing increasingly complex parts of our brain to extract features and ultimately form an object.

So now that we see the biological system’s method for recognizing objects, we can briefly discuss how a CNN might use this type of processing. Consider the building in the lower right-hand corner of this image. This is the raw image that a CNN would receive as input. Some of the best primitive features for a building would be horizontal lines and vertical lines. By combining these lines together, the net can start to form the component parts like windows and the building’s external shape. It can then use these basic parts to form the complete object.

So at this point, you should have a basic understanding of the motivation behind a convolutional neural network, and how it might be used in an application. Thank you for watching this video.
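
As a rough illustration of the "primitive features" step described for the building example, here is a minimal sketch of how horizontal and vertical lines could be picked out of an image and then combined into the outline of a part such as a window. It is not taken from the video: the tiny 6x6 image, the Sobel-style kernels, and the helper names `correlate2d` and `edge_strength` are all assumptions made for illustration. A real CNN would learn its kernels from data, and a framework such as TensorFlow would supply the sliding-window operation, rather than using hand-written loops like these.

```python
# Hypothetical 6x6 grayscale image: a bright square "window" (1s)
# on a dark wall (0s). Real inputs would be much larger.
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Hand-picked Sobel-style kernels (an assumption; a CNN learns its own).
# VERTICAL_EDGE responds to left-to-right brightness changes (vertical lines).
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
# HORIZONTAL_EDGE responds to top-to-bottom brightness changes (horizontal lines).
HORIZONTAL_EDGE = [
    [-1, -2, -1],
    [0, 0, 0],
    [1, 2, 1],
]


def correlate2d(image, kernel):
    """Slide the kernel over the image with no padding and return the
    weighted sum at every position. Deep learning libraries usually call
    this sliding-window sum a convolution."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def edge_strength(image):
    """Combine both primitive feature maps into one outline map by adding
    the absolute vertical and horizontal responses."""
    vertical = correlate2d(image, VERTICAL_EDGE)
    horizontal = correlate2d(image, HORIZONTAL_EDGE)
    return [
        [abs(v) + abs(h) for v, h in zip(v_row, h_row)]
        for v_row, h_row in zip(vertical, horizontal)
    ]


if __name__ == "__main__":
    for row in edge_strength(IMAGE):
        print(row)
```

Running the script prints a 4x4 map that is non-zero around the border of the square and zero inside it, which is the sense in which combining horizontal and vertical lines starts to form a part like a window.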
