Deep Convolutional Neural Networks for Image Classification

Many slides from Lana Lazebnik, Rob Fergus, Andrej Karpathy

Deep learning
• Learn a feature hierarchy all the way from pixels to classifier
• Each layer extracts features from the output of the previous layer
• Train all layers jointly

[Figure: Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier]

Linear classifiers revisited • When the data is linearly separable, there may be more than one separator (hyperplane)

Which separator is best?

Perceptron
From Wikipedia: In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers: functions that can decide whether an input (represented by a vector of numbers) belongs to one class or another.[1] It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. The algorithm allows for online learning, in that it processes elements in the training set one at a time.

Perceptron
[Diagram: inputs x1, x2, x3, …, xD with weights w1, w2, w3, …, wD]
Output: y = sgn(w·x + b)

Can incorporate bias as component of the weight vector by always including a feature with value set to 1

Loose inspiration: Human neurons From Wikipedia: At the majority of synapses, signals are sent from the axon of one neuron to a dendrite of another... All neurons are electrically excitable, maintaining voltage gradients across their membranes… If the voltage changes by a large enough amount, an all-or-none electrochemical pulse called an action potential is generated, which travels rapidly along the cell's axon, and activates synaptic connections with other cells when it arrives.

Perceptron update rule
• Initialize weights randomly
• Cycle through training examples in multiple passes (epochs)
• For each training instance x with label y:
  • Classify with current weights: y’ = sgn(w·x)
  • Update weights: w ← w + α(y - y’)x
  • α is a learning rate that should decay as 1/t (t is the epoch)
  • What happens if y’ is correct? Nothing, since y - y’ = 0. Otherwise, consider what happens to individual weights: wi ← wi + α(y - y’)xi
    – If y = 1 and y’ = -1, wi is increased if xi is positive and decreased if xi is negative, so w·x gets bigger
    – If y = -1 and y’ = 1, wi is decreased if xi is positive and increased if xi is negative, so w·x gets smaller
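A minimal NumPy sketch of this update rule, with the bias trick from the earlier slide folded in; the toy data in the comments and the exact 1/t schedule are illustrative assumptions:

```python
import numpy as np

def train_perceptron(X, y, n_epochs=20):
    """Perceptron training as described above.

    X: (N, D) array of feature vectors; y: (N,) array of labels in {-1, +1}.
    The bias is folded into the weights by appending a constant-1 feature.
    """
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # bias trick
    w = np.random.randn(X.shape[1]) * 0.01          # random initialization
    for epoch in range(1, n_epochs + 1):
        alpha = 1.0 / epoch                         # learning rate decaying as 1/t
        for i in np.random.permutation(len(y)):     # cycle through examples
            y_pred = np.sign(w @ X[i])              # classify with current weights
            w += alpha * (y[i] - y_pred) * X[i]     # no change if prediction is correct
    return w

# Example usage on a toy linearly separable problem:
# X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
# y = np.array([1, 1, -1, -1])
# w = train_perceptron(X, y)
```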

Convergence of perceptron update rule • Linearly separable data: converges to a perfect solution

• Non-separable data: converges to a minimum-error solution assuming learning rate decays as O(1/t) and examples are presented in random sequence

Multi-Layer Neural Networks • Network with a hidden layer:

• Can represent nonlinear functions (provided each perceptron has a nonlinearity)

Multi-Layer Neural Networks

Source: http://cs231n.github.io/neural-networks-1/

Multi-Layer Neural Networks • Beyond a single hidden layer:

Figure source: http://cs231n.github.io/neural-networks-1/

Training of multi-layer networks
• Find network weights to minimize the error between true and estimated labels of training examples:
  E(w) = Σ_{j=1}^{N} (y_j - f_w(x_j))²
• Update weights by gradient descent:
  w ← w - α ∂E/∂w
[Figure: error surface over weights w1, w2]

Training of multi-layer networks (cont.)
• Gradient descent requires perceptrons with a differentiable nonlinearity:
  – Sigmoid: g(t) = 1 / (1 + e^(-t))
  – Rectified linear unit (ReLU): g(t) = max(0, t)
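For reference, the two nonlinearities and their derivatives (which back-propagation needs), in a short NumPy sketch; the function names are mine:

```python
import numpy as np

def sigmoid(t):
    """g(t) = 1 / (1 + e^(-t)); derivative g'(t) = g(t) * (1 - g(t))."""
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_grad(t):
    g = sigmoid(t)
    return g * (1.0 - g)

def relu(t):
    """g(t) = max(0, t); derivative is 1 for t > 0 and 0 otherwise."""
    return np.maximum(0.0, t)

def relu_grad(t):
    return (t > 0).astype(float)
```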

Training of multi-layer networks (cont.)
• Back-propagation: gradients are computed in the direction from output to input layers and combined using the chain rule
• Stochastic gradient descent: compute the weight update w.r.t. one training example (or a small batch of examples) at a time; cycle through training examples in random order over multiple epochs
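A compact NumPy sketch of this procedure for a network with one sigmoid hidden layer, trained on the squared error defined above; the layer size, initialization scale, and learning rate are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def train_mlp(X, y, n_hidden=8, n_epochs=100, alpha=0.1):
    """One-hidden-layer network f_w(x) trained to minimize sum_j (y_j - f_w(x_j))^2.

    X: (N, D) inputs; y: (N,) targets.
    """
    rng = np.random.default_rng(0)
    D = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_hidden, D))   # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=n_hidden)        # hidden -> output weights
    b2 = 0.0

    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

    for epoch in range(n_epochs):
        for i in rng.permutation(len(y)):            # SGD: one example at a time, random order
            # Forward pass
            z = W1 @ X[i] + b1                        # hidden pre-activations
            h = sigmoid(z)                            # hidden activations
            f = W2 @ h + b2                           # network output
            # Backward pass (chain rule, from output toward input)
            d_f = 2.0 * (f - y[i])                    # dE/df for E = (y - f)^2
            d_W2 = d_f * h
            d_b2 = d_f
            d_h = d_f * W2
            d_z = d_h * h * (1.0 - h)                 # sigmoid derivative
            d_W1 = np.outer(d_z, X[i])
            d_b1 = d_z
            # Gradient descent step
            W2 -= alpha * d_W2; b2 -= alpha * d_b2
            W1 -= alpha * d_W1; b1 -= alpha * d_b1
    return W1, b1, W2, b2
```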



Multi-Layer Network Demo

http://playground.tensorflow.org/

Neural networks: Pros and cons
• Pros
  • Flexible and general function approximation framework
  • Can build extremely powerful models by adding more layers
• Cons
  • Hard to analyze theoretically (e.g., training is prone to local optima)
  • Huge amounts of training data and computing power may be required to get good performance
  • The space of implementation choices is huge (network architectures, parameters)

Neural networks for images
[Figure: a convolutional layer applies a weight mask to the image to produce a feature map]


Convolution as feature extraction
[Figure: convolving the input with a bank of filters produces a set of feature maps]
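A minimal sketch of convolution as feature extraction: each filter in a small bank is slid over the image to produce one feature map (SciPy's correlate2d performs the sliding-window dot products; the edge filters in the comments are illustrative, not learned):

```python
import numpy as np
from scipy.signal import correlate2d

def conv_layer(image, filters, biases):
    """Apply a bank of filters to a grayscale image, producing one feature map per filter.

    image: (H, W) array; filters: list of (k, k) arrays; biases: list of scalars.
    """
    feature_maps = []
    for w, b in zip(filters, biases):
        response = correlate2d(image, w, mode='valid') + b   # sliding-window dot products
        feature_maps.append(response)
    return np.stack(feature_maps)                             # (num_filters, H-k+1, W-k+1)

# Example: a horizontal and a vertical edge filter (illustrative, not learned)
# img = np.random.rand(8, 8)
# f_h = np.array([[1., 1., 1.], [0., 0., 0.], [-1., -1., -1.]])
# f_v = f_h.T
# maps = conv_layer(img, [f_h, f_v], [0.0, 0.0])
```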

Convolutional Neural Networks
• Neural network with specialized connectivity structure
• Stack multiple stages of feature extractors
• Higher stages compute more global, more invariant features
• Classification layer at the end

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.

Biological inspiration
• D. Hubel and T. Wiesel (1959, 1962, Nobel Prize 1981)
• Visual cortex consists of a hierarchy of simple, complex, and hyper-complex cells

Convolutional Neural Networks
[Pipeline, bottom to top: Input Image → Convolution (Learned) → Non-linearity → Spatial pooling → Normalization → Feature maps]
• Convolution (learned): each filter is convolved with the input to produce a feature map
• Non-linearity: applied to each feature map
• Spatial pooling: max pooling over a local neighborhood
• Normalization: e.g., contrast normalization of the feature maps
• Convolutional filters are trained in a supervised manner by back-propagating classification error
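One forward pass through these stages, sketched in NumPy for a single filter and feature map; the 2x2 max-pooling window and the simple contrast normalization are illustrative choices:

```python
import numpy as np
from scipy.signal import correlate2d

def cnn_stage(image, filt, pool=2, eps=1e-5):
    """One stage: convolution -> non-linearity -> spatial pooling -> normalization."""
    fmap = correlate2d(image, filt, mode='valid')      # convolution (the filter would be learned)
    fmap = np.maximum(0.0, fmap)                        # non-linearity (ReLU)
    H, W = fmap.shape
    H, W = H - H % pool, W - W % pool                   # crop so pooling windows tile evenly
    pooled = fmap[:H, :W].reshape(H // pool, pool, W // pool, pool).max(axis=(1, 3))  # max pooling
    normalized = (pooled - pooled.mean()) / (pooled.std() + eps)   # simple contrast normalization
    return normalized
```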

Simplified architecture
Softmax layer: P(c | x) = exp(w_c · x) / Σ_{k=1}^{C} exp(w_k · x)
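The softmax layer written out in NumPy; subtracting the maximum score is only for numerical stability and does not change the probabilities:

```python
import numpy as np

def softmax_layer(x, W):
    """P(c | x) = exp(w_c . x) / sum_k exp(w_k . x), for a weight matrix W of shape (C, d)."""
    scores = W @ x                         # one score per class
    scores -= scores.max()                 # numerical stability; leaves the probabilities unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()   # class probabilities, summing to 1
```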

Compare: SIFT descriptor (Lowe, IJCV 2004)
[Pipeline: image pixels → apply oriented filters → take max filter response (L-inf normalization) → spatial pool (sum), L2 normalization → feature vector]

Compare: Spatial Pyramid Matching (Lazebnik, Schmid, Ponce, CVPR 2006)
[Pipeline: SIFT features → filter with visual words (k-means) → take max visual word response (L-inf normalization) → multi-scale spatial pool (sum) → global image descriptor]

AlexNet
• Similar framework to LeCun’98 but:
  • Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
  • More data (10^6 vs. 10^3 images)
  • GPU implementation (50x speedup over CPU)
  • Trained on two GPUs for a week

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Using CNN for Image Classification
[Figure: fixed input size 224×224×3 → AlexNet → fully connected layer fc7 (d = 4096) → averaging (d = 4096) → softmax layer, P(c | x) = exp(w_c · x) / Σ_{k=1}^{C} exp(w_k · x); example output class: “Jia-Bin”]

ImageNet Challenge
[Figure: validation classification examples]
• ~14 million labeled images, 20k classes
• Images gathered from Internet
• Human labels via Amazon MTurk
• Challenge: 1.2 million training images, 1000 classes

www.image-net.org/challenges/LSVRC/

ImageNet Challenge 2012-2014

Team                             | Year | Place | Error (top-5) | External data
SuperVision – Toronto (7 layers) | 2012 | -     | 16.4%         | no
SuperVision                      | 2012 | 1st   | 15.3%         | ImageNet 22k
Clarifai – NYU (7 layers)        | 2013 | -     | 11.7%         | no
Clarifai                         | 2013 | 1st   | 11.2%         | ImageNet 22k
VGG – Oxford (16 layers)         | 2014 | 2nd   | 7.32%         | no
GoogLeNet (19 layers)            | 2014 | 1st   | 6.67%         | no
Human expert*                    |      |       | 5.1%          |

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

ImageNet Challenge 2015

Deep Residual Nets

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, arXiv 2015


Deep learning packages
• Caffe
• Torch
• Theano
• TensorFlow
• MatConvNet
• …

http://deeplearning.net/software_links/

Understanding Neural Nets

M. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, arXiv preprint, 2013

Map activations back to the input pixel space: what input pattern originally caused a given activation in the feature maps?

Visualizing and Understanding Convolutional Networks [Zeiler and Fergus, ECCV 2014]
[Figures: visualizations of the input patterns that activate features at Layer 1, Layer 2, Layer 3, and Layers 4 and 5]

Breaking CNNs

http://arxiv.org/abs/1312.6199 http://karpathy.github.io/2015/03/30/breaking-convnets/

Breaking CNNs

http://arxiv.org/abs/1412.1897 http://karpathy.github.io/2015/03/30/breaking-convnets/

What is going on?
• Recall gradient descent training: modify the weights to reduce classifier error:
  w ← w - α ∂E/∂w
• Adversarial examples: modify the image to increase classifier error:
  x ← x + α ∂E/∂x
http://arxiv.org/abs/1412.6572
http://karpathy.github.io/2015/03/30/breaking-convnets/

What is going on?
[Figure: image x, its gradient ∂E/∂x, and the adversarial image x + α ∂E/∂x]
http://arxiv.org/abs/1412.6572
http://karpathy.github.io/2015/03/30/breaking-convnets/

Fooling a linear classifier
• Perceptron weight update: add a small multiple of the example to the weight vector: w ← w + αx
• To fool a linear classifier, add a small multiple of the weight vector to the training example: x ← x + αw
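A NumPy sketch of this trick: repeatedly nudge the image a small step along the target class's weight vector, so its score grows while the image barely changes (step size, number of steps, and pixel clipping are illustrative):

```python
import numpy as np

def fool_linear_classifier(x, w_target, alpha=0.01, n_steps=10):
    """Push image x toward the target class of a linear classifier with weights w_target.

    x: flattened image (D,); w_target: weight vector of the class we want predicted.
    Mirrors the perceptron update, but applied to the image instead of the weights.
    """
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv += alpha * w_target                 # x <- x + alpha * w
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep pixel values in a valid range
    return x_adv

# The target class score w_target . x_adv increases with every step,
# while ||x_adv - x|| stays small if alpha is small.
```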

Fooling a linear classifier

http://karpathy.github.io/2015/03/30/breaking-convnets/

Google DeepDream • Modify the image to maximize activations of units in a given layer

https://github.com/google/deepdream/blob/master/dream.ipynb
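A toy NumPy sketch of the same idea for a single convolutional filter rather than a trained deep network: gradient ascent on the image to maximize the sum of the filter's ReLU responses (the filter, step size, and iteration count are all illustrative):

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

def amplify_activation(image, filt, step=0.1, n_iters=20):
    """Gradient ascent on the image to maximize sum(ReLU(correlate2d(image, filt)))."""
    img = image.copy()
    for _ in range(n_iters):
        act = correlate2d(img, filt, mode='valid')        # filter responses
        mask = (act > 0).astype(float)                     # gradient of sum(ReLU(act)) w.r.t. act
        grad = convolve2d(mask, filt, mode='full')         # back-propagate through the correlation
        img += step * grad / (np.abs(grad).max() + 1e-8)   # normalized ascent step
    return img
```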

Labeling Pixels: Semantic Labels

Pixel-level loss function

Fully Convolutional Networks for Semantic Segmentation [Long et al. CVPR 2015]

Labeling Pixels: Semantic Labels

Transforming fully connected layers into convolution layers enables a classification net to output a heatmap. Adding layers and a spatial loss (as in Figure 1) produces an efficient machine for end-to-end dense learning.
Fully Convolutional Networks for Semantic Segmentation [Long et al. CVPR 2015]

Labeling Pixels: Semantic Labels
Pixel classification is based on multi-level hypercolumns

Fully Convolutional Networks for Semantic Segmentation [Long et al. CVPR 2015]


Labeling Pixels: Edge Detection
[DeepEdge architecture: Canny detects candidate locations → patches extracted at 4 scales → 5-layer AlexNet → averaging → classification branch and regression branch]
DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection [Bertasius et al. CVPR 2015]

Classification vs. Regression

Edge detection results

Forty years of contour detection
Roberts (1965), Sobel (1968), Prewitt (1970), Marr Hildreth (1980), Canny (1986), Perona Malik (1990), Martin Fowlkes Malik (2004), Maire Arbelaez Fowlkes Malik (2008), Dollar Zitnick (2013), Bertasius (2015)

CNN for Image Restoration/Enhancement

Super-resolution [Dong et al. ECCV 2014]

Non-blind deconvolution [Xu et al. NIPS 2014]

Non-uniform blur estimation [Sun et al. CVPR 2015]
