Building high-level features using large scale unsupervised learning

Quoc V. Le
Stanford University and Google

Joint work with: Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg Corrado, Jeff Dean, Andrew Y. Ng

Hierarchy of feature representations

• Face detectors
• Face parts (combinations of edges)
• Edges
• Pixels

Lee et al., 2009. Sparse DBNs.

[Figure: faces vs. random images from the Internet]

Key results

• Face detector
• Human body detector
• Cat detector

Algorithm

[Diagram: stack of three sparse autoencoder (RICA) layers on top of the image]

Each RICA layer = 1 filtering layer + pooling layer + local contrast normalization layer.

See Le et al., NIPS '11 and Le et al., CVPR '11 for applications to action recognition, object recognition, and biomedical imaging.

Very large model -> cannot fit on a single machine -> model parallelism, data parallelism.
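The RICA layer structure above (filtering, then pooling, then local contrast normalization) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the random filter values, the pool size of 2, and the exact subtractive/divisive normalization are all assumptions.

```python
import numpy as np

def rica_layer(x, W, pool_size=2, eps=1e-8):
    """One RICA-style layer on a flattened patch:
    linear filtering, L2 pooling over groups of units,
    then local contrast normalization."""
    h = W @ x                                    # filtering layer
    h = h.reshape(-1, pool_size)                 # group units for pooling
    p = np.sqrt((h ** 2).sum(axis=1) + eps)      # L2 pooling layer
    p = p - p.mean()                             # contrast norm: subtractive...
    return p / np.sqrt(p.var() + eps)            # ...and divisive

rng = np.random.default_rng(0)
x = rng.standard_normal(64)         # a flattened image patch
W = rng.standard_normal((16, 64))   # 16 filters (random here, learned in practice)
out = rica_layer(x, W)              # 8 pooled, normalized responses
```

Stacking three such layers, each trained on the outputs of the one below, gives the architecture in the diagram.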

Local receptive field networks

[Diagram: four machines, each holding the filters for one local region of the image; features are computed in parallel and combined]

Le et al., Tiled Convolutional Neural Networks. NIPS 2010.
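The model-parallel partitioning above can be sketched as follows: each "machine" owns only the filters for its local region of the image, so no machine ever needs the full weight matrix. The strip layout, filter counts, and the sequential loop (standing in for four machines) are illustrative assumptions.

```python
import numpy as np

def local_receptive_fields(image, n_parts=4, filters_per_part=4, seed=0):
    """Model-parallel sketch: split the image into vertical strips,
    one per 'machine'; each machine applies only its own local filters."""
    rng = np.random.default_rng(seed)
    strips = np.split(image, n_parts, axis=1)    # each machine sees one strip
    features = []
    for strip in strips:                         # would run on separate machines
        W = rng.standard_normal((filters_per_part, strip.size))
        features.append(W @ strip.ravel())       # purely local filtering
    return np.concatenate(features)              # combined feature vector

img = np.zeros((8, 8))
feats = local_receptive_fields(img)
```

Because each filter touches only its local receptive field, the weights partition cleanly across machines, which is what makes the very large models on the next slides feasible.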

Asynchronous parallel SGDs

[Diagram: model replicas pushing gradients to, and fetching parameters from, a central parameter server]

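A toy sketch of the asynchronous scheme: workers repeatedly fetch possibly stale parameters, compute a gradient, and push it back; the server applies each update as it arrives. Python threads stand in for machines, and the quadratic objective, learning rate, and `ParameterServer` class are illustrative assumptions, not the production system.

```python
import threading
import numpy as np

class ParameterServer:
    """Toy parameter server: a lock stands in for the network round-trip."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def fetch(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad      # apply update as soon as it arrives

def worker(ps, target, steps):
    for _ in range(steps):
        w = ps.fetch()                    # possibly stale parameters
        grad = 2 * (w - target)           # gradient of ||w - target||^2
        ps.push(grad)

ps = ParameterServer(dim=4)
target = np.ones(4)
threads = [threading.Thread(target=worker, args=(ps, target, 200))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
worker(ps, target, 50)                    # final synchronous pass to settle
```

Despite workers computing gradients on stale parameters, the updates still drive `ps.w` toward the optimum, which is the point of the asynchronous design.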

Training

[Diagram: three stacked sparse autoencoders on top of the image]

• Dataset: 10 million 200x200 unlabeled images from YouTube/Web
• Trained on 1,000 machines (16,000 cores) for 1 week
• 1.15 billion parameters
  – 100x larger than previously reported
  – Small compared to the visual cortex
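The layerwise sparse autoencoder objective being optimized above can be sketched as reconstruction error plus a sparsity penalty. A hedged sketch: the tied-weight decoder, the smooth square-root approximation of the L1 penalty, and the `lam` setting are assumptions for illustration, not the exact objective used at this scale.

```python
import numpy as np

def sparse_ae_loss(W, x, lam=0.1, eps=1e-8):
    """Sparse autoencoder objective: reconstruction error with
    tied weights plus a smooth L1 sparsity penalty on the codes."""
    h = W @ x                                      # encode
    x_hat = W.T @ h                                # decode (tied weights)
    recon = 0.5 * np.sum((x_hat - x) ** 2)         # reconstruction cost
    sparsity = lam * np.sum(np.sqrt(h ** 2 + eps)) # smooth |h| penalty
    return recon + sparsity

x = np.array([1.0, 0.0, 0.0])
loss = sparse_ae_loss(np.eye(3), x)  # perfect reconstruction: only the
                                     # sparsity term remains, about lam * 1
```

Minimizing this with respect to `W` over millions of patches, one layer at a time, is what the 1,000-machine run performs.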

Top stimuli from the test set

Face detector: optimal stimulus via optimization

Face  detector  

Human  body  detector  

Cat  detector  


[Histogram: feature value vs. frequency, faces vs. random distractors]

Invariance properties

[Plots: feature response vs. vertical shifts (0–20 pixels), horizontal shifts (0–20 pixels), 3D rotation angle (0°–90°), and scale factor (0.4x–1.6x)]
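The shift plots above can be reproduced in miniature: evaluate a feature on shifted copies of an image and record the response at each shift. The `invariant_feature` here is a hypothetical stand-in for a learned neuron, chosen so that its response curve is flat under vertical shifts.

```python
import numpy as np

def response_curve(feature, image, shifts):
    """Probe translation invariance: evaluate one scalar-valued
    feature on vertically shifted copies of an image."""
    return [feature(np.roll(image, s, axis=0)) for s in shifts]

# A trivially shift-invariant toy feature: the global mean
# ignores vertical shifts entirely.
invariant_feature = lambda img: float(img.mean())

img = np.arange(16.0).reshape(4, 4)
curve = response_curve(invariant_feature, img, shifts=range(0, 4))
```

A flat curve indicates invariance to that transformation; a learned face neuron produces approximately flat curves over small shifts, rotations, and scalings.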

ImageNet classification

20,000 categories; 16,000,000 images

Prior approaches: hand-engineered features (SIFT, HOG, LBP), spatial pyramids, sparse coding/compression, kernel SVMs

20,000 is a lot of categories…

…
smoothhound, smoothhound shark, Mustelus mustelus
American smooth dogfish, Mustelus canis
Florida smoothhound, Mustelus norrisi
whitetip shark, reef whitetip shark, Triaenodon obesus
Atlantic spiny dogfish, Squalus acanthias
Pacific spiny dogfish, Squalus suckleyi
hammerhead, hammerhead shark
smooth hammerhead, Sphyrna zygaena
smalleye hammerhead, Sphyrna tudes
shovelhead, bonnethead, bonnet shark, Sphyrna tiburo
angel shark, angelfish, Squatina squatina, monkfish
electric ray, crampfish, numbfish, torpedo
smalltooth sawfish, Pristis pectinatus
guitarfish
roughtail stingray, Dasyatis centroura
butterfly ray
eagle ray
spotted eagle ray, spotted ray, Aetobatus narinari
cownose ray, cow-nosed ray, Rhinoptera bonasus
manta, manta ray, devilfish
Atlantic manta, Manta birostris
devil ray, Mobula hypostoma
grey skate, gray skate, Raja batis
little skate, Raja erinacea
…

Stingray

Manta ray

• Random guess: 0.005%
• State-of-the-art (Weston & Bengio '11): 9.5%
• Feature learning from raw pixels: ?

• Random guess: 0.005%
• State-of-the-art (Weston & Bengio '11): 9.5%
• Feature learning from raw pixels: 15.8%

ImageNet 2009 (10k categories): best published result: 17% (Sanchez & Perronnin '11); our method: 19%

Using only 1,000 categories, our method achieves > 50%

[Feature visualizations: Features 1–13]

Conclusions

• RICA learns invariant features
• Face neuron learned from totally unlabeled data, given enough training and data
• State-of-the-art performance on:
  – Action recognition
  – Cancer image classification
  – ImageNet

[Summary slide: face neuron, feature visualization, invariance, action recognition, cancer classification; ImageNet: random guess 0.005%, best published result 9.5%, our method 15.8%]

Joint work with:

Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang

Additional thanks:

Samy Bengio, Zhenghao Chen, Tom Dean, Pangwei Koh, Mark Mao, Jiquan Ngiam, Patrick Nguyen, Andrew Saxe, Mark Segal, Jon Shlens, Vincent Vanhoucke, Xiaoyun Wu, Peng Xe, Serena Yeung, Will Zou

References

• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.

http://ai.stanford.edu/~quocle
