Better Mixing via Deep Representations

Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai
U. Montreal, ICML 2013

MCMC Sampling Challenges
•  Burn-in
   •  Going from an unlikely configuration to likely ones
•  Mixing
   •  Local: auto-correlation between successive samples
   •  Global: mixing between major "modes" (the focus of this talk)
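As a rough illustration of the "local" notion of mixing, the sketch below estimates the lag-1 autocorrelation of a toy Markov chain. The AR(1) chain and the names `ar1_chain` and `lag_autocorrelation` are assumptions made for this example; they stand in for a real sampler and do not come from the talk.

```python
import math
import random

def ar1_chain(rho, n_steps, rng):
    """Toy Markov chain x_{t+1} = rho * x_t + sqrt(1 - rho^2) * noise.

    Its stationary distribution is N(0, 1); rho controls how strongly
    successive samples are correlated (an assumed stand-in for a real sampler).
    """
    x = rng.gauss(0.0, 1.0)
    scale = math.sqrt(1.0 - rho * rho)
    chain = []
    for _ in range(n_steps):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

def lag_autocorrelation(chain, lag):
    """Sample autocorrelation of a 1-D chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var

rng = random.Random(0)
slow = lag_autocorrelation(ar1_chain(0.9, 20000, rng), lag=1)
fast = lag_autocorrelation(ar1_chain(0.1, 20000, rng), lag=1)
print(f"lag-1 autocorrelation: slow chain {slow:.2f}, fast chain {fast:.2f}")
```

A value near 1 means successive samples carry little new information. Low autocorrelation within one mode says nothing about global mixing, which concerns whether the chain moves between modes at all.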

For gradient & inference: More difficult to mix with better trained models
•  Early during training, the density is smeared out and mode bumps overlap
•  Later on, it is hard to cross the empty voids between modes
[Figure: vicious circle between training updates and mixing]
Are we doomed if we rely on MCMC during training? Will we be able to train really large & complex models?
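The effect described on this slide can be reproduced on a toy problem. The sketch below runs random-walk Metropolis on a mixture of two 1-D Gaussians and counts how often the chain changes mode. The target density, the step size, and the function names are all assumptions chosen for illustration, not the models or sampler used in the talk.

```python
import math
import random

def log_mixture_density(x, sigma, center=3.0):
    """Unnormalized log density of two equal Gaussians at -center and +center."""
    a = -((x - center) ** 2) / (2.0 * sigma ** 2)
    b = -((x + center) ** 2) / (2.0 * sigma ** 2)
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))  # stable log(exp(a) + exp(b))

def count_mode_switches(sigma, n_steps=5000, step=1.0, seed=0):
    """Run random-walk Metropolis and count sign changes of the state."""
    rng = random.Random(seed)
    x = 3.0  # start at the right-hand mode
    switches = 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_mixture_density(proposal, sigma) - log_mixture_density(x, sigma)
        if rng.random() < math.exp(min(0.0, log_ratio)):  # Metropolis accept test
            if (proposal > 0) != (x > 0):
                switches += 1
            x = proposal
    return switches

smeared = count_mode_switches(sigma=2.0)  # early training: bumps overlap
sharp = count_mode_switches(sigma=0.3)    # late training: empty void between modes
print("mode switches, smeared density:", smeared)
print("mode switches, sharp density:  ", sharp)
```

With the same proposal, the sharper density yields far fewer mode switches, which is the sense in which a better trained model is harder to sample from.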

Poor Mixing: Depth to the Rescue
•  Sampling from DBNs and stacked Contractive Auto-Encoders:
   1.  MCMC sample from top-level single-layer model
   2.  Propagate top-level representations to input-level repr.
•  Visits modes (classes) faster
[Figure: samples from a 1-layer model (RBM) and a 2-layer model (CAE); layer diagram with x, h1, h2]
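The two-step procedure above can be sketched for a DBN-style stack as follows. This is a minimal sketch under stated assumptions: the weights are random placeholders rather than trained parameters, the top level is a binary RBM over (h1, h2) sampled by Gibbs sweeps, and the downward pass is a deterministic sigmoid layer. All function and variable names are hypothetical, and a stacked contractive auto-encoder would use a different top-level sampler.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, vec, bias):
    """Return weights @ vec + bias for a row-major list-of-lists matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def sample_bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def gibbs_step_top(h1, W2, b_h1, b_h2, rng):
    """One Gibbs sweep in the top-level RBM over (h1, h2)."""
    h2 = sample_bernoulli([sigmoid(a) for a in affine(W2, h1, b_h2)], rng)
    h1 = sample_bernoulli([sigmoid(a) for a in affine(transpose(W2), h2, b_h1)], rng)
    return h1, h2

def propagate_down(h1, W1, b_x):
    """Map a layer-1 representation to input space (mean activations)."""
    return [sigmoid(a) for a in affine(transpose(W1), h1, b_x)]

rng = random.Random(0)
n_x, n_h1, n_h2 = 6, 4, 3
# Random placeholder parameters; a real model would use trained weights.
W1 = [[rng.gauss(0, 1) for _ in range(n_x)] for _ in range(n_h1)]   # shape (n_h1, n_x)
W2 = [[rng.gauss(0, 1) for _ in range(n_h1)] for _ in range(n_h2)]  # shape (n_h2, n_h1)
b_x, b_h1, b_h2 = [0.0] * n_x, [0.0] * n_h1, [0.0] * n_h2

h1 = sample_bernoulli([0.5] * n_h1, rng)
samples = []
for _ in range(5):
    h1, h2 = gibbs_step_top(h1, W2, b_h1, b_h2, rng)  # step 1: MCMC at the top level
    samples.append(propagate_down(h1, W1, b_x))        # step 2: propagate to input level

for x in samples:
    print([round(v, 2) for v in x])
```

The Markov chain only ever runs in the top-level space; input-level samples are a by-product of each top-level state.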

More Modes Visited Per Run
•  Nearest-neighbor classifier associates a class to each sample
•  Count how many different classes are visited in a short MCMC run
•  Deeper nets visit modes (classes) faster
[Figure: # classes visited, Toronto Face Database]
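The counting procedure on this slide can be sketched in a few lines. The 2-D points, the labels, and the two example chains below are made up for illustration; the talk applies the idea to samples from models trained on face images.

```python
def nearest_neighbor_label(sample, train_points, train_labels):
    """Label of the training point closest to `sample` (squared Euclidean distance)."""
    best = min(
        range(len(train_points)),
        key=lambda i: sum((s - t) ** 2 for s, t in zip(sample, train_points[i])),
    )
    return train_labels[best]

def count_classes_visited(chain, train_points, train_labels):
    """Number of distinct classes assigned to the samples of one MCMC run."""
    return len({nearest_neighbor_label(s, train_points, train_labels) for s in chain})

# Hypothetical 2-D data: one labeled training point per class.
train_points = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
train_labels = ["a", "b", "c"]

stuck_chain = [(0.1, -0.2), (0.3, 0.1), (-0.2, 0.4)]   # stays near class "a"
mixing_chain = [(0.1, -0.2), (4.8, 5.1), (9.7, 0.3)]   # reaches all three classes

print(count_classes_visited(stuck_chain, train_points, train_labels))   # 1
print(count_classes_visited(mixing_chain, train_points, train_labels))  # 3
```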

Space-Filling in Representation-Space
•  High-probability samples fill more of the convex set between them when viewed in the learned representation-space, making the empirical distribution more uniform and unfolding manifolds
[Figure: linear interpolation in pixel space, at layer 1, and at layer 2; labels: 9's manifold, 3's manifold]
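A toy example may help show why interpolating in a representation space can stay on the data manifold while interpolating in input space does not. The sketch assumes the data lie on a unit circle and uses a hand-written encoder (the angle) in place of a learned representation; `encode`, `decode`, and the interpolation helpers are hypothetical names for this illustration only.

```python
import math

def encode(point):
    """Hand-written 'representation': the angle of a point on the unit circle."""
    x, y = point
    return math.atan2(y, x)

def decode(angle):
    """Map an angle back to a point on the unit circle."""
    return (math.cos(angle), math.sin(angle))

def lerp(a, b, t):
    return (1.0 - t) * a + t * b

def interpolate_input_space(p, q, t):
    return tuple(lerp(a, b, t) for a, b in zip(p, q))

def interpolate_representation_space(p, q, t):
    return decode(lerp(encode(p), encode(q), t))

p, q = decode(0.2), decode(2.5)  # two points on the data manifold (unit circle)
mid_input = interpolate_input_space(p, q, 0.5)
mid_repr = interpolate_representation_space(p, q, 0.5)

print("distance from origin, input-space midpoint:", round(math.hypot(*mid_input), 3))
print("distance from origin, representation-space midpoint:", round(math.hypot(*mid_repr), 3))
```

The input-space midpoint falls well inside the circle, off the manifold, while the representation-space midpoint decodes to a point on it. In the talk the representation is learned, so the unfolding is only approximate.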

Poor Mixing: Depth to the Rescue
•  Deeper representations → abstractions → disentangling
•  E.g. reverse video bit, class bits in learned representations: easy to Gibbs sample between modes at abstract level
•  Hypotheses:
   •  more abstract/disentangled representations unfold manifolds and fill more of the space
   •  this can be exploited for better mixing between modes
[Figure: 9's manifold and 3's manifold in pixel space vs. representation space]

