Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai
U. Montreal
ICML 2013
MCMC Sampling Challenges
• Burn-in: going from an unlikely configuration to likely ones
• Mixing
  • Local: autocorrelation between successive samples
  • Global: mixing between major "modes" (the focus of this talk)
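Local mixing is commonly quantified by the autocorrelation of a statistic along the chain. Below is a minimal NumPy sketch. It assumes a toy AR(1) process stands in for MCMC output. The function names `autocorrelation` and `ar1_chain` and the coefficients are hypothetical and only for illustration.

```python
import numpy as np

def autocorrelation(chain: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of a 1-D chain at a given lag (lag >= 1)."""
    x = chain - chain.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def ar1_chain(rho: float, n: int, seed: int = 0) -> np.ndarray:
    """Toy stand-in for MCMC output: x_t = rho * x_{t-1} + Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

slow = ar1_chain(rho=0.99, n=20_000)  # strongly autocorrelated: poor local mixing
fast = ar1_chain(rho=0.10, n=20_000)  # nearly independent successive samples
print(round(autocorrelation(slow, 10), 3), round(autocorrelation(fast, 10), 3))
```

A high lag-10 autocorrelation means successive samples carry little new information. Global mixing between modes is a separate issue that this statistic does not capture on its own.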
For gradient & inference: it is more difficult to mix with better-trained models
• Early during training, the density is smeared out and the mode bumps overlap
• Later on, it is hard to cross the empty voids between modes
[Figure: vicious circle between training updates and mixing]
Are we doomed if we rely on MCMC during training? Will we be able to train really large & complex models?
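The vicious circle can be illustrated with a 1-D toy. It assumes an equal mixture of two Gaussians stands in for the model density. When "training" sharpens and separates the modes, a random-walk Metropolis chain crosses between them far less often. All parameter values and function names below are hypothetical.

```python
import numpy as np

def log_density(x: float, mu: float, sigma: float) -> float:
    """Unnormalized log-density of an equal mixture of N(-mu, sigma^2) and N(mu, sigma^2)."""
    return float(np.logaddexp(-0.5 * ((x - mu) / sigma) ** 2,
                              -0.5 * ((x + mu) / sigma) ** 2))

def count_mode_switches(mu: float, sigma: float, n_steps: int = 5000,
                        step: float = 0.5, seed: int = 0) -> int:
    """Random-walk Metropolis started in the right mode; count sign changes (mode crossings)."""
    rng = np.random.default_rng(seed)
    x, switches = mu, 0
    logp = log_density(x, mu, sigma)
    for _ in range(n_steps):
        proposal = x + step * rng.normal()
        logp_new = log_density(proposal, mu, sigma)
        if np.log(rng.uniform()) < logp_new - logp:  # Metropolis acceptance test
            if np.sign(proposal) != np.sign(x):
                switches += 1
            x, logp = proposal, logp_new
    return switches

early = count_mode_switches(mu=1.0, sigma=1.0)  # "early training": bumps overlap
late = count_mode_switches(mu=4.0, sigma=0.5)   # "later": modes separated by a void
print(early, late)
```

The same local proposal that moves freely in the smeared-out density is effectively trapped once a low-probability void separates the modes.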
Poor Mixing: Depth to the Rescue
• Sampling from DBNs and stacked Contractive Auto-Encoders:
  1. MCMC sample from the top-level single-layer model
  2. Propagate top-level representations down to the input-level representation
• Visits modes (classes) faster
[Figure: layers x, h1, h2; sample panels labeled "1-layer (RBM)" and "2-layer (CAE)"]
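Below is a minimal NumPy sketch of the two-step procedure for the DBN case. It assumes a binary top-level RBM over (h1, h2) and a sigmoid generative layer from h1 to x. The layer sizes and random weights are hypothetical placeholders for trained parameters. A stacked CAE would use its own top-level sampling chain in place of the RBM Gibbs step.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical parameters; in practice these come from layer-wise training.
n_x, n_h1, n_h2 = 16, 8, 4
W2 = rng.normal(scale=0.5, size=(n_h1, n_h2))  # top-level RBM between h1 and h2
b_h1, b_h2 = np.zeros(n_h1), np.zeros(n_h2)
W1 = rng.normal(scale=0.5, size=(n_x, n_h1))   # generative weights h1 -> x
b_x = np.zeros(n_x)

def gibbs_step(h1: np.ndarray) -> np.ndarray:
    """One block-Gibbs step h1 -> h2 -> h1 in the top-level RBM."""
    h2 = (rng.uniform(size=n_h2) < sigmoid(h1 @ W2 + b_h2)).astype(float)
    return (rng.uniform(size=n_h1) < sigmoid(W2 @ h2 + b_h1)).astype(float)

def propagate_down(h1: np.ndarray) -> np.ndarray:
    """Map a top-level sample to input space using mean activations (no sampling)."""
    return sigmoid(W1 @ h1 + b_x)

h1 = (rng.uniform(size=n_h1) < 0.5).astype(float)
samples = []
for _ in range(100):
    h1 = gibbs_step(h1)                  # step 1: MCMC in the top-level model
    samples.append(propagate_down(h1))   # step 2: propagate to input level
samples = np.array(samples)
print(samples.shape)
```

The chain itself only ever moves in the top-level representation. Input-space samples are read off by the downward pass.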
More Modes Visited Per Run
• A nearest-neighbor classifier associates a class with each sample
• Count how many different classes are visited in a short MCMC run
• Deeper nets visit modes (classes) faster
[Figure: Toronto Face Database; y-axis: # classes visited]
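The counting protocol can be sketched in a few lines. The code labels each sample with the class of its nearest labeled reference example (squared Euclidean distance) and then counts distinct labels over the run. The 2-D reference set and both chains are hypothetical toy data, not the Toronto Face Database.

```python
import numpy as np

def classes_visited(samples: np.ndarray, ref_x: np.ndarray, ref_y: np.ndarray) -> int:
    """1-nearest-neighbor label for each sample; return the number of distinct labels."""
    # Pairwise squared distances, shape (n_samples, n_reference).
    d = ((samples[:, None, :] - ref_x[None, :, :]) ** 2).sum(axis=-1)
    labels = ref_y[d.argmin(axis=1)]
    return int(len(np.unique(labels)))

# Hypothetical labeled reference set: one prototype per class.
ref_x = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
ref_y = np.array([0, 1, 2])

stuck_chain = np.array([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]])  # stays in one mode
mixing_chain = np.array([[0.1, 0.2], [9.0, 1.0], [1.0, 9.0]])   # visits three modes
print(classes_visited(stuck_chain, ref_x, ref_y),
      classes_visited(mixing_chain, ref_x, ref_y))  # prints: 1 3
```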
Space-Filling in Representation Space
• High-probability samples fill more of the convex set between them when viewed in the learned representation space, making the empirical distribution more uniform and unfolding manifolds
[Figure: linear interpolation between the 9's manifold and the 3's manifold at layer 2, at layer 1, and in pixel space]
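The interpolation argument can be illustrated with a toy analogy rather than the actual experiment. In this sketch, a 1-D manifold (the unit circle) embedded in 2-D plays the role of "pixel space." The angle plays the role of a representation that has unfolded it. Both `encode` and `decode` are hypothetical stand-ins for learned maps.

```python
import numpy as np

def encode(x: np.ndarray) -> float:
    """Hypothetical 'learned' representation: the angle that unfolds the circle."""
    return float(np.arctan2(x[1], x[0]))

def decode(theta: float) -> np.ndarray:
    """Map a representation back to 'pixel space' (a point on the unit circle)."""
    return np.array([np.cos(theta), np.sin(theta)])

x_a = decode(0.0)        # two high-probability points on the manifold
x_b = decode(np.pi / 2)

alphas = np.linspace(0.0, 1.0, 5)
pixel_path = [(1 - a) * x_a + a * x_b for a in alphas]
repr_path = [decode((1 - a) * encode(x_a) + a * encode(x_b)) for a in alphas]

# Distance of each interpolant from the manifold (the unit circle).
pixel_off = [abs(np.linalg.norm(p) - 1.0) for p in pixel_path]
repr_off = [abs(np.linalg.norm(p) - 1.0) for p in repr_path]
print([round(v, 3) for v in pixel_off])
print([round(v, 3) for v in repr_off])
```

Interpolating in pixel space cuts through the low-probability interior. Interpolating in the unfolded coordinate keeps every intermediate point on the manifold, so the convex set between samples is "filled."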
Poor Mixing: Depth to the Rescue
• Deeper representations → abstractions → disentangling
• E.g. reverse-video bit, class bits in learned representations: easy to Gibbs sample between modes at the abstract level
• Hypotheses:
  • more abstract/disentangled representations unfold manifolds and fill more of the space
  • this can be exploited for better mixing between modes
[Figure: 9's manifold and 3's manifold, shown in pixel space and in representation space]
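The reverse-video example can be made concrete with a hand-built (not learned) representation. This sketch assumes the representation consists of content bits plus one hypothetical reverse-video unit. Flipping that single unit, which a single-site Gibbs update could do in one step, changes every pixel of the decoded image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical abstract representation h = [content bits..., reverse-video bit].
content = (rng.uniform(size=64) < 0.2).astype(int)  # a sparse binary "image"
h_normal = np.append(content, 0)
h_reversed = np.append(content, 1)                  # one flip of the top unit

def decode(h: np.ndarray) -> np.ndarray:
    """Render a binary image: the last unit inverts every pixel (reverse video)."""
    return h[:-1] ^ h[-1]

repr_distance = int(np.sum(h_normal != h_reversed))
pixel_distance = int(np.sum(decode(h_normal) != decode(h_reversed)))
print(repr_distance, pixel_distance)  # prints: 1 64
```

Two modes that are maximally far apart in pixel space are neighbors at the abstract level. A pixel-space chain would have to traverse many low-probability intermediate images to get from one to the other.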