QCD-aware Recursive Neural Networks for Jet Physics arXiv:1702.00748 Gilles Louppe, Kyunghyun Cho, Cyril Becot, Kyle Cranmer

A machine learning perspective

Kyle's talks on QCD-aware recursive nets:

• Theory Colloquium, CERN, May 24, https://indico.cern.ch/event/640111/

• DS@HEP 2017, Fermilab, May 10, https://indico.fnal.gov/conferenceDisplay.py?confId=13497

• Jet substructure and jet-by-jet tagging, CERN, April 20, https://indico.cern.ch/event/633469/

• Statistics and ML forum, CERN, February 14, https://indico.cern.ch/event/613874/contributions/2476427/

Today: the inner mechanisms of recursive nets for jet physics.

Credits: LeCun et al., 2015

Neural networks 101

Goal = Function approximation
• Learn a map from x to y based solely on observed pairs
• Potentially non-linear map from x to y
• x and y are fixed-dimensional vectors

Model = Multi-layer perceptron (MLP)
• Parameterized composition f(·; θ) of non-linear transformations
• Stacking transformation layers allows one to learn (almost) arbitrary, highly non-linear mappings
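
As a minimal sketch of "a parameterized composition of non-linear transformations", the snippet below evaluates a small MLP in plain Python. The layer widths, the ReLU non-linearity, the Gaussian initialization and the linear last layer are illustrative assumptions, not choices taken from the talk.

```python
import math
import random


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def affine(W, b, x):
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng):
    """Random Gaussian weights scaled by 1/sqrt(n_in), zero biases."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
    return W, [0.0] * n_out


def mlp(x, theta):
    """f(x; theta): a composition of affine maps and ReLUs (last layer kept linear)."""
    h = x
    for i, (W, b) in enumerate(theta):
        h = affine(W, b, h)
        if i < len(theta) - 1:
            h = relu(h)
    return h


rng = random.Random(0)
sizes = [4, 16, 16, 2]  # hypothetical layer widths: x in R^4, y in R^2
theta = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y_hat = mlp([0.5, -1.0, 2.0, 0.1], theta)
print(len(y_hat))  # 2
```

θ is simply the list of (W, b) pairs, so "stacking layers" amounts to making that list longer.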

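Learning

• Learning by optimization
• Cost function

$$J(\theta; \mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} \ell(y_i, f(x_i; \theta))$$

• Stochastic gradient descent optimization

$$\theta_m := \theta_{m-1} - \eta \nabla_\theta J(\theta_{m-1}; \mathcal{B}_m)$$

where $\mathcal{B}_m \subset \mathcal{D}$ is a random subset (mini-batch) of $\mathcal{D}$.

How does one derive $\nabla_\theta J(\theta)$?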
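
The update rule can be illustrated on a toy problem. The sketch below assumes a linear model f(x; θ) = θ₀ + θ₁x, a squared-error loss ℓ and synthetic data. All of these are illustrative choices. For such a simple model the gradient can be written by hand, and each step uses a fresh random mini-batch B_m.

```python
import random


def f(x, theta):
    """Assumed toy model: a straight line."""
    return theta[0] + theta[1] * x


def cost(theta, batch):
    """J(theta; B) = (1/|B|) * sum of squared errors over the batch."""
    return sum((y - f(x, theta)) ** 2 for x, y in batch) / len(batch)


def grad(theta, batch):
    """Hand-derived gradient of the squared-error cost for the linear model."""
    g0 = g1 = 0.0
    for x, y in batch:
        r = f(x, theta) - y
        g0 += 2 * r
        g1 += 2 * r * x
    return [g0 / len(batch), g1 / len(batch)]


def sgd(data, eta=0.05, batch_size=8, steps=2000, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        batch = rng.sample(data, batch_size)        # B_m: random subset of D
        g = grad(theta, batch)
        theta = [t - eta * gi for t, gi in zip(theta, g)]
    return theta


rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
D = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]  # synthetic data
theta = sgd(D)
print(theta)  # close to [1.0, 3.0]
```

Writing `grad` by hand only works for toy models. The question at the end of the slide, and the next slides, are about obtaining it automatically.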

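Credits: Goodfellow et al., 2016, Section 6.5.

Computational graphs

$$f(x; \theta = (W^{(1)}, W^{(2)})) = W^{(2)} \operatorname{relu}(W^{(1)} x)$$

(simplified 1-layer MLP)

$$J(\theta = (W^{(1)}, W^{(2)})) = J_{\text{MLE}} + \lambda \left( \sum_{i,j} \left(W^{(1)}_{i,j}\right)^2 + \sum_{i,j} \left(W^{(2)}_{i,j}\right)^2 \right)$$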
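
A minimal sketch of evaluating this graph node by node is shown below. Every intermediate value is stored because the backward pass will need it. The slide does not fix J_MLE, so a squared error ½(f − y)² with a scalar output is assumed here. The node names are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def forward(x, y, W1, W2, lam):
    """Evaluate the graph node by node and keep every intermediate value."""
    nodes = {}
    nodes["W1x"] = matvec(W1, x)
    nodes["h"] = [max(0.0, a) for a in nodes["W1x"]]        # relu(W1 x)
    nodes["f"] = matvec(W2, nodes["h"])[0]                 # W2 relu(W1 x), scalar output
    nodes["J_MLE"] = 0.5 * (nodes["f"] - y) ** 2           # assumed squared-error data term
    nodes["sq1"] = sum(w * w for row in W1 for w in row)   # sum_ij (W1_ij)^2
    nodes["sq2"] = sum(w * w for row in W2 for w in row)   # sum_ij (W2_ij)^2
    nodes["reg"] = lam * (nodes["sq1"] + nodes["sq2"])     # weight-decay term
    nodes["J"] = nodes["J_MLE"] + nodes["reg"]
    return nodes


nodes = forward(x=[2.0, 1.0], y=1.0,
                W1=[[1.0, -1.0], [0.5, 0.5]], W2=[[2.0, -1.0]], lam=0.01)
print(nodes["f"], nodes["J"])
```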

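Backpropagation

• Backpropagation = efficient computation of $\nabla_\theta J(\theta)$
• Implementation of the chain rule for the (total) derivatives
• Applied recursively, walking the computational graph backward from outputs to inputs

$$\frac{dJ}{dW^{(1)}} = \frac{\partial J}{\partial J_{\text{MLE}}} \frac{dJ_{\text{MLE}}}{dW^{(1)}} + \frac{\partial J}{\partial u^{(8)}} \frac{du^{(8)}}{dW^{(1)}}$$

$$\frac{dJ_{\text{MLE}}}{dW^{(1)}} = \dots \quad \text{(recursive case)}$$

$$\frac{du^{(8)}}{dW^{(1)}} = \dots \quad \text{(recursive case)}$$

Here $u^{(8)}$ is the graph node holding the weight-decay term, so that $J = J_{\text{MLE}} + u^{(8)}$.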
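
The two paths in the equation above can be traced explicitly. The sketch below uses the same assumed squared-error J_MLE as a plain-Python stand-in for the graph. It runs the forward pass, then walks back from J: a data path through relu and W⁽²⁾, plus a weight-decay path contributing 2λW. A central finite-difference check confirms the result.

```python
def loss_and_grads(x, y, W1, W2, lam):
    """Forward pass, then a backward pass applying the chain rule node by node."""
    # forward pass: J = J_MLE + lam * (sum W1^2 + sum W2^2)
    a = [sum(w * xj for w, xj in zip(row, x)) for row in W1]      # W1 x
    h = [max(0.0, ai) for ai in a]                                # relu
    f = sum(w * hj for w, hj in zip(W2[0], h))                    # scalar output
    J_mle = 0.5 * (f - y) ** 2                                    # assumed squared error
    reg = lam * (sum(w * w for row in W1 for w in row) + sum(w * w for w in W2[0]))
    J = J_mle + reg
    # backward pass, from the output J back to the parameters
    dJ_df = f - y                               # dJ/dJ_MLE = 1, times dJ_MLE/df
    dJ_dh = [dJ_df * w for w in W2[0]]
    dJ_da = [g if ai > 0 else 0.0 for g, ai in zip(dJ_dh, a)]    # relu'(a)
    # total derivative = data path + weight-decay path (2 * lam * W)
    dW1 = [[ga * xj + 2 * lam * w for xj, w in zip(x, row)] for ga, row in zip(dJ_da, W1)]
    dW2 = [[dJ_df * hj + 2 * lam * w for hj, w in zip(h, W2[0])]]
    return J, dW1, dW2


def numerical_dW1(x, y, W1, W2, lam, eps=1e-6):
    """Central finite differences, used only to check the backward pass."""
    grads = []
    for i in range(len(W1)):
        row = []
        for j in range(len(W1[i])):
            Wp = [r[:] for r in W1]
            Wm = [r[:] for r in W1]
            Wp[i][j] += eps
            Wm[i][j] -= eps
            Jp = loss_and_grads(x, y, Wp, W2, lam)[0]
            Jm = loss_and_grads(x, y, Wm, W2, lam)[0]
            row.append((Jp - Jm) / (2 * eps))
        grads.append(row)
    return grads


x, y, W1, W2, lam = [2.0, 1.0], 1.0, [[1.0, -1.0], [0.5, 0.5]], [[2.0, -1.0]], 0.01
J, dW1, dW2 = loss_and_grads(x, y, W1, W2, lam)
print(dW1)
print(numerical_dW1(x, y, W1, W2, lam))
```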

Recurrent networks

Setup
• Sequence $x = (x_1, x_2, \dots, x_\tau)$, e.g., a sentence given as a chain of words
• The length of each sequence may vary

Model = Recurrent network
• Compress x into a single vector by recursively applying an MLP with shared weights on the sequence, then compute the output.
• $h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$
• $o = g(h^{(\tau)}; \theta)$

How does one backpropagate through the cycle?
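
A minimal sketch of the recurrence follows. It assumes that f is a single tanh layer on [h^(t−1); x^(t)] and that g is a linear read-out. The same weights are reused at every step, so sequences of different lengths all end in a hidden vector of the same size. The sizes and the initialization are hypothetical.

```python
import math
import random


def rnn_encode(xs, W_hh, W_xh, b, w_o):
    """Fold a variable-length sequence into h(tau), then compute o = g(h(tau))."""
    h = [0.0] * len(b)                          # h(0): initial state
    for x in xs:                                # the same weights are reused at every step
        h = [math.tanh(sum(W_hh[i][j] * h[j] for j in range(len(h)))
                       + sum(W_xh[i][k] * x[k] for k in range(len(x)))
                       + b[i])
             for i in range(len(b))]
    o = sum(w * hi for w, hi in zip(w_o, h))    # assumed linear read-out g
    return h, o


rng = random.Random(0)
H, D_IN = 3, 2  # hypothetical hidden and input sizes
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(H)]
W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(D_IN)] for _ in range(H)]
b = [0.0] * H
w_o = [rng.uniform(-0.5, 0.5) for _ in range(H)]

seq_short = [[1.0, 0.0], [0.0, 1.0]]
seq_long = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0]]
print(rnn_encode(seq_short, W_hh, W_xh, b, w_o))
print(rnn_encode(seq_long, W_hh, W_xh, b, w_o))
```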

Credits: Goodfellow et al., 2016, Section 10.2.

Backpropagation through time
• Unroll the recurrent computational graph through time
• Backprop through this graph to derive gradients

[Figure: recurrent computational graph, unrolled through time]
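
To make "unroll, then backprop" concrete, here is a sketch for an assumed scalar recurrence h_t = tanh(w·h_{t−1} + u·x_t) with loss ½(h_τ − y)². The forward pass stores every h_t, which is the unrolled graph. The backward pass walks it from t = τ down to 1. The gradients of the shared weights w and u accumulate one contribution per time step. A finite-difference check is included.

```python
import math


def forward(w, u, xs, y):
    """Unrolled forward pass: h_t = tanh(w h_{t-1} + u x_t), loss on h_tau."""
    hs = [0.0]                                   # h_0
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return 0.5 * (hs[-1] - y) ** 2, hs


def bptt(w, u, xs, y):
    """Backpropagation through time on the unrolled graph."""
    loss, hs = forward(w, u, xs, y)
    dw = du = 0.0
    dh = hs[-1] - y                              # dL/dh_tau
    for t in range(len(xs), 0, -1):              # walk the unrolled graph backwards
        da = dh * (1.0 - hs[t] ** 2)             # through tanh
        dw += da * hs[t - 1]                     # shared weight: accumulate over time steps
        du += da * xs[t - 1]
        dh = da * w                              # pass the gradient on to h_{t-1}
    return loss, dw, du


w, u, xs, y = 0.8, -0.6, [1.0, 0.5, -0.3, 2.0], 0.25
loss, dw, du = bptt(w, u, xs, y)
eps = 1e-6
dw_num = (forward(w + eps, u, xs, y)[0] - forward(w - eps, u, xs, y)[0]) / (2 * eps)
print(dw, dw_num)
```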

This principle generalizes to any kind of (recursive or iterative) computation that can be unrolled into a directed acyclic computational graph.

(That is, to any program!)
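
A tiny reverse-mode autodiff shows the principle in miniature. Each arithmetic operation on a `Var` records its inputs and local derivatives. An ordinary Python loop therefore leaves behind a directed acyclic graph, which `backward` traverses in reverse topological order. The `Var` class is a hypothetical teaching sketch and does not reproduce the internals of any particular library.

```python
import math


class Var:
    """A scalar node in a computational graph that is recorded as the code runs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.value)
        return Var(t, ((self, 1.0 - t * t),))

    def backward(self):
        # depth-first post-order: every node is listed after all of its parents
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):             # outputs first, inputs last
            for parent, local in node.parents:
                parent.grad += local * node.grad  # chain rule, summed over all uses


def program(w, x, steps):
    h = Var(0.0)
    for _ in range(steps):        # an ordinary loop...
        h = (w * h + x).tanh()    # ...recorded as a chain of graph nodes
    return h


w = Var(0.5)
out = program(w, 1.0, steps=10)
out.backward()
print(out.value, w.grad)
```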

Credits: Goodfellow et al., 2016, Section 10.6.

Recursive networks

Setup
• x is structured as a tree, e.g., a sentence and its parse tree
• The topology of each training input may vary

Model = Recursive network
• Compress x into a single vector by recursively applying an MLP with shared weights on the tree, then compute the output.
• Hidden state at node t:

$$h^{(t)} = \begin{cases} v(x^{(t)}; \theta) & \text{if } t \text{ is a leaf} \\ f(h^{(t_{\text{left}})}, h^{(t_{\text{right}})}; \theta) & \text{otherwise} \end{cases}$$

• Output, computed from the root node 0: $o = g(h^{(0)}; \theta)$
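
The recursion can be sketched in a few lines. In this sketch, a leaf is a Python list of features and an internal node is a (left, right) tuple. This encoding is a hypothetical choice. v and f are assumed to be single tanh layers and g a logistic read-out. The same θ is applied to trees of any shape.

```python
import math
import random


def dense(W, b, x):
    """tanh(W x + b): one shared-weight layer."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi) for row, bi in zip(W, b)]


def embed(tree, theta):
    """h(t) = v(x(t)) at a leaf, f(h(t_left), h(t_right)) at an internal node."""
    W_v, b_v, W_f, b_f = theta
    if isinstance(tree, list):                      # leaf: its feature vector x(t)
        return dense(W_v, b_v, tree)
    left, right = tree                              # internal node: (left, right) pair
    return dense(W_f, b_f, embed(left, theta) + embed(right, theta))  # concatenation


def predict(tree, theta, w_g):
    """o = g(h(root)), with an assumed logistic read-out g."""
    h_root = embed(tree, theta)
    return 1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(w_g, h_root))))


rng = random.Random(0)
D_IN, D_H = 2, 3  # hypothetical input and embedding sizes


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


theta = (rand_matrix(D_H, D_IN), [0.0] * D_H, rand_matrix(D_H, 2 * D_H), [0.0] * D_H)
w_g = [rng.uniform(-1.0, 1.0) for _ in range(D_H)]

small = ([1.0, 0.0], [0.0, 1.0])                    # 2 leaves
deeper = (([1.0, 0.0], [0.5, 0.5]), [0.0, 1.0])     # 3 leaves, different topology
print(predict(small, theta, w_g), predict(deeper, theta, w_g))
```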

Credits: pytorch.org/about

Dynamic computational graphs

• Most frameworks (TensorFlow, Theano, Caffe or CNTK) assume a static computational graph.
• Define-by-run frameworks instead record the computational graph dynamically, on the fly, as code executes, and apply reverse-mode auto-differentiation to it. One can change how the network behaves (e.g., depending on the input topology) arbitrarily, without rebuilding a static graph. Available in autograd, Chainer, PyTorch or DyNet.
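
Here is a sketch of the define-by-run idea using a tape, also known as a Wengert list. Each operation appends a record as it executes. The graph for a given jet or sentence is therefore simply whatever the recursive code did for that input. The hypothetical `Tape` class and the scalar recursive net (leaf: tanh(a·x), internal: tanh(wl·hL + wr·hR)) are illustrative assumptions, not any library's API. Two trees of different shapes produce tapes of different lengths from the same code.

```python
import math


class Tape:
    """Records operations in execution order; one fresh tape per input."""

    def __init__(self):
        self.values = []
        self.records = []  # (output_index, [(input_index, local_derivative), ...])

    def var(self, value):
        self.values.append(value)
        return len(self.values) - 1

    def op(self, value, inputs):
        idx = self.var(value)
        self.records.append((idx, inputs))
        return idx

    def add(self, a, b):
        return self.op(self.values[a] + self.values[b], [(a, 1.0), (b, 1.0)])

    def mul(self, a, b):
        va, vb = self.values[a], self.values[b]
        return self.op(va * vb, [(a, vb), (b, va)])

    def tanh(self, a):
        t = math.tanh(self.values[a])
        return self.op(t, [(a, 1.0 - t * t)])

    def gradients(self, out):
        grads = [0.0] * len(self.values)
        grads[out] = 1.0
        for idx, inputs in reversed(self.records):   # replay the tape backwards
            for i, local in inputs:
                grads[i] += local * grads[idx]
        return grads


def embed(tape, tree, params):
    """Scalar recursive net; the control flow follows the input's topology."""
    a, wl, wr = params
    if isinstance(tree, float):                      # leaf
        return tape.tanh(tape.mul(a, tape.var(tree)))
    hl = embed(tape, tree[0], params)
    hr = embed(tape, tree[1], params)
    return tape.tanh(tape.add(tape.mul(wl, hl), tape.mul(wr, hr)))


def grad_for(tree, a=0.7, wl=0.4, wr=-0.3):
    tape = Tape()
    params = (tape.var(a), tape.var(wl), tape.var(wr))
    out = embed(tape, tree, params)
    g = tape.gradients(out)
    return tape.values[out], [g[p] for p in params], len(tape.records)


tree1 = (1.0, 2.0)
tree2 = ((1.0, 2.0), (0.5, -1.0))
print(grad_for(tree1)[2], grad_for(tree2)[2])  # tape lengths differ per input
```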

Credits: Neubig et al., 2017

Operation batching

• Distinct per-sample topologies make it difficult to vectorize operations.
• However, in the case of trees, computations can be performed in batch level-wise, from bottom to top.

[Figure: On-the-fly operation batching (in DyNet)]
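
Level-wise batching can be sketched as follows. Each node across a whole batch of trees is assigned its height, with leaves at 0 and a parent at 1 + the maximum of its children's heights. Nodes of equal height never depend on one another, so each level can be processed as one batched operation. The list comprehension below stands in for a single vectorized kernel. The tree encoding (float leaves, (left, right) tuples) and the scalar node functions are hypothetical, reused only for illustration.

```python
import math
from collections import defaultdict


def schedule(trees):
    """Flatten a batch of trees and group node ids by height (leaves = 0)."""
    nodes, heights, roots = [], [], []

    def visit(t):
        if isinstance(t, float):
            nodes.append((t, None, None))
            heights.append(0)
        else:
            left, right = visit(t[0]), visit(t[1])
            nodes.append((None, left, right))
            heights.append(1 + max(heights[left], heights[right]))
        return len(nodes) - 1

    for t in trees:
        roots.append(visit(t))
    levels = defaultdict(list)
    for i, h in enumerate(heights):
        levels[h].append(i)
    return nodes, [levels[h] for h in sorted(levels)], roots


def evaluate_batched(trees, a=0.7, wl=0.4, wr=-0.3):
    nodes, levels, roots = schedule(trees)
    h = [0.0] * len(nodes)
    for k, level in enumerate(levels):        # one batched call per level, bottom to top
        if k == 0:                            # level 0 holds only leaves
            batch = [math.tanh(a * nodes[i][0]) for i in level]
        else:                                 # children sit at lower levels: already computed
            batch = [math.tanh(wl * h[nodes[i][1]] + wr * h[nodes[i][2]]) for i in level]
        for i, v in zip(level, batch):
            h[i] = v
    return [h[r] for r in roots], len(levels)


def evaluate_recursive(t, a=0.7, wl=0.4, wr=-0.3):
    """Unbatched reference: one call per node."""
    if isinstance(t, float):
        return math.tanh(a * t)
    return math.tanh(wl * evaluate_recursive(t[0], a, wl, wr)
                     + wr * evaluate_recursive(t[1], a, wl, wr))


trees = [(1.0, 2.0), ((1.0, 2.0), (0.5, -1.0)), (((0.1, 0.2), 0.3), 0.4)]
outputs, n_levels = evaluate_batched(trees)
print(n_levels, outputs)  # 17 nodes, but only 4 batched steps
```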

From sentences to jets

Analogy:
• word → particle
• sentence → jet
• parsing → jet algorithm

Jet topology

• Use sequential recombination jet algorithms (kT, anti-kT, etc.) to define computational graphs (on a per-jet basis).
• The root node in the graph provides a fixed-length embedding of a jet, which can then be fed to a classifier.
• Path towards ML models with good physics properties.

[Figure: A jet structured as a tree by the kT recombination algorithm]
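
The sketch below shows how a sequential recombination algorithm turns a jet's constituents into a binary tree. It repeatedly merges the pair with the smallest d_ij = min(pT_i^2p, pT_j^2p) ΔR²_ij and adds their 4-momenta. Here p = 1 gives kT, p = −1 anti-kT and p = 0 Cambridge/Aachen. As an assumption, all constituents are reclustered into a single tree, so the beam distance and the constant 1/R² are dropped. This is a naive O(n³) illustration. Real analyses would use a dedicated library such as FastJet. `massless` is a hypothetical helper.

```python
import math


def massless(pt, rap, phi):
    """Hypothetical helper: (E, px, py, pz) of a massless particle."""
    return (pt * math.cosh(rap), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(rap))


def kinematics(p):
    """Transverse momentum, rapidity and azimuth of an (E, px, py, pz) 4-vector."""
    E, px, py, pz = p
    return math.hypot(px, py), 0.5 * math.log((E + pz) / (E - pz)), math.atan2(py, px)


def distance(p_i, p_j, power):
    """d_ij = min(pT_i^2p, pT_j^2p) * dR_ij^2 (the constant 1/R^2 is dropped)."""
    pt_i, y_i, phi_i = kinematics(p_i)
    pt_j, y_j, phi_j = kinematics(p_j)
    dphi = (phi_i - phi_j + math.pi) % (2 * math.pi) - math.pi   # wrap into [-pi, pi)
    return min(pt_i ** (2 * power), pt_j ** (2 * power)) * ((y_i - y_j) ** 2 + dphi ** 2)


def cluster(particles, power=1):
    """Recluster one jet's constituents into a binary tree.

    Leaves are particle indices and internal nodes are (left, right) tuples.
    Returns (tree, 4-momentum of the root).
    """
    items = list(enumerate(particles))                  # (subtree, 4-momentum) pairs
    while len(items) > 1:
        pairs = ((i, j) for i in range(len(items)) for j in range(i + 1, len(items)))
        a, b = min(pairs, key=lambda ij: distance(items[ij[0]][1], items[ij[1]][1], power))
        (tree_a, p_a), (tree_b, p_b) = items[a], items[b]
        merged = ((tree_a, tree_b), tuple(x + y for x, y in zip(p_a, p_b)))  # E-scheme
        items = [it for k, it in enumerate(items) if k not in (a, b)] + [merged]
    return items[0]


# two hard particles (0, 2), each with a soft, nearby companion (1, 3)
particles = [massless(100.0, 0.0, 0.0), massless(1.0, 0.05, 0.05),
             massless(50.0, 0.3, 0.3), massless(2.0, 0.32, 0.31)]
tree, p_jet = cluster(particles, power=1)
print(tree)  # ((2, 3), (0, 1))
```

The resulting nested tuple is exactly the per-jet topology that a recursive network can follow.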

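QCD-aware recursive neural networks

Simple recursive activation: each node k combines a non-linear transformation $u_k$ of the 4-momentum $o_k$ with the left and right embeddings $h^{\text{jet}}_{k_L}$ and $h^{\text{jet}}_{k_R}$.

$$h_k^{\text{jet}} = \begin{cases} u_k & \text{if } k \text{ is a leaf} \\ \sigma\left( W_h \begin{bmatrix} h^{\text{jet}}_{k_L} \\ h^{\text{jet}}_{k_R} \\ u_k \end{bmatrix} + b_h \right) & \text{otherwise} \end{cases}$$

$$u_k = \sigma\left( W_u\, g(o_k) + b_u \right)$$

$$o_k = \begin{cases} v_{i(k)} & \text{if } k \text{ is a leaf} \\ o_{k_L} + o_{k_R} & \text{otherwise} \end{cases}$$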
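
A direct transcription of these three equations is sketched below, under stated assumptions. σ is taken to be tanh, and g is taken to be the identity on (E, px, py, pz); the equations leave both open. Leaves are particle indices and internal nodes are (left, right) tuples, the same nested form a clustering routine could return. The embedding size and the initialization are hypothetical.

```python
import math
import random


def sigma(v):
    return [math.tanh(a) for a in v]               # assumed choice of sigma


def affine(W, b, x):
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def g(o):
    return list(o)                                 # assumed identity feature map


def embed(tree, particles, params):
    """Return (h_k^jet, o_k) for node k of a clustering tree."""
    W_u, b_u, W_h, b_h = params
    if isinstance(tree, int):                      # leaf: o_k = v_i(k), h_k = u_k
        o = particles[tree]
        return sigma(affine(W_u, b_u, g(o))), o
    h_L, o_L = embed(tree[0], particles, params)
    h_R, o_R = embed(tree[1], particles, params)
    o = [a + b for a, b in zip(o_L, o_R)]          # o_k = o_kL + o_kR
    u = sigma(affine(W_u, b_u, g(o)))              # u_k = sigma(W_u g(o_k) + b_u)
    return sigma(affine(W_h, b_h, h_L + h_R + u)), o   # sigma(W_h [h_L; h_R; u_k] + b_h)


rng = random.Random(0)
Q = 4  # hypothetical embedding size


def mat(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


params = (mat(Q, 4), [0.0] * Q, mat(Q, 3 * Q), [0.0] * Q)
particles = [(1.0, 0.5, 0.2, 0.3), (0.8, 0.1, -0.4, 0.2),
             (0.5, -0.3, 0.1, 0.1), (0.3, 0.0, 0.2, -0.1)]   # toy 4-momenta
h_root, o_root = embed(((2, 3), (0, 1)), particles, params)
print(h_root)  # fixed-length jet embedding, whatever the number of particles
```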

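QCD-aware recursive neural networks

Gated recursive activation: each node actively selects, merges or propagates up the left, right or local embeddings, as enabled by reset and update gates r and z (similar to a GRU).

$$h_k^{\text{jet}} = \begin{cases} u_k & \text{if } k \text{ is a leaf} \\ z_H \odot \tilde{h}_k^{\text{jet}} + z_L \odot h^{\text{jet}}_{k_L} + z_R \odot h^{\text{jet}}_{k_R} + z_N \odot u_k & \text{otherwise} \end{cases}$$

$$\tilde{h}_k^{\text{jet}} = \sigma\left( W_{\tilde{h}} \begin{bmatrix} r_L \odot h^{\text{jet}}_{k_L} \\ r_R \odot h^{\text{jet}}_{k_R} \\ r_N \odot u_k \end{bmatrix} + b_{\tilde{h}} \right)$$

$$\begin{bmatrix} z_H \\ z_L \\ z_R \\ z_N \end{bmatrix} = \operatorname{softmax}\left( W_z \begin{bmatrix} \tilde{h}_k^{\text{jet}} \\ h^{\text{jet}}_{k_L} \\ h^{\text{jet}}_{k_R} \\ u_k \end{bmatrix} + b_z \right)$$

$$\begin{bmatrix} r_L \\ r_R \\ r_N \end{bmatrix} = \operatorname{sigmoid}\left( W_r \begin{bmatrix} h^{\text{jet}}_{k_L} \\ h^{\text{jet}}_{k_R} \\ u_k \end{bmatrix} + b_r \right)$$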
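
One internal node of the gated activation can be sketched as follows. Several points are assumptions. The softmax is read as normalizing the four update gates of each hidden unit, so that z_H + z_L + z_R + z_N = 1 element-wise. σ is taken to be tanh. The gate vectors are laid out in the order shown in the equations. Under this reading the new embedding is a convex combination of h̃, h_L, h_R and u, which the example checks by biasing z_L until it dominates.

```python
import math
import random


def affine(W, b, x):
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def gated_combine(h_L, h_R, u, params):
    """One internal node of the gated recursive activation (sketch)."""
    W_r, b_r, W_t, b_t, W_z, b_z = params
    q = len(u)
    # reset gates: [r_L; r_R; r_N] = sigmoid(W_r [h_L; h_R; u] + b_r)
    r = [1.0 / (1.0 + math.exp(-a)) for a in affine(W_r, b_r, h_L + h_R + u)]
    r_L, r_R, r_N = r[:q], r[q:2 * q], r[2 * q:]
    reset = ([a * b for a, b in zip(r_L, h_L)] + [a * b for a, b in zip(r_R, h_R)]
             + [a * b for a, b in zip(r_N, u)])
    h_tilde = [math.tanh(a) for a in affine(W_t, b_t, reset)]   # sigma assumed tanh
    # update gates: softmax across the four gates, separately for each hidden unit
    logits = affine(W_z, b_z, h_tilde + h_L + h_R + u)
    z = [[0.0] * q for _ in range(4)]
    for i in range(q):
        column = [logits[k * q + i] for k in range(4)]
        m = max(column)                                         # numerical stability
        e = [math.exp(c - m) for c in column]
        s = sum(e)
        for k in range(4):
            z[k][i] = e[k] / s
    z_H, z_L, z_R, z_N = z
    h = [zh * ht + zl * hl + zr * hr + zn * un
         for zh, ht, zl, hl, zr, hr, zn, un in zip(z_H, h_tilde, z_L, h_L, z_R, h_R, z_N, u)]
    return h, z


def init_params(q, rng):
    """Hypothetical random initialization with the shapes the equations imply."""
    def mat(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return (mat(3 * q, 3 * q), [0.0] * (3 * q),   # W_r, b_r
            mat(q, 3 * q), [0.0] * q,             # W_h~, b_h~
            mat(4 * q, 4 * q), [0.0] * (4 * q))   # W_z, b_z


Q = 3
params = init_params(Q, random.Random(0))
h_L, h_R, u = [0.2, -0.5, 0.9], [-0.1, 0.4, 0.3], [0.7, 0.0, -0.6]
h, z = gated_combine(h_L, h_R, u, params)

# a large bias on the z_L logits makes the node simply propagate up its left child
biased = list(params)
biased[5] = [0.0] * Q + [50.0] * Q + [0.0] * (2 * Q)
h_left_only, _ = gated_combine(h_L, h_R, u, biased)
print(h, h_left_only)
```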

Jet-level classification results

• W-jet tagging example (data from arXiv:1609.00607)

• On images, the RNN has similar performance to previous CNN-based approaches.

• Improved performance when working with calorimeter towers, without image pre-processing.

• Working on truth-level particles led to significant improvement.

• Choice of jet algorithm matters.

From paragraphs to events

Analogy:
• word → particle
• sentence → jet
• parsing → jet algorithm
• paragraph → event

Joint learning of jet embedding, event embedding and classifier.
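
The event-level model can be sketched as a recurrent network that reads one input [v(t_j); h_j] per jet and feeds the resulting event embedding to a classifier. Several details here are assumptions, not taken from the slides: jets are read in order of decreasing pT, the recurrence is a plain tanh cell, the classifier is logistic, and the jet embeddings are placeholder numbers standing in for the output of a jet-level recursive net. In the joint model, all of these parameters would be trained together.

```python
import math
import random


def pt(p):
    """Transverse momentum of an (E, px, py, pz) 4-vector."""
    return math.hypot(p[1], p[2])


def event_embedding(jets, W_hh, W_xh, b):
    """Fold the jets of one event, given as (v_j, h_j) pairs, into one vector."""
    ordered = sorted(jets, key=lambda jet: pt(jet[0]), reverse=True)  # assumed pT ordering
    h = [0.0] * len(b)
    for v, h_jet in ordered:
        x = list(v) + list(h_jet)                  # per-jet input [v(t_j); h_j]
        h = [math.tanh(sum(W_hh[i][j] * h[j] for j in range(len(h)))
                       + sum(W_xh[i][k] * x[k] for k in range(len(x))) + b[i])
             for i in range(len(b))]
    return h


def classify(h_event, w_c, b_c):
    """Assumed logistic classifier on top of the event embedding."""
    s = sum(w * hi for w, hi in zip(w_c, h_event)) + b_c
    return 1.0 / (1.0 + math.exp(-s))


rng = random.Random(0)
Q_JET, Q_EVT = 2, 3  # hypothetical embedding sizes


def mat(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_hh, W_xh, b = mat(Q_EVT, Q_EVT), mat(Q_EVT, 4 + Q_JET), [0.0] * Q_EVT
w_c = [rng.uniform(-1.0, 1.0) for _ in range(Q_EVT)]
# each jet: (4-momentum, jet embedding); placeholder numbers
jets = [((1.0, 0.3, 0.4, 0.2), [0.1, -0.2]),
        ((2.0, 1.2, -0.5, 0.3), [0.4, 0.0]),
        ((0.8, -0.1, 0.2, 0.5), [-0.3, 0.6])]
p_signal = classify(event_embedding(jets, W_hh, W_xh, b), w_c, 0.0)
print(p_signal)
```

Sorting by pT makes the embedding independent of the order in which the jets are listed.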

Event-level classification results

RNN on jet-level 4-momenta $v(t_j)$ only vs. adding jet embeddings $h_j$:
• Adding jet embeddings is much better (provides jet-tagging information).

RNN on jet-level embeddings vs. RNN that simply processes all particles in the event:
• Jet clustering and jet embeddings help a lot!

Summary

• Neural networks are computational graphs whose architecture can be molded on a per-sample basis to express and impose domain knowledge.
• Our QCD-aware recursive net operates on a variable-length set of 4-momenta and uses a computational graph determined by a jet algorithm. Experiments show that topology matters. It is an alternative to image-based approaches and requires much less data to train (10-100x less).
• The approach directly extends to the embedding of full events. The intermediate jet representation helps.
• Many more ideas of hybrids of QCD and machine learning!
