Kaggle Competition: Google Cloud & YouTube-8M Video Understanding Challenge, 5th place solution

Deep Learning Methods for Efficient Large Scale Video Labeling
M. Pękalski, X. Pan and M. Skalic
CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding
Honolulu, HI, July 26, 2017

Agenda

1. Team You8M
2. Models
3. Data Augmentation and Feature Engineering
4. Training Methods
5. Key to Success

Team You8M

Marcin Pękalski: Master’s degree in Mathematics and Master’s degree in International Economics. Currently a data scientist at Kambi, a B2B sportsbook provider. Stockholm, Sweden.

Miha Skalic: Holds a Master's degree in Biotechnology. Now working towards a PhD in Biomedicine at University Pompeu Fabra. Barcelona, Spain.

Xingguo E. Pan: Trained as a physicist. After getting a PhD, he went into the financial industry. Chicago, USA.

Frame level Models

Bi-directional LSTM
• Dynamic length RNN
• Two models running in opposite directions
• MoE with two experts applied to last layer
• 6 epochs took 3 days

Bi-directional GRU
• Similar structure as LSTM
• Layer sizes 625x2, 1250
• Trained with 5 folds

[Figure: Bi-directional LSTM model]
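The two ideas named above, dynamic length and two passes in opposite directions, can be illustrated with a toy example. This is only a sketch: the scalar tanh recurrence stands in for a real LSTM or GRU cell, and the names `run_rnn` and `bidirectional_state` and the weights 0.5 are hypothetical, not taken from the team's code.

```python
import math

def run_rnn(frames, weight_in=0.5, weight_state=0.5):
    """Toy scalar tanh recurrence (a stand-in for an LSTM/GRU cell).

    Returns the final hidden state after consuming all frames in order.
    """
    state = 0.0
    for x in frames:
        state = math.tanh(weight_in * x + weight_state * state)
    return state

def bidirectional_state(padded_frames, num_frames):
    """Run the recurrence forward and backward over the valid frames only."""
    valid = padded_frames[:num_frames]   # dynamic length: skip the padding
    forward = run_rnn(valid)             # first frame to last frame
    backward = run_rnn(valid[::-1])      # last frame to first frame
    return [forward, backward]           # concatenated final states

# A 2-frame video padded to length 4 gives the same result as the unpadded one.
padded = bidirectional_state([1.0, 2.0, 0.0, 0.0], num_frames=2)
unpadded = bidirectional_state([1.0, 2.0], num_frames=2)
```

Cutting the sequence at `num_frames` is what keeps padding from influencing the final states; in the models described on the slide, a mixture-of-experts layer would then be applied on top of the combined states.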

Video level models

MoNN: Mixture of neural network experts

MoNN3Lw
• 3 FC layers: 2305x8, 2305x1, 2305x3
• 3 experts
• Features: mean_rgb/audio, std_rgb/audio, num_frames
• 5 epochs took 9 hours on GeForce GTX 1080 Ti

[Figure: MoNN architecture. Diagram labels: input, fully connected layers, gate activations, expert activation, final predictions, # layers]
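How gate and expert activations combine into a final prediction can be sketched for a single label. This assumes the common mixture-of-experts form (softmax gates, sigmoid experts, gate-weighted sum); the slide does not spell out the exact formulation, and the function names and logits below are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_prediction(gate_logits, expert_logits):
    """Probability for ONE label from several experts.

    gate_logits and expert_logits hold one value per expert.
    """
    gates = softmax(gate_logits)                   # gate activations, sum to 1
    experts = [sigmoid(z) for z in expert_logits]  # expert activations in (0, 1)
    return sum(g * p for g, p in zip(gates, experts))

# Three experts with equal gate weight: the result is the mean expert probability.
p = mixture_prediction([0.0, 0.0, 0.0], [2.0, 0.0, -2.0])
```

In a full model the gate and expert logits would come from the fully connected layers, computed separately for every label.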

Models

Model   GAP, single checkpoint,   GAP, ensemble of same-family
        single fold               models, checkpoints and folds
MoNN    0.823                     0.835
LSTM    0.820                     0.835
GRU     0.812                     0.834

GAP, all models ensemble: 0.8419
Best score: 0.8424
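For readers unfamiliar with the metric, the following is a minimal sketch of Global Average Precision (GAP). It assumes the challenge's usual definition: the top 20 predictions of each video are pooled, ranked by confidence, and average precision is computed over the pooled list. The function name and data layout are hypothetical, and the toy scores are made up for illustration.

```python
def global_average_precision(predictions, labels, top_k=20):
    """predictions: one dict {label: score} per video.
    labels: one set of ground-truth labels per video.
    """
    pooled = []           # (score, is_correct) over all videos
    total_positives = 0   # number of ground-truth labels overall
    for scores, truth in zip(predictions, labels):
        total_positives += len(truth)
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        pooled.extend((score, label in truth) for label, score in top)
    if total_positives == 0:
        return 0.0
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    hits = 0
    gap = 0.0
    for rank, (_, is_correct) in enumerate(pooled, start=1):
        if is_correct:
            hits += 1
            gap += hits / rank   # precision at this rank
    return gap / total_positives

# Toy data: two videos, three candidate labels.
toy_predictions = [{"a": 0.9, "b": 0.8}, {"a": 0.7, "c": 0.1}]
toy_labels = [{"a"}, {"c"}]
toy_gap = global_average_precision(toy_predictions, toy_labels)
```

Because all videos share one ranked list, a confident wrong prediction on one video lowers the precision credited to correct predictions on every other video ranked below it.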

• All models are implemented with TensorFlow.
• GAP scores are from private leaderboard.

Model Correlation

        MoNN    LSTM    GRU
MoNN    1.0     0.96    0.96
LSTM            1.0     0.98
GRU                     1.0
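The slide does not say how the correlations were computed. One plausible reading, shown here purely as an assumption, is the Pearson correlation between two models' predicted scores on the same (video, label) pairs. The function name and the sample scores are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up scores from two models on the same four (video, label) pairs.
model_a = [0.9, 0.1, 0.4, 0.7]
model_b = [0.8, 0.2, 0.5, 0.6]
corr = pearson(model_a, model_b)
```

Lower correlation between two models of similar accuracy generally means more to gain from ensembling them, which is consistent with combining video level and frame level models.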

Data augmentation & Feature engineering

Video splitting:
• Split one video into two halves
• Training samples: 6.3 million => 18.9 million
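The jump from 6.3 million to 18.9 million samples is a factor of three, which is consistent with keeping each full video alongside its two halves. A minimal sketch under that assumption (the function name is hypothetical, and the integers stand in for per-frame feature vectors):

```python
def split_video(frames):
    """Return the full frame sequence plus its first and second halves.

    All three samples would share the labels of the original video.
    """
    mid = len(frames) // 2
    return [frames, frames[:mid], frames[mid:]]

samples = split_video([1, 2, 3, 4, 5])
```

With an odd number of frames the second half gets the extra frame; whether the original pipeline split this way is not stated on the slide.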

Video Level features:
• mean-rgb/audio
• std-rgb/audio
• 3rd moments of rgb/audio
• num_frames
• Moments of entire video
• top/bottom 5 per feature dimension
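Most of the listed features can be derived from frame level features with per-dimension statistics. The sketch below assumes population (divide by n) statistics and a central third moment; the exact definitions used by the team are not given on the slide, and the function and key names are hypothetical.

```python
import math

def video_level_features(frames, k=5):
    """frames: list of equal-length feature vectors, one per frame."""
    n = len(frames)
    dims = len(frames[0])
    features = {"num_frames": n, "mean": [], "std": [],
                "third_moment": [], "top_k": [], "bottom_k": []}
    for d in range(dims):
        column = [frame[d] for frame in frames]   # one feature dimension over time
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        third = sum((v - mean) ** 3 for v in column) / n
        ordered = sorted(column)
        features["mean"].append(mean)
        features["std"].append(math.sqrt(variance))
        features["third_moment"].append(third)
        features["top_k"].append(ordered[-k:][::-1])   # k largest, descending
        features["bottom_k"].append(ordered[:k])       # k smallest, ascending
    return features

# Three frames with a single feature dimension.
feats = video_level_features([[1.0], [2.0], [3.0]], k=2)
```

Computing these once and storing them with the data, rather than recomputing them in every training step, matches the deck's remark about pre-assembling features for efficient training.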

Training

Monitor out-of-sample performance while training

Being able to see the out-of-sample GAP greatly facilitated our management of the training process and of model and feature selection.

Example: Both the yellow and the red model fit the training data well, but the yellow model peaked at around 20K steps on the test data.
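One simple way to act on out-of-sample monitoring is to record the held-out GAP at each evaluated checkpoint and keep the best one. The sketch below uses made-up numbers chosen only to mimic a model that peaks around 20K steps; they are not results from the competition, and the function name is hypothetical.

```python
def best_checkpoint(eval_gap_by_step):
    """Return (step, gap) of the checkpoint with the highest held-out GAP."""
    return max(eval_gap_by_step.items(), key=lambda kv: kv[1])

# Illustrative values only: held-out GAP rises, peaks, then declines.
history = {10000: 0.800, 20000: 0.810, 30000: 0.805, 40000: 0.798}
step, gap = best_checkpoint(history)
```

A training loss that keeps falling after the held-out score has peaked is the usual sign of overfitting, which is why the selection here relies on the held-out GAP alone.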

Training

Training methods that we experimented with and that helped to deepen our understanding of the data and models:
• Dropout
• Truncated Labels
• Batch, Global and No Normalization
• Exponential Moving Average
• Training on Folds
• Boosting Network
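Of the methods listed, the exponential moving average is the easiest to show in a few lines. The sketch assumes the standard form, in which a shadow copy of the weights is blended with the current weights after every step; the decay value and function name are illustrative, not the team's settings.

```python
def ema_update(shadow, weights, decay=0.999):
    """Blend the current weights into the running (shadow) average."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]

# Start the shadow at the initial weights, then track three noisy updates.
shadow = [1.0, -1.0]
for current in ([0.8, -0.9], [1.2, -1.1], [1.0, -1.0]):
    shadow = ema_update(shadow, current, decay=0.9)
```

At evaluation time the shadow weights would be used in place of the raw weights, which smooths out step-to-step noise from training.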

Key to Success

• We tried a lot of different architectures (wide and narrow, deep and shallow, dropouts, EMA, boosting, etc.).
• Combined video and frame level models.
• Tried different ensemble weighting techniques, utilizing individual model’s GAP score and the correlation between models.
• Generated different features: pre-assembled them into data for efficient training process.
• Data augmentation.
• Training on folds of data and averaging the results over folds and checkpoints.
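The last two points, averaging over folds and checkpoints and then weighting models in an ensemble, can be sketched as two small steps. The weighting scheme actually used is not given on the slide, so the weights below are placeholders and both function names are hypothetical.

```python
def average_predictions(runs):
    """Element-wise mean of prediction vectors from several folds or checkpoints."""
    n = len(runs)
    return [sum(values) / n for values in zip(*runs)]

def weighted_ensemble(model_predictions, weights):
    """Weighted average across models; weights are normalized to sum to 1."""
    total = sum(weights.values())
    names = list(model_predictions)
    length = len(model_predictions[names[0]])
    return [
        sum(weights[name] * model_predictions[name][i] for name in names) / total
        for i in range(length)
    ]

# Step 1: average each model family over its folds (made-up scores).
monn = average_predictions([[0.2, 0.4], [0.4, 0.8]])
lstm = average_predictions([[0.3, 0.5], [0.5, 0.7]])

# Step 2: combine the families with placeholder weights.
final = weighted_ensemble({"MoNN": monn, "LSTM": lstm}, {"MoNN": 1.0, "LSTM": 1.0})
```

Weights could, for example, be raised for models with higher validation GAP and lowered for models that are highly correlated with others already in the ensemble, in line with the weighting ideas mentioned above.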

Resources & Acknowledgements

arXiv Paper:
• https://arxiv.org/abs/1706.04572

Source Code:
• https://github.com/mpekalski/Y8M
• Apache License, version 2

Acknowledgements:
• The authors would like to thank the Computational Bio-physics group at University Pompeu Fabra for letting us use their GPU computational resources. We would also like to thank Jose Jimenez for valuable discussions and feedback.
