The Monkeytyping Solution to YouTube-8M Video Understanding Challenge

Heda Wang, Teng Zhang

Multimedia Signal and Intelligent Information Processing Laboratory, Department of Electronic Engineering, Tsinghua University

CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding

Heda Wang, 2017/07/26

The framework

Data splits: train 4.9M videos; validate divided into 1.3M, 109K and 22K subsets; test 701K videos.

- Enlarging the training data from 4.9M to 6.3M videos with part of the validate set: single-model GAP@20 +0.4%
- Replacing linear stacking with attention stacking: ensemble GAP@20 +0.1%
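GAP@20 is the metric behind every number in these slides. The sketch below assumes the challenge definition as we understand it: keep each video's top-k predictions, pool them across all videos, sort by confidence, and sum the precision at every correct prediction, divided by the total number of ground-truth labels. The function name `gap_at_k` and the dict/set input layout are illustrative assumptions, not the challenge's evaluation code.

```python
from typing import Dict, List, Set, Tuple


def gap_at_k(predictions: List[Dict[int, float]],
             labels: List[Set[int]],
             k: int = 20) -> float:
    """Global Average Precision over the top-k predictions of every video.

    predictions[v] maps label id -> confidence for video v;
    labels[v] is the set of ground-truth label ids for video v.
    """
    pooled: List[Tuple[float, bool]] = []  # (confidence, is_correct) from all videos
    for scores, truth in zip(predictions, labels):
        top_k = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
        pooled.extend((conf, label in truth) for label, conf in top_k)

    # Assumption: recall is measured against every ground-truth label,
    # not only those that appear in some video's top k.
    num_positives = sum(len(truth) for truth in labels)
    if num_positives == 0:
        return 0.0

    pooled.sort(key=lambda item: item[0], reverse=True)
    total, hits = 0.0, 0
    for rank, (_, correct) in enumerate(pooled, start=1):
        if correct:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_positives


# Two videos; the pooled top-2 ranking is: correct, wrong, wrong, correct.
preds = [{0: 0.9, 1: 0.8, 2: 0.1}, {1: 0.7, 2: 0.6}]
truth = [{0}, {2}]
print(gap_at_k(preds, truth, k=2))  # 0.75
```

In the example, GAP = (1/1 + 2/4) / 2 = 0.75: a confident wrong label costs precision for every correct label ranked below it, across all videos at once.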

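Labels are correlated

Example of co-occurring labels: Audi, Racing, Cars, Vehicles.

Label autoencoder: the 4716-dimensional label vector is corrupted with Gaussian noise $\mathcal{N}(0, \sigma^2)$, $\sigma = 0.3$, encoded by an FC (tanh) layer into 100 units, and decoded by an FC (sigmoid) layer back into 4716 outputs, trained with a reconstruction loss.

Result: GAP > 0.98 on the validate set. A 100-dimensional code almost fully recovers the 4716 labels, so the labels are strongly correlated.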
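A forward-pass sketch of the label autoencoder described above, shrunk to 12 labels and 3 hidden units so it runs instantly (the slides use 4716 and 100). The weights are random and untrained; the training loop is omitted, and the binary cross-entropy form of the reconstruction loss is an assumption, since the slide only says "Reconstruction Loss". The helper names (`dense`, `reconstruct`, `reconstruction_loss`) are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector,
          activation: Callable[[float], float]) -> Vector:
    """Fully connected layer: activation(W x + b); each row of W produces one output."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def reconstruct(labels: Vector, enc_w: Matrix, enc_b: Vector,
                dec_w: Matrix, dec_b: Vector,
                sigma: float, rng: random.Random) -> Vector:
    noisy = [y + rng.gauss(0.0, sigma) for y in labels]  # add N(0, sigma^2) noise
    code = dense(noisy, enc_w, enc_b, math.tanh)          # FC (tanh) bottleneck
    return dense(code, dec_w, dec_b, sigmoid)             # FC (sigmoid) per-label output


def reconstruction_loss(labels: Vector, probs: Vector, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy (assumed form of the reconstruction loss)."""
    return -sum(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
                for y, p in zip(labels, probs)) / len(labels)


rng = random.Random(0)
num_classes, bottleneck = 12, 3  # the slides use 4716 labels and 100 hidden units
enc_w, enc_b = random_matrix(bottleneck, num_classes, rng), [0.0] * bottleneck
dec_w, dec_b = random_matrix(num_classes, bottleneck, rng), [0.0] * num_classes
labels = [1.0 if i < 3 else 0.0 for i in range(num_classes)]  # three co-occurring labels
probs = reconstruct(labels, enc_w, enc_b, dec_w, dec_b, sigma=0.3, rng=rng)
loss = reconstruction_loss(labels, probs)
```

The bottleneck is the point of the experiment: if 100 numbers suffice to reconstruct 4716 labels with GAP > 0.98, most label combinations must be predictable from a few underlying factors.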

Existing approaches for multi-label classification

- Probabilistic Graphical Models
  - Model $P(L_1, L_2, \ldots, L_n \mid X)$
  - Typically $n < 100$
- (Ensemble of) Classifier Chains
  - Sequential training and testing
  - Typically $n < 200$
  - Need to train a lot of classifiers
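A classifier chain trains one binary classifier per label, and classifier $i$ receives the input features plus the decisions already made for labels $1, \ldots, i-1$. The sketch below shows inference only, with hypothetical hand-written classifiers standing in for trained ones. It makes the cost in the list concrete: $n$ labels require $n$ classifiers evaluated one after another, which with 4716 labels means 4716 sequential steps per video.

```python
from typing import Callable, List, Sequence

# A per-label binary classifier: input vector -> probability of that label.
Classifier = Callable[[Sequence[float]], float]


def chain_predict(x: Sequence[float], classifiers: List[Classifier],
                  threshold: float = 0.5) -> List[int]:
    """Classifier-chain inference: label i sees x plus the decisions for labels 0..i-1."""
    decisions: List[int] = []
    for clf in classifiers:
        augmented = list(x) + [float(d) for d in decisions]
        decisions.append(1 if clf(augmented) >= threshold else 0)
    return decisions


# Hypothetical toy classifiers; features are x = [audi_evidence, car_evidence].
def audi(v: Sequence[float]) -> float:
    return v[0]


def cars(v: Sequence[float]) -> float:
    return max(v[1], v[2])  # car evidence, or the "Audi" decision already made


def vehicles(v: Sequence[float]) -> float:
    return v[3]  # follows the "Cars" decision


chain = [audi, cars, vehicles]
print(chain_predict([0.9, 0.2], chain))  # [1, 1, 1]
print(chain_predict([0.1, 0.8], chain))  # [0, 1, 1]
```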

Explicitly model label correlation by Chaining

Figure (video-level chaining): video-level features → Mixture of Experts (MoE) → prediction → loss; the prediction is projected by an FC-128 ReLU layer and passed to the next stage of the chain.
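A minimal forward-pass sketch of video-level chaining, under stated assumptions: each stage has its own MoE; its input is the video-level feature concatenated with the FC + ReLU projections of all earlier stages' predictions; and every stage's prediction is supervised, with the last stage giving the final output. The MoE here is a simplified per-class softmax-gated mixture of sigmoid experts, and all sizes are toy values (the slides use FC-128 and, for example, #stage=8, #mixture=2). Class and function names are hypothetical; this is not the authors' code.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(a * b for a, b in zip(w, x))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class MoE:
    """Simplified mixture of experts: for each class, softmax gates weight sigmoid experts."""

    def __init__(self, in_dim: int, num_classes: int, num_mixtures: int, rng: random.Random):
        def init() -> Vector:
            return [rng.gauss(0.0, 0.1) for _ in range(in_dim)]
        self.gates = [[init() for _ in range(num_mixtures)] for _ in range(num_classes)]
        self.experts = [[init() for _ in range(num_mixtures)] for _ in range(num_classes)]

    def __call__(self, x: Vector) -> Vector:
        probs = []
        for gate_ws, expert_ws in zip(self.gates, self.experts):
            gate = softmax([dot(w, x) for w in gate_ws])
            expert = [sigmoid(dot(w, x)) for w in expert_ws]
            probs.append(sum(g * e for g, e in zip(gate, expert)))
        return probs


class Projection:
    """FC + ReLU that compresses a prediction vector (FC-128 ReLU on the slide)."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, p: Vector) -> Vector:
        return [max(0.0, dot(w, p)) for w in self.w]


def chaining_forward(x: Vector, moes: List[MoE], projections: List[Projection]) -> List[Vector]:
    """Run every stage; stage s sees x plus the projected predictions of all earlier stages."""
    stage_input = list(x)
    predictions = []
    for s, moe in enumerate(moes):
        p = moe(stage_input)
        predictions.append(p)  # assumption: every stage's prediction gets a loss term
        if s < len(projections):
            stage_input = stage_input + projections[s](p)
    return predictions


rng = random.Random(0)
D, C, M, H, S = 6, 4, 2, 3, 3  # features, classes, mixtures, projection size, stages
moes = [MoE(D + s * H, C, M, rng) for s in range(S)]
projections = [Projection(C, H, rng) for _ in range(S - 1)]
x = [rng.random() for _ in range(D)]
preds = chaining_forward(x, moes, projections)
final_prediction = preds[-1]
```

Compared with a classic classifier chain, the chain runs over stages that each predict all labels at once, so a few stages (rather than thousands of per-label classifiers) are enough to feed label correlations back into the model.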

Explicitly model label correlation by Chaining

Figure (frame-level chaining): frame-level features → LSTM or CNN → MoE → prediction → loss, with an FC-128 ReLU projection of the prediction passed to the next stage.

Explicitly model label correlation by Chaining

Figure (CNN chaining): frame-level features → CNN → MoE → prediction → loss; the chain contains several CNN stages linked by FC-128 ReLU projections of the predictions.

Explicitly model label correlation by Chaining

| Model | Variant | Parameters | GAP@20 |
|---|---|---|---|
| Video-level MoE | Original | #mixture=16 | 0.7965 |
| Video-level MoE | Chaining | #stage=8, #mixture=2 | 0.8106 |
| 1D-CNN | Original | (1,2,3,3)x512 | 0.7904 |
| 1D-CNN | Chaining | #stage=4, (1,2,3,3)x128 | 0.8179 |
| LSTM | Original | #mixture=8 | 0.8131 |
| LSTM | Chaining | #stage=2, #mixture=4 | 0.8172 |

Figure (multi-scale CNN-LSTM): frame-level features → 1D-conv → pooling over time → LSTM → MoE → prediction → loss.

Modeling temporal multi-scale information

| Network type | GAP@20 |
|---|---|
| Vanilla LSTM | 0.8131 |
| Multi-Scale CNN-LSTM | 0.8204 |
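The multi-scale idea can be illustrated by its front end: a temporal 1D convolution followed by pooling over time, repeated so that each layer yields the frame sequence at half the previous temporal resolution. This is a sketch under assumptions: max pooling with stride 2, "same"-length zero-padded convolution with ReLU, and the kernel layout `kernels[o][k][c]` are illustrative choices. The per-scale LSTMs and MoE from the figure, and how their outputs are combined, are not shown.

```python
from typing import List

Frames = List[List[float]]  # frames[t][c]: channel c of frame t


def conv1d(frames: Frames, kernels: List[Frames], width: int) -> Frames:
    """Temporal convolution with zero padding and ReLU; output keeps the input length.

    kernels[o][k][c] is the weight of output channel o, tap k, input channel c."""
    T, C = len(frames), len(frames[0])
    pad = width // 2
    out = []
    for t in range(T):
        row = []
        for kernel in kernels:
            acc = 0.0
            for k in range(width):
                src = t + k - pad
                if 0 <= src < T:
                    acc += sum(kernel[k][c] * frames[src][c] for c in range(C))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def pool_over_time(frames: Frames, stride: int = 2) -> Frames:
    """Max pooling over non-overlapping windows of `stride` frames."""
    return [[max(col) for col in zip(*frames[t:t + stride])]
            for t in range(0, len(frames), stride)]


def temporal_pyramid(frames: Frames, layers) -> List[Frames]:
    """Return the frame sequence at every scale: the input, then after each conv + pooling."""
    scales = [frames]
    for kernels, width in layers:
        frames = pool_over_time(conv1d(frames, kernels, width))
        scales.append(frames)
    return scales


T, C, width = 8, 2, 3
frames = [[1.0] * C for _ in range(T)]
kernel = [[[0.1] * C for _ in range(width)] for _ in range(C)]  # C output channels
scales = temporal_pyramid(frames, [(kernel, width), (kernel, width)])
print([len(s) for s in scales])  # [8, 4, 2]
```

An LSTM run on the coarser sequences sees longer temporal context per step, which is one way to read the gain of the multi-scale model over the vanilla LSTM in the table.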

Attention pooling for saliency detection

Figure: frame-level features → LSTM → temporal attention (with positional embedding) → MoE → prediction → loss.

Attention pooling for saliency detection

| Network type | GAP@20 |
|---|---|
| Vanilla LSTM | 0.8131 |
| Attention LSTM | 0.8157 |
| Positional-embedded Attention LSTM | 0.8169 |
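A sketch of attention pooling over time: each frame's LSTM output gets a score, a softmax over frames turns the scores into weights, and the pooled vector is the weighted sum of the frame outputs. Two choices here are assumptions, not taken from the slides: dot-product scoring with a learned `query` vector, and adding the positional embedding to the LSTM output before scoring. `attention_pool` is a hypothetical name.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def attention_pool(hidden: List[Vector], query: Vector,
                   positional: Optional[List[Vector]] = None) -> Tuple[Vector, Vector]:
    """Pool per-frame LSTM outputs over time with softmax attention.

    hidden[t]     LSTM output at frame t
    query         learned scoring vector (assumed dot-product scoring)
    positional[t] learned embedding of position t, added before scoring (assumed)
    Returns the pooled vector and the per-frame attention weights."""
    scores = []
    for t, h in enumerate(hidden):
        key = h if positional is None else [a + b for a, b in zip(h, positional[t])]
        scores.append(sum(q * k for q, k in zip(query, key)))
    weights = softmax(scores)
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(len(hidden[0]))]
    return pooled, weights


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(hidden, query=[0.0, 0.0])  # zero query: plain mean pooling
pos = [[0.0, 0.0], [0.0, 5.0], [0.0, 0.0]]  # position 1 boosted by its embedding
_, weights_pos = attention_pool(hidden, query=[0.0, 1.0], positional=pos)
```

With a zero query every frame gets weight 1/3, which is ordinary mean pooling; the learned scores let salient frames dominate, and the positional term lets the model prefer certain parts of the video regardless of content.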

Attention pooling for saliency detection

Figure: example frames with low attention values vs. frames with high attention values.

The roadmap

| Ensemble | GAP@20 (private leaderboard) |
|---|---|
| Ensemble of 27 single models (7 chaining, 5 multi-scale, 5 attention-pooling and 10 LSTM models) | 0.8425 |
| + 11 bagging & boosting models | 0.8435 |
| + 8 distillation models | 0.8437 |
| + 28 cascade models | 0.8453 |
| Attention weighted stacking | 0.8459 |
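The slides do not spell out attention weighted stacking, so the following is only one plausible reading: for every class, the base models' probabilities are combined with softmax weights over models. In the full method these logits would presumably be produced per video by a small gating network; here they are passed in directly to keep the sketch short. `attention_stack` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def attention_stack(base_preds: List[Vector], logits: List[Vector]) -> Vector:
    """Combine base models class by class with softmax weights over models.

    base_preds[m][c]: probability of class c from base model m
    logits[m][c]:     attention logit of model m for class c (assumed to come from a
                      gating network over the video's inputs; passed in directly here)"""
    num_models, num_classes = len(base_preds), len(base_preds[0])
    combined = []
    for c in range(num_classes):
        weights = softmax([logits[m][c] for m in range(num_models)])
        combined.append(sum(w * base_preds[m][c] for m, w in enumerate(weights)))
    return combined


base = [[0.2, 0.8], [0.6, 0.4]]                              # 2 models x 2 classes
averaged = attention_stack(base, [[0.0, 0.0], [0.0, 0.0]])   # equal logits: plain average
trusted = attention_stack(base, [[0.0, 8.0], [8.0, 0.0]])    # each class leans on one model
```

With equal logits this reduces to simple averaging, and with fixed per-class logits it is a learned convex weighted average, a constrained form of linear stacking; making the logits depend on each video is what lets the ensemble trust different base models for different inputs.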

Summary

Multi-label video classification:
- Address the multi-label problem with chaining
- Model multi-scale temporal information
- Select salient frames with attention pooling over time

More details:
- And bagging, boosting, distillation, cascade, stacking, etc.; please refer to our paper
- Paper: https://arxiv.org/abs/1706.05150
- Code: https://github.com/wangheda/youtube-8m

Thank you
