The YouTube-8M Kaggle Competition: Challenges and Methods
Haosheng Zou*, Kun Xu*, Jialian Li, Jun Zhu
Presented by: Yinpeng Dong
All from Tsinghua University
2017.7.26

Contents
■ Introduction & Definition
■ Challenges
■ Our Methods & Results
■ Other Methods

Introduction

Problem Definition

Challenges
1. Dataset Scale
2. Noisy Labels
3. Lack of Supervision
4. Temporal Dependencies
5. Multi-modal Learning
6. Multiple Labels
7. In-class Imbalance

Challenges (cont.)
1. Dataset Scale:
◻ 5M (or 6M) training videos, 225 frames / video, 1024 (+128) dimension features / frame.
◻ Disk I/O in each mini-batch.
◻ Validation takes several (~10) hours.
Approaches: Downsample; smaller validation set; …
2. Noisy Labels:
◻ Rule-based annotated labels, not crowdsourcing
◻ 14.5% recall w.r.t. crowdsourcing, positive→negative
◻ Negative dominates; learning the annotation system
Approaches: Ensemble; more randomness; …
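To make the dataset-scale numbers concrete, here is a rough back-of-the-envelope sketch. The video, frame, and dimension counts come from the slide; the bytes-per-value options (1 byte and 4 bytes) are assumptions for illustration only, not something the slides state.

```python
# Back-of-the-envelope size of the frame-level training features.
# Counts come from the slide; bytes-per-value options are assumptions.
NUM_VIDEOS = 5_000_000      # "5M (or 6M) training videos"
FRAMES_PER_VIDEO = 225      # frames per video, as listed on the slide
FEATURE_DIM = 1024 + 128    # "1024 (+128) dimension features / frame"


def total_values(num_videos=NUM_VIDEOS, frames=FRAMES_PER_VIDEO, dim=FEATURE_DIM):
    """Number of scalar feature values in the training set."""
    return num_videos * frames * dim


def size_in_terabytes(bytes_per_value, **kwargs):
    """Storage in TB (10**12 bytes) for a given per-value width."""
    return total_values(**kwargs) * bytes_per_value / 1e12


if __name__ == "__main__":
    print(f"values: {total_values():.3e}")
    for width in (1, 4):  # assumed widths: 1-byte quantized vs. 4-byte float
        print(f"{width} byte(s)/value: {size_in_terabytes(width):.2f} TB")
```

Even at one byte per value this is over a terabyte of features, which is consistent with disk I/O and validation time being listed as bottlenecks.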

Challenges (cont.)
3. Lack of Supervision:
◻ No information about each frame.
◻ Only video-level supervision for the whole model.
Approaches: Attention; auto-encoders; …
4. Temporal Dependencies:
◻ Frame features do not yet take temporal dependencies into account.
◻ Humans can still understand videos at 1 fps.
Approaches: RNNs; clustering-based models (e.g. VLAD); …

Challenges (cont.)

Our Methods, High-Level
■ Random cropping: Take 1 frame every 5 frames
◻ Rougher temporal dependencies
◻ Only the start index is randomized
■ Multi-Crop Ensemble:
◻ One model, varying the start index
◻ Uniformly averaging
■ Early Stopping:
◻ Fix 5 epochs of training at most
◻ Train directly on training and validation sets.
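The random cropping and multi-crop ensemble described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the start index is drawn from 0 to 4, that the ensemble uses all five start indices, and `mean_pool_model` is a hypothetical stand-in for the trained model.

```python
import numpy as np

STRIDE = 5  # "Take 1 frame every 5 frames"


def random_crop(frames, rng, stride=STRIDE):
    """Keep every `stride`-th frame, starting at a random offset in [0, stride)."""
    start = int(rng.integers(0, stride))  # assumed range of the start index
    return frames[start::stride]


def multi_crop_predict(model_fn, frames, stride=STRIDE):
    """Run one model on each possible start index and average uniformly."""
    preds = [model_fn(frames[start::stride]) for start in range(stride)]
    return np.mean(preds, axis=0)


def mean_pool_model(crop):
    """Hypothetical stand-in for a trained model: mean over frames, then sigmoid."""
    return 1.0 / (1.0 + np.exp(-crop.mean(axis=0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(size=(225, 8))  # 225 frames, toy 8-dim features
    print(random_crop(video, rng).shape)                     # (45, 8)
    print(multi_crop_predict(mean_pool_model, video).shape)  # (8,)
```

During training each video would be cropped with a fresh random start; at test time the same model sees every start index and the predictions are averaged.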


Our Methods, Model
■ Prototype: stacked LSTM (1024-1024) + LR / 2MoE
■ Layer Normalization
■ Late Fusion
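For readers unfamiliar with the "LR / 2MoE" classifier head, a generic per-class mixture-of-experts layer can be sketched as below. This assumes "2MoE" means two logistic experts per class combined by a softmax gate; biases are omitted, and the shapes and names are illustrative rather than taken from the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def moe_predict(h, w_gate, w_expert):
    """Per-class mixture of logistic experts (biases omitted for brevity).

    h:        (batch, hidden) video descriptor, e.g. a final LSTM state
    w_gate:   (hidden, classes, experts) gating weights
    w_expert: (hidden, classes, experts) expert weights
    returns:  (batch, classes) label probabilities
    """
    gate = softmax(np.einsum("bh,hce->bce", h, w_gate), axis=-1)
    expert = sigmoid(np.einsum("bh,hce->bce", h, w_expert))
    return (gate * expert).sum(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(4, 16))                # toy batch of descriptors
    w_gate = rng.normal(size=(16, 10, 2))       # 10 classes, 2 experts
    w_expert = rng.normal(size=(16, 10, 2))
    print(moe_predict(h, w_gate, w_expert).shape)  # (4, 10)
```

With a single expert the gate is always 1 and the layer reduces to plain logistic regression, which is why LR and MoE are natural alternatives for the same head.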




Our Methods (cont.)
■ Attention
■ Bidirectional LSTM
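The slide only names "Attention", so the exact mechanism is not recoverable here. As one common possibility, a soft attention pooling over per-frame hidden states might look like the sketch below; the additive scoring form and the parameter names `w` and `v` are assumptions, not the authors' design.

```python
import numpy as np


def attention_pool(states, w, v):
    """Soft attention over time (generic additive form, assumed).

    states: (time, hidden) per-frame hidden states, e.g. LSTM outputs
    w:      (hidden, attn) projection matrix
    v:      (attn,) scoring vector
    returns (hidden,) pooled descriptor and (time,) attention weights
    """
    scores = np.tanh(states @ w) @ v      # one scalar score per frame
    scores = scores - scores.max()        # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ states, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(45, 32))    # 45 frames, toy 32-dim states
    pooled, weights = attention_pool(states, rng.normal(size=(32, 16)),
                                     rng.normal(size=16))
    print(pooled.shape, weights.shape)    # (32,) (45,)
```

The weights give a per-frame importance learned only from video-level labels, which is how attention relates to the lack of frame-level supervision.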

Our Results

Other Methods
■ Separating Tasks
◻ Different frame understanding block, thus different video descriptor for each meta-task
◻ 25 verticals as meta-tasks, too slow (15 examples / s)
■ Loss Manipulation
◻ Ignore negative labels when predicted confidence < 0.15
■ Unsupervised Representation Learning
◻ Using visual to reconstruct both visual and audio features
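Read literally, the loss-manipulation bullet drops the loss terms of negative labels whose predicted confidence is below 0.15. A minimal sketch of that reading follows; the binary cross-entropy base loss and the normalization by the number of kept terms are assumptions, and the slide may intend a different variant.

```python
import numpy as np

THRESHOLD = 0.15  # value from the slide


def masked_bce(probs, labels, threshold=THRESHOLD, eps=1e-7):
    """Binary cross-entropy that skips negatives predicted below `threshold`.

    probs, labels: arrays of the same shape; labels are 0 or 1.
    The mean is taken over kept terms only (an assumed normalization).
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    bce = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    ignore = (labels == 0) & (probs < threshold)
    keep = ~ignore
    return (bce * keep).sum() / max(keep.sum(), 1)


if __name__ == "__main__":
    probs = np.array([0.9, 0.1, 0.5])
    labels = np.array([1, 0, 0])
    # The middle entry is a negative with confidence 0.1 < 0.15, so it is skipped.
    print(masked_bce(probs, labels))
```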


Conclusion
1. Dataset Scale
2. Noisy Labels
3. Lack of Supervision
4. Temporal Dependencies
5. Multi-modal Learning
6. Multiple Labels
7. In-class Imbalance

Thank you! Q&A
