The YouTube-8M Kaggle Competition: Challenges and Methods
Haosheng Zou*, Kun Xu*, Jialian Li, Jun Zhu
Presented by: Yinpeng Dong
All from Tsinghua University
2017.7.26

Contents
■ Introduction & Definition
■ Challenges
■ Our Methods & Results
■ Other Methods

Introduction

Problem Definition

Challenges
1. Dataset Scale
2. Noisy Labels
3. Lack of Supervision
4. Temporal Dependencies
5. Multi-modal Learning
6. Multiple Labels
7. In-class Imbalance

Challenges (cont.)
1. Dataset Scale:
◻ 5M (or 6M) training videos, 225 frames / video, 1024 visual (+128 audio) feature dimensions / frame.
◻ Disk I/O in each mini-batch.
◻ Validation takes several (~10) hours.
Remedies: downsample; smaller validation set; …

2. Noisy Labels:
◻ Labels are annotated by rules, not by crowdsourcing.
◻ Only 14.5% recall w.r.t. crowdsourced labels: many positives are marked negative.
◻ Negatives dominate; the model ends up learning the annotation system.
Remedies: ensemble; more randomness; …
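To make the dataset scale concrete, the figures quoted above can be multiplied out. The sketch below uses the 5M figure and assumes each feature dimension is stored in 1 byte; that storage width is an assumption for illustration and is not stated on the slide.

```python
# Back-of-the-envelope size of the frame-level training features.
# Assumption (not from the slides): each feature dimension is stored in 1 byte.
NUM_VIDEOS = 5_000_000        # "5M (or 6M) training videos"
FRAMES_PER_VIDEO = 225        # frames per video, as quoted on the slide
DIMS_PER_FRAME = 1024 + 128   # 1024 (+128) feature dimensions per frame
BYTES_PER_DIM = 1             # hypothetical storage width


def feature_bytes(videos, frames, dims, bytes_per_dim):
    """Total bytes needed to store every frame feature."""
    return videos * frames * dims * bytes_per_dim


total = feature_bytes(NUM_VIDEOS, FRAMES_PER_VIDEO, DIMS_PER_FRAME, BYTES_PER_DIM)
print(f"{total / 1e12:.2f} TB")
```

Under these assumptions the features alone are on the order of a terabyte, which is why disk I/O shows up in every mini-batch.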

Challenges (cont.)
3. Lack of Supervision:
◻ No information about each frame.
◻ Only video-level supervision for the whole model.
Remedies: attention; auto-encoders; …

4. Temporal Dependencies:
◻ Frame features haven’t yet taken temporal dependencies into account.
◻ Humans can still understand videos at 1 fps.
Remedies: RNNs; clustering-based models (e.g. VLAD); …

Challenges (cont.)

Our Methods, High-Level
■ Random cropping: Take 1 frame every 5 frames
◻ Rougher temporal dependencies
◻ Only the start index is randomized
■ Multi-Crop Ensemble:
◻ One model, varying the start index
◻ Uniformly averaging
■ Early Stopping:
◻ Fix 5 epochs of training at most
◻ Train directly on training and validation sets.
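The random cropping and multi-crop ensemble described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `toy_model` is a hypothetical stand-in for the trained network, frames are plain numbers instead of feature vectors, and averaging over all five possible start indices at test time is an assumption (the slide only says the start index is varied and the predictions are averaged uniformly).

```python
import random

STRIDE = 5  # "Take 1 frame every 5 frames"


def crop(frames, start, stride=STRIDE):
    """Keep every `stride`-th frame, beginning at index `start`."""
    return frames[start::stride]


def random_crop(frames, rng=random, stride=STRIDE):
    """Training-time crop: only the start index is randomized."""
    return crop(frames, rng.randrange(stride), stride)


def multi_crop_predict(model, frames, stride=STRIDE):
    """Test-time ensemble: one model, every start index, uniform average."""
    preds = [model(crop(frames, s, stride)) for s in range(stride)]
    num_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(num_classes)]


def toy_model(frames):
    """Hypothetical stand-in for the trained network: one 'class score'."""
    return [sum(frames) / len(frames)]


video = list(range(10))      # ten dummy one-dimensional "frames"
print(random_crop(video))    # two frames; which two depends on the random start
print(multi_crop_predict(toy_model, video))
```

Because the same weights are reused for every crop, the ensemble costs extra inference passes but no extra training.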

Our Methods, Model
■ Prototype: stacked LSTM (1024-1024) + LR / 2MoE
■ Layer Normalization
■ Late Fusion
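As an illustration of the classifier head, the sketch below reads "2MoE" as a mixture of two logistic-regression experts combined by a softmax gate, computed for a single class. All weights and the function name `moe_probability` are hypothetical, and real implementations may differ (for example in how the gate is parameterized), so treat this as a simplified sketch rather than the model used in the talk.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def moe_probability(x, gate_weights, expert_weights):
    """P(label | x) for one class: gate-weighted mix of logistic experts."""
    gates = softmax([dot(g, x) for g in gate_weights])
    experts = [sigmoid(dot(w, x)) for w in expert_weights]
    return sum(g * e for g, e in zip(gates, experts))


# Hypothetical video descriptor and weights for a 2-expert head.
x = [1.0, -2.0]
gate_w = [[0.0, 0.0], [0.0, 0.0]]      # equal gates
expert_w = [[3.0, 0.0], [-3.0, 0.0]]   # two experts that disagree
print(moe_probability(x, gate_w, expert_w))
```

With a single expert and a constant gate this reduces to plain logistic regression (the "LR" option on the slide).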

Our Methods (cont.)
■ Attention
■ Bidirectional LSTM
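The slide names attention without giving its form. One common variant, shown here purely as an assumed example, pools the per-frame LSTM states into a single video descriptor using softmax weights from a dot product with a learned query vector. The `query` vector, the toy states, and the function name `attention_pool` are all hypothetical.

```python
import math


def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def attention_pool(states, query):
    """Weighted average of frame states; weights come from state-query scores."""
    weights = softmax([dot(h, query) for h in states])
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


# Three toy 2-dimensional frame states and a hypothetical learned query.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
pooled, weights = attention_pool(states, query)
print(weights)  # frames aligned with the query receive larger weights
print(pooled)
```

The weights are learned from video-level labels only, which is how attention helps with the lack of frame-level supervision noted earlier.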

Our Results

Other Methods
■ Separating Tasks
◻ Different frame understanding block, thus different video descriptor for each meta-task
◻ 25 verticals as meta-tasks, too slow (15 examples / s)
■ Loss Manipulation
◻ Ignore negative labels when predicted confidence < 0.15
■ Unsupervised Representation Learning
◻ Using visual features to reconstruct both visual and audio features
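The loss-manipulation idea can be sketched as a masked binary cross-entropy. The one-line description on the slide is ambiguous, so this follows one literal reading: a negative label contributes no loss when the predicted probability is below 0.15. Averaging over the remaining terms and the function name `masked_bce` are assumptions made for illustration.

```python
import math

THRESHOLD = 0.15  # confidence threshold quoted on the slide


def masked_bce(probs, labels, threshold=THRESHOLD, eps=1e-7):
    """Binary cross-entropy that skips negatives predicted below `threshold`."""
    total, kept = 0.0, 0
    for p, y in zip(probs, labels):
        if y == 0 and p < threshold:
            continue  # ignored negative label: contributes no loss
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        kept += 1
    # Assumption: average over the terms that were kept.
    return total / kept if kept else 0.0


# The first negative (0.10 < 0.15) is ignored; the other two terms are kept.
print(masked_bce([0.10, 0.90, 0.50], [0, 1, 0]))
```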

Conclusion
1. Dataset Scale
2. Noisy Labels
3. Lack of Supervision
4. Temporal Dependencies
5. Multi-modal Learning
6. Multiple Labels
7. In-class Imbalance

Thank you! Q&A
