Viral, Quality, and Junk Videos on YouTube: Separating Content From Noise in an Information-Rich Environment

R. Crane and D. Sornette
Chair of Entrepreneurial Risks, Department of Management, Technology and Economics
ETH-Zürich, CH-8032 Zürich, Switzerland

Introduction

With the rise of web 2.0 there is an ever-expanding source of interesting media because of the proliferation of user-generated content. However, mixed in with this is a large amount of noise that creates a proverbial “needle in the haystack” when searching for relevant content. Although there is hope that the rich network of interwoven metadata may contain enough structure to eventually help sift through this noise, currently many sites serve up only the “most popular” things.

Identifying only the most popular items can be useful, but doing so fails to take into account the famous “long tail” behavior of the web: the notion that the collective effect of small, niche interests can outweigh the market share of the few blockbuster (i.e. most-popular) items. Such sites thus provide only content that has mass appeal and mask the interests of the idiosyncratic many. YouTube, for example, hosts over 40 million videos, enough content to keep one occupied for more than 200 years. Are there intelligent tools to search through this information-rich environment and identify interesting and relevant content? Is there a way to identify emerging trends or “hot topics” in addition to indexing the long tail for content that has real value?

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Information about quality is contained in the dynamics

We demonstrate that this is possible based on a form of dynamic filtering. In essence, the relaxation signature following a burst of viewing activity reveals information about the quality of the content. This signature depends on the susceptibility of the social network, in addition to the type of perturbation that generated the burst.

We begin by considering two classes of perturbations: endogenous and exogenous. Their distinction, which is not required to be known a priori, is illustrated in figure 1. Here we show the aggregate time-series for videos appearing on the front page of YouTube along with those appearing on the 'most-viewed today' page. Videos appearing on the front page are chosen by the editors, whereas those on the 'most-viewed today' page are 'chosen' in a collaborative sense by the collective actions of the community (by making a video the 'most-viewed'). We find that videos chosen by the editors (exogenous) have a strikingly different history than those chosen by the community (endogenous). While both classes show a power-law relaxation (inset) over one hundred days following the peak, the videos featured by the community clearly display significant precursory growth.

Figure 1: A non-parametric superposition of all videos appearing on the 'front page' (editorial featuring) and the 'most-viewed today' page (community featuring). One immediately sees the exogenous effect of editorial featuring, revealed by the lack of precursory growth in the view count, whereas endogenous growth is seen in the case of community featuring. Inset: power-law relaxation in the 100 days following the peak reveals long-memory effects.
[Figure 1 curves: "Most Viewed Today" and "Front Page". Vertical axis: aggregate daily view count. Horizontal axis: time (days centered on peak); inset horizontal axis: time (days following peak).]

Once a burst of activity has been triggered, its relaxation depends on the susceptibility of the underlying social network. If the community is “ripe” for the content, then each generation of viewers can easily pass on the video to the next generation, and one will find the view count relaxes slowly. If instead the community is “uninterested”, then even a well-orchestrated marketing campaign will fail to spread through the network and one will witness a fast relaxation.
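The generation-by-generation picture can be illustrated with a minimal sketch. It assumes that each viewer passes the video on to a fixed average number of new viewers, here called `branching_ratio` (a hypothetical parameter introduced only for this illustration), and it ignores the timing of views entirely, so it shows how far a cascade reaches rather than how fast the view count relaxes.

```python
def expected_generation_sizes(branching_ratio, generations):
    """Expected viewers in generations 0..generations, starting from one viewer.

    branching_ratio is a hypothetical average number of new viewers
    triggered by each viewer (an assumption of this sketch).
    """
    if branching_ratio < 0:
        raise ValueError("branching_ratio must be non-negative")
    return [branching_ratio ** k for k in range(generations + 1)]


def expected_cascade_size(branching_ratio):
    """Expected total viewers over all generations (geometric series sum).

    The sum 1 + n + n^2 + ... equals 1 / (1 - n) and is finite only for n < 1.
    """
    if not 0 <= branching_ratio < 1:
        raise ValueError("expected cascade size diverges for branching_ratio >= 1")
    return 1.0 / (1.0 - branching_ratio)


# An "uninterested" community: the cascade dies within a few generations.
print(expected_generation_sizes(0.1, 3))
# A nearly "ripe" community: each generation is almost as large as the last.
print(expected_generation_sizes(0.9, 3))
```

In this simplified setting, a ratio close to 1 lets each initial view trigger many later ones, while a small ratio confines the activity to the first generation or two.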

Description of Data and Model

These ideas have been formalized and tested using a massive database tracking the time-series of the daily views over 1 year for almost 5 million videos on the popular site YouTube.com. Quantifying these effects can be achieved by studying the dynamical response of the daily view count in the context of an epidemic branching process on a social network. This model was previously applied successfully to the case of book sales (Sornette et al. 2004).

Figure 2: Examples of endogenous (left) and exogenous (right) bursts of activity for individual videos. The exponent of the power-law relaxation (inset) can be used to classify videos as viral, quality, or junk.
[Figure 2 axes: daily view count versus time (days).]

The instantaneous view count of a video results from many factors such as featuring on YouTube, emailing (or other forms of sharing videos), embedding and linking from external websites, discussion on blogs, in newspapers, television, and from social effects in which viewers may be influenced by others in their network. The impact of these various factors may not be immediate, and this latency can be described by a response function $\phi(t - t_i)$, which on the basis of figure 1 we postulate to be a long-memory process of the form $\phi(t) \sim 1/t^{1+\theta}$, with $0 < \theta < 1$. Using this, we can describe the rate of views as a self-excited Hawkes conditional Poisson process that depends on all past events

$$\lambda(t) = V(t) + \sum_{i \mid t_i \le t} \mu_i \, \phi(t - t_i) \qquad (1)$$

where $\mu_i$ is the number of potential viewers influenced by a viewer at time $t_i$ and $V(t)$ captures all spontaneous views that are not triggered by network effects.

When the network is not “ripe”, corresponding to the case when $\langle \mu_i \rangle$ is less than 1, then the activity generated by an exogenous event does not cascade beyond the first few generations, and the activity is given by

$$A_{bare}(t) \sim \frac{1}{(t - t_c)^{1+\theta}} \qquad (2)$$

If instead the network is “ripe” for a particular video, then the bare response is renormalized as the spreading is propagated through many generations, and the theory predicts the activity to be described as

$$A_{exo}(t) \sim \frac{1}{(t - t_c)^{1-\theta}} \qquad (3)$$

If in addition to being “ripe”, the burst of activity is not the result of an exogenous event, but is instead fueled by endogenous growth, the bare response is renormalized in a different way giving

$$A_{endo}(t) \sim \frac{1}{(t - t_c)^{1-2\theta}} \qquad (4)$$

While these results strictly hold for an ensemble of time-series because of the stochasticity involved, we find a surprisingly large number of individual videos that seem to obey these power-law relaxations exactly. Examples of this are shown in figure 2, suggesting that we can apply this formalism to individual videos.

Classification of Content

As outlined above, the existence of memory in the view count dynamics implies that the relaxation signatures following a burst of activity depend on the susceptibility of the underlying social network to a particular video. We can therefore use the dynamic signature as a way of distinguishing, on the basis of the exponent of the power law governing their relaxation, between three extreme cases: viral videos, quality videos, and junk.

In this context, viral videos are those with precursory word-of-mouth growth resulting from epidemic-like propagation through a social network, characterized by an exponent $(1 - 2\theta)$. Quality videos are similar to viral videos, but experience a sudden burst of activity rather than a bottom-up growth, and because of the “quality” of their content, subsequently trigger an epidemic cascade through the social network, relaxing with an exponent $(1 - \theta)$. Lastly, junk videos are those that experience a burst of activity for some reason (spam, chance, etc.) but do not spread through the social network. Therefore their activity is determined largely by the first generation of viewers, and they should relax as $(1 + \theta)$.
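Applying the power-law relaxations of equations (2) to (4) to an individual video requires an estimate of its relaxation exponent from the daily view counts after the peak. The following is a minimal sketch, assuming an ordinary least-squares fit in log-log coordinates. The fitting procedure is not specified in the text, and the function name `relaxation_exponent` and the synthetic series are hypothetical.

```python
import math


def relaxation_exponent(daily_views, peak_index=None):
    """Estimate p in views(t) ~ 1 / (t - t_c)**p by least squares in log-log space.

    daily_views: sequence of daily view counts for one video.
    peak_index: index of the peak day t_c; defaults to the most active day.
    """
    if peak_index is None:
        peak_index = max(range(len(daily_views)), key=lambda i: daily_views[i])
    xs, ys = [], []
    for lag, views in enumerate(daily_views[peak_index + 1:], start=1):
        if views > 0:  # the logarithm is undefined for zero-view days, so skip them
            xs.append(math.log(lag))
            ys.append(math.log(views))
    if len(xs) < 2:
        raise ValueError("need at least two positive post-peak days")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope is -p


# Synthetic example: a peak followed by an exact power-law decay with p = 0.6.
series = [10.0, 50.0, 2000.0] + [1000.0 / lag ** 0.6 for lag in range(1, 101)]
print(round(relaxation_exponent(series), 3))  # 0.6
```

A straight-line fit on logarithmic axes is only the simplest option. On noisy count data it can be biased, so a more careful estimator may be preferable in practice.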

Figure 3: Exponents for videos grouped by the fraction of views contained in their most active day (peak) relative to the total. This is a natural way of separating endogenous from exogenous videos, since the former have significant precursory growth, thus lowering the fractional weight contained in the peak.

Figure 3 shows the distribution of exponents obtained by grouping videos based on the fraction of views contained in their peak relative to the total. Videos experiencing an exogenous shock should have a very high percentage because there is little precursory growth, whereas the opposite holds for the endogenous case. One immediately sees that, based on this very simple criterion, the videos naturally fall into separate exponent classes, and we can extract $\theta = 0.4$ based on this picture. A final interesting result is that this classification does not rely on the magnitude of the largest peak, implying that identification of content can be made for large communities as well as more specialized, niche communities.
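The grouping criterion and the three exponent classes can be sketched in a few lines. With $\theta = 0.4$ the class exponents are $1 - 2\theta = 0.2$ (viral), $1 - \theta = 0.6$ (quality), and $1 + \theta = 1.4$ (junk). The function names and the nearest-exponent labelling rule below are assumptions made for illustration. The text does not state how a measured exponent is assigned to a class.

```python
def peak_fraction(daily_views):
    """Fraction of a video's total views that fall on its most active day."""
    total = sum(daily_views)
    if total <= 0:
        raise ValueError("series has no views")
    return max(daily_views) / total


def expected_exponents(theta):
    """Relaxation exponents of the three classes for a given theta."""
    return {"viral": 1 - 2 * theta, "quality": 1 - theta, "junk": 1 + theta}


def classify(exponent, theta=0.4):
    """Label a measured exponent by the nearest class exponent.

    The nearest-exponent rule is an illustrative assumption, not a rule
    taken from the text.
    """
    targets = expected_exponents(theta)
    return min(targets, key=lambda label: abs(targets[label] - exponent))


print(peak_fraction([1, 2, 7]))  # 0.7
print(classify(0.55))            # quality
```

Because `peak_fraction` is a ratio, it does not depend on the absolute size of the peak, which matches the observation that the classification works for small niche communities as well as large ones.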

References

Sornette, D.; Deschâtres, F.; Gilbert, T.; and Ageon, Y. 2004. Endogenous versus exogenous shocks in complex networks: an empirical test using book sale ranking. Phys. Rev. Lett. 93(22):228701.
