Anti-Preferential Attachment: If I Follow You, Will You Follow Me?

Juan Lang

S. Felix Wu

Department of Computer Science University of California, Davis, California, USA [email protected]

Department of Computer Science University of California, Davis, California, USA Blekinge Institute of Technology, Karlskrona, Sweden [email protected]

Abstract—A common question in social networking research is how edges form to produce social graphs with commonly observed characteristics, including a power-law degree distribution and a small diameter. One common model for edge formation in synthetic networks is preferential attachment. We examine the edge formation process of one Online Social Network (OSN), Buzznet, and look for evidence of preferential attachment. To our surprise, we find that a form of “anti-preferential attachment” is common, in which high-degree nodes add edges to low-degree nodes, perhaps as a means of self-promotion. We also find that nodes are most likely to reciprocate edges from low-degree nodes, limiting the extent to which anti-preferential attachment can succeed in boosting a high-degree node’s in-degree.

I. INTRODUCTION

A topic of frequent interest in social networks is the models by which the networks might grow. Many growth models have been proposed. One of the most common is the preferential attachment model [1]. In the preferential attachment model, new nodes join a network and form edges to existing nodes with probability proportional to the existing nodes’ degree. In undirected social networks, such a model is sufficient to describe the network’s growth. In a directed network, in contrast, two additional questions must be answered before the model is complete. First, if one node adds an edge to another, under what circumstances does the target of the first edge add an edge to that edge’s source? That is, which nodes reciprocate edges, and to whom? Second, given reciprocal edges between two nodes, which edge comes first? In the preferential attachment model, the edges implicitly come from the lower-degree node first. For example, many users might be interested in following a celebrity, but a celebrity may have relatively few users he or she is interested in following.

While many models have been proposed, few studies have attempted to validate the models using the order of edge addition in OSNs, perhaps because of the difficulty of getting edge creation data. In this study, we look for evidence of directed preferential attachment in one OSN, Buzznet (http://buzznet.com). We do not have a precise order of edge addition, but by bounding the dates on which edges could have been added, we are able to model the growth of the network. In contrast to some OSNs, in Buzznet edge formation is not by default a prerequisite for communication, hence edge formation can be studied independently of communication. Buzznet is also not a platform for third-party applications, so we do not have to worry that edges are “fake” edges used to build capital in a social game [2].

This work has three main contributions. First, we show that by introducing an upper bound for edge creation time, we achieve a significant improvement in the precision with which edge creation times can be inferred on the basis of a single crawl. In many OSNs, it is possible to obtain an upper bound for edge creation time relatively easily: Facebook, for example, lists recent friend additions in a profile’s feed. Even the absence of a friend listing in a profile’s feed establishes an upper bound: if the oldest activity listed occurred on a date, and no friend additions are listed on or after that date, then that date serves as an upper bound for edge formation. Our second main finding is that, in contrast to existing models, a form of “anti-preferential attachment” in which more popular nodes create edges to less popular nodes is common, perhaps as a means of self-promotion. If users add edges to other users, they may be hoping that the human tendency to reciprocate will induce the receiving users to add edges back to them, increasing their own popularity. Our third main finding is that users are more likely to reciprocate an edge from a less popular node than from a more popular node, limiting the extent to which self-promotional edge formation will succeed.

The rest of this work is organized as follows. We present related work in Section II. Background for this work is presented in Section III. Results based on observations of the final crawled state of the social graph are presented in Section IV. Results based on the subset of precisely dated edges are presented in Section V. Results based on modeling edge formation are presented in Section VI. We conclude in Section VII.

II. RELATED WORK

Preferential attachment was first proposed by Barabási and Albert [1] to address a shortcoming of random graph models: they did not exhibit the power-law distribution of node degree common to many real networks. Since its introduction, preferential attachment-like behavior has been observed in the growth of real networks, e.g. by Newman [3] and by Jeong et al. [4]. The growth and evolution of OSNs have seen surprisingly little analysis, considering the volume of literature on OSNs in general. Leskovec et al. [5] examine the

evolution of four OSNs and show that preferential attachment-like behavior is present in each of them. They also show that there is a tendency for edges to exhibit triadic closure, i.e. to introduce edges between nodes that are two hops away from one another in the graph prior to the edge’s addition. Subsequent analysis by Kumar et al. [6] of the evolution of two OSNs shows some phenomena not explained merely by preferential attachment, suggesting that more advanced models are needed to explain OSN growth. They only examine the graph consisting of mutual edges from the original OSNs, i.e. they treat the social graphs as undirected graphs and do not examine reciprocity. Romero and Kleinberg [7] show that in Twitter, the triadic closure process appears to produce “feedforward” triads, in which two of the nodes in a triad have edges to one of the nodes, while one of these two following nodes also follows the other following node.

Meeder et al. [8] show that the edge creation times for edges incident on users with a high rate of follower growth on Twitter can be inferred from a single crawl, rather than needing to sample the OSN repeatedly to obtain edge creation times. The method they use is similar to the one we employ to constrain edge creation dates within Buzznet. Because Buzznet is much smaller than Twitter, and the most popular users on Buzznet have at least an order of magnitude fewer followers than the most popular users on Twitter, the precision we achieve is lower than theirs. In contrast to their work, we estimate the creation dates for all users in the social graph, rather than only for the most popular users, and use these dates to draw conclusions they cannot.

Reciprocity in human behavior is a well known phenomenon [9], and it has been demonstrated in many settings: see e.g. [10], [11]. Its presence among edges in OSNs has also been demonstrated many times, e.g. by Leskovec et al. [5] and by Kumar et al. [6]. Under what circumstances edges are and are not reciprocated in OSNs has not, to our knowledge, been studied before.

III. BACKGROUND

We began this experiment by performing a breadth-first-search-based crawl of Buzznet’s social graph, until no new nodes appeared in our crawl. This graph represented the largest connected component (LCC) of Buzznet’s social network graph. In all, we gathered approximately 750,000 users and 9 million directed edges comprising the LCC. We also gathered a combination of precise and approximate creation dates for edges within the LCC, which we will describe in more detail shortly.

More formally, let $G = (V, E)$ represent the Buzznet social graph, where $V$ is the set of users and $E$ is the set of edges. Given two users $u, v \in V$ and an edge $e = u \to v \in E$, $u$ and $v$ are neighbors, and $u$ follows $v$: $u$ is one of $v$’s followers, and $v$ is one of $u$’s followees.

Definition 1. User $u$’s in degree is $k_i(u) = |\{v : v \to u \in E\}|$.

Fig. 1. CDF of in degree (x-axis: in degree, log scale; y-axis: CDF).

Fig. 2. Edges between classes (Celebrities, Mixed, Fans, Unknown). Edge fractions labeled in the figure: 8.6%, 5.0%, 3.5%, 0.2%, 10.4%, 2.8%, 60.8%, 3.7%, 5.0%.

Definition 2. User $u$’s out degree is $k_o(u) = |\{v : u \to v \in E\}|$.

The CDF of in degree for users in the LCC appears in Figure 1. Note that it follows the familiar power-law distribution, with the largest group of users having an in degree of 1, and a very small fraction of users having a much larger in degree. We divide the users in the LCC into classes based on their followers and followees. First, we define a group of Celebrities, based on the users’ in degree.

Definition 3. A Celebrity is a user whose in degree is above the 99.99th percentile.

The reasoning for defining such a class is twofold. First, the existence of celebrities with a much higher in degree than average can be produced by the preferential attachment model, and we wish to investigate evidence for the preferential attachment model. Second, celebrities behave qualitatively differently online than do more typical users [12], [13]. Based on this definition of celebrities, we define the following non-overlapping classes:

Definition 4. A Fan is a user who follows only Celebrities.

Definition 5. A Mixed user is a user who is not a Celebrity and who follows at least one non-Celebrity.

Definition 6. An Unknown user is a user with no followees or whose followees are unknown.
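The class definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `classify_users`, the nearest-rank percentile method, and the rule that the Celebrity class takes precedence over the other classes are all assumptions made here so that the classes do not overlap.

```python
import math
from collections import defaultdict


def classify_users(users, edges, percentile=99.99):
    """Return {user: class name} following Definitions 3 to 6.

    users: iterable of user ids in the crawl.
    edges: iterable of (follower, followee) pairs, i.e. follower -> followee.
    """
    users = list(users)
    in_degree = defaultdict(int)
    followees = defaultdict(set)
    for s, t in edges:
        in_degree[t] += 1
        followees[s].add(t)

    # Nearest-rank percentile of in degree (the percentile method is an assumption).
    ranked = sorted(in_degree[u] for u in users)
    rank = max(1, math.ceil(percentile / 100 * len(ranked)))
    threshold = ranked[rank - 1]
    celebrities = {u for u in users if in_degree[u] > threshold}

    classes = {}
    for u in users:
        if u in celebrities:
            classes[u] = "Celebrity"   # assumed to take precedence over other classes
        elif not followees[u]:
            classes[u] = "Unknown"     # no followees, or followees not crawled
        elif followees[u].issubset(celebrities):
            classes[u] = "Fan"         # follows only Celebrities
        else:
            classes[u] = "Mixed"       # follows at least one non-Celebrity
    return classes
```

On a toy graph, a lower `percentile` value is needed for any user to exceed the threshold; the 99.99 default only becomes meaningful on a crawl of realistic size.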

Reasons for users’ followees being unknown include that their profiles were private or that they had no followees at the time of our crawl. Approximately one third of the users in the LCC belong to each of the Fans, Mixed, and Unknown classes. The fraction of edges between each pair of classes is shown in Figure 2. The Unknown class has, by definition, no outgoing edges. There are 101 Celebrities in our crawl, with a minimum in degree of 3,032 and a maximum in degree of over 180,000. In all, the 101 Celebrities are incident on over 30% of all edges in the LCC.

In general, Buzznet does not report the date on which edges were created. For each user, Buzznet reports the date on which at most one of the user’s outgoing edges was added. We collected this edge date for many of the users in the LCC of the social graph, yielding a total of approximately 300,000 precisely dated edges, approximately 3% of the edges in the LCC. For the remaining edges, we were able to obtain a range of feasible dates on which each edge could have been created. We did so by obtaining lower and upper bounds for each user’s edge creation dates, and combining these with the order of edge creation to yield a range of feasible dates. We now describe each of these in more detail.

A. Lower bound

For most users, Buzznet reports the user’s account creation date.¹ If an account’s creation date is missing, we set the creation date to the earliest activity we recorded from the site, January 1st, 2003. Let $d_c(u)$ be account $u$’s creation date. For an edge $s \to t$, the earliest feasible date for the edge can be no earlier than $\max[d_c(s), d_c(t)]$. We define this as the feasible start date determined by the creation date:

Definition 7. $f_{sc}(s, t) = \max[d_c(s), d_c(t)]$

Buzznet allows one to obtain a user’s followers or followees listed in the order in which the edges were added. Using this order allows us to improve the lower bound on an edge’s creation date. For a given edge $s \to t$, there is a set of followers of $t$ who added edges to $t$ prior to $s$. Let this set be $i_p(t, s)$. Then edge $s \to t$ must have been added on or after the latest feasible start date of any of these edges. Let the feasible start date of edge $s \to t$ be $f_s(s, t)$. Then

$$f_s(s, t) \ge \max_{u \in i_p(t,s)} f_s(u, t).$$

Similarly, for an edge $s \to t$, there is a set of followees of $s$ whom $s$ added prior to $t$. Let this set be $o_p(s, t)$. Then edge $s \to t$ must have been added on or after the latest feasible start date of any of these edges. In other words,

$$f_s(s, t) \ge \max_{v \in o_p(s,t)} f_s(s, v).$$

Combining the two yields:

Definition 8. The feasible start date for edge $s \to t$ satisfies

$$f_s(s, t) \ge \max\Big[\max_{u \in i_p(t,s)} f_s(u, t),\ \max_{v \in o_p(s,t)} f_s(s, v)\Big]$$

Because this definition is recursive, and resolving it would require creating a partial order of all edges, we relax the constraint by considering only the account creation dates of all members of $i_p(t, s)$ and $o_p(s, t)$:

Definition 9. The relaxed feasible start date for edge $s \to t$ satisfies

$$f'_s(s, t) \ge \max\Big[\max_{u \in i_p(t,s)} d_c(u),\ \max_{v \in o_p(s,t)} d_c(v),\ d_c(s),\ d_c(t)\Big]$$

¹ 96% of users in the LCC have a valid account creation date.

The true creation date for edge $s \to t$ is $d(s, t)$, which may be unknown. The relationship between the estimates for the feasible start date and the true creation date for edge $s \to t$ is:

$$f_{sc}(s, t) \le f'_s(s, t) \le f_s(s, t) \le d(s, t)$$

B. Upper bound

For most users, Buzznet reports the date on which the user last logged into the site.² This date, $d_o(u)$, is an upper bound for any edges originating at $u$. When a user’s last login date is unknown, we set the last feasible date for any edges originating at $u$ to the last date of our crawl, June 10th, 2009. We can further refine this date for many edges using the order of edge creation, in a similar way to how we refine the feasible start date. Let $f_e(s, t)$ be our best upper bound for edge $s \to t$. Let the set of followers of $t$ who added edges to $t$ after $s$ be $i_a(t, s)$. Let the set of followees of $s$ whom $s$ added after $t$ be $o_a(s, t)$. Then:

Definition 10. The feasible end date for edge $s \to t$ satisfies

$$f_e(s, t) \le \min\Big[\min_{u \in i_a(t,s)} f_e(u, t),\ \min_{v \in o_a(s,t)} f_e(s, v)\Big]$$

Just as with the lower bound, we relax this upper bound by considering only the last login date of users in $i_a(t, s)$ and $o_a(s, t)$:

Definition 11. The relaxed feasible end date for edge $s \to t$ satisfies

$$f'_e(s, t) \le \min\Big[\min_{u \in i_a(t,s)} d_o(u),\ \min_{v \in o_a(s,t)} d_o(v),\ d_o(s)\Big]$$
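As an illustration of the relaxed bounds in Definitions 9 and 11, the following Python sketch computes the tightest feasible range allowed by both for a single edge. It is a sketch under stated assumptions, not the authors' implementation: the function name, the dictionary-based inputs, and the use of ordered lists to represent Buzznet's follower and followee orderings are hypothetical. The two default dates are the fallbacks given in the text.

```python
from datetime import date

DEFAULT_START = date(2003, 1, 1)   # fallback when an account creation date is missing
DEFAULT_END = date(2009, 6, 10)    # fallback when a last login date is missing


def relaxed_feasible_range(s, t, followers_of, followees_of, created, last_login):
    """Return the relaxed (start, end) feasible dates for edge s -> t.

    followers_of[t]: followers of t, in the order their edges were added.
    followees_of[s]: followees of s, in the order s added them.
    created, last_login: dicts of user -> date; missing users use the defaults.
    """
    def d_c(u):
        return created.get(u, DEFAULT_START)

    def d_o(u):
        return last_login.get(u, DEFAULT_END)

    followers = followers_of[t]
    followees = followees_of[s]
    i = followers.index(s)
    j = followees.index(t)
    i_p, i_a = followers[:i], followers[i + 1:]   # added to t before / after s
    o_p, o_a = followees[:j], followees[j + 1:]   # added by s before / after t

    # Definition 9: latest creation date among the edge's endpoints and predecessors.
    start = max([d_c(s), d_c(t)] + [d_c(u) for u in i_p] + [d_c(v) for v in o_p])
    # Definition 11, as written: earliest last-login date among s and successors.
    end = min([d_o(s)] + [d_o(u) for u in i_a] + [d_o(v) for v in o_a])
    return start, end
```

A returned range with `start` later than `end` signals an impossible constraint of the kind discussed under inconsistent data.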

The relationship between the estimates for the feasible end date and the true creation date for edge $s \to t$ is:

$$d_o(s) \ge f'_e(s, t) \ge f_e(s, t) \ge d(s, t)$$

C. Total order of edge creation

Using the determined bounds for an edge’s creation date, we can model the graph’s creation by assigning a precise date to each edge from among the range of feasible dates for the edge. Using a precise ordering of the edges allows us to investigate the phenomena we examine here, preferential attachment and reciprocation. We assign to each edge a date:

Definition 12. The assigned date on which edge $s \to t$ is created is given by $d_a(s, t) \in [f'_s(s, t), f'_e(s, t)]$.

² The last online date for users in the LCC was crawled after the initial crawl which produced the LCC. 82% of the users in the LCC have a last online date within the initial crawl period.

We assign edges dates within their feasible range in more than one way, described in more detail in Section VI. Once all edges are assigned a date within their feasible range, we create a total ordering over all edges by permuting all edges assigned the same date, such that each edge is given a precise order in which it is created.

Definition 13. Let $\tau(s, t) \in \{1, 2, \ldots, |E|\}$ be edge $s \to t$’s assigned creation order, where the first edge created gets order 1, the second edge created gets order 2, and so on.

This allows us to define a time for the graph.

Definition 14. Given an order $T \in \{1, 2, \ldots, |E|\}$, the graph $G_T = (V, E_T)$, where $E_T = \{s \to t \in E \mid \tau(s, t) \le T\}$.

While we know most nodes’ creation dates, we ignore them other than using them to constrain the edges’ feasible dates.

D. Probability of reciprocating an edge

We are interested in the probability that an edge is reciprocated. In the context of a static graph, this probability is simple: an edge is either reciprocated or not. In the context of an evolving graph, we can ask instead, given that an edge is created on a particular date, what is the probability that the edge is reciprocated at a later date?

Definition 15. The probability that edge $s \to t$ is reciprocated is

$$p_r(s, t) = \begin{cases} 1, & \text{if } t \to s \in E \text{ and } \tau(t, s) > \tau(s, t), \\ 0, & \text{if } t \to s \notin E, \\ \text{undefined}, & \text{otherwise.} \end{cases}$$

The undefined case covers when edge $t \to s$ exists, but was created prior to edge $s \to t$: only the first edge is counted as having been reciprocated, as the second edge is in response to the first. We will use this probability for groups of edges to examine the probability of groups of nodes reciprocating edges. In particular, we will look at the probability that nodes with a particular degree reciprocate an edge. The degree we will examine is also dependent on the date within the graph, which requires augmenting our definition of degree:

Definition 16. User $u$’s in degree at time $T$ is $k_i(u, T) = |\{v : v \to u \in E_T\}|$.
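The total order, the reciprocation probability, and the time-dependent in degree defined above can be sketched as follows. This is a hedged illustration: the function names and the dictionary representation of assigned dates are hypothetical, and ties are broken here by a random shuffle followed by a stable sort, which is one way to realize the permutation of same-date edges described in the text.

```python
import random


def total_order(assigned_date, rng):
    """Map each edge (s, t) to its creation order tau in 1..|E| (Definition 13).

    assigned_date: dict of (s, t) -> assigned date (any comparable value).
    rng: a random.Random instance used to permute edges that share a date.
    """
    edges = list(assigned_date)
    rng.shuffle(edges)                           # random permutation breaks ties
    edges.sort(key=lambda e: assigned_date[e])   # stable sort keeps shuffled tie order
    return {e: k for k, e in enumerate(edges, start=1)}


def p_reciprocated(s, t, tau):
    """Definition 15: 1, 0, or None when the probability is undefined."""
    if (t, s) not in tau:
        return 0
    if tau[(t, s)] > tau[(s, t)]:
        return 1
    return None   # t -> s was created first, so s -> t is itself the response


def in_degree_at(u, T, tau):
    """Definition 16: number of edges v -> u with creation order at most T."""
    return sum(1 for (s, t), order in tau.items() if t == u and T >= order)
```

Averaging `p_reciprocated` over a group of edges, skipping the `None` cases, gives the group-level reciprocation probability used later in the analysis.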
Definition 17. User $u$’s out degree at time $T$ is $k_o(u, T) = |\{v : u \to v \in E_T\}|$.

E. Coping with inconsistent data

While we (and other authors) treat the data retrieved from the OSNs we study as ground truth, there are cases in which the data are inconsistent. For example, we concluded our crawl of the LCC on June 10th, 2009, yet there are users in the LCC whose account creation date reported by Buzznet is after that date. As another example, there are users who have activity on the site prior to their account creation date. Some inconsistencies could be explained by ordinary behavior on the site, e.g. a user account being deleted and another being created with the same account name later could produce inconsistent dates. Others must be the result of errors in the data reported by the site. In order to deal with potentially inconsistent data, when applying the constraints described here, if we discover that data from a user produce an impossible constraint, i.e. an edge whose first feasible date is after the edge’s last feasible date, we do not apply that user’s data to any edge. In other words, we blacklist users whose data produce impossible edge constraints. Doing so results in a potential loss of precision but, we hope, little loss of accuracy. We will discuss the accuracy of the data when we present our results, which we do now.

IV. RESULTS BASED ON STATIC GRAPH

Figure 2 shows the fraction of edges among all the classes we defined in Section III. As we noted earlier, a large fraction of edges are incident on the Celebrities class, suggesting that preferential attachment may be part of the edge formation process. Nonetheless, an interesting pattern appears: with the exception of edges to the Unknown class, who have no outgoing edges, the fraction of edges from the Celebrities class to each other class is larger than the fraction of edges from that class to the Celebrities. Partly, this is a result of automatically created edges: when users create new accounts on Buzznet, they are automatically followed by two accounts, buzzbot and panasonicyouth. In order not to conflate automatically created edges with user behavior, edges incident on these two users are excluded from the remaining analysis.

We first look at the difference between the in-degree $k_i$ of a node and its out-degree $k_o$. Based on a directed interpretation of the preferential attachment model, we might expect that Celebrities would have a much higher in-degree than out-degree, as new nodes would add edges to them with high probability, but the Celebrities would not reciprocate many of these edges, since most of their followers must have low in-degree. In such a case, we would expect the difference between the in-degree and out-degree to be positive among Celebrities. Similarly, we might expect that Fans would exhibit a negative difference between their in-degree and out-degree, as their own in-degrees are quite low.

Table I shows the median and mode difference between in-degree and out-degree for the various classes of users (excluding Unknown, who have no outgoing edges). The Mixed and Fans classes both have median and mode differences between in- and out-degree of 0, implying that it is the norm to have similar numbers of followers and followees in Buzznet. It is striking that the median and mode difference between in- and out-degree among Celebrities is negative, indicating that most celebrities within Buzznet have higher numbers of followees than followers. Indeed, among the 99 Celebrities excluding the automatic edge generators, 90 have higher out-degree than in-degree, and in some cases the difference is quite large. One user, for example, has an in-degree of nearly 30,000, but an out-degree of over 150,000. There are also notable exceptions: one user has an out-degree of 1, but an in-degree of over 50,000.

TABLE I. DIFFERENCE IN DEGREE
(difference between in- and out-degree)

Class         Median      Mode
Celebrities   -1020.0     -496.0
Fans          0.0         0.0
Mixed         0.0         0.0

TABLE II. EDGE RECIPROCITY

Class         Edges reciprocated (%)
LCC           57
Celebrities   91
Fans          42
Mixed         81
Unknown       0

Fig. 3. Precisely dated edges between classes (Celebrities, Mixed, Fans). Edge fractions labeled in the figure: 0.0%, 0.0%, 10.1%, 45.5%, 1.8%, 0.0%, 42.4%, 0.2%, 0.0%.

A closer look at the rate at which each user class reciprocates edges sheds more light on the behavior of each class. The fraction of all incoming edges reciprocated by each class of users is shown in Table II. Overall, among all user pairs with edges between them, 57% of user pairs have mutual edges, but the rate at which each class reciprocates edges differs. The Celebrities reciprocate at a higher rate than any other class, reciprocating over 90% of all edges they receive. Fans, by contrast, reciprocate just over 40% of all edges they receive. Since Fans only follow Celebrities, this could indicate that Fans have no interest in other users, regardless of whether the other users have interest in them. The Mixed users also reciprocate at a high rate, while the Unknown users by definition reciprocate no edges.

Combined, the state of the static graph seems to suggest that the most popular users in Buzznet may have achieved their popularity by following many users, inducing many of them to reciprocate the edge. It is possible that the graph grew in other ways: for instance, it could be the case that large numbers of users added edges to Celebrities, and removed them prior to our crawl. We have no data on edge removal, so we cannot rule out this possibility, though it does seem unlikely. To investigate the anti-preferential attachment behavior we see in the final state of the graph during our crawl, we look next at the behavior of the precisely dated edges we collected.

V. RESULTS USING PRECISELY DATED EDGES

As we stated in Section III, we collected approximately 300,000 precisely dated edges from Buzznet. We then applied the constraints process described there, and for those edges where the earliest feasible date was equal to the latest feasible date, we assigned this date as the edge’s true date. Doing so yielded a total of approximately 500,000 precisely dated edges. 59% of precisely dated edges have a reciprocal edge, slightly higher than the fraction among all edges in the LCC. Of the precisely dated edges with a reciprocal edge, approximately 10,000 have precise dates for both edges. Of these, 55% have both reciprocal edges created on the same day, i.e. with ambiguous ordering.

TABLE III. RECIPROCAL EDGES WITH DIFFERENCES IN EDGE DATE

Edge order               Number of pairs   Mean degree difference
Equal degree             127               n/a
Lower → higher first     889               43.8
Higher → lower first     3569              902.9

The edge pairs with precise

dates and a difference in edge creation date are summarized in Table III. Of the reciprocal edge pairs with precise dates and a difference in edge creation date, approximately 80% of the pairs have the edge from the higher-degree node first, and approximately 20% of the pairs have the edge from the lower-degree node first. For those with an edge from a lower-degree node first, the mean difference in degree is relatively small, while for those with an edge from a higher-degree node first, the difference in degree is relatively large. In other words, for those few reciprocal edge pairs whose edge dates are known and different, preferential attachment does not predict edge formation. Rather, a form of anti-preferential attachment, where higher-degree nodes “welcome” lower-degree nodes, appears common.

Nonetheless, we cannot be certain that these edge pairs are representative. In fact, we are confident that precisely dated edges from high-degree nodes are under-represented: as we described in Section III, Buzznet reports at most one precisely dated edge from each user. For users with high out degree, this single edge is a much smaller fraction of their outgoing edges than it is for users with lower out degree. This phenomenon is displayed visually in Figure 3. As can be seen, precisely dated edges from Celebrities are dramatically under-represented relative to their presence in the LCC, whereas precisely dated edges from Fans to Celebrities are over-represented relative to their presence in the LCC. In order to investigate preferential attachment in the social graph as a whole, we must make use of imprecisely dated edges. We describe how we do so in the subsequent section.

VI. MODELING EDGE FORMATION

We begin the modeling process by applying the constraints process described in Section III. We first constrain the edges’

feasible start dates by using the account creation dates of the edge partners, i.e. by determining $f_{sc}$. We then constrain the edges’ feasible start dates by determining the relaxed constraint $f'_s$ using the order of edge creation. We finally apply the relaxed constraint to the feasible end date, $f'_e$. The results of applying each of these constraints are summarized in Table IV. As can be seen, the average precision for each date is quite broad: at best, we achieve a precision of approximately one year on average. Nonetheless, the addition of an upper bound on edge creation increases precision significantly: while applying the lower bound constraint $f'_s$ increases the average precision by approximately 20% over the simpler $f_{sc}$, applying the upper bound constraint $f'_e$ increases the average precision by approximately an additional 30%.

Table IV also presents the accuracy of the edges, measured by computing the fraction of precisely dated edges whose actual edge creation date is within the feasible range determined by the constraints process. As can be seen, the weakest constraints, $f_{sc}$, have the highest fraction of accurate edges. Applying the relaxed edge start constraint $f'_s$ results in 4% of the precisely dated edges falling outside of their computed feasible range. Applying $f'_e$ and $f'_s$ results in a very small reduction in accuracy compared to applying $f'_s$ alone. Because the overall accuracy remains high, and because the source of error is inconsistency in the data we retrieve from the site, we accept the constraints derived from $f'_s$ and $f'_e$.

To continue to model the edge formation process, we wish to know when within the feasible range edges are most likely to have been created. Are they likely to have been created early within a feasible range, or late, or at any time within it? To answer this, we examine the distribution of feasible range sizes.

TABLE IV. FEASIBLE RANGE PRECISION

Constraints applied     Mean precision (days)   Median precision (days)   Accuracy (%)
$f_{sc}$                655.2                   628                       99.9
$f'_s$                  429.2                   393                       96.0
$f'_s$ and $f'_e$       391.1                   331                       95.0
Fig. 4. Feasible date range distribution (series: Observed, Fit; x-axis: feasible date range in days; y-axis: fraction of edges, log scale).

The distribution of feasible range sizes (in days) is shown in Figure 4. The fraction of edges with each feasible date range size is shown in a log scale for visual clarity. We used maximum likelihood estimation to compare a number of distributions to this one, and the distribution with the highest likelihood of matching is the exponential distribution. While the sizes have discrete values, and the exponential distribution is continuous, we can think of edges as having been created at a specific time, whose value is continuous, while the data we have, edge creation date, is artificially discrete. Figure 4 also shows a fit exponential curve. We use this curve to reason about the expected fraction of edges created on various days. As a starting point, assume the probability of an edge having been created at any time within its feasible range is uniform. Clearly, since the mode width of feasible range is 1 day, we

would expect the highest fraction of edges to have been created on the first day within the feasible range, as this is the only possibility for the largest fraction of edges. We determine the expected probability of an edge having been created on any day within its feasible range, and compare this probability to that which we see using the precisely dated edges. Let $w(s, t)$ represent the width of edge $s \to t$’s feasible range, in days. Let $P(d_r(s, t) = n)$ represent the probability that edge $s \to t$ was created on day $n$ within the feasible range: $P(d_r(s, t) = 1)$ is the probability that edge $s \to t$ was created on the first feasible day, $P(d_r(s, t) = 2)$ is the probability that edge $s \to t$ was created on the second feasible day, and so on. The probability that an edge was created on day $n$ within a feasible range of width $x$ is

$$P(d_r(s, t) = n \mid w(s, t) = x) = \begin{cases} \dfrac{P(w(s, t) = x)}{x}, & x \ge n, \\ 0, & \text{otherwise.} \end{cases}$$

Summing over the feasible widths $x = n, \ldots, a$ gives

$$P(d_r(s, t) = n) = \sum_{x=n}^{a} \frac{P(w(s, t) = x)}{x} \qquad (1)$$

We fit an exponential curve to the probability density function (PDF) of the feasible range widths, using an equation of the following form:

$$PDF(\omega) = \alpha e^{-\omega/\beta} \qquad (2)$$

The continuous-valued variable $\omega$ replaces the discrete-valued $w$ to represent the width of a feasible range. The fit values for $\alpha$ and $\beta$ are 0.002665964 and 220.787, respectively. Because the PDF is continuous, while the probability we are trying to determine is discrete, we define the probability that the feasible width has value $x$ to be:

$$P(w(s, t) = x) = PDF(\omega \le x) - PDF(\omega \le x - 1) \qquad (3)$$

This is

$$\int_{x-1}^{x} \alpha e^{-\omega/\beta}\, d\omega = -\alpha\beta e^{-\omega/\beta} \Big|_{x-1}^{x} = -\alpha\beta \left( e^{-x/\beta} - e^{\frac{1-x}{\beta}} \right)$$

Substituting into Equation 1 results in

$$P(d_r(s, t) = n) = -\alpha\beta \sum_{x=n}^{a} \frac{e^{-x/\beta} - e^{\frac{1-x}{\beta}}}{x} \qquad (4)$$

Fig. 5. Precise edge dates vs. day within feasible range (series: Observed, Expected; x-axis: day on which edge appears; y-axis: fraction of edges, log scale).
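Equation 4 can be evaluated numerically with a short Python sketch. The constants are the fit values reported in the text; the function names are hypothetical, and `max_width` stands in for the upper limit of the sum, which is an assumption the caller must supply.

```python
import math

ALPHA = 0.002665964   # fit value for alpha reported in the text
BETA = 220.787        # fit value for beta reported in the text


def p_width(x, alpha=ALPHA, beta=BETA):
    """P(w = x): integral of alpha * exp(-omega / beta) over [x - 1, x]."""
    return -alpha * beta * (math.exp(-x / beta) - math.exp((1 - x) / beta))


def p_created_on_day(n, max_width, alpha=ALPHA, beta=BETA):
    """Equation 4, with the sum truncated at max_width (an assumed upper limit).

    Assumes an edge is equally likely to be created on any day of its range.
    """
    return sum(p_width(x, alpha, beta) / x for x in range(n, max_width + 1))
```

Because every term of the sum is positive, the expected probability can only decrease as `n` grows, which is the baseline shape against which the precisely dated edges are compared.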

Fig. 6. Median feasible range size vs. degree (series: vs. $k_i(s)$, $k_o(s)$, $k_i(t)$, $k_o(t)$; x-axis: degree, log scale; y-axis: median feasible range size in days).

A comparison of the expected probability of edges appearing by day, assuming edges are created with uniform probability across their feasible range, to the observed probability among the precisely dated edges appears in Figure 5. As can be seen, there is a much higher probability of edges having been created on the first day within the feasible range than expected. The peak probability, that an edge is created on the first feasible day, is approximately 0.6. To a certain extent, we expect edges to be created early in their feasible ranges, given the tendency of edges to decay [13], [14]. Aside from the peak in edge creation probability occurring early, the shape of the distribution appears to match the expected distribution.

While the size of the feasible range has a predictable distribution, it is not independent of a user’s degree. The constraints described in Section III may constrain edges more tightly when a user has more followers and followees, yet users with shorter lifetimes also have tighter upper bounds, and as has been shown elsewhere, users with higher degree also have longer lifetimes [13]. The relationship between the median size of the feasible range of an edge and the degree of the nodes the edge is incident on is shown in Figure 6, for the in- and out-degree of the source node and target node ($k_i(s)$, $k_o(s)$, $k_i(t)$, and $k_o(t)$, respectively) for degrees up to about the 97th percentile. Above the 97th percentile, the density of nodes with a given degree is low enough that the median size of the feasible range varies widely. As Figure 6 shows, for the vast majority of nodes, the median feasible date range increases as the degree of the source or target node increases, showing that the upper bound produced by a user’s lifetime is a stronger bound than

that inferred from other followers and followees. This matches the increase in precision from fe0 shown in Table IV. We model the growth of the graph in two ways: assigning the earliest feasible date to each edge, and choosing a date uniformly at random from among the range of feasible dates. Choosing the earliest feasible date is meant to capture the tendency for edges to have been created early in their feasible range. Choosing a date uniformly at random among the feasible dates is meant to address possible bias that choosing the earliest date might introduce, which we will discuss more shortly. As we stated in Section III, once a date is assigned to each edge, we permute the order of edges assigned the same date in order to produce a total order over all edges. For both approaches, we model the growth of the graph over 10 runs, and compute the probability that edges are reciprocated, pr , vs. the degree of both the source and target nodes. In order to avoid issues due to truncation, we only compute pr for edges created in the first half of the network’s lifetime. In other words, we compute pr for all edges s → t such that τ (s, t) ≤ |E| 2 . The pr we report here is the median of pr over all runs. The results from modeling the growth of the graph using the earliest feasible date are shown in Figure 7. As can be seen, there is a positive relationship between the degree of the edge target and the probability of edge reciprocation: the higher the receiving node’s degree, the higher the probability that the receiving node will reciprocate the edge. This behavior matches the reciprocity behavior we saw within the classes: the Celebrities had the highest rate of edge reciprocation, which may be part of the reason their degree is so high. A more surprising finding is that the probability of a node reciprocating an edge decreases as the degree of the edge’s source increases. 
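The growth-modeling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the input format (a mapping from each directed edge to an inclusive range of feasible days), the function names, the choice to bin only by the source's in-degree $k_i(s)$ at the time the edge is created, and the rule that an edge counts as reciprocated when the reverse edge appears later in the total order are all assumptions made for the example.

```python
import random
import statistics
from collections import defaultdict


def assign_dates(feasible, strategy, rng):
    """feasible maps (s, t) -> (earliest_day, latest_day), both inclusive."""
    if strategy == "earliest":
        return {e: lo for e, (lo, hi) in feasible.items()}
    return {e: rng.randint(lo, hi) for e, (lo, hi) in feasible.items()}


def total_order(dates, rng):
    """Order edges by date; edges sharing a date are randomly permuted."""
    edges = list(dates)
    rng.shuffle(edges)
    edges.sort(key=lambda e: dates[e])  # stable sort keeps the shuffled tie order
    return edges


def reciprocation_by_source_in_degree(feasible, strategy, seed=0):
    """Return {k_i(s): p_r} for edges in the first half of the total order."""
    rng = random.Random(seed)
    order = total_order(assign_dates(feasible, strategy, rng), rng)
    position = {e: i for i, e in enumerate(order)}
    in_deg = defaultdict(int)
    counts = defaultdict(lambda: [0, 0])  # k_i(s) -> [reciprocated, total]
    for i, (s, t) in enumerate(order):
        rev = position.get((t, s))
        in_first_half = 2 * (i + 1) <= len(order)  # tau(s, t) <= |E| / 2
        initiating = rev is None or rev > i  # reverse edge not yet present
        if in_first_half and initiating:
            bucket = counts[in_deg[s]]
            bucket[1] += 1
            if rev is not None:
                bucket[0] += 1
        in_deg[t] += 1
    return {k: r / n for k, (r, n) in counts.items()}


def median_over_runs(feasible, strategy, runs=10):
    """Median p_r per degree over several randomized runs."""
    results = [
        reciprocation_by_source_in_degree(feasible, strategy, seed)
        for seed in range(runs)
    ]
    keys = set().union(*results)
    return {k: statistics.median(r[k] for r in results if k in r) for k in keys}
```

Calling `median_over_runs(feasible, "earliest")` and `median_over_runs(feasible, "uniform")` on the same input gives the two variants compared in the text. Binning by $k_o(s)$, $k_i(t)$, or $k_o(t)$ would follow the same pattern with a different degree counter.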
In other words, users are more likely to reciprocate edges from the least popular nodes than from the most popular ones. This suggests that the anti-preferential attachment behavior we see in the static graph and among the precisely dated edges will have limited success in increasing nodes’ in-degree. On the other hand, since lower-degree nodes have narrower feasible ranges, it could be that the earliest feasible date assignment results in edges appearing to come from lower-degree nodes first, increasing the probability of edges from lower-degree nodes being reciprocated. In order to address this possible bias, we model the growth of the graph using edge dates assigned uniformly at random among the feasible dates; the results are shown in Figure 8. As can be seen, the results are qualitatively identical to those obtained with the earliest feasible date assignment: the highest-degree nodes are the most likely to reciprocate edges, while users are most likely to reciprocate edges from low-degree nodes.

Fig. 7. Probability of reciprocating an edge (earliest). (y-axis: probability of reciprocating an edge, 0 to 1; x-axis: Degree, 1 to 1000; series: vs. ki(s), vs. ko(s), vs. ki(t), vs. ko(t).)

Fig. 8. Probability of reciprocating an edge (uniform). (y-axis: probability of reciprocating an edge, 0 to 1; x-axis: Degree, 1 to 1000; series: vs. ki(s), vs. ko(s), vs. ki(t), vs. ko(t).)

VII. CONCLUSION

In this work, we examine the growth of one OSN and look for evidence of preferential attachment. Instead, we find evidence of a kind of anti-preferential attachment, where high-degree nodes create edges to low-degree nodes, who may or may not reciprocate them. We find that the success of this behavior may be somewhat reduced by the fact that nodes are more likely to reciprocate edges from low-degree nodes than from high-degree ones.

For future work, we intend to bound the feasible dates using the tighter bounds available by considering a partial order over all edges. We also intend to investigate more closely the source of inconsistency in our data. For instance, it may be possible to identify specific users whose data are inconsistent, rather than discarding constraints from large numbers of users. We also intend to perform a similar analysis of other directed OSNs to see whether similar behavior exists there.

We thank the anonymous reviewers for their insightful comments. This work was supported by the National Science Foundation FIND (Future Internet Design) program under Grant No. 0832202, GENI, MURI under ARO (Army Research Office), and was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053.

REFERENCES

[1] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
[2] A. Nazir, S. Raza, C.-N. Chuah, and B. Schipper, “Ghostbusting facebook: detecting and characterizing phantom profiles in online social gaming applications,” in Proceedings of the 3rd conference on Online social networks, WOSN’10, (Berkeley, CA, USA), pp. 1–1, USENIX Association, 2010.
[3] M. E. J. Newman, “Clustering and preferential attachment in growing networks,” Phys. Rev. E, vol. 64, p. 025102, Jul 2001.
[4] H. Jeong, Z. Néda, and A. L. Barabási, “Measuring preferential attachment in evolving networks,” EPL (Europhysics Letters), vol. 61, no. 4, p. 567, 2003.
[5] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins, “Microscopic evolution of social networks,” in Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08, (New York, NY, USA), pp. 462–470, ACM, 2008.
[6] R. Kumar, J. Novak, and A. Tomkins, “Structure and evolution of online social networks,” in Link Mining: Models, Algorithms, and Applications (P. S. S. Yu, J. Han, and C. Faloutsos, eds.), pp. 337–357, Springer New York, 2010. doi: 10.1007/978-1-4419-6515-8_13.
[7] D. M. Romero and J. Kleinberg, “The directed closure process in hybrid social-information networks, with an analysis of link formation on twitter,” in Proc. 4th International AAAI Conference on Weblogs and Social Media, 2010.
[8] B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes, “We know who you followed last summer: inferring social link creation times in twitter,” in Proceedings of the 20th international conference on World wide web, WWW ’11, (New York, NY, USA), pp. 517–526, ACM, 2011.
[9] A. W. Gouldner, “The norm of reciprocity: A preliminary statement,” American Sociological Review, vol. 25, no. 2, pp. 161–178, 1960.
[10] H. Gintis, “Strong reciprocity and human sociality,” Journal of Theoretical Biology, vol. 206, no. 2, pp. 169–179, 2000.
[11] M. Dufwenberg and G. Kirchsteiger, “A theory of sequential reciprocity,” Games and Economic Behavior, vol. 47, no. 2, pp. 268–298, 2004.
[12] S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts, “Who says what to whom on Twitter,” in Proceedings of World Wide Web Conference (WWW ’11), 2011.
[13] J. Lang and S. F. Wu, “Social network user lifetime,” in International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2011, IEEE Computer Society, July 2011.
[14] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, “On the evolution of user interaction in facebook,” in Proceedings of the 2nd ACM workshop on Online social networks, WOSN ’09, (New York, NY, USA), pp. 37–42, ACM, 2009.
