Seventh IEEE International Conference on Data Mining

Predicting Blogging Behavior Using Temporal and Social Networks

Bi Chen*, Qiankun Zhao††, Bingjun Sun†, Prasenjit Mitra*†
*College of Information Sciences and Technology, †Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
††AOL Labs China, 26F, Tower-B, Tsinghua Science Park, Haidian District, Beijing, China 10058
[email protected], [email protected], [email protected], [email protected]

1550-4786/07 $25.00 © 2007 IEEE, DOI 10.1109/ICDM.2007.97

Abstract

Modeling the behavior of bloggers is an important problem with various applications in recommender systems, targeted advertising, and event detection. In this paper, we propose three models that combine the content, temporal, and social dimensions: the general blogging-behavior model, the profile-based blogging-behavior model, and the social-network and profile-based blogging-behavior model. The models are based on two regression techniques: the Extreme Learning Machine (ELM) and the Modified General Regression Neural Network (MGRNN). We choose one of the largest blogs, the political blog DailyKos (http://www.dailykos.com), for our empirical evaluation. Experiments show that the social-network and profile-based blogging-behavior model with ELM regression produces good results for the most active bloggers and can be used to predict blogging behavior.

1 Introduction

Blog data is a collection of formal or informal text communication that arrives over time. Compared with general web pages, blog data have the following dimensions. Content dimension: the topics of the blog posts. Temporal dimension: blog posts are often tagged with timestamps. Social dimension: blog posts and comments are connected by quotations and by interactions between bloggers and other users via comments. Existing research includes burst detection [5] and trend detection [1] using textual content; structural and topic evolution/flow pattern extraction [8, 10, 12], which focuses on the content and temporal dimensions; and social network analysis [6, 9] and the diffusion of information in the blogspace [3], which focus on the content and temporal dimensions. No existing work, except our previous work [14, 15], has considered the content, temporal, and social dimensions together when analyzing blogs and used them to predict blogging behavior.

Our work applies to weblogs in which at least some authors have a substantial number of posts (50 posts are adequate to be statistically meaningful) and which allow users to comment on blog posts. Blogs offer online advertisers a vehicle for effective targeted advertising of a new product or service. The models can be used to create a recommender system that helps people find potential academic collaborators, business partners, etc. Another potential application is event detection, which has important uses. For example, a terrorism analyst may not have the time to read the millions of blogs around the world, but automatic event detectors can alert her to an external event. In this work, we do not build the end applications for targeted advertising, recommender systems, or event detection; instead, we construct blogging-behavior models and blogging-behavior prediction systems that, we believe, can form the basis of such applications. Our problem can be defined as follows:

Definition. Given the topics that were discussed in a community blog from the past until now, predict the topics that will be discussed in the future, both for the whole community blog and for any given individual blogger.

In our work, blogging-behavior models refer to patterns of topic transition within and across different bloggers along the temporal and social dimensions. There is no automatic or systematic process for constructing blogging-behavior models by analyzing the social, content, and temporal information embedded in a historical blog corpus together.

We propose three models: the general blogging-behavior model, based on the overall topic transition over time, for predicting the behavior of the whole community blog; the profile-based blogging-behavior model, which adds user-profile information (a profile captures the historical topic transition of an individual blogger); and the social-network and profile-based blogging-behavior model, which takes into account the social neighbors and their influence, for predicting the behavior of individual bloggers. These models use the historical behavior of bloggers and the entire blog graph (constructed from the blog posts and comments) to predict future behavior by applying two different regression techniques: the Extreme Learning Machine (ELM) [4] and the Modified General Regression Neural Network (MGRNN) [13]. Our contribution lies in showing that these models can model our selected community blog with acceptable precision (above 0.7) for the most active bloggers in the first several weeks. We validate our models with an empirical evaluation on a large community blog. Our results show that our models can form a good basis for the eventual development of applications based on blogging-behavior models.

Figure 1. Graph Representation of Blog Data (diagram: Bloggers 1 to 9 and, on the right, sequences of blog entries b0 to b9 at Time 1 and Time 2)

2 Related Work

Kumar et al. [5] model the blogspace as a graph of bloggers connected by hyperlinks and study the evolution of the graph in terms of graph properties such as in-degree, out-degree, strongly connected components, and communities. Gruhl et al. [3] studied the dynamics of information propagation at two levels, a macroscopic characterization of topic propagation and a microscopic characterization of propagation from individual to individual, using the theory of infectious diseases to model the flow. Licamele and Getoor [7] define social capital and investigate the friendship relations and the organizer and participation relations in a social network. They show that social capital is a better predictor of publications than publication history in real academic collaboration networks. However, these social-network-based blog-analysis approaches assume that the content, social, and temporal dimensions of blogs are independent, ignoring the fact that they are interrelated.

There are also works based on content analysis. Traditionally, these approaches rely on simple counts of entries, links, keywords, and phrases [2, 5, 3]. More recently, Chi et al. [1] introduced the eigen-trend concept, which represents the temporal trend in a group of blogs with common interests using singular value decomposition and higher-order singular value decomposition. Qamra et al. [9] propose a Content-Community-Time model that clusters posts according to their content, timestamps, and community structure to automatically discover stories; in their approach, only links between posts are taken into consideration. Shen et al. [11] propose three novel approaches for finding latent friends, who share similar topic distributions in their blogs, by analyzing the contents of their blog entries. However, these approaches mainly focus either on the content of blogs or on combining social or temporal information to improve content analysis. In summary, there is no systematic study of the temporal, content, and social dimensions of blog data and their correlations.

3 Blogging-Behavior Models

In this section, we first present our blog dataset and its graph representation. We then describe how to extract blogging-behavior features for the different models, and finally we briefly review the regression techniques we use to construct the models.

3.1 Blog Data and Representation

We chose the political blog DailyKos as our example dataset. We collected 249,543 blog entries posted between October 12, 2003 and October 28, 2006. Since some authors blog infrequently, authors with fewer than 45 blog entries are removed in our experiments, because blogging behavior inferred from only a few entries may be unreliable. As a result, 131,869 blog entries by 1,287 authors remain, together with 1,008,467 comments. The blog dataset can be represented as a multi-graph in which each node represents a blogger and each edge is created by comments between two bloggers (shown in Figure 1). Each node in turn corresponds to a sequence of graphs of the blogger's own blog entries over time, and each edge consists of a set of edges that connect nodes in these graphs. For example, in Figure 1, the nodes Blogger 7 and Blogger 8 are represented as two sequences of blog-entry graphs on the right-hand side. Within a time window (a given length of time), there is a set of edges that link blog entries of one blogger to those of another, drawn as gray lines on the right side of the figure. We propose to represent a blog entry by its topic, which indicates the subject of the entry, instead of by the words that appear in it. We use the data clustering tool CLUTO (http://glaros.dtc.umn.edu/gkhome/views/cluto) to partition the blog entries by topic. Each node then has a topic and each edge represents comments between topics, so an edge in the blogger graph now represents a sequence of edges denoting the topic-level links between different bloggers at different time points.
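The preprocessing and graph construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record types `Entry` and `Comment`, their field names, and the choice to keep only comments written by retained authors are hypothetical. Topic labels are assumed to come from a clustering step such as CLUTO, and `week` is assumed to be the index of the weekly time window.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical record types; the paper does not specify a data format.
@dataclass(frozen=True)
class Entry:
    entry_id: str
    author: str
    week: int   # index of the weekly time window
    topic: int  # topic label from a clustering step (e.g., CLUTO)

@dataclass(frozen=True)
class Comment:
    commenter: str
    entry_id: str  # the blog entry that is commented on
    week: int

MIN_ENTRIES = 45  # authors with fewer entries are removed (Section 3.1)

def filter_active_authors(entries, min_entries=MIN_ENTRIES):
    """Drop the entries of authors who posted fewer than `min_entries` entries."""
    counts = Counter(e.author for e in entries)
    keep = {a for a, n in counts.items() if n >= min_entries}
    return [e for e in entries if e.author in keep], keep

def build_comment_multigraph(entries, comments, authors):
    """Return {(commenter, author, week): Counter(topic -> #comments)}.

    Each key is one per-window edge between two bloggers; the Counter
    records how many of those comments fall on each topic of the
    commented-on entries (a topic-level summary of the edges in Figure 1).
    """
    by_id = {e.entry_id: e for e in entries}
    graph = defaultdict(Counter)
    for c in comments:
        target = by_id.get(c.entry_id)
        # Assumption: skip comments on removed entries and by non-retained users.
        if target is None or c.commenter not in authors:
            continue
        graph[(c.commenter, target.author, c.week)][target.topic] += 1
    return graph
```

The number of comments blogger j wrote on blogger x's entries in window z is then `sum(graph[(j, x, z)].values())`, which is the kind of per-window edge weight the social-network features rely on.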




3.2 General Blogging-Behavior Features and Model

The general blogging-behavior model is proposed to capture the transitions between topics over time in the blogspace. That is, given the topics that were discussed in the previous time windows, we want to predict which topics are more likely to be discussed in the next time window. The general model is used to monitor and predict the general trend and transitions in the entire blogspace rather than those of any individual blogger. All blog entries are first clustered into a set of topics based on the words they contain, and each blog entry is then represented by a topic vector. To identify the general blogging-behavior features, the historical data is first partitioned into a sequence of time windows on a daily, weekly, or monthly basis. For each time window z, the content of the blog entries is represented as a topic distribution vector $T_z = \langle t_1, t_2, t_3, \ldots, t_n \rangle_z$ that describes the distribution of blog entries over the list of topics, where n is the number of topics and $t_i$ is the weight of the ith topic within time window z. The ith component is the number of blog entries belonging to the ith topic divided by the total number of blog entries in time window z. Hence, the weight of each topic is the normalized number of blog entries in that topic, and the weights sum to 1. Since a topic distribution vector can be built for each time window, the general blogging-behavior features form a time series of topic distribution vectors. Based on the features $T_z$, we can train the general blogging-behavior model and predict future blogging behavior using regression techniques. We take the previous k topic distribution vectors, from the (z−k+1)th to the zth time window, as the input vectors, and the topic distribution vector $T_{z+1}$ of the (z+1)th time window as the target vector to train the model. Using the trained regression model, the hidden transition relations between topics can be estimated and used to predict the topic distribution of the next time window.

3.3 Profile-Based Blogging-Behavior Features and Model

Different bloggers have different backgrounds and interests, and hence different blogging-behavior patterns, so we cannot simply use the general model to predict individual bloggers' behavior. What a blogger posts depends not only on the overall trend of topics in the whole blogspace but also on his/her own interests. Therefore, both the general topic distribution vector and the profile of the corresponding user are used as input to the regression model. For the general topic distribution, we can use the topic distribution vector from the previous section. For the profile-based topic distribution, we propose to add a personal topic distribution vector $T_p(j)_z$ to the general blogging-behavior features, $T_p(j)_z = \langle t_{1j}, t_{2j}, t_{3j}, \ldots, t_{nj} \rangle_z$, where $t_{ij}$ is the weight of topic i for blogger j within time window z. This weight is the number of blog entries posted by blogger j that belong to topic i (denoted $|t_{ij}|$) divided by the total number of blog entries posted by blogger j (denoted $|t_j|$) in time window z. We observed in the dataset that a blogger sometimes posts no blog entries at all within a time window. In that case, we approximate the blogger's topic distribution vector from his/her previous topic distribution vector and a decay factor. The intuition is that the topic distribution vector decays toward $\langle 1/|T|, 1/|T|, \ldots, 1/|T| \rangle$, which means that the blogger prefers no particular topic. Formally:

$$
t_{ij} =
\begin{cases}
\dfrac{|t_{ij}|}{|t_j|}, & \text{if } |t_j| \neq 0,\\
t'_{ij}\, e^{-\lambda} + \dfrac{1}{|T|}\left(1 - e^{-\lambda}\right), & \text{if } |t_j| = 0,
\end{cases}
$$

where λ is the decay factor, $t'_{ij}$ is the weight of topic i for blogger j in the previous time window, and |T| is the total number of topics. Note that $T_p(j)_z$ is normalized so that its weights sum to 1 in the second case. Based on the profile-based features $\langle T_z, T_p(j)_z \rangle$ for blogger j, we can train the profile-based blogging-behavior model and predict the future blogging behavior of blogger j using regression techniques. We take the previous k combined vectors $\langle T_z, T_p(j)_z \rangle$, from the (z−k+1)th to the zth time window, as the input vectors, and the combined vector $\langle T_{z+1}, T_p(j)_{z+1} \rangle$ of the (z+1)th time window as the target vector to train the model. Using the trained regression model, the future blogging behavior of blogger j can then be predicted from the historical general blogging behavior and his/her own historical blogging behavior.

Besides posting blog entries, a blogger also comments on blog entries written by other bloggers. We improve the profile-based model by adding a comment distribution vector. We simply treat a comment as having the same topic as the corresponding blog entry; that is, if a comment is written on a blog entry about topic i, the comment is also considered to be on topic i. The comment distribution vector can be represented as $C_p(j)_z = \langle c_{1j}, c_{2j}, c_{3j}, \ldots, c_{nj} \rangle_z$, where $c_{ij}$ is the weight of comments on topic i for blogger j within time window z. This weight is the number of comments on topic i posted by blogger j (denoted $|c_{ij}|$) divided by the total number of comments posted by blogger j (denoted $|c_j|$) in time window z. Adding the comment distribution vector to the profile-based features gives the improved profile-based blogging-behavior model; its features $\langle T_z, T_p(j)_z, C_p(j)_z \rangle$ are used to train the regression model in the same way.
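The feature construction of Sections 3.2 and 3.3 can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, topics are assumed to be integer labels 0 to n−1, a blogger with no history is assumed to start from the uniform vector, an empty community window is assumed to map to the uniform vector too, and the k input vectors are assumed to be concatenated into one regression input.

```python
import math

def topic_distribution(topics, n_topics):
    """Fraction of entries per topic: T_z for the community, or
    |t_ij| / |t_j| for a single blogger j."""
    if not topics:
        # Assumption: an empty window maps to the uniform distribution.
        return [1.0 / n_topics] * n_topics
    counts = [0] * n_topics
    for t in topics:
        counts[t] += 1
    return [c / len(topics) for c in counts]

def profile_distribution(own_topics, previous, n_topics, lam=0.2):
    """T_p(j)_z: blogger j's own distribution or, if j posted nothing in
    window z, the previous vector decayed toward uniform:
    t_ij = t'_ij * exp(-lam) + (1/|T|) * (1 - exp(-lam))."""
    if own_topics:
        return topic_distribution(own_topics, n_topics)
    decay = math.exp(-lam)
    return [p * decay + (1.0 - decay) / n_topics for p in previous]

def profile_series(topics_per_window, n_topics, lam=0.2):
    """Build T_p(j)_z for every window of one blogger."""
    previous = [1.0 / n_topics] * n_topics  # assumption for the first window
    series = []
    for topics in topics_per_window:
        previous = profile_distribution(topics, previous, n_topics, lam)
        series.append(previous)
    return series

def sliding_window_pairs(vectors, k):
    """(input, target) pairs: the vectors of windows z-k+1..z, concatenated,
    are used to predict the vector of window z+1."""
    pairs = []
    for z in range(k - 1, len(vectors) - 1):
        x = [v for vec in vectors[z - k + 1 : z + 1] for v in vec]
        pairs.append((x, vectors[z + 1]))
    return pairs
```

For the profile-based model, each window's vector would be the concatenation of T_z and T_p(j)_z (plus C_p(j)_z for the improved model) before calling `sliding_window_pairs`. The comment vector can reuse `topic_distribution` on the topics of the entries blogger j commented on.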

3.4 Social-Network and Profile-Based Blogging-Behavior Features and Model

In the profile-based blogging-behavior model, the assumption is that each individual blogger is independent, or that each blogger contributes equally to the general topic transition. In reality, this is not always true. Usually, not only the overall topic transition and the blogger's profile but also the blogger's social neighbors and their blog entries affect the topics a blogger will write about, because bloggers who are socially connected share similar interests and profiles. We therefore propose the social-network and profile-based blogging-behavior model, which adds the social-network features of a blogger to the improved profile-based model. Here, the social network refers to the relations between bloggers created by comments on blog entries. Besides the general topic distribution and the topic and comment distributions of the individual blogger, a list of social neighbors with weighted relations and their topic distributions is also used as input to the regression model. Specifically, the social-network features of blogger j in time window z are represented as a vector $S(j)_z = \langle s_{1j}, s_{2j}, s_{3j}, \ldots, s_{nj} \rangle_z$, where

$$
S(j)_z = \sum_{x=1}^{m} \frac{C_{j\to x}}{TC_j} \cdot T_p(x)_z, \qquad TC_j = \sum_{x=1}^{m} C_{j\to x},
$$

m is the total number of social neighbors of blogger j in the network, $C_{j\to x}$ is the number of comments written by blogger j on blog entries posted by blogger x in a given time window, and $TC_j$ is the total number of comments written by blogger j in the same time window. Based on the features $\langle T_z, T_p(j)_z, C_p(j)_z, S(j)_z \rangle$ for blogger j, we can train the social-network and profile-based model and predict the future blogging behavior of blogger j using regression techniques. We take the previous k combined vectors $\langle T_z, T_p(j)_z, C_p(j)_z, S(j)_z \rangle$, from the (z−k+1)th to the zth time window, as the input vectors, and the combined vector $\langle T_{z+1}, T_p(j)_{z+1}, C_p(j)_{z+1}, S(j)_{z+1} \rangle$ of the (z+1)th time window as the target vector to train the model. Using the trained regression model, the future blogging behavior of blogger j can then be predicted from the historical general blogging behavior, his/her own historical blogging behavior, and his/her neighbors' historical blogging behavior.

3.5 Regression Techniques Used in Blogging-Behavior Models

For time-series regression, traditional feed-forward network learning algorithms, such as back-propagation, are normally used for prediction. However, because of the speed and adaptation problems of these algorithms, we use two other regression techniques in our blogging-behavior models: the Extreme Learning Machine (ELM) [4] and the Modified General Regression Neural Network (MGRNN) [13]. ELM learns extremely fast, reportedly thousands of times faster than traditional feed-forward network learning algorithms, while achieving reasonable precision. MGRNN is presented as an easy-to-use, robust "black box" tool that can compete with optimized feed-forward networks, with reasonable speed and no adaptation required from the user. Because of limited space, we do not review ELM and MGRNN here; for details, please refer to [4] and [13].

4 Performance Evaluation

4.1 Evaluation Standards

In this section, we evaluate the proposed blogging-behavior models on the DailyKos dataset. To measure the quality of the predicted blogging behavior, we define precision as the similarity between the predicted vector T' and the ground-truth vector T:

$$
\mathrm{Precision} = \mathrm{Sim}(T', T) = \frac{T' \cdot T}{|T'|\,|T|}
$$

The content of the DailyKos dataset focuses on political issues, so it is reasonable to cluster all blog entries into a small number of topics. Because our experimental results were not sensitive to the number of topics, we clustered all blog entries into 30 topics in the following experiments, which achieved good results. Along the time dimension, we partitioned the data into 159 weeks, treating all blog entries within the same week as simultaneous. The first 139 weeks are used as training data and the last 20 weeks as test data. In the following experiments, λ and η are set to 0.2 and 0.8, respectively. "1 week" refers to the approach that uses only the data of the previous week to predict the blogging-behavior pattern of the next week, "3 weeks" to the approach that uses the data of the previous 3 weeks, and "5 weeks" and "10 weeks" are defined similarly. Furthermore, the 1,287 selected bloggers are ranked by the number of blog entries they posted during the 159 weeks. In our evaluation, the top 50 bloggers, each of whom posted more than 325 blog entries, are defined as the most active bloggers; bloggers ranked 51 to 150, who posted between 146 and 325 blog entries, as active bloggers; bloggers ranked 151 to 300, who posted between 80 and 146 blog entries, as less active bloggers; and the remaining 987 bloggers, who posted fewer than 80 blog entries, as the least active bloggers.
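The evaluation setup above can be made concrete with a short sketch: precision as the cosine similarity between a predicted and a ground-truth vector, the rank-based activity tiers, and the 139/20 week split. The function names are hypothetical, and returning 0.0 when either vector is all zeros is an assumption, since the paper does not cover that case.

```python
import math

def precision(predicted, truth):
    """Sim(T', T) = (T' . T) / (|T'| |T|), i.e. cosine similarity."""
    dot = sum(p * t for p, t in zip(predicted, truth))
    norm = math.sqrt(sum(p * p for p in predicted)) * math.sqrt(sum(t * t for t in truth))
    return dot / norm if norm > 0 else 0.0  # assumption for all-zero vectors

def activity_tier(rank):
    """Tier of a blogger ranked by number of entries (1 = most entries)."""
    if rank <= 50:
        return "most active"
    if rank <= 150:
        return "active"
    if rank <= 300:
        return "less active"
    return "least active"

def split_weeks(n_weeks=159, n_test=20):
    """First n_weeks - n_test weeks for training, the last n_test for testing."""
    cut = n_weeks - n_test
    return list(range(cut)), list(range(cut, n_weeks))
```

Averaging `precision` over the bloggers of one tier, separately for each of the 20 test weeks, gives per-week averages of the kind plotted in Figures 3 to 6.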

4.2 Evaluating and Comparing Models

Figure 2 shows the general blogging-behavior model with MGRNN regression applied to the overall blogspace to predict the behavior of the whole community. The X-axis is the distance between the week being predicted and the latest week in the training data. Prediction based on 10 weeks of history is the best, and all four approaches produce very accurate predictions (> 0.9) for the subsequent 4 weeks. The more historical information is used, the more accurate the blogging-behavior prediction. Interestingly, the precision for predicting the 9th week (Aug 08, 2006 to Aug 15, 2006) drops dramatically. In reality, a political event happened in that week: on Aug 8, 2006, three-term Senator Joseph Lieberman lost the Connecticut Democratic primary to political newcomer Ned Lamont (see footnote 3). A large number of blog entries began to discuss this unpredictable event, which caused the precision to drop. When the effects of the event subsided, the precision went up again.

The general blogging-behavior model performs well for the whole community. However, given the diversity of individual bloggers, we cannot use the general model alone to predict the behavior of individual bloggers. In our experiments, we select only 50 bloggers as the most active bloggers; their blog entries make up almost 30% of all blog entries. Figure 4 (the lowest line) shows the average precision of the general blogging-behavior model in predicting the behavior of the most active bloggers. Clearly, the general model does not perform well at the individual level. Hence, we use the profile-based model and the social-network and profile-based model to predict the blogging behavior of individual bloggers.

Figure 3 shows the average precision of the profile-based blogging-behavior model for the most active bloggers. The model accurately predicts the 6 subsequent weeks (precision > 0.7) using 10 weeks of historical data. However, the gain in precision from using more historical information is less evident than in Figure 2. In the following figures, all experiments use 10 weeks of historical data. Figure 4 shows the average precision of the proposed blogging-behavior models for the 50 most active bloggers. Generally, using social-network-based blogging-behavior features improves the quality of prediction, as shown in Figure 4, while the comment distribution features used in the improved profile-based model do not significantly improve precision. However, the improvement of the social-network and profile-based model over the profile-based model is statistically significant according to a paired t-test. Interestingly, the precision for the 10th week goes up again. The reason is that the (improved) profile-based model and the social-network and profile-based model incorporate the general blogging-behavior features as background information; since the general features of the 10th week already reflect the unpredictable election event, the precision goes up again.

Although MGRNN regression achieves good results, training and testing these models for the most active bloggers takes too much time, and in practice it is not efficient to train and test a model for each blogger with MGRNN. Therefore, we use ELM regression to train and test models for all bloggers. Figures 5 and 6 show that MGRNN regression achieves slightly better quality than ELM regression for the most active and active bloggers, while for less active and least active bloggers, ELM and MGRNN achieve similar quality. However, Table 1 shows that training with ELM regression is roughly 500 times faster than training with MGRNN regression. Taking both precision and efficiency into account, we consider the social-network and profile-based blogging-behavior model with ELM regression the best of our proposed models.
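To illustrate where ELM's training-speed advantage comes from, here is a minimal pure-Python sketch of the basic ELM idea from [4]: the hidden-layer weights are drawn at random and never trained, and only the output weights are fitted with a single linear least-squares solve. This is not the authors' implementation. The class name, the sigmoid activation, the hidden-layer size, and the small ridge term (used in place of the Moore–Penrose pseudoinverse for numerical stability, as in regularized ELM variants) are all assumptions.

```python
import math
import random

def _solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    A is n x n and B is n x m (lists of lists); returns X (n x m)."""
    n = len(a)
    m = [row[:] + brow[:] for row, brow in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]

class ELM:
    """Single-hidden-layer network whose hidden weights are random and fixed."""

    def __init__(self, n_hidden=40, ridge=1e-3, seed=0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _hidden(self, x):
        # Sigmoid activations of the random hidden layer.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        rng = random.Random(self.seed)
        d, L = len(xs[0]), self.n_hidden
        # Step 1: random input weights and biases (never updated).
        self.w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [rng.uniform(-1, 1) for _ in range(L)]
        # Step 2: hidden-layer output matrix H, one row per training sample.
        h = [self._hidden(x) for x in xs]
        # Step 3: output weights beta = (H^T H + ridge * I)^-1 H^T Y.
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        hty = [[sum(row[i] * y[k] for row, y in zip(h, ys)) for k in range(len(ys[0]))]
               for i in range(L)]
        self.beta = _solve(hth, hty)
        return self

    def predict(self, x):
        hx = self._hidden(x)
        return [sum(hx[i] * self.beta[i][k] for i in range(self.n_hidden))
                for k in range(len(self.beta[0]))]
```

Training here is one pass to build H followed by one linear solve, with no iterative weight updates, which is the design choice that makes fitting a separate model per blogger cheap.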

5 Conclusion

In this paper, we propose to model blogging behavior in the blogspace along multiple dimensions: temporal, content, and social. Experiments with a real blog dataset show that our blogging-behavior models produce promising prediction results. In future work, we will evaluate our blogging-behavior models on other kinds of blogspace.

3 http://transcripts.cnn.com/TRANSCRIPTS/0608/09/ltm.08.html

Figure 2. General Blogging Behavior Model (Predicting community). [Plot of predicting precision against predicting week (1 to 20) for the whole community, with curves for 1, 3, 5, and 10 weeks of history.]

Figure 3. Profile-based Blogging Behavior Model (Predicting individuals). [Plot of predicting precision against predicting week (1 to 20) for the most active bloggers, with curves for 1, 3, 5, and 10 weeks of history.]

Figure 4. Comparison of the models (Predicting individuals). [Plot of predicting precision against predicting week (1 to 20) for the most active bloggers, with curves for the general, profile-based, improved profile-based, and social-network and profile-based blogging-behavior models.]

Figure 5. Predicting Blogging Behavior of Bloggers At Different Active Levels (Predicting individuals)

Figure 6. Predicting Blogging Behavior of Bloggers At Different Active Levels (Predicting individuals)

[Figures 5 and 6 plot predicting precision against predicting week (1 to 20). Their legends cover MGRNN for the most active, active, less active, and least active bloggers, and ELM for the same four groups together with MGRNN for the most active bloggers.]

Table 1. Time Comparison between MGRNN Regression and ELM Regression (social-network and profile-based model; times in minutes)

Bloggers            Regression   Train Time   Test Time
The most active     MGRNN            131.42        0.03
The most active     ELM                0.25        0.03
Active              MGRNN            389.25        0.09
Active              ELM                0.74        0.09
Less active         MGRNN            781.82        0.17
Less active         ELM                1.52        0.17
The least active    MGRNN           1991.27        0.43
The least active    ELM                3.67        0.43

References

[1] Y. Chi, B. L. Tseng, and J. Tatemura. Eigen-trend: trend analysis in the blogosphere based on singular value decompositions. In CIKM, 68–77, 2006.
[2] N. S. Glance, M. Hurst, and T. Tomokiyo. BlogPulse: Automated trend discovery for weblogs. In WWW, 2004.
[3] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. In WWW, 491–501, 2004.
[4] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew. Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks. In IJCNN 2004, 25–29, 2004.
[5] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In WWW, 568–576, 2003.
[6] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In KDD, 611–617, 2006.
[7] L. Licamele and L. Getoor. Social Capital in Friendship-Event Networks. In ICDM, 959–964, 2006.
[8] D. Metzler, Y. Bernstein, W. B. Croft, A. Moffat, and J. Zobel. The recap system for identifying information flow. In SIGIR, 678–678, 2005.
[9] A. Qamra, B. Tseng, and E. Y. Chang. Mining blog stories using community-based and temporal clustering. In CIKM, 58–67, 2006.
[10] Y. Qi and K. S. Candan. CUTS: Curvature-based development pattern analysis and segmentation for blogs and other text streams. In HYPERTEXT, 1–10, 2006.
[11] D. Shen, J.-T. Sun, Q. Yang, and Z. Chen. Latent Friend Mining from Blog Data. In ICDM, 552–561, 2006.
[12] X. Song, B. L. Tseng, C.-Y. Lin, and M.-T. Sun. Personalized recommendation driven by information flow. In SIGIR, 509–516, 2006.
[13] D. Tomandl and A. Schober. A modified general regression neural network (MGRNN) with new, efficient training algorithms as a robust 'black box'-tool for data analysis. Neural Networks, 14(8):1023–1034, 2001.
[14] Q. K. Zhao and P. Mitra. Event Detection and Visualization for Social Text Streams. In ICWSM, 2007.
[15] Q. K. Zhao, P. Mitra, and B. Chen. Temporal and Information Flow Based Event Detection from Social Text Streams. In AAAI, 2007.

Oct 20, 2017 - 1 rule of business. Sallie Krawcheck ... In turn, earnings of both executives and financial managers are largely based on ..... focuses on dyadic relationship in the classroom with 5 classes of 25-35 students each. Benenson ...