Link Prediction of Multimedia Social Network via Unsupervised Face Recognition

Dijun Luo, Heng Huang

University of Texas at Arlington


ABSTRACT

We propose a new challenge: predicting the links of a social network by unsupervised face recognition on photo albums. We formulate the task as a Kernel Set Discovery problem and extend the Affinity Propagation algorithm with additional constraints to solve it. Specifically, because a person cannot appear more than once in the same photo, we impose the constraint that face images detected in the same photograph are never clustered into the same person. We construct a synthetic dataset based on the AT&T image benchmark for empirical validation. We also validate our algorithms on a real-world application containing real friend relations from a Web 2.0 social network system. The results indicate that our Constraint Affinity Propagation method is suitable for predicting the links of a social network in an unsupervised way.

Categories and Subject Descriptors
J.4 [Computer Applications]: Social and Behavioral Sciences - Sociology

General Terms
Algorithms, Human Factors

Keywords
Link Prediction, Social Networks, Constraint Affinity Propagation

1. INTRODUCTION

Although the concept dates back to the 1960s, social networks have drawn wide attention only in recent years, as the Internet has opened a door into our real lives. More recently, social networks have flooded the Web 2.0 landscape and have driven the success or failure of many Internet companies. Social network applications such as Flickr (flickr.com), Facebook (facebook.com), and MySpace (myspace.com) have attracted considerable attention from society. Owing to their current and potential commercial benefit, social networks have become a focus of both the industrial and academic communities. A multimedia social network is a social network that allows users to store and share audio, digital video, and photographs. Among previous social network studies, effort has been devoted to semi-supervised and unsupervised object recognition on web photo albums [3, 4, 5]. More recently, social network relationships have also been incorporated into photo-based authentication [8].

In this paper we take a different view: we try to construct the social network from the content of photo albums on the web. More specifically, we recover friend networks using unsupervised face recognition on social network albums. Because most pictures on blogs or web sites are taken in the wild, unsupervised face recognition is more difficult here than on regular images or videos. To the best of our knowledge, this paper is the first to predict the links of a social network using image content.

We first formulate this task as a Core Set Discovery problem (called Kernel Set Discovery in the remainder of the paper), which can be tackled by unsupervised or semi-supervised learning approaches. We develop an unsupervised algorithm, CAP (Constraint Affinity Propagation), to solve this problem. The constraint is imposed on the similarity matrix such that the similarity between two faces in the same photo is negative infinity. We impose this constraint on Affinity Propagation to separate faces that are taken under similar illumination conditions in the same image. Empirical studies on both synthetic and real-world (Facebook) social networks indicate that our approach generates reasonable results while using only simple features of the faces. We also show that CAP is the most suitable approach among the methods compared.

2. SOCIAL NETWORK CONSTRUCTION BY PHOTOGRAPHS

We begin with an illustration of the Social Network Construction task in Fig. 1. Suppose two persons have two different photo albums. If there is at least one photo in which the two persons appear together, we assume that they are friends; the purpose of Social Network Construction is to detect such friend relationships in an unsupervised way. There are several challenges: (1) the images are often taken in the wild, so the face recognition accuracy is not expected to be high; (2) the face detection algorithm may be imperfect and return some non-face objects; (3) we know only the owner of each image, with no additional information.
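The friendship rule stated above (two people are friends if at least one photo shows them together) can be sketched as follows. This is only an illustration that assumes the identity of every face is already known, which is precisely what the unsupervised setting does not provide; the function name `friend_edges` and the toy photo lists are hypothetical.

```python
from itertools import combinations


def friend_edges(photos):
    """Return the set of friend pairs implied by co-occurrence in a photo.

    `photos` is a list of photos; each photo is a list of person labels
    (hypothetical ground-truth identities, one per detected face).
    """
    edges = set()
    for people in photos:
        # Every unordered pair of distinct people in one photo is a friendship.
        for a, b in combinations(sorted(set(people)), 2):
            edges.add((a, b))
    return edges


# Toy example: three photos over four people.
photos = [["alice", "bob"], ["alice", "carol", "bob"], ["dave"]]
print(sorted(friend_edges(photos)))
```

A photo with a single face contributes no edge, and a person who is detected twice in one photo (a labeling error) does not produce a self-loop because duplicates are removed before pairing.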

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’09, October 19–24, 2009, Beijing, China. Copyright 2009 ACM 978-1-60558-608-3/09/10 ...$10.00.


Figure 1: Social Network Construction using photo albums. Step A extracts faces using common face detection algorithms. A Core Set Discovery algorithm is then applied in step B. If two faces are found in the same photo and belong to two different sets, a friendship link is added between the two corresponding objects (step C).

We naturally split the problem into two subproblems: WhoAmI and link prediction.

2.1 WhoAmI

Given a set of images in an album, we try to determine who the owner of the album is. We call this problem 'WhoAmI'. We assume that the album has enough images containing the owner's face and that the owner appears most frequently. For example, in the left part of Fig. 1 there are four images containing 13 faces (including one detection error); four of those faces belong to the owner of the album. The density of the owner's photos provides essential information for extracting the owner's face: for example, we can look for the regions of highest density to detect the owner. More formally, given the faces $I_1, I_2, \ldots, I_n \in A$, where $A$ denotes an album, WhoAmI seeks the subset of $A$ that belongs to the single person appearing with the highest frequency.

2.2 Link Prediction

In edge recovery, we formalize the relation as a graph $G = \{A, E\}$, where $A = \{A_1, A_2, \ldots, A_K\}$ is a set of albums and $(i, j) \in E$ denotes that the owners of $A_i$ and $A_j$ are friends. Link prediction tries to construct the edges of $G$ from the album set. If two persons appear in the same photo, we add an edge between the two corresponding albums.

3. KERNEL SET DISCOVERY

Here we combine the WhoAmI and edge recovery steps to tackle the problem. We formalize the task as a Kernel Set Discovery problem: given a set of instances $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i \in \mathbb{R}^m$ is the feature vector of object $i$ and $n$ is the size of the set, Kernel Set Discovery finds $K$ subsets of $X$, $\pi_1, \pi_2, \ldots, \pi_K$, such that $\pi_i \cap \pi_j = \emptyset$ for all $i \neq j$ and the instances within each subset are close to each other. It differs from clustering in that clustering requires $\bigcup_{i=1}^{K} \pi_i = X$, whereas here we drop this constraint. This matches the social network setting discussed above: in each album, not all faces belong to the owner, and we are interested only in the core sets that correspond to the owners. In a traditional clustering problem, by contrast, every instance must have a label (i.e., must belong to some group). For example, in K-means each instance must be assigned to the group with the nearest centroid, after which the centroids are recomputed.

Suppose the Kernel Set Discovery problem has been solved; then both WhoAmI and link prediction are also solved, as follows.

(1) For the album of person $k$, we evaluate how well the $i$-th kernel set is assigned to it by

$$S(i, k) = \frac{|\pi_i \cap A_k|}{|A_k|}.$$

Using $S$, we build a bipartite graph $F = (\Pi \cup A, H)$, where $\Pi = \{\pi_1, \pi_2, \ldots, \pi_K\}$ and the value of edge $h(i, k)$ is defined as $S(i, k)$. Using the Hungarian algorithm [6], we compute a maximum-weight matching on this bipartite graph, which yields the assignment of each kernel set to an album.

(2) If there is an image containing two faces from two kernel sets $i$ and $j$, we add an edge between the corresponding albums assigned in (1).

Algorithm 1 summarizes the procedure for unsupervised social network discovery. In its output $G$, if $G(i, j) = 1$, the owners of albums $i$ and $j$ are considered friends. The algorithm relies on another procedure, KSD(A), to perform Kernel Set Discovery, which is discussed in the next section.

4. SOLUTIONS FOR KERNEL SET DISCOVERY

In the previous section, we proposed an algorithm for constructing a social network that requires Kernel Set Discovery. Here we offer two naive approaches to this problem. We also develop a new method, CAP (Constraint Affinity Propagation), which is shown to be effective for Kernel Set Discovery problems.
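The kernel set assignment that this construction relies on, the score $S(i, k) = |\pi_i \cap A_k| / |A_k|$ followed by a maximum-weight matching, can be sketched as below. This is a minimal illustration, not the authors' implementation: faces are represented by hypothetical integer ids, and the Hungarian algorithm is replaced by a brute-force search over permutations, which finds an optimal matching but is only practical for a handful of albums.

```python
from itertools import permutations


def assignment_scores(kernel_sets, albums):
    """S[i][k] = |pi_i ∩ A_k| / |A_k| for kernel set i and album k."""
    return [[len(pi & a) / len(a) for a in albums] for pi in kernel_sets]


def assign_kernel_sets(kernel_sets, albums):
    """Map each kernel set index i to an album index, maximizing total S.

    Brute force over all one-to-one assignments; a stand-in for the
    Hungarian algorithm, usable only for small K. Assumes as many
    kernel sets as albums and no empty album.
    """
    scores = assignment_scores(kernel_sets, albums)
    k = len(albums)
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(scores[i][perm[i]] for i in range(k)),
    )
    return list(best)


# Toy example with hypothetical face ids: 3 albums, 3 discovered kernel sets.
albums = [{0, 1, 2, 3}, {4, 5, 6}, {7, 8, 9, 10}]
kernel_sets = [{4, 5}, {7, 8, 9}, {0, 1}]
print(assign_kernel_sets(kernel_sets, albums))  # [1, 2, 0]
```

For realistic numbers of albums, a polynomial-time solver for the assignment problem would replace the permutation search; the scores and the returned mapping keep the same meaning.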

4.1 Density-based Method

The most naive way to discover kernel sets is to find the top $K$ densest regions of the feature space. We measure the density as $d(i) = \frac{|N_i^\sigma|}{\sigma}$, where $N_i^\sigma$ is the set of neighbors of instance $i$ within distance $\sigma$: $N_i^\sigma = \{x_j : |x_i - x_j| \le \sigma\}$. We find the densest regions one by one: we first find the densest instance $i$ and its neighbors $N_i^\sigma$, set $\pi_1 = N_i^\sigma$, and remove $N_i^\sigma$ from the dataset; we then find the next kernel set with the same strategy. We resize all face images to the same size (64 × 64 in our experiments) and represent a face as a vector by concatenating all of its pixels. We use the Euclidean distance as the distance measure.

4.2 Kernel K-means

K-means is an efficient unsupervised pattern discovery method [7], on which we build another method for the Kernel Set Discovery problem. In this method (Kernel K-means), we run the K-means algorithm in the feature space and keep only the faces that are near the corresponding centroid. This method is less naive than the density-based approach: there, the selected kernel sets depend on each other once the parameter $\sigma$ is varied, whereas Kernel K-means chooses the kernel sets independently.

4.3 Kernel Affinity Propagation

Affinity Propagation is a recently developed K-medians clustering method [1, 2]. The difference between K-means and K-medians is that K-means allows the centroid of a group to lie anywhere in a continuous space, with all instances assigned according to the centroids, whereas in K-medians the centroid of a group must be one of the instances in the group. This makes the clustering problem even harder. Fortunately, Frey and Dueck developed an efficient algorithm [2] for this problem. Theoretically, this method should not be better than Kernel K-means, since both try to find $K$ partitions of the data such that the sum of within-group perturbation is minimized; the only difference is the choice of centroid. However, Affinity Propagation lets us impose a natural constraint that makes it suitable for our unsupervised social network recovery task.

We first introduce the Affinity Propagation approach. Frey and Dueck simultaneously consider all data points as potential exemplars (which can be interpreted as cluster centroids). Viewing each data point as a node in a network, they recursively transmit real-valued messages (responsibilities and availabilities) along the edges. The algorithm stops when a good set of exemplars and corresponding clusters emerges. At each iteration, the magnitude of each message reflects the current affinity that one data point has for choosing another data point as its centroid, hence the name "Affinity Propagation". For any data point $i$ and candidate centroid $k$, the "responsibility" $r(i, k)$, sent from data point $i$ to candidate centroid $k$, represents the accumulated evidence for how well suited point $k$ is to serve as the centroid for point $i$, taking into account other potential exemplars for point $i$. The "availability" $a(i, k)$, sent from candidate centroid $k$ to point $i$, measures the accumulated evidence for how appropriate it would be for point $i$ to select point $k$ as its centroid, taking into account the support from other points that point $k$ should be a centroid. In each iteration, for each data point $i$, they use $j = \arg\max_k \, [a(i, k) + r(i, k)]$ as the parent to build a graph. Points that have no parent are set to be centroids. Here we also make a slight modification so that the method can solve the Kernel Set Discovery problem.

4.4 Constraint Affinity Propagation Method

One property of the Affinity Propagation approach is that we can control the similarities between data points in a natural way. In social network albums, we know that one person cannot appear twice or more in one photo. This is an important constraint for correcting the labels of face images, because different faces in the same photo share the same illumination conditions. The Euclidean distance between different persons in the same photo may therefore be very small, and their faces may be clustered into the same group. Imposing this constraint guarantees that this never happens. From this intuition we develop Constraint Affinity Propagation (CAP) as follows. We first construct the similarities between faces. If two faces are in the same photo, their similarity is set to $-\infty$. We then directly run Kernel Affinity Propagation to obtain the core sets.

Algorithm 1 Unsupervised Social Network Construction.
Input: a set of photo albums $A = \{A_1, A_2, \ldots, A_K\}$.
Initialization: $G(i, j) = 0$, $i, j = 1, 2, \ldots, K$.
1 Face extraction
    Detect faces in each image; if a face is found, add it to the corresponding album.
2 Kernel Set Discovery
    $[\pi_1, \pi_2, \ldots, \pi_K] = \mathrm{KSD}(A)$.
3 Kernel set assignment
    $S(i, k) = |\pi_i \cap A_k| / |A_k|$.
    Construct the bipartite graph $F = (\Pi \cup A, S)$, where $\Pi = \{\pi_1, \pi_2, \ldots, \pi_K\}$.
    Get the assignment index of each kernel set using the Hungarian algorithm on $F$.
4 Friend edge discovery
    for each picture
        for each pair of faces $(k_1, k_2)$ in the picture
            if $k_1 \in \Pi$, $k_2 \in \Pi$
                Consider albums $i$ and $j$ to be friends, where $i, j$ are the labels of $k_1, k_2$ defined in step 3.
                $G(i, j) = 1$.
            end if
        end for
    end for
Output: friend relationship graph $G$.

5. EXPERIMENTAL RESULTS

In this section, we construct two datasets to verify the effectiveness of the proposed approaches.

5.1 Synthetic Dataset

The first dataset is generated from the AT&T image benchmark¹. In the AT&T database, there are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expression (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). We resize all images to 64 × 64.

We simulate a social network as follows. We choose the first 8 people as the owners of 8 albums. The friend relationship graph is illustrated in Figure 2. In each album, we randomly pick one image of the owner and one of his/her friend and put them into the same photo. We repeat this three times to generate 3 photos for each friend. In reality, there may be persons outside this social network community, so we also add face images from outside these 8 persons to simulate this situation: for each album we randomly select 3 other persons to add to it. We generate 168 photos in all.

We compare the four approaches discussed in the previous section in Table 1. We measure the accuracy, precision, recall, and F-measure for both WhoAmI and friend edge recovery. Overall, CAP achieves 72.9% accuracy on WhoAmI detection and 64.1% on the final friend connection recovery. The recall of friend connection recovery reaches 90% for CAP. Even though these methods are naive and we use only the simplest features for the faces, the final performance is quite reasonable on this synthetic dataset.

¹ http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

Dataset    Method     WhoAmI                      Friend Connection
                      A      P      R      F      A      P      R      F
AT&T       Density    54.9   73.2   57.3   64.3   59.4   36.4   40.0   38.1
AT&T       KK         52.1   69.1   55.8   61.8   60.9   41.4   60.0   49.0
AT&T       AP         64.6   72.8   73.7   73.2   62.5   45.2   70.0   54.9
AT&T       CAP        72.9   67.4   73.7   70.4   64.1   46.2   90.0   61.0
Facebook   Density    41.2   53.8   44.7   48.8   44.4   32.5   30.0   31.2
Facebook   KK         48.2   45.5   39.7   42.4   44.4   51.6   40.0   45.0
Facebook   AP         53.2   54.1   44.1   48.5   47.2   34.8   40.0   37.2
Facebook   CAP        55.9   60.4   58.5   59.4   52.7   48.6   60.0   53.7

Table 1: WhoAmI and Friend Connection Recovery performance (in %) on the AT&T synthetic dataset and the Facebook dataset for the density-based method (Density), Kernel K-means (KK), Affinity Propagation (AP), and Constraint Affinity Propagation (CAP). A: Accuracy; P: Precision; R: Recall; F: F-measure.
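The four measures reported for friend connection recovery can be computed roughly as follows. The sketch assumes the standard definitions of accuracy, precision, recall, and F-measure evaluated over all unordered album pairs, which the text does not spell out; the function name `edge_metrics` and the toy graphs are hypothetical and do not reproduce the reported numbers.

```python
from itertools import combinations


def edge_metrics(true_edges, pred_edges, num_albums):
    """Accuracy, precision, recall and F-measure over all album pairs."""
    true_edges = {frozenset(e) for e in true_edges}
    pred_edges = {frozenset(e) for e in pred_edges}
    tp = fp = fn = tn = 0
    for pair in combinations(range(num_albums), 2):
        p = frozenset(pair)
        if p in pred_edges and p in true_edges:
            tp += 1  # predicted friendship that is real
        elif p in pred_edges:
            fp += 1  # predicted friendship that is not real
        elif p in true_edges:
            fn += 1  # real friendship that was missed
        else:
            tn += 1  # correctly predicted non-friendship
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


# Toy example: 4 albums, 3 true friendships, 3 predicted (2 of them correct).
truth = [(0, 1), (1, 2), (2, 3)]
pred = [(0, 1), (1, 2), (0, 3)]
print(edge_metrics(truth, pred, 4))
```

In the toy example two of the three predicted edges are correct and two of the three true edges are found, so precision, recall, and F-measure all equal 2/3.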

Figure 2: Friend relationship graph in synthetic dataset.

5.2 Community in Facebook

We downloaded photo albums from a small subset of Facebook (www.facebook.com) users, comprising 6 owners and 157 photos in total. In this dataset, some images contain more than two faces and some contain no faces of these 6 people. We manually labeled the friend connection graph by inspecting the albums and labeled the faces after the face detection process. The WhoAmI and friend connection recovery performance is reported in the Facebook part of Table 1. In this dataset, faces are captured in the wild and there are some face detection errors, so all results are considerably lower than on the synthetic dataset. Nevertheless, the final accuracy still reaches 55.9% (WhoAmI) and 52.7% (connection recovery).

6. CONCLUSIONS

In this paper, we propose a new problem: social network construction using unsupervised computer vision approaches. We formalize the recovery problem as a Kernel Set Discovery task and develop 4 simple algorithms to tackle it. Even though all methods are straightforward and operate on the simplest feature space, we obtain reasonable results on one synthetic dataset and one real-world dataset from Facebook. Our future work includes employing more robust feature spaces and exploring more kernel clustering algorithms that allow our specific constraint, as in Constraint Affinity Propagation.

7. REFERENCES

[1] D. Dueck, B. J. Frey, N. Jojic, V. Jojic, G. Giaever, A. Emili, G. Musso, and R. Hegele. Constructing treatment portfolios using affinity propagation. In M. Vingron and L. Wong, editors, RECOMB, volume 4955 of Lecture Notes in Computer Science, pages 360–371. Springer, 2008.
[2] B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315, 2007.
[3] A. C. Gallagher and T. H. Chen. Using group prior to identify people in consumer images. In Semantic Learning Applications in Multimedia, pages 1–8, 2007.
[4] Y. Jing and S. Baluja. PageRank for product image search. In WWW, pages 307–316. ACM, 2008.
[5] G. Kim, C. Faloutsos, and M. Hebert. Unsupervised modeling of object categories using link analysis techniques. In CVPR, pages 1–8, 2008.
[6] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, pages 83–97, 1955.
[7] S. P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28:128–137, 1982.
[8] N. F. S. Yardi and A. Bruckman. Photo-based authentication using social networks. In WOSN. IEEE, 2008.
