IJRIT International Journal of Research in Information Technology, Volume 3, Issue 5, May 2015, Pg.337-346

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com

ISSN 2001-5569

A Detailed Survey on Anonymization Methods of Social Networks

Mr. Hare Ram Singh, Assistant Professor, Department of Information Science and Engineering, RGIT, Bangalore
Mr. Shailesh kumar, Assistant Professor, Department of Information Science and Engineering, VVIET, Mysore
Mr. Harsha S, Associate Professor, Department of Information Science and Engineering, VVIET, Mysore
Mr. Theja N, Assistant Professor, Department of Computer Science and Engineering, VVIET, Mysore

Abstract
Social networks have grown rapidly and have become part of personal and professional life. In general, social networks are structures made of social entities (e.g., individuals) that are linked by specific types of interdependency such as friendship, relationship, liking, or similarity. Most users of social media (e.g., Facebook, LinkedIn, MySpace, Twitter, Flickr, YouTube) have many linkages in terms of friends, connections, and/or followers. To protect the privacy of the social entities and their links, social network providers use different anonymization techniques. This paper provides a detailed survey of the anonymization methods used by different social networks.

Keywords: Social network; social computing; analysis; social media; security; graph.

Introduction
Social networks are among the most popular sites on the web, since the Internet has bred several varieties of information sharing systems [1]. According to Alexa’s Top 500 Global Sites statistics (retrieved in May 2011), Facebook and Twitter, two popular online social networking services, rank second and ninth, respectively. A social network describes entities and the connections between them. The entities are often individuals; they are connected by personal relationships, interactions, or flows of information. Social network analysis is concerned with uncovering patterns in the connections between


entities. It has been widely applied to organizational networks to classify the influence or popularity of individuals and to detect collusion and fraud. Social network analysis can also be applied to study disease transmission in communities, the functioning of computer networks, and the emergent behavior of physical and biological systems. Technological advances have made it easier than ever to collect the electronic records that describe social networks. However, agencies and researchers who collect such data often face a choice between two undesirable outcomes: they can publish data for others to analyze, even though that analysis will create severe privacy threats, or they can withhold data because of privacy concerns, even though that makes further analysis impossible [2]. Similarly, researchers in computer networking analyze internet topology, traffic, and routing properties using network traces that can now be collected at line speed at the gateways of institutions and by ISPs. These traces represent a social network in which the entities are internet hosts and communication between hosts constitutes a relationship. However, network traces (even with packet content removed) contain sensitive information, because it is often possible to associate individuals with the hosts they use, and because traces contain information about web sites visited and time stamps that indicate periods of activity. The challenges of protecting network trace data are being actively addressed by the research community [3].

Online social networking services, while providing convenience to users, accumulate a treasure of user-generated content and users’ social connections, which were available only to large telecommunication service providers and intelligence agencies a decade ago. Online social networking data, once published, are of great interest to a large audience: sociologists can verify hypotheses on social structures and human behavior patterns; third-party application developers can produce value-added services such as games based on users’ contact lists; advertisers can more accurately infer a user’s demographic and preference profile and thus issue targeted advertisements. As the December 2010 revision of Facebook’s Privacy Policy phrases it: “We allow advertisers to choose the characteristics of users who will see their advertisements and we may use any of the non-personally identifiable attributes we have collected to select the appropriate audience for those advertisements.” Because social network data correlate strongly with users’ social identity, privacy is a major concern whenever such data are stored, processed, or published. Privacy control, through which users can tune the visibility of their profile, is an essential feature of any major social networking service [4]. The most appropriate method for protecting users’ privacy when publishing social network data is anonymization, i.e., removing plainly identifying labels such as names, social security numbers, and postal or e-mail addresses while retaining the network structure. The motivation behind such anonymization is that, by removing the “who” information, the utility of the social network is maximally preserved without compromising users’


privacy. In several high-profile cases, anonymity has been unquestioningly interpreted as equivalent to privacy [5].

Literature Survey
This section describes different anonymization techniques and their feasibility, optimality, and privacy.

Fig. 1: Anonymization by removing IDs

Method 1:
A natural mathematical model to represent a social network is a graph. A graph G consists of a set V of vertices and a set E ⊆ V × V of edges. Labels can be attached to both vertices and edges to represent attributes. The vertices in Fig. 1 represent the social network users, and the edges represent relationships such as friendship, following, and liking. Naive anonymization removes those labels that can be uniquely associated with one vertex (or a small group of vertices) from V. This is closely related to traditional anonymization techniques employed on relational data sets [6]. As a person registers with different social networking services, her connections in these services, which relate to her social relationships in the real world, might reveal valuable information that an attacker can use to threaten her privacy.

Security breach
This section describes a security compromise of the anonymized social graph. Let an undirected graph GT = {VT, ET} represent the target social network after anonymization. We assume that the attacker has an undirected graph GB = {VB, EB} that models his background knowledge about the social relationships among a group of people, i.e., the vertices in VB are labeled with the identities of these people. The motivating scenario demonstrates one way to obtain GB. The attack considered here is to infer the identities of the vertices in VT from the structural similarity between the target graph GT and the background graph GB. Thus, the two graphs GT and GB are syntactically (in their social connections) similar but semantically (in the meaning associated with such connections) different. By re-identifying the vertices in GT with the help of GB, the attacker associates the sensitive semantics with users in the anonymized GT and thus compromises the privacy of such users. An example of sensitive semantics is the private chat sessions, and their associated timestamps, in the motivating scenario. We assume that, before the release of GT, the attacker obtains (either by creating or stealing) a few accounts and connects them with a few other users (the initial seeds) in GT. Besides user IDs, the attacker knows nothing about the relationship between the initial seeds and other users in GT.

Fig. 2: A randomly generated graph

Feasibility
Successful retrieval of GF (the subgraph formed by the attacker's accounts) from GT is guaranteed if GF exhibits the following structural properties:

• GF is uniquely identifiable, i.e., no subgraph H ⊆ GT except GF is isomorphic to GF. For example, in Fig. 2, subgraph {v1, v2, v3} is isomorphic to subgraph {v1, v4, v5} because there is a structure-preserving mapping v1 → v1, v2 → v4, v3 → v5 between them. Therefore, the two subgraphs are structurally indistinguishable once the vertex labels are removed.

• GF is asymmetric, i.e., GF does not have any nontrivial automorphism. For example, in Fig. 2, subgraph {v1, v2, ..., v5} has an automorphism v1 → v1, v2 → v3, v3 → v4, v4 → v5, v5 → v2. Therefore, even if we could locate VF = {v1, ..., v5} in GT, the vertices {v2, ..., v5} are indistinguishable once their labels are removed.
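The asymmetry condition can be checked by brute force on very small graphs. The following is a minimal sketch, not the procedure of any cited work: the two example graphs are hypothetical (the edges of Fig. 2 are not reproduced here), the function names are illustrative, and enumerating all vertex permutations is only feasible for a handful of vertices.

```python
from itertools import permutations


def automorphisms(vertices, edges):
    """Enumerate all automorphisms of a small undirected graph by brute force."""
    edge_set = {frozenset(e) for e in edges}
    vs = list(vertices)
    found = []
    for perm in permutations(vs):
        mapping = dict(zip(vs, perm))
        # A permutation is an automorphism iff it maps the edge set onto itself.
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if mapped == edge_set:
            found.append(mapping)
    return found


def is_asymmetric(vertices, edges):
    """True if the identity is the only automorphism."""
    return len(automorphisms(vertices, edges)) == 1


# Hypothetical example 1: a star with hub 1; its leaves are interchangeable.
star_vertices = [1, 2, 3, 4, 5]
star_edges = [(1, 2), (1, 3), (1, 4), (1, 5)]

# Hypothetical example 2: a path 1-2-3-4-5 plus a vertex 6 joined to 2 and 3.
asym_vertices = [1, 2, 3, 4, 5, 6]
asym_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (6, 2), (6, 3)]

print(is_asymmetric(star_vertices, star_edges))  # False
print(is_asymmetric(asym_vertices, asym_edges))  # True
```

A symmetric subgraph such as the star can be located as a whole, but its leaves cannot be told apart after the labels are removed, which is the situation the second condition rules out.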

Method 2:

Fig. 3: A social network G, the anonymization of G, and the anonymization mapping

Fig. 3 shows an example social network graph G, its anonymization, and the anonymization mapping.

Definition 1 (Naive Anonymization) The naive anonymization of a graph G = (V, E) is an isomorphic graph, Gna = (Vna, Ena), defined by a random bijection f : V → Vna. The edges of Gna are Ena = {(f(x), f(x’)) | (x, x’) ∈ E}.
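Definition 1 can be illustrated with a short sketch. The vertex names, the edge list, and the choice of integers as anonymous identifiers are assumptions made for this example only; the definition itself only requires a random bijection f.

```python
import random


def naive_anonymize(vertices, edges, seed=None):
    """Relabel vertices with a random bijection f and map every edge through f."""
    rng = random.Random(seed)
    anonymous_ids = list(range(len(vertices)))
    rng.shuffle(anonymous_ids)
    f = dict(zip(vertices, anonymous_ids))  # random bijection f: V -> Vna
    # Ena = {(f(x), f(x')) | (x, x') in E}; frozenset models an undirected edge.
    anonymized_edges = {frozenset((f[x], f[y])) for x, y in edges}
    return f, anonymized_edges


# Hypothetical toy graph, not the graph of Fig. 3.
V = ["Alice", "Bob", "Carol", "Dave"]
E = [("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave")]

f, Ena = naive_anonymize(V, E, seed=0)
bob_degree = sum(1 for edge in Ena if f["Bob"] in edge)
print(bob_degree)  # 3: the structure around Bob survives the relabeling
```

Because f is a bijection, the anonymized graph is isomorphic to the original: only the labels change, while every degree and every adjacency is preserved. This is why structural knowledge can still be used against a naively anonymized graph.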


Achieving anonymity through structural similarity
Intuitively, nodes that look structurally similar may be indistinguishable to an adversary, in spite of external information. A strong form of structural similarity between nodes is automorphic equivalence. Two nodes x, y ∈ V are automorphically equivalent (denoted x ≡A y) if there exists an isomorphism from the graph onto itself that maps x to y. Fred and Harry are automorphically equivalent nodes in the graph of Fig. 3. Bob and Ed are not automorphically equivalent: the subgraph around Bob is different from the subgraph around Ed, so no isomorphism proving automorphic equivalence is possible.

Fig. 4: Simple graph and degrees of neighbors: (a) graph, (b) vertex refinement, (c) equivalence classes

Vertex refinement queries
We define a class of queries, of increasing power, which report on the local structure of the graph around a node. These queries are inspired by iterative vertex refinement, a technique originally developed to efficiently test for the existence of graph isomorphisms. The weakest knowledge query, H0, simply returns the label of the node. (Since our graphs are unlabeled, H0 returns ε on all input nodes.) The queries are successively more descriptive: H1(x) returns the degree of x, H2(x) returns the list of each neighbor's degree, and so on. The queries can be defined iteratively, where Hi(x) returns the multiset of values that result from evaluating Hi−1 on the set of nodes adjacent to x:

Hi(x) = {Hi−1(z1), Hi−1(z2), ..., Hi−1(zm)}

where z1, ..., zm are the nodes adjacent to x.

Example 1 Figure 4 contains the same graph from Figure 3 along with the computation of H0, H1, and H2 for each node. For example, H0 is uniformly ε. H1(Bob) = {ε, ε, ε, ε}, which we abbreviate in the table simply as 4. Using this abbreviation, H2(Bob) = {1, 1, 4, 4}, which represents Bob’s neighbors’ degrees.
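The iterative definition of Hi translates directly into code. In the sketch below, the adjacency list is an assumption: the figure is not reproduced in this text, so the edges were chosen only to be consistent with the values quoted above (H1(Bob) = 4 and H2(Bob) = {1, 1, 4, 4}). The empty tuple stands in for ε, and a sorted tuple stands in for a multiset.

```python
# Assumed adjacency list, consistent with the values quoted in Example 1.
ADJ = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carol", "Dave", "Ed"],
    "Carol": ["Bob"],
    "Dave": ["Bob", "Ed", "Fred", "Greg"],
    "Ed": ["Bob", "Dave", "Greg", "Harry"],
    "Fred": ["Dave", "Greg"],
    "Greg": ["Dave", "Ed", "Fred", "Harry"],
    "Harry": ["Ed", "Greg"],
}


def vertex_refinement(adj, i):
    """Return {node: H_i(node)} with multisets encoded as sorted tuples."""
    h = {x: () for x in adj}  # H0: every node gets the empty label
    for _ in range(i):
        # H_i(x) is the multiset of H_{i-1} values of the neighbors of x.
        h = {x: tuple(sorted(h[z] for z in adj[x])) for x in adj}
    return h


def as_degrees(h2_value):
    """Abbreviate each H1 entry inside an H2 value by its size (the degree)."""
    return sorted(len(entry) for entry in h2_value)


h1 = vertex_refinement(ADJ, 1)
h2 = vertex_refinement(ADJ, 2)
print(len(h1["Bob"]))         # 4
print(as_degrees(h2["Bob"]))  # [1, 1, 4, 4]
```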


For each query Hi we define an equivalence relation on nodes in the graph in the natural way.

Definition 2 (Relative equivalence) Two nodes x, y in a graph are equivalent relative to Hi, denoted x ≡Hi y, if and only if Hi(x) = Hi(y).

Example 2 Figure 4(c) lists the equivalence classes of nodes according to the relations ≡H0, ≡H1, and ≡H2. All nodes are equivalent relative to H0 (for an unlabeled graph). As i increases, the values of Hi contain successively more precise structural information about the node’s position in the graph, and as a result, equivalence classes are divided. To an adversary limited to knowledge query Hi, nodes equivalent with respect to Hi are indistinguishable.
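Grouping nodes by their Hi value yields the equivalence classes directly. The sketch below is self-contained and again relies on an assumed edge list (the figure is not reproduced in this text); it shows how the classes split as i grows.

```python
from collections import defaultdict

# Assumed edge list; chosen to agree with the H1 and H2 values given for Bob.
EDGES = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
    ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
    ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry"),
]


def equivalence_classes(edges, i):
    """Partition the nodes into classes of equal H_i value."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    h = {x: () for x in adj}  # H0
    for _ in range(i):
        h = {x: tuple(sorted(h[z] for z in adj[x])) for x in adj}
    classes = defaultdict(list)
    for x in sorted(adj):
        classes[h[x]].append(x)
    return sorted(classes.values())


for i in range(3):
    print(i, equivalence_classes(EDGES, i))
```

Under these assumed edges, a node that ends up alone in its class (Bob at i = 2, for instance) can be re-identified by an adversary who knows its H2 value, while nodes that share a class remain indistinguishable to that adversary.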

Method 3:

Name    Age   Gender   ZIP code
Ana     21    F        20740
Bob     25    M        83201
Chris   24    M        20742
Don     29    M        83209
Emma    28    F        83230
Fabio   31    M        83222
Gia     24    F        20640
Halle   29    F        83201
Ian     23    M        20760
John    24    M        20740

Fig. 5: A simple user data table

Fig. 5 lists the data of users who all belong to a social network and are related to one another in some way. We now anonymize this data with a simple technique: we divide the users into groups and then anonymize the links between users.


Fig. 6: Anonymized table derived from Fig. 5

As Fig. 6 shows, the names of all users have been replaced by a blue or red ball according to age. Each user younger than 25 is replaced by a red ball, and each user aged 25 or older is replaced by a blue ball.
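The age rule can be written as a small function. This is a sketch under stated assumptions: the names and ages are taken from the user table, the threshold of 25 comes from the text, and the function and variable names are illustrative. Which other columns stay in the published table is not stated in the text, so only the name-to-color replacement is shown.

```python
from collections import Counter

# (name, age) pairs from the user table.
USERS = [
    ("Ana", 21), ("Bob", 25), ("Chris", 24), ("Don", 29), ("Emma", 28),
    ("Fabio", 31), ("Gia", 24), ("Halle", 29), ("Ian", 23), ("John", 24),
]


def color_for_age(age, threshold=25):
    """Red for users younger than the threshold, blue otherwise."""
    return "red" if age < threshold else "blue"


# Replace each identifying name by its group color.
anonymized = [color_for_age(age) for _name, age in USERS]
group_sizes = Counter(anonymized)
print(group_sizes["red"], group_sizes["blue"])  # 5 5
```

With this threshold, each color stands for five users, so a released record reveals only which of two equally sized groups its owner belongs to.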

Fig. 7: Division of users into equivalence classes

After anonymizing the user data, we analyze the links (relationships) among the users and, based on that analysis, divide the users into equivalence classes.

Anonymization of Links:
Having anonymized the users of the social network, we now anonymize the links between them. Starting from the original graph, the following methods are available:

Intact link: In this method, the links between two clusters are released unchanged.


Partial link removal: In this method, only some of the links between clusters are released.

Cluster-edge method: In this method, links between individual nodes are no longer published. Only links between clusters remain, each indicating that some node in one cluster is linked to some node in the other.

Constrained cluster-edge method: In this method, at most one link is kept between any two clusters, indicating that some node of one cluster is linked to some node of the other cluster.

All links removed: In this method, all links between the clusters are removed.
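The five link-anonymization options can be compared side by side on a toy graph. The sketch below is one possible reading of the descriptions above, not a reference implementation: the edge list is hypothetical, the clusters reuse the red/blue age groups, keeping a link count per cluster pair in the cluster-edge method is an assumption, and links inside a cluster are reported as a pair of identical cluster names.

```python
import random
from collections import Counter

# Cluster membership from the age rule (red: younger than 25, blue: otherwise).
CLUSTER_OF = {
    "Ana": "red", "Chris": "red", "Gia": "red", "Ian": "red", "John": "red",
    "Bob": "blue", "Don": "blue", "Emma": "blue", "Fabio": "blue", "Halle": "blue",
}

# Hypothetical friendship links, invented for this example.
LINKS = [
    ("Ana", "Bob"), ("Ana", "Chris"), ("Bob", "Don"),
    ("Chris", "Emma"), ("Gia", "Halle"), ("Ian", "John"),
]


def intact(links):
    """Release every link unchanged."""
    return list(links)


def partial_removal(links, keep_fraction, seed=None):
    """Release only a random subset of the links."""
    rng = random.Random(seed)
    keep = round(len(links) * keep_fraction)
    return rng.sample(list(links), keep)


def cluster_edge(links, cluster_of):
    """Collapse node-level links to cluster pairs (counts kept: an assumption)."""
    return Counter(
        tuple(sorted((cluster_of[u], cluster_of[v]))) for u, v in links
    )


def constrained_cluster_edge(links, cluster_of):
    """Keep at most one link per pair of clusters."""
    return set(cluster_edge(links, cluster_of))


def all_removed(links):
    """Release no links at all."""
    return []


print(cluster_edge(LINKS, CLUSTER_OF)[("blue", "red")])  # 3
print(sorted(constrained_cluster_edge(LINKS, CLUSTER_OF)))
```

Moving down this list trades utility for privacy: the intact release keeps every relationship, while removing all links leaves only the grouped attribute data.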

References
[1] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, “Measurement and Analysis of Online Social Networks,” Proc. 7th ACM SIGCOMM Conf. Internet Measurement, New York, NY, USA, pp. 29-42, October 2007.
[2] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava, “Anonymizing Social Networks.”
[3] R. Pang, M. Allman, V. Paxson, and J. Lee, “The Devil and Packet Trace Anonymization,” SIGCOMM Comput. Commun. Rev., 2006.
[4] B. Krishnamurthy and C.E. Wills, “Characterizing Privacy in Online Social Networks,” Proc. First Workshop Online Social Networks (WOSN), 2008.
[5] A. Narayanan and V. Shmatikov, “De-Anonymizing Social Networks,” Proc. IEEE 30th Symp. Security and Privacy, 2009.
[6] A. Korolova, R. Motwani, S. Nabar, and Y. Xu, “Link Privacy in Social Networks,” Proc. 17th ACM Conf. Information and Knowledge Management (CIKM), 2008.
[7] W. Peng, F. Li, X. Zou, and J. Wu, “A Two-Stage Deanonymization Attack against Anonymized Social Networks,” IEEE Trans., Feb. 2014.

Authors' Biography:
Hare Ram Singh received his BE degree in Computer Science & Engineering in 2008 from JSSATE, Bangalore, and his M.Tech in Computer Science & Engineering from BTLIT, Bangalore. He is currently working as Assistant Professor in the Information Science & Engineering department at RGIT, Bangalore. His research interests include Computer Networks, Cloud Computing, Image Processing, and Data Mining.

Shailesh kumar received his BE degree in Computer Science & Engineering in 2009 from H.M.S.I.T, Tumkur, and his M.Tech in Computer Science & Engineering from SSIT, Tumkur. He is currently working as Assistant Professor in the Information Science & Engineering department at VVIET, Mysore. His research interests include Computer Networks, Image Processing, and Algorithms.

Harsha S received his BE degree in Telecommunication & Engineering in 2005 from CIT, Codagu, and his M.Tech in Computer Networks & Engineering from National Institute of Engineering. He is currently working as Associate Professor in the Information Science & Engineering department at VVIET, Mysore. His research interests include Computer Networks, Image Processing, and Algorithms.

Theja N received his BE degree in 2009 and his M.Tech in 2012, both in Computer Science & Engineering, from SJCE, Mysore, and BTLIT, Bangalore, respectively. He is currently working as Assistant Professor in the ISE department at VVIET, Mysore.
