The small world phenomenon

The phenomenon is surprising because:

The graph is very large (more than 6 billion vertices for the planet).

The graph is sparse: each person is connected to at most k other people (k is about 1,000).

The graph is decentralized: there is no dominant central vertex to which the other vertices are directly connected.

The graph is highly clustered: most friendship circles overlap strongly.

10/14/2008

(c) Devika Subramanian, 2008


History of research

Karinthy (1929)

Hungarian novelist: “Chains” (5 degrees of separation)

Solomonoff and Rapoport (1951)

Theoretical biology: random graphs, phase transition in connectedness

Erdos and Renyi (1960)

Pure mathematics: founders of random graph theory (giant components)

Milgram and Travers (1967):

Sociology and psychology: acquaintance network, six degrees of separation

Leskovec and Horvitz (2008)

Models for small world?

Erdos-Renyi model

n nodes; each pair of nodes is connected by an edge independently with probability p. k = average degree = p(n-1), which is approximately pn for large n.
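The ER(n,p) construction can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names er_graph and average_degree are made up here, and the graph is stored as a dictionary that maps each vertex to a set of neighbors.

```python
import random
from itertools import combinations

def er_graph(n, p, seed=None):
    """Return an ER(n, p) graph as a dict: vertex -> set of neighbors."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Consider every unordered pair once; keep the edge with probability p.
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_degree(adj):
    """Average number of neighbors per vertex."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

n, p = 500, 0.01
G = er_graph(n, p, seed=1)
print("expected k:", p * (n - 1))
print("observed k:", average_degree(G))
```

The observed average degree should land close to p(n-1), with small random fluctuations from run to run when no seed is fixed.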


Erdos-Renyi model

Average degree k < 1 in ER(n,p) graph

Small, isolated clusters

Small diameters

Average degree k = 1 in ER(n,p) graph

A giant component appears

Diameter peaks

Average degree k > 1 in ER(n,p) graph

Almost all nodes connected

Diameter shrinks
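The three regimes can be explored numerically. The sketch below is an assumption-laden illustration, not part of the original slides: it uses hypothetical helpers (er_graph, largest_component_fraction, giant_fraction_for), sets p = k/(n-1) to obtain average degree k, and measures only the size of the largest connected component, not the diameter.

```python
import random
from itertools import combinations

def er_graph(n, p, rng):
    """ER(n, p) graph as a dict: vertex -> set of neighbors."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component_fraction(adj):
    """Fraction of vertices in the largest connected component (DFS)."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj)

def giant_fraction_for(k, n=1000, seed=0):
    """Largest-component fraction of one ER graph with average degree k."""
    rng = random.Random(seed)
    return largest_component_fraction(er_graph(n, k / (n - 1), rng))

for k in (0.5, 1.0, 3.0):
    print("k =", k, "largest component fraction =", round(giant_fraction_for(k), 3))
```

For k well below 1 the largest component should cover only a tiny fraction of the vertices, while for k well above 1 it should cover most of them.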



“Giant component” property

In many real-world networks, we see

Small diameter

Few connected components: often just one giant component that emerges at a threshold probability

Tipping points of Malcolm Gladwell

Degree distribution follows a power law


Power law

Power law: y = f(x) = x^{-a}
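Taking logarithms gives log y = -a log x, so a power law is a straight line of slope -a on a log-log plot. The sketch below illustrates this by recovering a from synthetic data with a least-squares line fit; the function name fit_power_law_exponent and the sample values are assumptions made for the example.

```python
import math

def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x); returns a = -slope."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
             / sum((a - mean_x) ** 2 for a in lx))
    return -slope

a = 2.5                                  # exponent used to generate the data
xs = [1, 2, 4, 8, 16, 32]
ys = [x ** -a for x in xs]               # exact power-law values
print("recovered exponent:", fit_power_law_exponent(xs, ys))
```

On real degree data the points scatter around the line, and a plain least-squares fit is only a rough estimate of the exponent.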


Degree distributions of real-world networks


Barabasi-Albert model

The graph is not static; it grows over time.

Preferential attachment: the probability that a new vertex connects to vertex i is proportional to the degree k_i of vertex i, that is, k_i divided by the sum of all degrees in the graph: P(i) = k_i / sum_j k_j.


BA graph generation

Start with a small fully connected graph.

Add vertices one by one, attaching m edges from each new vertex to existing vertices, chosen with probability proportional to the number of edges each existing vertex already has.


Properties of BA model

Small diameters

Threshold phenomena

Degree distribution follows power law

Explains the formation of many graphs in the real world: WWW, collaboration networks, power networks, protein networks, citation networks, etc.

networkx has a barabasi_albert_graph(n, m) function to generate such graphs.


Graph representations

Devika Subramanian

Comp 140 Fall 2008

Adjacency matrix representation

For a graph with n vertices, represent the edges by an n x n array.

If there is an edge between vertex i and vertex j, position (i,j) in the array is 1; otherwise it is 0.

This representation extends to weighted graphs by replacing the 1s and 0s with other numbers.


Adjacency list representation

For each vertex in a graph, associate a list of adjacent vertices. For weighted graphs, associate a list of tuples (vertex,weight) representing adjacent vertices and their edge weights/costs.


Graph representations

Example graph: vertices 0-4 with edges 0-1, 0-3, 1-2, 1-3, 2-3, 2-4.

Adjacency matrix:

     0  1  2  3  4
0    0  1  0  1  0
1    1  0  1  1  0
2    0  1  0  1  1
3    1  1  1  0  0
4    0  0  1  0  0
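A minimal sketch of building an adjacency matrix from an edge list, using a five-vertex graph with edges 0-1, 0-3, 1-2, 1-3, 2-3 and 2-4. The function name adjacency_matrix is chosen for illustration only.

```python
def adjacency_matrix(n, edges):
    """Return an n x n list of lists with 1 where an edge exists, else 0."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1      # undirected graph: the matrix is symmetric
    return matrix

edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]
A = adjacency_matrix(5, edges)
for row in A:
    print(row)

# The degree of vertex i is the sum of row i.
degrees = [sum(row) for row in A]
print("degrees:", degrees)
```

Checking whether an edge exists is a single lookup, A[i][j], but the matrix always takes n x n entries even when the graph is sparse.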

Graph representations

Adjacency list for the five-vertex example graph:

0    [1,3]
1    [0,2,3]
2    [1,3,4]
3    [0,1,2]
4    [2]
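A minimal sketch of building adjacency lists from an edge list, again for a five-vertex graph with edges 0-1, 0-3, 1-2, 1-3, 2-3 and 2-4. The function name adjacency_list and the choice of a dictionary keyed by vertex are assumptions made for the example.

```python
def adjacency_list(n, edges):
    """Return a dict: vertex -> sorted list of adjacent vertices."""
    adj = {v: [] for v in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)      # undirected graph: record both directions
    for v in adj:
        adj[v].sort()
    return adj

edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]
adj = adjacency_list(5, edges)
for v, nbrs in adj.items():
    print(v, nbrs)
```

The lists store only the edges that exist, so storage grows with the number of vertices plus the number of edges rather than with n x n.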

Weighted graph representation

Example graph: vertices 0-4 with weighted edges 0-1 (210), 0-3 (440), 1-2 (203), 1-3 (314), 2-3 (260), 2-4 (270).

Weighted adjacency matrix:

       0    1    2    3    4
0      0  210    0  440    0
1    210    0  203  314    0
2      0  203    0  260  270
3    440  314  260    0    0
4      0    0  270    0    0

Weighted graph representations

Weighted adjacency list for the five-vertex example graph:

0    [(1,210),(3,440)]
1    [(0,210),(2,203),(3,314)]
2    [(1,203),(3,260),(4,270)]
3    [(0,440),(1,314),(2,260)]
4    [(2,270)]
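Both weighted representations can be built from one list of (u, v, weight) triples. The sketch below uses the weights 210, 440, 203, 314, 260 and 270 on edges 0-1, 0-3, 1-2, 1-3, 2-3 and 2-4; the function names and the path_cost helper are hypothetical additions for illustration.

```python
def weighted_adjacency_list(n, edges):
    """Return a dict: vertex -> sorted list of (neighbor, weight) tuples."""
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    for v in adj:
        adj[v].sort()
    return adj

def weighted_adjacency_matrix(n, edges):
    """Return an n x n matrix holding edge weights, with 0 meaning no edge."""
    matrix = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        matrix[u][v] = w
        matrix[v][u] = w
    return matrix

def path_cost(adj, path):
    """Sum the edge weights along a list of consecutive vertices."""
    total = 0
    for u, v in zip(path, path[1:]):
        total += dict(adj[u])[v]   # raises KeyError if u-v is not an edge
    return total

weighted_edges = [(0, 1, 210), (0, 3, 440), (1, 2, 203),
                  (1, 3, 314), (2, 3, 260), (2, 4, 270)]
wl = weighted_adjacency_list(5, weighted_edges)
wm = weighted_adjacency_matrix(5, weighted_edges)
print(wl[1])
print(wm[2])
print("cost of 0-1-2-4:", path_cost(wl, [0, 1, 2, 4]))
```

Using 0 for "no edge" in the matrix works only when real edge weights are never 0; otherwise a separate marker such as None would be needed.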

networkx graph representation

Graphs packaged as objects

An object is some data together with a set of methods for accessing and manipulating the data.

Noun-oriented programming (Guzdial); “ask, don’t touch” philosophy (Kay)

An abstraction that hides implementation details and exposes a clean interface to you.


networkx Interface

import networkx as nx

G = nx.Graph()

G is an instance of a Graph

for i in range(10): G.add_edge(i,i+1)

nx.diameter(G)

nx.connected_component_subgraphs(G)

G = nx.binomial_graph(100,0.05)
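The calls above can be combined into a short script. This is a sketch that assumes a recent networkx release (2.4 or later), where connected_component_subgraphs is no longer available; the component subgraphs are built from nx.connected_components and G.subgraph instead. The variable names and the seed value are arbitrary choices.

```python
import networkx as nx

# Path graph 0-1-2-...-10 built edge by edge.
G = nx.Graph()
for i in range(10):
    G.add_edge(i, i + 1)
print("diameter of the path:", nx.diameter(G))

# Random graph with 100 nodes and edge probability 0.05.
R = nx.binomial_graph(100, 0.05, seed=42)

# One subgraph per connected component (replacement for the older
# connected_component_subgraphs call).
components = [R.subgraph(c).copy() for c in nx.connected_components(R)]
largest = max(components, key=len)
print("components:", len(components))
print("nodes in largest component:", largest.number_of_nodes())
```

nx.diameter raises an error on a disconnected graph, so for a random graph it is usually applied to the largest component rather than to the whole graph.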


Python classes

A class is a blueprint for an object

Defines how to create an object

Defines the interface to interact with the object

(Diagram: a class and an instance created from it.)

networkx graph class

https://networkx.lanl.gov/reference/networkx/

class Graph(object):
    # Constructor
    def __init__(self, data=None, name=''):
        self.adj = {}
        if data is not None:
            convert.from_whatever(data, create_using=self)
        self.name = name

    # Accessor
    def nodes(self):
        return self.adj.keys()

self refers to the object itself

Graph class

Defines variables adj and name, which are local to the graph object.

Instead of passing adjacency lists and node lists, we encapsulate the data in an object and pass the object; much cleaner!

Can change the underlying representation of the graph object without having package users change their code.

networkx graph constructor

def __init__(self, data=None, name=''):
    self.adj = {}
    if data is not None:
        convert.from_whatever(data, create_using=self)
    self.name = name

https://networkx.lanl.gov/reference/networkx/

networkx graph representation

def add_node(self, n):
    if n not in self.adj:
        self.adj[n] = {}

def nodes(self):
    return self.adj.keys()

def neighbors(self, n):
    return self.adj[n].keys()

def add_edge(self, u, v=None):
    if v is None:
        (u, v) = u          # allow a single (u, v) tuple as the argument
    if u not in self.adj:
        self.adj[u] = {}
    if v not in self.adj:
        self.adj[v] = {}
    if u == v:
        return              # self-loops are not stored
    self.adj[u][v] = None
    self.adj[v][u] = None

Dictionary of dictionaries

adj for the five-vertex example graph:

0    {1: None, 3: None}
1    {0: None, 2: None, 3: None}
2    {1: None, 3: None, 4: None}
3    {0: None, 1: None, 2: None}
4    {2: None}
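The dictionary-of-dictionaries layout can be tried out with a small stand-alone class. SimpleGraph below is a hypothetical, simplified stand-in written for this example; it mirrors the add_node, add_edge, nodes and neighbors methods but is not the networkx implementation, and has_edge is an extra helper added here.

```python
class SimpleGraph:
    """Hypothetical minimal undirected graph stored as a dict of dicts."""

    def __init__(self, name=''):
        self.adj = {}        # vertex -> {neighbor: None}
        self.name = name

    def add_node(self, n):
        if n not in self.adj:
            self.adj[n] = {}

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        if u == v:
            return           # self-loops are not stored
        self.adj[u][v] = None
        self.adj[v][u] = None

    def nodes(self):
        return list(self.adj)

    def neighbors(self, n):
        return list(self.adj[n])

    def has_edge(self, u, v):
        # Two dictionary lookups, independent of graph size on average.
        return u in self.adj and v in self.adj[u]

G = SimpleGraph(name='example')
for u, v in [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]:
    G.add_edge(u, v)

for node in G.nodes():
    print(node, G.adj[node])
print(G.has_edge(2, 4), G.has_edge(0, 4))
```

The inner dictionaries give fast edge lookups, and the None placeholders leave room to store a per-edge value such as a weight.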

Special graphs: digraphs

https://networkx.lanl.gov/reference/networkx/

Inheritance

Basic functions are inherited

New methods specific to digraphs are added

Some functions are overridden.

Advantage: code reuse
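The three points about inheritance can be illustrated with two small hypothetical classes, SimpleGraph and SimpleDiGraph. They are simplified stand-ins written for this example, not the networkx classes: the subclass inherits add_node, nodes and neighbors, overrides add_edge, and adds successors and predecessors.

```python
class SimpleGraph:
    """Hypothetical minimal undirected graph (dict of dicts)."""

    def __init__(self):
        self.adj = {}

    def add_node(self, n):
        if n not in self.adj:
            self.adj[n] = {}

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = None
        self.adj[v][u] = None

    def nodes(self):
        return list(self.adj)

    def neighbors(self, n):
        return list(self.adj[n])


class SimpleDiGraph(SimpleGraph):
    """Hypothetical directed variant built by inheritance."""

    def add_edge(self, u, v):
        # Overridden: store the edge in one direction only.
        self.add_node(u)       # add_node is inherited unchanged
        self.add_node(v)
        self.adj[u][v] = None

    def successors(self, n):
        # New method: vertices reached by edges leaving n.
        return list(self.adj[n])

    def predecessors(self, n):
        # New method: vertices with an edge pointing at n.
        return [u for u in self.adj if n in self.adj[u]]

D = SimpleDiGraph()
D.add_edge(0, 1)
D.add_edge(1, 2)
print(D.nodes())            # inherited
print(D.successors(1))      # new
print(D.predecessors(1))    # new
```

Only the behavior that differs for directed graphs is written in the subclass; everything else is reused from the base class.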


Public and private data

G = nx.Graph()

G.adj can be set to anything we like

Convention: names with a leading underscore are treated as private; names with two leading underscores are additionally name-mangled by Python.

Encapsulation, or data hiding: people access data via functions rather than directly manipulating the internal structures.

G.add_node()

G.add_edge()

G.nodes()

G.edges()
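The underscore conventions can be seen in a small example. HiddenGraph is a hypothetical class written for this sketch, not part of networkx: its data is reached through methods, one attribute uses a single leading underscore and another uses two.

```python
class HiddenGraph:
    """Hypothetical graph whose internal data is meant to stay private."""

    def __init__(self):
        self._adj = {}           # single underscore: private by convention
        self.__edge_count = 0    # double underscore: name-mangled by Python

    def add_edge(self, u, v):
        if u == v:
            return
        is_new = v not in self._adj.setdefault(u, {})
        self._adj[u][v] = None
        self._adj.setdefault(v, {})[u] = None
        if is_new:
            self.__edge_count += 1

    def nodes(self):
        return list(self._adj)

    def edges(self):
        # Report each undirected edge once, with the smaller endpoint first.
        return [(u, v) for u in self._adj for v in self._adj[u] if u < v]

    def number_of_edges(self):
        return self.__edge_count

G = HiddenGraph()
G.add_edge(0, 1)
G.add_edge(1, 2)
print(G.nodes(), G.edges(), G.number_of_edges())

# Outside the class, the double-underscore name is not directly visible...
print(hasattr(G, '__edge_count'))
# ...but it is only renamed, not protected.
print(G._HiddenGraph__edge_count)
```

Python does not enforce privacy; both conventions signal intent, and well-behaved callers stick to the public methods.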


Advantages of encapsulation

By defining a specific interface, you can keep other modules from doing anything incorrect to your data.

By limiting the functions you are going to support, you leave yourself free to change the internal data without messing up your users.

Makes code more modular, since you can change large parts of your classes without affecting other parts of the program, so long as they only use your public functions
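A small sketch of this freedom to change internals: two hypothetical classes, ListGraph and DictGraph, store the same graph differently but expose the same public methods, so client code written against add_edge, nodes and neighbors works with either. All names here are invented for the example.

```python
class ListGraph:
    """Hypothetical graph stored as vertex -> list of neighbors."""

    def __init__(self):
        self._nbrs = {}

    def add_edge(self, u, v):
        for a, b in ((u, v), (v, u)):
            lst = self._nbrs.setdefault(a, [])
            if b not in lst:
                lst.append(b)

    def nodes(self):
        return list(self._nbrs)

    def neighbors(self, n):
        return list(self._nbrs[n])


class DictGraph:
    """Same public interface, stored as a dictionary of dictionaries."""

    def __init__(self):
        self._adj = {}

    def add_edge(self, u, v):
        self._adj.setdefault(u, {})[v] = None
        self._adj.setdefault(v, {})[u] = None

    def nodes(self):
        return list(self._adj)

    def neighbors(self, n):
        return list(self._adj[n])


def degree_sequence(graph):
    """Client code: uses only the public methods nodes() and neighbors()."""
    return sorted((len(graph.neighbors(n)) for n in graph.nodes()), reverse=True)

def build(graph_class):
    g = graph_class()
    for u, v in [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]:
        g.add_edge(u, v)
    return g

print(degree_sequence(build(ListGraph)))
print(degree_sequence(build(DictGraph)))
```

Because degree_sequence never touches _nbrs or _adj, swapping one class for the other requires no change to it.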
