Graph theoretic concepts Devika Subramanian Comp 140 Fall 2008
The small world phenomenon
The phenomenon is surprising because
Size of graph is very large (> 6 billion vertices for the planet).
Graph is sparse: each person is connected to at most k other people (k is about 1000).
Graph is decentralized: there is no dominant central vertex to which other vertices are directly connected.
Graph is highly clustered: most friendship circles overlap strongly.
10/14/2008
(c) Devika Subramanian, 2008
History of research
Karinthy (1929)
Hungarian novelist: “Chains” (5 degrees of separation)
Solomonoff and Rapoport (1951)
Theoretical biology: random graphs, phase transition in connectedness
Erdos and Renyi (1960)
Pure mathematics: founders of random graph theory (giant components)
Milgram and Travers (1967):
Sociology and Psychology: acquaintance network, six degrees of separation
Leskovec and Horvitz (2008)
Models for small world?
Erdos-Renyi model
n nodes; each pair of nodes is connected by an edge, independently, with probability p.
k = average degree, approximately p(n-1).
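A minimal sketch of the generation step, assuming the usual reading of ER(n,p) in which every unordered pair of nodes becomes an edge independently with probability p. The names `er_graph` and `average_degree` and the dict-of-sets representation are illustrative choices, not part of networkx.

```python
import random

def er_graph(n, p, seed=None):
    """Return an ER(n, p) graph as a dict mapping each node to a set of neighbors."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Each unordered pair {u, v} becomes an edge with probability p.
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def average_degree(adj):
    """Average number of neighbors per node (2 * edges / nodes)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

G = er_graph(500, 0.01, seed=1)
print(average_degree(G))  # should be close to p * (n - 1) = 4.99
```

The double loop visits all n(n-1)/2 pairs, so this sketch is only practical for small n.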
Erdos-Renyi model
Average degree k < 1 in ER(n,p) graph
Small, isolated clusters
Small diameters
Average degree k = 1 in ER(n,p) graph
A giant component appears
Diameter peaks
Average degree k > 1 in ER(n,p) graph
Almost all nodes connected
Diameter shrinks
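The three regimes can be explored with a small experiment. This is a rough sketch under stated assumptions: it sets p = k/(n-1) to obtain average degree k, uses n = 1000 and a fixed seed, and reports only the size of the largest connected component (it does not measure diameter). The helper names are hypothetical.

```python
import random
from collections import deque

def er_graph(n, p, rng):
    """ER(n, p) graph as a dict of neighbor sets; rng is a random.Random instance."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component, found by BFS."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)

rng = random.Random(0)
n = 1000
fractions = {}
for k in (0.5, 1.0, 3.0):
    p = k / (n - 1)  # choose p so that the expected average degree is k
    fractions[k] = largest_component_fraction(er_graph(n, p, rng))
    print(f"k = {k}: largest component holds {fractions[k]:.1%} of the nodes")
```

With k well below 1 the largest component should be a tiny fraction of the graph; with k well above 1 it should contain most of the nodes.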
“Giant component” property
In many real-world networks, we see
Small diameter
Few connected components: often just one giant component that emerges at a threshold probability
Tipping points of Malcolm Gladwell
Degree distribution follows a power law
Power law
Power law: y = f(x) = x^{-a}
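One practical consequence of this form is that a power law is a straight line on log-log axes, since log y = -a log x. The sketch below checks this numerically; the exponent a = 2.5 and the sample points are arbitrary values chosen for illustration.

```python
import math

def power_law(x, a):
    """y = x^(-a)"""
    return x ** (-a)

a = 2.5  # assumed exponent, for illustration only
xs = [1, 10, 100, 1000]
points = [(math.log10(x), math.log10(power_law(x, a))) for x in xs]

# Slope between the first and last point on log-log axes.
(x0, y0), (x1, y1) = points[0], points[-1]
slope = (y1 - y0) / (x1 - x0)
print(slope)  # equals -a up to floating-point rounding
```

Reading the exponent off a log-log plot of a degree distribution uses the same idea: the slope of the line estimates -a.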
Degree distributions of real-world networks
Barabasi-Albert model
Graph is not static, but grows with time.
Preferential attachment:
The probability p_i that a new vertex connects to vertex i is proportional to the degree k_i of vertex i: p_i = k_i / \sum_j k_j, the degree of i over the sum of all degrees in the graph.
BA graph generation
Start with a small fully connected graph.
Add vertices one by one, attaching m edges from each new vertex to existing vertices, chosen probabilistically in proportion to the number of edges each existing vertex already has.
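The two steps above can be sketched in plain Python. This is an illustrative sketch, not the networkx implementation. It assumes the seed graph is a complete graph on m + 1 vertices and that the m targets of each new vertex are distinct. The name `ba_graph` is hypothetical.

```python
import random

def ba_graph(n, m, seed=None):
    """Grow a graph to n vertices by preferential attachment, m edges per new vertex."""
    rng = random.Random(seed)
    # Seed graph: complete graph on vertices 0..m (an assumption of this sketch).
    adj = {v: set() for v in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each vertex appears in this list once per incident edge, so a uniform
    # draw from it picks vertex i with probability k_i / (sum of all degrees).
    endpoints = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are chosen
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj

G = ba_graph(200, 2, seed=7)
degrees = sorted((len(nbrs) for nbrs in G.values()), reverse=True)
print(degrees[:5], degrees[-5:])  # a few high-degree hubs, many low-degree vertices
```

Keeping one list entry per edge endpoint avoids recomputing the degree sum at every step.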
Properties of BA model
Small diameters
Threshold phenomena
Degree distribution follows power law
Explains formation of many graphs in the real world: WWW, collaboration networks, power networks, protein networks, citation networks, etc.
networkx provides barabasi_albert_graph(n, m) to generate such graphs.
Graph representations Devika Subramanian Comp 140 Fall 2008
Adjacency matrix representation
For a graph with n vertices, represent edges by an n x n array.
If there is an edge between vertex i and vertex j, position (i,j) in the array is a 1, otherwise it is a 0.
Can extend this representation to weighted graphs by replacing 1s and 0s by other numbers.
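A minimal sketch of building such a matrix from an edge list, assuming an undirected graph (so the matrix is symmetric) and vertices numbered 0 to n-1. The function name and the five-vertex edge list are illustrative.

```python
def adjacency_matrix(n, edges):
    """Build an n x n 0/1 matrix (list of lists) for an undirected graph."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the entry
    return matrix

edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]
A = adjacency_matrix(5, edges)
for row in A:
    print(row)
```

Checking whether an edge exists is a single lookup, `A[i][j]`, at the cost of storing n x n entries even when the graph is sparse.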
Adjacency list representation
For each vertex in a graph, associate a list of adjacent vertices.
For weighted graphs, associate a list of tuples (vertex,weight) representing adjacent vertices and their edge weights/costs.
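A minimal sketch of the unweighted case, assuming an undirected graph with vertices numbered 0 to n-1. The function name and the example edge list are illustrative.

```python
def adjacency_lists(n, edges):
    """Map each vertex to the list of its adjacent vertices (undirected graph)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

edges = [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3), (2, 4)]
adj = adjacency_lists(5, edges)
for vertex, neighbors in adj.items():
    print(vertex, neighbors)
```

Storage grows with the number of edges rather than with n x n, which suits sparse graphs.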
Graph representations
Example graph: vertices 0 to 4, edges 0-1, 0-3, 1-2, 1-3, 2-3, 2-4
Adjacency matrix:
      0  1  2  3  4
  0   0  1  0  1  0
  1   1  0  1  1  0
  2   0  1  0  1  1
  3   1  1  1  0  0
  4   0  0  1  0  0
Graph representations
Adjacency lists (vertices 0 to 4):
  0: [1,3]
  1: [0,2,3]
  2: [1,3,4]
  3: [0,1,2]
  4: [2]
Weighted graph representation
Edge weights: 0-1: 210, 0-3: 440, 1-2: 203, 1-3: 314, 2-3: 260, 2-4: 270
Weighted adjacency matrix:
        0    1    2    3    4
  0     0  210    0  440    0
  1   210    0  203  314    0
  2     0  203    0  260  270
  3   440  314  260    0    0
  4     0    0  270    0    0
Weighted graph representations
Edge weights: 0-1: 210, 0-3: 440, 1-2: 203, 1-3: 314, 2-3: 260, 2-4: 270
Weighted adjacency lists:
  0: [(1,210),(3,440)]
  1: [(0,210),(2,203),(3,314)]
  2: [(1,203),(3,260),(4,270)]
  3: [(0,440),(1,314),(2,260)]
  4: [(2,270)]
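As a small illustration of working with this representation, the sketch below stores the weighted adjacency lists in a dictionary and adds up edge weights along a path. The helper names `edge_weight` and `path_cost` are hypothetical, and the path 0, 1, 2, 4 is chosen only as an example.

```python
weighted_adj = {
    0: [(1, 210), (3, 440)],
    1: [(0, 210), (2, 203), (3, 314)],
    2: [(1, 203), (3, 260), (4, 270)],
    3: [(0, 440), (1, 314), (2, 260)],
    4: [(2, 270)],
}

def edge_weight(adj, u, v):
    """Return the weight of edge u-v, or None if the edge is absent."""
    for neighbor, weight in adj[u]:
        if neighbor == v:
            return weight
    return None

def path_cost(adj, path):
    """Sum the edge weights along consecutive vertices of path."""
    total = 0
    for u, v in zip(path, path[1:]):
        weight = edge_weight(adj, u, v)
        if weight is None:
            raise ValueError(f"no edge between {u} and {v}")
        total += weight
    return total

print(path_cost(weighted_adj, [0, 1, 2, 4]))  # 210 + 203 + 270 = 683
```

Looking up a weight scans one vertex's list, so it takes time proportional to that vertex's degree rather than a single array access.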
networkx graph representation
Graphs packaged as objects
An object is some data together with a set of methods for accessing and manipulating the data.
Noun-oriented programming (Guzdial), “ask, don’t touch” philosophy (Kay)
An abstraction that hides implementation details and exposes a clean interface to you.
networkx Interface
import networkx as nx
G = nx.Graph()
G is an instance of a Graph
for i in range(10): G.add_edge(i,i+1)
nx.diameter(G)
nx.connected_component_subgraphs(G)  # removed in networkx 2.4; newer versions use nx.connected_components(G) together with G.subgraph(...)
G = nx.binomial_graph(100,0.05)
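To make concrete what the diameter call measures, here is a plain-Python sketch that builds the same 11-node path (edges i to i+1 for i in 0..9) and finds the longest shortest path by breadth-first search. It is not the networkx implementation. The helper names are hypothetical, and it assumes a connected graph.

```python
from collections import deque

def add_edge(adj, u, v):
    """Add an undirected edge to a dict-of-sets graph, creating nodes as needed."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def eccentricity(adj, source):
    """Largest shortest-path distance (in edges) from source; assumes connectivity."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Maximum eccentricity over all nodes."""
    return max(eccentricity(adj, v) for v in adj)

adj = {}
for i in range(10):
    add_edge(adj, i, i + 1)
print(len(adj), diameter(adj))  # 11 nodes in a path, so the diameter is 10
```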
Python classes
A class is a blueprint for an object
Defines how to create an object
Defines the interface to interact with the object
[Figure: a class and an instance created from it]
networkx graph class
Source: https://networkx.lanl.gov/reference/networkx/
    class Graph(object):
        # Constructor
        def __init__(self, data=None, name=''):
            self.adj = {}
            if data is not None:
                convert.from_whatever(data, create_using=self)
            self.name = name

        # Accessor
        def nodes(self):
            return self.adj.keys()
self refers to the object itself
Graph class
Defines variables adj and name, which are local to the graph object.
Instead of passing adjacency lists and node lists, we encapsulate the data in an object and pass the object; much cleaner!
Can change the underlying representation of the graph object without having package users change their code.
networkx graph constructor
    def __init__(self, data=None, name=''):
        self.adj = {}
        if data is not None:
            convert.from_whatever(data, create_using=self)
        self.name = name
https://networkx.lanl.gov/reference/networkx/
networkx graph representation
    def add_node(self, n):
        if n not in self.adj:
            self.adj[n] = {}

    def nodes(self):
        return self.adj.keys()

    def neighbors(self, n):
        return self.adj[n].keys()

    def add_edge(self, u, v=None):
        if v is None:
            (u, v) = u
        if u not in self.adj:
            self.adj[u] = {}
        if v not in self.adj:
            self.adj[v] = {}
        if u == v:
            return
        self.adj[u][v] = None
        self.adj[v][u] = None
Dictionary of dictionaries
adj for the graph with edges 0-1, 0-3, 1-2, 1-3, 2-3, 2-4:
  0: {1: None, 3: None}
  1: {0: None, 2: None, 3: None}
  2: {1: None, 3: None, 4: None}
  3: {0: None, 1: None, 2: None}
  4: {2: None}
Special graphs: digraphs
Inheritance (see https://networkx.lanl.gov/reference/networkx/)
Basic functions are inherited
New methods specific to digraphs are added
Some functions are overridden.
Advantage: code reuse
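A toy illustration of these three points, using hypothetical classes `MiniGraph` and `MiniDiGraph` rather than the networkx classes: `add_node` and `nodes` are inherited, `add_edge` is overridden to store one direction only, and `successors` and `predecessors` are new.

```python
class MiniGraph:
    """Hypothetical undirected graph stored as a dictionary of dictionaries."""

    def __init__(self):
        self.adj = {}

    def add_node(self, n):
        if n not in self.adj:
            self.adj[n] = {}

    def nodes(self):
        return list(self.adj)

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = None
        self.adj[v][u] = None


class MiniDiGraph(MiniGraph):
    """Directed variant: reuses __init__, add_node and nodes from MiniGraph."""

    def add_edge(self, u, v):  # overridden: store the edge in one direction only
        self.add_node(u)
        self.add_node(v)
        self.adj[u][v] = None

    def successors(self, n):  # new method specific to digraphs
        return list(self.adj[n])

    def predecessors(self, n):  # new method specific to digraphs
        return [u for u in self.adj if n in self.adj[u]]

D = MiniDiGraph()
D.add_edge(0, 1)
D.add_edge(2, 1)
print(D.nodes(), D.successors(0), D.predecessors(1))
```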
Public and private data
G = nx.Graph()
G.adj can be set to anything we like
Convention: a name with a leading underscore is treated as private; two leading underscores additionally trigger name mangling inside a class.
Encapsulation or data hiding: people access data via functions, rather than directly manipulating the internal structures.
G.add_node()
G.add_edge()
G.nodes()
G.edges()
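A short sketch of how Python treats underscore-prefixed names. The class and its attributes are hypothetical. A single leading underscore is purely a convention, while two leading underscores inside a class body are rewritten ("mangled") to include the class name, which makes accidental outside access harder but not impossible.

```python
class MiniGraph:
    def __init__(self):
        self._adj = {}      # single underscore: internal by convention only
        self.__changes = 0  # double underscore: stored as _MiniGraph__changes

    def add_node(self, n):
        self._adj.setdefault(n, {})
        self.__changes += 1

    def nodes(self):
        return list(self._adj)

G = MiniGraph()
G.add_node("a")
print(G.nodes())                 # the intended way to read the data
print(hasattr(G, "__changes"))   # False: no attribute with that exact name exists
print(G._MiniGraph__changes)     # the mangled name is still reachable
```

Nothing in the language prevents `G._adj = 42`; the underscore only signals that client code should go through `add_node()` and `nodes()` instead.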
Advantages of encapsulation
By defining a specific interface, you can keep other modules from doing anything incorrect to your data.
By limiting the functions you are going to support, you leave yourself free to change the internal data without messing up your users.
Makes code more modular, since you can change large parts of your classes without affecting other parts of the program, so long as they only use your public functions
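As an illustration of that freedom, the sketch below defines two hypothetical classes that store a graph in different ways (a dictionary of dictionaries and a set of edges) behind the same public methods. The client function `degree_table` uses only those methods, so it works unchanged with either class.

```python
class DictGraph:
    """Stores the graph as a dictionary of dictionaries."""

    def __init__(self):
        self._adj = {}

    def add_edge(self, u, v):
        self._adj.setdefault(u, {})[v] = None
        self._adj.setdefault(v, {})[u] = None

    def nodes(self):
        return sorted(self._adj)

    def neighbors(self, n):
        return sorted(self._adj[n])


class EdgeSetGraph:
    """Stores the same graph as a set of two-element frozensets."""

    def __init__(self):
        self._edges = set()

    def add_edge(self, u, v):
        self._edges.add(frozenset((u, v)))

    def nodes(self):
        return sorted({n for e in self._edges for n in e})

    def neighbors(self, n):
        return sorted(m for e in self._edges if n in e for m in e if m != n)


def degree_table(graph):
    """Client code: relies only on the public nodes() and neighbors() methods."""
    return {n: len(graph.neighbors(n)) for n in graph.nodes()}

tables = {}
for cls in (DictGraph, EdgeSetGraph):
    g = cls()
    for u, v in [(0, 1), (0, 3), (1, 2)]:
        g.add_edge(u, v)
    tables[cls.__name__] = degree_table(g)
    print(cls.__name__, tables[cls.__name__])
```

Had `degree_table` reached into `_adj` directly, switching to the edge-set representation would have broken it.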