The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2011) 25 May 2011

LGM: Mining Frequent Subgraphs from Linear Graphs Yasuo Tabei

ERATO Minato Project, Japan Science and Technology Agency
Joint work with Daisuke Okanohara (Preferred Infrastructure), Shuichi Hirose (AIST), Koji Tsuda (AIST)

Outline

• Introduction to linear graph
  ★ Linear subgraph relation
  ★ Total order among edges
• Frequent subgraph mining from a set of linear graphs
• Experiments
  ★ Motif extraction from protein 3D structures

Linear graph (Davydov et al., 2004)

• Labeled graph whose vertices are totally ordered
• Linear graph g = (V, E, L^V, L^E)
  ‣ V ⊂ N : ordered vertex set
  ‣ E ⊆ V × V : edge set
  ‣ L^V : V → Σ^V : vertex labels
  ‣ L^E : E → Σ^E : edge labels

Example: [Figure: a linear graph with six ordered vertices carrying labels from {A, B, C} and edges carrying labels from {a, b, c}]
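The definition above can be sketched as a small data structure. This is only an illustration: the class name `LinearGraph`, its field names, the convention of storing each edge as a pair (i, j) with i < j, and the example graph are assumptions, not taken from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LinearGraph:
    """Hypothetical container for g = (V, E, L^V, L^E)."""

    vertex_labels: dict[int, str]  # L^V: vertex index -> vertex label
    edge_labels: dict[tuple[int, int], str] = field(default_factory=dict)  # L^E

    def __post_init__(self) -> None:
        # Assumed convention: an edge (i, j) joins known vertices with i < j.
        for (i, j) in self.edge_labels:
            if i not in self.vertex_labels or j not in self.vertex_labels:
                raise ValueError(f"edge ({i}, {j}) uses an unknown vertex")
            if not i < j:
                raise ValueError(f"edge ({i}, {j}) must satisfy i < j")

    @property
    def vertices(self) -> list[int]:
        """Vertices in their total order."""
        return sorted(self.vertex_labels)

    @property
    def edges(self) -> list[tuple[int, int]]:
        """Edges sorted by left vertex, then right vertex."""
        return sorted(self.edge_labels)


# Made-up example graph (not the one in the slide figure).
g = LinearGraph(
    vertex_labels={1: "A", 2: "B", 3: "A", 4: "B", 5: "C", 6: "A"},
    edge_labels={(1, 3): "a", (2, 4): "b", (3, 6): "c", (5, 6): "a"},
)
```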

Linear subgraph relation



g1 is a linear subgraph of g2 if both of the following hold:

i) Conventional subgraph condition
  ★ Vertex labels are matched
  ★ All edges of g1 exist in g2 with the correct labels
ii) The order of vertices is conserved

Example: [Figure: g1 with three vertices labeled A, B, A and g2 with six vertices labeled A, A, B, B, C, A; g1 maps into g2 with the vertex order preserved]





Subgraph but not linear subgraph

• g1 is a subgraph of g2
  ★ Vertex labels are matched
  ★ All edges in g1 also exist in g2 with correct labels
• g1 is not a linear subgraph of g2
  ★ The order of vertices is not conserved

[Figure: g1 with three vertices and g2 with four vertices, edges labeled from {a, b, c}; g1 embeds in g2 only by a mapping that reorders vertices]
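The difference between an ordinary subgraph and a linear subgraph can be checked by brute force on small examples. The sketch below tries every order-preserving assignment of g1's vertices to g2's vertices. The function name, the dictionary representation, and both example graphs are assumptions made for illustration, and the search is exponential, so it is only meant to make the definition concrete.

```python
from itertools import combinations


def is_linear_subgraph(v1, e1, v2, e2):
    """Return True if (v1, e1) is a linear subgraph of (v2, e2).

    v*: dict vertex -> label; e*: dict (i, j) -> label with i < j (assumed).
    """
    small = sorted(v1)
    # combinations() yields increasing tuples, so every mapping built here
    # conserves the vertex order (condition ii).
    for image in combinations(sorted(v2), len(small)):
        m = dict(zip(small, image))
        # Condition i, part 1: vertex labels must match.
        if any(v1[u] != v2[m[u]] for u in small):
            continue
        # Condition i, part 2: every edge of g1 exists in g2 with the same label.
        if all(e2.get((m[i], m[j])) == lab for (i, j), lab in e1.items()):
            return True
    return False


# Made-up positive example: g1 maps to vertices 2, 3, 6 of g2.
g1_v, g1_e = {1: "A", 2: "B", 3: "A"}, {(1, 2): "a", (2, 3): "b"}
g2_v = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "A"}
g2_e = {(2, 3): "a", (3, 6): "b", (1, 4): "c"}

# Made-up negative example: the A-B edge exists in h2, but only as B then A.
h1_v, h1_e = {1: "A", 2: "B"}, {(1, 2): "a"}
h2_v, h2_e = {1: "B", 2: "A"}, {(1, 2): "a"}
```

Here `is_linear_subgraph(g1_v, g1_e, g2_v, g2_e)` succeeds, while the `h1`/`h2` pair fails because the only label-consistent mapping would swap the vertex order.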

Total order among edges in a linear graph

• Compare the left vertices first. If they are identical, look at the right vertices

∀ e1 = (i, j), e2 = (k, l) ∈ E_g : e1 < e2 if and only if (i) i < k or (ii) i = k, j < l

Example: [Figure: two edges e1 = (i, j) and e2 = (k, l) drawn over vertices 1 to 4]
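The edge order is a lexicographic comparison of (left vertex, right vertex). A minimal sketch, assuming edges are stored as Python tuples (i, j) with i < j; the function name `edge_less` and the sample edges are made up:

```python
def edge_less(e1, e2):
    """e1 < e2 iff (i) i < k, or (ii) i == k and j < l."""
    (i, j), (k, l) = e1, e2
    return i < k or (i == k and j < l)


edges = [(2, 4), (1, 3), (1, 2), (3, 4)]

# Python compares tuples lexicographically, which is the same order,
# so a plain sort arranges the edges from smallest to largest.
ordered = sorted(edges)
largest = max(edges)
```

Because the built-in tuple comparison already matches the definition, `sorted` and `max` can be used directly wherever "the largest edge" is needed.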


Frequent subgraph mining from linear graphs

• Enumerate all frequent subgraphs from a set of linear graphs
  ★ Subgraphs included in a set of linear graphs at least τ times (minimum support threshold)
  ★ Enumerate connected and disconnected subgraphs with a unified framework
• Use reverse search for an efficient enumeration (Avis and Fukuda, 1993)
  ★ Polynomial delay
  ★ gSpan = exponential delay

Enumeration of all linear subgraphs of a linear graph

• Before considering a mining algorithm, we have to solve the problem of subgraph enumeration first
• How can all subgraphs of the following linear graph be enumerated without duplication?

[Figure: example linear graph]

Search lattice of all subgraphs

[Figure: search lattice of all subgraphs of the example linear graph]

Reverse search (Avis and Fukuda, 1993)

• To enumerate all subgraphs without duplication, we need to define a search tree in the search lattice
• Reduction map f
  ★ Mapping from a child to its parent
  ★ Remove the largest edge

[Figure: the reduction map f applied to an example linear graph, removing its largest edge]
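The reduction map can be sketched on a simplified representation in which a pattern is just a set of edges (i, j). This is an assumption made for illustration: the names `reduce_pattern` and `path_to_root` are hypothetical, and any renumbering of vertices after an edge is removed is ignored here.

```python
def reduce_pattern(edges: frozenset) -> frozenset:
    """Reduction map f: drop the largest edge to obtain the parent."""
    if not edges:
        raise ValueError("the empty pattern is the root and has no parent")
    # max() on tuples uses the left-then-right edge order.
    return edges - {max(edges)}


def path_to_root(edges: frozenset) -> list:
    """Apply f repeatedly until the empty pattern (the root) is reached."""
    path = [edges]
    while path[-1]:
        path.append(reduce_pattern(path[-1]))
    return path


pattern = frozenset({(1, 2), (1, 3), (2, 4)})
parent = reduce_pattern(pattern)
chain = path_to_root(pattern)
```

Every pattern has exactly one parent under this map, which is what makes the induced structure a tree rather than a lattice.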

Search tree induced by the reduction map

• By applying the reduction map to each element, a search tree can be induced

[Figure: search tree induced in the search lattice by the reduction map]

Inverting the reduction map: f^{-1}

• When traversing the tree from the root, child nodes are created on demand
• In most cases, the inversion of the reduction map takes the following two steps:
  ★ Consider all children candidates
  ★ Keep the candidates that the reduction map sends back to the parent
• However, in this particular case, the reduction map can be inverted explicitly
  ★ Can derive the pattern extension rule (parent to children)
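The two ways of inverting the reduction map can be contrasted on a toy setting. The sketch assumes a pattern is a set of edges drawn from one fixed host graph; all function names are hypothetical, and the explicit variant shown here is only the analogue for this simplified setting, not the pattern extension rule from the talk.

```python
def reduce_pattern(edges):
    """Reduction map f: remove the largest edge."""
    return edges - {max(edges)}


def children_by_filtering(parent, host_edges):
    """Generic inversion: generate candidates, keep those with f(c) == parent."""
    children = []
    for e in sorted(host_edges - parent):
        candidate = parent | {e}
        if reduce_pattern(candidate) == parent:
            children.append(candidate)
    return children


def children_explicit(parent, host_edges):
    """Explicit inversion for this toy setting: only add edges larger than
    the current largest edge, so no candidate ever has to be rejected."""
    largest = max(parent, default=None)
    return [
        parent | {e}
        for e in sorted(host_edges - parent)
        if largest is None or e > largest
    ]


host = frozenset({(1, 2), (1, 3), (2, 4)})
parent = frozenset({(1, 3)})
```

Both functions return the same children for any parent; the explicit version simply avoids building and testing candidates that would be discarded.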

Pattern extension rule


Traversing search tree from root

• Depth-first traversal for its memory efficiency

[Figure: depth-first traversal of the search tree from the root]
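A depth-first traversal of such a search tree can be written as a generator, so that only the current root-to-node path is held in memory. The sketch assumes the simplified setting where a pattern is a tuple of edges from one host graph and children are formed by appending a larger edge; `enumerate_subgraphs` and the host graph are made up for illustration.

```python
def enumerate_subgraphs(host_edges):
    """Yield every edge subset of the host graph exactly once, depth first."""
    order = sorted(host_edges)  # edges in left-then-right order

    def dfs(pattern, start):
        yield pattern
        # Children append an edge larger than every edge already in the pattern.
        for idx in range(start, len(order)):
            yield from dfs(pattern + (order[idx],), idx + 1)

    yield from dfs((), 0)


host = {(1, 2), (1, 3), (2, 4)}
patterns = list(enumerate_subgraphs(host))
```

With three host edges this visits 2^3 = 8 patterns, starting from the empty root, and no pattern is produced twice.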

Frequent subgraph mining

• Basic idea: find all possible extensions of a current pattern in the graph database, and extend the pattern
• Occurrence list L_G(g)
  ★ Record every occurrence of a pattern g in the graph database G
  ★ Calculate the support of a pattern g by the occurrence list
• Use the anti-monotonicity of the support for pruning
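Occurrence lists and support-based pruning can be illustrated on a deliberately simplified database. The sketch assumes all graphs share the same vertex numbering, so a pattern "occurs" in a graph when its edges are a subset of that graph's edges. This is closer to itemset mining over edges than to the full method, which must also track where each pattern embeds. The function name `mine_frequent` and the data are hypothetical.

```python
def mine_frequent(database, min_support):
    """Return {pattern: support} for edge sets contained in >= min_support graphs."""
    # Occurrence list of each single edge: ids of the graphs that contain it.
    edge_occ = {}
    for gid, edges in enumerate(database):
        for e in edges:
            edge_occ.setdefault(e, set()).add(gid)
    order = sorted(e for e, occ in edge_occ.items() if len(occ) >= min_support)
    results = {}

    def dfs(pattern, occ, start):
        for idx in range(start, len(order)):
            e = order[idx]
            child_occ = occ & edge_occ[e]  # occurrence list of the extension
            if len(child_occ) < min_support:
                continue  # anti-monotonicity: no further extension can be frequent
            child = pattern + (e,)
            results[child] = len(child_occ)  # support = size of occurrence list
            dfs(child, child_occ, idx + 1)

    dfs((), set(range(len(database))), 0)
    return results


database = [
    {(1, 2), (2, 3)},
    {(1, 2), (2, 3), (3, 4)},
    {(1, 2), (3, 4)},
]
frequent = mine_frequent(database, min_support=2)
```

In this toy run the pattern `((1, 2), (3, 4))` is frequent even though its two edges share no vertex, which mirrors the point that disconnected patterns are enumerated alongside connected ones.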


Motif extraction from protein 3D structures

• Pairs of homologous proteins in thermophilic and mesophilic organisms
• Construct a linear graph from a protein
  ★ Use vertex order from N- to C-terminal
  ★ Assign vertex labels from {1,...,6}
  ★ Draw an edge between pairs of amino acid residues whose distance is within 5 Å
• # of data: 742, avg. # of vertices: 371, avg. # of edges: 496
• Rank the enumerated patterns by statistical significance (p-value)
  ★ Association to thermophilic/mesophilic labels
  ★ Fisher exact test
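The ranking step can be sketched with a Fisher exact test on a 2x2 table of pattern presence against the thermophilic/mesophilic label. The slides do not say whether a one-sided or two-sided test was used; the sketch below computes a one-sided p-value as an assumption, and the function name and the counts are made up.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the table
        [[a, b],   thermophilic: with pattern, without pattern
         [c, d]]   mesophilic:   with pattern, without pattern
    i.e. the probability of seeing at least `a` thermophilic proteins with
    the pattern when all row and column totals are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Hypergeometric tail; math.comb returns 0 when k > n, so impossible
    # tables contribute nothing.
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return tail / total


# Hypothetical counts: the pattern occurs in 3 of 4 thermophilic proteins
# and in 1 of 4 mesophilic proteins.
p_value = fisher_one_sided(3, 1, 1, 3)
```

Patterns would then be sorted by this p-value, smallest first, with the choice of threshold left to the analysis.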

Runtime comparison

• Compared to gSpan
• Made gapped linear graphs and ran gSpan on them
• LGM is faster than gSpan

• Minimum support = 10
• 103 patterns whose p-value < 0.001
• Thermophilic (TATA), Mesophilic (pol II)
  ★ Share the function as DNA binding protein, but the thermostability is different

Mapping motifs in 3D structure

• Thermophilic (TATA), Mesophilic (pol II)


Summary

• Efficient subgraph mining algorithm for linear graphs
• Search tree is defined by the reverse search principle
• Patterns include disconnected subgraphs
• Enumeration runs with polynomial delay
• Interesting patterns from proteins
