Detecting highly overlapping communities with Model-based Overlapping Seed Expansion

Aaron McDaid and Neil Hurley

March 10, 2010

- Clique percolation.
  - Faster than alternatives (preliminary finding)
- GCE
- Model-based community finding
- MOSES (Model-based Overlapping Seed ExpanSion)

What next?

- Benchmarks
- NMI
- follow the research basic structure.
- Kronecker?

MOSES

- (Relatively) simple model
- Optimize objective using heuristics
- Scales
- (a lot more to do though)

MOSES

Model that:

- Every pair of nodes has a (small) chance of having an edge.
- Communities increase the probability of edges forming between member nodes.
- Try to put most edges inside communities.
- Try to put most disconnected pairs outside communities.
- s(i, j): number of communities in common between nodes i and j.
- Maximize s(i, j) where i and j are connected.
- Minimize s(i, j) where i and j are not connected.
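The quantity s(i, j) can be computed directly from a list of communities. The following is a minimal sketch, assuming communities are given as Python sets of integer node ids; the function names `shared_community_counts` and `s_of` are illustrative and do not come from the MOSES software.

```python
from itertools import combinations


def shared_community_counts(communities):
    """Map each node pair (i, j) with i < j to s(i, j), the number of
    communities that contain both i and j. Pairs with s = 0 are omitted."""
    s = {}
    for members in communities:
        for i, j in combinations(sorted(members), 2):
            s[(i, j)] = s.get((i, j), 0) + 1
    return s


def s_of(s, i, j):
    """Look up s(i, j) in either order; absent pairs share no community."""
    return s.get((min(i, j), max(i, j)), 0)


# Three overlapping communities on nodes 0..4 (made-up example data).
communities = [{0, 1, 2}, {1, 2, 3}, {3, 4}]
s = shared_community_counts(communities)
```

Here nodes 1 and 2 share two communities, nodes 0 and 1 share one, and nodes 0 and 4 share none.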

MOSES

Putting a community around every edge maximizes P(edges | grouping).

MOSES

- So, maximizing P(edges | grouping) is not good enough. Must consider P(grouping) also:
  P(edges | grouping) × P(grouping)
- Need a prior on the grouping.
- Arbitrary choices made so far. Might overhaul the model.

\[
F_X(G) = \prod_{i<j} \left( q_o \, q_i^{s_Z(i,j)} \right)^{1 - X_{i,j}}
\times \prod_{i<j} \left( 1 - q_o \, q_i^{s_Z(i,j)} \right)^{X_{i,j}}
\times Q_Z! \prod_{1 \le c \le Q_Z} \frac{1}{(N+1) \binom{N}{n_c}}
\tag{1}
\]
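As an illustration, the logarithm of the objective in equation (1) can be evaluated by brute force on a small graph. This is a sketch under stated assumptions, not the authors' implementation: it assumes q_o = 1 - p_o and q_i = 1 - p_in (the slides do not define the q symbols), that X_{i,j} is 1 when i and j are connected, that N is the number of nodes, and that n_c is the size of community c. The function name `log_objective` is hypothetical.

```python
import math
from itertools import combinations


def log_objective(n_nodes, edges, communities, p_in, p_out):
    """Log of equation (1), looping over all node pairs (quadratic time).

    Requires 0 < p_in < 1 and 0 < p_out < 1 so every logarithm is finite.
    """
    q_i = 1.0 - p_in    # assumption: q_i = 1 - p_in
    q_o = 1.0 - p_out   # assumption: q_o = 1 - p_o
    edge_set = {(min(a, b), max(a, b)) for a, b in edges}

    # s[(i, j)] = number of communities containing both i and j.
    s = {}
    for members in communities:
        for pair in combinations(sorted(members), 2):
            s[pair] = s.get(pair, 0) + 1

    total = 0.0
    for pair in combinations(range(n_nodes), 2):
        p_no_edge = q_o * q_i ** s.get(pair, 0)
        if pair in edge_set:
            total += math.log(1.0 - p_no_edge)
        else:
            total += math.log(p_no_edge)

    # Prior term: Q! times, per community, 1 / ((N + 1) * C(N, n_c)).
    total += math.lgamma(len(communities) + 1)  # log(Q!)
    for members in communities:
        total -= math.log(n_nodes + 1)
        total -= math.log(math.comb(n_nodes, len(members)))
    return total
```

Comparing `log_objective(...)` for two candidate groupings of the same graph shows the trade-off the slide describes: extra communities raise the edge likelihood but pay a cost through the prior.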

Heuristics

- Experimenting with various heuristics to maximize the objective
  - Seed expansion. Start with an edge.
  - Deletions
  - Iterative update (a la Louvain method)
- But many more things should be experimented with to get better results.
- Easy to efficiently optimize (with the current model)
  - Oklahoma Facebook network: 892,528 edges. 2,088 seconds.
  - A lot more could be done to speed this up even further.
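The seed-expansion heuristic can be sketched as a greedy loop: start from one edge and keep adding the frontier node that most improves a score, stopping when no addition helps. The sketch below is an assumption-laden illustration. The `score` callable is a stand-in (internal edges minus a penalty for missing internal pairs), not the MOSES objective, and `expand_seed` and `make_density_score` are hypothetical names.

```python
def make_density_score(adj, penalty=1.0):
    """Stand-in score: internal edges minus penalty * missing internal pairs."""
    def score(nodes):
        internal = sum(1 for a in nodes for b in adj[a] if b in nodes and a < b)
        pairs = len(nodes) * (len(nodes) - 1) // 2
        return internal - penalty * (pairs - internal)
    return score


def expand_seed(adj, seed_edge, score):
    """Greedily grow a community from one edge while the score improves."""
    community = set(seed_edge)
    best = score(community)
    while True:
        # Frontier: neighbours of the community that are not yet members.
        frontier = set().union(*(adj[v] for v in community)) - community
        candidates = [(score(community | {v}), v) for v in sorted(frontier)]
        if not candidates:
            break
        new_score, node = max(candidates, key=lambda t: t[0])
        if new_score <= best:
            break  # no frontier node improves the score
        community.add(node)
        best = new_score
    return community


# A triangle 0-1-2 with a tail 2-3-4 (made-up example graph).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
found = expand_seed(adj, (0, 1), make_density_score(adj))
```

Starting from the edge (0, 1), the loop adds node 2 to complete the triangle and then stops, because adding node 3 would lower the stand-in score.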

MOSES

Current seed, C, in black. Frontier nodes in blue. The next node selected will probably be A, as it has the most connections into the seed. But that depends on p_in and p_o and on the values of s_Z(i, j) for the edges.
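The "most connections into the seed" rule can be sketched by counting, for each frontier node, how many of its neighbours lie in the seed. This is a simplification: as noted above, the actual choice also depends on p_in, p_o and the s_Z(i, j) values, which this sketch ignores. The function names and the example graph are illustrative assumptions.

```python
def frontier_connection_counts(adj, seed):
    """For each frontier node, count its edges into the seed."""
    counts = {}
    for v in seed:
        for u in adj[v]:
            if u not in seed:
                counts[u] = counts.get(u, 0) + 1
    return counts


def most_connected_frontier_node(adj, seed):
    """Return the frontier node with the most edges into the seed, or None."""
    counts = frontier_connection_counts(adj, seed)
    if not counts:
        return None
    # Sorting first makes ties resolve deterministically.
    return max(sorted(counts), key=counts.get)


# Seed of three nodes; "A" touches all of them, "B" touches only one.
adj = {
    "c1": {"c2", "c3", "A"},
    "c2": {"c1", "c3", "A", "B"},
    "c3": {"c1", "c2", "A"},
    "A": {"c1", "c2", "c3"},
    "B": {"c2"},
}
seed = {"c1", "c2", "c3"}
choice = most_connected_frontier_node(adj, seed)
```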

Evaluation

20-node communities (cliques), p_o = 0.005

[Figure: NMI (0.0 to 1.0) against average overlap (1 to 15). Methods: MOSES, LFM (default), LFM (last Collection), GCE, Louvain method, COPRA, 5-clique percolation, 4-clique percolation (dashed), Iterative Scan (dashed).]

Evaluation

20-node communities (p_in = 0.4), p_o = 0.005

[Figure: NMI (0.0 to 1.0) against average overlap (1 to 15). Methods: MOSES, LFM (default), LFM (last Collection), GCE, Louvain method, COPRA, 5-clique percolation, 4-clique percolation (dashed), Iterative Scan (dashed).]
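A synthetic benchmark of this kind can be sketched by planting fixed-size communities at random and drawing edges from the model described earlier. The slides do not give the generator, so everything here is an assumption: the number of nodes, the number of communities, and the edge probability 1 - (1 - p_o)(1 - p_in)^{s(i, j)}. The name `planted_overlap_graph` is hypothetical.

```python
import random
from itertools import combinations


def planted_overlap_graph(n_nodes, n_communities, size, p_in, p_out, seed=0):
    """Plant random overlapping communities and sample edges around them."""
    rng = random.Random(seed)
    communities = [set(rng.sample(range(n_nodes), size))
                   for _ in range(n_communities)]

    # s[(i, j)] = number of planted communities containing both i and j.
    s = {}
    for members in communities:
        for pair in combinations(sorted(members), 2):
            s[pair] = s.get(pair, 0) + 1

    edges = set()
    for pair in combinations(range(n_nodes), 2):
        # Assumed model: each shared community is an extra chance of an edge.
        p_edge = 1.0 - (1.0 - p_out) * (1.0 - p_in) ** s.get(pair, 0)
        if rng.random() < p_edge:
            edges.add(pair)
    return communities, edges


# 20 communities of 20 nodes on 200 nodes: 2 communities per node on average.
communities, edges = planted_overlap_graph(200, 20, 20, p_in=0.4, p_out=0.005)
```

The average overlap is `n_communities * size / n_nodes`, so sweeping `n_communities` with the other arguments fixed moves along the horizontal axis of the plot. Setting `p_in=1.0` gives the clique variant.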

Evaluation

20-node communities (p_in = 0.3), p_o = 0.005

[Figure: NMI (0.0 to 1.0) against average overlap (1 to 15). Methods: MOSES, LFM (default), LFM (last Collection), GCE, Louvain method, COPRA, 5-clique percolation, 4-clique percolation (dashed), Iterative Scan (dashed).]

Evaluation

LFR k15minc15-maxc60mult1

[Figure: NMI (0.0 to 1.0) against communities per node (1 to 10). Methods: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, SCP-3, Louvain method, COPRA, SCP-4.]

Evaluation

LFR k15minc15-maxc60mult3

[Figure: NMI (0.0 to 1.0) against communities per node (1 to 10). Methods: MOSES, LFM2-firstCol, LFM2-lastCol, GCE, Louvain method, COPRA, SCP-4.]

Evaluation

- Facebook
  - Traud et al.'s five university networks.
  - Average of 7 communities per node.
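The "communities per node" statistic can be computed from any list of found communities. The sketch below assumes communities are sets of node ids and that nodes assigned to no community count as zero in the average; the slides do not say how unassigned nodes were treated. The function names are illustrative.

```python
from collections import Counter


def communities_per_node(communities, nodes):
    """Map each node to the number of communities it belongs to."""
    counts = Counter()
    for members in communities:
        counts.update(members)
    return {v: counts.get(v, 0) for v in nodes}


def average_communities_per_node(communities, nodes):
    """Mean membership count over all nodes (unassigned nodes count as 0)."""
    per_node = communities_per_node(communities, nodes)
    return sum(per_node.values()) / len(per_node)


# Made-up example: node 5 belongs to no community.
communities = [{0, 1, 2}, {1, 2, 3}, {3, 4}]
per_node = communities_per_node(communities, range(6))
average = average_communities_per_node(communities, range(6))
```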

Evaluation

[Figure: communities per node (0 to 70) against degree (0 to 1200); the count scale runs from 1 to 1142.]

Size of communities found

[Figure: density (0.0 to 0.6) against size of community (1 to 500) for Oklahoma, Princeton, UNC, Georgetown, Caltech.]

Comms per node

[Figure: density (0.0 to 0.5) against communities-per-person (1 to 100).]

Degree Distribution

[Figure: density (0.0 to 0.4) against degree (1 to 500).]

Acknowledgments

This research was supported by Science Foundation Ireland (SFI) Grant No. 08/SRC/I1407.
