A Fuzzy-Interval Based Approach for Explicit Graph Embedding

Muhammad Muzzamil Luqman1,2, Josep Lladós2, Jean-Yves Ramel1, and Thierry Brouard1

1 Laboratoire d'Informatique, Université François Rabelais de Tours, France
2 Computer Vision Center, Universitat Autónoma de Barcelona, Spain
{mluqman,josep}@cvc.uab.es, {ramel,brouard}@univ-tours.fr

Abstract. We present a new method for explicit graph embedding. Our algorithm extracts a feature vector from an undirected attributed graph. The proposed feature vector encodes details about the number of nodes, the number of edges, the node degrees, the attributes of nodes and the attributes of edges in the graph. The first two features are the number of nodes and the number of edges. These are followed by w features for node degrees, m features for k node attributes and n features for l edge attributes, which represent the distribution of node degrees, node attribute values and edge attribute values, and are obtained by defining, in an unsupervised fashion, fuzzy-intervals over the lists of node degrees, node attributes and edge attributes. Experimental results are provided for the sample data of the ICPR2010 (20th International Conference on Pattern Recognition) contest GEPR (Graph Embedding for Pattern Recognition).

1 Introduction

The website [2] for the 20th International Conference on Pattern Recognition (ICPR2010) contest Graph Embedding for Pattern Recognition (GEPR) provides a very good description of the emerging research domain of graph embedding. It states, and we quote: "In Pattern Recognition, statistical and structural methods have been traditionally considered as two rather separate worlds. However, many attempts at reducing the gap between these two approaches have been done. The idea inspiring these attempts is that of preserving the advantages of an expressive structural representation (such as a graph), while using most of the powerful, vector-based algorithms from Statistical Pattern Recognition. A possible approach to this issue has been recently suggested by Graph Embedding. The latter is a methodology aimed at representing a whole graph (possibly with attributes attached to its nodes and edges) as a point in a suitable vectorial space. Of course the relevant property is that of preserving the similarity of the graphs: the more two graphs are considered similar, the closer the corresponding points in the space should be. Graph embedding, in this sense, is a real bridge joining the two worlds: once the object at hand has been described in terms of graphs, and the latter represented in the vectorial space, all the problems of matching, learning and clustering can be performed using classical Statistical Pattern Recognition algorithms."

2 Our Method

Our proposed method for graph embedding encodes the details of an undirected attributed graph into a feature vector. The feature vector is presented in the next section, where we discuss it in detail.

2.1 Proposed Vector

The proposed feature vector not only utilizes information about the structure of the graph but also incorporates the attributes of its nodes and edges, in order to extract discriminatory information about the graph. This yields a feature vector that corresponds to the structure, topology and geometry of the underlying content. Fig. 1 presents the feature vector of our method for graph embedding. In the remainder of this section, we provide a detailed explanation of the extraction of each feature of our feature vector.

Fig. 1. The vector for graph embedding

Number of Nodes and Number of Edges. The number of nodes and the number of edges in a graph constitute the most basic discriminatory information that can be extracted from a graph. This information helps to discriminate among graphs of different sizes. The first two features of our feature vector are therefore the number of nodes and the number of edges in the graph.

Features for Node Degrees. The degree of a node is the total number of edges incident on the node. The distribution of the node degrees of a graph is a very important piece of information that can be used for extracting discriminatory features from the graph. It helps to discriminate among graphs which represent different structures, topologies and geometries. Our signature contains w features for node degrees. The number of features for node degrees, i.e. w, is computed from the learning dataset and may differ from one dataset to another, thus yielding feature vectors of different lengths for different datasets. If the learning dataset is not available, our method is capable of learning its parameters directly from the graphs in the test dataset. The w features for node degrees are obtained by defining fuzzy-intervals over the list of node degrees of all graphs in the (learning) dataset and then computing the cardinality of each of the w features. In order to avoid repetition, we detail the method for defining fuzzy-intervals and computing the corresponding feature values in section 2.2, as the same method is used for obtaining the sets of features for node and edge attributes.

Features for Node Attributes. The attributes of the nodes in the graph contain very important complementary (and in some cases supplementary) information about the underlying content. This information, if used intelligently, can provide discriminating features for the graphs. Our signature contains m features for the k node attributes. The number k refers to the number of node attributes in the graph, whereas the number m of features for node attributes is computed from the learning dataset. Both m and k may differ from one dataset to another, again yielding feature vectors of different lengths for different datasets. If the learning dataset is not available, our method learns its parameters directly from the graphs in the test dataset. The m features for the k node attributes are obtained by defining fuzzy-intervals over the lists of node attribute values of all graphs in the (learning) dataset and then computing the cardinality of each of the m features. Each of the k node attributes is processed one by one and can contribute a different number of features to this set. The method for obtaining fuzzy-intervals and computing the corresponding feature values is detailed in section 2.2.

Features for Edge Attributes. The attributes of the edges in the graph also contain important complementary (and supplementary) information about the underlying content, and may likewise provide discriminating features for the graphs. Our signature contains n features for the l edge attributes. The number l refers to the number of edge attributes in the graph, whereas the number n of features for edge attributes is computed from the learning dataset. Both n and l may differ from one dataset to another. If the learning dataset is not available, our method learns its parameters directly from the graphs in the test dataset. The n features for the l edge attributes are obtained by defining fuzzy-intervals over the lists of edge attribute values of all graphs in the (learning) dataset and then computing the cardinality of each of the n features. Each of the l edge attributes is processed one by one and can contribute a different number of features to this set. The method for obtaining fuzzy-intervals and computing the corresponding feature values is detailed in section 2.2.
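To illustrate how these pieces fit together, the following sketch (hypothetical Python, not the authors' implementation) assembles the feature vector for a graph given as plain lists and dictionaries. The fuzzy_histogram function, which computes the cardinality of each fuzzy-interval as described in section 2.2, is passed in as a parameter; all names are illustrative.

```python
# Illustrative sketch only: assembling the embedding of an undirected
# attributed graph, given fuzzy intervals learnt beforehand from a dataset.
def embed_graph(nodes, edges, node_attrs, edge_attrs,
                degree_intervals, node_attr_intervals, edge_attr_intervals,
                fuzzy_histogram):
    """nodes: list of node ids; edges: list of (u, v) pairs;
    node_attrs / edge_attrs: dict mapping attribute name -> {node or edge: value};
    *_intervals: fuzzy intervals learnt from the (learning) dataset;
    fuzzy_histogram: returns one cardinality per fuzzy-interval (section 2.2)."""
    vector = [len(nodes), len(edges)]          # the 2 structural features

    # w features for node degrees
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    vector += fuzzy_histogram(list(degree.values()), degree_intervals)

    # m features for the k node attributes (each attribute processed separately)
    for attr, intervals in node_attr_intervals.items():
        values = [node_attrs[attr][n] for n in nodes]
        vector += fuzzy_histogram(values, intervals)

    # n features for the l edge attributes
    for attr, intervals in edge_attr_intervals.items():
        values = [edge_attrs[attr][e] for e in edges]
        vector += fuzzy_histogram(values, intervals)

    return vector
```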


2.2 Defining Fuzzy-Intervals and Computing Feature Values

This section applies to the computation of the w features for node degrees, the m features for k node attributes and the n features for l edge attributes. It details our approach for defining a set of fuzzy-intervals for given data (data here refers to the list of node degrees, the list of values of each of the k node attributes, or the list of values of each of the l edge attributes, as applicable).

We use a histogram-based binning technique for obtaining an initial set of intervals for the data. The technique was originally proposed in [1] for the discretization of continuous data and is based on the Akaike Information Criterion (AIC). It starts with an initial m-bin histogram of the data and finds the optimal number of bins for the underlying data: adjacent bins are iteratively merged using an AIC-based cost function until the difference between the AIC before the merge and the AIC after the merge becomes negative.

This set of bins is then used for defining the fuzzy-intervals, i.e. the features and their fuzzy zones (see Fig. 2 for an example). A fuzzy-interval is defined by a set of 3 bins for the feature itself, a set of 3 bins for its left fuzzy zone and a set of 3 bins for its right fuzzy zone. It is important to note, as seen in Fig. 2, that the fuzzy zones on the left and right of a feature overlap the fuzzy zones of its neighbors. We use sets of 3 bins for defining the fuzzy zones in order to be able to assign full, medium and low membership to the fuzzy zones, with membership weights of 1.00, 0.66 and 0.33 respectively. In general, any number x of bins can be used for defining the fuzzy zones of the features. For any given value in the data it is ensured that the sum of the membership weights assigned to it always equals 1. The first and the last features in the list have only one fuzzy zone each (as seen in Fig. 2). Our method is capable of learning these fuzzy-intervals for given data, either from a learning dataset (if available) or directly from the graphs in the test dataset.

Once the fuzzy-intervals are obtained for a given dataset, we get the structure of the feature vector for that dataset, i.e. 1 feature for the number of nodes, 1 feature for the number of edges, w features for node degrees, m features for k node attributes and n features for l edge attributes. We then perform a pass over the set of graphs in the dataset and, while embedding each graph into a feature vector, use these fuzzy-intervals for computing the cardinality of each feature in the feature vector. This results in a numeric feature vector for a given undirected attributed graph.
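As an illustration of this binning step, here is a hedged sketch in Python. It starts from an equal-width histogram and greedily merges adjacent bins as long as a simplified AIC, taken here as minus twice the multinomial log-likelihood of the histogram density plus twice the number of bins, does not increase. The initial number of bins and the exact cost function are assumptions; the criterion of [1] may differ in detail.

```python
import math

def aic(counts, edges, total):
    """Simplified AIC of a histogram density estimate (illustrative only):
    -2 * multinomial log-likelihood + 2 * (number of bins)."""
    ll = 0.0
    for i, c in enumerate(counts):
        width = edges[i + 1] - edges[i]
        if c > 0 and width > 0:
            ll += c * math.log(c / (total * width))
    return -2.0 * ll + 2.0 * len(counts)

def bin_counts(values, edges):
    """Count the values falling in each [edges[i], edges[i+1]) interval;
    the last bin also receives the maximum value."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = len(edges) - 2
        for j in range(len(edges) - 1):
            if v < edges[j + 1]:
                i = j
                break
        counts[i] += 1
    return counts

def merge_bins(values, n_initial_bins=50):
    """Greedily merge adjacent bins of an equal-width histogram while the
    merge does not increase the (simplified) AIC; return the bin edges."""
    lo, hi = min(values), max(values)
    if hi == lo:                                    # degenerate data: one bin
        return [lo, lo + 1.0]
    step = (hi - lo) / n_initial_bins
    edges = [lo + i * step for i in range(n_initial_bins)] + [hi]
    total = len(values)
    current = aic(bin_counts(values, edges), edges, total)
    while len(edges) > 2:
        # try every adjacent merge (dropping one interior edge), keep the best
        candidates = []
        for j in range(1, len(edges) - 1):
            trial = edges[:j] + edges[j + 1:]
            candidates.append((aic(bin_counts(values, trial), trial, total), trial))
        best_aic, best_edges = min(candidates, key=lambda c: c[0])
        if current - best_aic < 0:                  # AIC would increase: stop
            break
        edges, current = best_edges, best_aic
    return edges
```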

Fig. 2. The fuzzy intervals of features for node degrees
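To make the feature-value computation concrete, the sketch below gives one plausible reading (an assumption, not the authors' exact construction): each feature is represented as a trapezoidal fuzzy-interval built from the merged bins, with full membership over its core bins and decreasing membership over the overlapping fuzzy zones; the memberships of a value are normalised to sum to 1, and the cardinality of a feature is the total membership it receives over the data.

```python
def trapezoid(v, a, b, c, d):
    """Trapezoidal membership: rises over [a, b], equals 1 on [b, c],
    falls over [c, d].  Setting a == b (or c == d) removes that fuzzy zone,
    as for the first and last features."""
    if b <= v <= c:
        return 1.0
    if a < v < b:
        return (v - a) / (b - a)
    if c < v < d:
        return (d - v) / (d - c)
    return 0.0

def fuzzy_histogram(values, features):
    """features: list of (a, b, c, d) trapezoids built from the merged bins
    (core over the central bins, fuzzy zones overlapping the neighbours).
    Returns one cardinality per feature; the memberships of each value are
    normalised so that they sum to 1, as required in section 2.2."""
    cardinalities = [0.0] * len(features)
    for v in values:
        memberships = [trapezoid(v, *f) for f in features]
        s = sum(memberships)
        if s == 0:   # value outside every interval: assign it to the nearest feature
            nearest = min(range(len(features)),
                          key=lambda i: min(abs(v - features[i][0]),
                                            abs(v - features[i][3])))
            memberships[nearest] = s = 1.0
        cardinalities = [c + m / s for c, m in zip(cardinalities, memberships)]
    return cardinalities
```

Under this reading, the 1.00, 0.66 and 0.33 weights used in the paper can be seen as a three-level discretisation of such a membership function over the three bins of a fuzzy zone.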

3 Experimentation

Experimental results are presented for the sample datasets of the 20th International Conference on Pattern Recognition (ICPR2010) contest GEPR (Graph Embedding for Pattern Recognition) [2]. These datasets contain graphs extracted from three image databases: the Amsterdam Library of Object Images (ALOI), the Columbia Object Image Library (COIL) and the Object Databank (ODBK). Table 1 provides the details of the graphs in each of these datasets.

Table 1. Dataset details

Dataset   Number of graphs   Node attributes (k)   Edge attributes (l)
ALOI      1800               4                     0
COIL      1800               4                     0
ODBK      1248               4                     0

The 4 node attributes encode the size and the average color of the image area represented by a node. We have used the sample data for learning the parameters of our method (as described in section 2). Table 2 gives the length of the feature vector for each image database.

Table 2. Length of feature vectors

Dataset   Length of feature vector
ALOI      1595
COIL      1469
ODBK      1712

Finally, Table 3 presents the performance index, a clustering validation index that evaluates the separation between the classes when they are represented by the vectors, as calculated by the scripts provided by the contest. Further details on the performance index can be found at [2].

Table 3. Performance indexes

Dataset   Performance index
ALOI      0.379169273726
COIL      0.375779781743
ODBK      0.352542741399

The experiments were performed on an Intel Core 2 Duo (T8300 @ 2.4 GHz) with 2 GB (790 MHz) of RAM.

4 Conclusion

We have presented a method for explicit graph embedding. Our algorithm extracts a feature vector from an undirected attributed graph. The proposed feature vector not only utilizes information about the structure of the graph but also incorporates the attributes of its nodes and edges, in order to extract discriminatory information about the graph. This yields a feature vector that corresponds to the structure, topology and geometry of the underlying content. The use of fuzzy-intervals for noise-sensitive information in graphs makes the proposed feature vector robust against the deformations introduced in graphs by noise. The experimentation on the sample datasets shows that the proposed method can extract a large number of meaningful features from the graphs. An important property of our method is that the number of features can be controlled by choosing an appropriate number of bins for defining a feature and its fuzzy zones. One possible future extension of this work is to handle directed attributed graphs, which could be done by introducing new features for the in-degrees and out-degrees of nodes.

Acknowledgment. This research was supported in part by PhD grant PD-2007-1/Overseas/FR/HEC/222 from the Higher Education Commission of Pakistan.

References

1. Colot, O., Olivier, C., Courtellemont, P., El-Matouat, A.: Information criteria and abrupt changes in probability laws. In: Signal Processing VII: Theories and Applications, pp. 1855–1858 (1994)
2. http://nerone.diiie.unisa.it/contest/index.shtml (as on May 26, 2010)
