Opaque Attribute Alignment

Jennifer Sleeman, Rafael Alonso, Hua Li, Antonio Badia, Art Pope

SAIC.com © SAIC. All rights reserved.

Opaque Attributes

• With respect to aligning attributes
• Addressing problems:
  – Cannot distinguish based on label name comparison
  – Cannot distinguish based on comparing data values

Opaque Attributes and Data

Name    | Name
Clinton | Bank of America
Bush    | JP Morgan
Obama   | Citigroup

Attribute names may be the same but data can be different representations

Location   | State
California | CA
Washington | MD
Virginia   | VT

Attribute names are different but data representation is the same

Name        | FName | LName
Jill Smith  | Jill  | Smith
John Jones  | John  | Jones
Jack Taylor | Jack  | Taylor

Data representation can be spread across attributes

Opaque Attributes and Data

Faculty ID | Student ID | Employee ID | Employee LastName
100        | 100        | 100         | Smith
200        | 200        | 200         | Jones
300        | 300        | 100         | Smith

Data can be the same for different representations

Numeric representations can be distributed similarly to non-numeric representations

Context of Opaque Attribute Alignment

• Ontology mapping
• Linked data
• Beyond label comparisons
• How is the data distributed?

http://richard.cyganiak.de/2007/10/lod/

Approach - Overview

[Diagram labels: Class Alignment, Instance Alignment, Attribute Alignment, Opaque Attribute Alignment, Kernel Density Estimation, Distribution, Distribution Comparator]

Approach – Kernel Density Estimation

• Non-parametric
• Probability distribution
• Estimates density
• Used to perform image analysis
• Not typically used for ontology alignment

http://upload.wikimedia.org/wikipedia/en/thumb/4/41/Comparison_of_1D_histogram_and_KDE.png/800px-Comparison_of_1D_histogram_and_KDE.png
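
As a rough illustration of the idea on this slide (not the authors' implementation), a one-dimensional kernel density estimate can be sketched in plain Python. The Gaussian kernel, the sample values, and the bandwidth `h` are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(samples, h):
    """Return a function that estimates the density of `samples` with bandwidth h."""
    n = len(samples)

    def density(x):
        # Average of kernels centred on each sample, scaled by the bandwidth.
        return sum(gaussian_kernel((x - s) / h) for s in samples) / (n * h)

    return density

# Hypothetical attribute values; no distribution family is assumed (non-parametric).
values = [1.0, 1.2, 1.9, 2.1, 2.2, 5.0]
f = kde(values, h=0.5)

# Numerically integrate the estimate over [-5, 10); the result should be close to 1.
step = 0.01
area = sum(f(-5.0 + i * step) * step for i in range(1500))
```

The estimate is a proper density (it integrates to roughly 1) and is higher near the cluster of values around 2 than in the gap around 4.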


Approach – Kernel Density Estimation

ID  | Last Name | Gender | Rating
100 | Smith     | F      | 1
101 | Jones     | M      | 1
102 | Alexander | M      | 3
103 | Johnson   | F      | 2
104 | Bradley   | M      | 2

• Different types of data
• Two types of kernels
  – Epanechnikov
  – Aitchison & Aitken
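
The two kernels named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the product-kernel combination, the function name `mixed_density`, the bandwidth `h`, and the smoothing parameter `lam` are not from the slides, and treating Rating as continuous is a choice made only for the example.

```python
def epanechnikov(u):
    """Epanechnikov kernel for continuous data: 0.75 * (1 - u^2) on [-1, 1], else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def aitchison_aitken(x, xi, num_categories, lam):
    """Aitchison & Aitken kernel for unordered categorical data.

    Weight is 1 - lam when the categories match and lam / (c - 1) otherwise,
    with 0 <= lam <= (c - 1) / c.
    """
    return 1.0 - lam if x == xi else lam / (num_categories - 1)

def mixed_density(point, rows, h, lam, num_categories):
    """Product-kernel density estimate at `point` = (category, number)."""
    cat, num = point
    total = 0.0
    for row_cat, row_num in rows:
        total += (aitchison_aitken(cat, row_cat, num_categories, lam)
                  * epanechnikov((num - row_num) / h) / h)
    return total / len(rows)

# (Gender, Rating) pairs taken from the table on this slide.
rows = [("F", 1), ("M", 1), ("M", 3), ("F", 2), ("M", 2)]
d_f1 = mixed_density(("F", 1), rows, h=1.5, lam=0.1, num_categories=2)
d_f3 = mixed_density(("F", 3), rows, h=1.5, lam=0.1, num_categories=2)
```

With these settings the estimate at (F, 1) is larger than at (F, 3), since the table contains an F with rating 1 but none with rating 3.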


Approach – Similarity Hash

• KDE assumes numeric data
• Non-numeric to numeric conversion
• Cannot affect distribution
• Basic steps:
  – Tokenize text
  – Hash into 128 bit hash value
  – Create a 128 bit vector
  – Increment/decrement array according to i-th bit of hash existence/non-existence


String                 | Hash
French police officers | 3.212021112201211E31
French police station  | 3.111131122202301E31
French Cuisine         | 1.111011020101201E31
Miami police officer   | 2.1100222113211024E31
Miami beaches          | 1.111012000110002E31
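
The basic steps listed on this slide resemble the standard SimHash procedure, which can be sketched as below. This is an assumption-laden illustration, not the authors' code: MD5 as the 128-bit token hash, whitespace tokenization, and the final sign-to-bit step are choices made here, and the sketch does not attempt to reproduce the hash values in the table above.

```python
import hashlib

BITS = 128

def token_hash(token):
    """128-bit integer hash of a token (MD5 is an assumed choice)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")

def simhash(text):
    """SimHash fingerprint of `text` as a 128-bit integer."""
    counts = [0] * BITS
    for token in text.lower().split():  # tokenize on whitespace
        h = token_hash(token)
        for i in range(BITS):
            # Increment if the i-th bit of the hash is set, decrement otherwise.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # positive count becomes a 1 bit in the fingerprint
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

a = simhash("French police officers")
b = simhash("French police station")
c = simhash("Miami beaches")
```

Strings that share tokens tend to produce fingerprints separated by a smaller Hamming distance, which is what lets a numeric stand-in for text preserve similarity.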

Approach – Cross Entropy

• Compare distributions
• Given a set of attributes
  – For each attribute in the primary set
    • Compare each attribute in the secondary set
    • Build a universal set
    • Perform cross entropy
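
The comparison loop on this slide might look roughly like the sketch below. It substitutes smoothed relative frequencies for the kernel density estimates to keep the example short, and the attribute names, the smoothing constant `alpha`, and the rule of picking the lowest cross entropy are all assumptions.

```python
import math
from collections import Counter

def distribution(values, universe, alpha=1.0):
    """Smoothed relative frequencies of `values` over a shared `universe`."""
    counts = Counter(values)
    total = len(values) + alpha * len(universe)
    return [(counts[v] + alpha) / total for v in universe]

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x); lower means q fits p better."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def align(primary, secondary):
    """For each primary attribute, pick the secondary with the lowest cross entropy."""
    result = {}
    for p_name, p_vals in primary.items():
        scores = {}
        for s_name, s_vals in secondary.items():
            # Universal set: every value seen in either attribute.
            universe = list(set(p_vals) | set(s_vals))
            p = distribution(p_vals, universe)
            q = distribution(s_vals, universe)
            scores[s_name] = cross_entropy(p, q)
        result[p_name] = min(scores, key=scores.get)
    return result

# Hypothetical attribute sets with unrelated label names.
primary = {"rating": [1, 1, 3, 2, 2], "gender": ["F", "M", "M", "F", "M"]}
secondary = {"sex": ["M", "F", "M", "M", "F"], "score": [2, 1, 2, 3, 1]}
matches = align(primary, secondary)
```

Here the labels give no hint of the pairing, yet each primary attribute is matched to the secondary attribute whose value distribution it most resembles.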


Evaluations and Results

• Adapted the Ontology Alignment Evaluation Initiative (OAEI) 2011
• Compare reference ontology with variations
• Three groups of tests (we use groups 1 and 2)
• OAEI includes attribute and instance alignment
• Our tests cover only attribute alignment

Evaluations and Results

• Uses random sampling without replacement
• Similarity threshold range, best score winner
• Runs each evaluation
• Calculates an overall F-Measure, Precision, Recall measure for all
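
One plausible reading of the scoring procedure on this slide is sketched below: sweep a range of similarity thresholds and keep the one with the best F-measure. The attribute pairs, scores, and threshold values are hypothetical, and the function names are introduced only for this example.

```python
def precision_recall_f(predicted, reference):
    """Precision, recall and F-measure of predicted vs. reference alignments."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def best_threshold(scored_pairs, reference, thresholds):
    """Try each similarity threshold and keep the one with the best F-measure."""
    best = None
    for t in thresholds:
        predicted = [pair for pair, score in scored_pairs.items() if score >= t]
        p, r, f = precision_recall_f(predicted, reference)
        if best is None or f > best[3]:
            best = (t, p, r, f)
    return best

# Hypothetical similarity scores and reference alignment.
scored = {("name", "lname"): 0.9, ("id", "emp_id"): 0.7,
          ("name", "state"): 0.6, ("rating", "score"): 0.4}
reference = [("name", "lname"), ("id", "emp_id"), ("rating", "score")]
threshold, precision, recall, f_measure = best_threshold(
    scored, reference, [0.3, 0.5, 0.8])
```

A low threshold admits every correct pair here at the cost of one false match, which still gives the best balance of precision and recall among the three thresholds tried.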


Evaluations and Results

We used the OAEI data set to test our approach (parts I and II). We did not use the standard approach that participants used in the competition since we are currently only performing attribute alignment and not instance alignment.

F-Measures were approximately 55 percent.


Evaluations and Results

We tested using various methods to convert non-numeric to numeric data. SimHash was comparable to other methods such as Soundex.

Evaluations and Results

Overall we saw better results when using mixed kernels as opposed to a single continuous kernel. Results shown are based on the OAEI data set. We also tested with additional data sets.

Future Work

• New clustering approach – Compare densities based on centroids

• Support for class alignment


Conclusions

• Defined opaque attributes
• Kernel Density Estimation
• Not typically used for ontology alignment
• Promising approach

Questions?


Opaque Attribute Alignment presentation 3.29.12
