Uwe Mönnich, Frank Morawietz, and Stephan Kepser

Eberhard Karls Universität Tübingen

A Regular Query for Context-Sensitive Relations

{um,frank,kepser}@sfs.uni-tuebingen.de
http://tcl.sfs.uni-tuebingen.de

Seminar für Sprachwissenschaft
Theoretical Computational Linguistics Group
SFB 441: Linguistic Data Structures
University of Tübingen

Philadelphia, 12. December 2001 – p.1

Fundamental Problem

Scylla (Lack of Expressive Power) and Charybdis (Undecidability)

Core XML

Dan Suciu: “The longest definition (500 pp) of regular tree grammars”.

Example sentence: Fritz löst das Problem ("Fritz solves the problem")

[S [NP [PN Fritz]] [VP [V löst] [NP [D das] [N Problem]]]]
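The example tree for "Fritz löst das Problem" can be written as an XML document, which is one way to see why XML documents are treated as trees. The following is a minimal sketch using Python's standard library. The element names mirror the node labels of the example, but the encoding itself is an assumption made for illustration and not the treebank's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the example tree; the treebank's real
# annotation format is not given in the source.
XML = """
<S>
  <NP><PN>Fritz</PN></NP>
  <VP>
    <V>löst</V>
    <NP><D>das</D><N>Problem</N></NP>
  </VP>
</S>
""".strip()

def to_brackets(node):
    """Render an element as a labelled bracketing."""
    children = list(node)
    if not children:
        return f"[{node.tag} {node.text.strip()}]"
    return f"[{node.tag} " + " ".join(to_brackets(c) for c in children) + "]"

def leaves(node):
    """Return the words at the leaves, left to right."""
    children = list(node)
    if not children:
        return [node.text.strip()]
    return [word for c in children for word in leaves(c)]

root = ET.fromstring(XML)
bracketing = to_brackets(root)
sentence = " ".join(leaves(root))
```

Here `bracketing` holds the labelled bracketing of the tree and `sentence` its yield.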

Context-Sensitive Relations

Cross-serial dependencies in Swiss German: (... wil) mer de maa em chind lönd hälffe schwüme ("(... because) we let the man help the child swim")
Abstract pattern: a^n b^m c^n d^m, which is not context-free.
Such relations cannot be queried by regular means.
Monadic second-order logic (MSO) as a query language is not powerful enough.
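To make the pattern concrete, here is a small sketch of a membership test for strings of the form a^n b^m c^n d^m. The function name `is_cross_serial` is hypothetical. The regular expression checks only the order of the four blocks; the two length comparisons are the crossing dependencies that a purely regular device cannot enforce.

```python
import re

# Block order only: some a's, then b's, then c's, then d's.
_BLOCKS = re.compile(r"(a*)(b*)(c*)(d*)")

def is_cross_serial(word: str) -> bool:
    """Return True iff word has the form a^n b^m c^n d^m with n, m >= 0."""
    match = _BLOCKS.fullmatch(word)
    if match is None:
        return False
    a, b, c, d = match.groups()
    # The a-block must match the c-block and the b-block the d-block:
    # these two dependencies cross each other.
    return len(a) == len(c) and len(b) == len(d)
```

For instance, `is_cross_serial("aabccd")` holds (n = 2, m = 1), while `is_cross_serial("aabcd")` does not.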

Context-free Tree Grammars

Generalizations of context-free (string) grammars.
Needed to describe context-sensitive relations.
Rules have the form F(x1, ..., xn) ➝ t, where t is a tree with variables x1, ..., xn.
Rewrite a non-terminal F with the complete tree t.
Examples: TAG, Minimalist Grammars (E. Stabler).
Not necessary to write a new grammar for every query.
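The rewriting step of a context-free tree grammar can be sketched as follows. Trees are nested tuples `(label, child, ...)` and variables are plain strings such as `"x1"`. The three rules `START`, `GROW`, and `STOP` and the label `conc` are toy assumptions chosen to show the mechanism; they are not a grammar from the talk.

```python
def substitute(tree, binding):
    """Replace every variable in tree by the subtree bound to it."""
    if isinstance(tree, str):              # a variable such as "x1"
        return binding[tree]
    label, *children = tree
    return (label, *(substitute(c, binding) for c in children))

def rewrite_once(tree, rule):
    """Apply rule F(x1..xn) -> t to the leftmost-outermost F; None if absent."""
    head, variables, rhs = rule
    label, *children = tree
    if label == head and len(children) == len(variables):
        # The arguments of F are plugged into the variables of t.
        return substitute(rhs, dict(zip(variables, children)))
    for i, child in enumerate(children):
        new_child = rewrite_once(child, rule)
        if new_child is not None:
            return (label, *children[:i], new_child, *children[i + 1:])
    return None

def leaves(tree):
    """Return the leaf labels of a variable-free tree, left to right."""
    label, *children = tree
    if not children:
        return [label]
    return [w for c in children for w in leaves(c)]

# Toy rules (hypothetical): S -> F(a, c); F(x1, x2) -> F(conc(a, x1), conc(c, x2));
# F(x1, x2) -> conc(x1, x2).
START = ("S", (), ("F", ("a",), ("c",)))
GROW = ("F", ("x1", "x2"), ("F", ("conc", ("a",), "x1"), ("conc", ("c",), "x2")))
STOP = ("F", ("x1", "x2"), ("conc", "x1", "x2"))

t1 = rewrite_once(("S",), START)
t2 = rewrite_once(t1, GROW)
t3 = rewrite_once(t2, STOP)
```

The point of the sketch is that the non-terminal `F` carries arguments, so a single rule application can grow the tree in two places at once.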

Query

Use MSO as the query language.
A query is an MSO sentence.
Example: [MSO formula over the labels NPakk, NPdat, Vakk, Vdat; not recoverable from the source]
In practice, use a tree automaton representing the MSO sentence.
Result: a set of candidate trees. All the trees we actually search for are in it, but also a lot of garbage.
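Running a tree automaton over a treebank can be sketched as below. Since the example formula is not recoverable, the sketch uses a stand-in query, "some node is labelled NPdat" (the MSO sentence ∃x NPdat(x)), which is an assumption for illustration. The automaton is a deterministic bottom-up one with two states, `"yes"` and `"no"`; the names `run`, `delta_contains_npdat`, and `select` are hypothetical.

```python
def run(tree, delta):
    """Compute the state a bottom-up tree automaton assigns to the root."""
    label, *children = tree
    child_states = tuple(run(child, delta) for child in children)
    return delta(label, child_states)

def delta_contains_npdat(label, child_states):
    """Transition function: "yes" iff the subtree contains an NPdat node."""
    if label == "NPdat" or "yes" in child_states:
        return "yes"
    return "no"

def select(treebank, delta, accepting):
    """Return the candidate trees: those whose root state is accepting."""
    return [tree for tree in treebank if run(tree, delta) in accepting]

# Two toy trees (hypothetical treebank content).
tree_acc = ("S", ("NPakk", ("de maa",)), ("Vakk", ("lönd",)))
tree_dat = ("S", ("NPdat", ("em chind",)), ("Vdat", ("hälffe",)))
candidates = select([tree_acc, tree_dat], delta_contains_npdat, {"yes"})
```

A regular query of this kind can only filter the treebank down to candidates; it cannot check the crossing dependencies themselves.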

Overview

[Diagram: LIFTing maps Mildly Context-Sensitive Structures to Derivation Trees (MSO, RTG, FSTA); MSO-Transduction maps the Derivation Trees back to Mildly Context-Sensitive Structures.]

Lifting the Grammar

Is a simple primitive recursive function.
Makes control structure visible.
All function symbols become constants.
Resulting grammar is regular.
Therefore expressible by an MSO-formula, and
there is a tree automaton for the lifted grammar.
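The slides give no formula for lifting, so the following sketch assumes the definition commonly used in the literature on derived alphabets: a variable xi becomes a projection constant, every original symbol f becomes a constant f', and the application of f to its arguments becomes an explicit composition node c. Sort indices on c and on the projections are omitted here. The names `lift` and `lift_rule` and the tuple encoding are hypothetical.

```python
def lift(term):
    """Lift a term with variables "x1", "x2", ... into the derived alphabet."""
    if isinstance(term, str):              # variable "xi" -> projection "pii"
        return ("pi" + term[1:],)
    label, *children = term
    # The symbol itself becomes the leaf label' and the composition is made
    # visible as an ordinary node "c" above it and its lifted arguments.
    return ("c", (label + "'",), *(lift(child) for child in children))

def lift_rule(head, variables, rhs):
    """Lift a rule F(x1..xn) -> t to F' -> lift(t); F' has no arguments."""
    # `variables` is kept only to mirror the rule format; after lifting the
    # left-hand side is a plain constant, which is why the grammar is regular.
    return (head + "'", lift(rhs))

lifted_term = lift(("f", "x1", ("a",)))
lifted_rule = lift_rule("F", ("x1", "x2"), ("g", "x2", ("F", "x1", "x2")))
```

In the lifted rule every function symbol, including the non-terminal F, occurs only as a leaf, which matches the claim that all function symbols become constants.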

Lifting the Candidate Trees

Similar to lifting a grammar.
All internal nodes become leaves.
For each candidate tree, there is a small finite set of lifted trees.

Example of a Lifted Tree

[Figure: a lifted tree with node labels c, π1, π2, π3, π4, ε, a, b, and d; the layout is not recoverable.]

Lifted Query

We have
a tree automaton representing the lifted grammar, and
a set of lifted candidate trees.
Run the tree automaton on the set of lifted candidate trees.
Result: solution set of lifted trees.
But: lifted trees are unreadable; they are not in the format of the trees in the treebank.

Reconstruction of the Intended Structures

The intended tree is present in the lifted tree, but hidden.
We have to read the intended tree off the lifted tree.
Technically: define the dominance and precedence relations of the intended tree on the basis of the dominance, precedence, and control structure (composition and projection) of the lifted tree.
Possible with MSO-definable transductions.
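The following sketch shows how the intended tree is hidden in a lifted tree. It does not implement an MSO-definable transduction; it simply evaluates the control structure directly, reading c as composition (substitution of arguments) and the π symbols as projections, which under this assumed semantics should recover the tree the transduction is meant to define. The tuple encoding, the primed constants, and the name `evaluate` are hypothetical.

```python
def substitute(tree, binding):
    """Replace every variable in tree by the subtree bound to it."""
    if isinstance(tree, str):              # a variable such as "x1"
        return binding[tree]
    label, *children = tree
    return (label, *(substitute(c, binding) for c in children))

def evaluate(lifted, arity=0):
    """Read the intended tree (with variables x1..x_arity) off a lifted tree."""
    label, *children = lifted
    if label.endswith("'"):
        # A primed constant f' stands for the tree f(x1, ..., x_arity).
        return (label[:-1], *(f"x{i}" for i in range(1, arity + 1)))
    if label == "c":
        # Composition: evaluate the head as a function of its arguments,
        # then plug the evaluated arguments into its variables.
        head, *args = children
        head_tree = evaluate(head, len(args))
        arg_trees = [evaluate(arg, arity) for arg in args]
        binding = {f"x{i}": t for i, t in enumerate(arg_trees, start=1)}
        return substitute(head_tree, binding)
    # Projection "pii" selects the i-th argument, i.e. the variable xi.
    return "x" + label[2:]

# A lifted tree in which the projections swap the two arguments of f.
lifted = ("c",
          ("c", ("f'",), ("pi2",), ("pi1",)),
          ("c", ("a'",)),
          ("c", ("b'",)))
intended = evaluate(lifted)
```

In `lifted`, the leaf `a'` precedes `b'`, yet the intended tree is f(b, a): intended dominance and precedence have to be computed from the composition and projection nodes and cannot be read off the lifted tree's own edges.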

[Figure: a lifted tree with node labels c, π1, π2, π3, π4, ε, a, b, and d, drawn with two kinds of edges: immediate dominance in the lifted tree and intended dominance of the reconstructed tree.]

Intended Query Result

Context-free tree grammar for describing context-sensitive relations
MSO query on the original treebank ➪ set of candidate trees
Lift grammar and candidate trees ➪ automaton and set of lifted candidate trees
Run automaton representing lifted grammar on lifted candidate trees ➪ set of lifted solution trees
Reconstruct intended trees via MSO-transduction ➪ set of intended solution trees

Effectiveness

MSO is decidable on trees.
MSO is undecidable on graphs.
MSO plus one binary relation is undecidable on trees.
MSO ➝ tree automata: hyperexponential.
“Normalized” MSO ➝ tree automata: exponential.
Lifting and transduction require linear time.
