Semantic Property Grammars for Knowledge Extraction from Biomedical Text

Veronica Dahl and Baohua Gu
Simon Fraser University, Burnaby, BC, Canada, V5A 1S6
{veronica, bgu}@cs.sfu.ca

Abstract. We present Semantic Property Grammars, designed to extract concepts and relations from biomedical texts. The implementation adapts a CHRG parser we designed for Property Grammars [1]. That parser views linguistic constraints as properties between sets of categories and solves them by constraint satisfaction; it can handle incomplete or erroneous text and can extract phrases of interest selectively. We endow it with concept and relation extraction abilities as well.

1 Semantic Property Grammars (SPGs): an introduction

Property Grammars (PGs) [2] linguistically characterize sentences not in terms of an explicit, complete parse tree but in terms of seven simple properties between pairs of constituents, for instance, linearity (e.g., a determiner must precede a noun) or unicity (e.g., a noun can only have one determiner). A directly executable specification of PGs was developed by Dahl and Blache [3]; it uses CHRG [4] to combine pairs of constituents according to whether the properties between them are satisfied. SPGs are based on an adaptation of this parser that enhances it with concepts and relations gleaned from the substring being parsed.

In the example below, the output gathers in a cat/6 symbol: the phrase's category (noun phrase), its syntactic features (singular, masculine), the parse tree (not needed by the theory, but built for convenience), the lists of satisfied and unsatisfied properties within this phrase, and the semantic concept list. For example, prec(prep,sn) appears in the list of satisfied properties, indicating that the preposition does precede its noun phrase (written sn in the output) in the embedded prepositional phrase (pp). The unsatisfied property list is empty, since there are no property violations in this np. Finally, the semantic concept list contains the relationship induced by this noun phrase, as well as the concepts in which its parts participate, obtained in consultation with a biomedical ontology.

Input: The activation of NF-kappa-B via CD-28

Output: cat(np,[sing,masc],np(det(the),n(activation),pp(prep(of),np(n('NF-kappa-B'),pp(prep(via))),np(n('CD-28')))),[prec(prep,sn),unicity(prep),prec(n,pp),unicity(n),exclude(name,n),prec(det,n),unicity(det)],[],[protein('NF-kappa-B'),gene('CD-28'),activation('NF-kappa-B','CD-28')])
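To make the satisfied/unsatisfied bookkeeping concrete, here is a minimal sketch in Python rather than in the CHRG used by the actual parser. It covers only two of the seven property types (linearity, written prec, and unicity), and every name in it (Cat, check_properties, NP_PROPERTIES) is a hypothetical stand-in chosen for illustration, not part of the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Cat:
    """Loose Python analogue of the cat/6 term: one field per argument."""
    category: str
    features: list
    tree: tuple
    satisfied: list = field(default_factory=list)
    unsatisfied: list = field(default_factory=list)
    concepts: list = field(default_factory=list)


def check_properties(children, properties):
    """Sort each property into a satisfied or an unsatisfied list.

    children   -- category labels of the constituents, in surface order
    properties -- tuples such as ("prec", "det", "n") or ("unicity", "det")
    """
    satisfied, unsatisfied = [], []
    for prop in properties:
        kind, args = prop[0], prop[1:]
        if kind == "prec":
            first, second = args
            pos_first = [i for i, c in enumerate(children) if c == first]
            pos_second = [i for i, c in enumerate(children) if c == second]
            # Holds when every `first` precedes every `second`
            # (vacuously true if either category is absent).
            ok = all(i < j for i in pos_first for j in pos_second)
        elif kind == "unicity":
            ok = children.count(args[0]) <= 1
        else:
            raise ValueError(f"unsupported property type: {kind}")
        (satisfied if ok else unsatisfied).append(prop)
    return satisfied, unsatisfied


# Assumed property set for a noun phrase, for illustration only.
NP_PROPERTIES = [
    ("prec", "det", "n"),
    ("unicity", "det"),
    ("prec", "n", "pp"),
    ("unicity", "n"),
]

# A well-formed sequence: no violations are recorded.
sat, unsat = check_properties(["det", "n", "pp"], NP_PROPERTIES)
well_formed = Cat("np", ["sing"], ("np", "det", "n", "pp"), sat, unsat)

# An erroneous sequence is still analysed; the violation is recorded
# in the unsatisfied list instead of making the analysis fail.
sat_bad, unsat_bad = check_properties(["det", "det", "n"], NP_PROPERTIES)
ill_formed = Cat("np", ["sing"], ("np", "det", "det", "n"), sat_bad, unsat_bad)
print(ill_formed.unsatisfied)
```

Recording violations rather than rejecting the input is what lets this style of analysis cope with incomplete or erroneous text: a phrase with a doubled determiner still yields a result, with unicity(det) listed as unsatisfied.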

2 Extracting Semantic Information

Extracting Concepts and Relations. The above example shows the simplest and most needed application of our parser to biomedical information extraction, namely, to glean concepts and relations from noun phrases. As shown in the output, we obtain more information than other noun phrase chunkers/parsers available for this task, which only output a parse tree. In addition, we provide semantic output which can further be combined with that of other parts of the sentence, as we shall see below.

Although noun phrases are the most common source of the concepts to be extracted, we can directly apply the same methodology to extract verb-induced relations, as in the sentence “The retinoblastoma protein negatively regulates transcriptional activation”, where the verb regulate marks a relation between two concepts, retinoblastoma protein and transcriptional activation. To deal with this type of relation, we extract both concepts from their noun phrases and link them together into the relationship induced by the verb.

Relating Indirectly Connected Concepts. Once concepts and relations have been extracted from the input, we can infer further concepts by consulting a biomedical ontology (there are several available). This is useful to:

– disambiguate according to context. For instance, binding site usually refers to a DNA domain or region, but sometimes refers to a protein domain or region. Catching the latter meaning is not trivial, since both c-Myc and G28-5 are protein molecules. However, our parser looks for semantic clues from surrounding words in order to disambiguate: in sentence 1) below, promoters points to the DNA region meaning of binding site, whereas in sentence 2), ligands points to the protein meaning of binding site.
– assess a general theme in the (sub)text: since the parser retrieves the semantic classes of objects and relations as it goes along (as shown in section 1), it is a simple matter to assume them as themes. Consuming them when in doubt as to the main theme can assist in further disambiguation as well as in other semantic interpretation tasks.

1) Transcription factors USF1 and USF2 up-regulate gene expression via interaction with an E box on their target promoters, which is also a binding site for c-Myc.

2) The functional activity of ligands built from the binding site of G28-5 is dependent on the size and physical properties of the molecule both in solution and on the cell surfaces.
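The clue-based disambiguation of binding site in sentences 1) and 2) can be approximated by a simple lookup, sketched below in Python. This is only an illustration under strong assumptions: the clue lexicon CLUES, the sense labels, and the function disambiguate are hypothetical, and the actual parser draws its clues from a biomedical ontology during constraint-based parsing rather than from a hand-written word list.

```python
import re

# Hypothetical clue lexicon (an assumption for illustration): for each
# ambiguous term, the surrounding words that point to each sense.
CLUES = {
    "binding site": {
        "dna_region": {"promoter", "promoters"},
        "protein_region": {"ligand", "ligands"},
    }
}


def disambiguate(term, sentence, clues=CLUES, default=None):
    """Pick the sense of `term` whose clue words occur most often in `sentence`.

    Returns `default` when no clue word is present at all.
    """
    words = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
    scores = {sense: len(words & cue_words)
              for sense, cue_words in clues[term].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


SENTENCE_1 = ("Transcription factors USF1 and USF2 up-regulate gene expression "
              "via interaction with an E box on their target promoters, "
              "which is also a binding site for c-Myc.")
SENTENCE_2 = ("The functional activity of ligands built from the binding site "
              "of G28-5 is dependent on the size and physical properties of "
              "the molecule both in solution and on the cell surfaces.")

print(disambiguate("binding site", SENTENCE_1))  # dna_region
print(disambiguate("binding site", SENTENCE_2))  # protein_region
```

A default sense can be supplied for sentences with no clue word; passing default="dna_region" would encode the observation that binding site usually refers to a DNA region.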

References

1. Dahl, V. and Blache, P.: Extracting Selected Phrases through Constraint Satisfaction. Proceedings of the ICLP Workshop on CSLP, 2005.
2. Blache, P.: Property Grammars: A Fully Constraint-Based Theory. In H. Christiansen et al. (eds), Constraint Solving and NLP, LNCS, Springer, 2005.
3. Dahl, V. and Blache, P.: Directly Executable Constraint Based Grammars. Proc. of Journées Francophones de Programmation en Logique avec Contraintes, 2004.
4. Christiansen, H.: CHR Grammars. International Journal on Theory and Practice of Logic Programming, special issue on CHRs, 2005.
