Evolutionary Learning of Syntax Patterns for Genic Interaction Extraction

Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, Fabiano Tarlao, Marco Virgolin

UNIVERSITÀ DEGLI STUDI DI TRIESTE DIPARTIMENTO DI INGEGNERIA E ARCHITETTURA

Evolutionary Learning of Syntax Patterns

DIA - UniTs

Problem

➔ Identifying sentences that contain interactions between genes and proteins
  ◆ from biomedical literature
➔ Available data:
  ◆ dictionary of genes, proteins and interactors
  ◆ example sentences

Why?

➔ Biomedical literature is:
  ◆ vast
  ◆ rapidly growing

➔ Automatically extracting knowledge from natural-language text is challenging:
  ◆ information is “diluted” in the text
  ◆ discovering relations between entities is especially hard

Goal

➔ Generation of a classifier C that identifies sentences containing interactions between genes and proteins
  ◆ automatically
  ◆ based on recurring syntactic patterns

Our approach

➔ Classifier C is a set of regular expressions (regexes): C = {r1, r2, ...}
➔ Each regex is a sentence classifier (“accepts” or “does not accept”)
  ◆ C accepts the sentences accepted by at least one regex
➔ Regexes are applied to a semantic representation of the text
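
A minimal sketch of how such a classifier could be evaluated, assuming that "accepts" means the regex matches somewhere in the string (the slides do not say whether a full or partial match is required). The function name `make_classifier` and the two patterns are hypothetical; each character stands for one annotation.

```python
import re

def make_classifier(patterns):
    """Build C = {r1, r2, ...} as a single accept/reject function."""
    compiled = [re.compile(p) for p in patterns]

    def accepts(text):
        # C accepts a string if at least one of its regexes accepts it.
        # Assumption: a regex "accepts" when it matches anywhere in the string.
        return any(r.search(text) for r in compiled)

    return accepts

# Hypothetical patterns, for illustration only.
C = make_classifier([r"JiG", r"G0+G"])
print(C("GB0if6JifJiG"))  # the first pattern matches the final "JiG"
print(C("f6J"))           # neither pattern matches
```

Because the set is combined with a logical OR, adding a regex can only enlarge the set of accepted sentences.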

Our approach (II)

➔ Regexes are generated automatically
  ◆ by means of Genetic Programming (GP)
  ◆ starting from examples
    ● strings which must be accepted
    ● strings which must not be accepted

Sentence preprocessing

Mapping of a sentence s to a ɸ-string x:
  a. substitution of the words in s with “annotations”
     i. gene, protein, interactor, or
     ii. Part-Of-Speech
  b. mapping of the annotations to Unicode characters
  c. concatenation

Sentence preprocessing (II)

Example:
s = YfhP may act as a negative regulator for the transcription of yfhQ
↓
[YfhP] [may] [act] [as] [a] [negative] [regulator] [for] [the] [transcription] [of] [yfhQ]
↓
[GENEPTN] [MD] [VB] [IN] [DT] [JJ] [INOUN] [IN] [DT] [INOUN] [IN] [GENEPTN]
↓
x = GB0if6JifJiG
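
The three steps of the example can be sketched as follows. This is only an illustration: the symbol table is read off the example above (other annotations would need their own characters), the dictionary entries are those implied by the example, and the Part-Of-Speech tags are assumed to come from an external tagger. The names `annotate` and `to_phi_string` are hypothetical.

```python
# Symbol table inferred from the example; not the authors' full table.
SYMBOLS = {"GENEPTN": "G", "MD": "B", "VB": "0", "IN": "i",
           "DT": "f", "JJ": "6", "INOUN": "J"}

# Hypothetical dictionary entries, taken from the example sentence.
GENES_PROTEINS = {"YfhP", "yfhQ"}
INTERACTOR_NOUNS = {"regulator", "transcription"}

def annotate(words, pos_tags):
    """Step a: replace each word with a dictionary annotation or its POS tag."""
    annotations = []
    for word, tag in zip(words, pos_tags):
        if word in GENES_PROTEINS:
            annotations.append("GENEPTN")
        elif word in INTERACTOR_NOUNS:
            annotations.append("INOUN")
        else:
            annotations.append(tag)
    return annotations

def to_phi_string(annotations):
    """Steps b and c: map each annotation to one character and concatenate."""
    return "".join(SYMBOLS[a] for a in annotations)

words = "YfhP may act as a negative regulator for the transcription of yfhQ".split()
# POS tags assumed to be produced by an external tagger.
tags = ["NNP", "MD", "VB", "IN", "DT", "JJ", "NN", "IN", "DT", "NN", "IN", "NNP"]
x = to_phi_string(annotate(words, tags))
print(x)
```

Using one character per annotation lets an ordinary regex engine treat each annotation as a single symbol.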

Generation of C: GP

➔ We used tree-based GP
➔ In this work, a candidate solution is a regex
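
To illustrate what "candidate solution = regex" can mean in tree-based GP, here is a rough sketch of a regex stored as a tree and serialized to a string. The node labels (`concat`, `plus`, `not_class`) are assumptions made for this example; the slides do not list the actual node set.

```python
import re
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    label: str                        # operator name or a literal symbol
    children: Tuple["Node", ...] = ()

def to_regex(node):
    """Serialize a tree into a regex string (hypothetical node set)."""
    kids = [to_regex(c) for c in node.children]
    if node.label == "concat":
        return "".join(kids)
    if node.label == "plus":
        return "(?:" + kids[0] + ")+"
    if node.label == "not_class":
        return "[^" + "".join(kids) + "]"
    return re.escape(node.label)      # leaf: a literal symbol

# A gene/protein, then one or more non-gene symbols, then an interactor noun.
tree = Node("concat", (
    Node("G"),
    Node("plus", (Node("not_class", (Node("G"),)),)),
    Node("J"),
))
print(to_regex(tree))
```

GP operators such as crossover and mutation would then act on subtrees, so every individual still serializes to a syntactically valid regex.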

Key aspects

➔ Multi-objective fitness:
  ◆ f = (Accuracy, FPR, Regex length)
  ◆ we purposely avoided including any problem-specific knowledge (gene/protein/…)
➔ Problem handled by means of separate-and-conquer
➔ Final output: set of regular expressions C = {r1, r2, ...}
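
A sketch of how the fitness tuple could be computed for one candidate regex. The comparison by Pareto dominance is an assumption (a common choice in multi-objective GP), not something the slides state; `fitness` and `dominates` are hypothetical names, and both example sets are assumed to be non-empty.

```python
import re

def fitness(pattern, positives, negatives):
    """Return f = (accuracy, FPR, regex length) for one candidate regex."""
    regex = re.compile(pattern)
    tp = sum(1 for x in positives if regex.search(x))
    fp = sum(1 for x in negatives if regex.search(x))
    tn = len(negatives) - fp
    accuracy = (tp + tn) / (len(positives) + len(negatives))
    fpr = fp / len(negatives)
    return accuracy, fpr, len(pattern)

def dominates(a, b):
    """Assumed Pareto comparison: maximize accuracy, minimize FPR and length."""
    a_cost = (-a[0], a[1], a[2])
    b_cost = (-b[0], b[1], b[2])
    no_worse = all(x <= y for x, y in zip(a_cost, b_cost))
    better = any(x < y for x, y in zip(a_cost, b_cost))
    return no_worse and better

f1 = fitness(r"JiG", ["GB0if6JifJiG", "GJiG"], ["f6J", "GiJ"])
f2 = fitness(r"G", ["GB0if6JifJiG", "GJiG"], ["f6J", "GiJ"])
print(f1, f2, dominates(f1, f2))
```

Note that none of the three objectives refers to genes, proteins, or interactors, in line with the choice to avoid problem-specific knowledge.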

Separate-and-conquer

➔ Each regex ri ∈ C classifies independently and in parallel
➔ Each regex is tailored to a sub-problem
  ◆ the problem is solved “step-by-step”
➔ Final output = logical OR of the classifications

Separate-and-conquer

● C = ∅
● we execute a GP search over the examples, obtaining r*
● if FPR < threshold
  ○ C = C ∪ {r*}
● else
  ○ terminate
● remove from the positive examples those which were classified correctly by r*, then repeat the search on the remaining examples
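
The loop above can be sketched as follows. The GP search itself is replaced by a placeholder: `toy_search` simply picks the best regex from a fixed pool, which is a stand-in chosen for this example and not the authors' search. The stop when a regex covers no remaining positive example is an added guard that the slide does not mention.

```python
import re

def separate_and_conquer(positives, negatives, search, fpr_threshold):
    """Grow C one regex at a time until the search stops being useful."""
    C = []
    remaining = list(positives)
    while remaining:
        r_star = search(remaining, negatives)
        regex = re.compile(r_star)
        fpr = sum(1 for x in negatives if regex.search(x)) / len(negatives)
        if fpr >= fpr_threshold:
            break                      # "else terminate"
        if not any(regex.search(x) for x in remaining):
            break                      # added guard: no progress was made
        C.append(r_star)
        # Remove the positive examples that r* already classifies correctly.
        remaining = [x for x in remaining if not regex.search(x)]
    return C

def toy_search(pool):
    """Placeholder for the GP search: best of a fixed candidate pool."""
    def search(positives, negatives):
        def score(p):
            rx = re.compile(p)
            hits = sum(1 for x in positives if rx.search(x))
            false_hits = sum(1 for x in negatives if rx.search(x))
            return hits - false_hits
        return max(pool, key=score)
    return search

C = separate_and_conquer(
    positives=["GaJ", "GbJ", "JiG"],
    negatives=["ff", "Gf"],
    search=toy_search(["G.J", "JiG", "f"]),
    fpr_threshold=0.5,
)
print(C)
```

Each iteration sees only the positives that earlier regexes missed, which is why every regex ends up tailored to a different sub-problem.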

Classifier example

C = {r1, r2}
r1 = GENEPTN[^RB][^NNS VBN GENEPTN]++
r2 = . INOUN IN GENEPTN . [^DT NN]

Experimental evaluation: the data

➔ Dataset: 456 sentences from biomedical papers
  ◆ ½ with interactions and ½ without
  ◆ manually labelled by experts
➔ Dataset split into Learning and Testing
  ◆ ≈80% of the examples in Learning
  ◆ ≈20% of the examples in Testing
➔ 5 randomly generated folds
  ◆ with Testing_i ≠ Testing_j
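
One way to produce such folds is sketched below. It assumes that the Testing sets are pairwise disjoint, which is stronger than the stated Testing_i ≠ Testing_j, and it does not balance positives and negatives within a fold. The function name `make_folds` and the seed are hypothetical.

```python
import random

def make_folds(n_examples, n_folds=5, seed=0):
    """Return a list of (learning, testing) index lists, one pair per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = []
    for k in range(n_folds):
        testing = indices[k::n_folds]          # every n_folds-th index
        testing_set = set(testing)
        learning = [i for i in indices if i not in testing_set]
        folds.append((learning, testing))
    return folds

folds = make_folds(456)
for learning, testing in folds:
    print(len(learning), len(testing))
```

With 456 sentences this gives Testing sets of 91 or 92 examples, close to the ≈20% stated above.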

Baselines 1, 2: problem-specific knowledge

➔ Annotations-Co-Occurrence
  ◆ tightly tailored to this specific problem
  ◆ a sentence is positive if it contains
    ● at least 2 genes/proteins
    ● at least 1 interactor
➔ Annotations-LLL05-Patterns
  ◆ 10 patterns generated in “LLL'05 Challenge: Genic Interaction Extraction with Alignments and Finite State Automata”, J. Hakenberg et al.
  ◆ built over >90% of the dataset (including the testing data!)
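
The Annotations-Co-Occurrence rule is simple enough to sketch directly. The annotation names `GENEPTN` and `INOUN` are the ones used in the preprocessing example; treating `INOUN` as the only interactor annotation is an assumption, so the set is left as a parameter.

```python
def co_occurrence_positive(annotations, interactor_tags=("INOUN",)):
    """Positive if the sentence has >= 2 genes/proteins and >= 1 interactor."""
    genes = sum(1 for a in annotations if a == "GENEPTN")
    interactors = sum(1 for a in annotations if a in interactor_tags)
    return genes >= 2 and interactors >= 1

example = ["GENEPTN", "MD", "VB", "IN", "DT", "JJ", "INOUN",
           "IN", "DT", "INOUN", "IN", "GENEPTN"]
print(co_occurrence_positive(example))
print(co_occurrence_positive(["GENEPTN", "VB", "DT", "NN"]))
```

The rule ignores word order entirely, so it accepts any sentence that merely mentions two genes and an interactor.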

Baseline 3: ɸ-SSLEA

➔ Based on the Smart State Labeling Evolutionary Algorithm (SSLEA)
  ◆ algorithm for DFA learning
  ◆ works well in the presence of noise
➔ Hill-Climbing
➔ Generates a DFA which accepts or rejects a ɸ-string x
  ◆ if x is accepted ⇒ x contains an interaction between genes/proteins
  ◆ otherwise, it does not

Baselines 4, 5: Words-NaiveBayes and Words-SVM

➔ Standard for text classification
  ◆ supervised Machine Learning methods
➔ Features based on word occurrences
➔ Preprocessing
  ◆ stemming
  ◆ feature selection

Results

Averaged over the 5 folds

Classifier                   Accuracy (%)   FPR (%)   FNR (%)
Annotations-Co-Occurrence    77.8           40.0      4.5
Annotations-LLL05-Patterns   82.3           25.0      10.5
Words-NaiveBayes             51.3           25.0      95.0
Words-SVM                    73.8           29.0      23.5
ɸ-SSLEA                      59.8           44.0      33.5
C                            73.7           23.5      22.5

Results (II)

➔ C performs as well as Words-SVM and better than the other learning approaches
➔ The accuracies of C and Annotations-Co-Occurrence (which exploits the domain knowledge of an expert) are very close
  ◆ Pro: C consists of readable patterns (regexes)
  ◆ Con: time to generate C (hours) ≫ time to generate the other methods (minutes)
    ● but the time taken for classifying is comparable (seconds)

Conclusions

We proposed:
➔ a method for the automatic synthesis of a classifier for natural language sentences
  ◆ based on syntactic patterns
  ◆ by means of GP
  ◆ separate-and-conquer
➔ results are highly promising
