Serbia & Montenegro, Belgrade, November 22-24, 2005

EUROCON 2005

Building Domain Ontologies in a Retrieval System for Lung Pathology

Sonja Niepage, Thomas Leuthold, and Thomas Schrader

Abstract - To support the diagnostic process or educational purposes, telepathology systems have to provide tools to manage digitized data such as images and reports. Many systems have the disadvantage that their retrieval capabilities are restricted to automatic image analysis and ignore the knowledge about the diagnostic results contained in the corresponding report. To overcome these deficits we developed a Semantic Web based retrieval system that uses domain ontologies. In this paper we compare two basic approaches for building ontologies for the domain of lung pathology.

Keywords - semantic web, digital pathology, ontology, UMLS

I. INTRODUCTION

The main goal of Digital Pathology is to digitize the diagnostic process in pathology completely, allowing the pathologist to manage and reuse digital images and reports in clinical routine, education and research [1]. To this end the Institute of Pathology, Charité, has developed the Digital Virtual Microscope (DVM), which allows the pathologist to move and zoom a digital slide in real time without any loss of image quality [2], [3]. The digitization of the glass slides, which are usually used to examine a prepared tissue specimen with a conventional microscope, requires large image databases. From this point of view, information and knowledge retrieval are integral components of systems for Digital Pathology. Current systems are mainly based on automatic classification algorithms [4] and on methods that analyse image properties, e.g. colour, texture and pattern [4]-[7]. In our project we exploit the close relationship between the slides of a case and the textual pathology report that results from the examination of these slides.

Every report consists of the following main parts: clinical data, macroscopy (description of the tissue sample), microscopy (description of the glass slides) and diagnosis. Sometimes a comment explains the diagnosis (a summary and abstraction of the clinical information and of the macroscopic and microscopic characteristics). The pathology report can be regarded as metadata of the glass slides and of the corresponding digital images. To allow content-oriented search as well as access to statistical data (e.g. incidence) and to similar cases that support the diagnosis, the report should be semantically annotated. In this paper we compare two approaches for building the ontology of a Semantic Web based retrieval system that extends the Digital Virtual Microscope.

The project is funded by the Deutsche Forschungsgemeinschaft as a cooperation among the Charité Institute of Pathology, the Institute for Computer Science at the FU Berlin and the Department of Linguistics at the University of Potsdam, Germany.

Thomas Schrader, Institute for Pathology, Charité, Berlin, Germany (phone: +49 03 450 536113; fax: +49 03 450 7536113; e-mail: [email protected]).

Sonja Niepage, Institute for Pathology, Charité, Berlin, Germany (e-mail: sonja.niepage@charite.de).

Thomas Leuthold, Institute for Pathology, Charité, Berlin, Germany (e-mail: [email protected]).

1-4244-0049-X/05/$20.00 ©2005 IEEE

II. IMPLEMENTATION OF THE DOMAIN ONTOLOGIES

It is widely accepted that the construction of a domain ontology from scratch is a central step in the development of a knowledge-based system. Numerous methodologies have therefore been published that characterize the development process of a knowledge base, such as TOVE, Methontology or DOLCE; they can be classified as top-down (e.g. DOLCE) or bottom-up (e.g. TOVE) approaches [8], [9]. Following Methontology, we developed our ontology and our system in the following steps:

* analysis of the application domain,
* discovery of useful knowledge sources,
* system design: discovery and design of useful knowledge structures and inference capabilities,
* representation of the application knowledge using the selected knowledge representation language(s) (application ontology),
* implementation of a prototype,
* prototype testing and refinement and
* management of the knowledge base: evolution and maintenance.

During the implementation we compared two basic approaches for building a knowledge base for the domain of lung pathology: reuse-oriented ontology engineering and text-oriented ontology engineering. In both cases the UMLS Semantic Network, which contains general medical concepts such as "body part", was used as the initial structure of the ontologies.


1. Reuse-oriented Ontology engineering

As a basis we used UMLS [10], the most relevant medical thesaurus currently available. The release used, 2003AC, contains over 1.5 million concepts from over 100 medical libraries. Owing to the limitations of present Semantic Web tools, this complex thesaurus had to be adapted as follows. First we identified libraries and concepts relevant to the domain of lung pathology. In a second step the vocabulary of the report archive was compared with these libraries by means of a retrieval engine. Finally 10 UMLS libraries were selected, containing 350,000 different concepts. To adapt the ontology to the domain of lung pathology, we selected 4 central concepts ("lung", "pleura", "trachea" and "bronchia") and extracted related concepts from the UMLS libraries (for more details see [11]).
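The extraction of concepts related to the four central concepts can be illustrated with a small sketch. It assumes, purely for illustration, that the relations are available as a mapping from a concept to its directly related concepts and that the extraction is a breadth-first traversal with a depth limit; the actual procedure is described in [11]. The relation table `RELATED`, the function `extract_related` and the parameter `max_depth` are hypothetical names and toy data, not part of UMLS.

```python
from collections import deque

# Hypothetical toy relation table: concept -> directly related concepts.
# Real UMLS relations would have to be loaded from the release files.
RELATED = {
    "lung": ["lobe of lung", "lung neoplasm"],
    "lung neoplasm": ["benign neoplasm of lung", "malignant neoplasm of lung"],
    "pleura": ["visceral pleura"],
    "trachea": ["neoplasm of trachea"],
    "bronchia": ["neoplasm of bronchus"],
    "heart": ["myocardium"],
}

# The four central concepts named in the text.
SEEDS = ["lung", "pleura", "trachea", "bronchia"]


def extract_related(seeds, related, max_depth=2):
    """Collect every concept reachable from the seeds within max_depth hops."""
    selected = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        concept, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in related.get(concept, []):
            if neighbour not in selected:
                selected.add(neighbour)
                queue.append((neighbour, depth + 1))
    return selected


subset = extract_related(SEEDS, RELATED)
print(sorted(subset))
```

A depth limit is one possible way to keep the subset small; any such automatic selection still has to be reviewed by hand for concepts that do not belong to the domain.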

The analysis of the resulting subset shows that this procedure also includes many concepts that are irrelevant to lung pathology in human medicine (e.g. "Kiemen" (gills)). It was therefore necessary to select the relevant concepts manually.

The resulting approximately 1000 concepts represent the initial input for the domain ontology (see Fig. 1). Because only approximately 12,000 of the over 500,000 concepts in UMLS were available in German, the selected concepts had to be translated in order to adapt the ontology further to the application domain. In addition we modeled approximately 200 pathology-specific concepts (e.g. the components of a pathology report).

Fig. 1. A fragment of the UMLS based ontology, subclasses of "Neoplastic_Process".

2. Text-oriented ontology engineering

The second approach was based on the text analysis of 400 pathology reports. The results were lists of nouns, ranked by their relevance for the text domain and grouped by prefixes and suffixes, together with a list of the adjectives modifying them.

From these lists we then selected the nouns most relevant to the domain of lung pathology and classified them into categories such as "anatomical", "diagnostic" or "technical" (for more details of the engineering process see [12]). As in the first approach, about 1000 nouns represent the initial concepts for this domain ontology (see Fig. 2).

Fig. 2. A fragment of the text based ontology, subclasses of "Neoplastic_Process".
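The grouping of nouns by prefixes and suffixes can be sketched as follows. This is a minimal illustration under the assumption that a noun ending in another noun of the list is a candidate subclass of it, and that a noun starting with another noun is a candidate for a relation such as part-of. The noun list and the function `group_by_affix` are illustrative and hypothetical; the sketch only proposes candidates and does not replace the manual modeling.

```python
# Illustrative noun list; the real input would be the ranked noun list
# obtained from the text analysis of the pathology reports.
NOUNS = [
    "Lunge", "Lungenlappen",
    "Zelle", "Karzinomzelle",
    "Karzinom", "Bronchialkarzinom",
]


def group_by_affix(nouns):
    """Group compound nouns under the shorter nouns they end or start with."""
    suffix_groups = {}  # base noun -> compounds ending with it
    prefix_groups = {}  # base noun -> compounds starting with it
    for compound in nouns:
        lowered = compound.lower()
        for base in nouns:
            if base == compound:
                continue
            stem = base.lower()
            if lowered.endswith(stem):
                suffix_groups.setdefault(base, []).append(compound)
            elif lowered.startswith(stem):
                prefix_groups.setdefault(base, []).append(compound)
    return suffix_groups, prefix_groups


suffix_groups, prefix_groups = group_by_affix(NOUNS)
print("subclass candidates:", suffix_groups)
print("other relation candidates:", prefix_groups)
```

With this input "Karzinomzelle" is grouped under "Zelle" by its suffix and "Lungenlappen" under "Lunge" by its prefix. Plain string comparison also produces candidates that a domain expert would reject, so the lists serve only as support for the modeling decision.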


The relevant concepts were inserted manually as subclasses of the UMLS Semantic Network. The modeling process was supported in particular by the prefix and suffix lists, which helped to model complex concepts such as "Lungenlappen" (lobe of lung) as part of "Lunge" (lung) or "Karzinomzelle" (carcinoma cell) as a subclass of "Zelle" (cell). Finally, additional pathology-specific concepts were added directly.

III. EVALUATION OF THE TWO APPROACHES

In order to evaluate the ontologies resulting from the two approaches, we measured the recall and the precision of the retrieval system based on each ontology and related them to the effort spent in the engineering process. To evaluate the retrieval capability we randomly chose a test set of 50 pathology reports from the report archive and selected 50 diagnostic items, which were used as search terms.

The UMLS based ontology contained 40% of the search terms, but only 8% were found directly by simple string matching on concept names and their synonyms. This is due to the inconsistent form of the concept names used in UMLS (e.g. concept names of the form "noun, adjective" occur as well as "adjective, noun"). Accordingly the recall amounted to only 0.09, while the precision reached 0.92. Implementing this ontology required 5 person-months (PM). Of these, the customization required 2.25 PM, including for instance 0.7 PM for the selection of the relevant libraries and 1.2 PM for the extraction of relevant concepts as described in Section II. A further 0.75 PM were invested in translating the input representation formalisms to OWL. The remaining 2 PM were needed for the refinement (e.g. 0.3 PM for the translation of the English concept names and 0.9 PM for the addition of pathology-specific concepts).

In comparison, the ontology based on the pathology reports contained 80% of the search terms, and 61% were found directly. The recall was 0.92 and the precision 0.44. The lower precision of the second approach is due to the search algorithm. If a complex search term such as "chronische Bronchitis" (chronic bronchitis) could not be found directly in the ontology, it was split and every word was compared separately with the ontology. In that case one gets, for example, all reports containing the concept "Bronchitis". This procedure increases the number of false positives, mainly in the text-oriented approach, because this ontology contains more relevant concepts. In the example given above, the UMLS based ontology contains neither "chronische Bronchitis" nor "Bronchitis".

Implementing the ontology based on the pathology reports required only 1.25 PM. Of these, the selection of the relevant concepts required 0.09 PM, and 0.31 PM were spent on the classification and manual insertion of the concepts into the ontology. A further 0.65 PM were needed for the definition of additional semantic relationships. The OWL implementation required 0.14 PM. The remaining 0.06 PM were spent on refinement.
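The effect of the search fallback on precision can be made concrete with a small sketch. It assumes that every report is annotated with a set of concept names and that relevance is known for the query; the names `find_concepts`, `retrieve` and `precision_recall` as well as the three toy reports are hypothetical and do not reproduce the evaluated system.

```python
def find_concepts(term, ontology):
    """Match the whole search term first; otherwise fall back to its words."""
    wanted = term.lower()
    names = {name.lower(): name for name in ontology}
    if wanted in names:
        return {names[wanted]}
    # Fallback: split the complex term and compare every word separately.
    return {names[word] for word in wanted.split() if word in names}


def retrieve(term, ontology, annotations):
    """Return the ids of all reports annotated with a matching concept."""
    concepts = find_concepts(term, ontology)
    return {rid for rid, tagged in annotations.items() if concepts & tagged}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (assumed): the ontology lacks the complex concept.
ONTOLOGY = {"Bronchitis", "Lunge"}
ANNOTATIONS = {
    "r1": {"Bronchitis"},  # report on chronic bronchitis
    "r2": {"Bronchitis"},  # report on acute bronchitis
    "r3": {"Lunge"},
}
RELEVANT = {"r1"}  # only r1 is about chronic bronchitis

retrieved = retrieve("chronische Bronchitis", ONTOLOGY, ANNOTATIONS)
precision, recall = precision_recall(retrieved, RELEVANT)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the fallback retrieves both bronchitis reports, so the recall is 1.0 while the precision drops to 0.5. With an ontology that contains neither the complex term nor its words, nothing is retrieved at all, which corresponds to the low recall reported for the UMLS based ontology.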

IV. CONCLUSION

In this paper we presented two basic approaches for building a knowledge base for a Semantic Web based retrieval system in the domain of lung pathology. It appears that the text-oriented approach achieves both better retrieval performance and a significantly lower engineering effort than the UMLS based ontology. In our opinion this has the following reasons. Reviewing and adapting a knowledge base as large as UMLS to a special application such as lung pathology is very time-consuming (especially because so few terms are available in German). Moreover, most terms are encoded in an inconsistent way and do not match the forms that normally occur in the pathology reports. The text based approach, by contrast, allows the concept names to be coded in a uniform manner and is therefore more efficient for the semantic annotation of text documents, in our case pathology reports.


REFERENCES

[1] S.T.C. Wong and H.K. Huang, "Design methods and architectural issues of integrated medical image data base systems", in Computerized Medical Imaging and Graphics, 20(4), 1996, pp 285-99.

[2] K. Saeger, K. Schlüns, T. Schrader, and P. Hufnagl, "The virtual microscope for routine pathology based on a PACS system for 6 GB images", in Proceedings of the 17th International Congress on Computer Assisted Radiology and Surgery (CARS), London, UK, June 2003, pp 299-304.

[3] P. Hufnagl, T. Schrader, K. Saeger, K. Schlüns, K. Kayser, and M. Dietel, "Telepathologie", in Der Onkologe, 9, Jan 2003, pp 29-36.

[4] E.A. El-Kwae, H. Xu, and M.R. Kabuka, "Content-based retrieval in picture archiving and communication systems", in J Digit Imaging, 13(2), 2000, pp 70-81.

[5] H. Qi and W.E. Snyder, "Content based image retrieval in picture archiving and communication systems", in J Digit Imaging, 12(2 Suppl 1), 1999, pp 81-3.

[6] J.Z. Wang, "Pathfinder: multiresolution region-based searching of pathology images using IRM", in Proc AMIA Symp, 2000, pp 883-7.

[7] M.E. Mattie, L. Staib, E. Stratman, H.D. Tagare, J. Duncan, and P.L. Miller, "Pathmaster: Content-based Cell Image Retrieval using automated Feature Extraction", in J Am Med Inform Assoc, 7(4), 2000, pp 404-15.

[8] M. Fernández-López and A. Gómez-Pérez, "Overview and Analysis of Methodologies for Building Ontologies", in Knowledge Engineering Review, 17(2), 2002, pp 129-156.

[9] M. Cristani and R. Cuel, "A Survey on Ontology Creation Methodologies", in Int'l Journal on Semantic Web & Information Systems, 1(2), 2005, pp 49-69.

[10] The UMLS Consortium, UMLS release 2003AC, 2003. http://www.nlm.nih.gov/research/umls/.

[11] E. Paslaru Bontas, S. Tietz, R. Tolksdorf, and T. Schrader, "Generation and Management of a Medical Ontology in a Semantic Web Retrieval System", in Proceedings of the CoopIS/DOA/ODBASE (1), 2004, pp 637-653.

[12] E. Paslaru Bontas, D. Schlangen, and S. Niepage, "Ontology Engineering for the Semantic Annotation of Medical Data", in 4th Workshop on Web Semantics at the DEXA2005.
