The University of Tokyo Graduate School of Information Science and Technology
COLLECTIVE SEMANTIC ANNOTATION FOR WEB TEXT: TRIPLE TAGGING AND TRIPLE EXTRACTION
A Thesis in Department of Creative Informatics by Jie Yang
© 2008 Jie Yang
Submitted in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
June 2008
Abstract
Semantic annotations are machine-understandable metadata attached to web resources. They represent the information contained in text documents in a structured format that is more amenable to applications in data mining, question answering, and the Semantic Web. Considerable research has been done in the field of semantic annotation. Judged by where the semantics of the annotations come from, existing studies fall into two categories: the “ontology-centric” class, which depends on a priori vocabularies (generally known as ontologies) to annotate web text, and the more recent “user-centric” class, which avoids predefined vocabularies and allows ordinary web users to annotate web text with few or no constraints. This research on “collective semantic annotation” takes a user-centric approach. Its goal is to explore how semantic annotations for web text can be generated by exploiting the strengths of both ordinary web users and computers. Specifically, two questions are addressed. First, what user-centric support can be provided to encourage ordinary web users to annotate web text? Second, how can the annotation process be automated? In answer to the first question, a user-centric annotation diagram, the triple tagging diagram, is proposed. I identify eight dimensions for describing annotation frameworks and review the literature in terms of these dimensions, which brings out the features and novelties of the triple tagging diagram. The diagram consists of three parts: the concept model, which defines annotation primitives; the collaboration model, which addresses information collection and navigation possibilities; and the ontology model, which provides a common definition for triple annotations so that they can be exchanged, reused, and extended on the Web. A model evaluation is carried out, which includes both qualitative and quantitative analysis.
The evaluation demonstrates the expressive power and advantages of the triple tagging diagram over existing work. Regarding the second question, I propose an interactive approach that generates semantic annotations for web text automatically. In this approach, the annotation generation problem is defined as a binary relation extraction problem.
Linguistic and machine learning techniques are exploited to solve the problem. Specifically, I propose the penalty tree similarity algorithm, an extension of the tree kernels that are widely used in the field of information extraction. A triple tagging corpus is created and used in experiments. The results show that the extended tree similarity algorithm achieves better performance. As a result of this research, a triple tagging system, Triple-Note, is implemented in a web-server architecture. On the client side, an extension for the Firefox browser supports users’ annotating actions. On the server side, automatic extraction, annotation storage, and other servicing models are implemented.