Linked data in practice in digital humanities projects
Eetu Mäkelä, D.Sc.
Assistant professor in digital humanities, University of Helsinki
Adjunct professor in computer science, Aalto University


Linked digital humanities research process

Ideal digital humanities research process

[Diagram: raw data, processing tools, analysis tools, results, research articles; labelled "iterative exploration of data"]

Sources for open data in the digital humanities - the good
• Major aggregators pushing for CC0 licenses and publishing the data of participating institutions: Europeana, Digital Public Library of America & The European Library
• Influential national libraries moving to co-operative open (linked) data
  – Library of Congress, Deutsche Nationalbibliothek, British Library, Bibliothèque nationale de France
• Museums, galleries and archives catching up: British Museum, Finnish National Gallery, …
• Glue available: VIAF, CIDOC-CRM, Getty AAT, TGN, ULAN, CONA, Pleiades, ...

Sources for open data in the digital humanities - the bad
• Academic libraries have a long tradition of collaborating with library service companies (primarily EBSCO Information Services, ProQuest LLC and Gale Cengage Learning) to produce services
• Often, the libraries also participate in content creation projects, and the companies then hold the rights to that content
  – e.g. Early English Books Online (ProQuest), Nineteenth Century Collections Online (Gale), State Papers Online (Gale)
• But this is also a wider culture within the humanities, e.g. Electronic Enlightenment

Data in practice

Library catalogue contents

Leader *****ngm 22*****1a 4500
245 04 $a The Adventures of Safety Frog. $p Fire safety $h [videorecording] / $c Century 21 Video, Inc.
246 30 $a Fire safety $h [videorecording]
260 ## $a Van Nuys, Calif. : $b AIMS Media, $c 1988.
300 ## $a 1 videocassette (10 min.) : $b sd., col. ; $c 1/2 in.
500 ## $a Cataloged from contributor's data.
520 ## $a Safety Frog teaches children to be fire safe, explaining that smart kids never play with matches. She shows how smoke detectors work and explains why they are necessary. She also describes how to avoid house hold accidents that lead to fires and how to stop, drop, and roll if clothing catches fire.
521 ## $a Elementary grades.
530 ## $a Issued also as motion picture.
538 ## $a VHS.
650 #0 $a Fire prevention $v Juvenile films.
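The record above uses the common human-readable MARC display form. Each line has a tag, two indicator characters (`#` = blank) and `$`-prefixed subfields. The sketch below splits such lines apart. It assumes only this display form, not binary MARC 21 or MARCXML, and `parse_field` is a hypothetical helper, not part of any MARC library. Note that ISBD punctuation such as the trailing ` /`, ` :` and `,` stays embedded in the subfield values. This is one small example of why cleanup is needed before the data can be used.

```python
import re
from typing import List, Tuple


def parse_field(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split a display line 'TAG IND $a ... $b ...' into tag, indicators and subfields."""
    tag, indicators, rest = line.split(" ", 2)
    # Each subfield is '$' + a one-character code + a space + the value up to the next '$'.
    subfields = [(m.group(1), m.group(2).strip())
                 for m in re.finditer(r"\$(\w) ([^$]*)", rest)]
    return tag, indicators, subfields


print(parse_field("260 ## $a Van Nuys, Calif. : $b AIMS Media, $c 1988."))
# ('260', '##', [('a', 'Van Nuys, Calif. :'), ('b', 'AIMS Media,'), ('c', '1988.')])
```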

Documentation!!!
• 81 pages of documentation on the exact annotation practices used in the digital edition of the Potage Dyvers
• Library cataloguing standards:
  – 302 pages of ISBD
  – 750 pages of AACR, 1056 pages of RDA
• 1020 pages of the SPECTRUM standard for museum cataloguing
• A single page of field descriptions in the Schoenberg database

The missing documentation
• “We changed our cataloguing standards once in the 80’s, and then a second time in 1998.”
• “Most of our older entries have actually been copied from the national library that has different cataloguing standards”
• “A lot of the publications from the middle of the 18th century are simply missing, as they were never indexed.”
• “This database was gathered based on the whimsies of what the participating researchers researched. It’s probably thus quite biased.”

Open data in the digital humanities - the ugly
● Different forms of encoding, typos:
  (Paris,)
  Paris
  [Paris,]
  (Paris)
  A Paris
  À Paris
  (Paris.)
  [A Paris]
  [Paris]
  (Paris
  Amsterdam. - et Paris
  Amsterdam ; et Paris
  Amsterdam. - et à Paris
  Amsterdam [Paris]
  (Paris. - Amsterdam
  A Amsterdam [i. e. Paris]. M. DCC. LXX.
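The sketch below shows one possible cleanup strategy. It assumes a small, hypothetical lookup table of known place names (`KNOWN_PLACES`). Rather than trying to parse every bracket and punctuation pattern, it extracts the alphabetic words and keeps those that match a known place. The approach is deliberately lossy. In the last example, the bracketed "[i. e. Paris]" marks Paris as the actual place behind an Amsterdam imprint, and that distinction is flattened here.

```python
import re
from typing import List

# Hypothetical mini-gazetteer: lowercase word -> canonical place name.
KNOWN_PLACES = {"paris": "Paris", "amsterdam": "Amsterdam"}


def places_in_imprint(imprint: str) -> List[str]:
    """Return known places mentioned in an imprint string, in order, without duplicates."""
    found: List[str] = []
    # [^\W\d_]+ matches runs of letters, including accented ones such as "À".
    for word in re.findall(r"[^\W\d_]+", imprint):
        name = KNOWN_PLACES.get(word.lower())
        if name and name not in found:
            found.append(name)
    return found


print(places_in_imprint("[A Paris]"))                # ['Paris']
print(places_in_imprint("Amsterdam. - et à Paris"))  # ['Amsterdam', 'Paris']
```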

Data woes: viaf.org ● Automatic conversions from “Lastname, Firstname” to “Firstname Lastname” do not always work due to bad data

Charles-Victor Prévost d'Arlincourt
Charles Victor Prévôt ˜d'œ Arlincourt
Charles Victor Prevot d' Arlincourt
Arlincourt

http://viaf.org/viaf/41896578/
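A naive "Lastname, Firstname" inversion shows how such variants can arise. The input heading below is hypothetical and is not taken from VIAF. If the particle "d'" has been attached to the forename part, mechanically swapping around the comma yields "Charles Victor Prevot d' Arlincourt". Producing "Charles-Victor Prévost d'Arlincourt" would require knowing how the particle and the diacritics should be handled.

```python
def invert_name(heading: str) -> str:
    """Naive 'Lastname, Firstname' -> 'Firstname Lastname' conversion (hypothetical helper)."""
    if "," not in heading:
        return heading.strip()  # no comma: nothing to invert
    last, first = heading.split(",", 1)
    return f"{first.strip()} {last.strip()}"


print(invert_name("Mäkelä, Eetu"))                          # Eetu Mäkelä
print(invert_name("Arlincourt, Charles Victor Prevot d'"))  # Charles Victor Prevot d' Arlincourt
```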

Data woes: Europeana http://labs.europeana.eu/api/linked-open-data-data-downloads

Digital humanities research process in practice

[Diagram: raw data, cleanup tools, clean data, understanding data, processing tools, analysis tools, results, research articles; labelled "iterative cleanup, exploration of data"]

Digital humanities research process in practice

[Diagram: raw data, cleanup tools, clean data, understanding data, processing tools, analysis tools, results, research articles; labelled "iterative cleanup, exploration of data, with attendant tool development"]

Collaborative digital humanities

Digital humanities

Linked digital humanities research process

[Diagram: integration tools, cleanup tools, clean data, understanding data, processing tools, analysis tools, results, research articles; labelled "iterative integration, cleanup, exploration of data, all with attendant tool development"]

Linked digital humanities process in practice?

[Diagram: three separate outputs, each labelled "research articles??"]

What makes for good Linked Data?
● Data that is part of a wider context
● The hardest part is manifesting the network of relationships
  ○ Use existing vocabularies for attribute values!
    ■ People: VIAF, ULAN, ...
    ■ Places: GeoNames, TGN, Pleiades, ...
    ■ Concepts & General: DBpedia, AAT, LCSH, Iconclass, …
    ■ Events: ?
  ○ Everybody seems to obsess over schemas, but they are actually not that important (though they do help)
    ■ CIDOC-CRM, EDM, BIBFRAME, ORG, RELATIONSHIP, FOAF, schema.org, SKOS, Geo, ...
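The sketch below shows why vocabulary URIs as attribute values matter. Two datasets become connected simply because both use the same VIAF URI (the one from the VIAF slide) instead of a name string. The `example.org` subjects are hypothetical. The predicates are real Dublin Core terms, though the choice of `dcterms:subject` for a letter is illustrative. If the attribute values had been name strings such as "Prevot d' Arlincourt" and "Prévost d'Arlincourt", the two datasets would not match.

```python
VIAF_ARLINCOURT = "http://viaf.org/viaf/41896578/"  # URI shown on the VIAF slide
DCT_CREATOR = "http://purl.org/dc/terms/creator"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Two hypothetical datasets, each using the VIAF URI as an attribute value.
library = [("http://example.org/library/book/1", DCT_CREATOR, VIAF_ARLINCOURT)]
letters = [("http://example.org/letters/letter/7", DCT_SUBJECT, VIAF_ARLINCOURT)]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def linked_via_shared_objects(a, b):
    """Pair up subjects from a and b whose attribute values are the identical URI or string."""
    subjects_by_object = {}
    for s, _p, o in b:
        subjects_by_object.setdefault(o, []).append(s)
    return [(s, t) for s, _p, o in a for t in subjects_by_object.get(o, [])]


print(to_ntriples(library + letters))
print(linked_via_shared_objects(library, letters))
```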

A second view

Linked digital humanities research process

Digital humanities workflow
0. Formulate research questions
1. Discover relevant data
2. Ingest and integrate data
3. Explore and interpret data
4. Publish interpretations, also as data

Enabling a Virtuous Cycle
0. Formulate research questions
1. Discover relevant data
2. Ingest and integrate data
3. Explore and interpret data
4. Publish interpretations, also as data

e.g. the prosopographical records in Early Modern Letters Online (EMLO) originating from research into primary sources

Digital humanities workflow
0. Formulate research questions
1. Discover relevant data. How? …, vocab.at, balloon
2. Ingest and integrate data. How?
3. Explore and interpret data. How? …, Europeana 4D, VISU, Khepri, SAHA, RelFinder, ...
4. Publish interpretations, also as data. How?

Digital humanities workflow
• Model
• Create
• Convert
• Publish
• Discover
• Integrate
• Explore

Linked Humanities Case Studies

180 people, 33 countries, led by Oxford

EMLO as Linked Data
● SAHA
● Palladio
● EMLO, Electronic Enlightenment (EE) and D’Alembert
● EMLO and BNF
http://tinyurl.com/y9wcakhr

Aspects covered
• Model
• Create
• Convert
• (Publish)
• Discover
• Integrate
• (Explore)

Data model
Created events and their temporal relationships:
- someone possessed the manuscript at least before the auction
- someone else may possess the manuscript after the auction, if the auction contains a sale or gift event
- provenance information creates possession events in the stated order

[Diagram: data model with the classes Catalogue, Catalogue Entry, Place, Auction, Manuscript, Work, Sale, Gift, Possession, Time, Actor]

Other notable types and their relationships:
- Catalogues have entries that refer to manuscripts, which may comprise multiple works
- Works are created, owned, sold and bought by actors
- Manuscripts also have a lot of other metadata, e.g. time, place, number of miniatures and so on
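The event rules above can be sketched in plain Python to make the temporal logic concrete. This is a minimal illustration, not the project's own model. `Event`, `auction_events` and `provenance_events` are hypothetical names, and a constraint `(earlier, later)` simply records that one event precedes another.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    kind: str                    # "auction" or "possession" in this sketch
    manuscript: str
    actor: Optional[str] = None  # the possessor, for possession events


Constraint = Tuple[Event, Event]  # (earlier, later)


def auction_events(manuscript: str, seller: str, buyer: Optional[str],
                   transfer: Optional[str]) -> Tuple[List[Event], List[Constraint]]:
    """Events and orderings implied by one auction catalogue entry."""
    auction = Event("auction", manuscript)
    seller_possession = Event("possession", manuscript, seller)
    events = [auction, seller_possession]
    # Someone possessed the manuscript at least before the auction.
    constraints = [(seller_possession, auction)]
    # Someone else may possess it afterwards, but only if the auction involved a sale or gift.
    if transfer in ("sale", "gift") and buyer is not None:
        buyer_possession = Event("possession", manuscript, buyer)
        events.append(buyer_possession)
        constraints.append((auction, buyer_possession))
    return events, constraints


def provenance_events(manuscript: str, owners: List[str]) -> Tuple[List[Event], List[Constraint]]:
    """One possession event per owner named in the provenance, ordered as stated."""
    events = [Event("possession", manuscript, owner) for owner in owners]
    return events, list(zip(events, events[1:]))
```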

Sample queries
● Where are the manuscripts collected by Sir Thomas Phillipps from?
● How many hands have the manuscripts passed through?
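Once possessions and places are explicit, the two sample queries reduce to simple groupings. The sketch below runs over in-memory records. The records are invented toy data, not data from the project, and the function names are hypothetical. In a Linked Data setting the same logic would typically be expressed as a query against the published data.

```python
from collections import defaultdict


def hands_per_manuscript(possessions):
    """Number of distinct possessors recorded for each manuscript."""
    owners = defaultdict(set)
    for manuscript, actor in possessions:
        owners[manuscript].add(actor)
    return {ms: len(actors) for ms, actors in owners.items()}


def origins_of_collection(possessions, origins, collector):
    """Places of origin of the manuscripts that `collector` possessed."""
    held = {ms for ms, actor in possessions if actor == collector}
    return sorted({origins[ms] for ms in held if ms in origins})


# Invented toy records: (manuscript, possessor) pairs and manuscript -> place of origin.
possessions = [("MS A", "Sir Thomas Phillipps"), ("MS A", "Collector X"),
               ("MS B", "Sir Thomas Phillipps"), ("MS C", "Collector X")]
origins = {"MS A": "Paris", "MS B": "Florence", "MS C": "London"}

print(origins_of_collection(possessions, origins, "Sir Thomas Phillipps"))  # ['Florence', 'Paris']
print(hands_per_manuscript(possessions))  # {'MS A': 2, 'MS B': 1, 'MS C': 1}
```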

Aspects covered
• Model
• Create
• Convert
• (Publish)
• Discover
• (Integrate)
• (Explore)

Interfacing structured and unstructured data in sociolinguistic research on language change

LINGUISTIC QUESTIONS
Social meaning of spelling variation in historical periods of English and Finnish
Neologisms in early English correspondence

TOOLS AND MATERIALS
Developing a modular research toolkit for sociolinguistic analysis

RESEARCH GROUP:
Terttu Nevalainen (PI; University of Helsinki)
Samuli Kaislaniemi (University of Helsinki)
Anni Sairio (University of Helsinki)
Taru Nordlund (PI; University of Helsinki)
Eetu Mäkelä (Aalto University)
Tanja Säily (University of Helsinki)
Katja Litola (University of Helsinki)
Poika Isokoski (PI; University of Tampere)
Anna Merikallio (University of Helsinki)
Johanna Utriainen (University of Helsinki)
Harri Siirtola (University of Tampere)

Use by gender

Use by societal rank: clergy vs gentry

Use by societal rank: professionals vs others
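Comparisons like the ones above require interfacing structured data (writer metadata such as gender or rank) with unstructured data (the letter texts). The sketch below is a minimal version under stated assumptions. It takes hypothetical inputs, namely a list of `(sender, text)` pairs and a metadata dict keyed by sender. It uses a simple normalised frequency per 1,000 words, and the project's actual measures may differ. `variant_rate` is a hypothetical name.

```python
import re
from collections import Counter


def variant_rate(letters, metadata, attribute, pattern, per=1000):
    """Occurrences of a spelling variant per `per` words, grouped by a writer attribute.

    letters:  list of (sender, text) pairs
    metadata: sender -> dict of attributes, e.g. {"gender": "F", "rank": "gentry"}
    pattern:  regular expression a whole word must match to count as the variant
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits, words = Counter(), Counter()
    for sender, text in letters:
        group = metadata[sender][attribute]  # the structured side of the join
        tokens = re.findall(r"\w+", text)    # the unstructured side
        words[group] += len(tokens)
        hits[group] += sum(1 for t in tokens if rx.fullmatch(t))
    return {g: per * hits[g] / words[g] for g in words if words[g]}
```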

Aspects covered
• Model
• Create
• Convert
• (Publish)
• Discover
• Integrate
• Explore

With Thea Lindquist, University of Colorado

Contextual Reader

Support close reading in an unfamiliar domain
1. Automatically give context
2. Locate other sources relevant to the topic across distributed collections, while requiring as little as possible from those collections

A Contextual Reader for First World War Primary Sources
Demonstrative documents:
• a primary source PDF from the CU-Boulder WWI Collection Online
• a postcard with metadata from the Great War Archive
• an encyclopedic article from 1914-1918 Online

A Contextual Reader for WWI Primary Sources


Under the Hood: Dynamic Entity Extraction
1. Extract content from HTML/PDF in browser
2. Call language analysis web service to generate query terms
3. Query Linked Data repository for context information
→ Items under study do not need to be formally annotated!

Vocabularies used:
1. WW1LOD
2. 1914-1918 Online Vocabularies
3. Europeana 1914-1918 Thesaurus
4. Out of the Trenches (PCDHN-LOD)
5. Trenches to Triples
6. DBpedia
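The three entity extraction steps can be sketched as follows, with significant simplifications:
- Python's standard `html.parser` stands in for extraction in the browser.
- Lowercase word n-grams stand in for the language analysis web service.
- A plain dictionary from label to URI stands in for querying a Linked Data repository built from vocabularies such as those listed.

All function names and the `example.org` URIs are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def query_terms(text: str, max_ngram: int = 4) -> set:
    """Stand-in for the language analysis service: all lowercase word n-grams."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n])
            for n in range(1, max_ngram + 1)
            for i in range(len(words) - n + 1)}


def find_entities(html: str, vocabulary: dict) -> dict:
    """vocabulary: lowercase label -> URI; returns the entries whose label occurs in the text."""
    terms = query_terms(extract_text(html))
    return {label: uri for label, uri in vocabulary.items() if label in terms}


vocabulary = {"verdun": "http://example.org/place/verdun",
              "battle of the somme": "http://example.org/event/somme"}
print(find_entities("<p>Letter from <b>Verdun</b> after the Battle of the Somme.</p>", vocabulary))
```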

Under the Hood: Query Expansion
1. Gather cross-lingual, alternate term information from vocabularies
2. Query remote web services for related content
→ Related content does not need to be formally annotated!

Repositories used:
1. CU-Boulder WWI Collection Online
2. WW1 Discovery
3. Europeana
4. Digital Public Library of America
5. The European Library
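Query expansion can be sketched by collecting every label of a matched concept, in every language, and combining them into one query. The vocabulary below is a hypothetical in-memory stand-in for a SKOS-style vocabulary with preferred and alternative labels. The quoted `OR` string is a generic assumption, since each repository's search API has its own query syntax. `expand` and `boolean_query` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Concept IRI -> (label, language tag) pairs, covering preferred and alternative labels.
Labels = Dict[str, List[Tuple[str, str]]]


def expand(concept: str, labels: Labels, languages: Optional[Iterable[str]] = None) -> List[str]:
    """All distinct labels of a concept, optionally restricted to some languages."""
    wanted = None if languages is None else set(languages)
    terms: List[str] = []
    for label, lang in labels.get(concept, []):
        if (wanted is None or lang in wanted) and label not in terms:
            terms.append(label)
    return terms


def boolean_query(terms: List[str]) -> str:
    """Combine the terms into a generic quoted OR query."""
    return " OR ".join(f'"{t}"' for t in terms)


labels = {"http://example.org/concept/verdun": [
    ("Battle of Verdun", "en"), ("Bataille de Verdun", "fr"),
    ("Schlacht um Verdun", "de"), ("Verdun", "en")]}
print(boolean_query(expand("http://example.org/concept/verdun", labels)))
```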

A Contextual Reader for Ancient Texts
Demonstrative documents:
● an English translation of Caesar's Gallic War in the Perseus Hopper
● a Latin text by Livy
→ The language analysis step allows support for highly inflected languages, and multilingual Linked Data enables crossing language boundaries

A Contextual Reader for Ancient Texts
Vocabularies used:
1. Pleiades gazetteer of ancient places
2. English and Latin DBpedias

Repositories used:
1. Pelagios datasets
2. Perseus Catalog
3. Europeana
4. Digital Public Library of America
5. The European Library
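In a highly inflected language the surface form often does not match a gazetteer label directly. For example, the accusative "Romam" does not string-match "Roma", so the word must be reduced to its lemma before lookup. The sketch below shows this step with two hypothetical stand-ins: a tiny form-to-lemma table in place of a real morphological analyser, and a tiny gazetteer with `example.org` URIs in place of Pleiades.

```python
import re

# Hypothetical stand-ins: form -> lemma table and lemma -> place URI gazetteer.
LEMMAS = {"roma": "Roma", "romam": "Roma", "romae": "Roma",
          "gallia": "Gallia", "galliam": "Gallia", "galliae": "Gallia"}
GAZETTEER = {"Roma": "http://example.org/place/roma",
             "Gallia": "http://example.org/place/gallia"}


def link_places(text, lemmas=LEMMAS, gazetteer=GAZETTEER):
    """Lemmatize each word, then look the lemma up in the gazetteer."""
    found = {}
    for token in re.findall(r"\w+", text.lower()):
        lemma = lemmas.get(token)
        if lemma in gazetteer:
            found[lemma] = gazetteer[lemma]
    return found


print(link_places("Caesar ex Gallia Romam rediit"))
```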

A Contextual Reader for Finnish Law
Demonstrative documents:
● a Finnish law
● a decision of the Finnish Supreme Court
● a statement by the standing committee on law on a law in preparation
● a news article concerning a law

A Contextual Reader for Finnish Law
Vocabularies used:
1. Legal terminology in the Finnish Terminology Bank
2. Asseri legal vocabulary
3. Edilex legal vocabulary
4. Talentum legal vocabulary
5. Legal terminology section of the Finnish DBpedia

Repositories used:
1. Finlex consolidated legislation
2. Finlex precedents of Finnish supreme courts
3. Edilex legal news

Aspects covered
• Model
• Create
• Convert
• Publish
• Discover
• Integrate
• Explore


With Hans Wietzke, Stanford

Ancient Name Dropping

Detecting clusters in references to authority in ancient Greek texts on natural science; co-citations hand-curated by Hans Wietzke
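The slide does not say how the clusters were detected. The sketch below is one simple illustration, not necessarily the method used:
1. Count how often two authorities are cited together in the same passage.
2. Keep the pairs co-cited at least `min_weight` times.
3. Take the connected components of the resulting graph.

The passages are invented toy data, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(passages):
    """passages: iterable of sets of cited authorities -> Counter of sorted author pairs."""
    counts = Counter()
    for cited in passages:
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


def clusters(counts, min_weight=2):
    """Connected components of the graph of pairs co-cited at least min_weight times."""
    adjacency = {}
    for (a, b), weight in counts.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


# Invented toy data: each set is the authorities cited in one passage.
passages = [{"Plato", "Aristotle"}, {"Plato", "Aristotle", "Homer"},
            {"Hippocrates", "Empedocles"}, {"Hippocrates", "Empedocles"}]
print(clusters(cocitation_counts(passages)))
```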

Aspects covered
• Model
• Create
• Convert
• (Publish)
• Discover
• Integrate
• Explore

With Dan Edelstein and Nicole Coleman, Stanford

Fibra: a human-scale tool for linked data that supports critical inquiry
1. Source information from linked datasets
2. Organize and add to data in order to build an argument
3. Capture both the data and the reasoning behind it, so that it has context within the scholarly community
4. Publish the new knowledge to the community, where it can be cited, re-used and built upon by others
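One common pattern for capturing both the data and the reasoning behind it is to put an assertion in its own named graph. Statements about that graph (who made the claim, why, and on what evidence) then live alongside it. The sketch below serializes such a claim as N-Quads. It illustrates the pattern only and is not necessarily how Fibra stores its data. The `Claim` class and the `example.org` IRIs are hypothetical. The predicates are real Dublin Core terms.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

DCT = "http://purl.org/dc/terms/"


@dataclass
class Claim:
    graph: str                           # IRI naming this claim (hypothetical)
    triples: List[Tuple[str, str, str]]  # the assertion itself, as IRI triples
    author: str                          # IRI of the scholar making the claim
    rationale: str                       # free-text reasoning behind the claim
    sources: List[str] = field(default_factory=list)  # IRIs of the evidence


def _literal(text: str) -> str:
    """Quote a string as an N-Quads literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_nquads(claim: Claim) -> str:
    """Assertion triples go in the claim's named graph; metadata about it goes in the default graph."""
    lines = [f"<{s}> <{p}> <{o}> <{claim.graph}> ." for s, p, o in claim.triples]
    lines.append(f"<{claim.graph}> <{DCT}creator> <{claim.author}> .")
    lines.append(f"<{claim.graph}> <{DCT}description> {_literal(claim.rationale)} .")
    lines += [f"<{claim.graph}> <{DCT}source> <{src}> ." for src in claim.sources]
    return "\n".join(lines)


claim = Claim("http://example.org/claim/1",
              [("http://example.org/letter/7", DCT + "subject", "http://viaf.org/viaf/41896578/")],
              "http://example.org/person/scholar",
              'Identified from the "d\'Arlincourt" signature',
              ["http://example.org/letter/7/scan"])
print(to_nquads(claim))
```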

Fibra Construct

Aspects covered
• Model
• (Convert)
• Publish
• (Discover)
• Integrate
• Explore

Final view

Linked digital humanities research process


Tools for phases of the cycle

[Figure: tools placed on the phases Understand, Import, Edit, Reconcile, Organize, Explore and Publish, with the further labels Bulk and Local. Tools shown: Aether, vocab.at, Voyager, (ARPA), Karma, OpenRefine, Breve, FiCa, Recon, Silk, Wrangler, SAHA, Snapper, Fibra, SKOSJS, VISU, LDF.fi, Palladio, Octavo, Khepri, nodegoat]

url: http://seco.cs.aalto.fi/u/jiemakel
@jiemakel
