Shared, reproducible research
Eetu Mäkelä, D.Sc.
Assistant Professor in Digital Humanities / University of Helsinki
Docent (Adjunct Professor) in Computer Science / Aalto University

Open science
• Open data
• Open research process
• Open research results
• Open access to publications
• Open source
• Open peer review
• Collaborative research
• Citizen participation in research
• …

Open, reproducible research
Reproducible research
• Open data
• Open research process
• Open research results
• Open access to publications
• Open source
• Open peer review
• Collaborative research
• Citizen participation in research
• …

Reproducibility of research is important:
• 2011-2015, Reproducibility Project: Psychology: only 35 out of 97 landmark studies were reproducible
• Over half of publications in psychology contain simple statistical errors
• Only 9 out of 53 landmark cancer studies were reproducible
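Expressed as rates, the two landmark-study figures above work out as follows (plain arithmetic on the numbers quoted on the slide, nothing more):

```latex
\frac{35}{97} \approx 0.361 \;(\approx 36\%), \qquad \frac{9}{53} \approx 0.170 \;(\approx 17\%)
```

In other words, nearly two thirds of the psychology studies and over four fifths of the cancer studies could not be reproduced.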

Digital humanities research process
• raw data
• cleaning up data (80% of work)
• exploratory tools
• understanding data
• results
• research articles

About 80% of your time goes to understanding and cleaning up data, because data is always complicated and messy; blindly trusting it cannot yield trustworthy science.
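A minimal sketch, in Python, of why cleaning is worth scripting rather than doing by hand. The record fields, sample values and cleaning rules below are hypothetical; the point is that the raw data is never modified and the script logs what it removed, so anyone can rerun and inspect the cleaning step.

```python
import re
from typing import Dict, List, Tuple

Record = Dict[str, str]


def clean_records(raw: List[Record]) -> Tuple[List[Record], Dict[str, int]]:
    """Return cleaned copies of `raw` plus a log of what was dropped.

    Hypothetical rules for illustration:
    - collapse runs of whitespace and strip leading/trailing spaces
    - drop records whose 'title' is empty after normalisation
    - drop exact duplicates (after normalisation), keeping the first
    The input list is never modified, so the raw data stays available.
    """
    log = {"input": len(raw), "empty_title": 0, "duplicate": 0}
    seen = set()
    cleaned: List[Record] = []
    for rec in raw:
        # Build a new, normalised dict; the raw record is left untouched.
        norm = {k: re.sub(r"\s+", " ", v).strip() for k, v in rec.items()}
        if not norm.get("title"):
            log["empty_title"] += 1
            continue
        key = tuple(sorted(norm.items()))
        if key in seen:
            log["duplicate"] += 1
            continue
        seen.add(key)
        cleaned.append(norm)
    log["output"] = len(cleaned)
    return cleaned, log


# Hypothetical raw records: a messy duplicate and an empty title.
raw = [
    {"title": "  Sample   title ", "year": "1765"},
    {"title": "Sample title", "year": "1765"},
    {"title": "   ", "year": "1770"},
]
cleaned, log = clean_records(raw)
print(cleaned)
print(log)
```

Keeping the log alongside the cleaned output documents how much the cleaning changed, which is exactly the kind of understanding that should not live only in one researcher's head.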

Leverage collaboration and open science workflows to reduce individual workload:
• raw data
• cleaning up data (80% of work)
• exploratory tools
• understanding data
Collaborate on and share the cleaned data, tools and understanding to speed up research for everyone, and gain reproducibility along the way.
• results
• research articles

Sharing data, code and publications: GitHub
• GitHub is a great collaboration platform for open research that is based on code
• Supports, e.g., revision tracking, merging conflicting edits, tagging release versions, issue tracking, …
• Integrates with Zenodo, a service hosted at CERN for sharing and long-term archiving of all aspects of research
• Basically, you can upload anything there, and it gives you a resolvable DOI (Digital Object Identifier) for it
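A minimal sketch of describing a GitHub repository for archiving on Zenodo. It assumes Zenodo's GitHub integration reads a `.zenodo.json` file from the repository root when a release is tagged; the field names follow Zenodo's deposition metadata as we understand it, and the title, creator, description and license values are hypothetical. Check the current Zenodo documentation before relying on any of them.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_zenodo_metadata(repo_dir: Path, title: str, creators: List[Dict[str, str]],
                          description: str, keywords: List[str], license_id: str) -> Path:
    """Write a .zenodo.json metadata file to the root of a repository.

    Assumption: Zenodo's GitHub integration uses this file as the deposit
    metadata for each archived release (and mints a DOI for the release).
    """
    metadata = {
        "upload_type": "software",
        "title": title,
        "creators": creators,          # e.g. [{"name": "Family, Given", "affiliation": "..."}]
        "description": description,
        "keywords": keywords,
        "license": license_id,         # assumed Zenodo license identifier
    }
    path = repo_dir / ".zenodo.json"
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


# Usage with hypothetical values, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_zenodo_metadata(
        Path(tmp),
        title="Hypothetical analysis scripts",
        creators=[{"name": "Doe, Jane", "affiliation": "Example University"}],
        description="Code and cleaned data behind a hypothetical article.",
        keywords=["digital humanities", "reproducibility"],
        license_id="mit",
    )
    loaded = json.loads(path.read_text(encoding="utf-8"))
print(loaded["title"])
```

Keeping the metadata in the repository means it is versioned with the code, so every tagged release is archived with a consistent description.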

Design for Digital Publishing of Data-Driven Research

Examples of open, shared, reproducible research
• Polymath
• NMRLipids
• rOpenSci
• COMHIS
‒ Bibliographica, ESTC, Fennica

Commensurate numismatics
• Collaboratively created unified terminology
• Place identifiers from Pleiades
• Open source code for the publication platform
• Instances:
‒ Coin Hoards of the Roman Republic Online
‒ Coinage of the Roman Republic Online
‒ Online Coins of the Roman Empire
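A minimal sketch of what shared place identifiers buy: two independently maintained datasets can be combined by joining on the identifier rather than on free-text place names, which rarely agree. The datasets, labels and Pleiades ids below are all hypothetical; only the URI pattern `https://pleiades.stoa.org/places/<id>` reflects how Pleiades identifies places.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical identifiers following the Pleiades URI pattern; the ids are made up.
PLACE_A = "https://pleiades.stoa.org/places/000001"
PLACE_B = "https://pleiades.stoa.org/places/000002"

# Dataset 1 (hypothetical): hoards with a findspot URI and a local free-text label.
hoards = [
    {"id": "hoard-1", "findspot": PLACE_A, "label": "Roma"},
    {"id": "hoard-2", "findspot": PLACE_A, "label": "Rome"},
    {"id": "hoard-3", "findspot": PLACE_B, "label": "Place B"},
]

# Dataset 2 (hypothetical): coin types with a mint URI.
coin_types = [
    {"id": "type-1", "mint": PLACE_A, "label": "Rome (mint)"},
    {"id": "type-2", "mint": PLACE_B, "label": "Mint B"},
    {"id": "type-3", "mint": PLACE_B, "label": "Mint B"},
]


def summarise_by_place(hoards: List[Dict[str, str]],
                       coin_types: List[Dict[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Count hoards found and coin types minted per place.

    The join uses the shared URI, not the labels, which disagree
    ("Roma" vs "Rome") even within a single dataset.
    """
    hoard_counts = Counter(h["findspot"] for h in hoards)
    type_counts = Counter(t["mint"] for t in coin_types)
    places = sorted(set(hoard_counts) | set(type_counts))
    # Counter returns 0 for places missing from one of the datasets.
    return {p: (hoard_counts[p], type_counts[p]) for p in places}


summary = summarise_by_place(hoards, coin_types)
for place, (n_hoards, n_types) in summary.items():
    print(place, n_hoards, n_types)
```

Joining on the labels instead would have split PLACE_A into two places, which is the kind of silent error that shared identifiers are meant to prevent.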

Reproducible research is hard
• GitHub for sharing and Zenodo for long-term archiving, but how do you make sure someone in the future can reproduce your results?
• Documentation!
• Literate programming = the publication and the code that led to it in the same place
‒ Jupyter notebooks
‒ R Reproducible Research package
• Versioning of data and code (git)
• Management and versioning of external dependencies
‒ Packrat for R
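A minimal sketch, in Python rather than R, of recording what a result depended on: a checksum for each data file plus the interpreter and package versions in use. It is only an illustration of the idea behind tools such as Packrat, not a replacement for them; the package name and data file in the usage part are hypothetical. Committing the resulting manifest to git next to the code lets a later reader check that they are running the same code on the same data.

```python
import hashlib
import json
import sys
import tempfile
from importlib import metadata
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(data_files: Iterable[Path], packages: Iterable[str]) -> Dict:
    """Record data checksums plus Python and package versions.

    Packages that are not installed are recorded as None instead of failing.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "packages": versions,
        "data": {str(p): sha256_of(p) for p in sorted(data_files)},
    }


# Usage with a hypothetical data file and package name.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "records.csv"
    data.write_bytes(b"id,title\n1,Sample title\n")
    manifest = build_manifest([data], ["hypothetical-package-name"])
    print(json.dumps(manifest, indent=2))
```

If a rerun produces a different checksum or version, the manifest shows exactly which input changed, instead of leaving the reader to guess why the results differ.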

Insuring your reproducible research against the future is even harder
• Format/software obsolescence
• Hardware obsolescence

http://j.mp/s-makela

http://presemo.helsinki.fi/meth4dh
