Data Management Planning

Anticipating the costs of research data management

Version 1.0, October 2015

University of Bristol

Research Data Service

Image: Adapted from One Penny, Adrian Rowbotham, Flickr, CC BY-NC 2.0

INTRODUCTION

Many funding bodies now require that award recipients manage their research data, storing and preserving it in the long term and sharing some, if not all, of that data once the research is completed. Academic publishers, too, are increasingly calling for scientific claims to be underpinned by publicly accessible data which can be checked by anyone.

Successful data management always has a financial cost, even if this only covers a small fraction of a single researcher’s time spent organising files within folders. Often, though, the time spent on several different aspects of data management (e.g. transcription, data anonymisation and converting between different file types) adds up to something more substantial, and so the costs are more significant.

Some of these costs are an integral part of ‘doing the research’; others will be incurred only because, for example, a funder requires research data to be made publicly available.

In terms of covering costs, some actions and resources will have no direct cost to a research project (data storage below 5TB, for example, is provided for free to University of Bristol Principal Investigators) while others (such as carrying out data quality control) will have a direct cost and should be included in research funding applications. This guide is intended to prompt researchers to consider research data management as an important, and potentially costly, research activity, and to enable them to prepare funding applications accordingly.

THE ROLE OF THE DATA MANAGER

Before assessing potential costs for individual data management activities, Principal Investigators planning large-scale projects are asked to consider whether or not employing a Data Manager is appropriate. A career profile for a Data Manager is available from the DaMSSI project.[1] For large (or complex) research activities, employing a Data Manager has several advantages:

- A Data Manager can reduce project costs by ensuring data is sorted and processed as it is generated. This is often cheaper and more effective than leaving data processing until the end of a project;

- A Data Manager can ensure data benchmarks and standards are being met, helping to harmonise the efforts of individuals responsible for generating data and allowing procedural errors to be spotted early on;

- A Data Manager can collect metadata and document methodology while a project is underway. This reduces the risk of important information going unrecorded and being lost;

- Most major funders now take research data very seriously. Dedicating even a fractional role to research data management helps demonstrate to a funder that the applicant also appreciates its importance.

For more modest research activities, research data management duties will be assigned to other members of the project team (such as a Research Assistant) or handled directly by the Principal Investigator.

[1] http://www.dcc.ac.uk/sites/default/files/documents/IDCC11/data%20manager%20final.pdf

THE RESEARCH LIFECYCLE

A research dataset is typically:

1. Created
2. Used (by the individual or team responsible for its creation)
3. Curated (prior to publication)
4. Published
5. Preserved
6. Re-used (by parties not involved in the creation of the dataset)

Each of these phases has associated costs. Even where no ‘new’ data is to be generated (for example when a freely available, public dataset will be used), researchers are encouraged to consider the potential costs of the other phases. Each phase is discussed below.

Creating new data

The creation of new data is invariably the most expensive phase in this lifecycle, so it isn’t difficult to see why research funders are keen that this step is avoided wherever possible; hence their promotion of data re-use.

For many researchers, the creation of data is a familiar activity and the time and resources involved are well understood. Activities resulting in new data (for example time spent analysing samples in the lab or conducting recorded interviews) often form a major part of a Research Assistant’s routine duties.

The non-staff costs of creating new data are usually covered directly by a research funder. For instance, some of the University’s scientific facilities make a charge to researchers, which should be included within funding applications and so passed on to research funders.

| Activity | Anticipated cost |
| --- | --- |
| Gaining consent for data sharing (for research involving human participants) | Low cost if carried out before new data is created |
| Data Description (e.g. data in a spreadsheet are clearly marked with value and variable labels) | Low cost if carried out as part of data creation |
| Data Cleaning (e.g. ensuring only relevant data is present or only controlled terms are used) | Low cost if carried out as part of data creation |
| Documentation (e.g. of methodology, analysis and quality control procedures) | Low cost if carried out as part of data creation |
| Digitisation | Low cost if simple and small scale (e.g. scanning of a few dozen paper documents). Moderate to high cost if complex (e.g. large scale or accurate text mark-up is required) |
| Organisation of data (e.g. versioning, file naming & folder structure) | Low cost if well planned and then carried out as data is created |
| Anonymisation | Low cost if well planned and carried out as data is created |

Making use of data

Money spent here often supports research efforts immensely. If a robust and fit-for-purpose dataset is created, only minimal modification will be required later on, when the same data is shared. Applying logical structures and quality control measures to the data will ensure it sufficiently supports published research claims.

Yet processes such as data standardisation, shifting between different file formats, undertaking quality control procedures and ensuring data is appropriately stored during a research project can have significant costs, some of which are direct costs to the project.

A wide range of software applications capable of carrying out data processing tasks is provided at no direct cost by IT Services.[2] Other, more specialist software can either be purchased or leased, both of which will have associated direct costs.

Grant funding occasionally covers the creation of new project software, for example a mobile device app. Research IT[3] can advise on the costs involved and are able to undertake some software development work.

High Performance Computing resources are free at the point of use, but costs can also be included in funding applications if guaranteed access is required, or a large request for resources is involved.[4]

The Research Data Storage Facility (RDSF) provides each lead researcher with 5TB of storage without charge, but further data storage can be purchased at a cost of £1500 per TB.[5]

Activities in this phase can include:

| Activity | Anticipated cost |
| --- | --- |
| Formatting data (e.g. converting files between different formats) | Low cost if target format is directly equivalent to original format. Can be moderate cost if manual checking is needed (e.g. changing between database formats) |
| Transcription | Moderate cost, depending on quantity. Assume 4-8 hours of transcription per recorded hour |
| Data storage & security | No cost if under 5TB and RDSF is used |

[2] http://www.bristol.ac.uk/software/
[3] http://www.bristol.ac.uk/it-services/research/
[4] https://www.acrc.bris.ac.uk/acrc/hpc.htm
[5] https://www.acrc.bris.ac.uk/acrc/storage.htm
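The two figures quoted for this phase (5TB of free storage with further storage at £1500 per TB, and 4-8 hours of transcription per recorded hour) lend themselves to a quick budgeting calculation. The sketch below is illustrative only and is not part of the original guide. It assumes that extra storage is bought in whole terabytes and that the charge is a single amount per TB, neither of which the guide states. The function names and the `hourly_rate` parameter are hypothetical; substitute your own staff costing.

```python
import math

# Figures taken from the guide.
FREE_ALLOWANCE_TB = 5       # storage provided without charge per lead researcher
COST_PER_EXTRA_TB = 1500    # GBP per TB beyond the free allowance
TRANSCRIPTION_RANGE = (4, 8)  # staff hours per recorded hour (low, high)


def extra_storage_cost(total_tb: float) -> int:
    """Return the cost in GBP of storage beyond the free allowance.

    Assumption (not stated in the guide): extra storage is purchased in
    whole terabytes, so any partial terabyte is rounded up.
    """
    extra_tb = max(0.0, total_tb - FREE_ALLOWANCE_TB)
    return math.ceil(extra_tb) * COST_PER_EXTRA_TB


def transcription_hours(recorded_hours: float) -> tuple[float, float]:
    """Return the (low, high) estimate of staff hours needed for transcription."""
    low, high = TRANSCRIPTION_RANGE
    return low * recorded_hours, high * recorded_hours


def transcription_cost(recorded_hours: float, hourly_rate: float) -> tuple[float, float]:
    """Return the (low, high) staff cost, given a hypothetical hourly rate."""
    low, high = transcription_hours(recorded_hours)
    return low * hourly_rate, high * hourly_rate
```

Under these assumptions, a project expecting 8TB of data would budget for 3TB of extra storage, and 10 hours of recorded interviews would translate into 40 to 80 hours of transcription time. Actual charges should be confirmed with the relevant University service before they are entered in a funding application.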

Curating data prior to publication

If data has been carefully planned, created and processed up to this point, only minor modifications will be required in order to publish it, and costs will be correspondingly low. But if a dataset containing personal information has not been anonymised before the close of a project, weeks of staff time may be required to carry out this activity. It is strongly recommended that these situations are anticipated and so avoided, as money spent at this stage has minimal benefit for the research project, which is all but complete. Ideally, this stage would consist of simply processing any recently created data and creating subsets of pre-prepared data in order to underpin specific research claims.

Activities in this phase can include:

| Activity | Anticipated cost |
| --- | --- |
| Data Description (e.g. data in a spreadsheet are marked with value and variable labels) | Can be high cost if not carried out as part of data creation |
| Data Cleaning (e.g. ensuring only relevant data is present or only controlled terms are used) | Can be moderate cost if not carried out as part of data creation |
| Documentation (e.g. of methodology, analysis and quality control procedures) | Can be moderate to high cost if not carried out as part of data creation |
| Organisation of data (e.g. versioning, file naming & folder structure) | Can be moderate cost if not carried out as part of data creation |
| Anonymisation | Can be high cost if not carried out as part of data creation |
| Gaining consent for data sharing (for research involving human participants) | Can be very high cost (or impossible) if not carried out before data is created |

The publication and preservation of data to support reuse

Many research funders require data to be made available for a number of years after a project has ended. However, they are typically unwilling to directly fund ongoing data preservation.

In the vast majority of cases, it is expected that researchers will resolve this issue by making use of one or more research data repository services. The costs of ongoing publication and preservation then become the responsibility of that service. Subject-based, national and institutional data repositories exist.

Researchers should be aware that some data repository services charge the data depositor (this covers the cost of adding new data to the repository). Deposit should be made before a project ends and this charge should be included within your funding application.

For more information on selecting a data repository see the Data Service Repository help page.[6]

[6] http://data.bris.ac.uk/research/publish/repositories/

Activities in this phase can include:

| Activity | Anticipated cost |
| --- | --- |
| Data sharing | Low or no cost if using a data repository. Otherwise a significant ongoing cost (usually extending beyond project lifespan) |
| Repository or discipline specific metadata (e.g. to INSPIRE or DDI standards) | Low cost, often done at dataset level, as part of deposit process |

SUMMARY

- Consider whether your project would benefit from the assistance of a Data Manager

- Establish whether any additional costs will be involved in the creation of your data which are not already covered elsewhere in your funding application

- Try to organise and document your data as you go along to avoid the need for any staffing costs associated with cleaning data at the end of the project

- Identify whether you will require any additional software, computing or storage solutions, and speak to the relevant University departments so quotes can be included in your costing

- Always consider the potential costs associated with sharing and preserving your data, and use a repository or data centre where possible. Ensure you include deposit charges in your funding application

Please contact the Research Data Service for more information: [email protected]
