Simon Hodson introduced the workshop with an overview of the history of JISC’s involvement in supporting research data management in UK institutions, and described some of the projects currently underway as part of the Managing Research Data programme.
Simon handed over to Lee-Ann Coleman, who set the scene for the day’s theme with a discussion of the role of data within the research lifecycle. She gave a number of examples where the open sharing of data has been essential to the research process, including the 2006 case of Italian veterinary scientist Ilaria Capua, whose attempts to share her H5N1 avian flu data with other research groups provided an early insight into the obstacles to open sharing of data faced by many researchers.
Lee-Ann went on to discuss a 2009 study carried out by The British Library and RIN, which investigated patterns of information use in a number of groups working in different research areas (all within the life sciences). As well as illustrating the complex nature of information use and dissemination, the study showed the very different patterns of use seen in each discipline – an issue that is particularly challenging for those charged with managing the research outputs from multiple groups within an institution.
This discussion generated an interesting, and fundamental, question from the floor: ‘what do we mean when we talk about “data”, and does it mean different things to different people?’ Clearly, there is no easy answer to this question, and it is likely that individual institutions will have to arrive at a definition that suits their purposes. The group concluded that any definition shouldn’t be unnecessarily constraining, but it is also necessary, for the purposes of effective management, to have some form of working definition.
Elizabeth Newbold followed up with an introduction to DataCite, including an overview of the aims of the service and of the role that the British Library plays. DataCite aims to provide a solution to some key data management challenges: the absence of widely-used methods for the identification and citation of data, and the lack of an effective means of securely linking publications to their associated data.
Digital Object Identifiers (DOIs) were introduced as the identifier-of-choice for DataCite. The DOI system has the advantage of being globally recognised and is already in widespread use for the identification of electronically published scholarly articles. It has also recently (as of May 2012) been awarded ISO standard status (ISO 26324:2012 http://www.iso.org/iso/catalogue_detail?csnumber=43506). The British Library is one of 16 DataCite members around the world and functions as an “allocating agent”: that is, UK (and occasionally non-UK) repositories can register with the BL to have DOIs assigned to their data.
Case study: UK Data Archive
Louise Corti of the UK Data Archive (UKDA) gave us an insight into the day-to-day work of a data centre. She returned to the earlier “what is data” question and explained that, for the UKDA’s purposes, any (digital) output from a grant-funded research project is considered data (see http://www.data-archive.ac.uk/help/user-faq#4). She also discussed the UKDA’s approach to managing changes to data collections: an alteration to a dataset will result in a new ‘instance’ of that dataset being issued. If the change is significant, the new instance may be assigned a new DOI. All previous versions are retained. Louise ended her presentation by mentioning two unresolved challenges that are currently of concern to the UKDA: how to cite partial datasets, and how to create relationships between different digital objects.
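The versioning approach Louise described can be sketched roughly as follows. This is a minimal illustration only, not the UKDA's actual system: the `Dataset` and `Instance` classes, the `add_instance` method and the example DOIs are all hypothetical. It assumes that every change creates a new instance, that a new DOI is supplied only when the change is judged significant, and that earlier instances are never deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    number: int   # position in the dataset's history, starting at 1
    doi: str      # DOI under which this instance is cited
    note: str     # short description of what changed


@dataclass
class Dataset:
    title: str
    instances: list = field(default_factory=list)

    def add_instance(self, note, new_doi=None):
        """Record a change as a new instance; earlier instances are kept."""
        if new_doi is None:
            if not self.instances:
                raise ValueError("the first instance needs a DOI")
            # Minor change: carry the current DOI forward (an assumption).
            new_doi = self.instances[-1].doi
        instance = Instance(len(self.instances) + 1, new_doi, note)
        self.instances.append(instance)
        return instance


# Hypothetical usage with made-up DOIs.
survey = Dataset("Example survey")
survey.add_instance("initial deposit", new_doi="10.1234/example.1")
survey.add_instance("typo corrected in documentation")
survey.add_instance("new wave of data added", new_doi="10.1234/example.2")
```

Whether a change counts as significant is left to the caller here, since that judgement rests with the data centre.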
DataCite: Technical Infrastructure
Ed Zukowski gave an overview of the technology underpinning DataCite services and described the process of registering a dataset with the service, from minting DOIs to attaching metadata using the DataCite schema.
He also introduced the DataCite test site, a new feature that is available for users who wish to experiment with the new DataCite 2.3 schema in a self-contained environment (if you would like to register for the test site, please contact us at email@example.com).
Ed’s talk generated many questions from the room, particularly relating to the allocation of DOIs. It was noted that DataCite does not ‘curate’ the data to which it allocates DOIs: the only requirement is that the data must be registered via an approved data centre. There was also a query about de-registering DOIs, something that is not permitted by the International DOI Foundation. It is, however, possible to remove the dataset to which a DOI refers and to update the landing page to which the DOI resolves to reflect this.
Responsibilities of DataCite ‘clients’
Elizabeth gave a brief overview of the responsibilities of DataCite ‘clients’ (the data centres or institutions who have registered with the British Library to receive DataCite DOIs). In summary, these are:
- To create a publicly available ‘landing page’ to which an individual DOI will resolve
- To apply mandatory minimum metadata according to the DataCite metadata schema [http://schema.datacite.org/]. Metadata must be freely available under a Creative Commons Zero (CC0) licence
- To implement quality control measures to ensure compliance with the DataCite metadata schema
- To commit to long-term data preservation
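As a rough illustration of the mandatory minimum metadata and quality control points above, a client might run a simple completeness check before registering a record. The sketch below assumes that the mandatory properties are identifier, creator, title, publisher and publication year, as in the 2.x versions of the schema; this should be checked against the current schema documentation. The dictionary keys, the `missing_mandatory` function and the example record are hypothetical and are not part of any DataCite API.

```python
# Assumed mandatory properties (DataCite schema 2.x); key names are illustrative.
MANDATORY = ("identifier", "creators", "titles", "publisher", "publicationYear")


def missing_mandatory(record):
    """Return the mandatory property names that are absent or empty in record."""
    return [name for name in MANDATORY if not record.get(name)]


# Hypothetical record with a made-up DOI.
record = {
    "identifier": "10.1234/example.1",
    "creators": ["Smith, Jane"],
    "titles": ["Example survey, 2012"],
    "publisher": "Example Data Centre",
    "publicationYear": "2012",
}

problems = missing_mandatory(record)
if problems:
    print("Cannot register; missing:", ", ".join(problems))
else:
    print("Record has all assumed mandatory properties")
```

A check like this covers only the presence of fields; validating the full record against the published XML schema would be a separate step.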
The presentations given at the workshop can be accessed here.