Whereas in previous DataCite workshops we looked in detail at some of the key challenges of data citation, the fifth workshop took a more holistic view of the implementation of DOIs and the practical measures that repositories can take to overcome some of the technical, financial and cultural barriers to the adoption of DOIs for research data.
The workshop provided an opportunity for institutions that were considering adopting DOIs for their data to find out what is expected of them and to put their questions directly to current DataCite clients.
Working with the British Library and DataCite
Elizabeth Newbold from the British Library began with a brief overview of DataCite’s role and structure and a reminder of its central aim: to make data easier to find, access and cite through the application of Digital Object Identifiers (DOIs).
The British Library is the UK member of DataCite and is an “Allocating Agent” for DataCite DOIs. UK-based organisations that wish to obtain DOIs for their data can enter into a contractual agreement with the British Library to gain access to the DataCite system.
The remainder of Elizabeth’s presentation focused on the expectations that a British Library ‘data client’ should fulfil. In summary, any organisation that wishes to mint DataCite DOIs through the British Library should:
- Ensure data persistence
In other words, any data that are registered with DataCite must be managed and curated such that they remain persistently accessible and usable (as appropriate). This can be demonstrated by, for example, having formalised data management plans and policies which demonstrate a commitment to the long-term maintenance of the data in question.
- Maintain open resolution targets (a.k.a. landing pages) for all registered datasets
The landing page is the webpage to which users are directed when they resolve or click on a DOI. It must therefore be publicly accessible (“open”) and contain up-to-date information about the dataset and how it can be accessed.
- Provide mandatory metadata for registered datasets
DataCite requires that certain, basic metadata is provided for all registered datasets. The five mandatory metadata properties are: Identifier (must be a DOI), Title, Creator, Publisher and Publication Year. The DataCite Schema (v.2.2) also offers a further 12 optional properties that clients are encouraged to use.
The five minimal properties compose the DataCite recommended citation format: Creator (Publication Year): Title. Publisher. Identifier.
- Create suitably formatted DOIs
All DataCite clients are provided with a unique DOI prefix but can define the suffixes to meet their own requirements, although they should adhere to a few basic rules: only the characters a-z, 0-9, “/”, “.”, “-” and “_” should be used, and it should be no longer than 255 characters.
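As a rough illustration of the last two expectations, the sketch below checks a candidate suffix against the character and length rules and assembles the recommended citation from the five mandatory properties. It is not DataCite or British Library code: the names `Dataset`, `is_valid_suffix` and `format_citation` are invented for this example, the prefix `10.1234` is a placeholder, and applying the 255-character limit to the suffix alone (rather than to the whole DOI) is an assumption.

```python
import re
from dataclasses import dataclass

# Allowed suffix characters as listed above: a-z, 0-9, "/", ".", "-" and "_".
# Assumption: lower case only, since the rule as reported names a-z.
_SUFFIX_PATTERN = re.compile(r"[a-z0-9/.\-_]+")

# Assumption: the 255-character limit is applied to the suffix alone.
MAX_SUFFIX_LENGTH = 255


def is_valid_suffix(suffix: str) -> bool:
    """Return True if the suffix is non-empty, short enough and uses only allowed characters."""
    if len(suffix) > MAX_SUFFIX_LENGTH:
        return False
    return _SUFFIX_PATTERN.fullmatch(suffix) is not None


@dataclass
class Dataset:
    """The five mandatory DataCite properties named in the text."""
    identifier: str        # must be a DOI
    title: str
    creator: str
    publisher: str
    publication_year: int


def format_citation(dataset: Dataset) -> str:
    """Build 'Creator (Publication Year): Title. Publisher. Identifier.'"""
    return (
        f"{dataset.creator} ({dataset.publication_year}): "
        f"{dataset.title}. {dataset.publisher}. {dataset.identifier}"
    )


if __name__ == "__main__":
    prefix = "10.1234"                 # placeholder prefix, not a real allocation
    suffix = "survey/2012_v1.0"        # hypothetical suffix chosen by the client
    if is_valid_suffix(suffix):
        record = Dataset(
            identifier=f"{prefix}/{suffix}",
            title="Example Survey Data",
            creator="Smith, J.",
            publisher="Example University",
            publication_year=2012,
        )
        print(format_citation(record))
```

A repository could run a check like `is_valid_suffix` before it submits a DOI for registration, so that malformed identifiers are caught locally.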
Next, the basics of the DataCite technical infrastructure were introduced. At the core of the DataCite service is the Metadata Store (MDS), which holds the metadata associated with data for which a DOI has been minted. The DOI system is based on the Handle infrastructure, and DOIs minted in the MDS are passed to the Handle server (maintained by CNRI) through which they can be resolved in the Global Handle Registry.
A practical demonstration of DOI-minting using the DataCite Metadata Store followed – a walk-through of the whole process can be seen in these videos.
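To make the resolution step more concrete: once a DOI has been registered, a user normally reaches the landing page by appending the DOI to a public resolver address, which redirects to the registered URL. The following is a minimal sketch under that assumption. The function name `resolver_url` and the example DOI are hypothetical, and `https://doi.org/` is used as the resolver; the workshop did not specify a resolver address.

```python
from urllib.parse import quote


def resolver_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Return the URL a user would follow to resolve the given DOI.

    Characters outside the unreserved set are percent-encoded; the slash
    separating prefix and suffix is kept as-is.
    """
    return resolver + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI, not a registered identifier.
    print(resolver_url("10.1234/survey_2012.v1"))
```

Following such a link takes the user to the landing page that the data client has registered for the DOI, which is why that page needs to stay open and up to date.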
Preparing the repository for DOIs (or building one from scratch)
Tom Parsons, from the ADMIRe project, presented the ongoing work on the development of the University of Nottingham’s data management infrastructure.
The University’s (approved but not yet released) RDM Policy states that “The University will provide mechanisms and services for storage, backup, registration, deposit, retention and preservation of research data assets in support of current and future access, during and after completion of research projects.” Part of the remit of the ADMIRe project is to evaluate current data management practices and identify any additional provisions that are necessary to meet this commitment.
To understand the needs of researchers and other stakeholders, a range of approaches were taken, including a survey of University research staff (total of 366 respondents), focus groups and interviews.
Some of the key results from the survey were:
- Data exists in a wide range of formats, including physical (e.g. lab notebooks). Many of these cannot be accommodated in the repository at present.
- On average an individual researcher stores data in six separate locations.
- Only 24% said they recorded metadata for their data
The full survey results can be accessed here.
Tom then addressed the question of why Nottingham has chosen DOIs as identifiers for their data. The three key deciding factors were:
- EPSRC requires use of “a robust digital object identifier” (although not necessarily DataCite DOIs)
- The British Library is a trustworthy institution
- DOIs are already familiar to researchers
ADMIRe are currently developing their metadata requirements and have arrived at a nine-item list of elements which was based on the DataCite minimum metadata element set. The nine fields are:
- Creator (the main researchers involved in creating the data)
- Publisher (The University of Nottingham)
- Publication Year (the date when it was or will be made publicly available)
- Identifier (DOI provided by DataCite or other internal identifier system if private)
- Subject (keywords to help with search)
- Research grant code (if applicable and linked to Agresso)
- Location (for files outside the system)
- Link to paper (DOI)
At present, the project is evaluating this minimal set and considering whether additional fields are necessary to meet requirements.
Tom concluded by summarising the current status of RDM at the University of Nottingham:
- Good understanding of requirements
- RDM services need to accommodate the range of repositories already utilised by the University: ePrints, EQUELLA, DigiTool and an in-house repository
- A SharePoint derivative has been selected to meet requirements (and be compatible with existing Microsoft systems)
- Have not yet minted a DOI!
Engaging with Researchers at the University of Exeter, Gareth Cole, Open Exeter
A major component of the Open Exeter project has been engaging with stakeholders within the University of Exeter to raise awareness of RDM and undertake training activities.
Gareth emphasised that engagement should be happening throughout the entire RDM lifecycle – not just at the policy and advocacy stages.
Firstly: why engage? There are many reasons:
- To better understand stakeholder requirements
- Awareness raising
- Communicate in advance of new policies etc
- Feedback from researchers and support staff to help build policies
- Embed RDM training into existing courses
Who to engage with:
- (Just about) everyone!
- All levels of seniority
- Academic and professional services staff
How Open Exeter have been engaging with researchers and other stakeholders
A multi-faceted approach has been adopted, including surveys, interviews and focus groups, talks and presentations for targeted audiences, and slots at existing staff meetings and workshops.
Gareth concluded by recounting some of the approaches that they have found to be most effective:
- Working within existing infrastructures to engage
- Targeting the audience (avoid ‘general’ communications)
- Make use of contacts who already have a relationship with your target audience (e.g. people will respond better to a message from the Head of Department than from an unknown member of Library staff)
The day concluded with a lively Q&A session where attendees put their questions about how to work with DataCite to current clients and the BL DataCite team.