Our latest, and final, workshop in this series looked at what is possibly the most challenging piece of the data-sharing puzzle to put into place – developing meaningful metrics to gauge impact and encourage proper attribution.
The workshop began with a look at impact and metrics at the wider level: how do we measure the value of Research Data Management itself?
Neil Beagrie’s work has taken a holistic approach to this question, using both quantitative and qualitative measures to build a picture of the real worth of research data management activities to UK higher education and to wider society.
The first question to ask is: why consider impact at all? At the most basic level, it makes your boss, funder or institution look good (although Neil stressed that this was not necessarily his own view!). It is also useful as a means of raising awareness of the full range of benefits a service offers, as most individual service users will only ever experience a few of them.
Neil also raised the fundamental difficulty with any attempt to evaluate impact: “Not everything that has value can be measured”. Metrics can, however, serve as proxies for outcomes or as indicators of relative value.
There are two key constraints on our ability to measure impact: timescale (it is harder to assess impact over a short period) and the number of variables involved – measuring the impact of an activity on society (many variables) is much more difficult than measuring its impact on a single institution (fewer variables).
Neil then introduced the Keeping Research Data Safe (KRDS) project – a cost-benefit analysis of data preservation and re-use in UK higher education. The project produced the Benefits Analysis Toolkit, a set of tools to help organisations evaluate the value of RDM activities.
The Toolkit has been used to evaluate the economic impact of the Economic and Social Data Service, and similar reports on the Archaeology Data Service and the British Atmospheric Data Centre are in progress.
The full range of Neil’s consulting work is available from the Charles Beagrie website.
Nigel, from Thomson Reuters, began with a review of the current research data landscape, including the drivers of and barriers to widespread data sharing. Some of the major factors preventing researchers from sharing their data – lack of recognition, unclear citation standards, and difficulty finding and accessing data – presented an opportunity for Thomson Reuters, whose Data Citation Index (DCI) was initiated to address some of these issues.
To populate the Index, Thomson Reuters are working with established data repositories. These repositories are selected based on the nature of the data held (is it of interest to the research community?), the quality and persistence of the data, and the level of descriptive information (metadata) available. There should also be some evidence that the data is used or cited in the relevant published literature. At present, 70 repositories are included in the service and a further 700 have been identified for indexing. See the essay on selection policy for more details.
The DCI takes a feed of descriptive metadata from the repository and analyses it to ensure it is ‘clean’ and conforms to the DCI’s own standards. Where metadata is inadequate, they may work with the repository to improve it. Content can be indexed at a variety of levels – repository, data study, data set and (possibly in future) microcitation.
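As a purely illustrative sketch, a “cleanliness” check of the kind described above might start by verifying that each record in the feed carries the fields any citation index needs. The field names below follow DataCite’s mandatory metadata properties; the DCI’s actual rules and internal record format are not public, so everything here is an assumption.

```python
# Illustrative only: a minimal completeness check of the kind a citation
# index might run on a repository metadata feed. Field names follow the
# DataCite mandatory properties; the DCI's real rules are not public.

REQUIRED_FIELDS = ("identifier", "creator", "title", "publisher", "publication_year")

def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

record = {
    "identifier": "10.5555/example-dataset",  # hypothetical DOI
    "creator": "Smith, J.",
    "title": "Example survey data",
    "publisher": "Example Data Archive",
    "publication_year": "",                   # empty, so it should be flagged
}

print(missing_fields(record))  # ['publication_year']
```

A real pipeline would of course go much further (checking controlled vocabularies, identifier syntax and so on), but a report like this is the natural starting point for the “work with the repository to improve it” loop mentioned above.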
The Index is still in an early stage of development, with the initial focus being on making research data more discoverable and accessible. For it to become a useful tool for measuring data use and citation there are still a number of challenges to be overcome. At the top of this list is metadata: levels of availability, quality and curation vary widely from repository to repository.
The dynamic nature of data repositories also poses a significant challenge. How the DCI deals with versions and updates to data has yet to be fully established as, again, practices vary by repository. One way to manage these issues is to maintain close partnerships with the repositories themselves.
Future plans for the Data Citation Index include:
- Expansion of content – there are currently over 2 million records in the index, with a target of 3 million or more by the end of 2014
- Increased ability to track citations
- Incorporation of data journals
The next two presentations looked at the role that DataCite DOIs can play in tracking usage and citation.
ODIN: ORCID and DataCite Interoperability Network, John Kaye, British Library
The British Library is one of the partners in the two-year ODIN project which seeks to connect researchers with their outputs by improved interoperability between identifier systems.
ORCID already offers researchers the ability to link their personal identifier with their publication record by adding articles to their profile. ODIN seeks to build upon this by developing a tool to enable import of DataCite records into ORCID. It is currently available in beta at http://datacite.labs.orcid-eu.org/
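To give a flavour of what such an import involves, the sketch below maps a DataCite-style dataset record onto the rough shape of an ORCID work entry. The field names and the input record are assumptions loosely modelled on ORCID’s public works schema – this is not the ODIN tool’s actual code.

```python
# Hypothetical sketch of the mapping an import tool like ODIN's has to
# perform: a DataCite-style record becomes an ORCID-style work entry.
# Field names are assumptions, not ORCID's or ODIN's real schema.

def datacite_to_orcid_work(rec: dict) -> dict:
    """Turn a minimal DataCite-style dataset record into an ORCID-style work."""
    return {
        "title": rec["title"],
        "type": "data-set",               # datasets are a distinct work type in ORCID
        "external-id": {                  # the DOI is what links profile and dataset
            "type": "doi",
            "value": rec["doi"],
        },
    }

work = datacite_to_orcid_work({
    "doi": "10.5555/example-dataset",     # hypothetical DOI
    "title": "Example longitudinal survey data",
})
print(work["external-id"]["value"])  # 10.5555/example-dataset
```

The key design point is visible even in this toy version: the DOI travels with the work entry, so the link between researcher and dataset stays resolvable rather than being a free-text claim on a profile page.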
The British Library’s main contribution to the ODIN project is to establish proof of concept for ODIN in the humanities and social sciences. A demo tool which allows users to explore the connections between authors and data outputs for a small set of data from the Centre for Longitudinal Studies and the Medical Research Council is available at http://odin-discover.eu/
A similar proof of concept is being conducted in the high energy physics (HEP) community by CERN. A major problem in HEP is that data often has many (sometimes hundreds of) co-creators, making attribution complicated.
The full proof of concept reports will be available in Summer 2013.
DataCite Statistics, Elizabeth Newbold, British Library (no slides available)
Elizabeth demonstrated the DataCite Statistics tool which is currently in beta and offers DOI resolution statistics for all content registered in the Metadata Store. This can be viewed by data centre or by allocator (e.g. the British Library is an allocator). A list of top DOIs by number of monthly resolutions is also published.
A current limitation of the tool is that the recorded DOI “failures” include all non-resolutions, even those resulting from a mis-typed DOI being entered into the resolver. Nor is it entirely clear what the number of resolutions actually means. A question for DataCite clients and users: how useful is this information?
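One way the mis-typed-DOI problem could in principle be reduced is a cheap syntax check before counting a lookup as a failure. The sketch below is an assumption about how such a filter might look, not a description of how DataCite actually classifies failures; the sample strings are invented.

```python
import re

# DOIs begin with "10.", a numeric registrant code, "/", and a suffix
# (per the DOI Handbook). A cheap syntax check like this could separate
# obviously mis-typed strings from genuine resolution failures. This is
# a hypothetical filter, not DataCite's actual classification logic.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def plausible_doi(s: str) -> bool:
    """Return True if the string is at least syntactically DOI-shaped."""
    return bool(DOI_PATTERN.match(s.strip()))

attempts = [
    "10.5555/example.dataset.1",  # well-formed (hypothetical DOI)
    "10.5555 example",            # missing the slash
    "doi:10.5555/x",              # leading "doi:" label
    "1.5555/x",                   # mangled prefix
]
print([plausible_doi(a) for a in attempts])  # [True, False, False, False]
```

Even so, a syntactically valid DOI can still fail to resolve (e.g. one that was never registered), so a check like this would only trim the noise, not eliminate it.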
Finally, Paul Needham from Cranfield University was due to present a talk about IRUS-UK (Institutional Repository Usage Statistics), but was unfortunately ill on the day. His slides can be found here. If you are interested in finding out more about IRUS-UK, I’d also recommend watching one of the recorded webinars available on their site.
All presentations from this, and all previous, workshops are available here.