Workshop report: Managing and citing sensitive data

This workshop, held at the British Library on October 29th, focused on the challenges of making data that is ethically, legally or commercially sensitive available and citable.

More than 30 participants attended the event, which heard contributions from data repositories that deal with such data daily and RDM researchers who are working to improve the way we share and cite sensitive data.

Veerle Van den Eynden (UK Data Archive) opened the workshop with a useful overview of the legal and ethical challenges faced by data repositories and the UKDA’s approach to managing sensitive data.

Legal concerns for data repositories

The Data Protection Act is likely to be a major concern for repositories – such as the UKDA – which store research relating to human subjects. Veerle emphasised that the DPA should not inhibit ethically conducted research and that the act no longer applies if data is effectively anonymised.

Research data held by public sector organisations may also be subject to Freedom of Information requests. Data containing personal information is exempt from FOI requests, as is any information held under a confidentiality agreement. If there is a significant risk that release of data would lead to legal action against the data archive, then it should not be released under FOI.

Veerle stressed that the responsibility for ethical data management is a shared one, with the repository/archive, the data creator and the end-user all playing a part.

The role of the repository

The key functions of the repository/archive are to enable and support the ethical re-use of data and to maximise accessibility. Fundamentally, the role of a repository is to provide trust in the data sharing process.

The UKDA takes a variety of approaches to managing sensitive data: anonymising personal information, restricting access to approved users, applying embargo periods or, in some cases, storing data in a data “enclave” with very limited access and strict controls on re-use (see Secure Data Service).

Even when access to UKDA data is restricted, the metadata and documentation are openly accessible so that researchers can find out what is available and how they can gain access to it. Information on rights, access and re-use conditions is contained in the metadata.

The role of the researcher

The ethical implications of research should be considered from the beginning of a research project and reviewed throughout. Researchers should first consider whether it is necessary to collect personal data and, if so, should anonymise the data or seek informed consent from participants to use their personal data.

The data repository has a duty to check that the depositor has taken the necessary precautions and also to agree upon the appropriate access conditions for the dataset.

Data licences

The Licence Agreement which depositors are asked to sign sets out the obligations of both the depositor and archive. The UKDA commits to making the data available only under agreed access and usage conditions and the data owner is asked to guarantee that the material does not breach privacy or data protection laws.

At the other end of the data sharing process, data users are required to sign a legally-binding End User Licence that specifies the conditions under which the data can be used. Respecting the confidentiality of data subjects is paramount, with users required to “preserve at all times the confidentiality of information pertaining to individuals and/or households in the data collections where the information is not in the public domain”. The licence also restricts the re-use of data for commercial purposes unless further permissions have been sought.

Certain particularly sensitive data held by the UKDA is subject to additional Special Licence conditions or access restrictions to ‘Approved Researchers’ only.

Note: The UKDA data licences may be re-purposed by other archives/repositories with appropriate attribution to the University of Essex as copyright owner. Users should note the following statement from the UKDA Director:

“The licence is provided ‘as is’, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the licence or the use or other dealings in the licence.”

Managing biomedical and personal data

Next up was Jonathan Tedds (University of Leicester), who gave an insight into the management of biomedical and personal data from his perspective as Project Lead on the BRISSkit project. Jonathan presented some results from a 2010 research data security survey undertaken at the University of Leicester. The survey of more than 3000 researchers highlighted the significant number of university researchers who hold potentially sensitive data. Leicester is trialling a system of using RDM costing as a trigger to alert them to potentially sensitive research data holdings: researchers are asked if their data may be sensitive or require additional security measures. If they answer ‘yes’ then an alert is sent to the IT Research Support team.

Testing of DMP Online was also undertaken, although the response rate from researchers was fairly low (highlighting the difficulty of engaging researchers with RDM issues). Some issues raised were that certain funders relevant to the researchers questioned were not listed, and it was unclear how the DMP questions fit with existing NHS requirements for medical research data.

Cathy Pink from the University of Bath then gave a very interesting talk on the work that the Research360 project is doing on the challenges of working with commercial research partners. Bath has a large number of collaborative and commercially sponsored research projects, so the ability to effectively manage data produced under such arrangements is particularly important.

What do we mean by ‘commercially sensitive’ data?

The data itself could have some commercial value (e.g. as evidence for a patent application) or it may contain information about the company which, if released, could damage its business. It could also be the commercial partner’s systems or processes that are commercially sensitive, for example a proprietary method of collecting data – even when the data itself is not ‘sensitive’.

RDM benefits for commercial partnerships

The Research360 project is looking at some of the benefits of RDM for commercial collaborations. For the university, having good RDM practices and infrastructure in place should attract high-quality partners and strengthen existing relationships. For commercial organisations, a link with a university provides access to researchers and students who are experienced in RDM – skills that the company may not have in-house. Such a partnership may be particularly appealing to SMEs, as it would provide access to a robust and secure data management infrastructure that may otherwise not be available to them.

Researcher perceptions of data sharing

A survey conducted at Bath showed that the majority of researchers questioned either didn’t want to share their data or believed that they were not permitted to. Commercially-funded researchers were found to be twice as likely as those with other funding sources to be unwilling or unable to share.

RDM policy and partner data

A key question for a university like Bath is whether it should store data produced by project partners. If the partner has access to its own repository (e.g. when the partner is another HEI) then this is unnecessary, but where the partner is a commercial organisation without access to repository facilities, the answer is less clear. Cathy noted that the legal implications of storing partner data are not clear.

This has implications for Bath’s RDM policy, which must accommodate the diverse range of collaboration agreements for projects underway. There are also limitations for the policy, as it can’t require embedded students or contracted researchers to comply.

Citing commercially sensitive data

Cathy highlighted some outstanding questions for data centres that are seeking to make commercially sensitive data citable.

  • Persistent Identifiers: Is a DOI appropriate if data is not immediately publicly available? Should DOI minting be delayed until any embargo is lifted?
  • Publication date: It is not clear how this will be encapsulated in the metadata. Could be date of data deposit, date of data release or end of project. Do we need additional metadata fields to deal with this?
  • Embargoes: What do we do with “permanently” embargoed data? May want to make available in the very long term (for historical purposes).

Data management at STFC

Brian Matthews from the STFC’s Rutherford Appleton Laboratory took a researcher-centric view of the ethics of data sharing, with a look at the cultural barriers to widespread data-sharing and the implications these have for developing RDM policies.

The challenge faced by institutions and repositories is summed up by a conflict within the RCUK Common Principles on Data Policy, which identify the sharing of research data as a ‘public good’ yet also acknowledge the need to observe legal, ethical or commercial constraints on data release. The major test for institutions is to develop Data Management Plans and RDM infrastructures that can accommodate this conflict.

STFC’s data repository is similar to a university repository in that data is both created and stored within the same facility. However, it can also be considered to be like a ‘subject repository’ as the data it collects is often discipline-specific, although there is no mandate to disseminate this data to others within the discipline. STFC repository policies therefore must be designed to accommodate the unique needs of the user community. Brian highlighted the example of the data policy for the ISIS Neutron Source, which has the following key features:

  • All (non-commercial) raw data and metadata obtained as a result of free access to ISIS is made publicly available.
  • Access to the catalogue is open to anyone but requires registration.
  • 3 year embargo on data generated from ISIS experiments. As ISIS is the data custodian, data creators must make a special case if they desire a longer embargo period.
  • The terms of the policy apply to raw data. Data generated by further analysis may be subject to other contractual obligations.
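
As a rough illustration of how the embargo rule in this policy might be checked, here is a minimal Python sketch. It is not STFC's implementation: the function names are hypothetical, and it assumes that the 3 year embargo runs from a known start date (the report does not say which date is used), that data becomes available on the day the embargo ends, and that commercial data is never released under this policy.

```python
from datetime import date

# Default embargo length, taken from the ISIS policy summarised above.
EMBARGO_YEARS = 3


def embargo_end(start: date, years: int = EMBARGO_YEARS) -> date:
    """Return the date on which an embargo starting on `start` ends.

    Assumption: the embargo is counted in calendar years from `start`.
    """
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # `start` is 29 February and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


def is_publicly_available(start: date, today: date,
                          commercial: bool = False,
                          years: int = EMBARGO_YEARS) -> bool:
    """Hypothetical check: is raw data open to the public on `today`?

    Assumption: commercial data is never released under this policy,
    and other data is released on the day the embargo ends.
    """
    if commercial:
        return False
    return today >= embargo_end(start, years)
```

A longer embargo agreed as a special case could be modelled by passing a larger `years` value, for example `is_publicly_available(start, today, years=5)`.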

Citing the data

At present, STFC issues DOIs on a per-experiment basis for experiments which are collecting raw data. In future, it may be necessary to apply finer granularity to data holdings.

DOIs are issued at the beginning of an experiment, as it is the experiment itself that STFC wish to identify. Issuing at an early stage also familiarises researchers with the DOI and encourages them to use it.

STFC provide a recommended citation format for data: [author], [date], [title], [publisher], [doi].
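
The recommended format can be sketched in a few lines of Python. This is only an illustration of the field order given above: the function name is hypothetical, and the comma separator is taken directly from the pattern as written here rather than from any published STFC style guide.

```python
def format_data_citation(author: str, date: str, title: str,
                         publisher: str, doi: str) -> str:
    """Join the five citation elements in the recommended order."""
    fields = {
        "author": author,
        "date": date,
        "title": title,
        "publisher": publisher,
        "doi": doi,
    }
    # Refuse to build a citation with an empty element.
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError("missing citation fields: " + ", ".join(missing))
    return ", ".join(value.strip() for value in fields.values())
```

With made-up placeholder values, `format_data_citation("A. Researcher", "2012", "Example experiment", "STFC", "doi:10.1234/example")` returns `A. Researcher, 2012, Example experiment, STFC, doi:10.1234/example`.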

An issue for STFC, which has a policy of embargoing data while research is ongoing, is that DataCite minimal metadata has the potential to leak information before an embargo ends and could break STFC policy. At present, they are dealing with this by issuing DOIs with only very minimal accompanying metadata, which is then updated when the project is more advanced.

Presentations from the workshop are available here.
