Open Data and Privacy Concerns in Biomedical Research

Privacy has long been the focus of debates about how to use and disseminate data taken from human subjects during clinical research. The increasing push to share data freely and openly within biomedicine poses a challenge to the idea of private individual information, whose dissemination patients and researchers can control and monitor.

In order to address this challenge, however, it is not enough to think about (or simply re-think) the meaning of ‘informed consent’ procedures. Rather, addressing privacy concerns in biomedical research today, and the ways in which the Open Data movement might transform how we think about the privacy of patients, involves understanding the ways in which data are disseminated and used to generate new results. In other words, one needs to study how biomedical researchers confront the challenges of making data intelligible and useful for future research.

Efficient data re-use comes from what the Royal Society calls ‘intelligent openness’ – the development of standards for data dissemination which make data both intelligible and assessable. Data are intelligible when they can be used as evidence for one or more claims, thus helping scientists to advance existing knowledge. Data are assessable when scientists can evaluate their quality and reliability as evidence, usually on the basis of their format, their visualisation and the extra information (metadata) made available alongside them in databases.

Yet the resources and regulatory apparatus for securing proper curation of data, and so their adequate dissemination and re-use, are far from being in place. Making data intelligible and assessable requires labour, infrastructures and funding, as well as substantial changes to the institutional structures surrounding scientific research. While the funding to build reliable and stable biomedical databases and Open Data Repositories is increasing, there is no appropriate business model to support the long-term sustainability of these structures, with national funders, industry, universities and publishing houses struggling to agree on their respective responsibilities in supporting data sharing.

Several other factors are important. For instance, the free dissemination of data is not yet welcomed by the majority of researchers, who do not have the time or resources to share their data, are not rewarded for doing so and often fear that premature data-sharing will damage their competitive advantage over other research groups. There are intellectual property concerns too, especially when funding for research comes from industry or specific parts of government such as defence. Further, there are few clear standards for what counts as evidence in different research contexts and across different geographical locations. And more work needs to be done on how to relate datasets collected at different times and with different technologies.

The social sciences and humanities have an important role in helping scientific institutions and funders develop policies and infrastructures for the evaluation of data-sharing practices, particularly the collaborative activities that fuel data-intensive research methods. An improved understanding of how data can be made available so as to maximise their usefulness for future research can also help tackle privacy concerns relating to sensitive data about individuals.

When it comes to sharing medical records, it is now generally agreed that obtaining ‘informed consent’ from individual patients is simply not possible, as neither patients nor researchers themselves can predict how the data could be used in the future. Even the promise of anonymity is failing: new statistical and computational methods make it possible to retrieve the identity of individuals from large, aggregated datasets, as shown by genome-wide association studies.

A more effective approach is the development of ‘safe havens’: data repositories which would give access to data only to researchers with appropriate credentials. This could potentially safeguard data from misuse, without hampering researchers’ ability to extract new knowledge from them. Whether this solution succeeds ultimately depends on the ability of researchers to work with data providers, including patients, to establish how data travel online, how they are best re-used and how data sharing is likely to affect, and hopefully improve, future medicine. This work should be supported and rewarded by universities, research councils and other science funders as an integral part of the research process.

To learn more, read the report ‘Making Data Accessible to All’.