The Quest for a Better, Broader, Safer Use of Health Data for Research and Analysis

Commissioned in February 2021 by the Secretary of State for Health and Social Care and led by Professor Ben Goldacre [1], the Goldacre Review published a report on 7 April 2022. The report sets out findings and recommendations on how to ensure the efficient and safe use of NHS data for research and analysis, for the benefit of patients and the healthcare sector (the "Report") [2].

The launch of the Goldacre Review was closely followed, in June 2021, by the publication by the Department of Health and Social Care ("DHSC") of its draft policy paper 'Data saves lives: reshaping health and social care with data', later updated in February 2022 (the "Health and Social Care Data Strategy"). The Goldacre Review and the Health and Social Care Data Strategy are intended to complement each other. On 15 June 2022, the DHSC published the final version of the Health and Social Care Data Strategy, "Data saves lives: reshaping health and social care" [3], which incorporates its response to the Report.

The Report proposes a new framework for data sharing across the NHS and makes 185 recommendations aimed at encouraging the safe and efficient use of data in academic and clinical research environments.

We have chosen to focus on two specific data protection considerations in the Report:

  • Section 1 looks at the structural challenges to data sharing for research using NHS data; and
  • Section 2 highlights the privacy concerns raised by the use of NHS data in the healthcare sector, and the remedies the Report proposes.

Section 1: Structural Challenges to Data Sharing for Research Using NHS Data

The collective NHS databases hold 73 years' worth of health data from an ethnically diverse population. If utilised appropriately, this data could be a rich source of material for clinical and academic research.

The Report reminds us that data was one of the main drivers of the global response to the COVID-19 pandemic, and that it can again be instrumental in relieving the post-pandemic backlog. Access to data in the life sciences sector allows for innovation in both medicines and medical technology.

The NHS is not alone in recognising the potential of its data. Following a period of intense focus on public health, the EU is also moving to harness the potential of its public and private sector data [4] to improve "health, the environment, energy, agriculture, mobility, finance, manufacturing, public administration, and skills", alongside the development and testing of artificial intelligence.

Regulation (EU) 2022/868 of the European Parliament and of the Council of 30 May 2022 on European data governance and amending Regulation (EU) 2018/1724 (Data Governance Act) entered into force on 23 June 2022. The Data Governance Act ("DGA") aims to increase trust in data sharing and to facilitate the reuse of certain data held by the public sector. To that end, the EU promotes the use of secure processing environments and of anonymisation techniques such as differential privacy and the creation of synthetic data. The DGA will also create common European data spaces in strategic domains such as health, as well as in other sectors. The new rules will apply from 24 September 2023, 15 months after the DGA's entry into force.

A. Previous attempts to pool research data

The proposals set out in the Report are not the first attempt to pool NHS data for clinical research. Both the 2013 care.data programme and the GP Data for Planning and Research (GPDPR) programme sought to centralise NHS data, but both led significant numbers of people to opt out of having their records shared.

The Report attempts to alleviate data protection concerns by limiting access to the data pool using Trusted Research Environments ("TREs") and notes that the GPDPR dataset will now be accessible only via a TRE. According to NHS Digital's notice "The data is accessed in a secure location rather than being downloaded and is de-identified to make sure that patients cannot be identified. The use of the data is tracked and no data can leave the secure environment, providing greater assurance that sensitive data is handled securely." [5]

Notably, NHS Digital has not yet announced a start date for the data collection. It first aims to reach consensus with all the stakeholders involved in the discussions and to make any changes required to ensure that the relaunch of GPDPR is a success.

B. Key concerns in relation to data sharing

The Report identifies five main challenges to the sharing of health data for research purposes, as well as setting out proposals for overcoming these challenges:

1. Security and data privacy are of key concern. The Report proposes implementing TREs, which will allow researchers to access complete patient data sets on which they can conduct comprehensive analysis in a secure environment. This proposal is considered in more detail in Section 2 below.

2. The preservation of monopolies over access to data by individuals, teams or organisations is identified as a second challenge. The Report proposes that open professional discussion leading to resourcing choices, together with recognition that rewards those who collect data and then share it with a wide range of other users, can help to alleviate this challenge.

3. The third challenge identified is a concern among some professionals that patient records will be used for performance management purposes. This would not only depart from the records' primary purpose but would also not be conducive to effective feedback for quality improvement and governance. The Report proposes robust governance that aims to minimise the use of misleading performance metrics while highlighting the benefits of positive audit and feedback.

4. The unmanageably large number of data controllers whose permission is needed to conduct research is a further barrier identified in the Report. Patient data is currently held and managed by roughly 6,500 GP practices and 160 NHS Trusts, each acting as the data controller for its own data. The Report suggests two possible solutions: either the formation of a single national body capable of acting as data controller for all NHS patient records, or the creation of an "approvals pool" in which GP practices and NHS Trusts nominate a single body to review and approve data access requests on their behalf. Both approaches would deliver much greater economies of scale and help to reduce the heavy governance burden currently incurred by duplicating this work within each NHS Trust or GP practice.

5. Concerns over the ethics and commercialisation of NHS patients' data are identified as an additional challenge. The Report's recommendation to counter this challenge is threefold: (i) to use TREs, which can provide assurance and transparency around the quality and reproducibility of commercial analyses; (ii) to seek public consensus on sharing data with commercial innovators; and (iii) to avoid exclusive arrangements between the NHS and the private sector, and instead to negotiate equity in innovations where NHS data is pivotal to their development.

Section 2: Privacy Concerns and Remedies in Relation to the Use of NHS Data

The Report highlights as a key challenge the vulnerability associated with the use of pseudonymised data in the healthcare sector. It also highlights the difficulty of applying the data minimisation principle in the context of clinical research, and promotes the use of TREs for research using health data, as they allow researchers to access data safely and remotely.

A. The vulnerabilities of pseudonymised data in the healthcare sector

Pseudonymisation is commonly used in the healthcare sector as a practical method of protecting patient privacy, and the Report recognises its value. Data is pseudonymised when direct identifiers such as the data subject's name, address or date of birth have been removed, so that data subjects cannot be identified without the use of additional information.

Pseudonymisation is especially useful, as the Report acknowledges, in preventing accidental viewing of data (such as where information about a friend or family member is spotted while working with the data). However, due to the particularly personal nature of health data, pseudonymisation does not entirely remove the risk of re-identification, especially for bad actors with a personal connection to the data subject. The risk of re-identification in pseudonymised data sets also increases as the data set grows, posing particular risks for large pools of NHS data.

The Report also highlights the importance of accurately categorising data, and argues that the existing definitions of "anonymous", "identifiable" and "linked" data do not sufficiently reflect the most common categories of data. The Report proposes introducing a further category of data: "pseudonymised but readily re-identifiable" data. The Report suggests that this label more accurately reflects the contextual risk such data carries, including the risk of re-identification by bad actors, and helps to avoid overstating the privacy protection provided by removing only direct identifiers.

The Report explains that accurate categorisation of data, and the introduction of more nuanced and descriptive categories, will allow informed, risk-based choices to be made and will help to earn public trust. The Report also explains that TREs, discussed in more detail below, are the best mechanism for addressing the re-identification risks inherent in pseudonymised data, because of the privacy and productivity benefits they offer.

B. Promoting Secure Data Environments for Health Research

Data minimisation is a key principle of the UK GDPR, but it can pose a particular difficulty for clinical research. To comply with the data minimisation principle, a researcher should have access only to the personal data that is strictly necessary for the research being conducted. However, as the Report explains, it is not always clear what data is strictly necessary for a given research project. To take an example from the Report, a researcher studying the history of rheumatoid arthritis would need more than arthritis-specific health data, since a number of other medical problems can arise from, interact with or even be confused with rheumatoid arthritis.

To address this difficulty, the Report turns to the concept of the TRE, which can be described as a secure computing environment that holds data and gives researchers access to the data they need for ethically and legally approved research projects. As TREs are a relatively new way of managing data, there is not yet any established definition. However, the Report describes a TRE as:

"a secure environment that researchers enter in order to work on the data remotely, rather than downloading it onto their own local machine. Users can extract and download the answers from their analysis - such as results tables, or graphs - but individual patients' data always stays within the secure environment."

TREs give researchers access to complete patient data sets on which they can conduct comprehensive analysis in a secure environment. Because the data cannot be removed or downloaded, the risk of researchers identifying the data subjects of the pseudonymised data is very low. Crucially, the principle of data minimisation is also met by limiting researchers' access to the data, rather than by limiting the data itself.

TREs have already played a role in the UK's response to the COVID-19 pandemic, demonstrating the value of trustworthy data access for health research. The Report sees an important role for TREs in opening up access to NHS data while protecting privacy and fostering public trust.


Increased transparency, the use of Trusted Research Environments and a more nuanced concept of pseudonymisation in the healthcare sector all offer an opportunity to overcome data protection concerns and harness the power of NHS data. The Report recognises that none of these steps alone will be sufficient to achieve this goal, and that a concerted effort across all of these areas is necessary to enhance the NHS's data capabilities and to deliver better, broader, safer use of NHS data.

The recommendations set out in the Report and the newly published Health and Social Care Data Strategy will help to set the direction for the use of data in a post-pandemic healthcare system, with a view to "delivering faster and more innovative treatment and diagnosis".

With thanks to Jack Firman, associate, and Ben Green, trainee solicitor, for their contributions to the article.