Biobank data leaked 198 times in past year
Confidential medical data held by UK Biobank has been leaked online at least 198 times in the past year.
The Biobank has issued almost 200 legal threats to researchers, urging them to take down the unlawful publication of the health data of thousands of British people, according to experts tracking the breaches.
The majority of the traceable leaks stem from the US or China.
On Thursday, Ian Murray, the technology minister, made an emergency statement revealing that half a million Britons who were part of the Biobank had their health data put up for sale on Alibaba, the Chinese website.
He said he could not “100 per cent” assure the public that outside actors accessing the data “could not identify someone from this, but that would be a very advanced way in which that data would have to be used”.
The genetic data, medical history and biological samples of the 503,000 volunteers were found for sale on one of the world’s largest online marketplaces. The listing has since been taken down, and officials do not believe the information was sold to anyone.
The UK Biobank is the world’s most comprehensive dataset of biological, health and lifestyle information and is used by researchers globally.
While it does not contain personal information such as names or addresses, it does include age and the month and year of birth.
Critics have called for restrictions on China’s ability to access the data, as well as improvements to security arrangements that have been described as “lax”.
Prof Luc Rocher, an associate professor at the University of Oxford’s Internet Institute, has compiled a list of every time the UK Biobank issued a legal request to have data taken down.
Almost 200 of these “copyright takedown requests” have been issued in less than a year.
Researchers are increasingly required to publish the code behind their work, which sometimes includes partial or entire Biobank datasets.
This occurs most commonly on GitHub, a popular online code-sharing platform, which allows other scientists to cross-check or build upon researchers’ workings.
Biobank data ‘quite easy to find’
Prof Rocher said all of these data ended up being published on this platform and “it’s quite easy still, at the moment, to find UK Biobank data that is there, uploaded, often by mistake, by researchers around the world who had received the files”.
There were many who had received the data legitimately and were contacted directly by UK Biobank to have it taken down, Prof Rocher said.
But the professor added that “there’s a number of people also who didn’t receive the data legally”.
“Students, who get it from their supervisor. Researchers, who get it from a friend, from a colleague. In this case, what typically UK Biobank does, as it cannot contact them because they don’t even know who these people are, is resort to a last option, a bazooka option, which is a copyright takedown notice,” Prof Rocher said.
Participants were between 40 and 69 years of age when they joined the study between 2006 and 2010.
The bank contains more than 15 million biological samples, including urine, saliva and blood, data from which indicate organ function, disease risk and other biological information.
It also includes questionnaire information that participants have shared on habits such as sleep, diet, work environment, mental health and health outcomes.
Of the 198 cases that Prof Rocher has tracked, 75 have verified source locations.
The most common origin of the breaches was the US, where at least 24 recipients of such notices were based, followed by China, where there were 21.
There have also been seven in the UK, five in Germany and four in Hong Kong, as well as at least one each in South Korea, Qatar and the United Arab Emirates, among the 14 countries listed.
The total number of breaches is likely to be higher given that fewer than half have been traced.
Prof Rocher said it was not always possible to know what was in the datasets being taken down but they often contained “genomic data, health data, hospital visits, job occupation and a lot of very sensitive data”.
The professor added: “It’s quite easy for people who don’t even know what the data is, don’t even know what UK Biobank is, to just come upon this data online, and I think that’s quite a big issue. What is frustrating here is that we’ve known this for a while.”
Prof Rocher said upgrades to security had been inadequate, and that a “simple” fix would be to stop people from downloading the whole file.
Three Chinese research institutions were identified as the source of the breach that had resulted in the data being advertised on Alibaba. They have now had their access revoked.
Dame Chi Onwurah, the chairman of the Government’s science, innovation and technology committee, said it “raises serious questions about whether lessons have been learned from repeated data breaches and leaks, and whether robust data management practices are being enforced at publicly funded bodies”.
Dr Nicola Byrne, the Government’s national data guardian, said she was “profoundly concerned to learn that the confidential data participants entrusted to UK Biobank in good faith has been found available for sale online”.
“Participants deserve clear answers from UK Biobank about what happened, why it happened and what will change to prevent this happening again,” she said.
The charity has referred itself to the Information Commissioner’s Office because of the breach.
An investigation by The Guardian was previously able to “re-identify” a Biobank participant by using their month and year of birth and details of a major surgery.
In response to the breach, Prof Sir Rory Collins, the chief executive of UK Biobank, said: “We apologise to our participants for the concern this will cause. We take the protection of your data extremely seriously.”
He added: “Even though we only ever share de-identified data and have no evidence of any of you being identified unwillingly, we don’t want any use by anyone who has not been approved for access.
“We are sorry that this incident has occurred and hope you are reassured by the swift and decisive action we have taken.”
