After six months of covering the merger of technology and medicine for Re/code, I’ve come to believe one thing very strongly: The next great insights into health and disease, and the resulting breakthroughs in diagnostics and treatments, are likely to emerge at the intersection of these disciplines.
I’ve become convinced because nearly every researcher I speak with drives home this point in words and deeds, with advanced medical research increasingly relying on genomic sequencing and other forms of big-data analysis.
But it also simply makes sense: These tools and techniques are allowing scientists to understand biology at a more basic and fundamental level than has ever been possible in the past. They’re steadily unlocking the programming code of life itself.
These approaches are especially promising for devising personalized cancer treatments, based on the specific mutations within a person’s particular tumor.
This realization has shifted my thinking on online privacy. I’ve been a frequent critic of the policies and blunders of various Internet players, and will always believe that we should be thoughtful and deliberate about how we manage personal data in this Information Age.
As I pointed out recently:
But in the context of health care, I’ve come to believe that we need good and specific reasons to cling to our data. The default should be to look for safe ways to share.
We can’t afford to mindlessly indulge our abstract fears about privacy, and generalized resentment of big-tech businesses, when there is so much to be gained for society.
Health data is, as one researcher put it to me, the “grist for the mill” — and as it is, far too much of it is locked away in paper filing cabinets of clinics, isolated by well-meaning but out-of-date laws, or jealously guarded by corporations.
This all came to mind late last week, when Google revealed plans to conduct a “Baseline Study” to “establish a basic understanding of a healthy physiology at this most fundamental level.”
The Mountain View, Calif., technology giant’s research division plans to begin with a small pilot program surveying 175 healthy people, then will collaborate with researchers at Duke and Stanford on a far broader study.
Participants will provide blood, saliva and other samples, and will undergo full genomic sequencing and other tests. Google will analyze the data using its sophisticated algorithms and powerful computer network.
“This could become a reference tool that could inspire even more research studies,” Google said in a press release. “And in the long run, we hope this could be a small contribution toward helping the medical profession find new, proactive ways to keep us healthy.”
The company stresses that the effort is strictly for science, and says it’s taking pains to protect patient confidentiality. The study will be overseen by an institutional review board, samples will be collected by the health institutions, and the data will be given to Google only once the names and Social Security numbers have been scrubbed.
But one point did initially give me pause: The information handed over will include full genome sequences of individual participants.
Google’s Baseline Study will remove names & ss #s, but provide access to “participants’ entire genomes.” Does that count as de-identified?
– James Temple (@jtemple) July 25, 2014
Curious about the implications of that, I contacted Hank Greely, a Stanford law professor focused on the ethical and legal issues associated with biomedical technologies. He said that a full genome sequence, the three billion DNA base pairs that make you you and me me, can only be anonymous if you define “anonymity” in a narrow way.
“I’m not saying people shouldn’t sign up for this,” he said in an interview. “But they need to know going into it that nobody can honestly promise you anonymity or confidentiality.”
That’s because once someone has the sequence, they could theoretically match it against any other place that data lives, for example on heredity sites like Ancestry.com, 23andMe and Family Tree DNA. In fact, several dozen adoptees reportedly used DNA tests on the last of those sites to figure out the likely surname of their biological fathers, the BBC reported in 2008.
As these tests become cheaper and more popular (the cost of full genome sequencing has plummeted a millionfold in the last decade, and simpler SNP tests already cost less than $100), there are likely to be more places where this sort of data is available.
But with that all said, after several exchanges with Google, the risks in this very specific circumstance seem tiny to me.
The company isn’t hosting this data publicly, so the only worrisome scenarios are that someone hacks into it, or a rogue Google X employee decides to abuse it for reasons that would also be difficult to fathom.
Down the road, if the study produces useful insights, Google may share some information with outside researchers, but only those working on formal studies also approved by institutional review boards. It won’t ever hand it out to the public, the company says.
I have two tests that I try to apply when thinking about appropriate privacy boundaries: Do consumers have choice, and do they have transparency?
In this case, the answer appears to be “yes” to both. The study is purely voluntary; no one will be compelled to participate. And I’m assured that the consent form explicitly describes the possible risks associated with sharing genomic data, precisely as Greely advocates.
Given these precautions, I’m prepared to say that I’d be comfortable participating in this study — at least if I qualified as healthy, which, unfortunately, I probably don’t.
There’s no way of knowing whether Google’s study will actually produce any genuine scientific leaps, but there’s every reason to believe that one analysis of this sort soon will.