In December 2011, Samantha Clark opened up her browser and typed opensnp.org into the address bar. The page loaded; she hit the ‘Sign Up’ button, entered her name and email, and created a password. At the top of the page was a warning in red to read the disclaimer, a page-long list of the dangers of using the service. These included being discriminated against by employers or insurers, zero anonymity, government snoops, and potentially distressing discoveries about herself. She read it, checked a box that said, “I understand the warning and am willing to take these risks,” and then signed up.
“They really try to convince you not to do it,” Clark says.
Clark, a bioinformatics undergrad at the University of Toronto, had been broadcasting her private musings to Facebook and Twitter for years, but this social network was different. Instead of uploading intimate photos of herself and her friends, she'd be sharing a portrait of what makes her her, and there was no “share with Friends only” option. She was uploading her DNA.
Members of openSNP upload their genetic data along with details such as their sex, age, eye color, location, Fitbit data, and medical history, all for anyone to see and analyze. The record lives on indefinitely in an open-source database, which is why the detailed warning on the sign-up page deserves a close reading. But for Clark, the possibilities outweigh the risks: She wants scientists to have access to the genetic information of more people around the world.
“The more people, the easier it will be for citizen scientists to work with the data and make new discoveries. If I want other people to do this and help science, I should set the tone,” said Clark, 25. “The benefit will be infinite as it picks up pace.”
Clark had gotten her DNA analyzed by personal genomics company 23andMe. While a user can download her genetic profile from the Mountain View-based start-up, only 23andMe and its partners have full access to the company’s genetic treasure trove. Clark was uploading her 23andMe profile to openSNP so anyone could use it.
openSNP, an open-source, web-based social network and data bank for DNA information, is the brainchild of Bastian Greshake, a bioinformatics doctoral student at Goethe University in Frankfurt, Germany, and a self-proclaimed open-source geek. Greshake was the first person to upload his data to the platform. Clark was the second.
openSNP made it easier to crowdsource genetic research, but it also echoed an incipient feeling rippling through the Internet: a desire to reassert control over our digital selves. Big, opaque companies like Google and Facebook, to which we have entrusted so much personal information, now effectively control the data about everyone who uses their services and the insights that can be gleaned from it. To open-source evangelists, it seemed the same thing was about to happen to our genes. People just hadn't noticed.
The world of open DNA
More than three years after Clark first put her genes on the web for all to see, roughly 1,500 others have joined her on openSNP. It isn't the only social network for genetic exhibitionists, either. Just as someone might have profiles on Facebook, Twitter, and LinkedIn, people are starting to upload their genetic information to multiple sites. Clark is active on Genomera, SNPedia, and Promethease, all grass-roots platforms for genetic information and research. People have even uploaded their genes to the collaboration tool GitHub.
This all adds up to a citizen-genetics movement that is just getting started. It is being spurred on by people like Sharon Terry, an advocate for public participation in genetics research, and Melanie Swan, the Silicon Valley entrepreneur behind DIYgenomics, which coordinates crowdsourced genetics research.
“What we’re trying to do is imagine a system where the patient says, ‘I want my data. I want it open. I want researchers to work on it. I want them to share it.’ We’re trying to build this alternate universe,” said Stephen Friend, the director of Sage Bionetworks, a nonprofit that champions open science.
But that alternate world — where the lines between researcher and patient are blurred — is still quite small. Swan said via email that she “started DIYgenomics as a way to mobilize [her] personalized genomic data from 23andMe, to actually use the information pro-actively to manage [her] health, and experiment collaboratively with others in a fun way.” Such experiments can foster intimate social interactions among DIY geneticists, and the hope is they could one day help advance personalized medicine.
That's what drew Clark to the open-source genetics movement, but the dream has not been realized yet. The community is still too small to support rigorous scientific analysis. Clark is participating in six of the 16 studies DIYgenomics has organized on Genomera, but she says she hasn't heard from the organizers in a while. She's also not aware of any published studies that have actually used her data. Instead, the biggest benefit for her has been the interactions she's had online with other genetic social networkers.
Through the openSNP messaging system, for instance, Ian Logan, a retired physician and citizen geneticist from the U.K., reached out to tell her about a program he'd built that scans 23andMe data for rare genetic traits. He wanted her to check her results and give him feedback. He has offered the same service, free of charge, to every openSNP user.
“I thought that was really cool,” Clark said. She didn't learn anything particularly useful, she says, but she liked connecting with citizen scientists around the world through openSNP.
Others might find that creepy. Logan doesn't seem to be doing anything harmful with the data, but what if people with nefarious intentions started downloading the DNA data being given away for free? What would the long-term implications be?
The big privacy issue
When Clark was first toying with the idea of uploading her DNA to the web, she had a long conversation with her family about the consequences. Her decision to make her data completely public would affect them too. You inherit half your DNA from each parent and share, on average, about half with a full sibling, so in theory people could draw conclusions about Clark's relatives from her DNA. They discussed some of the same things that were listed on the openSNP sign-up page: discrimination and lack of anonymity. Like Clark, they were okay with the risks. “They all supported my decision,” she said.
The scary part is that it isn't yet clear what conclusions could be drawn from these data or how they might be used in the future. Could someone pair your genetic information with your credit card purchases, Fitbit activity, Facebook status updates, and LinkedIn profile, and use the combination to sell you stuff? Miinome, a small startup, actually wants to do that. For-profit companies are going to join the (citizen) scientists looking at this data. When that happens, consumer profiling won't just be psychological anymore; it'll be biological. And that could be just the beginning.
Right now, U.S. law, chiefly the Genetic Information Nondiscrimination Act of 2008 (GINA), prohibits employers and health insurers from using genetic information to deny people jobs or health coverage, but other areas aren't covered: long-term disability insurance, life insurance, and home loans, for example. Closing those gaps “would be the biggest step toward removing the economic harm from shared genetic information,” said John Wilbanks, the chief commons officer at Sage Bionetworks and a big proponent of open data.
Beyond legislation, he says, we need a “Hippocratic oath for people who are accessing open genomes.” Such an oath would require anyone examining genes to take privacy into account and to limit the mining of DNA information to what's strictly necessary for science. With genetic information, says Wilbanks, people aren't going to be comfortable with the kind of “relentless mining” that goes on with other types of data. Right now, nothing like this exists, and most researchers aren't doing a stellar job of protecting their subjects.
In March 2013, for instance, scientists published the genome of the most famous open-source resource in modern biology: the HeLa cell line. For years, scientists had been searching for human cells they could keep growing in the lab, cells that wouldn't die, and nothing seemed to work. Then, in 1951, an African-American woman named Henrietta Lacks went to Johns Hopkins Hospital in Baltimore with an aggressive cervical tumor. Doctors took a sample, and researchers tried to grow the cells. They survived and have been multiplying ever since, powering countless scientific discoveries. Fortunes have been made thanks to these cells, but Lacks and her relatives didn't get a penny.
At the time, there was no requirement for informed consent, so scientists could do what they wanted with Lacks' cells without asking. The scientists effectively became the cells' owners, and more than 60 years later, the researchers who published her genome showed the same disregard. This time, though, what they published could reveal the genetic traits of her living relatives.
As new open-source genetics networks crop up, people outside academic and commercial institutions, which are bound by review boards, will be able to access genetic information. That will open up and democratize science, which is a good thing. The downside is that these issues are likely to resurface if we don't create the proper guidelines. We complain that 23andMe has sold our data, but when we put it on the web ourselves, we're relinquishing ownership too.
“We can’t treat this like Facebook, where they make it hard to understand what you’re giving away. We have to make it really easy so that people don’t accidentally enroll,” said Wilbanks. “We don’t know who’s going to be the digital equivalent of Henrietta Lacks…We don’t know if and when there’s going to be someone who is that valuable digitally. [Because] we don’t even know what we want out of these genetic databases, we have to consent people into a deeply uncertain space.”
The internet of DNA is stuck in the ’90s
The Personal Genome Project, Wilbanks says, has done a fantastic job with that. The PGP is the grand-daddy of all open-source genetics projects. It was started by genetics guru George Church at Harvard in 2005, just two years after he helped complete the Human Genome Project. The PGP has recently started expanding beyond the U.S. to Austria, the U.K., and Canada. Clark volunteered for the Toronto chapter and is currently on a waitlist.
So far, it has collected a few thousand full genomes. (openSNP mostly holds partial genotype data, such as 23andMe profiles.) The PGP pairs those genomes with phenotypic information about participants' health, appearance, and lifestyle. It has even added microbiome data, covering the bacteria that live all over your body, because recent research has shown that the bacteria living in and on us have real and powerful effects on our health. All that information, tied to genetic data, is what's really valuable; on their own, your genes don't tell you much. The PGP has almost two dozen publications in academic journals under its belt, and its data is open to just about anyone, not just “professional” scientists.
“When I share a piece of open-source software or Personal Genome Project-genome, I know that someone (other than me) might reuse it to make something profitable someday. It could benefit me or my kids eventually by helping society,” wrote Church in an email, describing it as a “pay-it-forward model.” Church isn't only in the open-source camp, though. He's also an advisor to 23andMe and was on the advisory board of a genetics company recently acquired by pharmaceutical giant Roche. As with 23andMe, PGP participants don't make money from research discoveries gleaned from their data, even if it proves valuable to drug companies like Pfizer and Genentech.
“The difference,” Church says, “between Google and Wikipedia, and analogously between 23andMe and PGP, is the number of people who get access. In the closed model, the number is limited. In the open model, anyone can get access and the breakthrough might come from an unexpected source.”
The problem, though, is that platforms like openSNP and the PGP are tiny compared to 23andMe. They have thousands of users, while 23andMe is approaching a million. An easy way for different projects to share data could help with the size problem.
For now, “it's akin to what the web was in the ’90s,” says Wilbanks. In a word: broken. Different services can't easily communicate. That's a problem because large-scale genetics research is a numbers game. Scientists use machine learning to tease out patterns that help them predict which medical conditions your genes make you more vulnerable to, and those patterns emerge reliably only from large data sets. Because it has so many users, 23andMe has the upper hand for now.
Earlier this year, in his State of the Union address, President Barack Obama proposed a $215-million precision medicine initiative to weave together the genetic, sensor, lifestyle, environmental, microbiome, and medical-record data of 1 million volunteers into one database. Wilbanks is optimistic, perhaps unduly so, that participants will have the option to share their information beyond the scientific community, as PGP participants have done. That would be a boon to the open-source movement.
As a bioinformatician and programmer, Samantha Clark understands these challenges well. That's why she likes that openSNP lets you download all your data: your genes, your physical traits, and eventually your microbiome, once that feature is added. You can take it all with you. She sees that fully open policy as a potential solution to the problem of too few users. She thinks it could go viral.
“If there are future open-source efforts, I could see that becoming the standardized format of health data that geneticists will use,” she told me. “Right now, we’re just preparing for that.”
The outcomes could be positive. It’s possible the information gleaned from open-source DNA databases could help scientists discover new medications tailored to individuals more quickly, for instance. But the privacy what-ifs are staggering. Open-source genetics is still new territory. My advice: tread lightly.