A View from David Ewing Duncan
My Genome Via E-mail
Trying to understand the six billion nucleotides—all of my DNA—that just arrived in my in-box.
A few weeks back I received my complete genome by e-mail.
Actually, the e-mail provided a link to my raw data, a 690 MB file, a tad too large to send in toto by e-mail.
What I got was endless lines of nucleotides (As, Ts, Cs, and Gs) divided up by chromosome in a report prepared by the California-based sequencing company Complete Genomics. They generously ran my genome for free, after considerable cajoling by me, so that I could report on the experience. (The dramatic decrease in price for sequencing whole genomes, from perhaps a million dollars three years ago to about $5,000 today, helped persuade them.) The project was also championed by Harvard geneticist George Church and his Personal Genome Project (PGP), which has posted my results and given me the designation PGP 13. I am the 13th person to be sequenced for the project, which is aiming to collect 100,000 genomes. (Other PGPers include tech guru Esther Dyson, Harvard psychologist and author Steven Pinker, and Church.)
The mass of code delivered to me holds clues about whether I’m at a higher risk than most people for everything from heart attack and certain cancers to Alzheimer’s disease and, more controversially, depression and other behavioral conditions. It contains tips about drugs that may not work for me, or that might inflict dangerous side effects. (Check out my book, Experimental Man, for details about some of these findings from previous testing.)
One day, this data will be used in tandem with the stem cell line created for me by Cellular Dynamics International (see my feature article, “Growing Heart Cells Just for Me”). These stem cells, which the company bioengineered from blood cells I sent them, are similar to the cells that appear a few days after a human egg is fertilized. They can grow into any cell in the body, including those of the heart, brain, and liver. These cells, guided by the clues in my genome, could be used to refine predictions or diagnoses for diseases, or could one day provide replacement cells should I get whacked in the head or have a heart attack.
But what have I learned from my complete genome that I didn’t already know?
If you have been following the Experimental Man Project, you know that I already have results for thousands of genetic markers (genotypes) associated with disease, and with traits that run the gamut from predicting that I have blue eyes (easy to verify) to a higher than normal risk for becoming a heroin addict (I’ve never actually been interested in the Big H).
These genotypes came from numerous tests, labs, and companies, including 23andMe, Navigenics, Illumina, Affymetrix, the Coriell Institute’s Personalized Medicine Collaborative, and Quest Diagnostics. Their tests, however, identified perhaps 2 million genetic markers out of the billions of nucleotides in each of my cells. These markers were chosen because they are on the short list of those that seem to be most important in influencing disease and other traits, yet the tests missed significant portions of my genome that have now been captured by the Complete Genomics sequence.
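For a sense of scale, here is a back-of-the-envelope sketch using only the two round figures quoted in this post: roughly 2 million markers tested, and roughly six billion nucleotides in the full genome. Both are approximations, and the function name is just an illustration.

```python
# Rough coverage arithmetic based on the approximate figures quoted in this post.
MARKERS_TESTED = 2_000_000           # "perhaps 2 million genetic markers"
GENOME_NUCLEOTIDES = 6_000_000_000   # "six billion nucleotides"


def coverage_fraction(tested: int, total: int) -> float:
    """Return the share of genome positions covered by the earlier tests."""
    return tested / total


fraction = coverage_fraction(MARKERS_TESTED, GENOME_NUCLEOTIDES)
print(f"Earlier tests covered about {fraction:.4%} of positions")
```

In other words, the earlier tests looked at roughly one position in every 3,000.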
As whole genomes become less expensive and more common, with hundreds of them now sequenced, scientists are discovering that subtle and often rare differences among people may be linked to even common diseases such as heart disease and diabetes. This may explain why many of the genetic markers identified by geneticists for common diseases seem to have a surprisingly small impact on whether a person actually gets cancer or diabetes, suggesting that genes and other factors not yet discovered are at work.
I’m just beginning to sift through my data from Complete Genomics, but I already have discovered one big difference from my previous testing. This is a near doubling of my total genotypes identified (referred to as “annotated”), from around 11,000 before to over 21,000 now. This analysis comes from SNPedia, a wiki-style website that devotes a page each to describe thousands of individual genetic markers. The site’s founder and curator, Michael Cariaso, has developed a program called Promethease that anyone with DNA data can use to create a list of genotypes drawn from SNPedia’s individual pages.
As of this writing, my total “genotypes annotated” equals 21,621—a number that will go up as more genotypes are identified in the scientific literature.
Here is how SNPedia’s Michael Cariaso described my results in an e-mail:
SNPedia is now watching over 100 genomes closely. Your genome now has the most detailed report known. This is due to the combined effects of your Complete Genomics full genome and your microarrays [previous tests], putting your combined at ~22k. With your recent arrival I think you’re likely to hold the lead for the rest of this year, and perhaps well beyond.
Two important challenges arise as I begin to analyze my data. One is that tools for interpreting whole genomes remain nascent, as companies and labs that have been hell-bent on building better and cheaper methods for sequencing begin to turn to the much more Herculean task of understanding what all of this code means. The other is that many of the genetic markers remain preliminary, based on statistical analyses that compare, say, people with heart disease to those without it. Only a tiny percentage of the findings from these genome-wide association studies (GWAS) have been clinically validated in real people to see if the risk factors indicated by the statisticians actually play out. (GWAS is also becoming a misnomer and needs updating, since these markers don’t really come from whole genomes.)
This second challenge requires a massive effort akin to the Human Genome Project to systematically validate the tens of thousands of genotypes that have been identified so far by scientists. This task will be greatly aided by the proliferation of whole genomes as the price comes down.
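To make the statistical comparison behind these association studies a little more concrete, here is a minimal sketch of an odds ratio, one common way of comparing how often a marker turns up in people with a disease versus people without it. The counts are hypothetical, invented purely for illustration; they are not from my results or from any study.

```python
def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Odds of carrying the marker among cases divided by the odds among controls."""
    case_odds = case_carriers / case_noncarriers
    control_odds = control_carriers / control_noncarriers
    return case_odds / control_odds


# Hypothetical counts: 1,000 people with heart disease and 1,000 without.
ratio = odds_ratio(case_carriers=300, case_noncarriers=700,
                   control_carriers=250, control_noncarriers=750)
print(f"Odds ratio: {ratio:.2f}")
```

A ratio only modestly above 1, as in this made-up example, is the kind of small effect described earlier. It says the marker is somewhat more common among people with the disease, not that any one carrier will get it.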
In the end, though, the real question is: has this crush of data changed my life? For that, I’ll need to post another blog entry, or several. So stay tuned.
This is an appeal: Send me your ideas for how best to interpret my newly sequenced complete genome!