The Human Genome Project, molecular biology’s international extravaganza that seeks to decode all of humankind’s estimated 100,000 genes, is on track to be completed in 2005. Researchers know that each of those genes contains a precise molecular script for making a different protein. But after spending several billion dollars on the Genome Project over a decade and a half, scientists will simply have gathered the cast of characters for another fundamental mystery: What in the heck do all these proteins do?
On biology’s stage, proteins are the leading players, as well as the producers and directors. Acting alone and in groups, proteins do just about everything in a cell, from shuttling urgent messages to controlling the cycle of growth and death. It’s the role of proteins that scientists need to know to unravel the secrets of life and to develop potent new drugs. In fact, in one sense the Human Genome Project and its counterparts for other organisms, such as flies, worms, viruses, and bacteria, are simply precursors of a great Human Protein Project. Yet the efforts to decode the genes of less complex creatures are bringing advance notice of just how poorly lit the world of proteins really is.
Take baker’s yeast, a staple of genetics labs. Researchers finished sequencing its genome, all 6,000 genes, in 1996. Each of the strings of genetic “letters” predicts the basic makeup of a protein. “But we still have no idea whatsoever what nearly half of those proteins do,” laments Roger Brent, a geneticist at the Molecular Sciences Institute in Berkeley, Calif. “And yeast is one of the most intensively studied organisms.”
Today’s research techniques are woefully inadequate for explaining the function of so many proteins. Typically researchers will breed “knock-out” mice missing a particular gene, then study what effect the loss of the corresponding protein has on the animal, an approach that is like trying to understand how a carburetor works by removing it and checking to see whether the car still runs. Thanks to the rapidly growing availability of the DNA sequences of genes, scientists are gaining a neatly labeled inventory of parts. “What we need now are fast ways to find where and how all those parts fit,” says Brent. “We need to apply high-throughput approaches to studying proteins.”
The field developing these approaches is called “protein genomics,” or, more catchily, “proteomics,” and it’s one of the hottest areas of biotech. Over the past year, research in this sector has spawned an array of high-tech startups, including Genome Pharmaceuticals, a spin-out from Germany’s Max Planck Institute, and Hybrigenics, a Paris-based venture. Established gene-hunting firms are also moving aggressively into proteomics. Cambridge, Mass.-based Millennium Pharmaceuticals, Genome Therapeutics, and Incyte Pharmaceuticals of Palo Alto, Calif., all have research groups. These genomics companies bring to the protein game qualities honed over half a decade of searching for genes: a taste for big projects and an aptitude for automated, high-speed science.
The biggest and most advanced of today’s proteomics efforts aim at revealing how proteins interact with one another. It’s by acting in complex networks that proteins command critical processes such as the way cells translate outside signals into biological “to-do” lists. Like a game of molecular telephone, signals are passed along chains of interacting proteins. To find out what happens in cancer, or in health, researchers need to find out just which proteins are working together.
At the Salt Lake City headquarters of Myriad Genetics, a row of 14 research robots has been whirring away since last November on this problem. The robots are part of an ambitious effort to construct what’s known as a “protein interaction map,” says project director Arnold Oliphant. “We’re taking every protein encoded by the human genome and asking which other proteins it binds to.” Interaction mapping is only one of several proteomic technologies, but Myriad is betting it is the fastest way to determine the function of a large number of proteins. Finding that a protein of unknown function can bind to one whose cellular task is known, says Oliphant, “is like finding that a particular screw belongs in the carburetor. You get a pretty good lead on what it’s for.”
But, with an estimated 100,000 proteins produced in the human body, there are a daunting 5 billion possible protein-protein combinations to test. Myriad figures there are probably half a million actual interactions to discover. “It’s a big undertaking,” Oliphant says, “but we think this information is going to be phenomenally valuable.”
The big payoff could come in finding new drug targets. Until recently, genomics firms like Myriad were mostly in the business of tracking down disease genes by studying inheritance patterns. That’s how Myriad helped uncover a gene called BRCA1 that has been linked to breast cancer. But just what the gene’s protein does in the body is unclear. As a result, it’s not a decent drug target. That’s where the interaction map could prove handy, by helping to delineate just what the protein product of BRCA1 does and what other proteins it interacts with.
There are already some clues. Oliphant says Myriad scientists have discovered that BRCA1’s protein binds to another protein, known as CtIP, which has been identified as a bit player in an important cancer pathway. By connecting the dots, Oliphant hopes to better understand BRCA1’s role and find the Achilles’ heel that could be the first step in developing a drug to prevent at least some types of breast cancer.
Not everyone is convinced that such protein interaction maps will pay off. “Everyone’s searching for the next big technology. But I’m not sure we’re there yet,” says Jean-François Formela, a venture capitalist at Boston’s Atlas Venture. “It’s too random, too much of a long shot,” says Formela, who thinks the money could be better spent on more directed research rather than the brute-force effort to find all possible protein-protein interactions.
And, to be fair to the skeptics, large-scale interaction mapping efforts face two key obstacles. The first is that not all human genes are known. This doesn’t actually stop researchers from recording protein interactions, but it does require them to jump through some genetic hoops. The result, say critics, is data that are potentially “noisy.” Furthermore, nearly half of the interactions detected by most mapping efforts can later turn out to be biologically irrelevant.
At the New Haven, Conn., laboratories of genomics firm CuraGen, researchers building a protein interaction map have instituted a series of controls for filtering out the noise. But CEO Jonathan Rothberg, who’s been mulling the protein interaction problem over for a decade, says that even after the filtering, “every one of those interactions needs to be confirmed.” As opposed to the rapid-fire genetic voodoo used in the first pass, these follow-up tests apply standard, relatively slow methods. But because the number of detected interactions is fairly small, Rothberg says this task is “definitely doable.”
Oliphant agrees that one way or another the interaction map is going to be the fastest, cheapest way to link proteins to their function on a large scale. Indeed, Oliphant ambitiously predicts Myriad will complete the entire human interaction map in less than five years. His team is beginning with 8,000 well-characterized human genes from public gene banks and the company’s own collection. They’ll use the proteins coded for by these genes as bait to fish for interactions in a genomic pool stocked with a complete collection of human gene products. For each initial piece of “bait,” Oliphant expects to fish out two to five interacting proteins. Some will be familiar; others will be entirely new. “We’ll characterize the novel ones, then use them as bait for the next fishing expedition,” he says.
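For readers who think in code, the bait-and-fish strategy Oliphant describes resembles a breadth-first walk over a network: screen a round of baits, then turn each newly caught protein into bait for the next round. The Python sketch below is only an illustration under loudly stated assumptions. The `screen` function is a hypothetical stand-in for a laboratory assay, the names `novel_A` and `novel_B` are invented placeholders, and only the BRCA1-CtIP pair comes from the reporting here. It is not Myriad’s software or data.

```python
# Hypothetical toy data: only the BRCA1-CtIP pair is taken from the article.
# "novel_A" and "novel_B" are invented placeholder proteins.
TOY_INTERACTIONS = {
    "BRCA1": {"CtIP"},
    "CtIP": {"BRCA1", "novel_A"},
    "novel_A": {"CtIP", "novel_B"},
    "novel_B": {"novel_A"},
}


def screen(bait):
    """Stand-in for a lab assay: return the set of proteins that bind `bait`."""
    return TOY_INTERACTIONS.get(bait, set())


def map_interactions(initial_baits, max_rounds=None):
    """Screen baits round by round; proteins caught in one round become
    the baits of the next, until nothing new turns up.

    Returns (interactions, rounds), where interactions is a set of
    unordered pairs stored as frozensets.
    """
    interactions = set()
    screened = set()            # baits that have already been tested
    baits = set(initial_baits)
    rounds = 0
    while baits and (max_rounds is None or rounds < max_rounds):
        next_baits = set()
        for bait in sorted(baits):
            screened.add(bait)
            for prey in screen(bait):
                interactions.add(frozenset((bait, prey)))
                next_baits.add(prey)
        # Only proteins that have never served as bait go into the next round.
        baits = next_baits - screened
        rounds += 1
    return interactions, rounds


if __name__ == "__main__":
    pairs, rounds = map_interactions(["BRCA1"])
    for pair in sorted(sorted(p) for p in pairs):
        print(" - ".join(pair))
    print("rounds:", rounds)
```

Storing each pair as a `frozenset` means an interaction found from either side is counted once. A real screen would also need the filtering and confirmation steps that CuraGen’s Rothberg describes, which this toy omits.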
While scientists at CuraGen and Myriad labor to unravel the protein interactions in humans, others are also looking to help create new drugs by taking on other organisms. Hybrigenics, which was founded at the end of 1997 to exploit interaction mapping strategies developed by French researchers including molecular geneticist Pierre Legrain of the Pasteur Institute, plans to create maps of pathogens such as HIV, the hepatitis C virus, and the ulcer-causing bacterium H. pylori. “A map of HIV will help define alternative drug targets,” says Hybrigenics’ founder, A. Donny Strosberg. That could prove critical because of the deadly virus’s growing resistance to today’s drugs.
Academic researchers are also laying plans to chart the proteins of their favorite research organisms, including yeast, the bacterium E. coli, and the fruit fly, Drosophila. But it’s likely that most of the action will continue to play out at biotech companies because the Human Genome Project will be grabbing the lion’s share of federal funds for big science in biology for years to come. “There’s no international proteome project yet,” says Marc Vidal, an investigator at Massachusetts General Hospital who is trying to cobble together enough grants to start building an interaction map of the worm C. elegans. “But we talk about it over beer.”
Molecular Sciences’ Brent is not waiting idly. He’s beating the pavement to raise $1 million to fund a startup company called Functional Genomics Systems, which he says will undertake to map human proteins. But after selling drug companies first crack at the data, Brent says he will make it available to the public. “The technology is in place,” he says. “Now it’s just a matter of shoveling and money.”
The ribbon has been cut and the foundation is being laid. The great human protein project is underway, a biomedical monument for which the Human Genome Project is but the blueprint.