Though researchers have finished sequencing the human genome, it is still far from understood. A major objective of biotechnology is to develop the experimental and computational tools needed to decipher the signals encoded within the genome and to understand their role in human health and disease.
Much remains unknown. It is still a matter of debate exactly how many genes the genome encodes, or even how a gene should be defined. In addition, scientists are just beginning to understand the array of regulatory sequences that punctuate the genome and dictate when certain genes are turned on and off. The complex code within these elements has yet to be deciphered.
Comparative genomics can shed the powerful light of evolution on these unknowns. Functional regions of the DNA sequence, such as genes and regulatory regions, have been well conserved, remaining largely unchanged across related species through millions of years of evolution, whereas DNA sequences that serve neither function change more rapidly. To help us understand the evolutionary constraints on functional elements in the human genome, the National Human Genome Research Institute has recently expanded its sequencing efforts to include additional mammalian genomes.
In my group, instead of simply searching for highly conserved elements, we search for elements that have changed in particular ways. By comparing various genomes, we have found several evolutionary signatures: common patterns in the way a particular type of DNA sequence has evolved over time. We are now using these evolutionary signatures to reanalyze the human, yeast, and fly genomes and have already uncovered hundreds of novel genes, novel exons, and unusual gene structures.
We have also used genome-wide conservation patterns to define subtle regulatory motifs, which constitute another type of evolutionary signature. Coupled with rapid string search algorithms, these signatures have led to the discovery of a complete dictionary of known and novel regulatory elements in the human, yeast, and fly genomes, revealing the building blocks of gene regulation.
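To make the idea of a conservation-based motif signal concrete, here is a deliberately simplified sketch. It is not the author's method. It assumes the genomes have already been aligned into equal-length rows (reference species first, "-" for gaps). It treats a short word (k-mer) occurrence as conserved only when every other species carries the identical word at the same alignment columns. The function names `kmer_conservation` and `rank_by_conservation` are hypothetical. The intuition is that words whose instances are conserved far more often than typical words are candidate regulatory motifs.

```python
from collections import defaultdict


def kmer_conservation(alignment: list[str], k: int) -> dict[str, tuple[int, int]]:
    """Count, for each k-mer in the reference row, (total occurrences, conserved occurrences).

    Assumes all rows are pre-aligned and of equal length; row 0 is the reference.
    An occurrence is 'conserved' if every other row has the same k-mer at the same columns.
    """
    ref, others = alignment[0], alignment[1:]
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for i in range(len(ref) - k + 1):
        word = ref[i:i + k]
        if "-" in word:
            continue  # skip windows that overlap a gap in the reference
        counts[word][0] += 1
        if all(seq[i:i + k] == word for seq in others):
            counts[word][1] += 1
    return {w: (total, cons) for w, (total, cons) in counts.items()}


def rank_by_conservation(stats: dict[str, tuple[int, int]], min_count: int = 1) -> list[tuple[str, float]]:
    """Rank k-mers by the fraction of their occurrences that are conserved (highest first)."""
    rated = [(w, cons / total) for w, (total, cons) in stats.items() if total >= min_count]
    return sorted(rated, key=lambda item: (-item[1], item[0]))


# Toy three-species alignment (made-up sequences, for illustration only).
alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGT"]
stats = kmer_conservation(alignment, k=3)
print(rank_by_conservation(stats))
```

On real data, a word's conservation rate would be compared against a background (for example, shuffled or random control words of the same composition) across many intergenic regions, rather than read off a single toy alignment.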
These evolutionary signatures are universal across kingdoms of life. With complete genomes, we can use them to elucidate common evolutionary principles, interpret our genome, study human variation and evolution, and revolutionize our understanding of human biology.
Manolis Kellis, one of the TR35, is an assistant professor of computer science at MIT.