Small DNA-laden wafers have transformed biology. Using these DNA chips, geneticists can see which genes are turned on, or expressed, in a cell at a particular time. Such gene expression experiments allow bioscientists to diagnose different diseases, quickly screen thousands of drug candidates for efficacy and safety, and even learn the functions of newly discovered genes.
Sharing this information over the Web could lead to an explosion in biological knowledge. But each experiment generates gigabytes of data written in one of several formats, depending on the type of chip used. And with dozens of chips on the market and hundreds of ways to analyze the data, the Web is in danger of becoming a genetic Tower of Babel.
Companies and academics have begun creating uniform formats for representing gene expression data, designed to work on any computer (see table below). Overseeing the effort to fashion a single standard from these proliferating formats is the Object Management Group, an international nonprofit consortium that has helped the computer industry establish software standards for over a decade. A life sciences subgroup formed in 1997, and standards for protein and DNA sequence analysis followed. Next in line: molecular and chemical structure representations and drug trial data, as well as gene expression data.
Participants hope that a gene expression standard will emerge by year’s end. If it does, the enormous amount of data produced in the wake of the Human Genome Project could find a common language on the Web.
Gene Standards Projects
Group | Project | Purpose |
Rosetta Inpharmatics | Gene Expression Markup Language | Data representation |
European Bioinformatics Institute/Microarray Gene Expression Database Group | Microarray Markup Language | Data representation |
National Center for Genome Research | GeneX/GeneXML | Database/data representation |
NetGenics | Standard interface for gene-expression data warehousing | Data management and analysis |