Celera Genomics Publishes First Analysis of Human Genome
Genome Sequence Reveals Fewer Genes than Predicted
Celera Produces First Assembly of the Mouse Genome
Rockville, MD - February 12, 2001
Celera Genomics (NYSE: CRA), an Applera Corporation business, today announced that its scientists have published an accurate assembly of the human genome and an initial interpretation of the sequence. Celera has estimated that the sequences represent over 95 percent of the human genetic information with an accuracy of greater than 99.96 percent. Celera scientists estimate that there are between 26,500 and 30,000 genes. Earlier estimates have ranged from 50,000 to over 140,000. This published genome is now available free to academic and non-profit researchers around the world via Celera’s web site (www.celera.com).
Because of the relatively low number of genes, Celera scientists believe it will be necessary to look elsewhere for the mechanisms that generate the complexities inherent in human development and the sophisticated signaling systems that maintain homeostasis. They believe researchers can pursue a number of those factors now that there is comprehensive data on non-gene regions of the genome. These include regulatory elements in non-gene regions of the genome that moderate gene transcription, and the molecular activity that leads to alternative start and stop sites for the transcription of DNA. At the protein level, minor alterations in the nature of protein-protein interactions, protein modifications, and localization can have dramatic effects on cellular physiology.
The publication also revealed that humans are 99.9 percent genetically identical. In fact, there are only about 800 letters different per million letters in each person’s genetic code. By comparing the human genes with those of the fruit fly, Celera researchers found specific differences between the genomes, which are thought to be the telling signs of evolutionary events. The differences had to do with the development of the immune system genes, genes having to do with blood coagulation, and genes active in neural systems. Only about one percent of the genetic code in the human genome acts as genes, that is, codes for proteins.
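The relationship between the two figures above can be checked with simple arithmetic. The sketch below is illustrative only: the variable names are invented for this example, and applying the per-million rate to the 2.91 billion letters cited later in this release is an assumption made here, not a calculation reported by Celera.

```python
# Figures taken from this release.
differences_per_million = 800      # differing letters per million letters
genome_letters = 2.91e9            # letters of genetic code cited in this release

# Fraction of letters that differ between two people, and the resulting identity.
difference_rate = differences_per_million / 1_000_000
identity_percent = (1 - difference_rate) * 100

# Assumption for illustration: the same rate applies uniformly across the genome.
differing_letters = difference_rate * genome_letters

print(f"identity: {identity_percent:.2f} percent")
print(f"differing letters genome-wide (illustrative): {differing_letters:,.0f}")
```

At 800 differences per million letters, the identity works out to roughly 99.92 percent, consistent with the rounded 99.9 percent figure.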
The information will be published in the February 16 edition of the journal Science in a paper by Celera principal authors J. Craig Venter, Ph.D., president and chief scientific officer, Mark D. Adams, Ph.D., vice president, genome programs, and Eugene Myers, Ph.D., vice president, informatics research. The paper outlines Celera’s methods for sequencing and assembly of the human genome as well as results from the first analysis of the human genome. Celera completed the sequencing and assembly of the human genome in nine months.
“This is a momentous occasion for all the scientists around the world who have worked to decode the billions of letters that make up the human genome,” said Dr. Venter. “We are extraordinarily proud of the speed and accuracy with which we have accomplished this at Celera, as we realize this represents a new starting point in biological research. We firmly believe that our work will stand the test of time and will be the foundation for discovery leading to potential cures and treatments for illness.”
Whole Genome Shotgun Sequencing
Celera began to sequence the human genome on September 8, 1999, using the whole genome shotgun (WGS) technique that its scientists pioneered in sequencing the first complete genome of a free-living organism, which was decoded in 1995 at The Institute for Genomic Research (TIGR). During the nine-month period in which Celera scientists sequenced the human genome, 14.8 billion base pairs of DNA from more than 27,271,853 sequence reads were generated, representing 5.11-fold coverage of the genome. Celera scientists then used public data, which had been disassembled into 550 base pair length segments, to add another 2.9-fold coverage for a total of about 8-fold coverage of the genome.
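Fold coverage is the total number of sequenced bases divided by the length of the genome. The sketch below reproduces the figures in the preceding paragraph under one stated assumption: it uses the 2.91 billion letters cited elsewhere in this release as the genome length, which may not be the denominator Celera used, so the result lands close to, but not exactly on, the reported 5.11-fold. The function and variable names are hypothetical.

```python
def fold_coverage(total_bases: float, genome_length: float) -> float:
    """Average number of times each position in the genome was sequenced."""
    return total_bases / genome_length

# Figures taken from this release.
celera_bases = 14.8e9          # base pairs generated by Celera
celera_reads = 27_271_853      # sequence reads (the release says "more than")
reported_celera_fold = 5.11    # coverage reported for Celera's own data
reported_public_fold = 2.9     # coverage added from public data

# Assumption: genome length taken as the 2.91 billion letters cited in this release.
genome_length = 2.91e9

estimated_fold = fold_coverage(celera_bases, genome_length)
mean_read_length = celera_bases / celera_reads
combined_fold = reported_celera_fold + reported_public_fold

print(f"estimated Celera coverage: {estimated_fold:.2f}-fold")
print(f"mean read length: {mean_read_length:.0f} base pairs")
print(f"combined coverage: {combined_fold:.2f}-fold")
```

The implied mean read length of a little over 540 base pairs is in line with the 550 base pair segments into which the public data was disassembled.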
The WGS technique involves randomly shearing the human chromosomes into millions of pieces of 2,000, 10,000 and 50,000 base pairs in length. The chromosome fragments are inserted into a plasmid vector and propagated in E. coli to produce millions of copies of each fragment. A key feature of Celera’s sequencing method is that both ends of each fragment of DNA are sequenced (paired-end sequencing).
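The idea of shearing followed by paired-end sequencing can be illustrated with a toy simulation. This is a minimal sketch, not Celera's pipeline: the genome is a short random string, the 50-letter read length is chosen only to keep the example small, and all function names are hypothetical. Only the 2,000 base pair fragment size comes from the paragraph above.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(genome, insert_size, read_len, n_fragments, rng):
    """Pick random fragments of a fixed size and read both of their ends.

    Returns (start, forward_read, reverse_read) tuples. The reverse read is
    taken from the far end of the fragment on the opposite strand.
    """
    pairs = []
    for _ in range(n_fragments):
        start = rng.randrange(0, len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        forward = fragment[:read_len]
        reverse = reverse_complement(fragment[-read_len:])
        pairs.append((start, forward, reverse))
    return pairs

rng = random.Random(0)  # fixed seed so the toy run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(5000))  # toy genome
pairs = paired_end_reads(genome, insert_size=2000, read_len=50,
                         n_fragments=10, rng=rng)

for start, forward, reverse in pairs[:3]:
    print(start, forward[:12], reverse[:12])
```

Because the two reads in each pair are known to lie a fixed distance apart and on opposite strands, an assembler can use them to constrain the order and orientation of the pieces it joins.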
The millions of sequences representing billions of letters of genetic code were then assembled into the proper order using the whole genome assembly algorithm and a second assembly method, the compartmentalized shotgun assembly (CSA). These methods result in the reconstruction of the linear sequence of the 23 pairs of human chromosomes. With the CSA method the genome data is divided into segments or compartments, which are then assembled. Each assembly method had its advantages. For example, the WGS assembly had fewer breaks, while the CSA assembly had a few percent greater coverage of the genome. The CSA assembly was used for the automated annotation reported in the Science paper. Comparing the WGS assembly to the CSA assembly also served as a check on accuracy of the assembly process.
There are a number of measures of accuracy that are important for understanding the quality of a genome sequence. These include the accuracy of the sequencing data and, more importantly, whether the pieces of the genome have been assembled in the correct order and orientation. Correctly ordered and oriented sequence is important because without it genes are mischaracterized or may even be missed. The Whole Genome Shotgun process produces uniform coverage and quality of sequencing and assembly. Comparison of the Celera data against chromosomes that had been previously completed by the public sequencing consortium to a finished quality level (chromosomes 21, 22) showed excellent agreement between the two datasets. When the Celera assembly of all other chromosomes was compared to the assembly produced by the public sequencing consortium, published this week in the journal Nature, there were many more discrepancies in sequence order and orientation in the public data than in Celera’s.
Celera’s Donor Pool
Recruitment of donors of DNA for sequencing was done via self-referral, newspaper ads, and outreach activities to ensure ethnic diversity. Celera followed a strict Institutional Review Board (IRB) protocol to provide for informed consent and protect the anonymity of the donors. Five individuals were chosen from Celera’s donor pool of 21 individuals. These donors are men and women from a variety of self-disclosed ethnic backgrounds.
Celera has rapidly sequenced and assembled the fruit fly genome, the human genome, and the mouse genome. The speed with which Celera is capable of doing this is directly attributable to the state-of-the-art technology employed in various stages of the sequencing process. Celera has approximately 300 ABI PRISM® 3700 automated sequencers in its high-throughput DNA sequencing factory. These machines, developed by Applied Biosystems, have enabled researchers worldwide to use this process on an industrial scale for the first time. Before the 3700 entered production in 1999, researchers spent months, if not years, sequencing portions of the genome.
Another key to Celera’s success in genomic sequencing has been the development of high performance supercomputing technology. Celera’s computing partner is Compaq Computer Corporation. In completing the sequencing and assembly of the 2.91 billion letters of genetic code, Celera relied exclusively on networked Compaq AlphaServer computers running Tru64 UNIX and TruCluster software to manage the more than 80 terabytes of data and to perform what are believed to be some of the most complex computations in the history of supercomputing. Celera’s final assembly computations were run on Compaq’s new AlphaServer GS160 because the algorithms and data required 64 gigabytes of shared memory to run successfully. Celera leverages LION Bioscience’s SRS data integration platform in the Celera Publication Site and the Celera Discovery System.
Celera, which began sequencing the mouse in April 2000, also announced today that it has completed the first assembly of the mouse genome using its proprietary whole genome assembly method. Celera has sequenced 15.8 billion base pairs for nearly 5.5X coverage of the genome from several strains of mice that are widely used in scientific research. This coverage ensures greater than 99 percent representation of the mouse genome. In October, Celera announced that it had sequenced 9.3 billion base pairs or 3X of the mouse genome. Celera is continuing to refine this assembly and now begins the annotation or analysis phase.
Applera Corporation, formerly PE Corporation, comprises two operating groups. The Celera Genomics Group, headquartered in Rockville, MD, intends to be the definitive source of genomic and related medical information. Celera has developed three business units: the On-line Information Business, Discovery Sciences, and Discovery Services, all of which build upon Celera’s generation, integration, and analysis of biological information. Celera intends to enable therapeutic discoveries both through its own application of its scientific capabilities and in partnership with pharmaceutical and biotechnology companies. The Applied Biosystems Group (NYSE:ABI) develops and markets instrument-based systems, reagents, software, and contract services to the life science industry and research community. Customers use these tools to analyze nucleic acids (DNA and RNA) and proteins in order to make scientific discoveries, develop new pharmaceuticals, and conduct standardized testing. Applied Biosystems is headquartered in Foster City, CA, and reported sales of $1.4 billion during fiscal 2000. Information about Applera Corporation, including reports and other information filed by the company with the Securities and Exchange Commission, is available on the World Wide Web at www.applera.com, or by telephoning 800.762.6923.