In a pioneering effort that generated massive amounts of DNA sequence data from 12 people, a team supported by the National Institutes of Health (NIH) has demonstrated the feasibility and value of a new strategy for identifying relatively rare genetic variants that may cause or contribute to disease. The proof-of-concept findings were published online today in the journal Nature.
The new strategy involves isolating and sequencing all exons, which are the parts of the human genome that contain the information needed to produce proteins, the building blocks of the body. The complete set of exons – referred to as the "exome" – makes up only one percent of the human genome. By selecting only the exome to sequence, researchers can obtain the protein-coding information about an individual at a much lower cost than sequencing that person's entire genome. Because the results of exome sequencing are assessed using knowledge of the genetic code, they allow for a more informative interpretation of genetic variants. Using the exome strategy, like other methods of direct DNA sequencing, investigators also can detect rare variants that typically provide a stronger indication of disease susceptibility.
The research, conducted by scientists from the University of Washington in Seattle, Wash., and Agilent Technologies in Santa Clara, Calif., was funded by the National Heart, Lung, and Blood Institute (NHLBI), the National Human Genome Research Institute (NHGRI), and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), which are all part of the NIH. It was carried out as part of The Exome Project, a program jointly managed by the NHLBI and the NHGRI that was established to develop, validate, and begin to apply a cost-effective, high-throughput approach for exome sequencing that can be deployed in large, well-phenotyped human populations.
"This focused approach will yield information that informs our understanding of the genetic basis of diseases, a prerequisite for personalized medicine," said NHLBI Director Elizabeth G. Nabel, M.D. "We have great hope that targeted sequencing, when applied to a larger number of individuals, will be used to discover the genetic underpinnings of common conditions such as high blood pressure and high cholesterol. The current findings provide the fundamental groundwork for pursuing this important goal."
To demonstrate the utility of their approach, researchers focused on the exomes of eight people (four Yoruba, two East Asians, two European-Americans), whose DNA had previously been characterized by the International HapMap Project. The HapMap Project was an effort that produced a comprehensive catalog of common human genetic variation across the human genome. In addition, the study included four unrelated people with Freeman-Sheldon syndrome, a rare inherited disorder caused by mutations in the MYH3 gene, to see if exome sequencing had the power to detect the MYH3 mutations known to exist in their DNA.
The researchers began by shearing the 12 samples of genomic DNA into fragments and then using special probes to capture only those fragments that contained exons. The resulting 12 collections of exomes were then sequenced and analyzed. Altogether, the project determined 300 million bases of DNA sequence – the largest data set of human coding sequence reported so far from the more advanced second-generation sequencing technologies.
Comparison of the exome sequences to the publicly available human genome sequence highlighted the sensitivity of this technique for detecting genetic variations, both common and rare. The investigators were able to identify a range of these DNA misspellings, including rare and common single-letter variations known as single nucleotide polymorphisms, or SNPs, as well as insertions and deletions of sequences within genes.
From the DNA of the four people with Freeman-Sheldon syndrome, the researchers were able to pinpoint the causal genetic variant by applying a multi-step systematic strategy to filter out common variants and variants that were specific to each individual. The findings demonstrate that sequencing the exomes of a small number of unrelated individuals with a disorder that is due to a single gene can serve as a genome-wide scan for the causative gene. Within large population studies, the researchers suggest that exome sequencing could be used to uncover genes that contribute to the risk for more common, multigenic diseases such as diabetes or cancer.
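The filtering idea described above can be illustrated with a minimal sketch. It is not the researchers' actual pipeline: the function name candidate_genes, the data layout, and all gene and variant identifiers other than MYH3 are hypothetical, and the toy data are made up. The sketch assumes each affected person's exome has already been reduced to a set of (gene, variant) pairs, discards variants found in a catalog of common variants, and then keeps only genes that carry a remaining variant in every affected person, which drops genes hit in just one individual.

```python
from collections import defaultdict


def candidate_genes(affected, common_variants):
    """Return genes with at least one non-common variant in every affected person.

    affected: dict mapping a person ID to a set of (gene, variant_id) pairs.
    common_variants: set of variant IDs regarded as common (hypothetical catalog).
    """
    genes_hit = defaultdict(set)
    for person, variants in affected.items():
        for gene, variant_id in variants:
            if variant_id in common_variants:
                continue  # step 1: filter out common variants
            genes_hit[gene].add(person)
    everyone = set(affected)
    # step 2: keep genes hit in all affected people, dropping genes whose
    # remaining variants are specific to one (or only some) individuals
    return sorted(g for g, people in genes_hit.items() if people == everyone)


# Toy data, entirely invented for illustration.
affected = {
    "case1": {("MYH3", "v1"), ("GENE_A", "c1"), ("GENE_B", "p1")},
    "case2": {("MYH3", "v2"), ("GENE_A", "c1"), ("GENE_C", "p2")},
    "case3": {("MYH3", "v1"), ("GENE_A", "c1")},
    "case4": {("MYH3", "v3"), ("GENE_A", "c1"), ("GENE_D", "p4")},
}
common_variants = {"c1"}

result = candidate_genes(affected, common_variants)
print(result)
```

In this toy example, GENE_A is shared by all four cases but only through a common variant, and GENE_B, GENE_C, and GENE_D each appear in a single person, so only MYH3 survives both filters. Note that the affected individuals need not share the same variant, only the same gene.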