In research published today by Nature, an international team describes the finest map of changes to the structure of human genomes and a resource they have developed for researchers worldwide to look at the role of these changes in human disease. They also identify 75 'jumping genes' - regions of our genome that can be found in more than one location in some individuals.
However, the team cautions that they have not found large numbers of candidates that might alter susceptibility to complex diseases such as diabetes or heart disease among the common structural variants. They suggest strategies for finding this 'dark matter' of genetic variation.
Human genomes differ because of single-letter variations in the genetic code and also because whole segments of the code might be deleted or multiplied in different human genomes. These larger, structural differences are called copy number variants (CNVs).
The new research to map and characterize CNVs is of a scale and a power unmatched to date, involving hundreds of human genomes, billions of data points and many thousands of CNVs.
"This study is more than ten times as powerful as our first map, published three years ago," explains Dr Matt Hurles from the Wellcome Trust Sanger Institute and a leader on the project, "and much more detailed than any other. Importantly, we have also assigned the CNVs to a specific genetic background so that they can be readily examined in disease studies carried out by others, such as the Wellcome Trust Case Control Consortium.
"Nevertheless, we have not found large numbers of common CNVs that we can tie strongly to disease. There remains much to be discovered and much to understand and our freely available genotyped collection will drive that discovery."
The results show that any two genomes differ by more than 1000 CNVs, or around 0.8% of a person's genome sequence. Most of these CNVs are deletions, with a minority being duplications.
Two consequences are particularly striking in this study of apparently healthy people. First, 75 regions have jumped around in the genomes of these samples; second, more than 250 genes can lose one of the two copies in our genome without obvious consequences, and a further 56 genes can fuse together, potentially forming new composite genes.
Chromosomes are shown color-coded in the outermost circle. Inside are lines connecting the origin and the new location (where known) of 58 out of 75 putative inter-chromosomal duplications, colored according to their chromosome of origin.
(Photo Credit: Jan Aerts, Wellcome Trust Sanger Institute)
"This paper detailing common CNVs in different world populations, and providing the first glimpse into evolutionary biology of such class of human variation, is unquestionably one of the most important advances in human genome research since the completion of a reference human genome," says Professor James R. Lupski, Vice Chair of the department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas. "It complements the cataloguing of single nucleotide variation delineated in the HapMap Project and will both enable some new approaches to, and further augment other studies of, basic human biology relevant to health and disease."
"The genetic 'blueprint' of humans is the human genome," says Sir Mark Walport, Director of the Wellcome Trust. "But we are each unique as individuals, shaped by variation in both genome and environment. Understanding the variation amongst human genomes is key to understanding the inherited differences between each of us in health and disease. A whole new dimension has been added to our understanding of variation in the human genome by the identification of copy number variants."
The results also give, for the first time, a minimum measure of the rate of CNV mutation: at least one in 17 children will have a new CNV. In many cases, that CNV will have no obvious clinical consequences. However, for some the effects are severe. In those cases the data are captured in the DECIPHER database, a repository of clinical information about CNVs designed to aid the diagnosis of rare disorders in young children.
But CNVs are not only about the here and now; they are also ancient legacies of how our ancestors adapted to their environments. Among the most impressive variations between populations are CNVs that modify the activity of the immune system, which is known to be evolving rapidly in human populations, and CNVs affecting genes implicated in muscle function. The researchers propose that the consequences of these CNVs can be dissected in population studies.
The team scanned 42 million locations on the genomes of 40 people, half of European ancestry and half of West-African ancestry. The scale of the method meant they could detect CNVs as small as 450 bases occurring in one in 20 individuals.
However, the researchers concede that their map of common variants will not account for much of the 'dark matter' of the genome - the missing heritability where, despite diligent searches, genetic variants have not been found for common disease.
"CNV studies have made huge advances in the past few years, but we are still looking only at the most common CNVs," explains Dr Steve Scherer of the Hospital for Sick Children, Toronto. "We suspect that there are many CNVs that have real clinical consequences that occur in perhaps one in 50 or one in 100 people - below the level we have detected.
"Success in the hunt for the missing genetic causes of common disease has become possible in the last few years and we expect to find more as higher resolution searches become possible."
The research group have maximized the value of their research not only by mapping the CNVs but also by genotyping them - assigning them to a specific genetic background that makes them readily useful in wider genetic studies, such as those of the Wellcome Trust Case Control Consortium.
"We were determined to develop not only the map, but also to provide the resources that help other researchers and clinical cytogeneticists most rapidly use our CNV results," comments Dr Charles Lee, one of the project leaders from Brigham and Women's Hospital and Harvard Medical School in Boston, USA. "Already, the data that we have generated is benefiting other large-scale studies such as the 1000 Genomes Project as well as making an enormous difference in the accurate interpretation of clinical genetic diagnoses.
"Nonetheless, the human CNV story is far from over."
Source: Wellcome Trust Sanger Institute