How do you study a group of organisms with over 300,000 species, dispersed across all seven continents, and with up to 50 times as much DNA content as the human genome?
This is the question facing biologists who study the evolutionary history of flowering plants, called angiosperms, whose rapid diversification was so convoluted a problem that Darwin referred to it as the 'abominable mystery.'
This month, both the American Journal of Botany (AJB) and Applications in Plant Sciences (APPS) are devoting their July issues to what has recently become a turning point in the way scientists study the relationships among flowering plants. Dubbed Angiosperms353, the initiative combines new DNA sequencing techniques with genetic information from 1KP, a massive data resource of gene sequences from more than 1,000 species that took an international team over a decade to complete.
"Using these gene sequences as a common tool opens up new questions that could not have been looked at before," said Dr. Matthew Johnson, assistant professor and herbarium director at Texas Tech University and one of the original architects of Angiosperms353.
The Greater Phylogenetic Good
Until now, geneticists have had to choose between two options when designing a study: either obtain small amounts of DNA from a large number of organisms, or large amounts of DNA from a small number of organisms.
After DNA sequencing was first developed in the mid-1970s, scientists primarily went with the first option. They began stitching together the tree of life by comparing genetic sequences shared widely among species. Named after its inventor, Sanger sequencing was used to assemble trees by examining just a small number of genes, somewhat like trying to understand a country by only visiting its capital.
With the advent of next-generation sequencing at the turn of the century, some researchers began specializing in the opposite approach, meticulously assembling a single organism's entire genetic code. The first test case, the Human Genome Project, was completed in 2003, spurring the new age of genomics.
Today, next-generation sequencing has largely replaced older methods in most labs. However, costs remain prohibitively high for many researchers. And while knowing the genetic code of an organism's entire genome comes in handy when trying to answer specific questions, such as how proteins and cells function at a molecular level, comparing genomes is an inefficient way of piecing together relationships.
To overcome these challenges, researchers have adopted a technique called target sequence capture, which leverages the advantages of next-generation sequencing while focusing on defined sets of hundreds of genes. This method of retrieving DNA has boomed in popularity in the past few years, allowing scientists who are filling in the branches and leaves on the tree of life to probe both deeply and widely within and between species.
But target sequence capture still has one major drawback: unlike Sanger sequencing, it has lacked a widely standardized set of sequences that can be compared across studies and built upon. Every time researchers want to analyze evolutionary patterns in a group of organisms, they have to design new probes to extract genetic information.
"These increasingly popular genomic methods allow scientists to fish out hundreds of genes; however, the probes needed to do this are expensive and complex to design, and usually only work for a narrowly defined group," said Dr. William Baker, a Senior Research Leader at the Royal Botanic Gardens, Kew, and a lead guest editor for the AJB special issue.
This limitation has hampered the development of large studies on the evolutionary history of plants, but it is an issue scientists identified early on and have worked diligently over the past decade to overcome. Beginning in 2019 with the release of two combined probe sets -- Angiosperms353 for flowering plants and GoFlag for groups including ferns and mosses -- they're now starting to reap the rewards of their labor.
"Angiosperms353 targets a standardized set of genes, which means published data can be re-used and synthesized across studies for the 'greater phylogenetic good,'" Baker said.
The Splash Zone
Plant biologists haven't wasted any time in putting the Angiosperms353 probes to use. The 20 studies published in these special issues span the breadth of angiosperm diversity, encompassing over 500 genera and several times as many species. And because the probes are so broadly useful, the studies zoom in on particular groups at different magnifications.
Many of the genetic sequences the probes correspond to have been relatively stable throughout the 140-million-year history of flowering plants. These DNA strands accumulate mutations at a glacial pace and are thus useful in constructing the main branches of the angiosperm tree of life.
Other sequences mutate at a much faster clip, to the extent that no two are alike in any given species. And while most of the probes correspond to DNA actively used by cells to create proteins, they also adhere to small portions of DNA that flank either end of a protein-coding strand, regions informally referred to as 'the splash zone.'
These flanking regions don't actively code for proteins; in fact, scientists are still unsure exactly what they do. What they do know is that this non-coding DNA mutates quickly, similar to the types of genetic markers used for forensic testing in crime labs. In plants, these regions can be used to illuminate relationships among closely related species or to reveal patterns of genetic diversity among individuals, filling in the small twigs and leaves on the tree of life and providing an important roadmap for conservation efforts.
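The contrast between slowly and quickly mutating regions can be illustrated with a minimal Python sketch. It computes the proportion of differing sites (often called a p-distance) between two aligned sequences. The function name and the toy sequences are invented for illustration and are not Angiosperms353 data, and the studies themselves may use other, more sophisticated measures.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the proportion of aligned sites at which two sequences differ.

    Sites where either sequence has a gap ('-') or an unknown base ('N')
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical toy alignments for two closely related species (made up).
coding_a = "ATGGCTGATCAAGGTCCTGA"    # stand-in for a conserved coding region
coding_b = "ATGGCTGATCAGGGTCCTGA"
flanking_a = "TTAGCATTGCAAATCCGTAT"  # stand-in for a flanking non-coding region
flanking_b = "TTGGCATCGCTAATACGTGT"

coding_distance = p_distance(coding_a, coding_b)
flanking_distance = p_distance(flanking_a, flanking_b)
print(f"coding region:   {coding_distance:.2f}")
print(f"flanking region: {flanking_distance:.2f}")
```

In this toy case the flanking region differs at five times as many sites as the coding region, which is the kind of signal that helps separate close relatives when conserved genes look nearly identical.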
Past, Present, and Future
Sequence capture also has an important advantage over previous techniques: it can reliably retrieve old DNA. This feature is extremely important in a field where some estimates suggest the majority of the 70,000 or so plant species yet to be discovered have already been collected and stored in herbaria. Some species, such as Miconia abscondita, were discovered only through genetic analysis of herbarium tissue after they'd gone extinct in the wild. And analyses of plant communities from ages past have been used in multiple cases to study how plants are responding to climate change.
The studies in these issues offer a glimpse into the future of plant phylogenetics, one in which researchers can obtain immense quantities of data in a fraction of the time it would have taken them just 20 years ago. For Baker, who will be publishing Angiosperms353 data for over 7,000 flowering plant genera later this year, that future looks bright. In concert with the Royal Botanic Gardens, Kew, he and several colleagues have been using the new probe set to construct the plant tree of life through the PAFTOL project. He has also helped launch a free repository called the Kew Tree of Life Explorer to store and distribute the growing volume of genetic data from researchers around the world who are using the probes.
"The standardization of these targeted genes will pay dividends for decades to come, as we inch towards our collective goal of a complete tree of life for all species," Baker said.