Accurately reconstructing the relationships between different species requires analyzing the sequences of a judiciously selected, and preferably large, sample of different genes. Hybrid capture with high-throughput sequencing, or Hyb-Seq, is a powerful tool for obtaining those gene sequences, but it must be calibrated for each group analyzed to ensure that an informative sample of genes is sequenced. Researchers must take a variety of considerations into account when selecting which genes to sequence, and the choices made in gene sampling can affect the outcome of the analysis. In work presented in a recent issue of Applications in Plant Sciences, Dr. Katy Jones and colleagues evaluated the performance of a Hyb-Seq probe set designed for the large and diverse sunflower family, Asteraceae, and found it to be effective in reconstructing relationships at multiple taxonomic levels, from subspecies to tribe.
Genes that would be informative in one taxonomic group may not be in another, for a variety of reasons: the gene may not be present in all species, may evolve too slowly in that group to add meaningful information to a phylogenetic analysis, or may have duplicated to create multiple paralogs. The diverse evolutionary histories in a large group like the Asteraceae make selecting which genes to sequence a challenge. "Asteraceae is the largest angiosperm family and the Asteraceae COS probe set contains 1061 loci, some of which may be informative for some tribes/genera but not to others, for example due to potential paralogy in some groups but not in others," said Dr. Jones, corresponding author of the manuscript, who carried out the work during her postdoctoral research at Botanischer Garten und Botanisches Museum Berlin.
Dr. Jones and colleagues were interested in how the genes sampled in the 1,061-locus Asteraceae Hyb-Seq probe set would perform in phylogenetic analyses at different taxonomic levels. The researchers tested the probe set on a tribe within the Asteraceae, the Cichorieae. "We were interested to know how analyzing a dataset containing many species across a large tribe compared to a dataset just containing a small species complex may influence phylogenetic inference within that small species complex," said Dr. Jones. "It was quite explorative at the start and over time the questions, ideas, and number of different taxonomic groups grew!"
The researchers found that the Hyb-Seq probe set yielded sequence data that could accurately reconstruct relationships between species at multiple taxonomic levels, but that the way the data were subsampled and analyzed influenced the results. For example, coalescent species tree approaches produced different results from maximum likelihood methods when long branches (loci that have undergone considerable evolution) were not removed.
As part of this work, Dr. Jones and colleagues present an optimized pipeline for the preparation and analysis of Hyb-Seq data and discuss different wet lab approaches that could influence results, streamlining the process for other research groups. This was a direct response to their own experience with Hyb-Seq. "We were often sending emails back and forth about different things, for example if someone would find that they got poor capture or more of the off-target plastome compared to previous runs," said Dr. Jones. "We'd chat about our wet lab steps or analysis pipelines." Dr. Jones noted the support of the Asteraceae community in this work, and particularly that of the late Vicki Funk.
Dr. Jones hopes that working out the nuances in these sorts of analyses will mean that, in the future, powerful tools like Hyb-Seq will be put to greater use. "I hope this paper encourages more people to use Hyb-Seq data for their research questions," said Dr. Jones, "because the phylogenetic methods are becoming even more accessible."