Soria-Carrasco and colleagues use a combination of whole-genome sequence and GBS data to investigate evidence of parallel evolution in stick insects. They find significantly more shared regions of high divergence than expected by chance among replicate populations, and greater sharing than expected with alleles showing frequency change in an experimental colonization setup. The authors did an impressive job putting together a lot of data, and I can imagine how much work is required to do this kind of analysis in an emerging (eclosing?) model system.
I’ll preface the rest of my post with a confession – I have only skimmed the relevant sections of the supplement, so I apologize in advance if I missed something.
The question of repeated evolution is an exciting one, but one that never ceases to confuse me. What is the question we are interested in? Are we only interested in identifying SNPs that show selection in the same direction in replicate populations? Genes? Regions? Pathways? Or is the question what proportion of adaptive divergence is repeated across populations? Or even simply how often does the same phenotype evolve in response to similar selective pressures? The expectation for repeated evolution would seem to me to depend on a lot of factors, including the strength of selection, the presence of standing variation, the time period involved, and the genetic architecture of the trait. And without a model, I’m not quite sure whether to be surprised by the results of a particular study. This study is a case in point: 83% of their high Fst SNPs are unique to one pair of populations and show no evidence of repeated evolution. But maybe 17% is way more than we expect? I’m not sure. But even if it is, one can look at those results and still argue that the vast majority of adaptive divergence appears to be independent. And I remain confused (which doesn’t bode well for the repeated evolution paper I’m currently trying to write) as to what this actually means.
Such questions aside, I found some parts of the results and analysis puzzling. For example, Fst is dependent on minor allele frequency (Fig 1). SNPs at high frequency in the ancestral population are more likely to be at high frequency in daughter pops and thus show high Fst. In fact, when I run some completely neutral simulations of two population pairs, using a 90% Fst quantile like the authors do, I get an average of 1.1% of SNPs with high Fst shared between pairs of populations rather than the 1% expected, or a 10% excess! Moreover, I saw more shared high Fst SNPs than expected in 85% of the simulations I did. Now to be fair, I simply invented a relatedness matrix for my simulations that gave reasonable pairwise Fst values (median of ~1%); without grabbing their data I don’t know how realistic my matrix is, but it does go to show that this effect could be important.
Figure 1.
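For anyone who wants to poke at the neutral expectation themselves, here is a rough sketch of the kind of simulation described above. It is not the code I actually ran, and it does not use a relatedness matrix: as a stand-in, each of four populations drifts independently from a shared ancestral allele frequency under a Balding-Nichols beta model. The function names, the uniform ancestral frequency spectrum, the drift parameter `f_drift`, and the simple per-SNP Fst ratio are all assumptions made for illustration. The 1% baseline comes from independence: if each pair flags its top 10% of SNPs, two pairs should share 0.1 × 0.1 = 1% by chance.

```python
import numpy as np


def expected_shared(quantile=0.9):
    """Fraction of SNPs expected in the top tail of BOTH pairs if pairs were independent."""
    return (1.0 - quantile) ** 2


def pairwise_fst(p1, p2):
    """Simple per-SNP Fst-like ratio from two allele-frequency vectors (an assumed estimator)."""
    num = (p1 - p2) ** 2
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Guard against SNPs fixed for the same allele in both populations.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def shared_outlier_fraction(n_snps=20000, f_drift=0.01, quantile=0.9, rng=None):
    """Simulate two neutral population pairs and return the fraction of SNPs
    above the Fst quantile threshold in both pairs."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumed ancestral frequency spectrum: uniform, avoiding very rare alleles.
    p_anc = rng.uniform(0.05, 0.95, size=n_snps)
    # Balding-Nichols drift: each population's frequency is beta-distributed
    # around the ancestral frequency with variance p(1-p) * f_drift.
    a = p_anc * (1 - f_drift) / f_drift
    b = (1 - p_anc) * (1 - f_drift) / f_drift
    p = rng.beta(a, b, size=(4, n_snps))  # rows 0,1 = pair A; rows 2,3 = pair B
    fst_a = pairwise_fst(p[0], p[1])
    fst_b = pairwise_fst(p[2], p[3])
    high_a = fst_a > np.quantile(fst_a, quantile)
    high_b = fst_b > np.quantile(fst_b, quantile)
    return float(np.mean(high_a & high_b))


# Small demonstration run: 20 replicate simulations with a fixed seed.
rng = np.random.default_rng(1)
fractions = [shared_outlier_fraction(rng=rng) for _ in range(20)]
print("expected under independence:", expected_shared())
print("mean simulated shared fraction:", np.mean(fractions))
print("replicates above expectation:", np.mean(np.array(fractions) > expected_shared()))
```

Whether this toy version shows an excess, and how large, will depend on the assumed drift model and frequency spectrum, so the numbers should not be expected to match the 1.1% and 85% quoted above. Swapping in allele frequencies from the real data would be the more informative test.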
I think we also felt it would have been nice to explore the GO terms a bit more – does metal ion binding make sense given, say, what we know (or could easily know from some simple analyses) of the mineral composition of the two host plants?
In the experimental transplant, it seemed odd that stick insects taken from Adenostoma left more offspring (or at least more were collected?) when transplanted onto the alternative host Ceanothus than when transplanted onto a different Adenostoma in four of the five replicates. While the sample sizes may be too small to infer much about fitness from these numbers, that’s certainly not in the direction I would have expected.
Finally, selection must have been pretty strong to overcome 1) Nm values of 20-100 or 2) genetic drift in the experimental pop in a single generation. Is that realistic? Or even if some loci are under sufficiently strong selection, are those informative about adaptive evolution in general, or even the majority of cases of repeated evolution?