Phylogenetic and phylogeographic methods for inferring mode of speciation
2010-05-21
A literature review on biogeographic research methods for distinguishing modes of speciation
To understand the origin of species, it is crucial to determine the circumstances that led to their divergence from formerly conspecific populations. Although it had been suggested that becoming ecologically distinct while in sympatry may lead to speciation, most researchers of the early twentieth century found little evidence for this mode, so that by the 1960s geographic isolation (allopatry) had come to be seen as the dominant, if not sole, driver of speciation.[1] Alternative modes (including parapatric and peripatric speciation) were later proposed, and with the increasing availability of molecular data, new methods for inferring mode of speciation were developed.
A simple method for determining mode of speciation would be to use phylogenies to identify sister-species pairs and to compare the distributions of those sister species. A major flaw in this method is its assumption that current species ranges correspond to those at the time of speciation, i.e. that current sympatry indicates sympatric speciation and current allopatry indicates allopatric speciation. This assumption is violated in many cases.[2]
Barraclough & Vogler 2000 [3] developed a method known as age–range correlation (ARC), which uses all the nodes in a phylogeny and includes the expectation that species ranges will change. The amount of overlap between sympatrically speciating sister taxa is expected to decrease from 100% over time as ranges shift stochastically (Figure 1a); where the dominant mode is allopatry, range overlap should increase from 0% as ranges expand over time (Figure 1b). The occurrence of peripatry is estimated by plotting asymmetry in range size against depth of node, where peripatric populations initially have much smaller ranges than their progenitors but the asymmetry decreases over time due to range expansion.
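The regression at the heart of ARC can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that ranges are stored as sets of occupied grid cells, that overlap is scaled by the smaller of the two ranges, and that a plain least-squares line is fitted. The function names and the data are hypothetical.

```python
def range_overlap(range_a, range_b):
    """Degree of sympatry between two ranges given as sets of grid cells.

    Assumption: the shared area is scaled by the smaller range, so a small
    range lying entirely inside a larger one scores 1.0.
    """
    if not range_a or not range_b:
        raise ValueError("ranges must be non-empty")
    return len(range_a & range_b) / min(len(range_a), len(range_b))


def arc_regression(node_ages, overlaps):
    """Least-squares fit of range overlap on node age.

    Returns (intercept, slope). An intercept near 1 with a negative slope
    matches the pattern expected under sympatric speciation; an intercept
    near 0 with a positive slope matches the pattern expected under allopatry.
    """
    n = len(node_ages)
    if n < 2 or n != len(overlaps):
        raise ValueError("need two equally long sequences with at least two nodes")
    mean_x = sum(node_ages) / n
    mean_y = sum(overlaps) / n
    sxx = sum((x - mean_x) ** 2 for x in node_ages)
    if sxx == 0:
        raise ValueError("node ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(node_ages, overlaps))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical node ages (arbitrary time units) and overlap values.
ages = [1.0, 2.0, 3.0, 4.0]
sympatric_like = [0.9, 0.8, 0.7, 0.6]   # overlap decays from a high start
allopatric_like = [0.1, 0.2, 0.3, 0.4]  # overlap grows from a low start

print(arc_regression(ages, sympatric_like))
print(arc_regression(ages, allopatric_like))
```

With real data the scatter around the fitted line is usually large, which is why the intercept alone is weak evidence for either mode.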
The results of this method are sensitive to accurate phylogeny estimation, but more serious objections can be raised about how the ranges of deeper nodes are inferred. Barraclough & Vogler 2000 [3] used the union of the ranges of the daughter taxa as the range of their common ancestor, an assumption for which there is no evidence. When ARC is applied to real data, the patterns deduced to represent allopatry and sympatry are difficult to distinguish from randomness, since historical range shifts obscure the pattern in unpredictable ways.[2]
Fitzpatrick & Turelli 2006 [1] attempted to improve the ARC method by not inferring ancestral ranges. Instead, the amount of ancestral range overlap is calculated as the nested average of the amount of overlap between daughter clades. Taxon and distribution sampling therefore need not be as complete for this method. However, the ARC method inevitably requires several terminal taxa so that there are enough data for a trend to be observed, and fitting a single trend assumes that the predominant mode of speciation is constant throughout the tree, on all branches and throughout the evolutionary history of the group. The mode may have changed over time, but as can be seen from Figure 1c and 1d, in which the mode of speciation has switched from one to the other, the regression results are determined by the pattern in the (more numerous) recent nodes. Thus the method does not actually use the entire history of the group but is determined by the current distributions of the terminal taxa, which is a weakness for the reasons outlined by Losos & Glor 2003.[2]
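One reading of the nested average is sketched below; it is an interpretation for illustration, not a transcription of the published procedure. It assumes that a clade is written as a nested tuple of species names, that ranges are sets of grid cells, and that pairwise overlap is scaled by the smaller range. All names and data are hypothetical.

```python
def pairwise_overlap(range_a, range_b):
    """Overlap between two species ranges, scaled by the smaller range."""
    return len(range_a & range_b) / min(len(range_a), len(range_b))


def nested_mean_overlap(clade_a, clade_b, ranges):
    """Nested average of species-level overlaps between two sister clades.

    Averaging follows the tree: each subclade contributes equally at its own
    split, so a species in a species-rich subclade carries less weight than
    it would in a flat mean over all species pairs.
    """
    if isinstance(clade_a, tuple):
        parts = [nested_mean_overlap(child, clade_b, ranges) for child in clade_a]
        return sum(parts) / len(parts)
    if isinstance(clade_b, tuple):
        parts = [nested_mean_overlap(clade_a, child, ranges) for child in clade_b]
        return sum(parts) / len(parts)
    return pairwise_overlap(ranges[clade_a], ranges[clade_b])


# Hypothetical ranges for the tree ((A, (B, D)), C).
ranges = {
    "A": {1, 2},
    "B": {2, 3},
    "D": {3, 4},
    "C": {2, 3, 9, 10},
}
root_overlap = nested_mean_overlap(("A", ("B", "D")), "C", ranges)
print(root_overlap)
```

In this example A is weighted 1/2 and B and D are weighted 1/4 each, so the result differs from the unweighted mean of the three species-level overlaps.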
Another test for sympatry proposed by Fitzpatrick & Turelli 2006 [1] is to compare the observed Jall (the proportion of pairwise species comparisons with no overlap) to the expected Jall derived from randomizations of ranges on species in the tree. This method suffers from a lack of geographical information: the total map area in which ranges can be placed is unavailable, and there is no way to determine which areas are habitable by the species or what effects the landscape may have on dispersal (e.g. the presence of rivers, mountains, or biome boundaries).
A possible way to overcome the limitations of the ARC method is to only examine sister species, without attempting inference at deeper nodes. Van der Niet & Johnson 2009 [4] looked for shifts in habitat (including soil type and altitude), flowering time, pollinator, and fire survival strategy between sister species in the Cape. They noted that species ranges might have shifted since speciation and that quarter-degree squares may be too coarse a scale to detect current sympatry, and thus chose to focus on qualitative ecological traits rather than geographical distances. However, they still claim that these ecological shifts may have been responsible for many speciation events without knowing the traits of the ancestral populations at the time of speciation.
Past potential ranges can be predicted using ecological niche modelling.[5] For each of a pair of sister species, the current distribution is projected onto environmental maps of the area and the environmental characteristics of the habitat are determined (using whatever variables are deemed appropriate, e.g. mean annual rainfall, maximum temperature of the hottest month, and altitude). With additional knowledge of changes in climate and geology, the potential range (in the absence of predation, competition, and limits to dispersal) at the time of speciation can then be obtained by identifying all areas of the map that had those habitat characteristics.
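The projection step can be illustrated with the simplest kind of niche model, a rectangular environmental envelope. This is a sketch under strong assumptions: the niche is taken to be the minimum-to-maximum interval of each variable at the known localities, whereas published studies generally use more elaborate models. The variable names, grid, and values are hypothetical.

```python
def fit_envelope(occurrences):
    """Return {variable: (min, max)} across the occurrence records.

    Each record is a dict of environmental values at one known locality.
    """
    if not occurrences:
        raise ValueError("at least one occurrence record is required")
    variables = occurrences[0].keys()
    return {
        v: (min(rec[v] for rec in occurrences), max(rec[v] for rec in occurrences))
        for v in variables
    }


def potential_range(envelope, grid):
    """Return the set of grid cells whose environment lies inside the envelope."""
    return {
        cell
        for cell, env in grid.items()
        if all(lo <= env[v] <= hi for v, (lo, hi) in envelope.items())
    }


# Hypothetical present-day localities of one species.
occurrences = [
    {"rain_mm": 600, "tmax_c": 24, "alt_m": 300},
    {"rain_mm": 800, "tmax_c": 28, "alt_m": 500},
]

# Hypothetical reconstruction of the environment at the time of speciation.
past_grid = {
    (0, 0): {"rain_mm": 700, "tmax_c": 26, "alt_m": 400},
    (0, 1): {"rain_mm": 300, "tmax_c": 33, "alt_m": 100},
    (1, 0): {"rain_mm": 650, "tmax_c": 25, "alt_m": 900},
    (1, 1): {"rain_mm": 600, "tmax_c": 28, "alt_m": 500},
}

envelope = fit_envelope(occurrences)
past_cells = potential_range(envelope, past_grid)
print(sorted(past_cells))
```

Running the same two functions for the sister species and intersecting the two sets of cells gives a first indication of whether the potential ranges overlapped.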
If the potential ranges of two sister species are separated by an area uninhabitable by either, this constitutes evidence for allopatric speciation, as the diverging populations would have been isolated from each other. This reasoning is based on the niche conservatism hypothesis that habitat requirements are heritable and therefore that the niche of the common ancestor was similar and predictable from those of the daughter species.[5]
The ecological comparison of sister species relies on phylogenies only to identify sister-species pairs, i.e. those taxa for which contemporary range and environmental data are available (unlike phylogenetic methods that aim to make inferences about deeper nodes). But ecological niche modelling does not provide a test for sympatry, as sister species with overlapping potential ranges may still have arisen by allopatric speciation. (Realized niches are, by definition, subsets of potential niches, and they may not have overlapped.) For recently diverged taxa, phylogeographic methods exist to test whether the sister taxa were still in genetic contact at the time of divergence. Allopatric speciation implies that gene flow between the isolated populations ceased at all loci, whereas gene flow is expected to continue at neutral loci between sympatric populations that are diverging due to selection.[6]
The Divergence Population Genetics approach used by Machado et al. 2002 [7] uses sequence data from various loci across the genome. If loci differ significantly in their numbers of fixed differences and shared polymorphisms between sister taxa, then the null isolation model is rejected. Such differences may be the result of selection on some loci but not others. This can be tested using the HKA (Hudson–Kreitman–Aguadé) test, which examines the correlation between amounts of polymorphism and divergence, and the McDonald–Kreitman test, which examines whether polymorphic and divergent sites have the same ratio of synonymous to nonsynonymous substitutions. If no evidence for selection is found, the alternative explanation is that the ancestral populations became genetically structured and speciated in the presence of gene flow.
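The McDonald–Kreitman comparison reduces to a 2×2 table of counts, which the sketch below tests with a two-sided Fisher's exact test written in plain Python. This is only one way to test such a table, and the counts are hypothetical; neither is taken from Machado et al.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mk_test(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """Return (neutrality_index, p_value) for McDonald-Kreitman counts.

    Under neutrality the nonsynonymous:synonymous ratio is expected to be
    the same for fixed differences and for polymorphisms, giving an index
    near 1.
    """
    if fixed_nonsyn == 0 or poly_syn == 0:
        raise ValueError("neutrality index undefined when a denominator is zero")
    index = (poly_nonsyn / poly_syn) / (fixed_nonsyn / fixed_syn)
    p_value = fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)
    return index, p_value


# Hypothetical counts for one locus: an excess of fixed nonsynonymous changes.
index, p_value = mk_test(fixed_nonsyn=10, fixed_syn=10, poly_nonsyn=2, poly_syn=20)
print(index, p_value)
```

An index well below 1 together with a small p-value would point to an excess of nonsynonymous fixed differences at that locus, which would make it unsuitable as a neutral marker.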
Machado et al. 2002 [7] noted, however, that the processes affecting the variables under observation had not been modelled at the time, so that any statistical tests relied on coalescent simulations to set up null distributions. A coalescent-based likelihood model was subsequently designed by Rasmus Nielsen and colleagues.[6] This Isolation with Migration (IM) model includes, in addition to genealogy, the demographic parameters NA, N1, and N2 (the effective sizes of the ancestral and two daughter populations); t (the time since the split); and m1 and m2 (the post-speciation migration rates between the sister species), which are thus directly estimated in this method. If the 95% confidence interval for either m1 or m2 does not include zero, it is inferred that the species diverged in the presence of gene flow, i.e. sympatrically.
By implementing the IM model in a Bayesian framework, one can treat the genealogy as a nuisance parameter and integrate over a large number of possible trees to take uncertainty in the tree estimate into account and minimize the effect of any one genealogy on the demographic estimates. It is also possible and, in fact, necessary to use data from multiple loci, as different genes can have very different histories, even if they are taken from the same individuals.[6] These loci are, however, assumed to evolve neutrally. Thus, one should ideally use several nuclear markers, as plastid loci are all linked to genes that are under selection, the entire plastid genome being inherited as a unit. It is also possible to test for selection using the tests outlined above. Another assumption of the model is that there has been no recombination in the loci used. A number of tests and strategies can be used to meet this assumption.[6,8]
Niemiller et al. 2008 [8] used the IM model to test whether Tennessee cave salamanders speciated allopatrically due to range fragmentation or sympatrically by ecological adaptation. They noted that, for very recent splits, divergence with gene flow may be indistinguishable from isolation (allopatry) with retention of ancestral polymorphisms. In such cases flat posterior distributions are expected; these must be interpreted as an inconclusive result, so the method should not yield false positives in this situation.
If post-speciation gene flow is detected, a possible alternative explanation is allopatric speciation followed by secondary contact and hybridization.[8] Niemiller et al. looked at the distribution of migration events over time to see whether gene flow events were concentrated near the present (indicating renewed contact and hybridization) or had been continuous since splitting (indicating sympatric divergence). It is also possible to determine the probability of hybridization in the lab by examining mate choice patterns and the fertility of hybrids, as done for Drosophila by Coyne & Orr 1989.[9]
The IM model also assumes that the populations are panmictic. If there is strong hierarchical structuring, this will distort the estimates of effective population sizes and time since splitting, as population structure reduces Ne and coalescence time is proportional to Ne. However, the gene flow parameters should remain unaffected.[8]
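For reference, the proportionality between coalescence time and Ne takes a simple form under the standard neutral coalescent. The expression below assumes a diploid, panmictic population of constant effective size, which is the textbook model and not a result from Niemiller et al.; T_k denotes the waiting time, in generations, while k lineages remain.

```latex
\mathbb{E}[T_k] = \frac{4N_e}{k(k-1)}, \qquad \text{so that} \qquad \mathbb{E}[T_2] = 2N_e .
```

Any factor that changes Ne therefore rescales all expected coalescence times by the same proportion, which is why estimates of population size and splitting time are distorted together.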
Niemiller et al. 2008 [8] found that there had been gene flow between different cave-dwelling species and the surface-dwelling one, indicating that the cave species were not relicts of a widespread population isolated by aridification of the surface environment, but rather the results of parallel colonizations with subsequent adaptation to the subterranean environment.
Fitzpatrick et al. 2009 [10] see the sympatry–allopatry dichotomy as false, since these two modes represent simple, extreme instances along a continuum of more complex speciation patterns. The amount of gene flow is crucial in determining the mechanism whereby speciation has taken place. However, ecology plays a role even in “non-ecological” forms of speciation such as allopatric divergence and polyploidy.[11] Thus, rather than trying to assign particular cases to the categories of allopatry or sympatry, it would be more fruitful to describe both the ecological and population-genetic circumstances and processes whereby specific speciation events take place. Suitable methods include niche modelling with reconstruction of past habitats, and demographic models based on population-level gene sequences.
[1] Fitzpatrick BM, Turelli M. 2006. The geography of mammalian speciation: mixed signals from phylogenies and range maps. Evolution 60: 601–615.
[2] Losos JB, Glor RE. 2003. Phylogenetic comparative methods and the geography of speciation. Trends in Ecology and Evolution 18: 220–227.
[3] Barraclough TG, Vogler AP. 2000. Detecting the geographical pattern of speciation from species-level phylogenies. The American Naturalist 155: 419–434.
[4] Van der Niet T, Johnson SD. 2009. Patterns of plant speciation in the Cape floristic region. Molecular Phylogenetics and Evolution 51: 85–93.
[5] Kozak KH, Wiens JJ. 2006. Does niche conservatism promote speciation? A case study in North American salamanders. Evolution 60: 2604–2621.
[6] Hey J, Nielsen R. 2004. Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167: 747–760.
[7] Machado CA, Kliman RM, Markert JA, Hey J. 2002. Inferring the history of speciation from multilocus DNA sequence data: the case of Drosophila pseudoobscura and close relatives. Molecular Biology and Evolution 19: 472–488.
[8] Niemiller ML, Fitzpatrick BM, Miller BT. 2008. Recent divergence with gene flow in Tennessee cave salamanders (Plethodontidae: Gyrinophilus) inferred from gene genealogies. Molecular Ecology 17: 2258–2275.
[9] Coyne JA, Orr HA. 1989. Patterns of speciation in Drosophila. Evolution 43: 362–381.
[10] Fitzpatrick BM, Fordyce JA, Gavrilets S. 2009. Pattern, process and geographic modes of speciation. Journal of Evolutionary Biology 22: 2342–2347.
[11] Sobel JM, Chen GF, Watt LR, Schemske DW. 2009. The biology of speciation. Evolution 64: 295–315.