The Spectre of Too Many Species
Abstract

Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the BPP program have suggested that BPP may detect population splits but not species divergences and that it tends to over-split when data from many loci are analyzed. Here we confirm several of these results and provide their mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the protracted speciation model is unrealistic and that its mechanism for assigning species status contradicts prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in BPP tends to detect population splits when the amount of data (the number of loci) increases, so over-splitting is a legitimate concern. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in BPP provide much more reliable inference under the gdi than the approximate method PHRAPL. We suggest that the Bayesian model-selection approach is useful for identifying sympatric cryptic species, while Bayesian parameter estimation under the multispecies coalescent can be used to implement empirical criteria for determining species status among allopatric populations.