The Spectre of Too Many Species

Abstract: Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the BPP program have suggested that BPP may detect population splits but not species divergences and that it tends to over-split when data from many loci are analyzed. Here we confirm several of these results and provide their mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the protracted speciation model is unrealistic and its mechanism for assigning species status contradicts prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in BPP tends to detect population splits when the amount of data (the number of loci) increases, so over-splitting is a legitimate concern. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in BPP provide much more reliable inference under the gdi than the approximate method PHRAPL. We suggest that the Bayesian model-selection approach is useful for identifying sympatric cryptic species while Bayesian parameter estimation under the multispecies coalescent can be used to implement empirical criteria for determining species status among allopatric populations.
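
As a rough illustration of how an empirical gdi criterion might be applied to parameter estimates, the sketch below assumes the commonly used form gdi = 1 - exp(-2τ/θ), where τ is the divergence time and θ the population size parameter, both in units of expected mutations per site. Neither this formula nor the cut-offs of 0.2 and 0.7 are stated in the abstract; they are assumptions drawn from the wider gdi literature, and the function names are hypothetical.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index, ASSUMING gdi = 1 - exp(-2 * tau / theta).

    tau   : population divergence time (expected mutations per site)
    theta : population size parameter of the focal population (same units)
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify_gdi(value: float, lower: float = 0.2, upper: float = 0.7) -> str:
    """Apply assumed rule-of-thumb cut-offs (not taken from the abstract)."""
    if value < lower:
        return "single species"
    if value > upper:
        return "distinct species"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical estimates, e.g. posterior means from a coalescent analysis.
    for tau, theta in [(0.001, 0.01), (0.004, 0.01), (0.01, 0.01)]:
        g = gdi(tau, theta)
        print(f"tau={tau}, theta={theta}: gdi={g:.3f} ({classify_gdi(g)})")
```

In practice one would apply such a function to each posterior sample of (τ, θ) rather than to point estimates, so that the uncertainty in the index is visible.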

How to improve parameter estimates in GLM-based fMRI data analysis: cross-validated Bayesian model averaging

10.1101/095778 ◽ 2016 ◽ Cited By ~ 2

Author(s): Joram Soch ◽ Achim Pascal Meyer ◽ John-Dylan Haynes ◽ Carsten Allefeld

Keyword(s): Data Analysis ◽ Model Selection ◽ Bayesian Model ◽ Bayesian Model Averaging ◽ Model Averaging ◽ Bayesian Model Selection ◽ fMRI Data ◽ Parameter Estimates ◽ fMRI Data Analysis ◽ Level Analysis

Abstract: In functional magnetic resonance imaging (fMRI), model quality of general linear models (GLMs) for first-level analysis is rarely assessed. In recent work (Soch et al., 2016: “How to avoid mismodelling in GLM-based fMRI data analysis: cross-validated Bayesian model selection”, NeuroImage, vol. 141, pp. 469-489; DOI: 10.1016/j.neuroimage.2016.07.047), we have introduced cross-validated Bayesian model selection (cvBMS) to infer the best model for a group of subjects and use it to guide second-level analysis. While this is the optimal approach given that the same GLM has to be used for all subjects, there is a much more efficient procedure when model selection only addresses nuisance variables and regressors of interest are included in all candidate models. In this work, we propose cross-validated Bayesian model averaging (cvBMA) to improve parameter estimates for these regressors of interest by combining information from all models using their posterior probabilities. This is particularly useful as different models can lead to different conclusions regarding experimental effects and the most complex model is not necessarily the best choice. We find that cvBMS can prevent established effects from going undetected and that cvBMA can be more sensitive to experimental effects than using even the best model in each subject or the model that is best in a group of subjects.
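
The averaging step described above, combining estimates from all models using their posterior probabilities, can be sketched generically as follows. This is a minimal illustration of Bayesian model averaging, not the authors' cvBMA implementation: it assumes that a log model evidence is already available for each candidate model (in cvBMA these would be cross-validated), that the regressor of interest appears in every model, and that the model prior is uniform unless given. All function names are hypothetical.

```python
import math


def posterior_model_probs(log_evidences, priors=None):
    """Posterior model probabilities from log model evidences.

    Uses a max-shifted softmax so that very negative log evidences
    do not underflow. Priors default to uniform (an assumption).
    """
    n = len(log_evidences)
    if priors is None:
        priors = [1.0 / n] * n
    log_post = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    shift = max(log_post)
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def averaged_estimate(estimates, log_evidences, priors=None):
    """Average one parameter's per-model estimates, weighted by posterior probability."""
    probs = posterior_model_probs(log_evidences, priors)
    return sum(p * b for p, b in zip(probs, estimates))


if __name__ == "__main__":
    # Hypothetical values: three GLMs sharing one regressor of interest.
    betas = [0.8, 1.1, 1.4]
    log_evidences = [-1250.0, -1248.5, -1253.0]
    print(posterior_model_probs(log_evidences))
    print(averaged_estimate(betas, log_evidences))
```

When one model dominates the evidence, the averaged estimate approaches that model's estimate, so model selection can be seen as a limiting case of averaging.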

Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics

Genome Biology ◽ 10.1186/s13059-020-02103-2 ◽ 2020 ◽ Vol 21 (1) ◽ Cited By ~ 5

Author(s): Kwangbom Choi ◽ Yang Chen ◽ Daniel A. Skelly ◽ Gary A. Churchill

Keyword(s): Model Selection ◽ Single Cell ◽ Bayesian Model ◽ Negative Binomial ◽ Reference Model ◽ Bayesian Model Selection ◽ Cellular Heterogeneity ◽ Parameter Estimates ◽ Zero Inflation ◽ Suitable Reference

Abstract: Background: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-seq data. Results: We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. Conclusions: Despite the existence of zero inflation in scRNA-seq counts, we recommend the generalized linear model with negative binomial count distribution, not zero-inflated, as a suitable reference model for scRNA-seq analysis.
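
To make the term "zero inflation" concrete, the sketch below compares the probability of a zero count under a negative binomial (NB) distribution with that under a zero-inflated negative binomial (ZINB). It assumes the mean/size parameterization of the NB (mean mu, size r, variance mu + mu²/r) and a ZINB that mixes a point mass at zero, with weight pi, with the NB. This only illustrates the two distributions; it is not the model-selection procedure used in the paper, and the function names are hypothetical.

```python
def nb_zero_prob(mu: float, r: float) -> float:
    """P(count = 0) under NB with mean mu and size r: (r / (r + mu)) ** r."""
    if mu < 0 or r <= 0:
        raise ValueError("mu must be >= 0 and r must be > 0")
    return (r / (r + mu)) ** r


def zinb_zero_prob(pi: float, mu: float, r: float) -> float:
    """P(count = 0) under ZINB: pi + (1 - pi) * NB zero probability."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    return pi + (1.0 - pi) * nb_zero_prob(mu, r)


if __name__ == "__main__":
    # Hypothetical gene: mean count 1.0, size 1.0, 20% structural zeros.
    print(nb_zero_prob(1.0, 1.0))
    print(zinb_zero_prob(0.2, 1.0, 1.0))
```

Because a low mean or a small size parameter already produces many zeros under the plain NB, a high observed zero fraction alone does not establish zero inflation.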

Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics

10.1101/2020.03.03.974808 ◽ 2020 ◽ Cited By ~ 2

Author(s): Kwangbom Choi ◽ Yang Chen ◽ Daniel A. Skelly ◽ Gary A. Churchill

Keyword(s): Model Selection ◽ Single Cell ◽ Bayesian Model ◽ Negative Binomial ◽ Reference Model ◽ Bayesian Model Selection ◽ Cellular Heterogeneity ◽ Parameter Estimates ◽ Zero Inflation ◽ Suitable Reference

Abstract: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-Seq data. We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-Seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. Despite the existence of zero inflation in scRNA-Seq counts, we recommend the generalized linear model with negative binomial count distribution (not zero-inflated) as a suitable reference model for scRNA-Seq analysis.
