zero counts
Recently Published Documents


TOTAL DOCUMENTS: 30 (FIVE YEARS: 12)

H-INDEX: 7 (FIVE YEARS: 3)

2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Gerard A Bouland ◽  
Ahmed Mahfouz ◽  
Marcel J T Reinders

Abstract Single-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We propose to use binarized expression profiles to identify the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available and simulated datasets, we show that a binarized representation of single-cell expression data accurately represents biological variation and reveals the relative abundance of transcripts more robustly than counts.
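As a minimal sketch of what a binarized representation means, the snippet below maps each count to 1 (detected) or 0 (zero count) and summarizes each gene by the fraction of cells in which it is detected. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
def binarize(counts):
    """Map each count to 1 if the transcript was detected (count > 0), else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def detection_rate(binary):
    """Fraction of cells (rows) in which each gene (column) is detected."""
    n_cells = len(binary)
    return [sum(col) / n_cells for col in zip(*binary)]


# Toy matrix with made-up values: rows are cells, columns are genes.
counts = [
    [0, 3, 12],
    [0, 0, 7],
    [1, 0, 25],
    [0, 2, 9],
]

binary = binarize(counts)
rates = detection_rate(binary)
print(binary)
print(rates)
```

The detection rate ignores the magnitude of the non-zero counts, which is the sense in which a binarized profile relies on zero patterns alone.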


2021 ◽  
Author(s):  
Gerard A. Bouland ◽  
Ahmed Mahfouz ◽  
Marcel J.T. Reinders

Abstract Single-cell RNA sequencing data is characterized by a large number of zero counts, yet there is growing evidence that these zeros reflect biological variation rather than technical artifacts. We propose differential dropout analysis (DDA) as an alternative to differential expression analysis (DEA) to identify the effects of biological variation in single-cell RNA sequencing data. Using 16 publicly available datasets, we show that dropout patterns are biological in nature and can assess the relative abundance of transcripts more robustly than counts.
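One way to picture a dropout-based comparison is to test whether a gene is detected in a different fraction of cells in two groups. The sketch below uses a two-proportion z-test as a stand-in; this is an assumption made for illustration and not necessarily the test used by the authors. The function names and the cell numbers are hypothetical.

```python
import math


def count_detected(counts):
    """Number of cells with a non-zero count for one gene."""
    return sum(1 for c in counts if c > 0)


def two_proportion_z_test(detected_a, n_a, detected_b, n_b):
    """Two-sided z-test for a difference in detection rates between two groups."""
    p_a = detected_a / n_a
    p_b = detected_b / n_b
    pooled = (detected_a + detected_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Gene detected in all cells or in none: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: a gene detected in 30 of 100 cells in group A
# and in 60 of 100 cells in group B.
z, p_value = two_proportion_z_test(30, 100, 60, 100)
print(z, p_value)
```

Only the presence or absence of each transcript enters the test, so no count normalization is involved in this sketch.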


2021 ◽  
Vol 10 (s1) ◽  
Author(s):  
Sami Khedhiri

Abstract Objectives: Modeling and forecasting possible trajectories of COVID-19 infections and deaths using statistical methods is currently one of the most important topics. However, statistical models use different assumptions and methods and thus yield different results. One issue in monitoring disease progression over time is how to handle excess zero counts. In this research, we assess the empirical performance of these models in terms of their fit and forecast accuracy for COVID-19 deaths. Methods: Two types of models are suggested in the literature to study count time series data. The first type is based on Poisson and negative binomial conditional probability distributions to account for overdispersion in the data and uses autoregression to account for dependence among the responses. The second type is based on zero-inflated mixed autoregression and also uses exponential family conditional distributions. We study the goodness of fit and forecast accuracy of these count time series models based on autoregressive conditional count distributions with and without zero inflation. Results: We illustrate these methods using recently published online COVID-19 data for Tunisia, which report daily death counts from March 2020 to February 2021. We perform an empirical analysis and compare the fit and forecast performance of these models for death counts in the presence of an intervention policy. Our statistical findings show that models that account for zero inflation produce a better fit and forecast pandemic deaths more accurately. Conclusions: This paper shows that infectious disease data with excess zero counts are better modelled with zero-inflated models. These models yield more accurate predictions of deaths related to the pandemic than the generalized count data models. In addition, our statistical results indicate that lifting travel restrictions has a significant impact on the surge of COVID-19 deaths. One plausible explanation for the outperformance of zero-inflated models is that the zero values are related to an intervention policy and are therefore structural.
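To make the idea of structural zeros concrete, the sketch below writes out a zero-inflated Poisson (ZIP) probability mass function and compares its log-likelihood with that of a plain Poisson on a toy series. It deliberately ignores the autoregressive dependence used in the paper; the data and the ZIP parameter values are made up and not fitted.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of observing y events with rate lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a draw from Poisson(lam)."""
    base = (1 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def log_likelihood(data, pmf, **params):
    """Sum of log-probabilities, treating observations as independent."""
    return sum(math.log(pmf(y, **params)) for y in data)


# Hypothetical daily death counts with many zeros (not real data).
deaths = [0, 0, 0, 0, 0, 0, 3, 5, 2, 0, 0, 4, 6, 0, 0]

# Plain Poisson evaluated at its maximum-likelihood rate (the sample mean).
ll_poisson = log_likelihood(deaths, poisson_pmf, lam=sum(deaths) / len(deaths))

# ZIP evaluated at illustrative, hand-picked parameters.
ll_zip = log_likelihood(deaths, zip_pmf, lam=4.0, pi=0.65)

print(ll_poisson, ll_zip)
```

On this toy series the ZIP log-likelihood is higher because a single Poisson rate cannot accommodate both the long runs of zeros and the larger non-zero counts.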


BMC Genomics ◽  
2020 ◽  
Vol 21 (S9) ◽  
Author(s):  
Siamak Zamani Dadaneh ◽  
Paul de Figueiredo ◽  
Sing-Hoi Sze ◽  
Mingyuan Zhou ◽  
Xiaoning Qian

Abstract Background: Single-cell RNA sequencing (scRNA-seq) is a powerful profiling technique at single-cell resolution. Appropriate analysis of scRNA-seq data can characterize molecular heterogeneity and shed light on the underlying cellular processes to better understand development and disease mechanisms. The unique analytic challenge is to appropriately model highly over-dispersed scRNA-seq count data with prevalent dropouts (zero counts), making zero-inflated dimensionality reduction techniques popular for scRNA-seq data analyses. Employing zero-inflated distributions, however, may place extra emphasis on zero counts, leading to potential bias when identifying the latent structure of the data. Results: In this paper, we propose a fully generative hierarchical gamma-negative binomial (hGNB) model of scRNA-seq data, obviating the need for explicitly modeling zero inflation. At the same time, hGNB can naturally account for covariate effects at both the gene and cell levels to identify complex latent representations of scRNA-seq data, without the need for commonly adopted pre-processing steps such as normalization. Efficient Bayesian model inference is derived by exploiting conditional conjugacy via novel data augmentation techniques. Conclusion: Experimental results on both simulated data and several real-world scRNA-seq datasets suggest that hGNB is a powerful tool for cell cluster discovery as well as cell lineage inference.
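A short calculation illustrates why an over-dispersed negative binomial can produce many zeros without an explicit zero-inflation component. The sketch below is not the hGNB model itself; it only compares the zero probability of a Poisson with that of a negative binomial (a gamma-Poisson mixture) at the same mean. The parameter names `mu` and `r` are assumptions for illustration.

```python
import math


def poisson_zero_prob(mu):
    """P(Y = 0) for a Poisson with mean mu."""
    return math.exp(-mu)


def nb_zero_prob(mu, r):
    """P(Y = 0) for a negative binomial with mean mu and dispersion (shape) r.

    This is the marginal of a Poisson whose rate is drawn from a gamma
    distribution with shape r and mean mu; smaller r means more overdispersion.
    """
    return (r / (r + mu)) ** r


mu = 2.0
print("Poisson:  ", poisson_zero_prob(mu))
for r in (10.0, 1.0, 0.5, 0.1):
    print("NB r =", r, ":", nb_zero_prob(mu, r))
```

As `r` shrinks, the zero probability rises well above the Poisson value at the same mean, and as `r` grows the negative binomial approaches the Poisson.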


Author(s):  
Abhishek Sarkar ◽  
Matthew Stephens

Abstract How to model and analyze scRNA-seq data has been the subject of considerable confusion and debate. The high proportion of zero counts in a typical scRNA-seq data matrix has garnered particular attention, and led to widespread but inconsistent use of terminology such as “dropout” and “missing data.” Here, we argue that much of this terminology is unhelpful and confusing, and outline simple ways of thinking about models for scRNA-seq data that can help avoid this confusion. The key ideas are: (1) observed scRNA-seq counts reflect both the actual expression level of each gene in each cell and the measurement process, and it is important for models to explicitly distinguish contributions from these two distinct factors; and (2) the measurement process can be adequately described by a simple Poisson model, a claim for which we provide both theoretical and empirical support. We show how these ideas lead to a simple, flexible statistical framework that encompasses a number of commonly used models and analysis methods, and how this framework makes explicit their different assumptions and helps interpret their results. We also illustrate how explicitly separating models for expression and measurement can help address questions of biological interest, such as whether mRNA expression levels are multi-modal among cells.
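The separation of measurement and expression can be written out as a two-level model. The notation below ($x_{ij}$ for the observed count of gene $j$ in cell $i$, $s_i$ for a cell-specific size factor, $\lambda_{ij}$ for the latent expression level, $g_j$ for the expression distribution of gene $j$) is an assumption chosen for illustration and may differ from the paper's.

```latex
% Measurement model (Poisson) and expression model (gene-specific distribution)
x_{ij} \mid \lambda_{ij} \sim \operatorname{Poisson}(s_i \lambda_{ij}),
\qquad
\lambda_{ij} \sim g_j(\cdot)

% Probability of a zero count, averaged over the expression model
\Pr(x_{ij} = 0) = \mathbb{E}_{g_j}\!\left[ e^{-s_i \lambda_{ij}} \right]

% Two standard special cases of the expression model:
% (a) point mass at \mu_j gives a Poisson observation model
g_j = \delta_{\mu_j}
\;\Longrightarrow\;
x_{ij} \sim \operatorname{Poisson}(s_i \mu_j)

% (b) gamma with mean \mu_j and shape 1/\phi_j gives a negative binomial
g_j = \operatorname{Gamma}\!\left(\text{shape } 1/\phi_j,\ \text{mean } \mu_j\right)
\;\Longrightarrow\;
\mathbb{E}[x_{ij}] = s_i \mu_j,
\quad
\operatorname{Var}[x_{ij}] = s_i \mu_j + \phi_j (s_i \mu_j)^2
```

Under this reading, zeros arise from the Poisson measurement step combined with low or variable expression, and different choices of $g_j$ recover different commonly used count models.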


Author(s):  
Sujan Rudra ◽  
Soma Chowdhury Biswas

Our main aim is to identify the factors that influence the use of manufactured cigarettes among tobacco users, especially those above fifteen years of age. Among tobacco users, a large proportion of adults do not use manufactured cigarettes but use other tobacco products. As a result, we need to construct a model that can handle both excess zero counts and overdispersion. Motivated by these facts, in this paper, we propose to apply the Hurdle Negative Binomial (HNB) regression model to discover the relationships between the use of manufactured cigarettes and social factors. The data were found to have excess zeros (35%); moreover, the variance is 47.122, which is much higher than the mean of 5.933. With excess zeros and high variability of non-zero outcomes, the HNB model was found to fit the data better.
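The two ingredients mentioned above, the dispersion check and the hurdle structure, can be sketched as follows. The variance, mean, and zero fraction are the figures quoted in the abstract; the negative binomial parameters `mu` and `r` are hypothetical values chosen for illustration, and no regression on covariates is included.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and dispersion r
    (variance mu + mu**2 / r)."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(y, p_zero, mu, r):
    """Hurdle NB: a zero occurs with probability p_zero; positive counts
    follow a negative binomial truncated at zero."""
    if y == 0:
        return p_zero
    return (1 - p_zero) * nb_pmf(y, mu, r) / (1 - nb_pmf(0, mu, r))


# Dispersion check using the summary statistics reported in the abstract.
dispersion_ratio = 47.122 / 5.933  # variance / mean; 1 would match a Poisson
print(round(dispersion_ratio, 2))

# Hurdle probabilities with the reported 35% zeros and hypothetical mu, r.
probs = [hurdle_nb_pmf(y, p_zero=0.35, mu=9.0, r=1.5) for y in range(500)]
print(probs[:5])
```

In a hurdle model the zero probability is modelled separately from the positive counts, so the fitted share of zeros does not depend on the count distribution's parameters.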


Author(s):  
Luay Habeeb Hashim ◽  
Ahmad Naeem Flaih

Count data, including zero counts, arise in a wide variety of applications; hence, models for counts have become widely popular in many fields. In statistics, count data can be defined as observations that take only non-negative integer values. Sometimes researchers observe more zeros than expected; such an excess of zeros is known as zero inflation. Data with abundant zeros are especially common in health, marketing, finance, econometrics, ecology, statistical quality control, geography, and environmental fields when counting the occurrence of certain behavioral and natural events, such as the frequency of alcohol use, drug use, the number of cigarettes smoked, earthquakes, and rainfall. Some models have been used to analyze count data, such as the zero-inflated Poisson (ZIP) model and the negative binomial model. In this paper, the Poisson, negative binomial, ZIP, and zero-inflated negative binomial (ZINB) models were used to analyze rainfall data.
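A quick diagnostic for "more zeros than expected" is to compare the observed fraction of zeros with the fraction a Poisson model with the same mean would predict. The sketch below does this on a made-up series; the function name and the data are hypothetical and are not the rainfall data analysed in the paper.

```python
import math


def zero_inflation_check(counts):
    """Return (observed zero fraction, zero fraction expected under a
    Poisson distribution with the same mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed, expected


# Hypothetical counts of rainy days per period (not real data).
rainy_days = [0, 0, 0, 0, 0, 0, 2, 5, 0, 8, 0, 3]

observed, expected = zero_inflation_check(rainy_days)
print(observed, expected)
```

When the observed fraction is much larger than the Poisson expectation, zero-inflated or over-dispersed alternatives such as ZIP, negative binomial, or ZINB are natural candidates to compare.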


2019 ◽  
Author(s):  
Rebecca Elyanow ◽  
Bianca Dumitrascu ◽  
Barbara E. Engelhardt ◽  
Benjamin J. Raphael

Abstract Motivation: Single-cell RNA-sequencing (scRNA-seq) enables high throughput measurement of RNA expression in individual cells. Due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard analysis methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells, leveraging the observation that cells generally occupy a small number of RNA expression states. Results: We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc combines network-regularized non-negative matrix factorization with a procedure for handling zero inflation in transcript count matrices. The matrix factorization results in a low-dimensional representation of the transcript count matrix, which imputes gene abundance for both zero and non-zero entries and can be used to cluster cells. The network regularization leverages prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close in the low-dimensional representation. We show that netNMF-sc outperforms existing methods on simulated and real scRNA-seq data, with increasing advantage at higher dropout rates (e.g. above 60%). Furthermore, we show that the results from netNMF-sc – including estimation of gene-gene covariance – are robust to choice of network, with more representative networks leading to greater performance gains. Availability: netNMF-sc is available at github.com/raphael-group/[email protected]
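The combination of matrix factorization, zero handling, and network regularization can be sketched with a generic graph-regularized NMF objective. This is an illustrative form and not necessarily the exact objective used by netNMF-sc; all symbols ($X$ for the gene-by-cell count matrix, $W$ and $H$ for the non-negative factors, $M$ for a mask that down-weights zero entries, $L = D - A$ for the Laplacian of a gene-gene network with symmetric adjacency $A$, and $\lambda$ for the regularization weight) are assumptions.

```latex
% Generic masked, graph-regularized NMF objective (illustrative)
\min_{W \ge 0,\; H \ge 0}\;
\left\lVert M \circ (X - W H) \right\rVert_F^2
\;+\; \lambda \,\operatorname{Tr}\!\left( W^{\top} L W \right)

% The penalty pulls interacting genes together in the latent space:
\operatorname{Tr}\!\left( W^{\top} L W \right)
= \tfrac{1}{2} \sum_{i,j} A_{ij} \,\lVert w_i - w_j \rVert_2^2
```

Here $w_i$ is the row of $W$ for gene $i$, so the second identity shows how pairs of genes with a known interaction ($A_{ij} > 0$) are encouraged to have similar low-dimensional representations, while the product $WH$ supplies imputed values for both zero and non-zero entries.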

