Formal reasoning about hashing-based probabilistic data structures often requires reasoning about random variables where, when one variable gets larger (such as the number of elements hashed into one bucket), the others tend to be smaller (such as the number of elements hashed into the other buckets). This is an example of negative dependence, a generalization of probabilistic independence that has recently found interesting applications in algorithm design and machine learning. Despite the usefulness of negative dependence for the analysis of probabilistic data structures, existing verification methods cannot establish this property for randomized programs.
To fill this gap, we design LINA, a probabilistic separation logic for reasoning about negative dependence. Following recent work on probabilistic separation logics that use separating conjunction to reason about the probabilistic independence of random variables, we use separating conjunction to reason about negative dependence. Our assertion logic features two separating conjunctions, one for independence and one for negative dependence. We generalize the logic of bunched implications (BI) to support multiple separating conjunctions, and provide a sound and complete proof system. Notably, the semantics for separating conjunction relies on a monotone, rather than partial, operation for combining resources. By drawing on closure properties for negative dependence, our program logic supports a Frame-like rule for negative dependence and monotone operations. We demonstrate how LINA can verify probabilistic properties of hash-based data structures and balls-into-bins processes.
This research is inspired by the problem of monitoring the process covariance structure of q attributes, where samples are independent and collected from a multivariate normal distribution with known mean vector and unknown covariance matrix. The focus is on two matrix random variables, constructed from different Wishart ratios, that describe the process for the two consecutive time periods before and immediately after a change in the covariance structure takes place. The product moments of these constructed random variables are highlighted and set the scene for a proposed measure that enables the practitioner to calculate the run-length probability of detecting a shift immediately after a change in the covariance matrix occurs. Our results open a new approach and provide insight for detecting the change in the parameter structure as soon as possible once the underlying process, described by a multivariate normal distribution, encounters a permanent/sustained upward or downward shift.
This paper is concerned with developing low-variance simulation estimators of probabilities related to sums of Bernoulli random variables. It shows how to utilize an identity used in the Chen–Stein approach to bounding Poisson approximations to obtain low-variance estimators. Applications and numerical examples are presented in areas such as pattern occurrences, generalized coupon collecting, system reliability, and multivariate normals. We also consider the problem of estimating the probability that a positive linear combination of Bernoulli random variables exceeds a specified value, and present a simulation estimator that is always less than the Markov inequality bound on that probability.
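The target quantity here, a tail probability P(S ≥ k) for a sum S of independent Bernoulli variables (a Poisson-binomial random variable), can be computed exactly by dynamic programming for small instances, which also makes the Markov bound the abstract mentions easy to compare against. This sketch is our own illustration of the setup and does not reproduce the paper's Chen–Stein-based estimator; the function names and example parameters are ours.

```python
def poisson_binomial_pmf(ps):
    """Exact pmf of S = sum of independent Bernoulli(p_i), by DP over successes."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)      # this trial fails
            new[k + 1] += q * p        # this trial succeeds
        pmf = new
    return pmf

def tail_prob(ps, k):
    """P(S >= k)."""
    return sum(poisson_binomial_pmf(ps)[k:])

ps = [0.1, 0.2, 0.3, 0.4]
exact = tail_prob(ps, 2)        # exact P(S >= 2) = 0.2572
markov = sum(ps) / 2            # Markov bound E[S]/2 = 0.5
```

For large numbers of Bernoulli variables this DP becomes expensive, which is where simulation estimators such as those developed in the paper come in; note how loose the Markov bound (0.5) is compared with the exact value (0.2572) even in this tiny example.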
Human-induced environmental change increasingly threatens the stability of socio-ecological systems. Careful statistical characterization of environmental concentrations is critical to quantify and predict the consequences of such changes on human and ecosystem conditions. However, while concentrations are naturally defined as the ratio between solute mass and solvent volume, they have rarely been treated as such, typically limiting the analysis to familiar distributions generically used for any other environmental variable. To address this gap, we propose a more general framework that leverages their definition explicitly as ratios of random variables. We show that the resulting models accurately describe the behavior of nitrate plus nitrite in US rivers and of salt concentration in Everglades estuaries by accounting for heavy tails that can emerge when the water volume fluctuates around low values. Models that preclude the presence of heavy tails and the related high probability of extreme concentrations could significantly undermine the accuracy of diagnostic frameworks and the effectiveness of mitigation interventions, especially for soil contamination characterized by a water volume (i.e., soil moisture) frequently approaching zero.
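The heavy-tail mechanism described above can be made concrete with a toy case: if the mass M is fixed at 1 and the volume V is uniform on (0, 1), the concentration C = M/V has P(C > t) = P(V < 1/t) = 1/t, a power-law tail driven entirely by the denominator approaching zero. The following sketch is our own illustration of that mechanism under these toy assumptions, not one of the paper's fitted models.

```python
import random

def ratio_tail_fraction(threshold=10.0, n=20000, seed=1):
    """Empirical P(C > threshold) for C = 1/V, V ~ Uniform(0, 1).

    Theory gives P(C > t) = 1/t, so for t = 10 the fraction should be near 0.1.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n):
        v = rng.random()
        if v > 0 and 1.0 / v > threshold:
            exceed += 1
    return exceed / n
```

A distribution family that forces thin tails (e.g., a normal fit to C) would assign essentially zero probability to these routine exceedances, which is the modeling failure the abstract warns about.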
In this paper we develop the foundation of a new theory for decision trees based on modeling phenomena with soft numbers. Soft numbers arise from the theory of soft logic, which addresses the need to combine real processes and cognitive ones in the same framework, and at the same time develops a new concept for modeling and dealing with uncertainty: the uncertainty of time and space. It is a language that can speak in two reference frames, and also suggests a way to combine them. In classical probability, for continuous random variables there is no distinction between probabilities involving strict and non-strict inequalities. Moreover, a probability involving equality collapses to zero, without distinguishing among the values that we would like the random variable to take for comparison. This work presents Soft Probability, which incorporates soft numbers into probability theory. Soft numbers are a new set of numbers that are linear combinations of multiples of "ones" and multiples of "zeros". In this work, we develop a probability involving equality as a "soft zero" multiple of a probability density function (PDF). We also extend this notion of soft probabilities to the classical definitions of complements, unions, intersections, and conditional probabilities, as well as to the expectation, variance, and entropy of a continuous random variable conditioned on being in a union of disjoint intervals and a discrete set of numbers. This extension provides information about a continuous random variable lying within a discrete set of numbers, such that its probability does not collapse completely to zero. In developing the notion of soft entropy, we found potentially another soft axis, multiples of 0log(0), which motivates exploring the properties of these new numbers and their applications.
We extend the notion of soft entropy to the definitions of cross entropy and Kullback–Leibler divergence (KLD), and we find that a soft KLD is a soft number that does not have a multiple of 0log(0). Based on the soft KLD, we define a soft mutual information, which can be used as a splitting criterion in decision trees over data sets of continuous random variables consisting of single samples and intervals.
In this paper we focus on providing sufficient conditions for some well-known stochastic orders in reliability, dealing with their discrete versions and thereby filling a gap in the literature. In particular, we find conditions based on the unimodality of the likelihood ratio for comparing two discrete random variables in several stochastic orders. These results are of interest for comparing discrete random variables because the sufficient conditions are easy to check when no closed expressions for the survival functions are available, which occurs in many cases. In addition, the results are applied to compare several parametric families of discrete distributions.
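As a concrete instance of likelihood-ratio-based comparison for discrete distributions, consider two Poisson laws with means λ1 ≤ λ2: the ratio p2(k)/p1(k) = e^(λ1-λ2) (λ2/λ1)^k is increasing in k, so Poisson(λ2) dominates Poisson(λ1) in the likelihood ratio order, which in turn implies the usual stochastic order (pointwise-ordered survival functions). The sketch below is our own illustration of that standard implication, not the paper's unimodality-based sufficient conditions; the function names are ours.

```python
import math

def poisson_pmf(lam, k):
    """pmf of Poisson(lam) at k."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def lr_increasing(lam1, lam2, kmax=30):
    """Check that p2(k)/p1(k) is increasing in k (likelihood ratio order)."""
    ratios = [poisson_pmf(lam2, k) / poisson_pmf(lam1, k) for k in range(kmax)]
    return all(a < b for a, b in zip(ratios, ratios[1:]))

def st_ordered(lam1, lam2, kmax=30):
    """Check the usual stochastic order: P(X2 >= k) >= P(X1 >= k) for all k."""
    def surv(lam, k):
        return 1.0 - sum(poisson_pmf(lam, j) for j in range(k))
    return all(surv(lam2, k) >= surv(lam1, k) - 1e-12 for k in range(kmax))
```

Checks of this kind are exactly what becomes awkward without closed-form survival functions, which is why easily verifiable conditions on the likelihood ratio, as developed in the paper, are useful.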