true distribution
Recently Published Documents


Author(s):  
Xi Chen ◽  
Qihang Lin ◽  
Guanglin Xu

Distributionally robust optimization (DRO) has been introduced for solving stochastic programs in which the distribution of the random variables is unknown and must be estimated from samples drawn from that distribution. A key element of DRO is the construction of the ambiguity set, which is a set of distributions that contains the true distribution with high probability. Assuming that the true distribution has a probability density function, we propose a class of ambiguity sets based on confidence bands of the true density function. As examples, we consider shape-restricted confidence bands and confidence bands constructed with a kernel density estimation technique. The former allow us to incorporate prior knowledge of the shape of the underlying density function (e.g., unimodality and monotonicity), and the latter enable us to handle multidimensional cases. Furthermore, we establish the convergence of the optimal value of DRO to that of the underlying stochastic program as the sample size increases. The DRO with our ambiguity set involves functional decision variables and infinitely many constraints. To address this challenge, we apply duality theory to reformulate the DRO as a finite-dimensional stochastic program, which is amenable to a stochastic subgradient scheme as a solution method.
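
As a rough illustration of a worst-case expectation over a density-band ambiguity set, the sketch below discretizes a one-dimensional support into equal-width cells and solves the inner maximization greedily. The function name `worst_case_expectation`, the piecewise-constant discretization, and the greedy solver are assumptions made for illustration only; the abstract describes a duality-based reformulation solved with a stochastic subgradient scheme, which this sketch does not implement.

```python
def worst_case_expectation(losses, lower, upper, width, tol=1e-9):
    """Maximize sum_i losses[i] * f[i] * width over piecewise-constant
    densities f with lower[i] <= f[i] <= upper[i] and sum_i f[i] * width = 1.

    Hypothetical helper: a discretized stand-in for the inner problem of DRO
    with a confidence band [lower, upper] on the density.
    """
    n = len(losses)
    if not (len(lower) == len(upper) == n):
        raise ValueError("losses, lower and upper must have the same length")
    if any(lo < 0 or lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("need 0 <= lower[i] <= upper[i] in every cell")

    # Start from the lower band, then hand out the probability mass that is left.
    mass = [lo * width for lo in lower]
    remaining = 1.0 - sum(mass)
    max_extra = sum((hi - lo) * width for lo, hi in zip(lower, upper))
    if remaining < -tol or remaining > max_extra + tol:
        raise ValueError("the band contains no probability density")
    remaining = max(remaining, 0.0)

    # Greedy step: the cells with the largest loss receive extra mass first,
    # which is optimal for a linear objective with one equality constraint.
    for i in sorted(range(n), key=lambda j: losses[j], reverse=True):
        extra = min((upper[i] - lower[i]) * width, remaining)
        mass[i] += extra
        remaining -= extra

    return sum(m * h for m, h in zip(mass, losses))
```

Calling the function with the loss of a fixed decision evaluated at each cell gives that decision's worst-case expected loss; a full DRO solver would additionally minimize this quantity over decisions.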


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0256499
Author(s):  
Stefan Wellek

The vast majority of testing procedures presented in the literature as goodness-of-fit tests fail to accomplish what the term promises. In fact, a significant result of such a test indicates that the true distribution underlying the data differs substantially from the assumed model, whereas the true objective is usually to establish that the model fits the data sufficiently well. Meeting that objective requires carrying out a testing procedure for a problem in which the statement that the deviations between model and true distribution are small plays the role of the alternative hypothesis. Testing procedures of this kind, for which the term tests for equivalence has been coined in statistical usage, are available for establishing goodness-of-fit of discrete distributions. We show how this methodology can be extended to settings where interest is in establishing goodness-of-fit of distributions of the continuous type.


Zootaxa ◽  
2021 ◽  
Vol 4951 (1) ◽  
pp. 130-136
Author(s):  
PAWEŁ JAŁOSZYŃSKI

Horaeomorphus Schaufuss is an easily identifiable genus of Stenichnini, predominantly distributed in the Australasian realm. Many scydmaenines occurring in other regions, however, have been misplaced in Horaeomorphus, and therefore the true distribution of this genus remains unclear. In previous studies several new genera were established for Australian species misplaced in Horaeomorphus, and a few species misplaced in Euconnus Thomson, Stenichnus Thomson, and Syndicus Motschulsky were transferred to Horaeomorphus. Three species that inhabit New Caledonia were placed in this genus: Horaeomorphus australis Franz, H. baloghi Franz, and H. novaecaledoniae Franz. Examination of these taxa revealed that none of them was conspecific with the SE Asian type species of Horaeomorphus. Three new combinations are proposed: Heterotetramelus (s. str.) australis (Franz) comb. n., Heterotetramelus (s. str.) baloghi (Franz) comb. n., and Heterotetramelus (s. str.) novaecaledoniae (Franz) comb. n.; each species is redescribed. 


Author(s):  
Liang Xu ◽  
Yi Zheng ◽  
Li Jiang

Problem definition: For the standard newsvendor problem with an unknown demand distribution, we develop an approach that uses data input to construct a distribution ambiguity set with the nonparametric characteristics of the true distribution, and we use it to make robust decisions. Academic/practical relevance: The empirical approach relies on historical data to estimate the true distribution. Although the estimated distribution converges to the true distribution, its performance with limited data is not guaranteed. Our approach generates robust decisions from a distribution ambiguity set that is constructed by data-driven estimators for nonparametric characteristics and includes the true distribution with the desired probability. It fits situations where the data size is small. Methodology: We apply a robust optimization approach with nonparametric information. Results: Under a fixed method to partition the support of the demand, we construct a distribution ambiguity set, build a protection curve as a proxy for the worst-case distribution in the set, and use it to obtain a robust stocking quantity in closed form. Implementation-wise, we develop an adaptive method to continuously feed data to update partitions with a prespecified confidence level in their unbiasedness and adjust the protection curve to obtain robust decisions. We theoretically and experimentally compare the proposed approach with existing approaches. Managerial implications: Our nonparametric approach under adaptive partitioning guarantees that the realized average profit exceeds the worst-case expected profit with a high probability. Using real data sets from Kaggle.com, it can outperform existing approaches in yielding profit rate and stabilizing the generated profits, and the advantages are more prominent as the service ratio decreases. Nonparametric information is more valuable than parametric information in profit generation provided that the service requirement is not too high. Moreover, our proposed approach provides a means of combining nonparametric and parametric information in a robust optimization framework.
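
For context, the empirical baseline mentioned in the abstract can be sketched as a sample quantile at the newsvendor critical ratio. The sketch assumes the textbook setting with unit selling price `price`, unit cost `cost`, lost sales, and no salvage value, so the critical ratio is (price - cost) / price; these parameters and the function name are assumptions for illustration. It is not the authors' protection-curve method, which the abstract does not specify in enough detail to reproduce.

```python
def empirical_newsvendor_quantity(demands, price, cost):
    """Order quantity from the empirical demand distribution.

    Returns the smallest observed demand d whose empirical CDF value is at
    least the critical ratio (price - cost) / price.
    """
    if not demands:
        raise ValueError("need at least one demand observation")
    if not 0 < cost < price:
        raise ValueError("need 0 < cost < price")

    critical_ratio = (price - cost) / price
    ordered = sorted(demands)
    n = len(ordered)

    # The empirical CDF at the j-th smallest observation is j / n.
    for j, d in enumerate(ordered, start=1):
        if j / n >= critical_ratio:
            return d
    return ordered[-1]  # not reached because critical_ratio < 1
```

With only a handful of observations this quantile can swing widely from sample to sample, which is the limited-data weakness the abstract points to.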


2020 ◽  
Vol 34 (04) ◽  
pp. 4255-4263
Author(s):  
Shantanu Jain ◽  
Justin Delano ◽  
Himanshu Sharma ◽  
Predrag Radivojac

Positive-unlabeled learning is often studied under the assumption that the labeled positive sample is drawn randomly from the true distribution of positives. In many application domains, however, certain regions in the support of the positive class-conditional distribution are over-represented while others are under-represented in the positive sample. Although this introduces problems in all aspects of positive-unlabeled learning, we begin to address this challenge by focusing on the estimation of class priors, quantities central to the estimation of posterior probabilities and the recovery of true classification performance. We start by making a set of assumptions to model the sampling bias. We then extend the identifiability theory of class priors from the unbiased to the biased setting. Finally, we derive an algorithm for estimating the class priors that relies on clustering to decompose the original problem into subproblems of unbiased positive-unlabeled learning. Our empirical investigation suggests feasibility of the correction strategy and overall good performance.


2019 ◽  
Vol 9 (4) ◽  
pp. 813-850 ◽  
Author(s):  
Jay Mardia ◽  
Jiantao Jiao ◽  
Ervin Tánczos ◽  
Robert D Nowak ◽  
Tsachy Weissman

We study concentration inequalities for the Kullback–Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size $n$ and alphabet size $k$, and the improvement becomes more significant when $k$ is large. We discuss the applications of our results in obtaining tighter concentration inequalities for $L_1$ deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation and concentration around zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviours between small and large sample sizes compared to the alphabet size.
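
To make the quantities concrete, the sketch below computes the KL divergence between an empirical distribution and a known true distribution, together with the classical method-of-types bound in its usual form, $P(D(\hat{P}_n \| P) \ge \varepsilon) \le \binom{n+k-1}{k-1} e^{-n\varepsilon}$. The function names are illustrative assumptions, and the improved bound from the paper is not reproduced here.

```python
import math


def empirical_distribution(samples, k):
    """Relative frequencies of the symbols 0..k-1 in `samples`."""
    counts = [0] * k
    for s in samples:
        counts[s] += 1
    n = len(samples)
    return [c / n for c in counts]


def kl_divergence(p, q):
    """D(p || q) in nats, with the convention 0 * log(0 / q) = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


def method_of_types_bound(n, k, eps):
    """Classical bound on P(D(empirical || true) >= eps), capped at 1."""
    number_of_types = math.comb(n + k - 1, k - 1)
    return min(1.0, number_of_types * math.exp(-n * eps))
```

Because the number of types grows polynomially in $n$ for fixed $k$ but quickly with $k$, the bound is loose when the alphabet is large, which is the regime where the abstract reports the largest improvement.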


Author(s):  
Kunio Takezawa

In this paper, AIC (Akaike's Information Criterion) is used to judge whether a coin is biased or not using the sequence of heads and tails produced by tossing the coin several times. It is well known that AIC·(−0.5) is an efficient estimator of the expected log-likelihood when the true distribution is contained in a specified parametric model. In the coin tossing problem, however, AIC·(−0.5) works as an efficient estimator even if the true distribution is not contained in a specified parametric model. Moreover, judging the fairness of a coin using AIC is equivalent to a statistical test using the Bernoulli distribution with a significance level ranging from 11% to 18%. This indicates that judging the fairness of a coin based on AIC leads to a higher probability of type I errors than that given by a statistical test with a significance level of 5%. These findings suggest that we should judge the fairness of a coin based on AIC when we do not have any prior knowledge about its fairness and we want to judge it from the standpoint of prediction. In contrast, a statistical test with a significance level of 5% is adopted when we have prior knowledge that the coin is probably unbiased. Moreover, a statistical test with a 5% significance level allows us to conclude that the coin is biased if we obtain sufficient evidence that permits us to disbelieve the prior knowledge.
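
The comparison can be sketched as follows, assuming the two candidate models are a fair coin with no free parameter and a Bernoulli model with one free parameter fitted by maximum likelihood. The function names are illustrative assumptions rather than the paper's notation. Under this setup, AIC prefers the biased model exactly when the likelihood-ratio statistic exceeds 2, and the last function computes the resulting type I error rate exactly for a given number of tosses.

```python
import math


def log_likelihood(heads, n, p):
    """Bernoulli log-likelihood of `heads` heads in `n` tosses."""
    tails = n - heads
    ll = 0.0
    if heads:
        ll += heads * math.log(p)
    if tails:
        ll += tails * math.log(1.0 - p)
    return ll


def aic_fair(heads, n):
    """AIC of the fair-coin model (p fixed at 0.5, no free parameter)."""
    return -2.0 * log_likelihood(heads, n, 0.5)


def aic_biased(heads, n):
    """AIC of the Bernoulli model with p estimated by heads / n."""
    p_hat = heads / n
    return -2.0 * log_likelihood(heads, n, p_hat) + 2.0


def judged_biased(heads, n):
    """True when AIC prefers the one-parameter model."""
    return aic_biased(heads, n) < aic_fair(heads, n)


def type_one_error(n):
    """Exact probability that a fair coin is judged biased after n tosses."""
    return sum(
        math.comb(n, h) * 0.5 ** n
        for h in range(n + 1)
        if judged_biased(h, n)
    )
```

For 100 tosses, `type_one_error(100)` comes out at roughly 13%, inside the 11% to 18% range quoted in the abstract and well above the conventional 5% level.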


2019 ◽  
Author(s):  
Truly Santika ◽  
Michael F. Hutchinson ◽  
Kerrie A. Wilson

Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of calibration area with respect to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominately been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased. These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed. CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. Contrastingly, Boyce index failed to rank the accuracy of univariate models correctly and a high percentage of irrelevant variables produced models with a high Boyce index. Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt models in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skew and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models. It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provides a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.


2018 ◽  
Vol 10 (10) ◽  
pp. 12344-12349
Author(s):  
Dinesh Gabadage ◽  
Gayan Edirisinghe ◽  
Madhava Botejue ◽  
Kalika Perera ◽  
Thilina Surasinghe ◽  
...  

The distribution of Kerivoula hardwickii, Hardwicke's woolly bat, in Sri Lanka is restricted to the central highlands and the northeastern region of the country, and so far the species has been recorded from only four distinct locations. In Sri Lanka, this species was last documented in 1994, and no subsequent surveys have recorded it, so it is considered rare in the country. In contrast, within its southern Asian range, K. hardwickii is widely distributed, particularly in Southeastern Asia. In this study, a single male of K. hardwickii was observed in the lowland rainforest ecoregion of Sri Lanka near Labugama-Kalatuwana Forest Reserve, where the bat was roosting on a curled live banana frond 1.8 m above the ground. This is the first record of K. hardwickii in the lowland rainforests of Sri Lanka, and it extends this species’ known range in Sri Lanka into the lowland wet zone. Thus, the distribution range of K. hardwickii in Sri Lanka could be broader than historically documented. However, intensive surveys, particularly in the lowland rainforest region, are required to validate the true distribution of this bat in Sri Lanka.

