A Maximum-Entropy Method to Estimate Discrete Distributions from Samples Ensuring Nonzero Probabilities

Entropy ◽  
2018 ◽  
Vol 20 (8) ◽  
pp. 601 ◽  
Author(s):  
Paul Darscheid ◽  
Anneli Guthke ◽  
Uwe Ehret

When constructing discrete (binned) distributions from samples of a data set, there are applications where it is desirable to ensure that all bins of the sample distribution have nonzero probability. Examples include cases where the sample distribution is part of a predictive model that must return a response for the entire codomain. Another is where the Kullback–Leibler divergence is used to measure the (dis-)agreement between the sample distribution and the original distribution of the variable. In that case, a zero-probability sample bin makes the divergence inconveniently infinite. Several sample-based distribution estimators exist that ensure nonzero bin probability:
- adding one counter to each zero-probability bin of the sample histogram;
- adding a small probability to the sample pdf;
- smoothing methods such as kernel-density smoothing;
- Bayesian approaches based on the Dirichlet and multinomial distributions.

Here, we suggest and test an approach based on the Clopper–Pearson method, which makes use of the binomial distribution. Based on the sample distribution, confidence intervals for the bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and converges with increasing sample size. For small samples, the estimate converges towards a uniform distribution, i.e., the method effectively applies a maximum-entropy approach. We apply this nonzero method and four alternative sample-based distribution estimators to a range of typical distributions (uniform, Dirac, normal, multimodal, and irregular) and measure the effect with the Kullback–Leibler divergence. The performance of each method strongly depends on the distribution type it is applied to. On average, and especially for small sample sizes, the nonzero method, the simple "add one counter" method, and the Bayesian Dirichlet-multinomial model show very similar behavior and perform best.
We conclude that, when estimating distributions without an a priori idea of their shape, applying one of these methods is favorable.
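The following minimal sketch illustrates the Clopper–Pearson idea described in the abstract. It is not the authors' code. It computes each bin's exact binomial confidence interval by bisection on the binomial CDF, takes the interval mean as the bin estimate, and compares the result with the "add one counter" estimator. Renormalizing the interval means so they sum to 1 is an assumption here. The function names, the `alpha=0.05` level, and the example counts are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); exact sum, adequate for moderate n."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a monotone function on [lo, hi] (assumes f(lo), f(hi) have opposite signs)."""
    flo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a bin holding k of n samples."""
    # Lower bound solves P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p) - alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha / 2
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

def nonzero_estimate(counts, alpha=0.05):
    """Interval mean per bin, renormalized to a pmf (renormalization is an assumption)."""
    n = sum(counts)
    mids = [sum(clopper_pearson(k, n, alpha)) / 2 for k in counts]
    total = sum(mids)
    return [m / total for m in mids]

def add_one(counts):
    """'Add one counter' to each empty bin, then normalize."""
    adj = [c if c > 0 else 1 for c in counts]
    total = sum(adj)
    return [c / total for c in adj]

def kl_divergence(p, q):
    """D_KL(p || q) in nats; infinite if q is zero where p is not."""
    d = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            d += pi * math.log(pi / qi)
    return d

counts = [6, 3, 0, 1]                 # hypothetical sample histogram
true_pmf = [0.5, 0.3, 0.1, 0.1]       # hypothetical true distribution
naive = [c / sum(counts) for c in counts]
est = nonzero_estimate(counts)
print(kl_divergence(true_pmf, naive), kl_divergence(true_pmf, est))
```

With zero samples every interval is [0, 1], so the estimate is uniform. This matches the maximum-entropy behavior the abstract describes for small samples.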

2015 ◽  
Vol 137 (4) ◽  
Author(s):  
Gholamhossein Yari ◽  
Zahra Amini Farsani

In the field of wind energy conversion, a precise determination of the probability distribution of wind speed guarantees efficient use of wind energy and strengthens its position against other forms of energy. The present study therefore proposes an accurate numerical-probabilistic algorithm that combines Newton's method with the maximum entropy (ME) method. The algorithm is used to determine an important distribution in renewable energy systems, the hyper Rayleigh distribution (HRD), which belongs to the Weibull family of distributions. The HRD is mainly used to model wind speed and variations in the solar irradiance level with negligible error. The purpose of this research is to find the unique solution to the optimization problem that arises when maximizing Shannon's entropy. To confirm the accuracy and efficiency of our algorithm, we used long-term data on the average daily wind speed in Toyokawa over 12 years to examine the Rayleigh distribution (RD). This data set was obtained from the National Climatic Data Center (NCDC) in Japan. The RD appears to fit the data more closely. In addition, we present several simulation studies to check the reliability of the proposed algorithm.
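As an illustration of the general technique, not the authors' algorithm, here is a minimal sketch of maximum-entropy estimation solved with Newton's method. It works on a discretized support with moment constraints E[f_j(X)] = μ_j. The maximizer has the form p_i ∝ exp(Σ_j λ_j f_j(x_i)), and Newton's method is applied to the convex dual in λ. The Hessian of that dual is the feature covariance matrix. One can check that the Rayleigh density belongs to the family arising from constraints on E[ln x] and E[x²]. All function names and the dice example are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def _pmf(F, lam):
    """p_i proportional to exp(lam . f(x_i)), shifted by the max exponent for stability."""
    s = [sum(lj * v for lj, v in zip(lam, row)) for row in F]
    top = max(s)
    w = [math.exp(v - top) for v in s]
    z = sum(w)
    return [wi / z for wi in w]

def maxent_newton(xs, features, targets, iters=50, tol=1e-12):
    """Max-entropy pmf on grid xs with E[f_j(X)] = targets[j], via Newton on the dual."""
    k = len(features)
    F = [[f(x) for f in features] for x in xs]  # one row of feature values per grid point
    lam = [0.0] * k                              # lam = 0 gives the uniform pmf
    for _ in range(iters):
        p = _pmf(F, lam)
        mean = [sum(pi * row[j] for pi, row in zip(p, F)) for j in range(k)]
        grad = [mean[j] - targets[j] for j in range(k)]  # gradient of log Z(lam) - lam . targets
        if max(abs(g) for g in grad) < tol:
            break
        # Hessian of the dual is the covariance matrix of the features under p
        H = [[sum(pi * (row[a] - mean[a]) * (row[b] - mean[b]) for pi, row in zip(p, F))
              for b in range(k)] for a in range(k)]
        step = solve(H, grad)
        lam = [lj - d for lj, d in zip(lam, step)]
    return _pmf(F, lam), lam

faces = [1, 2, 3, 4, 5, 6]
p, lam = maxent_newton(faces, [lambda x: x], [4.5])  # die constrained to mean 4.5
print([round(v, 4) for v in p], lam)
```

Starting from λ = 0 (the uniform distribution) is a natural choice because it is the unconstrained entropy maximum. A production solver would add step damping for targets far from the uniform moments.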


2008 ◽  
Vol 8 (14) ◽  
pp. 3963-3971 ◽  
Author(s):  
M. Krysta ◽  
M. Bocquet ◽  
J. Brandt

Abstract. We give here an account of the results of source inversion for the ETEX-II experiment. Inversion was performed with the maximum entropy method on the basis of non-zero measurements, in conjunction with the transport model POLAIR3D. The discrepancy scaling factor between the reconstructed and the true mass was estimated to be 7. These results contrast with the method's performance on the ETEX-I source, whose mass was reconstructed with an accuracy exceeding 80%. The large discrepancy factor for ETEX-II could be ascribed to modelling difficulties, possibly linked not to the transport model itself but to the quality of the measurements.


Author(s):  
Y. Zempo ◽  
S.S. Kano

The maximum entropy method is one of the key techniques for spectral analysis. Its main feature is that it can describe low-frequency spectra from short time-series data. We adopted the maximum entropy method to analyze the spectrum of the dipole moment obtained from real-time time-dependent density functional theory (TDDFT) calculations, which are intensively studied and applied to computing optical properties. For the maximum entropy analysis, we proposed using a concatenated data set built by repeating the raw data several times, taking the phase into account. We applied this technique to the spectral analysis of the dynamic dipole moment obtained from TDDFT for several molecules, such as oligofluorene with n = 8. As a result, higher resolution can be obtained without any peak shift due to the phase jump. The peak positions are in good agreement with those obtained by Fourier transform (FT) of the raw data alone. This paper presents the efficiency and characteristic features of this technique.
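For readers unfamiliar with the method, here is a minimal sketch of the classic maximum entropy spectral estimator (Burg's algorithm). It fits an autoregressive model directly to a short record and evaluates the resulting AR power spectrum. The data-concatenation and phase handling proposed in the abstract are not reproduced, because their details are not given here. The function names and the noisy test sinusoid are hypothetical.

```python
import cmath
import math
import random

def burg(x, order):
    """Burg's algorithm: AR coefficients a (a[0] = 1) and final prediction-error power."""
    n = len(x)
    f = [float(v) for v in x]  # forward prediction errors
    b = list(f)                # backward prediction errors
    a = [1.0]
    err = sum(v * v for v in f) / n
    for m in range(1, order + 1):
        num = -2.0 * sum(f[i] * b[i - 1] for i in range(m, n))
        den = sum(f[i] ** 2 + b[i - 1] ** 2 for i in range(m, n))
        k = num / den  # reflection coefficient, |k| <= 1
        ext = a + [0.0]
        a = [ext[i] + k * ext[m - i] for i in range(m + 1)]  # Levinson-style update
        new_f, new_b = f[:], b[:]
        for i in range(m, n):
            new_f[i] = f[i] + k * b[i - 1]
            new_b[i] = b[i - 1] + k * f[i]
        f, b = new_f, new_b
        err *= 1.0 - k * k
    return a, err

def mem_spectrum(a, err, freqs):
    """AR (maximum-entropy) power spectrum at normalized frequencies in cycles/sample."""
    out = []
    for fr in freqs:
        z = sum(ai * cmath.exp(-2j * math.pi * fr * i) for i, ai in enumerate(a))
        out.append(err / abs(z) ** 2)
    return out

rng = random.Random(0)
x = [math.sin(2 * math.pi * 0.1 * t) + 0.1 * rng.gauss(0, 1) for t in range(64)]  # short record
a, err = burg(x, order=8)
freqs = [i / 1000 for i in range(501)]
psd = mem_spectrum(a, err, freqs)
peak = freqs[psd.index(max(psd))]
print(peak)
```

Only 64 samples are used, yet the AR spectrum resolves a sharp peak near 0.1 cycles/sample. This is the short-record advantage the abstract attributes to the method.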


2008 ◽  
Vol 8 (1) ◽  
pp. 2795-2819 ◽  
Author(s):  
M. Krysta ◽  
M. Bocquet ◽  
J. Brandt

Abstract. We give here an account of the results of source inversion for the ETEX-II experiment. Inversion was performed with the maximum entropy method on the basis of non-zero measurements, in conjunction with the transport model Polair3D. The discrepancy scaling factor between the true and the reconstructed mass was estimated to be 7. These results contrast with the method's performance on the ETEX-I source, whose mass was reconstructed with an accuracy exceeding 80%. The large discrepancy factor for ETEX-II could be ascribed to modelling difficulties, possibly linked not to the transport model itself but to the quality of the measurements.


2021 ◽  
Vol 263 (3) ◽  
pp. 3769-3778
Author(s):  
Ke Ni ◽  
Yu Huang

Many studies have investigated subjective responses to noise, but few have addressed the influence of age on the annoyance (discomfort) caused by noise. It is difficult to derive a quantitative model of the relationship between noise-induced annoyance and age from one or a few laboratory studies, because of their relatively small samples and limited age groups. This paper reviews recent studies (published after 2000) on noise-induced annoyance. We classified the studies by the noise types they employed and summarized the quantified subjective values and age ranges. Annoyance values obtained on different rating scales were converted to a uniform scale and normalized. A probability density function was then used to estimate the annoyance corresponding to a given age under the small sample -distribution assumption. A model predicting noise-induced annoyance for ages 7 to 55 was proposed, and it fits the previous data well.
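The abstract does not say how ratings from different scales were brought onto a uniform scale. A common and simple choice is linear min-max rescaling, sketched below purely as an assumption. The function name and the example ratings are hypothetical.

```python
def to_unit_scale(rating, scale_min, scale_max):
    """Linearly map a rating on [scale_min, scale_max] onto [0, 1] (min-max normalization)."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating lies outside the scale")
    return (rating - scale_min) / (scale_max - scale_min)

# Hypothetical ratings from two studies that used different scales
five_point = to_unit_scale(4, 1, 5)     # 4 on a 1-5 scale
eleven_point = to_unit_scale(7, 0, 10)  # 7 on a 0-10 scale
print(five_point, eleven_point)
```

Linear rescaling assumes equal spacing between scale points. Verbal category scales may not satisfy this, which is one reason a review might choose a different mapping.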


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Kang Li ◽  
Xian-ming Shi ◽  
Juan Li ◽  
Mei Zhao ◽  
Chunhua Zeng

Combat ammunition trial data come in small samples, which makes forecasting the demand for combat ammunition difficult. In view of this, a Bayesian inference method based on the multinomial distribution is proposed. First, considering the different damage grades of ammunition hitting targets, the damage results are approximated by a multinomial distribution. A Bayesian inference model of ammunition demand based on this distribution is then established, providing a theoretical basis for forecasting the ammunition demand for multigrade damage with small samples. Second, the Dirichlet distribution, the conjugate prior of the multinomial distribution, is selected as the prior, and Dempster–Shafer evidence theory (D-S theory) is introduced to fuse prior information from multiple sources. Bayesian inference is performed with a Markov chain Monte Carlo method based on Gibbs sampling. The ammunition demand for each damage grade is then obtained from the cumulative damage probability. The results show that the Bayesian inference method based on the multinomial distribution is highly practical and can be used to predict the ammunition demand for different damage grades with small samples.
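The conjugate Dirichlet-multinomial update at the core of this approach can be sketched as follows. This is a simplified illustration, not the authors' model. It omits the Dempster–Shafer fusion of prior sources and draws directly from the Dirichlet posterior via normalized gamma variates instead of running a Gibbs sampler. The demand rule `rounds_needed` is hypothetical: it treats rounds as independent and asks for the smallest n with 1 − (1 − p)^n at or above a target. The trial counts are made up.

```python
import math
import random

def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts)."""
    return [a + c for a, c in zip(prior_alpha, counts)]

def dirichlet_mean(alpha):
    """Posterior mean of each damage-grade probability."""
    total = sum(alpha)
    return [a / total for a in alpha]

def sample_dirichlet(alpha, rng):
    """One draw from Dirichlet(alpha) via normalized gamma variates (direct sampling, not Gibbs)."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def rounds_needed(p_hit, target=0.9):
    """Hypothetical demand rule: smallest n with 1 - (1 - p_hit)**n >= target (independent rounds)."""
    if not 0.0 < p_hit <= 1.0:
        raise ValueError("p_hit must lie in (0, 1]")
    if p_hit == 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_hit))

# Made-up trial counts for grades [no damage, light, heavy, destroyed] and a flat prior
posterior = dirichlet_posterior([1, 1, 1, 1], [3, 4, 2, 1])
mean = dirichlet_mean(posterior)
p_heavy_or_worse = mean[2] + mean[3]
rng = random.Random(42)
draws = [sum(sample_dirichlet(posterior, rng)[2:]) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
demand = rounds_needed(p_heavy_or_worse, target=0.9)
print(posterior, round(p_heavy_or_worse, 4), round(mc_mean, 4), demand)
```

The sampled draws also give the posterior spread of the damage probability, not just its mean. This is what lets a small-sample forecast report its uncertainty.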

