Empirical probability distribution validity based on accumulating statistics of observations by controlling the moving average and root-mean-square deviation

Author(s):  
О.М. Мелкозьорова ◽  
С.Г. Рассомахін

Knowing probability distributions for calculating expected values is always required in engineering practice and in other fields. However, probability distributions are often unavailable, and the distribution type may not be reliably determinable. In this case, an empirical distribution should be built directly from the observations. The goal, therefore, is to develop a methodology for accumulating and processing observation data so that the resulting empirical distribution is close enough to the unknown real distribution. This requires substantiating criteria for the sufficiency of observations and for the validity of the distribution. As a result, a methodology is presented that considers the validity of the empirical probability distribution with respect to the parameter's expected value. Values of the parameter are registered during a period of observations or measurements. On this basis, empirical probabilities are calculated, and each subsequent period also reuses the data registered in all previous periods. Every period thus yields an approximation to the parameter's expected value computed from those empirical probabilities. Using moving averages and root-mean-square deviations, the methodology asserts that the empirical distribution is valid (i.e., sufficiently close to the unknown real distribution) if the approximations of the parameter's expected value scatter very little for at least three window widths that are multiples of 2, over three successive windows. This criterion also implies that the number of observation periods is sufficient, although it does not establish that the number of observations per period is sufficient. The validity strongly depends on the volume of observations per period.
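The accumulate-then-check procedure described in this abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: the function names, the tolerance `tol`, the window widths (2, 4, 6) standing in for the "multiple-of-2 widths", and the reading of "three successive windows" as sliding windows ending at the last three periods are all hypothetical. Note that an expected value built from the empirical probabilities of all accumulated data equals the cumulative sample mean.

```python
import math
from collections import Counter


def expected_value_approximations(periods):
    """For each period k, build empirical probabilities from all observations
    registered in periods 1..k and return the resulting expected value."""
    counts = Counter()
    approximations = []
    for observations in periods:
        counts.update(observations)  # earlier periods' data are kept and reused
        total = sum(counts.values())
        approximations.append(
            sum(value * n / total for value, n in counts.items())
        )
    return approximations


def moving_average_and_rms(values, width, end):
    """Moving average and RMS deviation of the window values[end - width:end]."""
    window = values[end - width:end]
    average = sum(window) / width
    rms = math.sqrt(sum((v - average) ** 2 for v in window) / width)
    return average, rms


def distribution_is_valid(approximations, widths=(2, 4, 6), n_windows=3, tol=0.01):
    """Hypothetical validity check: for every window width, the RMS deviation
    around the moving average must stay within tol in each of the last
    n_windows successive (sliding) windows."""
    for width in widths:
        # Not enough periods yet to form n_windows windows of this width.
        if len(approximations) < width + n_windows - 1:
            return False
        for shift in range(n_windows):
            end = len(approximations) - shift
            _, rms = moving_average_and_rms(approximations, width, end)
            if rms > tol:
                return False
    return True


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    # Hypothetical parameter observations: 20 periods of 50 dice-like readings.
    periods = [[rng.randint(1, 6) for _ in range(50)] for _ in range(20)]
    approx = expected_value_approximations(periods)
    print([round(a, 3) for a in approx])
    print("valid:", distribution_is_valid(approx, tol=0.02))
```

Because each period's approximation reuses all earlier data, the scatter between successive approximations tends to shrink as periods accumulate; the tolerance decides when that shrinkage is accepted as "very little".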

Atmosphere ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 43 ◽  
Author(s):  
Dariusz Młyński ◽  
Andrzej Wałęga ◽  
Andrea Petroselli ◽  
Flavia Tauro ◽  
Marta Cebulska

The aim of this study was to determine the best probability distributions for calculating the maximum annual daily precipitation with a specific probability of exceedance (Pmaxp%). The novelty of this study lies in using the peak-weighted root mean square error (PWRMSE), the root mean square error (RMSE), and the coefficient of determination (R²) to assess the fit between empirical and theoretical distributions. The input data included maximum daily precipitation records collected in 1971–2014 at 51 rainfall stations in the Upper Vistula Basin, Southern Poland. The value of Pmaxp% was determined from the following probability distributions of random variables: Pearson type III (PIII), Weibull (W), log-normal, generalized extreme value (GEV), and Gumbel (G). Our results showed no significant trends in the observation series of the investigated random variables at most rainfall stations in the Upper Vistula Basin. We found that the PWRMSE, a metric commonly used to assess the quality of rainfall-runoff models, is useful for identifying the best-fitting statistical distributions. Our findings also demonstrated that this approach is consistent with the RMSE goodness-of-fit metric. Finally, we identified the GEV distribution as the recommended distribution for calculating the maximum daily precipitation with a specific probability of exceedance in the catchments of the Upper Vistula Basin.
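To make the fit metrics concrete, the sketch below compares an empirical distribution (Weibull plotting position m/(n+1)) with a Gumbel distribution fitted by the method of moments, then computes RMSE, PWRMSE and R². Assumptions: the plotting position and fitting method are illustrative choices, PWRMSE follows the peak-weighted form used in HEC-HMS (squared errors weighted by (o + ō)/(2ō), where ō is the mean observed value), R² is taken as 1 − SS_res/SS_tot, and the precipitation values are invented. The study's exact definitions and procedures may differ.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def weibull_exceedance(n):
    """Empirical exceedance probabilities m / (n + 1) for ranks m = 1..n (largest value first)."""
    return [m / (n + 1) for m in range(1, n + 1)]


def fit_gumbel_moments(sample):
    """Method-of-moments Gumbel fit: scale = s*sqrt(6)/pi, location = mean - gamma*scale."""
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    location = statistics.mean(sample) - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(p_exceed, location, scale):
    """Gumbel quantile for a given probability of exceedance."""
    return location - scale * math.log(-math.log(1.0 - p_exceed))


def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pwrmse(obs, sim):
    """Peak-weighted RMSE (HEC-HMS form): errors on large observed values weigh more."""
    mean_obs = statistics.mean(obs)
    weighted = sum(
        (o - s) ** 2 * (o + mean_obs) / (2.0 * mean_obs) for o, s in zip(obs, sim)
    )
    return math.sqrt(weighted / len(obs))


def r_squared(obs, sim):
    """Coefficient of determination, taken here as 1 - SS_res / SS_tot."""
    mean_obs = statistics.mean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical annual maximum daily precipitation values (mm), not the study's data.
    annual_max = [38.2, 55.1, 41.7, 70.3, 47.9, 62.4, 35.6, 88.0, 51.2, 44.5]
    observed = sorted(annual_max, reverse=True)
    probs = weibull_exceedance(len(observed))
    loc, scale = fit_gumbel_moments(annual_max)
    fitted = [gumbel_quantile(p, loc, scale) for p in probs]
    print(f"RMSE={rmse(observed, fitted):.2f}  PWRMSE={pwrmse(observed, fitted):.2f}  "
          f"R2={r_squared(observed, fitted):.3f}")
    print("P(1%) estimate:", round(gumbel_quantile(0.01, loc, scale), 1), "mm")
```

The peak weights average to exactly 1 over the sample, so PWRMSE stays on the same scale as RMSE while shifting emphasis toward the largest observations, which is what makes it attractive for the upper tail that Pmaxp% depends on.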


Author(s):  
Zhenjia (Jerry) Huang ◽  
Yu Zhang

In wave basin model tests of an offshore structure, waves representing the given sea states have to be generated, qualified and accepted for the model test. We normally accept waves in wave calibration tests if the significant wave height, spectral peak period and spectrum match the specified target values. However, for model tests in which the responses depend strongly on the local wave motions (wave elevation and kinematics), such as wave impact on the hull, green water impact on the deck and air gap tests, additional qualification checks may be required. For instance, we may need to check wave crest probability distributions to avoid unrealistic wave crests in the test. To date, acceptance criteria for wave crest distribution calibration tests of large and steep waves of three-hour duration (full scale) have not been established. The work presented in the paper has two purposes: 1. to define and clarify the wave crest probability distribution of a single realization (PDSR) and the probability distribution of wave crests for an ensemble of realizations (PDER) of a given sea state, so that each is used appropriately; and 2. to develop semi-empirical probability distributions of nonlinear waves for both PDSR and PDER for easy, practical use. We found that, in current practice, ensemble and single-realization distributions can be misinterpreted and misused. A clear understanding of the two kinds of distributions will support appropriate offshore design and production unit performance assessments. The semi-empirical formulas proposed in this paper were developed through regression analysis of crest distributions from a large number of sea states and realizations. Wave time series from potential flow simulations, computational fluid dynamics (CFD) simulations and model test results were used to establish the probability distributions. The nonlinear wave simulations were performed for a three-hour duration, assuming long-crested waves. The sea states are assumed to be represented by the JONSWAP spectrum, and a wide range of significant wave heights, peak periods, spectral peak parameters and water depths was considered. The coefficients of the proposed semi-empirical probability distribution formulas, together with comparisons between crest distributions from the numerical simulations and from the semi-empirical formulas, are presented in this paper.
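The difference between PDSR and PDER can be illustrated by extracting zero-up-crossing crests from surface-elevation records: a single record gives a PDSR-style sample, while pooling the crests of several realizations of the same sea state gives one plausible PDER-style sample. This is a sketch under assumptions, not the paper's method: the reference curve is the linear narrow-band Rayleigh crest distribution, not the proposed semi-empirical formulas (whose coefficients are not reproduced here), and `synthetic_linear_record` is a hypothetical linear random-phase generator with equal component amplitudes standing in for JONSWAP-based nonlinear simulations.

```python
import math
import random


def crest_heights(eta):
    """Crest of each zero-up-crossing wave: the maximum elevation between
    consecutive up-crossings (incomplete waves at the ends are dropped)."""
    ups = [i for i in range(1, len(eta)) if eta[i - 1] < 0.0 <= eta[i]]
    return [max(eta[a:b]) for a, b in zip(ups, ups[1:])]


def empirical_exceedance(crests):
    """Crests sorted largest first, paired with exceedance probabilities m / (N + 1)."""
    ranked = sorted(crests, reverse=True)
    n = len(ranked)
    return [(c, m / (n + 1)) for m, c in enumerate(ranked, start=1)]


def rayleigh_crest_exceedance(c, hs):
    """Linear narrow-band reference: P(C > c) = exp(-8 c^2 / Hs^2)."""
    return math.exp(-8.0 * c * c / (hs * hs))


def significant_wave_height(eta):
    """Hs estimated as 4 times the standard deviation of the surface elevation."""
    mean = sum(eta) / len(eta)
    return 4.0 * math.sqrt(sum((x - mean) ** 2 for x in eta) / len(eta))


def synthetic_linear_record(n_steps, dt, seed, n_components=60, amplitude=0.3):
    """Hypothetical linear random-phase record: equal component amplitudes (m)
    over 0.05-0.15 Hz; not a JONSWAP spectrum and not nonlinear."""
    rng = random.Random(seed)
    freqs = [0.05 + 0.10 * k / n_components for k in range(n_components)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    return [
        sum(amplitude * math.cos(2.0 * math.pi * f * step * dt + ph)
            for f, ph in zip(freqs, phases))
        for step in range(n_steps)
    ]


if __name__ == "__main__":
    realizations = [synthetic_linear_record(4000, 0.5, seed) for seed in range(5)]
    single = crest_heights(realizations[0])                           # PDSR: one record
    pooled = [c for eta in realizations for c in crest_heights(eta)]  # PDER: pooled
    hs = significant_wave_height([x for eta in realizations for x in eta])
    print(f"Hs = {hs:.2f} m, {len(single)} crests in one record, {len(pooled)} pooled")
    for c, p in empirical_exceedance(pooled)[:5]:
        print(f"crest {c:.2f} m: empirical {p:.4f}, "
              f"Rayleigh {rayleigh_crest_exceedance(c, hs):.4f}")
```

The single-record tail rests on only a handful of the largest crests and varies from seed to seed, while the pooled sample reaches lower exceedance probabilities; treating one as the other is the kind of misinterpretation the abstract warns about.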


Geosciences ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 43
Author(s):  
Md Masud Hasan ◽  
Barry F. W. Croke ◽  
Shuangzhe Liu ◽  
Kunio Shimizu ◽  
Fazlul Karim

Probabilistic models for sub-daily rainfall prediction are important tools for understanding catchment hydrology and for estimating essential rainfall inputs for agricultural and ecological studies. This research aimed to fit theoretical probability distributions to non-zero, sub-daily rainfall using data from 1467 rain gauges across the Australian continent. A framework was developed for estimating rainfall data at ungauged locations using the fitted model parameters from neighbouring gauges. The Lognormal, Gamma and Weibull distributions, as well as their mixed distributions, were fitted to non-zero six-minute rainfall data. The root mean square error was used to evaluate the goodness of fit of each distribution. To generate data at ungauged locations, the parameters of well-fitting models were interpolated from the four closest neighbours using the inverse distance weighting method. Results show that the Gamma and Weibull distributions underestimate, and the Lognormal distribution overestimates, high rainfall events. In general, a mixed model of two distributions performed better than an individual model. Among the five models studied, the mixed Gamma and Lognormal (G-L) distribution produced the minimum root mean square error. The G-L model produced the best match to observed data for high rainfall events (e.g., the 90th, 95th, 99th, 99.9th and 99.99th percentiles).
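The interpolation step for ungauged locations can be sketched as inverse distance weighting over the four nearest gauges. Assumptions: planar coordinates with Euclidean distance (continental-scale work would more likely use great-circle distance), a distance power of 2, and parameters interpolated component by component; the function `idw_parameters` and the gauge values are hypothetical.

```python
import math


def idw_parameters(target, gauges, k=4, power=2.0):
    """Inverse distance weighting of fitted distribution parameters.

    target: (x, y) of the ungauged location.
    gauges: list of ((x, y), params) where params is a tuple such as (shape, scale).
    Only the k nearest gauges are used; weights are 1 / distance**power.
    """
    nearest = sorted(gauges, key=lambda g: math.dist(target, g[0]))[:k]
    distances = [math.dist(target, xy) for xy, _ in nearest]
    # A gauge at the target itself: return its parameters unchanged.
    for d, (_, params) in zip(distances, nearest):
        if d == 0.0:
            return tuple(params)
    weights = [d ** -power for d in distances]
    total = sum(weights)
    n_params = len(nearest[0][1])
    return tuple(
        sum(w * params[j] for w, (_, params) in zip(weights, nearest)) / total
        for j in range(n_params)
    )


if __name__ == "__main__":
    # Hypothetical gauges with fitted Gamma (shape, scale) parameters; planar km coordinates.
    gauges = [
        ((0.0, 0.0), (0.62, 1.10)),
        ((12.0, 3.0), (0.58, 1.25)),
        ((-4.0, 9.0), (0.70, 0.95)),
        ((6.0, -8.0), (0.65, 1.05)),
        ((40.0, 40.0), (0.90, 0.60)),  # too far away: not among the 4 nearest
    ]
    print(idw_parameters((3.0, 1.0), gauges))
```

Interpolating each parameter separately is the simplest choice; for a mixed model such as G-L, the mixing weight would be one more component of the parameter tuple.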


2019 ◽  
Vol 31 (3) ◽  
pp. 299-309 ◽  
Author(s):  
Ali Akbar Safaei ◽  
Hassan Ghassemi ◽  
Mahmoud Ghiasi

Fuel consumption of marine vessels plays an important role in both air pollution and ship operating expenses, while global environmental concerns about air pollution and the economics of shipping operations are increasing. To optimize a ship's fuel consumption, the fuel consumption for her envisaged voyage must first be predicted. Noon report (NR) data are an available source for such predictions and can be analysed by different techniques. However, because NR data collection is prone to human error, the data risk being inaccurate. Therefore, in this study, to acquire clean, valid data, the raw NR data of two very large crude carriers (VLCCs) were combined with their respective Automatic Identification System (AIS) satellite data. Then, well-known methods, i.e., K-Means, Self-Organizing Map (SOM), Outlier Score Base (OSB) and Histogram of Outlier Score Base (HOSB), were applied to the tankers' NR data collected over one year. The resulting enriched data were compared with the raw NR data to identify the most suitable method for acquiring valid data. Expected value and root mean square methods were applied to evaluate the accuracy of the methods. It is concluded that the expected value and root mean square measured for HOSB indicate high coherence with the primary NR data.
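As an illustration of histogram-based outlier screening of NR records, the sketch below scores each row with the histogram-based outlier score (HBOS), summing log(1 / normalized bin height) over features, and drops the highest-scoring fraction. It assumes that the paper's HOSB method corresponds to this kind of histogram-based score, which the abstract does not confirm; the bin count, the dropped fraction and the noon-report rows are hypothetical, and the mean and RMS of the fuel column are shown only as simple summary statistics, not as the paper's accuracy measures.

```python
import math
import statistics


def hbos_scores(rows, n_bins=10):
    """Histogram-based outlier score per row: sum over features of
    log(1 / normalized bin height), using equal-width bins per feature."""
    scores = [0.0] * len(rows)
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # constant column: everything in one bin
        bins = [min(int((x - lo) / width), n_bins - 1) for x in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            scores[i] += math.log(peak / counts[b])  # 0 for the fullest bin
    return scores


def drop_outliers(rows, fraction=0.05, n_bins=10):
    """Remove the given fraction of rows with the highest outlier scores,
    keeping the remaining rows in their original order."""
    scores = hbos_scores(rows, n_bins)
    n_keep = len(rows) - round(fraction * len(rows))
    keep = sorted(sorted(range(len(rows)), key=lambda i: scores[i])[:n_keep])
    return [rows[i] for i in keep]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical noon-report rows: (speed in knots, daily fuel in tonnes).
    reports = [(12.1, 58.0), (12.4, 60.2), (11.9, 57.1), (12.0, 58.8),
               (12.3, 59.5), (12.2, 61.0), (3.0, 59.0), (12.1, 150.0)]
    cleaned = drop_outliers(reports, fraction=0.25, n_bins=5)
    for label, data in (("raw", reports), ("cleaned", cleaned)):
        fuel = [f for _, f in data]
        print(label, round(statistics.mean(fuel), 2), round(rms(fuel), 2))
```

HBOS treats features independently, which keeps it fast on a year of daily reports but means it cannot flag a row that is unusual only in its combination of speed and fuel.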


Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1668
Author(s):  
Jan Naudts

The present paper investigates the update of an empirical probability distribution with the results of a new set of observations. The update reproduces the new observations and interpolates using prior information. The optimal update is obtained by minimizing either the Hellinger distance or the quadratic Bregman divergence. The results obtained by the two methods differ. Updates with information about conditional probabilities are considered as well.
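A toy version of the two updates can be written for the case where the new observations fix the probabilities of the blocks of a partition of the outcomes. This setup and its closed-form solutions are a simplified illustration assumed here, not the paper's general treatment: minimizing the Hellinger distance rescales the prior within each block, whereas minimizing the quadratic Bregman divergence Σ(p − q)² shifts every outcome in a block by the same amount, so the two results differ. The function names and the prior are hypothetical.

```python
import math


def hellinger_update(prior, blocks, masses):
    """Update minimizing the Hellinger distance to the prior subject to the new
    block probabilities: the prior is rescaled within each block."""
    posterior = [0.0] * len(prior)
    for block, mass in zip(blocks, masses):
        block_prior = sum(prior[i] for i in block)
        for i in block:
            posterior[i] = mass * prior[i] / block_prior
    return posterior


def quadratic_update(prior, blocks, masses):
    """Update minimizing the quadratic Bregman divergence sum((p - q)**2) subject to
    the same constraints: the missing mass is shared equally within each block."""
    posterior = [0.0] * len(prior)
    for block, mass in zip(blocks, masses):
        shift = (mass - sum(prior[i] for i in block)) / len(block)
        for i in block:
            posterior[i] = prior[i] + shift
    if min(posterior) < 0.0:
        # The unconstrained optimum left the simplex; a full treatment would add p >= 0.
        raise ValueError("quadratic update produced a negative probability")
    return posterior


def hellinger_distance(p, q):
    """Hellinger distance with the 1/2 normalization, so it lies in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    prior = [0.1, 0.3, 0.2, 0.4]  # hypothetical prior empirical distribution
    blocks = [[0, 1], [2, 3]]     # events observed in the new data
    masses = [0.5, 0.5]           # their newly observed probabilities
    for name, update in (("Hellinger", hellinger_update), ("quadratic", quadratic_update)):
        post = update(prior, blocks, masses)
        print(name, [round(x, 4) for x in post], round(hellinger_distance(prior, post), 4))
```

Both updates reproduce the new block probabilities exactly; they differ only in how the prior is used to interpolate within each block.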

