The ‘best fit’ to the truncated Poisson distribution

1982 ◽  
Vol 109 (3) ◽  
pp. 453-457
Author(s):  
J. G. Spain

Suppose we have a finite number N of observed frequencies [a(i)], and suppose further that we have a ‘reasonable belief’ that the (truncated) Poisson distribution could be used to describe those frequencies. The parameter μ (which represents both the mean and the variance of the entire unit Poisson distribution) and a suitable scaling-factor H would, in some way, be estimated from the data. In the normal course of events, the chi-squared test would probably be used to test for ‘goodness of fit’, and the number of ‘degrees of freedom’ would reflect the extent to which μ and H had been directly estimated from the data. Depending upon the degree of significance considered appropriate, the test would be used to decide whether or not a ‘reasonable fit’ had been obtained, and whether or not a particular formally-defined hypothesis should be accepted or rejected.
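The fitting-and-testing procedure described above can be sketched as follows. The observed frequencies, the pooling of the upper tail, and the use of the sample mean to estimate μ are illustrative assumptions, not the paper's data:

```python
import numpy as np
from scipy import stats

# Hypothetical observed frequencies a(i) for counts i = 0..5
counts = np.arange(6)
observed = np.array([12, 27, 29, 19, 9, 4])
n = observed.sum()

# Estimate mu by the sample mean (the MLE for an untruncated Poisson)
mu = (counts * observed).sum() / n

# Expected frequencies under Poisson(mu), folding P(X > 5) into the last cell
probs = stats.poisson.pmf(counts, mu)
probs[-1] += stats.poisson.sf(counts[-1], mu)
expected = n * probs

# Chi-squared statistic; degrees of freedom reflect that one
# parameter (mu) was estimated directly from the data
chi2 = ((observed - expected) ** 2 / expected).sum()
df = len(counts) - 1 - 1
p_value = stats.chi2.sf(chi2, df)
```

A large p-value would indicate that a 'reasonable fit' has been obtained in the sense discussed above.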

Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 716-723
Author(s):  
Mengyu Xu ◽  
Danna Zhang ◽  
Wei Biao Wu

Summary: We establish an approximation theory for Pearson’s chi-squared statistic in situations where the number of cells is large, by using a high-dimensional central limit theorem for quadratic forms of random vectors. Our high-dimensional central limit theorem is proved under Lyapunov-type conditions that involve a delicate interplay between the dimension, the sample size, and the moment conditions. We propose a modified chi-squared statistic and introduce an adjusted degrees of freedom. A simulation study shows that the modified statistic outperforms Pearson’s chi-squared statistic in terms of both size accuracy and power. Our procedure is applied to the construction of a goodness-of-fit test for Rutherford’s alpha-particle data.
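The classical statistic whose behaviour in the many-cells regime motivates the paper's correction can be sketched as follows; the cell counts and the uniform null are hypothetical, and the paper's modified statistic and adjusted degrees of freedom are not reproduced here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setting: many cells (k) relative to the sample size (n)
k, n = 200, 1000
p = np.full(k, 1.0 / k)          # null hypothesis: uniform cell probabilities
observed = rng.multinomial(n, p)
expected = n * p

# Pearson's chi-squared statistic, classically referred to chi2 with k - 1 df;
# with k this large relative to n, the classical reference distribution
# can be inaccurate, which is the regime the paper addresses
chi2 = ((observed - expected) ** 2 / expected).sum()
p_value = stats.chi2.sf(chi2, df=k - 1)
```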


2018 ◽  
Vol 52 (5) ◽  
pp. 1801467 ◽  
Author(s):  
Benjamin Planquette ◽  
Olivier Sanchez ◽  
James J. Marsh ◽  
Peter G. Chiles ◽  
Joseph Emmerich ◽  
...  

Residual pulmonary vascular obstruction (RPVO) and chronic thromboembolic pulmonary hypertension (CTEPH) are both long-term complications of acute pulmonary embolism, but it is unknown whether RPVO can be predicted by variants of fibrinogen associated with CTEPH. We used the Akaike information criterion to select the best predictive models for RPVO in two prospectively followed cohorts of acute pulmonary embolism patients, using as candidate variables the extent of the initial obstruction, clinical characteristics and fibrinogen-related data. We measured the selected models’ goodness of fit by analysis of deviance and compared models using the Chi-squared test. RPVO occurred in 29 (28.4%) out of 102 subjects in the first cohort and 46 (25.3%) out of 182 subjects in the second. The best-fit predictive model derived in the first cohort (p=0.0002) and validated in the second cohort (p=0.0005) implicated fibrinogen Bβ-chain monosialylation in the development of RPVO. When the derivation procedure excluded clinical characteristics, fibrinogen Bβ-chain monosialylation remained a predictor of RPVO in the best-fit predictive model (p=0.00003). Excluding fibrinogen characteristics worsened the predictive model (p=0.03). Fibrinogen Bβ-chain monosialylation, a common structural attribute of fibrin, helped predict RPVO after acute pulmonary embolism. Fibrin structure may contribute to the risk of developing RPVO.
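A minimal sketch of AIC-based selection between nested logistic models, in the spirit of the procedure described above. All data and predictor names are hypothetical stand-ins; the actual cohorts and variables are not reproduced:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)

# Hypothetical data: binary outcome (RPVO yes/no) with two candidate predictors
n = 200
x1 = rng.normal(size=n)          # e.g. extent of the initial obstruction
x2 = rng.normal(size=n)          # e.g. a fibrinogen-related measure
eta_true = 0.9 * x1 + 0.6 * x2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

def neg_loglik(beta, X):
    """Negative log-likelihood of a logistic regression model."""
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta)) - np.sum(y * eta)

def fit(X):
    """Fit by maximum likelihood; return (negative log-likelihood, AIC)."""
    res = optimize.minimize(neg_loglik, np.zeros(X.shape[1]), args=(X,))
    return res.fun, 2 * res.fun + 2 * X.shape[1]

ones = np.ones((n, 1))
nll_full, aic_full = fit(np.column_stack([ones, x1, x2]))
nll_red, aic_red = fit(np.column_stack([ones, x1]))

# Analysis of deviance: twice the log-likelihood gap between nested models
# is approximately chi2(1) under the null that the extra predictor is useless
lr_stat = 2 * (nll_red - nll_full)
p_value = stats.chi2.sf(lr_stat, df=1)
```

The model with the lower AIC is preferred; the deviance comparison mirrors the Chi-squared model comparison mentioned in the abstract.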


Author(s):  
Nandi O Leslie ◽  
Richard E Harang ◽  
Lawrence P Knachel ◽  
Alexander Kott

We propose several generalized linear models (GLMs) to predict the number of successful cyber intrusions (or “intrusions”) into an organization’s computer network, where the rate at which intrusions occur is a function of the following observable characteristics of the organization: (i) domain name system (DNS) traffic classified by top-level domain (TLD); (ii) the number of network security policy violations; and (iii) a set of predictors that we collectively call the “cyber footprint”, comprising the number of hosts on the organization’s network, the organization’s similarity to educational-institution behavior, and its number of records on scholar.google.com. In addition, we evaluate the number of intrusions to determine whether these events follow a Poisson or negative binomial (NB) probability distribution. We find that the NB GLM provides the best-fit model for the observed count data (number of intrusions per organization), because the NB model allows the variance of the count data to exceed the mean. We also show that there are restricted, simpler NB regression models that omit selected predictors and improve the goodness of fit of the NB GLM for the observed data. With our model simulations, we identify certain TLDs in the DNS traffic as having a significant impact on the number of intrusions. In addition, we use the models and regression results to conclude that the number of network security policy violations is consistently predictive of the number of intrusions.
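The Poisson-versus-NB comparison at the heart of the abstract can be illustrated with simulated overdispersed counts; the data are synthetic, and the paper's predictors and organizations are not reproduced:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)

# Hypothetical intrusion counts per organization, deliberately overdispersed
counts = rng.negative_binomial(n=2, p=0.3, size=300)

# Poisson MLE: the rate parameter is simply the sample mean
mu = counts.mean()
ll_pois = stats.poisson.logpmf(counts, mu).sum()

# Negative binomial MLE over (r, p) by direct numerical optimization
def nb_nll(params):
    r, p = params
    if r <= 0 or not 0 < p < 1:
        return np.inf
    return -stats.nbinom.logpmf(counts, r, p).sum()

res = optimize.minimize(nb_nll, x0=[1.0, 0.5], method="Nelder-Mead")
ll_nb = -res.fun

# AIC = 2k - 2*loglik; the NB pays for one extra parameter but can
# model variance exceeding the mean, which the Poisson cannot
aic_pois = 2 * 1 - 2 * ll_pois
aic_nb = 2 * 2 - 2 * ll_nb
```

When the sample variance exceeds the sample mean, as with real intrusion counts in the paper, the NB model attains the lower AIC.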


2019 ◽  
Vol 23 (10) ◽  
pp. 4349-4365 ◽  
Author(s):  
Daniel Rasche ◽  
Christian Reinhardt-Imjela ◽  
Achim Schulte ◽  
Robert Wenzel

Abstract. Large wood (LW) can alter the hydromorphological and hydraulic characteristics of rivers and streams and may benefit a river's ecology, e.g. by increasing habitat availability. Conversely, floating as well as stable LW is a potential threat to anthropogenic goods and infrastructure during flood events. Given this tension between potential risks and positive ecological impacts, understanding the physical effects of stable large wood is highly important. Hydrodynamic models offer the possibility of investigating the hydraulic effects of anchored large wood. However, the work and time involved varies between approaches that incorporate large wood in hydrodynamic models. In this study, a two-dimensional hydraulic model is set up for a mountain creek to simulate the hydraulic effects of stable LW and to compare multiple methods of accounting for LW-induced roughness. LW is implemented by changing in-channel roughness coefficients and by adding topographic elements to the model; this is carried out in order to determine which method most accurately reproduces observed hydrographs and to provide guidance for future hydrodynamic modelling of stable large wood with two-dimensional models. The study area comprises a 282 m long reach of the Ullersdorfer Teichbächel, a creek in the Ore Mountains (south-eastern Germany). Discharge time series from field experiments allow for a validation of the model outputs against field observations with and without stable LW. We iterate in-channel roughness coefficients to best fit the mean simulated and observed flood hydrographs with and without LW at the downstream reach outlet. As an alternative approach for modelling LW-induced effects, we use simplified discrete topographic elements representing individual LW elements in the channel. In general, the simulations reveal a high goodness of fit between the observed flood hydrographs and the model results with and without stable in-channel LW.
The best fit of the simulation and mean observed hydrograph with in-channel LW can be obtained when increasing in-channel roughness coefficients throughout the reach instead of an increase at LW positions only. The best fit in terms of the hydrograph's general shape can be achieved by integrating discrete elements into the calculation mesh. The results illustrate that the mean observed hydrograph can be satisfactorily modelled using an adjustment of roughness coefficients. In conclusion, a time-consuming and work-intensive mesh manipulation is suitable for analysing the more detailed effects of stable LW on a small spatio-temporal scale where high precision is required. In contrast, the reach-wise adjustment of in-channel roughness coefficients seems to provide similarly accurate results on the reach scale and, thus, could be helpful for practical applications of model-based impact assessments of stable LW on flood hydrographs of small streams and rivers.
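The reach-wise roughness adjustment can be illustrated with Manning's equation, a standard (though simplified, one-dimensional) description of how a higher roughness coefficient slows the flow. The coefficient values below are hypothetical, not the study's calibrated values:

```python
# Illustrative only: Manning's equation shows how raising the roughness
# coefficient n (a simple proxy for LW-induced drag) lowers flow velocity.
def manning_velocity(n, hydraulic_radius, slope):
    """Mean velocity (m/s) from Manning's equation, SI units."""
    return (1.0 / n) * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5

v_no_lw = manning_velocity(n=0.04, hydraulic_radius=0.15, slope=0.02)  # bare channel
v_lw = manning_velocity(n=0.10, hydraulic_radius=0.15, slope=0.02)     # raised n for LW
# v_lw < v_no_lw: added roughness slows the flow and attenuates the flood wave
```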


2021 ◽  
Vol 11 (40) ◽  
pp. 126-127
Author(s):  
Maurizio Brizzi ◽  
Daniele Nani ◽  
Lucietta Betti

One of the major criticisms directed at basic research on high-dilution effects is the lack of a consistent statistical approach; it therefore seems crucial to fix some milestones for the statistical analysis of this kind of experimentation. Since plant research in homeopathy has recently developed, and one of the most widely used models is based on in vitro seed germination, here we propose a statistical approach focused on the Poisson distribution, which satisfactorily fits the number of non-germinated seeds. The Poisson distribution is a discrete-valued model often used in statistics to represent the number X of specific events (telephone calls, industrial machine failures, genetic mutations, etc.) that occur in a fixed period of time, supposing that the instantaneous probability of occurrence of such events is constant. If we denote by λ the average number of events that occur within the fixed period, the probability of observing exactly k events is P(k) = e^(−λ) λ^k / k!, k = 0, 1, 2, … This distribution is commonly used when dealing with rare events, in the sense that it must be almost impossible for two events to occur at the same time. The Poisson distribution is the basic model of the so-called Poisson process, a counting process N(t), where t is a time parameter, with these properties:
- The process starts at zero: N(0) = 0;
- The increments are independent;
- The number of events that occur in a period of time d(t) follows a Poisson distribution with parameter proportional to d(t);
- The waiting time, i.e. the time between one event and the next, follows an exponential distribution.
In a series of experiments performed by our research group ([1], [2], [3], [4]) we applied this distribution to the number X of non-germinated seeds out of a fixed number N* of seeds in a Petri dish (usually N* = 33 or N* = 36). The goodness of fit was checked by different tests (Kolmogorov distance and chi-squared), as well as with the Poissonness plot proposed by Hoaglin [5].
The good fit of the Poisson distribution allows the use of specific tests, such as the global Poisson test (based on a chi-squared statistic) and the comparison of two Poisson parameters, based on the statistic z = (X1 − X2) / (X1 + X2)^(1/2), which for large samples (at least 20 observations) is approximately standard normally distributed. A very clear review of these tests based on the Poisson distribution is given in [6]. This good fit of the Poisson distribution suggests that the whole process of germination of wheat seeds may be considered a non-homogeneous Poisson process, where the germination rate is not constant but changes over time.
Keywords: Poisson process, counting variable, goodness-of-fit, wheat germination
References
[1] L. Betti, M. Brizzi, D. Nani, M. Peruzzi. A pilot statistical study with homeopathic potencies of Arsenicum album in wheat germination as a simple model. British Homeopathic Journal; 83: 195-201.
[2] M. Brizzi, L. Betti (1999). Using statistics for evaluating the effectiveness of homeopathy. Analysis of a large collection of data from simple plant models. III Congresso Nazionale della SIB (Società Italiana di Biometria), Roma, Abstract Book, 74-76.
[3] M. Brizzi, D. Nani, L. Betti, M. Peruzzi (2000). Statistical analysis of the effect of high dilutions of arsenic in a large dataset from a wheat germination model. British Homeopathic Journal, 89, 63-67.
[4] M. Brizzi, L. Betti (2010). Statistical tools for alternative research in plant experiments. Metodološki Zvezki – Advances in Methodology and Statistics, 7, 59-71.
[5] D. C. Hoaglin (1980). A Poissonness plot. The American Statistician, 34, 146-149.
[6] L. Sachs (1984). Applied Statistics: A Handbook of Techniques. Springer-Verlag, 186-189.
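A minimal sketch of the two-sample Poisson comparison described in the abstract above, with hypothetical seed counts:

```python
import math
from scipy import stats

# Hypothetical totals of non-germinated seeds in two treatment groups
# (large-sample setting, as required for the normal approximation)
x1, x2 = 45, 30

# Two-sample Poisson comparison: z = (X1 - X2) / sqrt(X1 + X2),
# approximately standard normal for large samples
z = (x1 - x2) / math.sqrt(x1 + x2)
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value
```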


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Cate Wagner ◽  
Erica Barr ◽  
Joseph Spada ◽  
Cole Joslin ◽  
Paul M. Sommers

The National Hockey League is a professional ice hockey league in North America currently comprising 31 teams. Their seasons culminate with the Stanley Cup Playoffs. The top sixteen teams (eight in each conference) qualify for the playoffs. The conference champions face off in the final round, known as the Stanley Cup Finals. The authors show that goals scored per game in the Stanley Cup Finals follow a Poisson distribution. Using the results of all 438 Stanley Cup Final games played since 1939 (when the Finals became a best-of-seven series), chi-squared goodness-of-fit tests show that the observed distributions of goals scored per game by series winners, series losers, and game losers closely approximate a Poisson theoretical model. The combined number of goals scored by both finalists and the goals scored by game winners do not.
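The chi-squared goodness-of-fit procedure the authors apply can be sketched as follows; the goal frequencies below are hypothetical, not the Stanley Cup data:

```python
import numpy as np
from scipy import stats

# Hypothetical goals-per-game frequencies for one group of games
goals = np.arange(8)                           # 0..7 goals, tail folded into last cell
observed = np.array([10, 35, 70, 90, 80, 50, 25, 18])
n = observed.sum()

lam = (goals * observed).sum() / n             # Poisson mean estimated from the data
probs = stats.poisson.pmf(goals, lam)
probs[-1] += stats.poisson.sf(goals[-1], lam)  # fold P(X > 7) into the last cell
expected = n * probs

# ddof=1 accounts for the one parameter (lam) estimated from the data
chi2_stat, p_value = stats.chisquare(observed, expected, ddof=1)
```

A non-significant p-value indicates the observed distribution closely approximates the Poisson theoretical model, as reported for series winners, series losers, and game losers.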


1966 ◽  
Vol 25 ◽  
pp. 373
Author(s):  
Y. Kozai

The motion of an artificial satellite around the Moon is much more complicated than that around the Earth, since the shape of the Moon is a triaxial ellipsoid and the effect of the Earth on the motion is very important even for a very close satellite. The differential equations of motion of the satellite are written in canonical form with three degrees of freedom and a time-dependent Hamiltonian. By eliminating short-periodic terms depending on the mean longitude of the satellite and by assuming that the Earth moves on the lunar equator, however, the equations are reduced to those of two degrees of freedom with an energy integral. Since the mean motion of the Earth around the Moon is more rapid than the secular motion of the argument of pericentre of the satellite by a factor of one order, the terms depending on the longitude of the Earth can be eliminated, and the number of degrees of freedom is reduced to one. The motion can then be discussed by drawing equi-energy curves in two-dimensional space. According to these figures, satellites with high inclination have a large probability of falling onto the lunar surface even if the initial eccentricities are very small. The principal properties of the motion are not changed even if plausible values of J3 and J4 of the Moon are included. This paper has been published in Publ. Astr. Soc. Japan, 15, 301, 1963.


2020 ◽  
Vol 9 (1) ◽  
pp. 84-88
Author(s):  
Govinda Prasad Dhungana ◽  
Laxmi Prasad Sapkota

Hemoglobin level is a continuous variable, so it can be modelled by a theoretical probability distribution: the Normal, Log-normal, Gamma, or Weibull distribution, each with two parameters. The bar diagram shows the smallest discrepancy between observed and expected frequencies for the Normal distribution. Similarly, the calculated chi-square goodness-of-fit statistic is lowest for the Normal distribution. Furthermore, the plot of the PDF of the Normal distribution covers a larger area of the histogram than any of the other distributions. Hence the Normal distribution is the best fit for predicting hemoglobin levels.
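A sketch of the candidate-distribution comparison described above, using simulated hemoglobin values; the Kolmogorov–Smirnov distance is used here as the goodness-of-fit measure (the paper uses chi-square; KS is a common alternative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical hemoglobin levels (g/dL); real measurements would replace this
hb = rng.normal(loc=13.5, scale=1.5, size=250)

candidates = {
    "normal": stats.norm,
    "log-normal": stats.lognorm,
    "gamma": stats.gamma,
    "weibull": stats.weibull_min,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(hb)                         # maximum-likelihood fit
    ks = stats.kstest(hb, dist.cdf, args=params)  # goodness-of-fit check
    results[name] = ks.statistic

best = min(results, key=results.get)  # smallest KS distance = best-fitting model
```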


1998 ◽  
Vol 11 (1) ◽  
pp. 565-565
Author(s):  
G. Cayrel de Strobel ◽  
R. Cayrel ◽  
Y. Lebreton

After having studied in great detail the observational HR diagram (log Teff, Mbol) composed of 40 main-sequence stars of the Hyades (Perryman et al., 1997, A&A, in press), we have tried to apply the same method to the observational main sequences of the three next nearest open clusters: Coma Berenices, the Pleiades, and Praesepe. This method consists in comparing the observational main sequence of the clusters with a grid of theoretical ZAMSs. The stars composing the observational main sequences had to have reliable absolute bolometric magnitudes, coming all from individual Hipparcos parallaxes, precise bolometric corrections, and effective temperatures and metal abundances from high-resolution detailed spectroscopic analyses. If we assume, following the work by Fernandez et al. (1996, A&A, 311, 127), that the mixing-length parameter is solar, the position of a theoretical ZAMS in the (log Teff, Mbol) plane, computed with given input physics, depends on only two free parameters: the He content Y by mass and the metallicity Z by mass. If the effective temperature and metallicity of the constituent stars of the 4 clusters are known beforehand by means of detailed analyses, one can deduce their helium abundances by means of an appropriate grid of theoretical ZAMSs. The comparison between the empirical (log Teff, Mbol) main sequence of the Hyades and the computed ZAMS corresponding to the observed metallicity Z of the Hyades (Z = 0.0240 ± 0.0085) gives a He abundance for the Hyades of Y = 0.26 ± 0.02. Our interpretation of the observational position of the main sequence of the three nearest clusters after the Hyades is still under way and appears to be considerably more difficult than for the Hyades. For the moment we can say that:
‒ The 15 dwarfs analysed in detail in Coma have a solar metallicity: [Fe/H] = -0.05 ± 0.06. However, their observational main sequence fits better with the Hyades ZAMS.
‒ The mean metallicity of 13 Pleiades dwarfs analysed in detail is solar. A metal-deficient and He-normal ZAMS would fit better. But a warning about absorption in the Pleiades has to be recalled.
‒ The upper main sequence of Praesepe (the most distant cluster: 180 pc), composed of 11 stars analysed in detail, is the one which best fits the Hyades ZAMS. The deduced ‘turnoff age’ of the cluster is slightly higher than that of the Hyades: 0.8 Gyr instead of 0.63 Gyr.


2000 ◽  
Vol 19 (2) ◽  
pp. 277-307 ◽  
Author(s):  
Jérôme Bastien ◽  
Michelle Schatzman ◽  
Claude-Henri Lamarque
