Random forests for homogeneous and non-homogeneous Poisson processes with excess zeros

2019 ◽  
Vol 29 (8) ◽  
pp. 2217-2237
Author(s):  
Walid Mathlouthi ◽  
Denis Larocque ◽  
Marc Fredette

We propose a general hurdle methodology, based on two forests, to model a response from a homogeneous or a non-homogeneous Poisson process with excess zeros. The first forest in the two-part model estimates the probability of observing a zero. The second forest estimates the Poisson parameter(s), using only the observations with at least one event. To build the trees in the second forest, we propose specialized splitting criteria derived from the zero-truncated homogeneous and non-homogeneous Poisson likelihoods. The particular case of a homogeneous process is investigated in detail to highlight the advantages of the proposed method over existing ones. Simulation studies show that the proposed methods perform well in hurdle (zero-altered) and zero-inflated settings, for both homogeneous and non-homogeneous processes. We illustrate the new method with real data on the demand for medical care by the elderly.
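As a minimal sketch of how the two forest outputs combine at prediction time (the helper names are illustrative and the forests themselves are not shown), the hurdle expectation multiplies the probability of a positive count by the mean of a zero-truncated Poisson:

```python
import numpy as np

def truncated_poisson_mean(lam):
    # Mean of a zero-truncated Poisson with parameter lam:
    # E[Y | Y > 0] = lam / (1 - exp(-lam)).
    return lam / (1.0 - np.exp(-lam))

def hurdle_prediction(p_zero, lam):
    # Hurdle expectation: P(Y > 0) * E[Y | Y > 0], where p_zero would come
    # from the first (classification) forest and lam from the second forest.
    return (1.0 - p_zero) * truncated_poisson_mean(lam)
```

Note that the zero-truncated mean always exceeds the untruncated Poisson mean, which is why the second forest must be fit on the truncated likelihood rather than the ordinary one.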

2013 ◽  
Vol 805-806 ◽  
pp. 1948-1951
Author(s):  
Tian Jin

The non-homogeneous Poisson model has been applied in a variety of settings, including air pollution data. In this paper, we propose a kernel-based nonparametric estimator for fitting non-homogeneous Poisson process data. We show that the proposed estimator is consistent and asymptotically normally distributed. We also study its finite-sample properties in a simulation study.
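A common kernel estimator for an NHPP intensity sums scaled kernels centered at the observed event times; the sketch below assumes a Gaussian kernel with bandwidth h (the paper's exact kernel and bandwidth choice may differ):

```python
import numpy as np

def kernel_intensity(t, event_times, h):
    # Kernel estimate of the NHPP intensity at time t:
    # lambda_hat(t) = sum_i K_h(t - t_i), with K_h a Gaussian kernel
    # scaled by bandwidth h. Integrates (over t) to the event count.
    u = (t - np.asarray(event_times)) / h
    return np.sum(np.exp(-0.5 * u ** 2) / (h * np.sqrt(2.0 * np.pi)))
```

Because each kernel integrates to one, the estimated intensity integrates to the number of observed events, a useful sanity check for any bandwidth.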


Mathematics ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 228
Author(s):  
Ángel Berihuete ◽  
Marta Sánchez-Sánchez ◽  
Alfonso Suárez-Llorens

The COVID-19 pandemic has highlighted the need for mathematical models that forecast the evolution of the contagious disease and evaluate the success of particular policies in reducing infections. In this work, we perform Bayesian inference for a non-homogeneous Poisson process with an intensity function based on the Gompertz curve. We discuss the prior distribution of the parameters and generate samples from the posterior distribution using Markov chain Monte Carlo (MCMC) methods. Finally, we illustrate the method by analyzing real COVID-19 data from a region in the south of Spain.
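A sketch of the likelihood ingredients, under the assumption that cumulative counts follow the Gompertz curve a·exp(−b·exp(−c·t)), so the NHPP intensity is its time derivative (parameter names are illustrative):

```python
import numpy as np

def gompertz_intensity(t, a, b, c):
    # Derivative of the Gompertz mean function a*exp(-b*exp(-c*t)):
    # the intensity of an NHPP whose cumulative counts follow a Gompertz trend.
    return a * b * c * np.exp(-c * t) * np.exp(-b * np.exp(-c * t))

def nhpp_loglik(times, T, a, b, c):
    # NHPP log-likelihood on [0, T]: sum of log-intensities at the event
    # times minus the integrated intensity, which the Gompertz mean
    # function gives in closed form.
    mean_fn = lambda t: a * np.exp(-b * np.exp(-c * t))
    log_int = np.sum(np.log(gompertz_intensity(np.asarray(times), a, b, c)))
    return log_int - (mean_fn(T) - mean_fn(0.0))
```

This log-likelihood, combined with the chosen prior, is what an MCMC sampler would target; the sampler itself is omitted.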


Author(s):  
Guanghao Qi ◽  
Nilanjan Chatterjee

Background: Previous studies have often evaluated methods for Mendelian randomization (MR) analysis using simulations that do not adequately reflect the data-generating mechanisms in genome-wide association studies (GWAS), and there are often discrepancies between the performance of MR methods in simulations and in real data sets.

Methods: We use a simulation framework that generates data on full GWAS for two traits under a realistic model for the effect-size distribution, coherent with the heritability, co-heritability and polygenicity typically observed for complex traits. We further use recent data from GWAS of 38 biomarkers in the UK Biobank and perform down-sampling to investigate trends in estimates of the causal effects of these biomarkers on the risk of type 2 diabetes (T2D).

Results: Simulation studies show that weighted mode and MRMix are the only two methods that maintain the correct type I error rate across a diverse set of scenarios. Between the two, MRMix tends to be more powerful for larger GWAS, whereas the opposite is true for smaller sample sizes. Among the other methods, random-effect IVW (inverse-variance weighted), MR-Robust and MR-RAPS (robust adjusted profile score) tend to perform best at maintaining a low mean-squared error when the InSIDE assumption is satisfied, but can produce large bias when InSIDE is violated. In real-data analysis, some biomarkers showed major heterogeneity across methods in the estimates of their causal effects on T2D risk, and estimates from many methods trended in one direction with increasing sample size, with patterns similar to those observed in the simulation studies.

Conclusion: The relative performance of different MR methods depends heavily on the sample sizes of the underlying GWAS, the proportion of valid instruments and the validity of the InSIDE assumption. Down-sampling analysis can be used in large GWAS to detect possible bias in MR methods.
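Of the methods compared, the IVW estimator has a simple closed form; the sketch below shows the basic fixed-weight version (ignoring the variance inflation that the random-effect variant adds), as a weighted regression of outcome associations on exposure associations through the origin:

```python
import numpy as np

def ivw_estimate(beta_exp, beta_out, se_out):
    # Inverse-variance-weighted MR estimate of the causal effect:
    # regress SNP-outcome associations on SNP-exposure associations
    # through the origin, weighting each SNP by 1 / se_out^2.
    w = 1.0 / np.asarray(se_out) ** 2
    bx, by = np.asarray(beta_exp), np.asarray(beta_out)
    return np.sum(w * bx * by) / np.sum(w * bx ** 2)
```

When every instrument is valid and the outcome associations are exactly proportional to the exposure associations, the estimator recovers that proportionality constant.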


Author(s):  
Moritz Berger ◽  
Gerhard Tutz

A flexible semiparametric class of models is introduced that offers an alternative to classical regression models for count data, such as the Poisson and negative binomial models, as well as to more general models accounting for excess zeros that are likewise based on fixed distributional assumptions. The model lets the data themselves determine the distribution of the response variable but, in its basic form, uses a parametric term that specifies the effect of the explanatory variables. In addition, an extended version is considered in which the effects of covariates are specified nonparametrically. The proposed model and traditional models are compared in simulations and in several real data applications from health and social science.


1984 ◽  
Vol 21 (03) ◽  
pp. 548-557
Author(s):  
M. P. Quine ◽  
D. F. Watson

A simple method is proposed for the generation of successive ‘nearest neighbours' to a given origin in an n-dimensional Poisson process. It is shown that the method provides efficient simulation of random Voronoi polytopes. Results are given of simulation studies in two and three dimensions.
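One standard way to generate successive nearest neighbours to the origin of a homogeneous Poisson process in n dimensions, in the spirit of (though not necessarily identical to) the paper's method: ball volumes around the origin form a one-dimensional Poisson process, so cumulative exponential gaps give the volumes, and directions are uniform on the sphere. A sketch:

```python
import numpy as np
from math import gamma, pi

def poisson_nearest_neighbours(k, dim, rate=1.0, rng=None):
    # Generate the k successive nearest neighbours to the origin of a
    # rate-`rate` homogeneous Poisson process in R^dim. The volumes of the
    # balls reaching each point form a rate-`rate` Poisson process on
    # [0, inf), so cumulative Exponential(rate) gaps give the volumes;
    # radii follow by inverting the ball-volume formula, and directions
    # are uniform on the unit sphere (normalized Gaussian vectors).
    rng = np.random.default_rng(rng)
    c = pi ** (dim / 2) / gamma(dim / 2 + 1)      # volume of the unit ball
    vols = np.cumsum(rng.exponential(1.0 / rate, size=k))
    radii = (vols / c) ** (1.0 / dim)
    dirs = rng.normal(size=(k, dim))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return radii[:, None] * dirs
```

The points come out sorted by distance from the origin, which is exactly the ordering needed when building Voronoi cells incrementally.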


Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2164
Author(s):  
Héctor J. Gómez ◽  
Diego I. Gallardo ◽  
Karol I. Santoro

In this paper, we present an extension of the truncated positive normal (TPN) distribution to model positive data with high kurtosis. The new model is defined as the quotient of two random variables: a TPN-distributed numerator and, in the denominator, a power of a standard uniform random variable. The resulting model has greater kurtosis than the TPN distribution. We study some properties of the distribution, such as moments, asymmetry, and kurtosis. Parameter estimation is based on the method of moments, while maximum likelihood estimation uses the expectation-maximization algorithm. We performed simulation studies to assess parameter recovery and illustrate the model with a real data application on body weight. The computational implementation of this work is included in the tpn package of the R software.
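A sketch of sampling from the quotient construction, assuming the denominator is U^(1/q) for a standard uniform U (the paper's exact parameterization may differ); the TPN numerator is drawn here by simple rejection:

```python
import numpy as np

def sample_tpn(mu, sigma, size, rng):
    # Truncated positive normal via rejection: draw N(mu, sigma) and
    # keep only the positive values.
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mu, sigma, size)
        out = np.concatenate([out, draw[draw > 0]])
    return out[:size]

def sample_quotient(mu, sigma, q, size=1000, rng=None):
    # Quotient construction: TPN numerator divided by U^(1/q) for a
    # standard uniform U. Since U^(1/q) < 1, the quotient stochastically
    # inflates the TPN draw, thickening the right tail (higher kurtosis).
    rng = np.random.default_rng(rng)
    z = sample_tpn(mu, sigma, size, rng)
    u = rng.uniform(size=size)
    return z / u ** (1.0 / q)
```

Larger q pushes U^(1/q) toward 1 and recovers a distribution close to the plain TPN; small q produces the heavy-tailed regime the extension targets.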


Author(s):  
Xiaozhou Wang ◽  
Xi Chen ◽  
Qihang Lin ◽  
Weidong Liu

The performance of clustering depends on an appropriately defined similarity between two items. When similarity is measured by human perception, human workers are often employed to estimate similarity scores between items in order to support clustering, a procedure called crowdsourced clustering. Assuming a monetary reward is paid to a worker for each similarity score, and that both the similarities between pairs and the workers' reliability vary widely, it is critical under a limited budget to assign pairs of items to workers wisely so as to optimize the clustering result. We model this budget allocation problem as a Markov decision process in which item pairs are dynamically assigned to workers based on the similarity scores they have provided so far. We propose an optimistic knowledge gradient policy in which the assignment of items at each stage is based on the minimum-weight K-cut defined on a similarity graph. We provide simulation studies and real data analysis to demonstrate the performance of the proposed method.
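A toy sketch of optimistic knowledge-gradient pair selection over Beta posteriors on pair similarities; this is a simplified stand-in for the paper's policy (the minimum-weight K-cut step and worker-reliability modeling are not shown):

```python
import numpy as np

def optimistic_kg_pick(alpha, beta):
    # Each pair's similarity carries a Beta(alpha, beta) posterior built
    # from past worker labels. For each pair, score the better (optimistic)
    # of the two one-step posterior updates -- observing a "similar" or a
    # "dissimilar" label -- and query the pair whose optimistic update
    # moves the posterior mean the most.
    alpha, beta = np.asarray(alpha), np.asarray(beta)
    mean = alpha / (alpha + beta)
    up = (alpha + 1) / (alpha + beta + 1)    # mean after a "similar" label
    down = alpha / (alpha + beta + 1)        # mean after a "dissimilar" label
    score = np.maximum(np.abs(up - mean), np.abs(down - mean))
    return int(np.argmax(score))
```

Because a single label moves a weak posterior much more than a well-observed one, the policy naturally spends budget on the pairs it knows least about.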

