A Discrete Gamma Model Approach to Flexible Count Regression Analysis: Maximum Likelihood Inference

Author(s):  
Chénagnon Frédéric Tovissodé ◽  
Romain Lucas Glèlè Kakaï

Most existing flexible count regression models allow only approximate inference. Balanced discretization is a simple method for producing a mean-parametrizable flexible count distribution from a continuous probability distribution. This simplifies the definition of flexible count regression models allowing exact inference under various types of dispersion (equi-, under- and overdispersion). This study describes maximum likelihood (ML) estimation and inference in count regression based on the balanced discrete gamma (BDG) distribution and introduces a likelihood-ratio-based latent equidispersion (LE) test to identify the parsimonious dispersion model for a particular dataset. A series of Monte Carlo experiments was carried out to assess the performance of the ML estimates and the LE test in the BDG regression model, compared with the popular Conway–Maxwell–Poisson (CMP) model. The results show that both models recover population effects even under misspecification of dispersion-related covariates, with coverage rates of the asymptotic 95% confidence intervals approaching the nominal level as the sample size increases. The BDG regression approach nevertheless outperforms CMP regression in very small samples (n = 15–30), mostly in overdispersed data. The LE test proves appropriate for detecting latent equidispersion, with rejection rates converging to the nominal level as the sample size increases. Two applications to real data illustrate the use of the proposed approach to count regression analysis.
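A common way to realize such a mean-preserving discretization is stochastic rounding: a continuous draw is rounded down or up with probability given by its fractional part, so the resulting count keeps the continuous mean. The sketch below illustrates the idea for a gamma variable; parameter names are assumptions for illustration, and the BDG paper's exact parametrization may differ.

```python
# A minimal sketch of mean-preserving ("balanced") discretization of a
# gamma variable: each draw is rounded down or up at random, with the
# up-probability equal to its fractional part, so the count retains the
# continuous mean.
import numpy as np

rng = np.random.default_rng(42)

def balanced_discretize(x, rng):
    """Stochastically round x to floor(x) or floor(x)+1, preserving E[x]."""
    lo = np.floor(x)
    frac = x - lo                      # P(round up) = fractional part
    return (lo + (rng.random(x.shape) < frac)).astype(int)

mu, shape = 4.0, 2.0                   # target mean and gamma shape
x = rng.gamma(shape, mu / shape, size=200_000)   # E[X] = mu
y = balanced_discretize(x, rng)

print(f"continuous mean {x.mean():.3f}  discrete mean {y.mean():.3f}")
# Both are ~4.0: the discretization is mean-parametrizable, so a count
# regression can link covariates to mu directly.
```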

2014 ◽  
Vol 8 (1) ◽  
pp. 104-110 ◽  
Author(s):  
Jianping Xiang ◽  
Jihnhee Yu ◽  
Kenneth V Snyder ◽  
Elad I Levy ◽  
Adnan H Siddiqui ◽  
...  

Background: We previously established three logistic regression models for discriminating intracranial aneurysm rupture status, based on morphological and hemodynamic analysis of 119 aneurysms. In this study, we tested whether these models would remain stable with increasing sample size, and investigated the sample sizes required for various levels of confidence interval (CI) convergence. Methods: We augmented our previous dataset of 119 aneurysms into a new dataset of 204 samples by collecting an additional 85 consecutive aneurysms, on which we performed flow simulation and calculated morphological and hemodynamic parameters, as done previously. We performed univariate significance tests on these parameters, followed by multivariate logistic regression on the significant parameters. The new regression models were compared against the original models. Receiver operating characteristic analysis was applied to compare the performance of the regression models. Furthermore, we performed regression analysis based on bootstrap resampling simulations to explore how many aneurysm cases are required to generate stable models. Results: Univariate tests of the 204 aneurysms generated a list of significant morphological and hemodynamic parameters identical to the previous one (from the analysis of 119 cases). Furthermore, multivariate regression analysis produced three parsimonious predictive models that were almost identical to the previous ones, with model coefficients that had narrower CIs than the original ones. Bootstrapping showed that 10%, 5%, 2%, and 1% CI convergence levels required 120, 200, 500, and 900 aneurysms, respectively. Conclusions: Our original hemodynamic–morphological rupture prediction models are stable and improve with increasing sample size. Results from the resampling simulations provide guidance for designing future large multi-population studies.
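The bootstrap step can be sketched as follows: refit the logistic model on resamples of increasing size and track how the coefficient confidence intervals tighten. The snippet below uses synthetic stand-in data and hypothetical predictor names; the study itself used 204 real aneurysm cases.

```python
# A hedged sketch of the resampling procedure: bootstrap the logistic
# fit at several sample sizes and watch the CI width shrink.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_full, n_boot = 900, 200
X = rng.normal(size=(n_full, 2))                 # stand-ins for morphological
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))  # and hemodynamic terms
y = rng.binomial(1, p)

for n in (120, 200, 500, 900):                   # sizes from the paper
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_full, size=n)    # resample with replacement
        fit = sm.Logit(y[idx], sm.add_constant(X[idx])).fit(disp=0)
        coefs.append(fit.params[1])
    lo, hi = np.percentile(coefs, [2.5, 97.5])
    print(f"n={n:4d}  95% bootstrap CI width for beta1: {hi - lo:.3f}")
```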


2019 ◽  
Author(s):  
Lara Nonell ◽  
Juan R González

DNA methylation plays an important role in the development and progression of disease. Beta-values are the standard methylation measures. Different statistical methods have been proposed to assess differences in methylation between conditions, but most do not fully account for the distribution of beta-values. The simplex distribution can accommodate beta-valued data, and we hypothesize that it is flexible enough to model methylation data. To test this hypothesis, we conducted several analyses using four real data sets obtained from microarray and sequencing technologies. Standard data distributions were studied and modelled in comparison to the simplex. In addition, simulations were conducted under different scenarios encompassing several distributional assumptions, regression models and sample sizes. Finally, we compared DNA methylation between females and males in order to benchmark the assessed methodologies under different scenarios. According to the results of the simulations and real data analyses, DNA methylation data are concordant with the simplex distribution in many situations. Simplex regression models work well in small-sample data sets; however, as sample size increases, other models such as beta regression or even linear regression can be employed to assess group comparisons and obtain unbiased results. Based on these results, we provide some practical recommendations for analyzing methylation data: 1) use data sets of at least 10 samples per studied condition for microarray data sets, or 30 for NGS data sets; 2) apply a simplex or beta regression model for microarray data; 3) apply a linear model in any other case.
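For readers unfamiliar with the simplex distribution, its density on (0,1) is f(y; μ, σ²) = {2πσ²[y(1−y)]³}^(−1/2) exp{−d(y, μ)/(2σ²)} with unit deviance d(y, μ) = (y−μ)²/[y(1−y)μ²(1−μ)²]. A minimal maximum likelihood fit can be sketched as below, using beta-distributed draws as stand-ins for methylation beta-values.

```python
# A minimal sketch of ML fitting for the simplex distribution by direct
# optimization of the log-likelihood; not the authors' pipeline.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def simplex_nll(theta, y):
    mu, s2 = expit(theta[0]), np.exp(theta[1])   # keep mu in (0,1), s2 > 0
    d = (y - mu) ** 2 / (y * (1 - y) * mu**2 * (1 - mu) ** 2)
    return -np.sum(-0.5 * np.log(2 * np.pi * s2 * (y * (1 - y)) ** 3)
                   - d / (2 * s2))

rng = np.random.default_rng(1)
y = rng.beta(8, 2, size=500)                     # stand-in beta-values
res = minimize(simplex_nll, x0=[0.0, 0.0], args=(y,), method="BFGS")
mu_hat = expit(res.x[0])
print(f"fitted simplex mean mu = {mu_hat:.3f} (sample mean {y.mean():.3f})")
```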


2018 ◽  
Vol 19 (5) ◽  
pp. 467-500
Author(s):  
Marco Enea ◽  
Gianfranco Lovison

Bivariate ordered logistic models (BOLMs) are appealing for jointly modelling the marginal distributions of two ordered responses and their association, given a set of covariates. As the number of response categories increases, the number of global odds ratios to be estimated also grows, and estimation becomes problematic. In this work we propose a non-parametric approach to the maximum likelihood (ML) estimation of a BOLM, in which penalties are applied to the differences between adjacent row and column effects. Our proposal is then compared with the Goodman and Dale models. Simulation results and analyses of two real data sets are presented and discussed.
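The objects being penalized are the global odds ratios: for each pair of cutpoints (i, j), the r × c table is collapsed to a 2 × 2 table and the cross-product ratio is taken, giving (r−1)(c−1) ratios in total. A short illustration with made-up counts:

```python
# Computing global odds ratios from a bivariate ordinal table; the
# counts below are illustrative, not from the paper.
import numpy as np

table = np.array([[20, 10,  5],
                  [12, 25, 10],
                  [ 4, 11, 23]], dtype=float)    # rows: Y1, cols: Y2

def global_or(table, i, j):
    """Cross-product ratio after dichotomizing at row i, column j."""
    a = table[:i + 1, :j + 1].sum()              # Y1 <= i, Y2 <= j
    b = table[:i + 1, j + 1:].sum()              # Y1 <= i, Y2 >  j
    c = table[i + 1:, :j + 1].sum()              # Y1 >  i, Y2 <= j
    d = table[i + 1:, j + 1:].sum()              # Y1 >  i, Y2 >  j
    return (a * d) / (b * c)

for i in range(2):
    for j in range(2):
        print(f"cut ({i},{j}): global OR = {global_or(table, i, j):.2f}")
# With r x c categories there are (r-1)(c-1) such ratios, which is why
# estimation gets hard as the number of categories grows.
```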


2020 ◽  
Vol 45 (6) ◽  
pp. 667-689
Author(s):  
Xi Wang ◽  
Yang Liu

In continuous testing programs, some items are reused across test administrations, and statistical methods are often needed to evaluate whether items have become compromised by examinees' preknowledge. In this study, we propose a residual method to detect compromised items when a test can be partitioned into two subsets: secure items and possibly compromised items. We derive the standard error of the residual statistic by taking into account the sampling error in both the ability and item parameter estimates. The simulation results suggest that the Type I error is close to the nominal level when both sources of error are adjusted for, and that item parameter error can be ignored only when the item calibration sample size is much larger than the evaluation sample size. We also investigate the performance of the residual method when information from secure items is not used, in both simulation and real data analyses.
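The core of the residual idea can be sketched under a simple Rasch (1PL) model with known item difficulties: estimate ability from the secure items only, then standardize the gap between observed and expected performance on the suspect items. The snippet below is an illustration only and omits the paper's adjustment for item parameter estimation error.

```python
# A hedged sketch: score ability on secure items, then compare observed
# vs model-expected performance on possibly compromised items.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit

rng = np.random.default_rng(7)
b_secure = rng.normal(size=20)                   # secure item difficulties
b_suspect = rng.normal(size=10)                  # possibly compromised items
theta_true = 0.5
resp_secure = rng.binomial(1, expit(theta_true - b_secure))
# Simulate preknowledge: inflated success on the suspect items.
resp_suspect = rng.binomial(1, expit(theta_true + 1.5 - b_suspect))

# ML ability estimate from the secure items only.
nll = lambda t: -np.sum(resp_secure * np.log(expit(t - b_secure))
                        + (1 - resp_secure) * np.log(expit(b_secure - t)))
theta_hat = minimize_scalar(nll, bounds=(-4, 4), method="bounded").x

p = expit(theta_hat - b_suspect)                 # expected on suspect items
resid = (resp_suspect.sum() - p.sum()) / np.sqrt(np.sum(p * (1 - p)))
print(f"standardized residual = {resid:.2f}")    # large positive => flag
```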


2020 ◽  
Vol 50 (2) ◽  
pp. 555-583 ◽  
Author(s):  
George Tzougas ◽  
Dimitris Karlis

Regression modelling involving heavy-tailed response distributions, which have heavier tails than the exponential distribution, has become increasingly popular in many insurance settings, including non-life insurance. Mixed Exponential models are a natural choice for the distribution of heavy-tailed claim sizes, since their tails are not exponentially bounded. This paper introduces a general family of mixed Exponential regression models with varying dispersion which can efficiently capture the tail behaviour of losses. Our main contribution is an Expectation-Maximization (EM)-type algorithm that facilitates maximum likelihood (ML) estimation for this class of mixed Exponential models, allowing regression specifications for both the mean and dispersion parameters. Finally, a real data application based on motor insurance data illustrates the versatility of the proposed EM-type algorithm.
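To convey the E-step/M-step machinery such an algorithm builds on, here is a minimal EM for a plain two-component Exponential mixture; the authors' algorithm additionally ties regression structures to the mean and dispersion parameters, which is not reproduced here.

```python
# A minimal EM sketch for a two-component Exponential mixture fitted to
# simulated heavy-tailed "claim sizes"; illustrative only.
import numpy as np

rng = np.random.default_rng(3)
y = np.concatenate([rng.exponential(1.0, 800),
                    rng.exponential(8.0, 200)])  # heavy-tailed claim sizes

pi, m = 0.5, np.array([np.quantile(y, 0.4), np.quantile(y, 0.9)])
for _ in range(200):
    # E-step: responsibilities of component 1 for each observation.
    f1 = np.exp(-y / m[0]) / m[0]
    f2 = np.exp(-y / m[1]) / m[1]
    r = pi * f1 / (pi * f1 + (1 - pi) * f2)
    # M-step: weighted updates of mixing weight and component means.
    pi = r.mean()
    m = np.array([np.sum(r * y) / r.sum(),
                  np.sum((1 - r) * y) / (1 - r).sum()])

print(f"weights: {pi:.2f}/{1 - pi:.2f}  means: {m[0]:.2f}, {m[1]:.2f}")
```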


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1231
Author(s):  
Guillermo Martínez-Flórez ◽  
Roger Tovar-Falón

In this paper, two new distributions are introduced to model unimodal and/or bimodal data. The first distribution, obtained by applying a simple transformation to a unit-Birnbaum–Saunders random variable, is useful for modeling data with positive support, while the second is appropriate for fitting data on the (0,1) interval. Extensions to regression models are also studied, and statistical inference is performed from a classical perspective using the maximum likelihood method. A small simulation study evaluates the performance of the maximum likelihood estimates of the parameters. Finally, two applications to real data sets illustrate the developed methodology.
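The base Birnbaum–Saunders distribution is available in SciPy under the name fatiguelife, so a quick ML fit of the untransformed variable can be sketched as follows; the paper's unit-BS transformation itself is not reproduced here.

```python
# ML fit of the base Birnbaum-Saunders (fatigue-life) distribution;
# a sketch of the classical estimation step only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = stats.fatiguelife.rvs(c=0.8, scale=2.0, size=1000, random_state=rng)

c_hat, loc_hat, scale_hat = stats.fatiguelife.fit(data, floc=0)
print(f"ML estimates: shape={c_hat:.3f}, scale={scale_hat:.3f}")
```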


Author(s):  
Rania Hassan Abd El Khaleq

A new flexible extension of the Fréchet model is proposed and studied. Some of its fundamental statistical properties are derived. The importance of the new model is shown via two applications to real data sets. A simple Copula-based construction is also presented. We assess the performance of the maximum likelihood estimates of the new distribution with respect to the sample size n, based on a simulation study. The new model outperforms other important competing models.
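The flavour of such a simulation study can be sketched with the baseline Fréchet distribution, which SciPy exposes as invweibull; the extended model is not specified in the abstract, so the baseline stands in.

```python
# Assessing ML estimates of the baseline Frechet shape as the sample
# size n grows; an illustrative stand-in for the paper's study design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
shape_true = 2.5
for n in (50, 200, 1000):
    reps = [stats.invweibull.fit(
                stats.invweibull.rvs(shape_true, size=n, random_state=rng),
                floc=0)[0]
            for _ in range(100)]
    print(f"n={n:5d}  mean shape estimate {np.mean(reps):.3f} "
          f"(true {shape_true})")
```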


2016 ◽  
Vol 41 (1) ◽  
pp. 30-43 ◽  
Author(s):  
Sunbok Lee

The logistic regression (LR) procedure for testing differential item functioning (DIF) typically depends on asymptotic sampling distributions. The likelihood ratio test (LRT) usually relies on the asymptotic chi-square distribution; likewise, the Wald test is typically based on the asymptotic normality of the maximum likelihood (ML) estimator, with the Wald statistic referred to the asymptotic chi-square distribution. In small samples, however, these asymptotic assumptions may not hold well. Penalized maximum likelihood (PML) estimation removes the first-order finite-sample bias from the ML estimate, and the bootstrap method constructs an empirical sampling distribution. This study compares the performance of LR procedures based on the LRT, the Wald test, the penalized likelihood ratio test (PLRT), and the bootstrap likelihood ratio test (BLRT) in terms of statistical power and Type I error for testing uniform and non-uniform DIF. The simulation results show that the LRT with the asymptotic chi-square distribution works well even in small samples.
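A bootstrap likelihood ratio test for uniform DIF can be sketched as follows: compare logistic models with and without the group term, and build the null distribution of the LR statistic by parametric resampling from the reduced model instead of invoking the asymptotic chi-square. Variable names below are hypothetical.

```python
# A hedged BLRT sketch for uniform DIF in a small sample.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 60                                           # a small-sample setting
score = rng.normal(size=n)                       # matching variable
group = rng.integers(0, 2, n)                    # focal/reference group
y = rng.binomial(1, 1 / (1 + np.exp(-score)))    # no DIF in truth

def lr_stat(y, score, group):
    X0 = sm.add_constant(score)
    X1 = sm.add_constant(np.column_stack([score, group]))
    ll0 = sm.Logit(y, X0).fit(disp=0).llf
    ll1 = sm.Logit(y, X1).fit(disp=0).llf
    return 2 * (ll1 - ll0)

obs = lr_stat(y, score, group)
p0 = sm.Logit(y, sm.add_constant(score)).fit(disp=0).predict()
null = [lr_stat(rng.binomial(1, p0), score, group) for _ in range(500)]
print(f"bootstrap p-value = {np.mean(np.array(null) >= obs):.3f}")
```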


2018 ◽  
Vol 79 (2) ◽  
pp. 358-384 ◽  
Author(s):  
Thomas Jaki ◽  
Minjung Kim ◽  
Andrea Lamont ◽  
Melissa George ◽  
Chi Chang ◽  
...  

Regression mixture models are a statistical approach for estimating heterogeneity in effects. This study investigates the impact of sample size on regression mixtures’ ability to produce “stable” results. Monte Carlo simulations and analyses of resamples from an application data set illustrate the types of problems that may occur with small samples in real data sets. The results suggest that (a) when class separation is low, very large sample sizes may be needed to obtain stable results; (b) it may often be necessary to consider a preponderance of evidence in latent class enumeration; (c) regression mixtures with ordinal outcomes are even more unstable; and (d) with small samples, it is possible to obtain spurious results without any clear indication of a problem.
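A compact EM sketch for a two-class regression mixture (a mixture of linear regressions) shows the kind of model whose small-sample stability the study probes; the data, starting values, and separation level below are illustrative assumptions.

```python
# EM for a two-class mixture of linear regressions; illustrative only.
import numpy as np

rng = np.random.default_rng(9)
n = 300
x = rng.normal(size=n)
z = rng.random(n) < 0.5                          # latent class labels
y = np.where(z, 1.0 + 1.5 * x, 1.0 - 0.5 * x) + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
beta = np.array([[0.5, 1.0], [1.5, -1.0]])       # crude starts per class
sigma, pi = np.array([1.0, 1.0]), 0.5
for _ in range(100):
    # E-step: class responsibilities from Gaussian regression densities.
    dens = np.stack([np.exp(-0.5 * ((y - X @ beta[k]) / sigma[k]) ** 2)
                     / sigma[k] for k in (0, 1)])
    w = pi * dens[0] / (pi * dens[0] + (1 - pi) * dens[1])
    # M-step: weighted least squares per class, plus weight update.
    for k, wk in ((0, w), (1, 1 - w)):
        W = np.diag(wk)
        beta[k] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        sigma[k] = np.sqrt(np.sum(wk * (y - X @ beta[k]) ** 2) / wk.sum())
    pi = w.mean()

print("class weights:", round(pi, 2), round(1 - pi, 2))
print("slopes:", beta[0, 1].round(2), beta[1, 1].round(2))
```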

