Bayesian Multilevel Models for Count Data

Author(s):  
Olumide Sunday Adesina

The traditional Poisson regression model for fitting count data is considered inadequate for over- or under-dispersed count data, and new models have been developed to make up for such inadequacies. In this study, a Bayesian multilevel model was proposed, using the No-U-Turn Sampler (NUTS) to sample from the posterior distribution. A simulation was carried out for both over- and under-dispersed data from the discrete Weibull distribution. Pareto k diagnostics were implemented, and the results showed that all k values for both the under-dispersed and over-dispersed simulated data were less than 0.5, indicating that all observations are good. Also, all WAIC values were the same as the LOO-IC values, except for the Poisson model on the over-dispersed simulated data. A real-life data set from the National Health Insurance Scheme (NHIS) was used for further analysis. Seven multilevel models were fitted, and the Geometric model outperformed the other models.
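
As a minimal sketch of this kind of workflow (not the authors' code; the abstract does not specify the software, and PyMC/ArviZ, the grouping structure, priors and simulated data below are all illustrative assumptions), one can fit a multilevel Poisson count model with NUTS and then check Pareto k diagnostics via PSIS-LOO:

```python
# Minimal sketch: multilevel Poisson regression sampled with NUTS (PyMC),
# followed by PSIS-LOO with Pareto-k diagnostics (ArviZ).
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(1)
n_groups, n_per = 10, 50
group = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
true_intercepts = rng.normal(1.0, 0.3, size=n_groups)
y = rng.poisson(np.exp(true_intercepts[group] + 0.4 * x))

with pm.Model() as model:
    mu_a = pm.Normal("mu_a", 0.0, 1.0)                 # population intercept
    sigma_a = pm.HalfNormal("sigma_a", 1.0)            # between-group spread
    a = pm.Normal("a", mu_a, sigma_a, shape=n_groups)  # group intercepts
    b = pm.Normal("b", 0.0, 1.0)                       # common slope
    lam = pm.math.exp(a[group] + b * x)
    pm.Poisson("y", mu=lam, observed=y)
    idata = pm.sample(1000, tune=1000, target_accept=0.9,
                      idata_kwargs={"log_likelihood": True})  # NUTS by default

loo = az.loo(idata, pointwise=True)  # Pareto k < 0.5 flags a "good" observation
print(loo)
```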

2021 ◽  
Vol 19 (1) ◽  
pp. 2-20
Author(s):  
Piyush Kant Rai ◽  
Alka Singh ◽  
Muhammad Qasim

This article introduces calibration estimators under different distance measures based on two auxiliary variables in stratified sampling. The theory of the calibration estimator is presented, and the calibrated weights based on different distance functions are derived. A simulation study has been carried out to judge the performance of the proposed estimators based on the minimum relative root mean squared error criterion. A real-life data set is also used to confirm the superiority of the proposed method.
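
For illustration only (this is not the paper's specific estimator; the data and population totals below are made up), the chi-square distance case has a closed-form GREG-type solution for the calibrated weights:

```python
# Sketch: chi-square distance calibration of design weights d_i so that the
# weighted totals of two auxiliary variables match known population totals.
import numpy as np

def calibrate_chi_square(d, X, t_x):
    """Return w = d * (1 + X @ lam), where lam solves the calibration
    equations  sum_i w_i x_i = t_x  (minimizes sum (w_i - d_i)^2 / d_i)."""
    Xd = X * d[:, None]                          # diag(d) @ X
    lam = np.linalg.solve(X.T @ Xd, t_x - d @ X)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(0)
n = 200
d = np.full(n, 5.0)                              # design weights (N = 1000)
X = rng.normal(loc=[10.0, 4.0], scale=1.0, size=(n, 2))
t_x = np.array([10.0, 4.0]) * 1000               # assumed known totals
w = calibrate_chi_square(d, X, t_x)
print(w @ X, t_x)                                # calibrated totals match t_x
```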


2021 ◽  
Vol 50 (2) ◽  
pp. 16-37
Author(s):  
Valentin Todorov

In a number of recent articles Riani, Cerioli, Atkinson and others advocate the technique of monitoring robust estimates computed over a range of key parameter values. Through this approach the diagnostic tools of choice can be tuned in such a way that highly robust estimators which are as efficient as possible are obtained. This approach is applicable to various robust multivariate estimates like S- and MM-estimates, MVE and MCD, as well as to the Forward Search, in which monitoring is part of the robust method. A key tool for the detection of multivariate outliers and for the monitoring of robust estimates is the Mahalanobis distance, together with statistics related to it. However, the results obtained with this tool in the case of compositional data might be unrealistic, since compositional data contain relative rather than absolute information and need to be transformed to the usual Euclidean geometry before standard statistical tools can be applied. Various transformations of compositional data have been introduced in the literature, and theoretical results exist on the equivalence of the additive, the centered, and the isometric logratio transformations in the context of outlier identification. To illustrate the problem of monitoring compositional data and to demonstrate the usefulness of monitoring in this case, we start with a simple example and then analyze a real-life data set presenting the technological structure of manufactured exports. The analysis is conducted with the R package fsdaR, which makes the analytical and graphical tools provided in the MATLAB FSDA library available to R users.
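
The article's analysis uses fsdaR; as a language-neutral sketch of the underlying idea only (simulated data, scikit-learn's MCD instead of the FSDA tools, and dropping one clr coordinate as a simple fix for the singular clr covariance), the transform-then-robust-distance step looks like this:

```python
# Sketch: centered logratio (clr) transform of compositional data, then
# robust Mahalanobis distances from a Minimum Covariance Determinant fit.
import numpy as np
from sklearn.covariance import MinCovDet

def clr(X):
    """Centered logratio: log(x_ij) minus the row-wise mean of the logs."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

rng = np.random.default_rng(42)
raw = rng.lognormal(mean=0.0, sigma=0.3, size=(100, 4))
comp = raw / raw.sum(axis=1, keepdims=True)    # rows sum to 1 (compositions)

Z = clr(comp)[:, :-1]          # drop one column: full clr covariance is singular
mcd = MinCovDet(random_state=0).fit(Z)
d2 = mcd.mahalanobis(Z)                        # squared robust distances
print(np.where(d2 > np.quantile(d2, 0.975)))   # flag potential outliers
```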


Author(s):  
P. K. KAPUR ◽  
ADARSH ANAND ◽  
NITIN SACHDEVA

Product performance that is not as expected by the customer brings warranty expenditure into the picture. In other words, the deviation of product performance (PP) from customer expectation (CE) is the reason for customer complaints and warranty expenses. When this conflicting scenario occurs in the market, warranty comes into existence, and fulfilling the warranty claims of customers adds to the product's overall cost. In this paper, we estimate the firm's profit based on the difference between PP and CE. Furthermore, factors such as fixed cost, production cost and inventory cost have also been considered in framing the optimization problem. The proposed model uses a two-dimensional innovation diffusion model (TD-IDM) that combines the adoption time of technological diffusion and the price of the product. The classical Cobb–Douglas function, which explicitly takes into account technological adoption and other dimensions, has been used to structure the production function. The proposed model has been validated on a real-life data set.
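
As a toy illustration of the cost structure named in the abstract (the functional forms, parameter names and numbers below are assumptions, not the paper's TD-IDM formulation), a Cobb–Douglas production function can be combined with fixed, production and inventory costs:

```python
# Sketch: profit with Cobb-Douglas output Q = A * K^alpha * L^beta and the
# fixed / production / inventory cost terms mentioned in the abstract.
def cobb_douglas(A, K, L, alpha, beta):
    return A * K**alpha * L**beta

def profit(price, A, K, L, alpha, beta,
           fixed_cost, unit_cost, inventory_cost_rate, demand):
    q = cobb_douglas(A, K, L, alpha, beta)
    sold = min(q, demand)
    inventory = max(q - demand, 0.0)     # unsold units incur holding cost
    return (price * sold
            - fixed_cost
            - unit_cost * q
            - inventory_cost_rate * inventory)

# Hypothetical numbers purely for demonstration:
print(profit(price=20.0, A=1.5, K=400.0, L=900.0, alpha=0.3, beta=0.6,
             fixed_cost=5_000.0, unit_cost=6.0,
             inventory_cost_rate=1.5, demand=1_000.0))
```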


Author(s):  
Domenic Di Francesco ◽  
Marios Chryssanthopoulos ◽  
Michael Havbro Faber ◽  
Ujjwal Bharadwaj

In pipelines, pressure vessels and various other steel structures, the remaining thickness of a corroding ligament can be measured directly and repeatedly over time. Statistical analysis of these measurements is a common approach for estimating the rate of corrosion growth, where the uncertainties associated with the inspection activity are taken into account. An additional source of variability in such calculations is the epistemic uncertainty associated with the limited number of measurements available to engineers at any point in time. Traditional methods face challenges in fitting models to limited or missing data sets; in such cases, deterministic upper-bound values, as recommended in industrial guidance, are sometimes assumed for the purpose of integrity management planning. In this paper, Bayesian inference is proposed as a means of representing the available information consistently with the evidence, which in turn facilitates decision support in the context of risk-informed integrity management. Aggregating inspection data from multiple locations does not account for the possible variability between locations, while creating fully independent models can result in excessive levels of uncertainty at locations with limited data. Engineers intuitively acknowledge that areas with more sites of corrosion should, to some extent, inform estimates of growth rates at other locations. Bayesian multi-level (hierarchical) models provide a mathematical basis for achieving this through the appropriate pooling of information, based on the homogeneity of the data. Included in this paper are an outline of the process of fitting a Bayesian multi-level model and a discussion of the benefits and challenges of pooling inspection data between distinct locations, using example calculations and simulated data.
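
A minimal sketch of such a partially pooled model (assumed priors, a simple linear wall-loss form, and simulated data; not the authors' model or software) might look as follows, with per-location growth rates drawn from a shared population distribution:

```python
# Sketch: hierarchical partial pooling of corrosion growth rates across
# locations, with thickness ~ t0 - rate[location] * time.
import numpy as np
import pymc as pm

rng = np.random.default_rng(7)
n_loc, n_obs = 6, 8
loc = np.repeat(np.arange(n_loc), n_obs)
t = np.tile(np.linspace(0, 10, n_obs), n_loc)       # years in service
true_rates = rng.normal(0.15, 0.04, size=n_loc)     # mm / year
thick = 10.0 - true_rates[loc] * t + rng.normal(0, 0.1, loc.size)

with pm.Model() as model:
    mu_r = pm.HalfNormal("mu_r", 0.5)               # population mean rate
    sigma_r = pm.HalfNormal("sigma_r", 0.2)         # between-location spread
    rate = pm.TruncatedNormal("rate", mu=mu_r, sigma=sigma_r,
                              lower=0.0, shape=n_loc)
    t0 = pm.Normal("t0", 10.0, 1.0)                 # nominal wall thickness
    sigma = pm.HalfNormal("sigma", 0.5)             # measurement uncertainty
    pm.Normal("obs", mu=t0 - rate[loc] * t, sigma=sigma, observed=thick)
    idata = pm.sample(1000, tune=1000, target_accept=0.9)

print(idata.posterior["rate"].mean(("chain", "draw")).values)
```

Locations with few measurements are shrunk toward the population mean rate, which is exactly the pooling behaviour the abstract describes.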


2020 ◽  
Vol 18 (2) ◽  
pp. 2-13
Author(s):  
Oyebayo Ridwan Olaniran ◽  
Mohd Asrul Affendi Abdullah

A new Bayesian estimation procedure for the extended Cox model with time-varying covariates is presented. The prior was determined using a bootstrapping technique within the framework of parametric empirical Bayes. The efficiency of the proposed method was assessed using Monte Carlo simulations of the extended Cox model with time-varying covariates under varying scenarios. The validity of the proposed method was also ascertained using the real-life Stanford heart transplant data set. Comparison of the proposed method with its competitor established the appreciable superiority of the method.
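
The paper's bootstrap-based empirical Bayes prior is not reproduced here; as a sketch of the underlying model only, an extended Cox fit with a time-varying covariate in long (start/stop) format can be set up with lifelines' standard (non-Bayesian) partial-likelihood fitter, on made-up toy data:

```python
# Sketch: extended Cox model with a time-varying covariate, long format.
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# One row per (subject, interval); the covariate z may change between rows,
# e.g. a transplant indicator switching from 0 to 1 (toy data).
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 4, 4, 5],
    "start": [0, 4, 0, 6, 0, 0, 3, 0],
    "stop":  [4, 9, 6, 11, 7, 3, 10, 5],
    "z":     [0, 1, 0, 1, 0, 0, 1, 0],
    "event": [0, 1, 0, 1, 1, 0, 0, 1],
})

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event",
        start_col="start", stop_col="stop")
ctv.print_summary()
```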


Author(s):  
Uchenna U. Uwadi ◽  
Elebe E. Nwaezza

In this study, we propose a new generalised transmuted inverse exponential distribution with three parameters, which has the transmuted inverse exponential and inverse exponential distributions as sub-models. The hazard function of the distribution is non-monotonic, unimodal and inverted-bathtub shaped, making it suitable for modelling lifetime data. We derive the moments, moment generating function, quantile function, maximum likelihood estimates of the parameters, Rényi entropy and order statistics of the distribution. A real-life data set is used to illustrate the usefulness of the proposed model.
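
Using the standard two-parameter transmuted inverse exponential forms (not the paper's three-parameter generalisation), the quantile method gives a simple sampler: with base CDF F(x) = exp(-θ/x) and transmuted CDF G(x) = (1 + λ)F(x) - λF(x)², inverting the quadratic in F yields the sketch below.

```python
# Sketch: inverse-CDF sampling from a transmuted inverse exponential
# distribution (theta > 0 scale, |lam| <= 1 transmutation parameter).
import numpy as np

def transmuted_inv_exp_rvs(theta, lam, size, rng):
    u = rng.uniform(size=size)
    if abs(lam) < 1e-12:
        F = u                                   # reduces to inverse exponential
    else:
        # Solve lam*F^2 - (1 + lam)*F + u = 0 for the root in (0, 1).
        F = ((1 + lam) - np.sqrt((1 + lam) ** 2 - 4 * lam * u)) / (2 * lam)
    return -theta / np.log(F)                   # invert F(x) = exp(-theta/x)

rng = np.random.default_rng(3)
x = transmuted_inv_exp_rvs(theta=2.0, lam=0.5, size=10_000, rng=rng)
# Note: the inverse exponential family has no finite mean; these are sample stats.
print(x.mean(), np.median(x))
```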


2015 ◽  
Author(s):  
Abelardo Montesinos-Lopez ◽  
Osval Montesinos-Lopez ◽  
Jose Crossa ◽  
Juan Burgueno ◽  
Kent Eskridge ◽  
...  

Genomic tools allow the study of the whole genome and are facilitating the analysis of genotype-environment combinations and their relationship with the phenotype. However, most genomic prediction models developed so far are appropriate for Gaussian phenotypes. For this reason, appropriate genomic prediction models are needed for count data, since the conventional regression models used on count data with a large sample size (n) and a small number of parameters (p) cannot be used for genomic-enabled prediction, where the number of parameters (p) is larger than the sample size (n). Here we propose a Bayesian mixed negative binomial (BMNB) genomic regression model for counts that takes into account genotype-by-environment (G × E) interaction. We also provide all the full conditional distributions needed to implement a Gibbs sampler. We evaluated the proposed model using a simulated data set and a real wheat data set from the International Maize and Wheat Improvement Center (CIMMYT) and collaborators. Results indicate that our BMNB model is a viable alternative for analyzing count data.
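
The flavour of such a model can be sketched as follows (simulated toy data, PyMC's NUTS rather than the paper's Gibbs sampler, and assumed priors): a Bayesian negative binomial regression with genotype, environment and G × E effects.

```python
# Sketch: Bayesian negative binomial count regression with a G x E term.
import numpy as np
import pymc as pm

rng = np.random.default_rng(11)
n_geno, n_env, reps = 20, 3, 4
g = np.tile(np.repeat(np.arange(n_geno), reps), n_env)
e = np.repeat(np.arange(n_env), n_geno * reps)
eta = (rng.normal(0, 0.4, n_geno)[g] + rng.normal(0, 0.3, n_env)[e]
       + rng.normal(0, 0.2, (n_geno, n_env))[g, e] + 1.5)
y = rng.negative_binomial(n=5, p=5 / (5 + np.exp(eta)))  # mean = exp(eta)

with pm.Model() as model:
    mu0 = pm.Normal("mu0", 0, 2)                        # overall intercept
    geno = pm.Normal("geno", 0, 0.5, shape=n_geno)      # genotype effects
    env = pm.Normal("env", 0, 0.5, shape=n_env)         # environment effects
    gxe = pm.Normal("gxe", 0, 0.5, shape=(n_geno, n_env))  # G x E interaction
    alpha = pm.Gamma("alpha", 2, 1)                     # NB dispersion
    mu = pm.math.exp(mu0 + geno[g] + env[e] + gxe[g, e])
    pm.NegativeBinomial("y", mu=mu, alpha=alpha, observed=y)
    idata = pm.sample(1000, tune=1000)
```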


2013 ◽  
Vol 3 (4) ◽  
pp. 1-14 ◽  
Author(s):  
S. Sampath ◽  
B. Ramya

Cluster analysis is a branch of data mining which plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers identify the presence of natural subgroups in a data set. Different types of clustering algorithms are available in the literature, the most popular among them being k-means clustering. Although k-means clustering is widely used, its application requires prior knowledge of the number of clusters present in the given data set, and several solutions are available in the literature to overcome this limitation. The k-means method creates a disjoint and exhaustive partition of the data set; however, in some situations one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm is proposed that is capable of producing rough clusters automatically, without requiring the user to supply the number of clusters as input. The efficiency of the algorithm in detecting the number of clusters present in a data set has been studied with the help of some real-life data sets. Further, a nonparametric statistical analysis of the results of the experimental study has been carried out to assess the efficiency of the proposed algorithm in automatically detecting the number of clusters, using a rough version of the Davies-Bouldin index.
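
To make the rough-cluster idea concrete (this is a generic rough k-means in the style of Lingras and West with k fixed, not the paper's algorithm, and the eps/weight parameters are assumptions): objects nearly equidistant from several centroids are placed only in those clusters' upper approximations, and centroids blend the lower approximation with the boundary region.

```python
# Sketch: rough k-means with lower/upper approximations.
import numpy as np

def rough_kmeans(X, k, eps=1.3, w_low=0.7, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        nearest = D.argmin(axis=1)
        # Upper approximation: every cluster within eps of the nearest distance.
        upper = D <= eps * D.min(axis=1, keepdims=True)
        lower = np.zeros_like(upper)
        exclusive = upper.sum(axis=1) == 1          # unambiguous objects
        lower[exclusive, nearest[exclusive]] = True
        boundary = upper & ~lower
        for j in range(k):
            lo, bd = X[lower[:, j]], X[boundary[:, j]]
            if len(lo) and len(bd):
                C[j] = w_low * lo.mean(0) + (1 - w_low) * bd.mean(0)
            elif len(lo):
                C[j] = lo.mean(0)
            elif len(bd):
                C[j] = bd.mean(0)
    return C, lower, upper

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
C, lower, upper = rough_kmeans(X, k=2)
print(C, (upper & ~lower).sum(), "objects in boundary regions")
```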


Author(s):  
Longbing Cao

Actionable knowledge discovery is regarded as one of the greatest challenges (Ankerst, 2002; Fayyad, Shapiro, & Uthurusamy, 2003) of next-generation knowledge discovery in databases (KDD) studies (Han & Kamber, 2006). In existing data mining, the mined patterns are often not actionable with respect to real user needs. To enhance knowledge actionability, domain-related social intelligence is essential (Cao et al., 2006b). The involvement of domain-related social intelligence in data mining leads to domain-driven data mining (Cao & Zhang, 2006a, 2007a), which complements the traditional data-centered mining methodology. Domain-related social intelligence consists of the intelligence of humans, domains, environments, society and cyberspace, and complements data intelligence. The extension of KDD toward domain-driven data mining involves many challenging but promising research and development issues in KDD. Studies of these issues may promote the paradigm shift of KDD from data-centered interesting-pattern mining to domain-driven actionable knowledge discovery, and the deployment shift from simulated-data-set-based settings to real-life data and business-environment-oriented settings, as widely predicted.

