Multivariate INAR(1) Regression Models Based on the Sarmanov Distribution

Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 505
Author(s):  
Lluís Bermúdez ◽  
Dimitris Karlis

A multivariate INAR(1) regression model based on the Sarmanov distribution is proposed for modelling claim counts from an automobile insurance contract with different types of coverage. The correlation between claims from different coverage types is modelled jointly with the serial correlation between the observations of the same policyholder over time. Several models based on the multivariate Sarmanov distribution are analyzed. The new models retain all the advantages of the MINAR(1) regression model while allowing for a more flexible dependence structure through the Sarmanov distribution. The models are fitted to a real panel data set and compared in terms of goodness of fit and computational efficiency.
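
As a rough illustration of the time-series component discussed above, the sketch below simulates two INAR(1) claim-count series with binomial thinning and a shared Poisson shock to induce cross-sectional correlation. The Sarmanov innovation distribution used in the paper is not reproduced; the thinning parameters, innovation rates, and the common-shock construction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def binomial_thinning(x, alpha):
    """Binomial thinning: alpha ∘ x is the number of successes in x Bernoulli(alpha) trials."""
    return rng.binomial(x, alpha)

def simulate_bivariate_inar1(T, alphas=(0.4, 0.3), lambdas=(1.0, 0.8), common=0.3):
    """Two INAR(1) count series, X_t = alpha ∘ X_{t-1} + innovation_t.

    Cross-sectional dependence comes from a shared Poisson shock here; the
    paper instead specifies Sarmanov-distributed innovations (not reproduced).
    """
    x = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        shared = rng.poisson(common)                      # common shock -> correlated claim counts
        for j in range(2):
            survivors = binomial_thinning(x[t - 1, j], alphas[j])
            x[t, j] = survivors + rng.poisson(lambdas[j]) + shared
    return x

claims = simulate_bivariate_inar1(200)
print("empirical cross-correlation:", np.corrcoef(claims.T)[0, 1].round(3))
```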

2021 ◽  
Vol 2 (2) ◽  
pp. 40-47
Author(s):  
Sunil Kumar ◽  
Vaibhav Bhatnagar

Machine learning is one of the most active fields and technologies for realizing artificial intelligence (AI). The variety and complexity of machine learning (ML) algorithms make it difficult to predict which algorithm will perform best; in particular, determining the appropriate method for finding regression trends and establishing the correlation between variables is challenging. This paper reviews the different types of regression used in machine learning, focusing on six main regression models: Linear, Logistic, Polynomial, Ridge, Bayesian Linear, and Lasso. It gives an overview of each model and compares their suitability for machine learning. A prerequisite of data analysis is establishing the associations among the many variables in a data set; such associations are essential for prediction and exploration of the data, and regression analysis is a procedure for establishing them. The work in this paper mainly emphasizes the various regression analysis models and how they are applied to different data sets in machine learning. Selecting the correct model for an analysis is the most challenging task, and hence these models are considered thoroughly in this study. Using these models in the right way, with an appropriate data set, allows data exploration and forecasting to deliver highly accurate results.
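
To make the comparison concrete, the sketch below fits the regression families named above with scikit-learn on a small synthetic data set; the hyperparameters (polynomial degree, ridge and lasso penalties) and the synthetic target are illustrative assumptions, not recommendations from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, BayesianRidge, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)   # continuous target

models = {
    "linear": LinearRegression(),
    "polynomial (deg 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "bayesian linear": BayesianRidge(),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:20s} mean CV R^2 = {score:.3f}")

# Logistic regression targets a categorical outcome, so it is scored separately.
y_class = (y > y.mean()).astype(int)
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y_class, cv=5).mean()
print(f"{'logistic':20s} mean CV accuracy = {acc:.3f}")
```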


1997 ◽  
Vol 29 (6) ◽  
pp. 955-974 ◽  
Author(s):  
A C Vias ◽  
G F Mulligan

Economic base analysis is frequently used to describe employment profiles and to predict project-related impacts in small communities. Considerable evidence suggests, however, that economic base multipliers should be estimated from survey data and not from shortcut methods. In this paper two competing versions of the economic base model are developed and then estimated using the Arizona community data set. In both cases, marginal multiplier estimates, controlled for transfer payments, are generated for ten individual sectors in five different types of communities. Results from these two disaggregate economic base models are assessed and then compared with results provided earlier by more aggregate models. The better of these two new models closely resembles the popular input-output model.
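
For readers unfamiliar with the technique, the following back-of-the-envelope sketch shows the aggregate economic base multiplier (total over basic employment) applied to a hypothetical project impact; the paper's disaggregate, marginal multipliers controlled for transfer payments are considerably more involved and are not reproduced here.

```python
# Textbook-style economic base arithmetic on a hypothetical community.
basic_employment = 1200        # export-oriented ("basic") jobs
nonbasic_employment = 1800     # locally oriented ("nonbasic") jobs

total_employment = basic_employment + nonbasic_employment
multiplier = total_employment / basic_employment       # aggregate base multiplier = T / B

delta_basic = 100              # a project adds 100 basic jobs
predicted_total_impact = multiplier * delta_basic
print(f"multiplier = {multiplier:.2f}, predicted total employment impact = {predicted_total_impact:.0f}")
```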


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Wentao Hu ◽  
...  

Abstract Background: The use of GFR-estimating equations is critical in the clinical assessment of kidney disease. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that a random forest regression (RF) method could outperform the revised linear regression used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop estimated-GFR equations with linear regression for comparison. The performance of these equations was measured by bias, 30% accuracy, precision, and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT), and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy, and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P<0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P<0.01; 19.08 vs 20.60, P<0.001). Bias and 30% accuracy were also preferable, but the differences were not statistically significant (0.34 vs 2.07, P=0.10; 0.8 vs 0.78, P=0.19, respectively). Conclusions: The performance of random forest regression models is better than that of revised linear regression models for GFR estimation.
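
The sketch below mirrors the pipeline described in the abstract with scikit-learn on synthetic placeholder data, since the clinical data set is not available here: recursive feature elimination with a random forest selects nine variables, then a random forest and a linear regression fitted on the same variables are compared by bias, 30% accuracy, and RMSE. All variable values, sample sizes, and model settings are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 15))                                               # stand-in predictors
y = 90 - 25 * X[:, 0] + 10 * X[:, 1] ** 2 + rng.normal(scale=8, size=1000)    # stand-in "measured GFR"

X_dev, X_ext, y_dev, y_ext = train_test_split(X, y, test_size=0.25, random_state=0)

# Recursive feature elimination driven by random forest importances.
selector = RFE(RandomForestRegressor(n_estimators=200, random_state=0), n_features_to_select=9)
selector.fit(X_dev, y_dev)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(selector.transform(X_dev), y_dev)
lr = LinearRegression().fit(selector.transform(X_dev), y_dev)

for name, model in [("random forest", rf), ("linear regression", lr)]:
    pred = model.predict(selector.transform(X_ext))
    bias = np.median(pred - y_ext)
    p30 = np.mean(np.abs(pred - y_ext) / np.abs(y_ext) < 0.30)   # "30% accuracy"
    rmse = np.sqrt(mean_squared_error(y_ext, pred))
    print(f"{name:18s} bias={bias:6.2f}  30%accuracy={p30:.2f}  RMSE={rmse:.2f}")
```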


2009 ◽  
Vol 6 (1) ◽  
pp. 115-141 ◽  
Author(s):  
P. C. Stolk ◽  
C. M. J. Jacobs ◽  
E. J. Moors ◽  
A. Hensen ◽  
G. L. Velthof ◽  
...  

Abstract. Chambers are widely used to measure surface fluxes of nitrous oxide (N2O). Usually, linear regression is used to calculate the fluxes from the chamber data. Non-linearity in the chamber data can result in an underestimation of the flux. Non-linear regression models are available for these data but are not commonly used. In this study we compared the fit of linear and non-linear regression models to detect significant non-linearity in the chamber data, and we assessed the influence of this non-linearity on the annual fluxes. For a two-year dataset from an automatic chamber we calculated the fluxes with both linear and non-linear regression methods. Based on the fit of the two methods, 32% of the data were classified as significantly non-linear. Significant non-linearity was not recognized from the goodness of fit of the linear regression alone. Using non-linear regression for these data and linear regression for the rest increases the annual flux by 21% to 53% compared with the flux determined from linear regression alone. We suggest that differences this large are due to leakage through the soil; macropores or a coarse-textured soil can contribute to fast leakage from the chamber. Yet even for chambers without leakage, non-linearity in the chamber data is unavoidable because of feedback from the increasing concentration in the chamber. To prevent a possibly small but systematic underestimation of the flux, we recommend comparing the fit of a linear regression model with that of a non-linear regression model and using the non-linear model if its fit is significantly better. Open questions are how macropores affect chamber measurements and how optimization of chamber design can prevent this.
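
A minimal sketch of the comparison described above, using synthetic chamber data and a hypothetical exponential-saturation model: the flux is taken as the fitted slope for the linear model and as the initial slope of the non-linear fit. The authors' exact flux-calculation scheme and significance test are not reproduced.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import linregress

t = np.linspace(0, 30, 16)                       # minutes since chamber closure
true_flux, k = 2.0, 0.05                         # initial slope (ppb/min) and curvature
c = 320 + true_flux / k * (1 - np.exp(-k * t))   # saturating concentration rise
c += np.random.default_rng(3).normal(scale=1.0, size=t.size)

# Linear regression: the flux is simply the fitted slope.
lin = linregress(t, c)

# Non-linear regression: C(t) = c0 + (f0 / k) * (1 - exp(-k t)); the flux is dC/dt at t = 0, i.e. f0.
def saturating(t, c0, f0, k):
    return c0 + f0 / k * (1 - np.exp(-k * t))

(c0_hat, f0_hat, k_hat), _ = curve_fit(saturating, t, c, p0=(c[0], lin.slope, 0.01))

print(f"linear flux estimate:     {lin.slope:.2f} ppb/min")
print(f"non-linear flux estimate: {f0_hat:.2f} ppb/min (true initial slope = {true_flux})")
```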


2020 ◽  
Author(s):  
Niema Ghanad Poor ◽  
Nicholas C West ◽  
Rama Syamala Sreepada ◽  
Srinivas Murthy ◽  
Matthias Görges

BACKGROUND In the pediatric intensive care unit (PICU), quantifying illness severity can be guided by risk models to enable timely identification and appropriate intervention. Logistic regression models, including the pediatric index of mortality 2 (PIM-2) and pediatric risk of mortality III (PRISM-III), produce a mortality risk score using data that are routinely available at PICU admission. Artificial neural networks (ANNs) outperform regression models in some medical fields. OBJECTIVE In light of this potential, we aim to examine ANN performance, compared to that of logistic regression, for mortality risk estimation in the PICU. METHODS The analyzed data set included patients from North American PICUs whose discharge diagnostic codes indicated evidence of infection and included the data used for the PIM-2 and PRISM-III calculations and their corresponding scores. We stratified the data set into training and test sets, with approximately equal mortality rates, in an effort to replicate real-world data. Data preprocessing included imputing missing data through simple substitution and normalizing data into binary variables using PRISM-III thresholds. A 2-layer ANN model was built to predict pediatric mortality, along with a simple logistic regression model for comparison. Both models used the same features required by PIM-2 and PRISM-III. Alternative ANN models using single-layer or unnormalized data were also evaluated. Model performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC) and their empirical 95% CIs. RESULTS Data from 102,945 patients (including 4068 deaths) were included in the analysis. The highest performing ANN (AUROC 0.871, 95% CI 0.862-0.880; AUPRC 0.372, 95% CI 0.345-0.396) that used normalized data performed better than PIM-2 (AUROC 0.805, 95% CI 0.801-0.816; AUPRC 0.234, 95% CI 0.213-0.255) and PRISM-III (AUROC 0.844, 95% CI 0.841-0.855; AUPRC 0.348, 95% CI 0.322-0.367). The performance of this ANN was also significantly better than that of the logistic regression model (AUROC 0.862, 95% CI 0.852-0.872; AUPRC 0.329, 95% CI 0.304-0.351). The performance of the ANN that used unnormalized data (AUROC 0.865, 95% CI 0.856-0.874) was slightly inferior to our highest performing ANN; the single-layer ANN architecture performed poorly and was not investigated further. CONCLUSIONS A simple ANN model performed slightly better than the benchmark PIM-2 and PRISM-III scores and a traditional logistic regression model trained on the same data set. The small performance gains achieved by this 2-layer ANN model may not offer clinically significant improvement; however, further research with other or more sophisticated model designs and better imputation of missing data may be warranted.
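
As a rough sketch of the model comparison, the code below trains a two-hidden-layer neural network and a logistic regression on a synthetic, imbalanced classification problem standing in for the PICU cohort, and scores both by AUROC and AUPRC. The architecture, preprocessing, and data are illustrative assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, imbalanced data (about 4% positives, roughly mirroring the mortality rate).
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.96, 0.04], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "2-layer ANN": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    prob = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    print(f"{name:20s} AUROC={roc_auc_score(y_test, prob):.3f}  "
          f"AUPRC={average_precision_score(y_test, prob):.3f}")
```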


Risks ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 20
Author(s):  
Emilio Gómez-Déniz ◽  
Enrique Calderín-Ojeda

In this paper, a flexible count regression model based on a bivariate compound Poisson distribution is introduced in order to distinguish between different types of claims according to the claim size. Furthermore, it allows us to analyse the factors that affect the number of claims above and below a given claim size threshold in an automobile insurance portfolio. Relevant properties of this model are given. Next, a mixed regression model is derived to compute credibility bonus-malus premiums based on the individual claim size and other risk factors such as gender, type of vehicle, driving area, or age of the vehicle. Results are illustrated by using a well-known automobile insurance portfolio dataset.
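
As a simplified baseline for the kind of model described above, the sketch below fits two independent Poisson GLMs, one for claim counts below and one for counts above a size threshold, on a synthetic portfolio with hypothetical covariates. The bivariate compound Poisson structure that links the two counts in the paper is not reproduced.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5000
age_vehicle = rng.integers(0, 20, n)             # hypothetical rating factor
urban = rng.integers(0, 2, n)                    # hypothetical driving-area indicator
X = sm.add_constant(np.column_stack([age_vehicle, urban]))

# Synthetic counts of "small" and "large" claims per policy.
y_small = rng.poisson(np.exp(-1.5 + 0.02 * age_vehicle + 0.3 * urban))
y_large = rng.poisson(np.exp(-2.5 + 0.01 * age_vehicle + 0.1 * urban))

for label, y in [("claims below threshold", y_small), ("claims above threshold", y_large)]:
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    print(label, "coefficients:", np.round(fit.params, 3))
```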


Author(s):  
Kai Heinrich

Modeling topic distributions over documents has recently become a common method for coping with the problem of huge amounts of unstructured data. Especially in the context of Web communities, topic models can capture the zeitgeist as a snapshot of people's communication. However, the problem with such a static snapshot is that it fails to capture the dynamics of a community. To cope with this problem, dynamic topic models (DTMs) were introduced. This chapter makes use of those topic models in order to capture dynamics in user behavior within microblog communities such as Twitter. However, applying topic models alone does not yield interpretable results, so a method is proposed that compares different political parties over time using regression models based on the DTM output. For evaluation purposes, a Twitter data set divided into different political communities is analyzed, and results and findings are presented.
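
A minimal sketch of the second step described above: given per-time-slice topic shares (placeholder numbers here, with hypothetical party labels), a simple linear regression summarizes each community's trend. The dynamic topic model that would produce these shares is not reproduced.

```python
import numpy as np

time_slices = np.arange(12)                      # e.g. months of Twitter data
# Placeholder: share of one topic per time slice for two hypothetical parties.
topic_share = {
    "party_A": 0.20 + 0.010 * time_slices + np.random.default_rng(0).normal(scale=0.02, size=12),
    "party_B": 0.30 - 0.005 * time_slices + np.random.default_rng(1).normal(scale=0.02, size=12),
}

for party, share in topic_share.items():
    slope, intercept = np.polyfit(time_slices, share, deg=1)   # linear trend in topic share
    print(f"{party}: estimated trend in topic share = {slope:+.4f} per time slice")
```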


2014 ◽  
Vol 2014 (1) ◽  
pp. 285469 ◽  
Author(s):  
Merv Fingas

Research has shown that asphaltenes are the prime stabilizers of water-in-oil emulsions and that resins are necessary to solvate the asphaltenes. Research has also shown that many compositional factors play a role, including the amount of saturates and the properties of viscosity and density. These factors can then be used to develop models of emulsion formation. A review of the formation processes of these emulsions and of the relevant water and oil types is given. This applies to all four water-in-oil types: stable emulsions, meso-stable emulsions, unstable emulsions, and entrained water. The differences among these four types are highlighted. A number of other techniques, including neural networks, have also been used to model emulsions; these are noted and compared to the regression models. A data set of more than 400 oils and their water-in-oil mixtures is used for the comparison. Numerical modeling schemes for the formation of water-in-oil emulsions are reviewed. New models are based on empirical data and the corresponding physical knowledge of emulsion formation. The density, viscosity, asphaltene content, and resin content were correlated with a stability index. The establishment of an index for emulsion stability enables its use as a target for the optimization of regressions to form a new model. The new model is both simpler and more accurate than older regression models that have been in the literature for some time, although some improvement could still be made. The different approaches of the new model and the older regression models are highlighted.
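
As a schematic of the regression step described above, the sketch below relates density, viscosity, asphaltene content, and resin content to a stability index by ordinary least squares on synthetic data; the published model's functional form and coefficients are not reproduced.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 400
density = rng.uniform(0.85, 1.00, n)            # g/mL
log_viscosity = rng.uniform(1.0, 4.0, n)        # log10 of viscosity in mPa·s
asphaltenes = rng.uniform(0.0, 15.0, n)         # % by weight
resins = rng.uniform(0.0, 20.0, n)              # % by weight

# Hypothetical stability index, used only to make the example runnable.
stability = (2.0 * asphaltenes + 1.0 * resins + 5.0 * log_viscosity - 40 * density
             + rng.normal(scale=3, size=n))

X = sm.add_constant(np.column_stack([density, log_viscosity, asphaltenes, resins]))
fit = sm.OLS(stability, X).fit()
print("coefficients:", fit.params.round(2), " R^2 =", round(fit.rsquared, 3))
```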


Author(s):  
Morten W. Fagerland ◽  
David W. Hosmer

Ordinal regression models are used to describe the relationship between an ordered categorical response variable and one or more explanatory variables. Several ordinal logistic models are available in Stata, such as the proportional odds, adjacent-category, and constrained continuation-ratio models. In this article, we present a command (ologitgof) that calculates four goodness-of-fit tests for assessing the overall adequacy of these models. These tests include an ordinal version of the Hosmer–Lemeshow test, the Pulkstenis–Robinson chi-squared and deviance tests, and the Lipsitz likelihood-ratio test. Together, these tests can detect several different types of lack of fit, including wrongly specified continuous terms, omission of different types of interaction terms, and an unordered response variable.
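
For readers working outside Stata, the sketch below fits a proportional odds (ordinal logistic) model on synthetic data with statsmodels as a rough Python analogue of the models that ologitgof targets; the four goodness-of-fit tests provided by ologitgof are Stata implementations and are not reimplemented here.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, n)
latent = 1.0 * x1 + 0.5 * x2 + rng.logistic(size=n)
# Ordered categorical response with three levels.
y = pd.Series(pd.cut(latent, bins=[-np.inf, -1, 1, np.inf], labels=["low", "medium", "high"]))

X = pd.DataFrame({"x1": x1, "x2": x2})
result = OrderedModel(y, X, distr="logit").fit(method="bfgs")
print(result.summary())
```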


2020 ◽  
Vol 50 (1) ◽  
Author(s):  
Guilherme Alves Puiatti ◽  
Paulo Roberto Cecon ◽  
Moysés Nascimento ◽  
Ana Carolina Campana Nascimento ◽  
Antônio Policarpo Souza Carneiro ◽  
...  

ABSTRACT: The objective of this study was to fit nonlinear quantile regression models for the study of dry matter accumulation in garlic plants over time and to compare them to models fitted by the ordinary least squares method. The total dry matter of nine garlic accessions belonging to the Vegetable Germplasm Bank of Universidade Federal de Viçosa (BGH/UFV) was measured at four stages (60, 90, 120 and 150 days after planting), and these values were used to fit the nonlinear regression models. For each accession, one quantile regression model (τ=0.5) and one model based on the least squares method were fitted. The nonlinear model used was the logistic model. The Akaike Information Criterion was used to evaluate the goodness of fit of the models. Accessions were grouped using the UPGMA algorithm, with the estimates of the parameters with biological interpretation as variables. Nonlinear quantile regression is efficient for fitting models of dry matter accumulation in garlic plants over time. The estimated parameters are more uniform and robust in the presence of asymmetry in the distribution of the data, heterogeneous variances, and outliers.
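
A minimal sketch of the comparison above on synthetic growth data: a logistic curve is fitted both by ordinary least squares and by median (τ=0.5) quantile regression, the latter obtained by minimizing the check (pinball) loss. Parameter values, noise, and outliers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize

def logistic(t, a, b, c):
    """Logistic growth curve: a is the asymptote, b the rate, c the inflection time."""
    return a / (1 + np.exp(-b * (t - c)))

rng = np.random.default_rng(5)
t = np.repeat([60, 90, 120, 150], 9)                        # days after planting, 9 accessions
y = logistic(t, 30, 0.06, 110) + rng.normal(scale=3, size=t.size)
y[::12] += 15                                               # a few outliers

def pinball_loss(params, tau=0.5):
    """Check loss minimized by quantile regression; tau = 0.5 gives the median fit."""
    resid = y - logistic(t, *params)
    return np.sum(np.where(resid >= 0, tau * resid, (tau - 1) * resid))

ols_params, _ = curve_fit(logistic, t, y, p0=(30, 0.05, 100), maxfev=10000)
qr_params = minimize(pinball_loss, x0=ols_params, method="Nelder-Mead").x

print("least squares estimates    :", np.round(ols_params, 3))
print("median regression (tau=0.5):", np.round(qr_params, 3))
```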

