Quantile-Based Estimation of Liu Parameter in the Linear Regression Model: Applications to Portland Cement and US Crime Data

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Muhammad Suhail ◽  
Iqra Babar ◽  
Yousaf Ali Khan ◽  
Muhammad Imran ◽  
Zeeshan Nawaz

In multiple linear regression models, multicollinearity arises when the explanatory variables are correlated with each other. It is well known that under multicollinearity the variance of the ordinary least squares estimator becomes unstable. As a remedy, Liu [1] developed a method of estimation with a biasing parameter d. In this paper, we introduce a new method for estimating the biasing parameter in order to mitigate the problem of multicollinearity. The proposed method provides a class of estimators based on quantiles of the regression coefficients. The performance of the new estimators is compared with that of the existing estimators through Monte Carlo simulation, with mean squared error and mean absolute error as the evaluation criteria. The Portland cement and US Crime data are used as applications to illustrate the benefit of the new estimators. Based on the simulation and numerical study, it is concluded that the new estimators outperform the existing estimators in certain situations, including high and severe multicollinearity. The 95% mean prediction interval of each estimator is also computed for the Portland cement data. We recommend the new method to practitioners when high multicollinearity exists among the explanatory variables.
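For orientation, here is a minimal sketch of the Liu estimator the abstract refers to, assuming its usual form β̂_d = (X'X + I)⁻¹(X'y + d·β̂_OLS) with 0 < d < 1. The helper names (`solve`, `ols_estimator`, `liu_estimator`) and the toy data are illustrative and do not come from the paper. The paper's quantile-based rule for choosing d is not reproduced; d is simply passed in. Only the standard library is used, and the predictors are assumed to be already centred or standardized.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols_estimator(X, y):
    """Ordinary least squares: solve (X'X) b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


def liu_estimator(X, y, d):
    """Liu estimator: solve (X'X + I) b = X'y + d * b_ols."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]
    b_ols = solve(XtX, Xty)
    p = len(XtX)
    lhs = [[XtX[i][j] + (1.0 if i == j else 0.0) for j in range(p)] for i in range(p)]
    rhs = [Xty[i] + d * b_ols[i] for i in range(p)]
    return solve(lhs, rhs)


# Toy data with two nearly collinear predictors (hypothetical values).
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [2.0, 4.1, 6.0, 8.2, 9.9]

print("OLS:", ols_estimator(X, y))
print("Liu, d = 0.5:", liu_estimator(X, y, 0.5))
```

Setting d = 1 recovers the OLS estimate, while smaller d shrinks the coefficient vector. This is the bias-variance trade-off that any rule for choosing d has to balance.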

Author(s):  
Warha, Abdulhamid Audu ◽  
Yusuf Abbakar Muhammad ◽  
Akeyede, Imam

Linear regression measures the relationship between two or more variables, known as the dependent and independent variables. The classical least squares method for estimating regression models consists of minimising the sum of the squared residuals. Among the assumptions of the ordinary least squares (OLS) method is that there is no correlation (multicollinearity) between the independent variables. Violation of this assumption arises often in regression analysis and can make the least squares method inefficient. This study therefore determined the more efficient estimator between Least Absolute Deviation (LAD) and Weighted Least Squares (WLS) in multiple linear regression models at different levels of multicollinearity in the explanatory variables. Simulations were conducted in R to investigate the performance of the two estimators when the assumption of no multicollinearity is violated. Their performances were compared at different sample sizes. Finite-sample criteria, namely mean absolute error, absolute bias and mean squared error, were used to compare the methods. The best estimator was selected based on the minimum value of these criteria at a specified level of multicollinearity and sample size. The results showed that LAD was the best at different levels of multicollinearity, and it was recommended as an alternative to OLS under this condition. The performance of the two estimators decreased as the level of multicollinearity increased.
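The three comparison criteria named above can be sketched as follows. This assumes one common convention: errors are summed over coefficients and averaged over replications, and bias is taken per coefficient. The study's exact definitions may differ. The function name `simulation_criteria` and the numbers are hypothetical.

```python
def simulation_criteria(estimates, true_beta):
    """Summarise Monte Carlo estimates of a coefficient vector.

    estimates: list of coefficient vectors, one per replication.
    true_beta: the coefficient vector used to generate the data.
    """
    r = len(estimates)
    p = len(true_beta)
    # Mean squared error: squared distance to the truth, averaged over replications.
    mse = sum(sum((b[j] - true_beta[j]) ** 2 for j in range(p)) for b in estimates) / r
    # Mean absolute error: absolute distance to the truth, averaged over replications.
    mae = sum(sum(abs(b[j] - true_beta[j]) for j in range(p)) for b in estimates) / r
    # Absolute bias: distance between the average estimate and the truth, per coefficient.
    mean_est = [sum(b[j] for b in estimates) / r for j in range(p)]
    abs_bias = [abs(mean_est[j] - true_beta[j]) for j in range(p)]
    return {"mse": mse, "mae": mae, "abs_bias": abs_bias}


# Two hypothetical replications of a two-coefficient model.
result = simulation_criteria([[1.5, 2.0], [0.5, 2.5]], [1.0, 2.0])
print(result)
```

An estimator is then preferred at a given sample size and collinearity level when it attains the smallest values of these criteria.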


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259991
Author(s):  
Iqra Babar ◽  
Hamdi Ayed ◽  
Sohail Chand ◽  
Muhammad Suhail ◽  
Yousaf Ali Khan ◽  
...  

Background: The problem of multicollinearity in multiple linear regression models arises when the predictor variables are correlated with each other. The variance of the ordinary least squares estimator becomes unstable in such situations. To mitigate the problem of multicollinearity, Liu regression is widely used as a biased method of estimation with shrinkage parameter ‘d’. The optimal value of the shrinkage parameter plays a vital role in the bias-variance trade-off. Limitation: Several estimators of the shrinkage parameter are available in the literature, but the existing estimators do not perform well in terms of mean squared error when multicollinearity is high or severe. Methodology: In this paper, some new estimators for the shrinkage parameter are proposed. The proposed estimators form a class based on quantiles of the regression coefficients. The performance of the new estimators is compared with that of the existing estimators through Monte Carlo simulation. Mean squared error and mean absolute error are considered as evaluation criteria. The Tobacco dataset is used as an application to illustrate the benefits of the new estimators and to support the simulation results. Findings: The new estimators outperform the existing estimators in most of the considered scenarios, including high and severe multicollinearity. The 95% mean prediction interval of each estimator is also computed for the Tobacco data. The new estimators give the best mean prediction interval among all the estimators. Implications of the findings: We recommend the new estimators to practitioners when high to severe multicollinearity exists among the predictor variables.


Author(s):  
Paolo Giudici

Several classes of computational and statistical methods for data mining are available. Each class can be parameterised so that models within the class differ in terms of such parameters (see, for instance, Giudici, 2003; Hastie et al., 2001; Han & Kamber, 2000; Hand et al., 2001; Witten & Frank, 1999): for example, the class of linear regression models, which differ in the number of explanatory variables; the class of Bayesian networks, which differ in the number of conditional dependencies (links in the graph); the class of tree models, which differ in the number of leaves; and the class of multi-layer perceptrons, which differ in terms of the number of hidden strata and nodes. Once a class of models has been established, the problem is to choose the “best” model from it.


Agriculture ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 348
Author(s):  
Marcelo Chan Fu Wei ◽  
José Paulo Molin

Soybean yield estimation is based either on yield monitors or on agro-meteorological and satellite imagery data, but both present several limiting factors at the on-farm decision level. Given that machine learning approaches have been widely applied to estimate soybean yield, and that data on soybean yield and its components (number of grains (NG) and thousand grains weight (TGW)) are available, there is an opportunity to study their relationships. The objective was to explore the relationships between soybean yield and its components, generate equations to estimate yield and evaluate their prediction accuracy. The training dataset was composed of soybean yield and its components’ data from 2010 to 2019. Linear regression models based on NG, TGW and yield were fitted on the training dataset and applied to a validation dataset composed of 58 on-field collected samples. It was found that, globally, TGW and NG presented weak (r = 0.50) and strong (r = 0.92) linear relationships with yield, respectively. In addition, when the fitted models were applied to the validation dataset, the model based on NG presented the highest accuracy, with a coefficient of determination (R2) of 0.70, mean absolute error (MAE) of 639.99 kg ha−1 and root mean squared error (RMSE) of 726.67 kg ha−1.
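The validation metrics quoted above can be computed as in the following sketch. It assumes R2 is defined as one minus the ratio of residual to total sum of squares; the abstract does not say which definition was used. The function name `validation_metrics` and the sample values are hypothetical and are not the study's data.

```python
import math


def validation_metrics(observed, predicted):
    """Return MAE, RMSE and R2 for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R2
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical yields in arbitrary units.
metrics = validation_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

MAE and RMSE are in the units of the response (kg ha−1 in the abstract), so they can be read directly against typical yields, whereas R2 is unitless.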


Healthcare ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 525
Author(s):  
Samer A Kharroubi

Background: Modeling health-related quality of life data is often troublesome because its distribution is positively or negatively skewed, spikes at zero or one, and is bounded and heteroscedastic. Objectives: In the present paper, we aim to investigate whether Bayesian beta regression is appropriate for analyzing the SF-6D health state utility scores and respondent characteristics. Methods: A sample of 126 Lebanese members from the American University of Beirut valued 49 health states defined by the SF-6D using the standard gamble technique. Three different models were fitted for SF-6D via Bayesian Markov chain Monte Carlo (MCMC) simulation methods. These comprised a beta regression, random effects and random effects with covariates. Results from applying the three Bayesian beta regression models were reported and compared, in terms of predictive ability, with previously used linear regression models, using mean prediction error (MPE), root mean squared error (RMSE) and deviance information criterion (DIC). Results: For the three different approaches, the beta regression model was found to perform better than the normal regression model under all criteria used. The beta regression with random effects model performs best, with MPE (0.084), RMSE (0.058) and DIC (−1621). Compared to the traditional linear regression model, the beta regression provided better predictions of observed values in the entire learning sample and in an out-of-sample validation. Conclusions: Beta regression provides a flexible approach to modeling health state values. It also accounted for the boundedness and heteroscedasticity of the SF-6D index scores. Further research is encouraged.


2012 ◽  
Vol 45 (16) ◽  
pp. 1629-1634 ◽  
Author(s):  
Diego Eckhard ◽  
Håkan Hjalmarsson ◽  
Cristian R. Rojas ◽  
Michel Gevers

2021 ◽  
Vol 2 (1) ◽  
pp. 12-20
Author(s):  
Kayode Ayinde ◽  
Olusegun O. Alabi ◽  
Ugochinyere Ihuoma Nwosu

Multicollinearity has remained a major problem in regression analysis and should be sustainably addressed. Problems associated with multicollinearity are worse when it occurs at a high level among regressors. This review revealed that studies on the subject have focused on developing estimators regardless of the effect of differences in levels of multicollinearity among regressors. Studies have considered single-estimator and combined-estimator approaches without arriving at a sustainable solution to multicollinearity problems. The possible influence of partitioning the regressors according to multicollinearity levels, and extracting from each group to develop estimators of the parameters of a linear regression model when multicollinearity occurs, is a new idea in econometrics and therefore requires attention. The results of new studies should be compared with existing methods, namely the principal components estimator, the partial least squares estimator, the ridge regression estimator and the ordinary least squares estimator, using a wide range of criteria and ranking their performances at each level of the multicollinearity parameter and sample size. Based on a recent clue in the literature, it is possible to develop an innovative estimator that will sustainably solve the problem of multicollinearity through partitioning and extraction of explanatory variables, and to identify situations where the innovative estimator will produce the most efficient estimates of the model parameters. The new estimator should be applied to real data and popularized for use.


Author(s):  
Paolo Giudici

Several classes of computational and statistical methods for data mining are available. Each class can be parameterised so that models within the class differ in terms of such parameters (see, for instance, Giudici, 2003; Hastie et al., 2001; Han and Kamber, 2000; Hand et al., 2001; Witten and Frank, 1999): for example, the class of linear regression models, which differ in the number of explanatory variables; the class of Bayesian networks, which differ in the number of conditional dependencies (links in the graph); the class of tree models, which differ in the number of leaves; and the class of multi-layer perceptrons, which differ in terms of the number of hidden strata and nodes. Once a class of models has been established, the problem is to choose the “best” model from it.


2021 ◽  
Author(s):  
Yuezhou Zhang ◽  
Amos A Folarin ◽  
Shaoxiong Sun ◽  
Nicholas Cummins ◽  
Yatharth Ranjan ◽  
...  

BACKGROUND The Bluetooth sensor embedded in mobile phones provides an unobtrusive, continuous, and cost-efficient means to capture individuals’ proximity information, such as the nearby Bluetooth devices count (NBDC). The continuous NBDC data can partially reflect individuals’ behaviors and status, such as social connections and interactions, working status, mobility, and social isolation and loneliness, which were found to be significantly associated with depression by previous survey-based studies. OBJECTIVE This paper aims to explore the NBDC data’s value in predicting depressive symptom severity as measured via the 8-item Patient Health Questionnaire (PHQ-8). METHODS The data used in this paper included 2,886 bi-weekly PHQ-8 records collected from 316 participants recruited from three study sites in the Netherlands, Spain, and the UK as part of the EU RADAR-CNS study. From the NBDC data two weeks prior to each PHQ-8 score, we extracted 49 Bluetooth features, including statistical features and nonlinear features for measuring periodicity and regularity of individuals’ life rhythms. Linear mixed-effect models were used to explore associations between Bluetooth features and the PHQ-8 score. We then applied hierarchical Bayesian linear regression models to predict the PHQ-8 score from the extracted Bluetooth features. RESULTS A number of significant associations were found between Bluetooth features and depressive symptom severity. Generally speaking, as depressive symptoms worsened, one or more of the following changes were found in the preceding two weeks’ NBDC data: (1) the amount decreased, (2) the variance decreased, (3) the periodicity (especially circadian rhythm) decreased, and (4) the NBDC sequence became more irregular. Compared with commonly used machine learning models, the proposed hierarchical Bayesian linear regression model achieved the best prediction metrics, with an R^2 of 0.526 and a root mean squared error (RMSE) of 3.891. Bluetooth features can explain an extra 18.8% of the variance in the PHQ-8 score relative to the baseline model without Bluetooth features (R^2 = 0.338, RMSE = 4.547). CONCLUSIONS Our statistical results indicate that the NBDC data has the potential to reflect changes in individuals’ behaviors and status concurrent with the changes in the depressive state. The prediction results demonstrate that the NBDC data has significant value in predicting depressive symptom severity. These findings may have utility for mental health monitoring practice in real-world settings.
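As a check on the reported figures, the "extra 18.8%" is the difference between the two R^2 values quoted in the abstract. The RMSE difference follows the same way. The symbols ΔR^2 and ΔRMSE are introduced here only for illustration.

```latex
\Delta R^2 = R^2_{\text{Bluetooth}} - R^2_{\text{baseline}} = 0.526 - 0.338 = 0.188 \quad (18.8\%)
\qquad
\Delta \mathrm{RMSE} = 4.547 - 3.891 = 0.656
```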

