Comparison of Some Methods for Estimating Parameters of General Linear Model in Presence of Heteroscedastic Problem and High Leverage Points

2021 ◽  
Vol 27 (127) ◽  
pp. 213-228
Author(s):  
Qasim Mohammed Saheb ◽  
Saja Mohammad Hussein

Linear regression is one of the most important statistical tools for describing the relationship between a response variable and one or more independent variables, and it is widely used across the sciences. Heteroscedasticity is one of the problems affecting linear regression; its presence leads to inaccurate conclusions. Heteroscedasticity may also be accompanied by extreme outliers in the independent variables, known as high leverage points (HLPs), and the presence of HLPs in a data set produces unrealistic estimates and misleading inferences. In this paper, we review some robust weighted estimation methods that combine robust and classical techniques for detecting HLPs and determining the weights. The methods considered are Diagnostic Robust Generalized Potential based on the Minimum Volume Ellipsoid (DRGP(MVE)), Diagnostic Robust Generalized Potential based on the Minimum Covariance Determinant (DRGP(MCD)), and Diagnostic Robust Generalized Potential based on Index Set Equality (DRGP(ISE)). The comparison was made according to the standard errors of the estimated parameters of the general linear regression model, for sample sizes n = 60, 100, and 160, with different degrees (severities) of heteroscedasticity and HLP contamination percentages of τ = 10% and τ = 30%. The comparison showed that weighted least squares estimation based on the weights of the DRGP(ISE) method is the best at estimating the parameters of the multiple linear regression model, because it yields the lowest standard errors of the estimators compared with the other methods. Paper type: A case study
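The general idea compared in this abstract can be illustrated as follows: flag high leverage points with a robust covariance diagnostic, convert the diagnostic into observation weights, and fit a weighted least squares model. The sketch below is only illustrative and is not the authors' DRGP(MVE)/DRGP(MCD)/DRGP(ISE) code; the MCD-based robust distance, the chi-square cut-off, and the simple down-weighting rule are all assumptions.

```python
# Illustrative sketch (not the paper's DRGP procedure): down-weight
# high leverage points using robust Mahalanobis distances from an MCD
# fit, then estimate the coefficients by weighted least squares.
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet

def robust_wls(X, y, alpha=0.975):
    """X: (n, p) explanatory variables, y: (n,) response."""
    n, p = X.shape
    mcd = MinCovDet(random_state=0).fit(X)
    d2 = mcd.mahalanobis(X)                       # squared robust distances
    cutoff = stats.chi2.ppf(alpha, df=p)          # chi-square cut-off for flagging HLPs
    w = np.where(d2 <= cutoff, 1.0, cutoff / d2)  # down-weight flagged points
    Xd = np.column_stack([np.ones(n), X])         # add intercept column
    W = np.diag(w)
    beta = np.linalg.solve(Xd.T @ W @ Xd, Xd.T @ W @ y)
    return beta, w
```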

Author(s):  
Abu Sayed Md. Al Mamun ◽  
A.H.M. R. Imon ◽  
A. G. Hussin ◽  
Y. Z. Zubairi ◽  
Sohel Rana

In a standard linear regression model the explanatory variables, X, are considered fixed and hence assumed to be free from errors. In reality, however, they are variables and consequently can be subject to error. In the regression literature there is a clear distinction between outliers in the Y-space (the errors) and outliers in the X-space; the latter are popularly known as high leverage points. If the explanatory variables are subject to gross errors or any unusual pattern, we call these observations outliers in the X-space, or high leverage points. High leverage points often exert too much influence and consequently become responsible for misleading conclusions about the fit of a regression model, causing multicollinearity problems, masking and/or swamping of outliers, etc. Although a good number of works have been done on the identification of high leverage points in the linear regression model, this is still a new and unsolved problem in the linear functional relationship model. In this paper, we suggest a procedure for the identification of high leverage points based on the deletion of a group of observations. The usefulness of the proposed method for the detection of multiple high leverage points is studied using some well-known data sets and Monte Carlo simulations.
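To make the group-deletion idea concrete, the sketch below flags an initial suspect group from ordinary hat values, recomputes leverages with that group removed, and keeps only the suspects whose potential with respect to the remaining "clean" set stays large. The MAD-type thresholds and the two-stage rule are illustrative assumptions, not the specific procedure proposed in the paper.

```python
# Rough sketch of a group-deletion leverage diagnostic (assumptions
# throughout; not the authors' exact method).
import numpy as np

def group_deletion_hlp(X, c=3.0):
    """X: (n, p) explanatory variables; returns a boolean mask of flagged HLPs."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    h = np.diag(Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T)        # ordinary hat values
    med_h = np.median(h)
    suspects = h > med_h + c * np.median(np.abs(h - med_h))  # initial suspect group
    clean = ~suspects                                        # delete the group
    A = np.linalg.inv(Xd[clean].T @ Xd[clean])
    pot = np.einsum('ij,jk,ik->i', Xd, A, Xd)                # potentials w.r.t. the clean set
    med_p = np.median(pot)
    return suspects & (pot > med_p + c * np.median(np.abs(pot - med_p)))
```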


1995 ◽  
Vol 3 (3) ◽  
pp. 133-142 ◽  
Author(s):  
M. Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M. White ◽  
D.R. Bahler

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared to the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, each spectrum having 840 spectral data points. The fast Fourier transform was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the linear regression model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network. The linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than the MLR and linear network models by 35.14%.
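The compression step described above (840 spectral points reduced to 13 Fourier coefficients) can be sketched roughly as follows; the function name, the use of coefficient magnitudes, and the scikit-learn models are assumptions for illustration, not the original implementation.

```python
# Illustrative sketch, assuming the spectra are stored as a NumPy array.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def fourier_compress(spectra, n_coef=13):
    """spectra: (n_samples, n_points) NIR reflectance; keep the leading coefficients."""
    coefs = np.fft.rfft(spectra, axis=1)   # real FFT along each spectrum
    return np.abs(coefs[:, :n_coef])       # magnitudes of the first n_coef coefficients

# hypothetical arrays: raw_spectra of shape (200, 840), nicotine of shape (200,)
# X = fourier_compress(raw_spectra)
# mlr = LinearRegression().fit(X, nicotine)
# bp  = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000).fit(X, nicotine)
```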


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Qingqi Zhang

In this paper, the author first analyzes the major factors affecting housing prices with the Spearman correlation coefficient, selects the factors that significantly influence general housing prices, and performs a combined analysis. The author then establishes a multiple linear regression model for housing price prediction and tests the method on the Boston real estate price data set. The data analysis and tests in this paper show that the multiple linear regression model can, to some extent, effectively predict and analyze housing prices, while the approach could still be improved through more advanced machine learning methods.
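A minimal sketch of the described workflow, assuming the Boston housing data are already available in a pandas DataFrame with the price in a column named 'MEDV' (the column name and the correlation threshold are illustrative, not taken from the paper):

```python
# Hedged sketch: Spearman-based feature selection followed by a
# multiple linear regression fit on the retained features.
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

def select_and_fit(df, target='MEDV', threshold=0.4):
    # rank-based (Spearman) correlation of each candidate feature with the price
    rho = {col: spearmanr(df[col], df[target]).correlation
           for col in df.columns if col != target}
    selected = [col for col, r in rho.items() if abs(r) >= threshold]
    model = LinearRegression().fit(df[selected], df[target])
    return selected, model
```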

