influential observations Latest Research Papers

Analysis and Diagnostics for Censored Regression and Multivariate Data

10.26686/wgtn.16973998 ◽

2021 ◽

Author(s):

Nazrina Aziz

Keyword(s):

Regression Model ◽

Cox Model ◽

Multivariate Data ◽

Dissimilarity Measure ◽

Local Influence ◽

Data Sets ◽

Influential Observations ◽

Censored Regression ◽

Data Set ◽

Survival Regression

This thesis investigates three research problems that arise in multivariate data and censored regression. The first is the identification of outliers in multivariate data. The second is a dissimilarity measure for clustering purposes. The third is diagnostic analysis for the Buckley-James method in censored regression. An outlier can be defined simply as an observation (or a subset of observations) that is isolated from the other observations in the data set. Two main reasons motivate the search for outliers. The first is the researcher's own interest in them. The second is the effect of outliers on analyses: they affect means, variances and regression coefficients, they bias or distort estimates, and they inflate sums of squares, so false conclusions are likely. Sometimes identifying outliers is the main objective of the analysis; in other cases the question is whether to remove the outliers or down-weight them before fitting a non-robust model. This thesis does not differentiate between the various justifications for outlier detection. The aim is to alert the analyst to observations that are considerably different from the majority. The techniques for outlier identification introduced in this thesis are applicable to a wide variety of settings and are applied to both large and small data sets. In this thesis, observations that are located far away from the remaining data are considered to be outliers.

Some techniques for identifying outliers can also be used to find clusters. There are two major challenges in clustering. The first is that identifying clusters in high-dimensional data sets is difficult because of the curse of dimensionality. The second is that a new dissimilarity measure is needed, as some traditional distance functions cannot capture the pattern dissimilarity among the objects. This thesis deals with the latter challenge. It introduces the Influence Angle Cluster Approach (IACA), which may be used as a dissimilarity matrix, and shows that IACA successfully forms clusters when it is used in partitioning clustering, even if the data set has mixed variables, i.e. interval and categorical variables. IACA is developed from the influence eigenstructure.

The first two problems in this thesis deal with complete data sets. It is also of interest to study incomplete data sets, i.e. censored data sets. The term 'censored' is mostly used in the biological sciences, for example in survival analysis. Researchers are often interested in comparing the survival distributions of two samples. Although this can be done with the logrank test, that method cannot examine the effects of more than one variable at a time. This difficulty can be overcome by using a survival regression model. Examples of survival regression models are the Cox model, Miller's model, the Buckley-James model and the Koul-Susarla-Van Ryzin model. The performance of the Buckley-James model is comparable with that of the Cox model, and it performs best when compared with both the Miller model and the Koul-Susarla-Van Ryzin model. Previous comparison studies showed that the Buckley-James estimator is more stable and easier to explain to non-statisticians than the Cox model. Nevertheless, researchers today tend to use the Cox model rather than the Buckley-James model, because the Buckley-James model is poorly supported in software and few diagnostic tools are available for it. Therefore, this thesis proposes two new diagnostic analyses for the Buckley-James model. The first is called renovated Cook's distance. This method produces results comparable with previous findings. However, it can only detect influential observations from the uncensored group, not from the censored group. This issue needs further investigation because censored points can become influential cases in censored regression. The second proposal is a local influence approach for the Buckley-James model. This thesis presents local influence diagnostics for the Buckley-James model under variance perturbation, response variable perturbation, censoring status perturbation and independent variable perturbation. The proposed diagnostics improve on, and also challenge, previous findings by allowing both censored and uncensored observations to be identified as influential.
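For readers unfamiliar with the classical diagnostic that the thesis's "renovated Cook's distance" takes its name from, the following is a minimal sketch of ordinary Cook's distance for an uncensored simple linear regression. It is not the thesis's renovated version for the Buckley-James model, which the abstract does not spell out; the function name `cooks_distance` and the toy data are assumptions made for illustration.

```python
from statistics import mean


def cooks_distance(x, y):
    """Classical Cook's distance for the simple regression y = b0 + b1*x.

    Illustrative only: ordinary least squares on uncensored data.
    """
    n = len(x)
    p = 2  # number of fitted coefficients (intercept and slope)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2)
    return [
        e ** 2 * h / (p * s2 * (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Hypothetical data: the last point sits far out in x and breaks the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
distances = cooks_distance(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
print(most_influential, [round(d, 3) for d in distances])
```

A common rule of thumb flags observations whose distance exceeds 1, although the cutoff is a convention rather than a formal test.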

Detection of Influential Observations in Spatial Regression Model Based on Outliers and Bad Leverage Classification

Symmetry ◽

10.3390/sym13112030 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2030

Author(s):

Ali Mohammed Baba ◽

Habshah Midi ◽

Mohd Bakri Adam ◽

Nur Haizum Abd Rahman

Keyword(s):

Regression Model ◽

Regression Models ◽

Spatial Regression ◽

Spatial Models ◽

Influential Observations ◽

Leverage Points ◽

Spatial Regression Models ◽

Cook's Distance ◽

Classical Regression

Influential observations (IOs), which are outliers in the x direction, the y direction or both, remain a problem in classical regression model fitting. Spatial regression models have a peculiar kind of outlier because outliers in them are local in nature, and these models are not free from the effect of influential observations either. Researchers have adapted some classical regression techniques to spatial models and obtained satisfactory results. However, masking and/or swamping remains a stumbling block for such methods. In this article, we obtain a measure of spatial Studentized prediction residuals that incorporates spatial information on the dependent variable and the residuals. We propose a robust spatial diagnostic plot that classifies observations into regular observations, vertical outliers, and good and bad leverage points, using a classification based on spatial Studentized prediction residuals and spatial diagnostic potentials, which we refer to as ISRs-Posi and ESRs-Posi. Observations that fall into the vertical outlier and bad leverage point categories are referred to as IOs. Representations of some classical regression diagnostic measures in general spatial models are presented. The diagnostic measure commonly used in spatial diagnostics, Cook's distance, is compared with some robust methods, Hi2 (using robust and non-robust measures), and with our proposed ISRs-Posi and ESRs-Posi plots. Results of our simulation study and applications to real data showed that Cook's distance, the non-robust Hsi12 and the robust Hsi22 were not very successful in detecting IOs. The Hsi12 suffered from the masking effect, and the robust Hsi22 suffered from swamping in general spatial models. Interestingly, the results showed that the proposed plots were very successful in classifying observations into the correct groups, hence correctly detecting the real IOs.
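The four-way classification described in this abstract can be illustrated with a much simpler, non-spatial and non-robust analogue. The sketch below labels each observation from its internally Studentized residual and its leverage in a simple linear regression. The cutoffs (an absolute residual above 2 and a leverage above 2p/n) are common rules of thumb assumed here, not the paper's thresholds, and `classify_observations` is a hypothetical name. Because the measures are non-robust, this sketch is exposed to the masking and swamping problems the abstract mentions.

```python
from math import sqrt
from statistics import mean


def classify_observations(x, y, resid_cut=2.0):
    """Label points of the simple regression y = b0 + b1*x.

    Illustrative, non-robust and non-spatial: uses internally Studentized
    residuals and hat-matrix leverage with rule-of-thumb cutoffs.
    """
    n, p = len(x), 2
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - p))
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    lev_cut = 2 * p / n  # assumed leverage cutoff

    labels = []
    for e, h in zip(residuals, leverage):
        r = e / (s * sqrt(1 - h))  # internally Studentized residual
        outlying = abs(r) > resid_cut
        high_leverage = h > lev_cut
        if outlying and high_leverage:
            labels.append("bad leverage point")
        elif outlying:
            labels.append("vertical outlier")
        elif high_leverage:
            labels.append("good leverage point")
        else:
            labels.append("regular observation")
    return labels


# Hypothetical data: y = x, except one shifted response at x = 5,
# plus one point far out in x that still follows the line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0, 15.0, 6.0, 7.0, 8.0, 9.0, 20.0]
labels = classify_observations(x, y)
for xi, label in zip(x, labels):
    print(xi, label)
```

Under this scheme only the vertical outliers and bad leverage points would be treated as IOs, matching the definition given in the abstract.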

Influence Diagnostic Methods in the Poisson Regression Model with the Liu Estimator

Computational Intelligence and Neuroscience ◽

10.1155/2021/4407328 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Aamna Khan ◽

Muhammad Amanullah ◽

Muhammad Amin ◽

Randa Alharbi ◽

Abdisalam Hassan Muse ◽

...

Keyword(s):

Regression Model ◽

Count Data ◽

Poisson Regression ◽

Model Fitting ◽

Estimation Method ◽

Real Data ◽

Diagnostic Methods ◽

Influential Observations ◽

Poisson Regression Model ◽

Liu Estimator

There is a long history of interest in modeling Poisson regression in different fields of study. The focus of this work is on handling the issues that arise after modeling count data. For the prediction and analysis of count data, it is valuable to study the factors that influence the performance of the model and the decisions based on the analysis of that model. In regression analysis, multicollinearity and influential observations affect model estimation and inference, both separately and jointly. In this article, we focus on multicollinearity and influential observations simultaneously. To evaluate the reliability and quality of regression estimates and to overcome the problems in model fitting, we propose new diagnostic methods based on the Sherman–Morrison–Woodbury (SMW) theorem to detect influential observations using approximate deletion formulas for the Poisson regression model with the Liu estimator. A Monte Carlo simulation is carried out to assess the proposed diagnostic methods. Real data are also considered for the evaluation of the proposed methods. Results show the superiority of the proposed diagnostic methods in detecting unusual observations in the presence of multicollinearity compared to the traditional maximum likelihood estimation method.
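The idea behind SMW-based deletion formulas can be sketched in the simplest case, ordinary least squares, where the rank-one update gives the leave-one-out coefficients exactly without refitting: the coefficients without observation i equal the full-data coefficients minus (X'X)^{-1} x_i e_i / (1 - h_ii). This is not the paper's Poisson and Liu estimator formula, which the abstract describes as approximate; the function names and data below are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1


def leave_one_out_coefficients(x, y):
    """Coefficients after deleting each point, via a rank-one update.

    Uses beta_(i) = beta - (X'X)^{-1} x_i e_i / (1 - h_ii), which follows
    from the Sherman-Morrison identity for design rows x_i = (1, x_i).
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    # Entries of the symmetric 2x2 inverse (X'X)^{-1} = [[a, b], [b, c]]
    a, b, c = sxx / det, -sx / det, n / det

    b0, b1 = fit_line(x, y)
    updated = []
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)          # full-data residual
        v0, v1 = a + b * xi, b + c * xi  # v = (X'X)^{-1} x_i
        h = v0 + xi * v1                 # leverage h_ii = x_i' v
        scale = e / (1 - h)
        updated.append((b0 - v0 * scale, b1 - v1 * scale))
    return updated


# Hypothetical data; compare the update with brute-force refits.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
updated = leave_one_out_coefficients(x, y)
refitted = [fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]
for u, r in zip(updated, refitted):
    print(u, r)
```

The benefit is computational: all n deletion estimates come from a single fit, which is what makes case-deletion diagnostics practical on larger data sets.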

ROBUST PARAMETER ESTIMATION FOR RANDOM EFFECT PANEL DATA MODEL IN THE PRESENCE OF HETEROSCEDASTICITY AND INFLUENTIAL OBSERVATIONS

FUDMA Journal of Sciences ◽

10.33003/fjs-2020-0404-689 ◽

2021 ◽

Vol 4 (4) ◽

pp. 561-569

Author(s):

Sani Muhammad ◽

Suleiman Shamsuddeen ◽

Ismail G. Baoku

Keyword(s):

Panel Data ◽

Data Model ◽

Random Effect ◽

Estimation Method ◽

Panel Data Model ◽

Influential Observations ◽

Weighting Method ◽

Robust Parameter ◽

Robust Parameter Estimation ◽

Combine Problem

Panel data estimators can be strongly biased and inconsistent in the presence of heteroscedasticity and anomalous observations, called influential observations (IOs), in the random effect (RE) panel data model. The existing methods (LWS, WLSF, WLSDRGP) address only the problem of IOs and fail to remedy the combined problem of heteroscedasticity and IOs. Therefore, in this research we develop a method, denoted WLSFIID, that remedies the combined problem of heteroscedasticity and IOs, based on a robust heteroscedasticity-consistent covariance matrix (RHCCM) estimator and the fast improvised influential distance (FIID) weighting method. The simulation and numerical evidence shows that our proposed estimation method is more efficient than the existing methods, providing the smallest bias and the smallest standard errors of HC4 and HC5.

Detection of Influential Observations in Spatial Regression Model Based on Outliers and Bad Leverage Classification

10.20944/preprints202108.0178.v1 ◽

2021 ◽

Author(s):

Ali Mohammed Baba ◽

Habshah Midi ◽

Mohd Bakri Adam ◽

Nur Haizum Bint Abd Rahman

Keyword(s):

Regression Model ◽

Spatial Model ◽

Spatial Regression ◽

Model Fitting ◽

Spatial Prediction ◽

Influential Observations ◽

Spatial Regression Model ◽

Cook's Distance ◽

Classical Regression

Influential observations, which are outliers in the x direction, the y direction or both, remain a hitch in classical regression model fitting. The spatial regression model, whose outliers are peculiar because of their local nature, is not free from the effect of such influential observations. Researchers have adapted some classical regression techniques to spatial models and obtained satisfactory results. However, masking and/or swamping remains a stumbling block for such methods. We obtain the spatial representation of the classical regression diagnostic measures in the general spatial model. The diagnostic measure commonly used in spatial diagnostics, Cook's distance, is compared with some robust methods, Hi2 (using robust and non-robust measures), and with a classification based on generalized residuals and diagnostic generalized potentials, ISRs-Posi and ESRs-Posi, with the help of the obtained spatial prediction residuals and the spatial leverage term. Results of simulation and applications to real data show the advantage of ISRs-Posi and ESRs-Posi, owing to their classification of outliers, over Cook's distance and the non-robust Hsi12, which suffer from masking, and the robust Hsi22, which suffers from swamping in the general spatial model.

Leverage and Influential Observations on the Liu Type Estimator in the Linear Regression Model with the Severe Collinearity

Heliyon ◽

10.1016/j.heliyon.2021.e07792 ◽

2021 ◽

pp. e07792

Author(s):

Hussein Eledum

Keyword(s):

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Influential Observations

Fast Improvised Influential Distance for the Identification of Influential Observations in Multiple Linear Regression

Sains Malaysiana ◽

10.17576/jsm-2021-5007-22 ◽

2021 ◽

Vol 50 (7) ◽

pp. 2085-2094

Author(s):

Habshah Midi ◽

Muhammad Sani ◽

Shelan Saied Ismaeel ◽

Jayanthi Arasan

Keyword(s):

Linear Regression ◽

Regression Model ◽

Multiple Linear Regression ◽

Linear Regression Model ◽

Multiple Linear Regression Model ◽

Real Data ◽

Initial Step ◽

Influential Observations ◽

Monte Carlo Simulation Study ◽

Running Time

Influential observations (IOs) are observations that are responsible for misleading conclusions about the fit of a multiple linear regression model. Existing IO identification methods, such as the influential distance (ID), are not very successful in detecting IOs. It is suspected that the ID employs an inefficient method, with a long computational running time, to identify the suspected IOs at the initial step. Moreover, this method declares good leverage observations to be IOs, resulting in misleading conclusions. In this paper, we propose the fast improvised influential distance (FIID), which can successfully identify IOs, good leverage observations and regular observations with a shorter computational running time. A Monte Carlo simulation study and real data examples show that the FIID correctly identifies genuine IOs in the multiple linear regression model, with no masking and a negligible swamping rate.

Influential Observations: Leverage Points and Outliers

Linear Regression Models ◽

10.1201/9781003162230-14 ◽

2021 ◽

pp. 293-312

Author(s):

John P. Hoffmann

Keyword(s):

Influential Observations ◽

Leverage Points
