Logistic Regression Diagnostics and Problems of Inference

1992 ◽  
Vol 49 (3) ◽  
pp. 597-608 ◽  
Author(s):  
J. J. Beauchamp ◽  
S. W. Christensen ◽  
E. P. Smith

We used multiple logistic regression techniques to develop models for estimating the probability of brook trout (Salvelinus fontinalis) presence/absence as a function of observable water chemistry variables and watershed characteristics. The data set consists of the Adirondack Lakes Survey Corporation data collected on 1469 lakes during 1984–87. Two models fitted to a randomly selected development subset of lakes, using two sets of candidate explanatory/predictor variables of particular interest, were compared on the basis of coefficient consistency and predictive ability. In addition to the usual maximum likelihood logistic regression results, we also applied collinearity and other associated diagnostics and variable-selection procedures designed specifically for the logistic regression model to arrive at parsimonious models. Both models correctly predicted fish presence in more than 85% of the model development set and more than 80% of the lakes in the verification data. For those variables appearing in both models, the signs of the estimated coefficients were the same and in agreement with expectation. The removal of influential observations, as indicated by the logistic regression diagnostics, caused all of the estimated coefficients to increase in absolute magnitude. This results in a model which is more sensitive to changes in the explanatory variables.
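As an illustration of the kind of logistic regression diagnostics this abstract refers to, the following is a minimal sketch, not the authors' procedure. It fits a model by iteratively reweighted least squares, then computes leverages, standardized Pearson residuals, and a one-step Cook-type influence measure. The synthetic data, the coefficient values, and the function names `fit_logistic_irls` and `logistic_diagnostics` are assumptions made for the example; none of them come from the Adirondack data.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=50, tol=1e-8):
    """Maximum likelihood logistic regression via Newton/IRLS steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: solve (X'WX) delta = X'(y - p)
        delta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta

def logistic_diagnostics(X, y, beta):
    """Leverage, standardized Pearson residual, and one-step influence per observation."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    Xw = np.sqrt(w)[:, None] * X
    info_inv = np.linalg.inv(Xw.T @ Xw)
    # Diagonal of H = W^(1/2) X (X'WX)^(-1) X' W^(1/2)
    leverage = np.einsum("ij,jk,ik->i", Xw, info_inv, Xw)
    pearson = (y - p) / np.sqrt(w)
    std_pearson = pearson / np.sqrt(1.0 - leverage)
    # One-step approximation to the change in the estimates when a case is deleted
    influence = std_pearson**2 * leverage / (1.0 - leverage)
    return leverage, std_pearson, influence

# Synthetic presence/absence data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([0.5, 1.0, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logistic_irls(X, y)
leverage, std_pearson, influence = logistic_diagnostics(X, y, beta_hat)
most_influential = np.argsort(influence)[-5:]  # candidates for closer inspection
```

Observations with large influence values are candidates for inspection rather than automatic deletion; refitting without them shows how far the coefficients move.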


1981 ◽  
Vol 9 (4) ◽  
pp. 705-724 ◽  
Author(s):  
Daryl Pregibon

1990 ◽  
Vol 47 (6) ◽  
pp. 1128-1135 ◽  
Author(s):  
Brian D. Marx ◽  
Eric P. Smith

An historical data set from the Adirondack region of New York is revisited to study the relationship between water chemistry variables associated with acid precipitation and the presence/absence of brook trout (Salvelinus fontinalis) and lake trout (Salvelinus namaycush). For the trout species data sets, water chemistry variables associated with acid precipitation, for example pH and alkalinity, are highly correlated. Regression models to assess their effects on the probability of the presence of fish species are therefore affected by multicollinearity. Because the appropriate regressions are logistic, correction techniques based on least squares do not work. Maximum likelihood parameter estimation is highly unstable for the trout presence/absence data. Developments in weighted multicollinearity diagnostics are used to evaluate maximum likelihood logistic regression parameter estimates. Further, an application of biased parameter estimation is presented as an option to the traditional maximum likelihood logistic regression. Biased estimation methods, like ridge, principal component, or Stein estimation can substantially reduce the variance of the parameter estimates and prediction variance for certain future observations. In many cases, only a slight modification to the converged maximum likelihood estimator is necessary.
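The two ideas in this abstract, weighted collinearity diagnostics and a ridge-type adjustment of the converged maximum likelihood estimate, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The ridge form used here, (X'WX + kI)^(-1) X'WX β_ML, is one commonly cited logistic ridge estimator. The ridge constant `k`, the synthetic predictors, and all function names are hypothetical.

```python
import numpy as np

def logistic_weights(X, beta):
    """Binomial variance weights p(1 - p) at the given coefficients."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return p * (1.0 - p)

def fit_logistic_irls(X, y, n_iter=50, tol=1e-8):
    """Maximum likelihood fit via Newton/IRLS steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        delta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return beta

def weighted_condition_indices(X, beta):
    """Condition indices of W^(1/2) X after scaling its columns to unit length."""
    Xw = np.sqrt(logistic_weights(X, beta))[:, None] * X
    Xw = Xw / np.linalg.norm(Xw, axis=0)
    s = np.linalg.svd(Xw, compute_uv=False)
    return s.max() / s

def ridge_logistic(X, beta_ml, k):
    """Ridge-type shrinkage of the converged ML estimate (assumed form, see lead-in)."""
    w = logistic_weights(X, beta_ml)
    info = X.T @ (w[:, None] * X)
    return np.linalg.solve(info + k * np.eye(X.shape[1]), info @ beta_ml)

# Synthetic data with two nearly collinear predictors (hypothetical values)
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # stands in for a correlated pair such as pH and alkalinity
X = np.column_stack([np.ones(n), x1, x2])
true_beta = np.array([0.2, 0.8, 0.4])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_ml = fit_logistic_irls(X, y)
cond_idx = weighted_condition_indices(X, beta_ml)  # large values flag collinearity
beta_ridge = ridge_logistic(X, beta_ml, k=1.0)     # k chosen arbitrarily here
```

Because the adjustment is applied once to the converged estimate, it needs no further iteration. Choosing `k` is a separate problem that this sketch does not address.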


2017 ◽  
Vol 33 (2) ◽  
pp. 563-593 ◽  
Author(s):  
M. Revan Özkale ◽  
Stanley Lemeshow ◽  
Rodney Sturdivant

Author(s):  
A. A. M. Nurunnabi ◽  
A. B. M. S. Ali ◽  
A. H. M. Rahmatullah Imon ◽  
Mohammed Nasser

Logistic regression, including its modelling, decision making from the estimated model, and subsequent analysis, has drawn a great deal of attention since its inception. Logistic regression methods are currently used in epidemiology, biomedical research, criminology, ecology, engineering, pattern recognition, machine learning, wildlife biology, linguistics, business and finance, and other fields. Logistic regression diagnostics have attracted both theoreticians and practitioners in recent years. Detecting and handling outliers is an important task in data modelling because outliers often distort model performance. Traditionally, logistic regression models were fitted to data obtained under experimental conditions. In recent years, however, assessing the extent of outliers before using data as input to a logistic model has become an important issue. The topic demands a higher mathematical level than most related material, which has held back its study and application despite its importance. This chapter presents several diagnostic aspects and methods in logistic regression. As in linear regression, logistic regression estimates are sensitive to unusual observations: outliers, high-leverage points, and influential observations. Numerical examples and analyses are presented to demonstrate the most recent outlier diagnostic methods using data sets from the medical domain.

