Logistic Regression Diagnostics and Problems of Inference

We used multiple logistic regression techniques to develop models for estimating the probability of brook trout (Salvelinus fontinalis) presence/absence as a function of observable water chemistry variables and watershed characteristics. The data set consists of the Adirondack Lakes Survey Corporation data collected on 1469 lakes during 1984–87. Two models fitted to a randomly selected development subset of lakes, using two sets of candidate explanatory/predictor variables of particular interest, were compared on the basis of coefficient consistency and predictive ability. In addition to the usual maximum likelihood logistic regression results, we also applied collinearity and other associated diagnostics and variable-selection procedures designed specifically for the logistic regression model to arrive at parsimonious models. Both models correctly predicted fish presence in more than 85% of the model development set and more than 80% of the lakes in the verification data. For those variables appearing in both models, the signs of the estimated coefficients were the same and in agreement with expectation. The removal of influential observations, as indicated by the logistic regression diagnostics, caused all of the estimated coefficients to increase in absolute magnitude. This results in a model which is more sensitive to changes in the explanatory variables.
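The development/verification comparison described in this abstract can be sketched as follows. This is only an illustrative sketch on synthetic data, not the authors' analysis: the predictors, the coefficients in `true_beta`, the 70/30 split, and the 0.5 classification cutoff are all assumptions made for the example.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X' W X)^{-1} X' (y - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

def percent_correct(X, y, beta, cutoff=0.5):
    """Percentage of observations whose presence/absence is predicted correctly."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return 100.0 * np.mean((p >= cutoff) == (y == 1))

# Synthetic stand-in for lake data (hypothetical predictors, not the survey data).
rng = np.random.default_rng(0)
n = 1000
ph_centered = rng.normal(0.0, 1.0, n)   # assumed pH-like variable, centered
elevation = rng.normal(0.0, 1.0, n)     # assumed watershed variable, standardized
X = np.column_stack([np.ones(n), ph_centered, elevation])
true_beta = np.array([0.5, 2.0, -1.0])  # assumed coefficients for the simulation
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Random split into a model development subset and a verification subset.
idx = rng.permutation(n)
dev, ver = idx[:700], idx[700:]

beta_hat = fit_logistic(X[dev], y[dev])
acc_dev = percent_correct(X[dev], y[dev], beta_hat)
acc_ver = percent_correct(X[ver], y[ver], beta_hat)
print("coefficients:", np.round(beta_hat, 2))
print("percent correct (development, verification):", acc_dev, acc_ver)
```

Checking that the estimated signs agree with expectation, and that accuracy on the verification subset is not much lower than on the development subset, mirrors the two comparison criteria named in the abstract.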


Logistic Regression Diagnostics

The Annals of Statistics, 1981, Vol 9 (4), pp. 705-724

DOI: 10.1214/aos/1176345513

Cited By ~ 674

Author(s): Daryl Pregibon

Keyword(s): Logistic Regression, Regression Diagnostics


Weighted Multicollinearity in Logistic Regression: Diagnostics and Biased Estimation Techniques with an Example from Lake Acidification

Canadian Journal of Fisheries and Aquatic Sciences, 1990, Vol 47 (6), pp. 1128-1135

DOI: 10.1139/f90-131

Cited By ~ 9

Author(s): Brian D. Marx, Eric P. Smith

Keyword(s): Logistic Regression, Parameter Estimation, Maximum Likelihood, Water Chemistry, Acid Precipitation, Estimation Methods, Parameter Estimates, Regression Diagnostics, Data Set, Biased Estimation

An historical data set from the Adirondack region of New York is revisited to study the relationship between water chemistry variables associated with acid precipitation and the presence/absence of brook trout (Salvelinus fontinalis) and lake trout (Salvelinus namaycush). For the trout species data sets, water chemistry variables associated with acid precipitation, for example pH and alkalinity, are highly correlated. Regression models to assess their effects on the probability of the presence of fish species are therefore affected by multicollinearity. Because the appropriate regressions are logistic, correction techniques based on least squares do not work. Maximum likelihood parameter estimation is highly unstable for the trout presence/absence data. Developments in weighted multicollinearity diagnostics are used to evaluate maximum likelihood logistic regression parameter estimates. Further, an application of biased parameter estimation is presented as an alternative to traditional maximum likelihood logistic regression. Biased estimation methods, such as ridge, principal component, or Stein estimation, can substantially reduce the variance of the parameter estimates and the prediction variance for certain future observations. In many cases, only a slight modification to the converged maximum likelihood estimator is necessary.
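A minimal sketch of the two ideas in this abstract, under stated assumptions: a collinearity diagnostic computed on the weighted information matrix X'WX rather than on X'X, and a ridge estimate obtained as a one-step modification of the converged maximum likelihood estimate. The synthetic data, the ridge constant `k`, and the specific one-step form (X'WX + kI)^{-1} X'WX b are illustrative choices, not necessarily the exact estimators used in the paper. For simplicity the intercept is penalized too, which is usually avoided in practice.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximum likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

# Two nearly collinear synthetic predictors (stand-ins for, e.g., pH and alkalinity).
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x1 + x2)))).astype(float)

beta_ml = fit_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-np.clip(X @ beta_ml, -30, 30)))
w = p_hat * (1.0 - p_hat)
info = X.T @ (X * w[:, None])            # weighted information matrix X'WX

# Weighted collinearity diagnostic: scale X'WX to unit diagonal, then take the
# square root of the ratio of its largest to smallest eigenvalue.
d = np.sqrt(np.diag(info))
eig = np.linalg.eigvalsh(info / np.outer(d, d))
cond_index = np.sqrt(eig.max() / eig.min())

# One-step ridge modification of the converged ML estimate (assumed form).
k = 1.0                                   # hypothetical ridge constant
beta_ridge = np.linalg.solve(info + k * np.eye(X.shape[1]), info @ beta_ml)

print("weighted condition index:", cond_index)
print("ML estimate:   ", np.round(beta_ml, 2))
print("ridge estimate:", np.round(beta_ridge, 2))
```

A large condition index signals that the maximum likelihood slopes for the two correlated predictors are poorly determined individually, and the ridge step shrinks the coefficient vector toward zero at the cost of some bias.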


An Introduction to Logistic Regression Diagnostics

Applied Logistic Regression Analysis, 2014, pp. 68-91

DOI: 10.4135/9781412983433.n4

Cited By ~ 2

Keyword(s): Logistic Regression, Regression Diagnostics


Logistic regression diagnostics in ridge regression

Computational Statistics, 2017, Vol 33 (2), pp. 563-593

DOI: 10.1007/s00180-017-0755-x

Cited By ~ 3

Author(s): M. Revan Özkale, Stanley Lemeshow, Rodney Sturdivant

Keyword(s): Logistic Regression, Ridge Regression, Regression Diagnostics


Applying CHAID for Logistic Regression Diagnostics and Classification Accuracy Improvement

SSRN Electronic Journal, 2009

DOI: 10.2139/ssrn.1412208

Cited By ~ 4

Author(s): Evgeny Antipov, Elena Pokryshevskaya

Keyword(s): Logistic Regression, Classification Accuracy, Regression Diagnostics, Accuracy Improvement


Outlier Detection in Logistic Regression

Multidisciplinary Computational Intelligence Techniques, 2012, pp. 257-278

DOI: 10.4018/978-1-4666-1830-5.ch016

Cited By ~ 1

Author(s): A. A. M. Nurunnabi, A. B. M. S. Ali, A. H. M. Rahmatullah Imon, Mohammed Nasser

Keyword(s): Logistic Regression, Diagnostic Methods, Regression Diagnostics, Data Sets, Experimental Conditions, Numerical Examples, Logistic Regression Models, Regression Methods, Diagnostic Aspects, Using Data

The use of logistic regression, including its modelling, decision making from the estimated model, and subsequent analysis, has drawn a great deal of attention since its inception. Logistic regression methods are currently used in epidemiology, biomedical research, criminology, ecology, engineering, pattern recognition, machine learning, wildlife biology, linguistics, business and finance, et cetera. Logistic regression diagnostics have attracted both theoreticians and practitioners in recent years. Detection and handling of outliers is considered an important task in data modelling, because the presence of outliers often degrades model performance. Traditionally, logistic regression models were used to fit data obtained under experimental conditions. In recent years, however, it has become important to assess the extent of outliers before using the data as input to a logistic model. The topic requires a higher mathematical level than most other material, which has held back its study and application even though it cannot be avoided. This chapter presents several diagnostic aspects and methods in logistic regression. As in linear regression, estimates in logistic regression are sensitive to unusual observations: outliers, high-leverage points, and influential observations. Numerical examples and analysis are presented to demonstrate the most recent outlier diagnostic methods using data sets from the medical domain.
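The three kinds of unusual observations named in this abstract are commonly screened with leverage values, standardized Pearson residuals, and a Cook's-distance-style influence measure. The sketch below computes these on synthetic data with one deliberately planted outlier. The data, the planted point, and the variable names are assumptions for illustration; the formulas are the standard textbook diagnostics, not necessarily the newer methods the chapter presents.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximum likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

# Synthetic data with assumed coefficients (0, 2, -1).
rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
eta = X @ np.array([0.0, 2.0, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

# Plant one outlier: a covariate pattern that strongly predicts y = 1, recorded as 0.
X[0] = [1.0, 4.0, 0.0]
y[0] = 0.0

beta = fit_logistic(X, y)
p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
w = p * (1.0 - p)

# Leverage: diagonal of H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}.
XtWX_inv = np.linalg.inv(X.T @ (X * w[:, None]))
h = w * np.einsum("ij,jk,ik->i", X, XtWX_inv, X)

pearson = (y - p) / np.sqrt(w)            # Pearson residuals
std_resid = pearson / np.sqrt(1.0 - h)    # standardized Pearson residuals
delta_beta = std_resid**2 * h / (1.0 - h) # Cook's-distance-style influence measure

worst = int(np.argmax(np.abs(std_resid)))
print("most outlying observation:", worst)
print("its leverage and influence:", h[worst], delta_beta[worst])
```

Observations with large standardized residuals are candidate outliers, those with large `h` are high-leverage points, and those with large `delta_beta` are influential for the coefficient estimates. The leverage values sum to the number of fitted parameters, which gives a quick sanity check on the computation.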


Logistic Regression Diagnostics and Problems of Inference

Applying CHAID for logistic regression diagnostics and classification accuracy improvement

Residuals and regression diagnostics: focusing on logistic regression

Multiple-Group Logistic Regression Diagnostics

Selection of Factors Affecting the Presence of Brook Trout (Salvelinus fontinalis) in Adirondack Lakes: A Case Study