Logistic Regression via Excel Spreadsheets: Mechanics, Model Selection, and Relative Predictor Importance

Author(s):  
Michael Brusco

Logistic regression is one of the most fundamental tools in predictive analytics. Graduate business analytics students are often taught to implement logistic regression using Python, R, SPSS, or other software packages. However, an understanding of the underlying maximum likelihood model and the mechanics of estimation is often lacking. This paper describes two Excel workbooks that can be used to enhance conceptual understanding of logistic regression in several respects: (i) by providing a clear formulation and solution of the maximum likelihood estimation problem; (ii) by showing the process for testing the significance of logistic regression coefficients; (iii) by demonstrating different methods for model selection to avoid overfitting, specifically, all possible subsets ordinary least squares regression and L1-regularized logistic regression (lasso); and (iv) by illustrating the measurement of relative predictor importance using all possible subsets.
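The maximum likelihood problem and the coefficient significance test described in this abstract can be illustrated outside a spreadsheet. The following is a minimal Python sketch, not the paper's Excel workbooks: it assumes a single predictor plus an intercept, uses Newton-Raphson iterations, and runs on made-up data. The function names `score_and_information` and `fit_logistic` are hypothetical.

```python
import math

def score_and_information(b0, b1, x, y):
    """Score vector and Fisher information of the logistic log-likelihood."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # predicted P(y = 1)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit; returns intercept, slope, and the slope's Wald z."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: add (information matrix)^-1 times the score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    # Standard error of the slope from the inverse information at the estimate
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b0, b1, b1 / se_b1

# Illustrative (made-up) data: the outcomes overlap, so the MLE is finite.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope, wald_z = fit_logistic(x_values, y_values)
print(intercept, slope, wald_z)
```

The Wald z-statistic is the estimated slope divided by its standard error. With only eight observations it serves as a demonstration of the mechanics rather than a meaningful test.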

1987 ◽  
Vol 24 (4) ◽  
pp. 389-395 ◽  
Author(s):  
Trudy A. Cameron ◽  
Michelle D. James

Closed-ended contingent valuation surveys are used to assess demands in hypothetical markets and recently have been applied widely to the valuation of (non-market) environmental resources. This interviewing strategy holds considerable promise for more general market research applications. The authors describe a new maximum likelihood estimation technique for use with these special data. Unlike previously used methods, the estimated models are as easy to interpret as ordinary least squares regression results and the results can be approximated accurately by packaged probit estimation routines.


Author(s):  
Jeremy Freese

This article presents a method and program for identifying poorly fitting observations for maximum-likelihood regression models for categorical dependent variables. After estimating a model, the program leastlikely will list the observations that have the lowest predicted probabilities of observing the value of the outcome category that was actually observed. For example, when run after estimating a binary logistic regression model, leastlikely will list the observations with a positive outcome that had the lowest predicted probabilities of a positive outcome and the observations with a negative outcome that had the lowest predicted probabilities of a negative outcome. These can be considered the observations in which the outcome is most surprising given the values of the independent variables and the parameter estimates and, like observations with large residuals in ordinary least squares regression, may warrant individual inspection. Use of the program is illustrated with examples using binary and ordered logistic regression.
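The selection rule described in this abstract can be sketched in a few lines of Python. This is an illustration of the idea only, not the leastlikely program or its interface: the function name `least_likely`, its arguments, and the data are hypothetical. It assumes each observation comes with a predicted probability for every outcome category.

```python
def least_likely(probabilities, outcomes, n=5):
    """For each observed outcome category, return the n observations whose
    observed outcome had the lowest predicted probability.

    probabilities: one dict per observation, mapping category -> probability
    outcomes: the category actually observed for each observation
    Returns a dict mapping category -> list of (probability, index) pairs.
    """
    by_outcome = {}
    for index, (probs, observed) in enumerate(zip(probabilities, outcomes)):
        by_outcome.setdefault(observed, []).append((probs[observed], index))
    # Sorting puts the most surprising observations (lowest probability) first
    return {outcome: sorted(rows)[:n] for outcome, rows in by_outcome.items()}

# Made-up predicted probabilities from a binary model and observed outcomes
predicted = [
    {0: 0.9, 1: 0.1},
    {0: 0.2, 1: 0.8},
    {0: 0.6, 1: 0.4},
    {0: 0.3, 1: 0.7},
    {0: 0.85, 1: 0.15},
]
observed = [1, 1, 0, 0, 1]
surprising = least_likely(predicted, observed, n=1)
print(surprising)
```

Because the input is keyed by category rather than by a single positive-outcome probability, the same sketch covers ordered or multinomial outcomes.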


Author(s):  
Sadriana Rustan ◽  
Muhammad Arif Tiro ◽  
Muhammad Nadjib Bustan

Logistic regression analysis is used to determine the relationship between a categorical response variable and one or more explanatory variables, under the assumption that the response is not influenced by geographical location (spatial data). One method of spatial analysis is Geographically Weighted Logistic Regression (GWLR). The GWLR model is a local form of logistic regression in which geographical location is taken into account and the response is assumed to follow a Bernoulli distribution. The parameters of the GWLR model are estimated by Maximum Likelihood Estimation (MLE), with different weights given to different locations. The data were obtained from publications of Badan Pusat Statistik (BPS), namely Data and Information on Poverty in South Sulawesi Province. This study aims to determine the factors that influence poverty status in South Sulawesi Province using a geographically weighted logistic regression model with a bisquare kernel weighting function. The results showed that the explanatory variables that influence poverty status in South Sulawesi Province were the percentage of the population not working and the percentage of households using shared toilets. Keywords: logistic regression, bisquare kernel, GWLR, poverty.
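The bisquare kernel named in this abstract is commonly defined as w = (1 - (d/h)^2)^2 when the distance d is below the bandwidth h, and 0 otherwise. The Python sketch below uses that standard form. The abstract does not give the bandwidth, the distance metric, or the coordinates, so the Euclidean distance, the bandwidth value, and the locations here are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def bisquare_weight(distance, bandwidth):
    """Bisquare kernel: weight decays smoothly to zero at the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2

def local_weights(focal, locations, bandwidth):
    """Weight of every location relative to a focal point (Euclidean distance)."""
    return [
        bisquare_weight(math.dist(focal, location), bandwidth)
        for location in locations
    ]

# Made-up planar coordinates; the first location is the focal point itself
sites = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (30.0, 40.0)]
weights = local_weights(sites[0], sites, bandwidth=10.0)
print(weights)
```

In a geographically weighted fit, each location's contribution to the local log-likelihood would be multiplied by its weight, so observations beyond the bandwidth drop out of that local model entirely.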


Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 596
Author(s):  
Antonio Calcagnì ◽  
Livio Finos ◽  
Gianmarco Altoé ◽  
Massimiliano Pastore

In this article, we provide initial findings regarding the problem of solving likelihood equations by means of a maximum entropy (ME) approach. Unlike standard procedures that require equating the score function of the maximum likelihood problem to zero, we propose an alternative strategy where the score is instead used as an external informative constraint to the maximization of the concave Shannon entropy function. The problem involves the reparameterization of the score parameters as expected values of discrete probability distributions where probabilities need to be estimated. This leads to a simpler situation where parameters are searched in a smaller (hyper) simplex space. We assessed our proposal by means of empirical case studies and a simulation study, the latter involving the most critical case of logistic regression under data separation. The results suggested that the maximum entropy reformulation of the score problem solves the likelihood equation problem. Similarly, when maximum likelihood estimation is difficult, as in the case of logistic regression under separation, the maximum entropy proposal achieved results (numerically) comparable to those obtained by Firth's bias-corrected approach. Overall, these first findings reveal that a maximum entropy solution can be considered as an alternative technique to solve the likelihood equation.


2013 ◽  
Vol 59 (3) ◽  
pp. 279-293 ◽  
Author(s):  
M. Corrie Schoeman ◽  
F. P. D. (Woody) Cotterill ◽  
Peter J. Taylor ◽  
Ara Monadjem

We tested the prediction that at coarse spatial scales, variables associated with climate, energy, and productivity hypotheses should be better predictor(s) of bat species richness than those associated with environmental heterogeneity. Distribution ranges of 64 bat species were estimated with niche-based models informed by 3629 verified museum specimens. The influence of environmental correlates on bat richness was assessed using ordinary least squares regression (OLS), simultaneous autoregressive models (SAR), conditional autoregressive models (CAR), spatial eigenvector-based filtering models (SEVM), and Classification and Regression Trees (CART). To test the assumption of stationarity, Geographically Weighted Regression (GWR) was used. Bat species richness was highest in the eastern parts of southern Africa, particularly in central Zimbabwe and along the western border of Mozambique. We found support for the predictions of both the habitat heterogeneity and climate/productivity/energy hypotheses, and as we expected, support varied among bat families and model selection. Richness patterns and predictors of Miniopteridae and Pteropodidae clearly differed from those of other bat families. Altitude range was the only independent variable that was significant in all models and it was most often the best predictor of bat richness. Standard coefficients of SAR and CAR models were similar to those of OLS models, while those of SEVM models differed. Although GWR indicated that the assumption of stationarity was violated, the CART analysis corroborated the findings of the curve-fitting models. Our results identify where additional data on current species ranges, and future conservation action and ecological work are needed.


2004 ◽  
Vol 32 (1) ◽  
pp. 31-44 ◽  
Author(s):  
Beverly L. Stiles ◽  
Howard B. Kaplan

Theoretically informed models are estimated that specify the direction of the relationship between social comparisons and negative self-feelings. The data are from three waves of an ongoing longitudinal study of adaptations to stress. Subjects are individuals who were tested in their middle teens (Time 3), mid-twenties (Time 4), and mid-thirties (Time 5). The models were estimated using both logistic regression and ordinary least squares regression. In general, the results suggest that negative self-feelings are an antecedent of social comparison processes, as negative self-feelings are significantly related to all five measures of social comparison. Findings suggest that negative self-feelings are sometimes a consequence of social comparison processes, as negative self-feelings are significantly related to three of the five measures of social comparison.

