Robust weighted kernel logistic regression in imbalanced and rare events data

The logistic regression (LR) model for assessing differential item functioning (DIF) depends heavily on asymptotic sampling distributions. For rare events data, however, maximum likelihood estimates may be biased and the asymptotic distributions may not be reliable. This study compares regular maximum likelihood (ML) estimation with two bias-correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare events data. The power and Type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). Under severe imbalance (τ = 0.069), power relative to WLR was lower by approximately 30% for PML and 24% for ML at DIF = 0.4, and by approximately 27% for PML and 23% for ML at DIF = 0.8. The study found that WLR outperforms both ML and PML estimation when logistic regression is used to evaluate DIF for imbalanced or rare events data.
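For orientation, Firth's penalized maximum likelihood is usually written as the ordinary log-likelihood plus a penalty based on the Fisher information. The abstract gives no formulas, so the notation below is an assumption, not taken from the study: β is the coefficient vector, X the design matrix, and π_i the fitted probability for observation i.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\,\log\bigl|\,I(\beta)\,\bigr|,
\qquad
I(\beta) \;=\; X^{\top} W X,
\quad
W \;=\; \operatorname{diag}\bigl\{\pi_i\,(1-\pi_i)\bigr\}
```

The penalty term shrinks the estimates toward zero, which is why the method is used when events are sparse.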

Logistic Regression in Rare Events Data

Political Analysis ◽ 10.1093/oxfordjournals.pan.a004868 ◽ 2001 ◽ Vol 9 (2) ◽ pp. 137-163 ◽ Cited By ~ 1740

Author(s): Gary King ◽ Langche Zeng

Keyword(s): Logistic Regression ◽ Data Collection ◽ Rare Events ◽ Explanatory Variables ◽ Relative Risks ◽ Efficient Sampling ◽ Dependent Variables ◽ Data Collections ◽ Events Data ◽ Collection Strategies

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
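One well-known correction for samples that keep all events and only a fraction of nonevents is the "prior correction" of the logit intercept. The sketch below illustrates that idea under stated assumptions. The function names are hypothetical, `tau` (the population fraction of events) is assumed to be known from outside the sample, and this is not the software mentioned in the abstract.

```python
import math


def prior_correct_intercept(b0_hat, tau, ybar):
    """Adjust a logit intercept fitted on an event-enriched sample.

    b0_hat : intercept estimated on the sampled data
    tau    : assumed population fraction of events (supplied externally)
    ybar   : fraction of events in the analysed sample
    """
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    # Subtract the log of the ratio between sample odds and population odds.
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def event_probability(b0, coefs, x):
    """Logistic probability for covariates x given intercept and slopes."""
    z = b0 + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical usage: a balanced sample (ybar = 0.5) drawn from a
# population in which events make up 1% of cases (tau = 0.01).
b0 = prior_correct_intercept(0.0, tau=0.01, ybar=0.5)
baseline = event_probability(b0, [], [])
```

Only the intercept is shifted; the slope coefficients are left as estimated. When `tau` equals `ybar`, the adjustment is zero.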

Estimating group fixed effects in panel data with a binary dependent variable: How the LPM outperforms logistic regression in rare events data

Social Science Research ◽ 10.1016/j.ssresearch.2020.102486 ◽ 2021 ◽ Vol 93 ◽ pp. 102486

Author(s): Joan C. Timoneda

Keyword(s): Logistic Regression ◽ Panel Data ◽ Fixed Effects ◽ Rare Events ◽ Binary Dependent Variable ◽ Events Data

Video Genre Classification Using Weighted Kernel Logistic Regression

Advances in Multimedia ◽ 10.1155/2013/653687 ◽ 2013 ◽ Vol 2013 ◽ pp. 1-6 ◽ Cited By ~ 7

Author(s): Ahmed A. M. Hamed ◽ Renfa Li ◽ Zhang Xiaoming ◽ Cheng Xu

Keyword(s): Logistic Regression ◽ Video Data ◽ Semantic Gap ◽ Good Representation ◽ Classification Problems ◽ Computational Tools ◽ Kernel Logistic Regression ◽ Genre Classification ◽ Weighted Kernel ◽ Multiclass Classification Problems

Because of the widening semantic gap in video content, computational tools that classify videos into genres are much needed to narrow it. Accurate classification demands a good representation of the video data and an efficient, effective classification model. Kernel Logistic Regression (KLR), the kernel version of logistic regression (LR), has proven efficient as a classifier; it naturally provides probabilities and extends to multiclass classification problems. In this paper, the Weighted Kernel Logistic Regression (WKLR) algorithm is implemented for video genre classification, and it gives accurate results with faster computation.
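To make the idea of a kernelized, weighted logistic model concrete, here is a minimal binary sketch in plain Python. It is an illustration under assumptions, not the paper's WKLR algorithm: the RBF kernel, the class-balancing weights, the ridge penalty `lam`, and the plain gradient-descent solver are all choices made for this example, and every function name is hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def balanced_weights(y):
    """Assumed weighting: each class contributes equally to the loss."""
    n, n1 = len(y), sum(y)
    return [n / (2.0 * n1) if yi == 1 else n / (2.0 * (n - n1)) for yi in y]


def fit_wklr(X, y, weights, gamma=1.0, lam=0.01, lr=0.1, n_iter=500):
    """Gradient descent on the weighted, penalized logistic loss.

    Model: f(x) = sum_j alpha_j * k(x_j, x), labels y in {0, 1}.
    Gradient w.r.t. alpha: K @ (w * (p - y) + lam * alpha).
    """
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(n_iter):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        r = [weights[i] * (sigmoid(f[i]) - y[i]) + lam * alpha[i]
             for i in range(n)]
        grad = [sum(K[i][j] * r[j] for j in range(n)) for i in range(n)]
        alpha = [alpha[i] - lr * grad[i] for i in range(n)]
    return alpha


def predict_proba(X_train, alpha, x_new, gamma=1.0):
    """Predicted probability of class 1 for a new point."""
    score = sum(a * rbf_kernel(xt, x_new, gamma)
                for xt, a in zip(X_train, alpha))
    return sigmoid(score)
```

A multiclass version, as mentioned in the abstract, could be built from this binary form with a one-vs-rest scheme. That extension is likewise an assumption and not a description of the paper's method.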

Geographically weighted kernel logistic regression for small area proportion estimation

Journal of the Korean Data and Information Science Society ◽ 10.7465/jkdi.2016.27.2.531 ◽ 2016 ◽ Vol 27 (2) ◽ pp. 531-538

Author(s): Jooyong Shim ◽ Changha Hwang

Keyword(s): Logistic Regression ◽ Small Area ◽ Kernel Logistic Regression ◽ Proportion Estimation ◽ Weighted Kernel

Predicting Mobile Portability Across Telecommunication Networks Using the Integrated-KLR

International Journal of Intelligent Information Technologies ◽ 10.4018/ijiit.2021070104 ◽ 2021 ◽ Vol 17 (3) ◽ pp. 50-62

Author(s): Ayodeji Samuel Makinde ◽ Abayomi O. Agbeyangi ◽ Wilson Nwankwo

Keyword(s): Logistic Regression ◽ Telecommunication Networks ◽ Mobile Service ◽ Personal Choice ◽ Kernel Logistic Regression ◽ Service Efficiency ◽ Number Portability ◽ Prediction Techniques ◽ The Cost ◽ Potential Customers

Mobile number portability (MNP) across telecommunication networks entails the movement of a customer from one mobile service provider to another, often in search of better service delivery or as a matter of personal choice. Churn prediction techniques seek to identify customers who are likely to churn, supporting better customer retention campaigns, and lower costs for them, through improved service efficiency. This paper proposes an MNP prediction model based on integrated kernel logistic regression (integrated-KLR, or iKLR). The integrated-KLR combines kernel logistic regression with expectation-maximization clustering, which helps to detect customers at risk of defecting before they leave. The proposed approach was evaluated against five other widely used algorithms: SOM, MLP, Naïve Bayes, RF, and J48. The proposed iKLR outperforms the other algorithms, with ROC and PRC values of 0.856 and 0.650, respectively.