Robust weighted kernel logistic regression in imbalanced and rare events data

2011 ◽  
Vol 55 (1) ◽  
pp. 168-183 ◽  
Author(s):  
Maher Maalouf ◽  
Theodore B. Trafalis


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Marjan Faghih ◽  
Zahra Bagheri ◽  
Dejan Stevanovic ◽  
Seyyed Mohhamad Taghi Ayatollahi ◽  
Peyman Jafari

The logistic regression (LR) model for assessing differential item functioning (DIF) depends heavily on asymptotic sampling distributions. For rare events data, however, maximum likelihood estimates may be biased and the asymptotic distributions may not be reliable. This study compares regular maximum likelihood (ML) estimation with two bias correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare events data. The power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). Under severe imbalance (τ = 0.069), the power of PML and ML was lower than that of WLR by approximately 30% and 24%, respectively, for DIF = 0.4, and by approximately 27% and 23%, respectively, for DIF = 0.8. The study shows that WLR outperforms both ML and PML estimation when logistic regression is used to evaluate DIF for imbalanced or rare events data.


2001 ◽  
Vol 9 (2) ◽  
pp. 137-163 ◽  
Author(s):  
Gary King ◽  
Langche Zeng

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
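As a hedged illustration of the kind of correction the abstract above refers to, the sketch below shows two standard adjustments for a logit model fitted to a sample that over-represents events: a prior correction of the intercept, and per-observation weights. It assumes the population event fraction (called tau here) is known and that y_bar is the event fraction in the sample. The function names are hypothetical, and this is not the authors' software.

```python
import math


def logistic(z):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def _check_fraction(name, value):
    if not 0.0 < value < 1.0:
        raise ValueError(f"{name} must lie strictly between 0 and 1")


def prior_corrected_intercept(b0_hat, tau, y_bar):
    """Shift an intercept estimated on an event-enriched sample.

    b0_hat: intercept estimated from the sample
    tau:    assumed population fraction of events
    y_bar:  fraction of events in the sample
    """
    _check_fraction("tau", tau)
    _check_fraction("y_bar", y_bar)
    return b0_hat - math.log(((1.0 - tau) / tau) * (y_bar / (1.0 - y_bar)))


def case_control_weights(y, tau):
    """Weights that make the weighted sample mimic the population.

    Events get tau / y_bar, nonevents get (1 - tau) / (1 - y_bar).
    """
    _check_fraction("tau", tau)
    y_bar = sum(y) / len(y)
    _check_fraction("y_bar", y_bar)
    w1 = tau / y_bar
    w0 = (1.0 - tau) / (1.0 - y_bar)
    return [w1 if yi == 1 else w0 for yi in y]


def weighted_event_rate(y, w):
    """Weighted fraction of events; equals tau under the weights above."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

For an intercept-only model, both adjustments recover the assumed population rate: a sample with equal numbers of events and nonevents gives an uncorrected intercept of 0, and either correction brings the implied event probability back to tau.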


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ahmed A. M. Hamed ◽  
Renfa Li ◽  
Zhang Xiaoming ◽  
Cheng Xu

Because of the widening semantic gap in video content, computational tools that classify videos into different genres are needed to narrow it. Classifying videos accurately demands a good representation of the video data and an efficient, effective model to carry out the classification task. Kernel Logistic Regression (KLR), the kernel version of logistic regression (LR), has proven efficient as a classifier; it naturally provides probabilities and extends to multiclass classification problems. In this paper, a Weighted Kernel Logistic Regression (WKLR) algorithm is implemented for video genre classification; it gives accurate results and runs faster.
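To make the idea of kernel logistic regression concrete, here is a minimal sketch for the binary case. It assumes an RBF kernel, an L2 penalty in the kernel-induced norm, and a plain fixed-step functional gradient update; the optional weights argument is one simple way to emphasize some observations. All names and default values are hypothetical, and this is not the WKLR algorithm of the papers listed here.

```python
import math


def rbf(a, b, gamma):
    """RBF kernel between two feature vectors given as sequences."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_klr(X, y, gamma=0.5, lam=0.01, n_iter=200, weights=None):
    """Fit dual coefficients alpha so that f(x) = sum_j alpha_j k(x, x_j).

    Minimizes the (weighted) mean logistic loss plus
    (lam / 2) * alpha' K alpha with a fixed-step functional gradient update.
    """
    n = len(X)
    w = [1.0] * n if weights is None else list(weights)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    # Conservative step size: the curvature in this parameterization is
    # bounded by 0.25 * max(w) + lam, because the RBF kernel is at most 1.
    step = 1.0 / (0.25 * max(w) + lam)
    alpha = [0.0] * n
    for _ in range(n_iter):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        alpha = [
            alpha[i] - step * (w[i] * (sigmoid(f[i]) - y[i]) / n + lam * alpha[i])
            for i in range(n)
        ]
    return alpha


def predict_proba(X_train, alpha, x_new, gamma=0.5):
    """Estimated probability that x_new belongs to class 1."""
    return sigmoid(sum(a * rbf(x_new, xt, gamma) for a, xt in zip(alpha, X_train)))


# Toy usage: two well-separated one-dimensional clusters.
X_toy = [[-2.0], [-1.5], [-2.5], [2.0], [1.5], [2.5]]
y_toy = [0, 0, 0, 1, 1, 1]
alpha_toy = fit_klr(X_toy, y_toy)
p_right = predict_proba(X_toy, alpha_toy, [2.0])   # probability near the class-1 cluster
p_left = predict_proba(X_toy, alpha_toy, [-2.0])   # probability near the class-0 cluster
```

The output is a probability rather than a hard label, which is the property the abstract highlights. A multiclass version would need a softmax or one-vs-rest extension that this sketch does not cover.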


2021 ◽  
Vol 17 (3) ◽  
pp. 50-62
Author(s):  
Ayodeji Samuel Makinde ◽  
Abayomi O. Agbeyangi ◽  
Wilson Nwankwo

Mobile number portability (MNP) across telecommunication networks entails the movement of a customer from one mobile service provider to another, often in search of better service delivery or out of personal choice. Churn prediction techniques seek to identify customers who are likely to churn, which allows customer retention campaigns, and their cost, to be improved through more efficient service to customers. In this paper, an MNP prediction model using integrated kernel logistic regression (integrated-KLR) is proposed. Integrated-KLR combines kernel logistic regression with expectation-maximization clustering, which helps to detect customers who are likely to defect before they do so. The proposed approach was evaluated against five other widely used algorithms: SOM, MLP, Naïve Bayes, RF, and J48. The proposed integrated-KLR (iKLR) outperforms the other algorithms, with ROC and PRC of 0.856 and 0.650, respectively.

