Logistic Regression in Rare Events Data

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
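
The sampling design and correction described above can be sketched in a few lines. This is a minimal illustration, not the authors' software: it keeps every event and a random fraction of nonevents, then applies the prior correction to the intercept of a logit fitted on that subsample, assuming the population event fraction (`tau`) is known. The function names and the `(y, x)` row layout are hypothetical.

```python
import math
import random


def sample_all_events(rows, frac_nonevents, seed=0):
    """Keep every event (y == 1) and a random fraction of nonevents.

    `rows` is a list of (y, x) tuples; this layout is an assumption
    made for the example.
    """
    rng = random.Random(seed)
    return [r for r in rows if r[0] == 1 or rng.random() < frac_nonevents]


def prior_corrected_intercept(b0_hat, tau, ybar):
    """Correct a logit intercept estimated on an event-enriched sample.

    tau  : fraction of events in the population (assumed known)
    ybar : fraction of events in the selected sample
    Slope coefficients need no adjustment under this design.
    """
    if not (0 < tau < 1 and 0 < ybar < 1):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))
```

For an intercept-only model fitted on a balanced subsample (`ybar = 0.5`, so `b0_hat = 0`), the corrected intercept maps back to the population event fraction, which is a quick sanity check on the formula.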

Explaining Rare Events in International Relations

International Organization ◽

10.1162/00208180152507597 ◽

2001 ◽

Vol 55 (3) ◽

pp. 693-715 ◽

Cited By ~ 398

Author(s):

Gary King ◽

Langche Zeng

Keyword(s):

Data Collection ◽

International Conflict ◽

Rare Events ◽

Explanatory Variables ◽

Relative Risks ◽

Efficient Sampling ◽

Dependent Variables ◽

Data Collections ◽

Events Data ◽

Collection Strategies

Some of the most important phenomena in international conflict are coded as “rare events”: binary dependent variables with dozens to thousands of times fewer events, such as wars and coups, than “nonevents.” Unfortunately, rare events data are difficult to explain and predict, a problem stemming from at least two sources. First, and most important, the data-collection strategies used in international conflict studies are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (wars, for example) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99 percent of their (nonfixed) data-collection costs or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly used statistical procedures, can underestimate the probability of rare events. We introduce some corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. We also provide easy-to-use methods and software that link these two results, enabling both types of corrections to work simultaneously.

Weighted logistic regression for large-scale imbalanced and rare events data

Knowledge-Based Systems ◽

10.1016/j.knosys.2014.01.012 ◽

2014 ◽

Vol 59 ◽

pp. 142-148 ◽

Cited By ~ 24

Author(s):

Maher Maalouf ◽

Mohammad Siddiqi

Keyword(s):

Logistic Regression ◽

Large Scale ◽

Rare Events ◽

Weighted Logistic Regression ◽

Events Data

Logistic Regression in Rare Events Data

Journal of Statistical Software ◽

10.18637/jss.v008.i02 ◽

2003 ◽

Vol 8 (2) ◽

Cited By ~ 25

Author(s):

Gary King ◽

Langche Zeng

Keyword(s):

Logistic Regression ◽

Rare Events ◽

Events Data

A Comparative Study of the Bias Correction Methods for Differential Item Functioning Analysis in Logistic Regression with Rare Events Data

BioMed Research International ◽

10.1155/2020/1632350 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Marjan Faghih ◽

Zahra Bagheri ◽

Dejan Stevanovic ◽

Seyyed Mohammad Taghi Ayatollahi ◽

Peyman Jafari

Keyword(s):

Logistic Regression ◽

Maximum Likelihood ◽

Sample Size ◽

Differential Item Functioning ◽

Bias Correction ◽

Rare Events ◽

Estimation Methods ◽

Type I ◽

Item Functioning ◽

Events Data

The logistic regression (LR) model for assessing differential item functioning (DIF) depends heavily on asymptotic sampling distributions. For rare events data, however, maximum likelihood estimates may be biased and the asymptotic distributions may not be reliable. In this study, the performance of regular maximum likelihood (ML) estimation is compared with two bias correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare events data. The power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). Under severe imbalance (τ = 0.069), power relative to WLR was lower by approximately 30% for PML and 24% for ML at DIF = 0.4, and by approximately 27% and 23%, respectively, at DIF = 0.8. The study found that WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare events data.
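
As a rough illustration of what weighting means in this setting, the sketch below uses one common scheme from the rare-events literature: each observation is weighted so that the weighted sample reproduces an assumed population event fraction. The abstract does not state the exact weights used in the study, so this is an assumption; `pop_frac` is a hypothetical, externally known event fraction and is not the same quantity as the imbalance degree τ reported above.

```python
import math


def rare_event_weights(y, pop_frac):
    """Per-observation weights that rebalance a sample to `pop_frac`.

    Events get pop_frac / ybar and nonevents (1 - pop_frac) / (1 - ybar),
    where ybar is the event fraction in the sample.
    """
    ybar = sum(y) / len(y)
    if not (0 < ybar < 1 and 0 < pop_frac < 1):
        raise ValueError("event fractions must lie strictly between 0 and 1")
    w_event = pop_frac / ybar
    w_nonevent = (1 - pop_frac) / (1 - ybar)
    return [w_event if yi == 1 else w_nonevent for yi in y]


def weighted_log_likelihood(y, p, w):
    """Weighted Bernoulli log-likelihood for fitted probabilities `p`."""
    return sum(
        wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
        for yi, pi, wi in zip(y, p, w)
    )
```

A weighted fit maximizes `weighted_log_likelihood` over the model coefficients instead of the unweighted log-likelihood. When the sample and population event fractions coincide, all weights equal 1 and the two objectives are identical.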

Estimating group fixed effects in panel data with a binary dependent variable: How the LPM outperforms logistic regression in rare events data

Social Science Research ◽

10.1016/j.ssresearch.2020.102486 ◽

2021 ◽

Vol 93 ◽

pp. 102486

Author(s):

Joan C. Timoneda

Keyword(s):

Logistic Regression ◽

Panel Data ◽

Fixed Effects ◽

Rare Events ◽

Binary Dependent Variable ◽

Events Data

Robust weighted kernel logistic regression in imbalanced and rare events data

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2010.06.014 ◽

2011 ◽

Vol 55 (1) ◽

pp. 168-183 ◽

Cited By ~ 63

Author(s):

Maher Maalouf ◽

Theodore B. Trafalis

Keyword(s):

Logistic Regression ◽

Rare Events ◽

Kernel Logistic Regression ◽

Weighted Kernel ◽

Events Data

Distributed and scalable platform architecture for smart cities complex events data collection: Covid19 pandemic use case

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-020-02852-9 ◽

2021 ◽

Author(s):

Wadii Basmi ◽

Azedine Boulmakoul ◽

Lamia Karim ◽

Ahmed Lbath

Keyword(s):

Data Collection ◽

Smart Cities ◽

Use Case ◽

Platform Architecture ◽

Complex Events ◽

Events Data

The Establishment of a Dominant Interpretive Framework in Language Intervention

Language Speech and Hearing Services in Schools ◽

10.1044/0161-1461.2803.288 ◽

1997 ◽

Vol 28 (3) ◽

pp. 288-296 ◽

Cited By ~ 11

Author(s):

Jack S. Damico ◽

Sandra K. Damico

Keyword(s):

School Culture ◽

Data Collection ◽

Language Intervention ◽

Speech Language Pathologists ◽

Therapeutic Discourse ◽

Therapeutic Encounter ◽

Collection Strategies ◽

Patterns Of Interaction ◽

Interpretive Framework ◽

The Way

One aspect of therapeutic discourse that has not been fully investigated in language intervention is the way that interactional dominance is established and maintained within the therapeutic encounter. Using various data collection strategies, therapeutic discourse from 10 language intervention sessions was collected and analyzed. By employing an analytic device known as the "dominant interpretive framework," the interactional styles and strategies of two speech-language pathologists were investigated. Data revealed several systematic patterns of interaction that constrained the ranges of interaction between the clinician and the client. Several implications regarding client empowerment, mediation, and assimilation into the school culture are discussed.

Promoting community engagement in a pre-registration nursing programme: a qualitative study of student experiences

British Journal of Nursing ◽

10.12968/bjon.2021.30.20.1190 ◽

2021 ◽

Vol 30 (20) ◽

pp. 1190-1197

Author(s):

Pam Hodge ◽

Nora Cooper ◽

Brian P Richardson

Keyword(s):

Data Collection ◽

Community Engagement ◽

Qualitative Data ◽

Learning Experience ◽

Student Nurses ◽

Risk Assessments ◽

Student Experiences ◽

Cross Sectional ◽

Health And Wellbeing ◽

Collection Strategies

Aims: To offer child health student nurses a broader learning experience in practice with an autonomous choice of a volunteer placement area. To reflect the changing nature of health care and the move of care closer to home in the placement experience. To evaluate participants' experiences. Design: This study used descriptive and interpretative methods of qualitative data collection. This successive cross-sectional data collection ran from 2017 to 2020. All data were thematically analysed using Braun and Clarke's model. Methods: Data collection strategies included two focus groups (n=14) and written reflections (n=19). Results: Students identified their increased confidence, development as a professional, wider learning and community engagement. They also appreciated the relief from formal assessment of practice and the chance to focus on the experience. Conclusion: Students positively evaluated this experience, reporting a wider understanding of health and wellbeing in the community. Consideration needs to be given to risk assessments in the areas where students undertake placements and to embedding the experience into the overall curriculum.

Success and failure: a case study of two rural telemedicine projects

Journal of Telemedicine and Telecare ◽

10.1258/135763303767149906 ◽

2003 ◽

Vol 9 (3) ◽

pp. 125-129 ◽

Cited By ~ 26

Author(s):

Pamela Whitten ◽

Inez Adams

Keyword(s):

Data Collection ◽

Organizational Structure ◽

Health Organizations ◽

Telemedicine Application ◽

Clinical Utilization ◽

Steady Growth ◽

Multiple Data ◽

One Year ◽

Collection Strategies

We studied two rural telemedicine projects in the state of Michigan: one that enjoyed success and steady growth in activity, and one that experienced frustration and a lack of clinical utilization. Multiple data collection strategies were employed during study periods, which lasted approximately one year. Both projects enjoyed a grassroots approach and had dedicated project coordinators. However, the more successful project benefited from resources and expertise not available to the less successful project. In addition, the more successful project possessed a more formalized organizational structure for the telemedicine application. A comparison of the two projects leads to a simple conclusion. Telemedicine programmes are positioned within larger health organizations and do not operate in a vacuum. It is crucial to examine carefully the organization in which telemedicine is to be launched before doing so. Each organization operates within a larger environment, which is often constrained by fiscal, geographical and personnel factors. All of these will affect the introduction of telemedicine.
