Logistic Regression in Rare Events Data

2001 ◽  
Vol 9 (2) ◽  
pp. 137-163 ◽  
Author(s):  
Gary King ◽  
Langche Zeng

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
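The abstract above does not spell out a formula, but one standard adjustment for a logit model fitted on a sample that keeps all events and only a fraction of nonevents is prior correction of the intercept: subtract ln[((1 − τ)/τ)(ȳ/(1 − ȳ))] from the estimated intercept, where τ is the population fraction of events and ȳ is the sample fraction. The sketch below is illustrative only; the function names and the numbers are assumptions, not the authors' software.

```python
import math


def prior_corrected_intercept(beta0_hat: float, tau: float, ybar: float) -> float:
    """Shift a logit intercept fitted on an events-enriched sample to the population scale.

    tau  -- assumed fraction of events in the population
    ybar -- fraction of events in the sample actually analysed
    """
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    return beta0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def event_probability(intercept: float, slopes, x) -> float:
    """Logistic probability for one observation with covariates x."""
    linear = intercept + sum(b * v for b, v in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical numbers: a balanced sample (ybar = 0.5) drawn from a
# population in which 1% of observations are events (tau = 0.01).
raw_intercept = 0.0
corrected = prior_corrected_intercept(raw_intercept, tau=0.01, ybar=0.5)
print(round(event_probability(raw_intercept, [], []), 4))  # 0.5
print(round(event_probability(corrected, [], []), 4))      # 0.01
```

In this toy case the uncorrected intercept implies a 50% event probability simply because the sample was balanced by design; the corrected intercept recovers the assumed 1% population rate. Slope coefficients are left unchanged by this adjustment.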

2001 ◽  
Vol 55 (3) ◽  
pp. 693-715 ◽  
Author(s):  
Gary King ◽  
Langche Zeng

Some of the most important phenomena in international conflict are coded as “rare events”: binary dependent variables with dozens to thousands of times fewer events, such as wars and coups, than “nonevents.” Unfortunately, rare events data are difficult to explain and predict, a problem stemming from at least two sources. First, and most important, the data-collection strategies used in international conflict studies are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (wars, for example) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99 percent of their (nonfixed) data-collection costs or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly used statistical procedures, can underestimate the probability of rare events. We introduce some corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. We also provide easy-to-use methods and software that link these two results, enabling both types of corrections to work simultaneously.
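The sampling design described above (all available events plus a small fraction of nonevents) can be sketched as follows. This is a minimal illustration under assumed inputs: the function name, the 5% nonevent fraction, and the toy data are hypothetical, and a real analysis would still need a correction for the selection on the outcome.

```python
import random


def sample_events_and_nonevents(outcomes, nonevent_fraction, seed=0):
    """Return sorted indices of every event plus a random share of nonevents."""
    if not 0.0 < nonevent_fraction <= 1.0:
        raise ValueError("nonevent_fraction must be in (0, 1]")
    events = [i for i, y in enumerate(outcomes) if y == 1]
    nonevents = [i for i, y in enumerate(outcomes) if y == 0]
    keep = int(round(nonevent_fraction * len(nonevents)))
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return sorted(events + rng.sample(nonevents, keep))


# Hypothetical data: 10 dyads at war followed by 1,000 dyads at peace.
outcomes = [1] * 10 + [0] * 1000
kept = sample_events_and_nonevents(outcomes, nonevent_fraction=0.05)

# Share of observations for which explanatory variables no longer need collecting.
savings = 1 - len(kept) / len(outcomes)
print(len(kept), f"{savings:.1%}")
```

Every event is retained, so no information about the rare outcome is discarded; only the abundant nonevents are thinned.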


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Marjan Faghih ◽  
Zahra Bagheri ◽  
Dejan Stevanovic ◽  
Seyyed Mohhamad Taghi Ayatollahi ◽  
Peyman Jafari

The logistic regression (LR) model for assessing differential item functioning (DIF) depends heavily on asymptotic sampling distributions. However, for rare events data, maximum likelihood estimates may be biased and the asymptotic distributions may not be reliable. In this study, the performance of regular maximum likelihood (ML) estimation is compared with two bias correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare events data. The power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). Under severe imbalance (τ = 0.069), the power of PML and ML was lower than that of WLR by approximately 30% and 24%, respectively, for DIF = 0.4, and by 27% and 23% for DIF = 0.8. The present study revealed that WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare events data.
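For readers unfamiliar with the PML method named above, Firth's approach is usually written as maximizing a penalized log-likelihood. The abstract gives no formulas, so the notation below (β for the coefficients, X for the design matrix, π_i for the fitted probabilities) is an assumption introduced here for illustration.

```latex
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \left| I(\beta) \right|,
\qquad
I(\beta) = X^{\top} W X,
\qquad
W = \operatorname{diag}\bigl( \pi_i (1 - \pi_i) \bigr)
```

Here ℓ(β) is the ordinary logistic log-likelihood and I(β) is the Fisher information matrix. The penalty term shrinks estimates toward zero, which reduces the small-sample bias of ML that the abstract refers to.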


1997 ◽  
Vol 28 (3) ◽  
pp. 288-296 ◽  
Author(s):  
Jack S. Damico ◽  
Sandra K. Damico

One aspect of therapeutic discourse that has not been fully investigated in language intervention is the way that interactional dominance is established and maintained within the therapeutic encounter. Using various data collection strategies, therapeutic discourse from 10 language intervention sessions was collected and analyzed. By employing an analytic device known as the "dominant interpretive framework," the interactional styles and strategies of two speech-language pathologists were investigated. Data revealed several systematic patterns of interaction that constrained the ranges of interaction between the clinician and the client. Several implications regarding client empowerment, mediation, and assimilation into the school culture are discussed.


2021 ◽  
Vol 30 (20) ◽  
pp. 1190-1197
Author(s):  
Pam Hodge ◽  
Nora Cooper ◽  
Brian P Richardson

Aims: To offer child health student nurses a broader learning experience in practice with an autonomous choice of a volunteer placement area. To reflect the changing nature of health care and the move of care closer to home in the placement experience. To evaluate participants' experiences. Design: This study used descriptive and interpretative methods of qualitative data collection. This successive cross-sectional data collection ran from 2017 to 2020. All data were thematically analysed using Braun and Clarke's model. Methods: Data collection strategies included two focus groups (n=14) and written reflections (n=19). Results: Students identified their increased confidence, development as a professional, wider learning and community engagement. They also appreciated the relief from formal assessment of practice and the chance to focus on the experience. Conclusion: Students positively evaluated this experience, reporting a wider understanding of health and wellbeing in the community. Consideration needs to be given to risk assessments in the areas where students undertake the placements and to embedding the experience into the overall curriculum.


2003 ◽  
Vol 9 (3) ◽  
pp. 125-129 ◽  
Author(s):  
Pamela Whitten ◽  
Inez Adams

We studied two rural telemedicine projects in the state of Michigan: one that enjoyed success and steady growth in activity, and one that experienced frustration and a lack of clinical utilization. Multiple data collection strategies were employed during study periods, which lasted approximately one year. Both projects enjoyed a grassroots approach and had dedicated project coordinators. However, the more successful project benefited from resources and expertise not available to the less successful project. In addition, the more successful project possessed a more formalized organizational structure for the telemedicine application. A comparison of the two projects leads to a simple conclusion. Telemedicine programmes are positioned within larger health organizations and do not operate in a vacuum. It is crucial that the organization in which it is intended to launch telemedicine is examined carefully first. Each organization operates within a larger environment, which is often constrained by fiscal, geographical and personnel factors. All these will affect the introduction of telemedicine.

