Explaining Rare Events in International Relations

Some of the most important phenomena in international conflict are coded as “rare events”: binary dependent variables with dozens to thousands of times fewer events, such as wars and coups, than “nonevents.” Unfortunately, rare events data are difficult to explain and predict, a problem stemming from at least two sources. First, and most important, the data-collection strategies used in international conflict studies are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (wars, for example) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99 percent of their (nonfixed) data-collection costs or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly used statistical procedures, can underestimate the probability of rare events. We introduce some corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. We also provide easy-to-use methods and software that link these two results, enabling both types of corrections to work simultaneously.

Logistic Regression in Rare Events Data

Political Analysis ◽

10.1093/oxfordjournals.pan.a004868 ◽

2001 ◽

Vol 9 (2) ◽

pp. 137-163 ◽

Cited By ~ 1740

Author(s):

Gary King ◽

Langche Zeng

Keyword(s):

Logistic Regression ◽

Data Collection ◽

Rare Events ◽

Explanatory Variables ◽

Relative Risks ◽

Efficient Sampling ◽

Dependent Variables ◽

Data Collections ◽

Events Data ◽

Collection Strategies

We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
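The sampling design described above (keep all events, sample a small fraction of nonevents) inflates the share of events in the sample, so a logit intercept estimated on that sample is too high. One well-known remedy is the prior correction of the intercept used for case-control designs. The abstract does not give the formula, so the following is only an illustrative sketch. The names `tau` (population fraction of events), `ybar` (sample fraction of events), and both function names are assumptions made for this example.

```python
import math


def prior_corrected_intercept(b0_hat: float, tau: float, ybar: float) -> float:
    """Adjust a logit intercept estimated on an event-enriched sample.

    b0_hat: intercept estimated from the sampled data (hypothetical input)
    tau:    assumed fraction of events in the population
    ybar:   fraction of events in the sample actually analyzed
    """
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    # Subtract the log of the ratio between sample odds and population odds.
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def logistic(x: float) -> float:
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Intercept-only illustration: events are 1% of the population, but the
# sample was built with equal numbers of events and nonevents, so the
# uncorrected intercept is logit(0.5) = 0.
b0_corrected = prior_corrected_intercept(b0_hat=0.0, tau=0.01, ybar=0.5)
p_corrected = logistic(b0_corrected)
```

In this intercept-only case the corrected probability returns to the assumed population rate of 0.01, while the uncorrected estimate would be 0.5. Slope coefficients are left unchanged by this particular correction. It relies on `tau` being known from outside the sample.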

An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design

International Organization ◽

10.1017/s0020818303573064 ◽

2003 ◽

Vol 57 (3) ◽

pp. 617-642 ◽

Cited By ~ 183

Author(s):

Gary King ◽

Will Lowe

Keyword(s):

International Relations ◽

Information Extraction ◽

Computational Linguistics ◽

Large Scale ◽

International Conflict ◽

Rare Events ◽

News Stories ◽

Conflict And Cooperation ◽

The Past ◽

Data Collections

Despite widespread recognition that aggregated summary statistics on international conflict and cooperation miss most of the complex interactions among nations, the vast majority of scholars continue to employ annual, quarterly, or (occasionally) monthly observations. Daily events data, coded from some of the huge volume of news stories produced by journalists, have not been used much for the past two decades. We offer some reason to change this practice, which we feel should lead to considerably increased use of these data. We address advances in event categorization schemes and software programs that automatically produce data by “reading” news stories without human coders. We design a method that makes it feasible, for the first time, to evaluate these programs when they are applied in areas with the particular characteristics of international conflict and cooperation data, namely event categories with highly unequal prevalences, and where rare events (such as highly conflictual actions) are of special interest. We use this rare events design to evaluate one existing program, and find it to be as good as trained human coders, but obviously far less expensive to use. For large-scale data collections, the program dominates human coding. Our new evaluative method should be of use in international relations, as well as more generally in the field of computational linguistics, for evaluating other automated information extraction tools. We believe that the data created by programs similar to the one we evaluated should see dramatically increased use in international relations research. To facilitate this process, we are releasing with this article data on 3.7 million international events, covering the entire world for the past decade.

Distributed and scalable platform architecture for smart cities complex events data collection: Covid19 pandemic use case

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-020-02852-9 ◽

2021 ◽

Author(s):

Wadii Basmi ◽

Azedine Boulmakoul ◽

Lamia Karim ◽

Ahmed Lbath

Keyword(s):

Data Collection ◽

Smart Cities ◽

Use Case ◽

Platform Architecture ◽

Complex Events ◽

Events Data

The Establishment of a Dominant Interpretive Framework in Language Intervention

Language Speech and Hearing Services in Schools ◽

10.1044/0161-1461.2803.288 ◽

1997 ◽

Vol 28 (3) ◽

pp. 288-296 ◽

Cited By ~ 11

Author(s):

Jack S. Damico ◽

Sandra K. Damico

Keyword(s):

School Culture ◽

Data Collection ◽

Language Intervention ◽

Speech Language Pathologists ◽

Therapeutic Discourse ◽

Therapeutic Encounter ◽

Collection Strategies ◽

Patterns Of Interaction ◽

Interpretive Framework ◽

The Way

One aspect of therapeutic discourse that has not been fully investigated in language intervention is the way that interactional dominance is established and maintained within the therapeutic encounter. Using various data collection strategies, therapeutic discourse from 10 language intervention sessions was collected and analyzed. By employing an analytic device known as the "dominant interpretive framework," the interactional styles and strategies of two speech-language pathologists were investigated. Data revealed several systematic patterns of interaction that constrained the ranges of interaction between the clinician and the client. Several implications regarding client empowerment, mediation, and assimilation into the school culture are discussed.

Promoting community engagement in a pre-registration nursing programme: a qualitative study of student experiences

British Journal of Nursing ◽

10.12968/bjon.2021.30.20.1190 ◽

2021 ◽

Vol 30 (20) ◽

pp. 1190-1197

Author(s):

Pam Hodge ◽

Nora Cooper ◽

Brian P Richardson

Keyword(s):

Data Collection ◽

Community Engagement ◽

Qualitative Data ◽

Learning Experience ◽

Student Nurses ◽

Risk Assessments ◽

Student Experiences ◽

Cross Sectional ◽

Health And Wellbeing ◽

Collection Strategies

Aims: To offer child health student nurses a broader learning experience in practice with an autonomous choice of a volunteer placement area. To reflect the changing nature of health care and the move of care closer to home in the placement experience. To evaluate participants' experiences. Design: This study used descriptive and interpretative methods of qualitative data collection. This successive cross-sectional data collection ran from 2017 to 2020. All data were thematically analysed using Braun and Clarke's model. Methods: Data collection strategies included two focus groups (n=14) and written reflections (n=19). Results: Students identified their increased confidence, development as a professional, wider learning and community engagement. They also appreciated the relief from formal assessment of practice and the chance to focus on the experience. Conclusion: Students positively evaluated this experience, reporting a wider understanding of health and wellbeing in the community. Consideration needs to be given to risk assessments in the areas students undertake the placements and the embedding of the experience into the overall curriculum.

Success and failure: a case study of two rural telemedicine projects

Journal of Telemedicine and Telecare ◽

10.1258/135763303767149906 ◽

2003 ◽

Vol 9 (3) ◽

pp. 125-129 ◽

Cited By ~ 26

Author(s):

Pamela Whitten ◽

Inez Adams

Keyword(s):

Data Collection ◽

Organizational Structure ◽

Health Organizations ◽

Telemedicine Application ◽

Clinical Utilization ◽

Steady Growth ◽

Multiple Data ◽

One Year ◽

Collection Strategies

We studied two rural telemedicine projects in the state of Michigan: one that enjoyed success and steady growth in activity, and one that experienced frustration and a lack of clinical utilization. Multiple data collection strategies were employed during study periods, which lasted approximately one year. Both projects enjoyed a grassroots approach and had dedicated project coordinators. However, the more successful project benefited from resources and expertise not available to the less successful project. In addition, the more successful project possessed a more formalized organizational structure for the telemedicine application. A comparison of the two projects leads to a simple conclusion. Telemedicine programmes are positioned within larger health organizations and do not operate in a vacuum. It is crucial that the organization in which it is intended to launch telemedicine is examined carefully first. Each organization operates within a larger environment, which is often constrained by fiscal, geographical and personnel factors. All these will affect the introduction of telemedicine.

ADaCS: A Tool for Analysing Data Collection Strategies

Computer Performance Engineering - Lecture Notes in Computer Science ◽

10.1007/978-3-319-66583-2_15 ◽

2017 ◽

pp. 230-245

Author(s):

John C. Mace ◽

Nipun Thekkummal ◽

Charles Morisset ◽

Aad Van Moorsel

Keyword(s):

Data Collection ◽

Collection Strategies

Same Evidences, Different Interpretations – A Comparison of the Conflict Index between the Interstate Dyadic Events Data and Militarized Interstate Disputes Data in Peace-Conflict Models

Peace Economics Peace Science and Public Policy ◽

10.1515/peps-2013-0061 ◽

2014 ◽

Vol 20 (2) ◽

pp. 347-372

Author(s):

Scott Y. Lin ◽

Carlos Seiglie

Keyword(s):

Regression Analysis ◽

International Conflict ◽

Empirical Work ◽

Interstate Disputes ◽

Definition Of ◽

Events Data ◽

Militarized Interstate Disputes

Studying the determinants of international conflict, researchers have found a series of influential variables, but few have addressed the robustness of the results to changes in the definition of the dependent variable, conflict. The two main sources for operationalizing conflict in empirical work are data on militarized interstate disputes (MIDs) and events data. In this paper, we find that a χ²-test indicates a correlation between events data and MIDs data. However, detailed regression analysis indicates that there are some contradictory findings depending on whether we use events data as opposed to MIDs data to measure conflict.

The Yusuf-Peto method was not a robust method for meta-analyses of rare events data from antidepressant trials

Journal of Clinical Epidemiology ◽

10.1016/j.jclinepi.2017.07.006 ◽

2017 ◽

Vol 91 ◽

pp. 129-136 ◽

Cited By ~ 9

Author(s):

Tarang Sharma ◽

Peter C. Gøtzsche ◽

Oliver Kuss

Keyword(s):

Rare Events ◽

Robust Method ◽

Meta Analyses ◽

Events Data

Information content of data with respect to models

AJP Regulatory Integrative and Comparative Physiology ◽

10.1152/ajpregu.1983.245.5.r620 ◽

1983 ◽

Vol 245 (5) ◽

pp. R620-R623

Author(s):

M. Berman ◽

P. Van Eerdewegh

Keyword(s):

Experimental Data ◽

Data Collection ◽

Information Content ◽

Mathematical Framework ◽

Statistical Measures ◽

Parameter Values ◽

Collection Strategies

A measure is proposed for the information content of data with respect to models. A model, defined by a set of parameter values in a mathematical framework, is considered a point in a hyperspace. The proposed measure expresses the information content of experimental data as the contribution they make, in units of information bits, in defining a model to within a desired region of the hyperspace. This measure is then normalized to conventional statistical measures of uncertainty. It is shown how the measure can be used to estimate the information of newly planned experiments and help in decisions on data collection strategies.
