Combining Aggregate Data and Exit Polls for the Estimation of Voter Transitions

Our objective is to estimate voter transitions between two consecutive parliamentary elections. Such analyses have usually been based either on individual survey data or on aggregate data. To move beyond these methods and their respective problems, we propose the application of so-called hybrid models, which combine aggregate and individual data. We use a Bayesian approach and extend a multinomial-Dirichlet model proposed in the ecological inference literature. Our new hybrid model has been implemented in the R package eiwild (Ecological Inference with individual-level data). Based on extensive simulations, we show that our new estimator performs very well in many realistic scenarios. The application case is the voter transition between the Bavarian regional election and the German federal election of 2013 in the metropolitan city of Munich. Our approach is also applicable to other areas of electoral research, market research, and epidemiology.
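The way the two data sources relate can be illustrated with a small Python sketch. This is a toy illustration and not the eiwild model: it assumes a single district, two parties, and made-up counts, and the function names `row_proportions` and `predict_totals` are hypothetical. Exit-poll counts give row-wise transition shares, and the aggregate results serve as a consistency check through the accounting identity that first-election totals, multiplied by the transition matrix, should reproduce the second-election totals.

```python
def row_proportions(counts):
    """Convert a table of individual transition counts into row-wise shares."""
    return [[c / sum(row) for c in row] for row in counts]


def predict_totals(first_totals, transition):
    """Apply a row-stochastic transition matrix to first-election totals."""
    n_cols = len(transition[0])
    return [
        sum(first_totals[i] * transition[i][j] for i in range(len(first_totals)))
        for j in range(n_cols)
    ]


# Hypothetical data for one district
# (rows: party at election 1, columns: party at election 2).
exit_poll_counts = [[80, 20], [10, 40]]   # individual-level data
first_totals = [1000, 500]                # aggregate result, election 1
second_totals = [900, 600]                # aggregate result, election 2

transition = row_proportions(exit_poll_counts)
predicted = predict_totals(first_totals, transition)

# Gap between what the exit poll implies and what was actually counted.
discrepancy = [p - o for p, o in zip(predicted, second_totals)]
print(transition, predicted, discrepancy)
```

A full hybrid model would instead treat both sources as likelihood contributions for the same transition probabilities; in this sketch, a nonzero discrepancy would merely signal that the exit poll alone does not reproduce the aggregate result.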


Meffil: efficient normalisation and analysis of very large DNA methylation samples

10.1101/125963 ◽

2017 ◽

Cited By ~ 17

Author(s):

Josine Min ◽

Gibran Hemani ◽

George Davey Smith ◽

Caroline Relton ◽

Matthew Suderman

Keyword(s):

Dna Methylation ◽

Association Studies ◽

R Package ◽

Individual Level ◽

Technological Advances ◽

Level Data ◽

Fixed And Random Effects ◽

R Packages ◽

Meta Analyses ◽

Dramatic Growth

Background: Technological advances in high throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to efficiently handle these data. Methods: We have developed meffil, an R package designed to quality control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results: A complete reimplementation of functional normalization minimizes computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, and automated estimation of functional normalisation parameters reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically-based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls, functional normalization is unable to prevent spurious associations. Conclusions: meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.


Persistent Legacies of the Empires: Partition of Poland and Electoral Turnout

East European Politics and Societies and Cultures ◽

10.1177/0888325420907678 ◽

2020 ◽

pp. 088832542090767

Author(s):

Piotr Zagórski ◽

Radosław Markowski

Keyword(s):

Religious Service Attendance ◽

Parliamentary Elections ◽

Individual Level ◽

National Election Study ◽

Level Data ◽

Electoral Turnout ◽

Cultural Legacies ◽

Socio Demographic Factors ◽

The Impact ◽

Service Attendance

During the long nineteenth century, Poland was divided among the Russian, Habsburg, and Prussian empires. The partition produced regional diversity in political culture and in institutional and economic development. We examine how the cultural legacies of the empires have influenced the propensity of Poles to cast a ballot in parliamentary elections since 1989. Polish National Election Study individual-level data are used to assess whether higher levels of electoral turnout in Galicia are indeed a legacy of the Habsburg rule. Our results confirm that, even after controlling for socio-demographic factors, there is a positive, substantive, and significant effect on turnout of living in the ex-Habsburg part of Poland. This effect can be explained by the frequency of religious service attendance and by ideology. Inhabitants of Galicia not only attend religious services more frequently and are more conservative than their counterparts in the rest of Poland, but also the more frequently they attend church and the closer to the radical right they place themselves, the more mobilized they are to vote. The impact of the legacies of the empires on political behavior in Poland seems persistent.


PheWAS-ME: a web-app for interactive exploration of multimorbidity patterns in PheWAS

Bioinformatics ◽

10.1093/bioinformatics/btaa870 ◽

2020 ◽

Author(s):

Nick Strayer ◽

Jana K Shirey-Rice ◽

Yu Shyr ◽

Joshua C Denny ◽

Jill M Pulley ◽

...

Keyword(s):

Genetic Variant ◽

Statistical Tests ◽

R Package ◽

Supplementary Information ◽

Health Records ◽

Individual Level ◽

Level Data ◽

Phenotype Data ◽

Tests Of Association ◽

Web App

Summary: Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for the evaluation of relationships between genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene–disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. We present PheWAS-ME: an interactive dashboard to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data. Availability and implementation: A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. Sample datasets are provided for exploration with the option to upload custom PheWAS results and corresponding individual-level data. Online versions of the appendices are available at https://prod.tbilab.org/phewas_me_info/. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer). Supplementary information: Supplementary data are available at Bioinformatics online.


Summix: A method for detecting and adjusting for population structure in genetic summary data

10.1101/2021.02.03.429446 ◽

2021 ◽

Author(s):

IS Arriaga-MacKenzie ◽

G Matesi ◽

S Chen ◽

A Ronco ◽

KM Marker ◽

...

Keyword(s):

Population Structure ◽

South Asian ◽

R Package ◽

Individual Level ◽

Level Data ◽

Causal Variants ◽

High Utility ◽

Ancestry Proportions ◽

Summary Data ◽

Reference Samples

Publicly available genetic summary data have high utility in research and the clinic including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. While several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies from summary data. Using continental reference ancestry, African (AFR), Non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v2.1 exome and genome groups and subgroups finding heterogeneous continental ancestry for several groups including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
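The idea of deconvoluting ancestry from summary allele frequencies can be sketched in Python for the simplest case. This is a simplified illustration and not the Summix implementation: it assumes only two reference ancestries, so the least-squares mixing proportion has a closed form, and the function name `estimate_proportion` and all allele frequencies below are hypothetical. The observed frequency at each variant is modelled as a weighted average of the two reference frequencies, with a weight between 0 and 1.

```python
def estimate_proportion(observed, ref_a, ref_b):
    """Least-squares share of ancestry A in a two-ancestry mixture.

    Minimises sum((obs - (p * a + (1 - p) * b)) ** 2) over p, then clips
    p to [0, 1]. For this one-dimensional convex problem, clipping the
    unconstrained optimum gives the constrained optimum.
    """
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference panels are identical; proportion is not identifiable")
    return min(1.0, max(0.0, num / den))


# Hypothetical reference allele frequencies at four variants.
ref_a = [0.10, 0.50, 0.90, 0.30]
ref_b = [0.30, 0.20, 0.40, 0.70]

# Simulate summary data from a known mixture to check the estimator.
true_share = 0.84
observed = [true_share * a + (1 - true_share) * b for a, b in zip(ref_a, ref_b)]

share_a = estimate_proportion(observed, ref_a, ref_b)
print(round(share_a, 6))
```

With more than two reference ancestries the same objective has to be minimised numerically under the constraints that the proportions are non-negative and sum to one.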


Ignoramus, Ignorabimus? On Uncertainty in Ecological Inference

Political Analysis ◽

10.1093/pan/mpm030 ◽

2007 ◽

Vol 16 (1) ◽

pp. 70-92 ◽

Cited By ~ 12

Author(s):

Martin Elff ◽

Thomas Gschwend ◽

Ron J. Johnston

Keyword(s):

New Zealand ◽

Maximum Entropy ◽

Prediction Intervals ◽

Ecological Inference ◽

Entropy Model ◽

Individual Level ◽

Level Data ◽

Data Generating Process ◽

The Individual ◽

True Values

Models of ecological inference (EI) have to rely on crucial assumptions about the individual-level data-generating process, which cannot be tested because of the unavailability of these data. However, these assumptions may be violated by the unknown data and this may lead to serious bias of estimates and predictions. The amount of bias, however, cannot be assessed without information that is unavailable in typical applications of EI. We therefore construct a model that at least approximately accounts for the additional, nonsampling error that may result from possible bias incurred by an EI procedure, a model that builds on the Principle of Maximum Entropy. By means of a systematic simulation experiment, we examine the performance of prediction intervals based on this second-stage Maximum Entropy model. The results of this simulation study suggest that these prediction intervals are at least approximately correct if all possible configurations of the unknown data are taken into account. Finally, we apply our method to a real-world example, where we actually know the true values and are able to assess the performance of our method: the prediction of district-level percentages of split-ticket voting in the 1996 General Election of New Zealand. It turns out that in 95.5% of the New Zealand voting districts, the actual percentage of split-ticket votes lies inside the 95% prediction intervals constructed by our method.
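The coverage check reported in the last sentence amounts to counting how often a known true value falls inside its prediction interval. A minimal Python sketch of that calculation follows; the function name `coverage` and the numbers are hypothetical and are not the New Zealand data.

```python
def coverage(intervals, true_values):
    """Share of true values that lie inside their (lower, upper) interval."""
    inside = sum(1 for (lo, hi), t in zip(intervals, true_values) if lo <= t <= hi)
    return inside / len(true_values)


# Hypothetical district-level prediction intervals (in percent) and true values.
intervals = [(10.0, 20.0), (30.0, 45.0), (5.0, 12.0), (22.0, 28.0)]
true_values = [15.0, 47.0, 5.0, 25.0]

share_covered = coverage(intervals, true_values)
print(share_covered)
```

For well-calibrated 95% intervals, this share should be close to 0.95 when computed over many districts.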


Owning protest but sharing distrust? Confidence in the political system and anti-political-establishment party choice in the Finnish 2011 parliamentary elections

Finnish Journal of Social Research ◽

10.51815/fjsr.110721 ◽

2014 ◽

Vol 7 ◽

pp. 21-35

Author(s):

Maria Bäck ◽

Elina Kestilä-Kekkonen

Keyword(s):

Political System ◽

Political Trust ◽

Parliamentary Elections ◽

Individual Level ◽

Party Choice ◽

National Election Study ◽

Level Data ◽

Party Preference ◽

Political Distrust ◽

Logistic Regressions

In this study we explore to what extent anti-political-establishment voting mobilized manifest political distrust in the 2011 Finnish parliamentary elections. In particular, we seek to determine whether the channels of manifest political distrust vary for different forms of political trust. Individual-level data from the Finnish National Election Study (FNES 2011, N = 1,268) are analyzed by applying multinomial logistic regressions. The results show that anti-political-establishment voting effectively channels both specific and diffuse political distrust, but this dissatisfaction is not reflected as anti-incumbency voting. Furthermore, it seems that a significant amount of latent political distrust, which is not explicitly expressed by party preference at electoral polls, exists in the electorates of several governmental and opposition parties.


Software Application Profile: SUMnlmr, an R package that facilitates flexible and reproducible non-linear Mendelian randomisation analyses

10.1101/2021.12.10.21267623 ◽

2021 ◽

Author(s):

Amy M Mason ◽

Stephen Burgess

Keyword(s):

Linear Model ◽

Piecewise Linear ◽

R Package ◽

Mendelian Randomisation ◽

Software Application ◽

Individual Level ◽

Fractional Polynomial ◽

Level Data ◽

Non Linear ◽

The Individual

Motivation: Mendelian randomisation methods that estimate non-linear exposure-outcome relationships typically require individual-level data. This package implements non-linear Mendelian randomisation methods using stratified summarised data, facilitating analyses where individual-level data cannot easily be shared, and additionally increasing reproducibility as summarised data can be reported. Dependence on summarised data means the methods are independent of the form of the individual-level data, increasing flexibility to different outcome types (such as continuous, binary, or time-to-event outcomes). Implementation: SUMnlmr is available as an R package (version 3.1.0 or higher). General features: The package implements the previously proposed fractional polynomial and piecewise linear methods on stratified summarised data that can either be estimated from individual-level data using the package or supplied by a collaborator. It constructs plots to visualise the estimated exposure-outcome relationship, and provides statistics to assess preference for a non-linear model over a linear model. Availability: The package is freely available from GitHub (https://github.com/amymariemason/SUMnlmr).
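How stratified summarised data can yield a piecewise linear curve may be sketched in Python. This is a rough illustration under stated assumptions and does not reproduce the SUMnlmr interface: it assumes that each stratum supplies one genetic association with the exposure and one with the outcome, that the slope within a stratum is their ratio, and that the curve is continuous across stratum boundaries. The function names and all numbers are hypothetical.

```python
def stratum_slopes(beta_outcome, beta_exposure):
    """Ratio estimate per stratum: gene-outcome over gene-exposure association."""
    return [by / bx for by, bx in zip(beta_outcome, beta_exposure)]


def piecewise_curve(boundaries, slopes):
    """Curve values at the stratum boundaries, set to zero at the first boundary.

    Each stratum contributes a straight segment with its own slope, and
    consecutive segments are joined so that the curve is continuous.
    """
    if len(boundaries) != len(slopes) + 1:
        raise ValueError("need one more boundary than slopes")
    values = [0.0]
    for k, slope in enumerate(slopes):
        width = boundaries[k + 1] - boundaries[k]
        values.append(values[-1] + slope * width)
    return values


# Hypothetical summarised data for three exposure strata.
beta_exposure = [0.10, 0.10, 0.10]       # gene-exposure associations
beta_outcome = [0.02, 0.05, 0.12]        # gene-outcome associations
boundaries = [20.0, 25.0, 30.0, 35.0]    # exposure values bounding the strata

slopes = stratum_slopes(beta_outcome, beta_exposure)
curve = piecewise_curve(boundaries, slopes)
print(slopes, curve)
```

Because only the per-stratum associations enter the calculation, a collaborator could supply them without sharing any individual-level records.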


The Effect of Labor Force Participation on Female Suicide Rates: An Analysis of Individual Data from 16 States

OMEGA - Journal of Death and Dying ◽

10.2190/j2m1-1e6h-t17g-vwv3 ◽

1996 ◽

Vol 34 (2) ◽

pp. 163-169 ◽

Cited By ~ 2

Author(s):

Steve Stack

Keyword(s):

Labor Force ◽

Labor Force Participation ◽

Elderly Women ◽

Individual Data ◽

Middle Aged ◽

Aggregated Data ◽

Individual Level ◽

Younger Women ◽

Suicide Rates ◽

The Relationship

Previous American-based research on the effect of women's labor force participation (WPLF) on suicide has been based on highly aggregated data, which make it difficult to determine the actual, individual-level suicide rate of employed versus unemployed women. The present study employs recent data which allow for the calculation of such individual-level suicide rates. Controls are incorporated for age and marital status. The results indicate that the suicide rates for employed, younger women are consistently lower than the suicide rates of women who are unemployed. The same tends to be true of middle-aged women. For elderly women, however, the relationship reverses, with WPLF being associated with relatively high suicide rates. The findings on young and middle-aged women support a role accumulation model of WPLF, while the findings on elderly women support the theory of status integration.


A Bayesian Approach to Linking a Survey and a Census via Small Areas

Stats ◽

10.3390/stats4020031 ◽

2021 ◽

Vol 4 (2) ◽

pp. 509-528

Author(s):

Balgobin Nandram

Keyword(s):

Weighted Least Squares ◽

Homogeneous Model ◽

Projection Methods ◽

Posterior Density ◽

Heterogeneous Model ◽

Small Areas ◽

Household Level ◽

Individual Level ◽

Level Data ◽

Dirichlet Model

We predict the finite population proportion of a small area when individual-level data are available from a survey and more extensive household-level (not individual-level) data (covariates but not responses) are available from a census. The census and the survey consist of the same strata and primary sampling units (PSU, or wards) that are matched, but the households are not matched. There are some common covariates at the household level in the survey and the census and these covariates are used to link the households within wards. There are also covariates at the ward level, and the wards are the same in the survey and the census. Using a two-stage procedure, we study the multinomial counts in the sampled households within the wards and a projection method to infer about the non-sampled wards. This is accommodated by a multinomial-Dirichlet–Dirichlet model, a three-stage hierarchical Bayesian model for multinomial counts, as it is necessary to account for heterogeneity among the households. The key theoretical contribution of this paper is to develop a computational algorithm to sample the joint posterior density of the multinomial-Dirichlet–Dirichlet model. Specifically, we obtain samples from the distributions of the proportions for each multinomial cell. The second key contribution is to use two projection procedures (parametric based on the nested error regression model and non-parametric based on iterative re-weighted least squares), on these proportions to link the survey to the census, thereby providing a copy of the census counts. We compare the multinomial-Dirichlet–Dirichlet (heterogeneous) model and the multinomial-Dirichlet (homogeneous) model without household effects via these two projection methods. An example of the second Nepal Living Standards Survey is presented.
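The homogeneous multinomial-Dirichlet building block mentioned above can be sketched in Python. This is a minimal illustration of the standard conjugate update only, not the three-stage model or the sampling algorithm of the paper: with a Dirichlet prior on the cell proportions and multinomial counts, the posterior is again Dirichlet with the counts added to the prior parameters. The function names, the prior, and the counts are hypothetical.

```python
import random


def posterior_alpha(prior_alpha, counts):
    """Dirichlet posterior parameters after observing multinomial counts."""
    return [a + n for a, n in zip(prior_alpha, counts)]


def posterior_mean(alpha):
    """Mean of a Dirichlet distribution with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha, rng):
    """One Dirichlet draw, obtained by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)                 # fixed seed for reproducibility
prior_alpha = [1.0, 1.0, 1.0]          # flat prior over three cells
counts = [12, 5, 3]                    # hypothetical counts in one ward

alpha_post = posterior_alpha(prior_alpha, counts)
mean_post = posterior_mean(alpha_post)
samples = [sample_dirichlet(alpha_post, rng) for _ in range(1000)]
print(alpha_post, mean_post)
```

The heterogeneous model adds a further Dirichlet layer for household effects, which removes this simple closed form and motivates the computational algorithm described in the abstract.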


ei.Datasets: Real Data Sets for Assessing Ecological Inference Algorithms

Social Science Computer Review ◽

10.1177/08944393211040808 ◽

2021 ◽

pp. 089443932110408

Author(s):

Jose M. Pavía

Keyword(s):

Simulated Data ◽

Ground Truth ◽

Real Data ◽

R Package ◽

Data Sets ◽

Ecological Inference ◽

Inference Models ◽

Individual Level ◽

Inference Algorithms ◽

Cross Classification

Ecological inference models aim to infer individual-level relationships using aggregate data. They are routinely used to estimate voter transitions between elections, disclose split-ticket voting behaviors, or infer racial voting patterns in U.S. elections. A large number of procedures have been proposed in the literature to solve these problems; therefore, an assessment and comparison of them are overdue. The secret ballot, however, makes this a difficult endeavor since real individual data are usually not accessible. The most recent work on ecological inference has assessed methods using a very small number of data sets with ground truth, combined with artificial, simulated data. This article dramatically increases the number of real instances by presenting a unique database (available in the R package ei.Datasets) composed of data from more than 550 elections where the true inner-cell values of the global cross-classification tables are known. The article describes how the data sets are organized, details the data curation and data wrangling processes performed, and analyses the main features characterizing the different data sets.
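Why the inner cells are hard to recover from aggregate data alone can be illustrated with the classical method of bounds, sketched here in Python. This is a generic illustration and not a procedure from the article or from the ei.Datasets package: given only the row and column totals of a cross-classification table, each inner cell is known to lie between a lower and an upper limit, and nothing more. The function name `cell_bounds` and the totals are hypothetical.

```python
def cell_bounds(row_totals, col_totals):
    """Deterministic (lower, upper) limits for every inner cell of a table.

    A cell can hold at most the smaller of its two margins, and at least
    whatever cannot be placed in the other cells of its row or column.
    """
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same number")
    return [
        [(max(0, r + c - n), min(r, c)) for c in col_totals]
        for r in row_totals
    ]


# Hypothetical margins: votes per party at election 1 (rows)
# and at election 2 (columns) in one unit.
row_totals = [600, 400]
col_totals = [700, 300]

bounds = cell_bounds(row_totals, col_totals)
for row in bounds:
    print(row)
```

Data sets with known inner cells make it possible to check where, within such limits, the estimates of a given algorithm actually fall.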
