Fine-mapping from summary data with the "Sum of Single Effects" model

2021 ◽  
Author(s):  
Yuxin Zou ◽  
Peter Carbonetto ◽  
Gao Wang ◽  
Matthew Stephens

In recent work, Wang et al. introduced the "Sum of Single Effects" (SuSiE) model, and showed that it provides a simple and efficient approach to fine-mapping genetic variants from individual-level data. Here we present new methods for fitting the SuSiE model to summary data, for example to single-SNP z-scores from an association study and linkage disequilibrium (LD) values estimated from a suitable reference panel. To achieve this we introduce a simple strategy that could be used to extend any individual-level data method to deal with summary data. In essence, this strategy replaces the usual regression likelihood with an analogous likelihood based on summary data, exploiting the close connection between the two. Our strategy also has the benefit of dealing automatically with non-invertible LD matrices, which arise frequently in fine-mapping applications and can complicate inference. We highlight other common practical issues in fine-mapping with summary data, including problems caused by inconsistencies between the z-scores and LD estimates, and we develop diagnostics to identify these inconsistencies. We also present a new refinement procedure that improves model fits in some data sets, and hence improves overall reliability of the SuSiE fine-mapping results. Simulation studies show that SuSiE applied to summary data is competitive, in both speed and accuracy, with the best available fine-mapping methods for summary data.
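
To make the single-effect building block concrete, here is a minimal sketch that works directly from z-scores. It assumes exactly one causal SNP in the region, equal prior weights across SNPs, and a normal prior with variance `prior_var` (a hypothetical tuning parameter) on the causal SNP's expected z-score, so each SNP's Bayes factor compares z ~ N(0, 1 + prior_var) against z ~ N(0, 1). It is not the SuSiE-RSS fitting procedure, which sums several single effects and needs the LD matrix to separate them; the function names are illustrative.

```python
import math


def single_effect_pips(z_scores, prior_var=1.0):
    """Posterior inclusion probabilities under a single-effect model.

    Assumes exactly one SNP has a nonzero effect, uniform prior weights,
    and an effect prior N(0, prior_var) on the z-score scale.
    """
    w = prior_var
    # log Bayes factor: z ~ N(0, 1 + w) if causal versus z ~ N(0, 1) under the null
    log_bf = [0.5 * math.log(1.0 / (1.0 + w)) + 0.5 * z * z * w / (1.0 + w)
              for z in z_scores]
    # normalize on the log scale for numerical stability
    m = max(log_bf)
    weights = [math.exp(lb - m) for lb in log_bf]
    total = sum(weights)
    return [wt / total for wt in weights]


def credible_set(pips, coverage=0.95):
    """Smallest set of SNP indices whose PIPs sum to at least `coverage`."""
    order = sorted(range(len(pips)), key=lambda j: pips[j], reverse=True)
    chosen, cum = [], 0.0
    for j in order:
        chosen.append(j)
        cum += pips[j]
        if cum >= coverage:
            break
    return chosen


if __name__ == "__main__":
    z = [1.2, 5.8, 5.1, 0.3, -0.7]
    pips = single_effect_pips(z, prior_var=25.0)
    print([round(p, 3) for p in pips])
    print(credible_set(pips))
```

Raising `coverage` can pull a correlated neighbour with a slightly smaller |z| into the credible set, which is how a single effect expresses uncertainty about which of several SNPs in LD is causal.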

2020 ◽  
Vol 63 (5) ◽  
pp. 719-737
Author(s):  
F. Carson Mencken ◽  
Bethany Smith ◽  
Charles M. Tolbert

We test whether the self-employed have higher levels of civic inclination (trust, political activism, community closeness, community participation) than private-sector workers. We examine the civic inclinations of the self-employed with two national cross-sectional data sets, using a variety of discrete and continuous regression models. We find that the self-employed have higher levels of political activism, feel closer to neighbors and family, and have greater odds of engaging to solve community problems. We fail to detect differences in donating money, attending community events, and closeness to friends. Previous research using county-level data has concluded that the self-employed are important actors in building community and creating social capital. Our results add to this literature by showing, with individual-level data, that the self-employed have higher levels of civic inclination. Implications for theory and research are discussed.


2021 ◽  
Author(s):  
Yiliang Zhang ◽  
Youshu Cheng ◽  
Yixuan Ye ◽  
Wei Jiang ◽  
Qiongshi Lu ◽  
...  

With the increasing accessibility of individual-level data from genome-wide association studies, it is now common for researchers to have individual-level data for some traits in one specific population. For some traits, only publicly released summary-level data are accessible because of privacy and safety concerns. Current methods to estimate genetic correlation can only be applied when the input data for the two traits of interest are of the same type, either both individual-level or both summary-level. When researchers have access to individual-level data for one trait and summary-level data for the other, they have to transform the individual-level data to summary-level data first and then apply summary data-based methods to estimate the genetic correlation. This procedure is computationally and statistically inefficient and introduces information loss. We introduce GENJI (Genetic correlation EstimatioN Jointly using Individual-level and summary data), a method that can estimate within-population or transethnic genetic correlation based on individual-level data for one trait and summary-level data for another trait. Through extensive simulations and analyses of real data on within-population and transethnic genetic correlation estimation, we show that GENJI produces more reliable and efficient estimates than summary data-based methods. In addition, when individual-level data are available for both traits, GENJI can achieve performance comparable to that of individual-level data-based methods. Downstream applications of genetic correlation can benefit from more accurate estimates. In particular, we show that more accurate genetic correlation estimation improves the predictive performance of cross-population polygenic risk scores.
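
The abstract contrasts GENJI with the usual workaround of first converting individual-level data into per-SNP summary statistics. A minimal sketch of that conversion step, assuming a continuous trait, genotype dosages coded 0/1/2, no covariates, and plain marginal least squares for each SNP (the function name is hypothetical, and this is not part of GENJI):

```python
import math


def marginal_summary_stats(genotypes, phenotype):
    """Per-SNP marginal regression of phenotype on genotype.

    genotypes: list of SNP columns, each a list of 0/1/2 dosages for n individuals
    phenotype: list of n trait values
    Returns a list of (beta_hat, se, z) tuples, one per SNP.
    Monomorphic SNPs must be removed beforehand (their sxx would be zero).
    """
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    y = [v - y_mean for v in phenotype]
    results = []
    for snp in genotypes:
        x_mean = sum(snp) / n
        x = [g - x_mean for g in snp]
        sxx = sum(xi * xi for xi in x)
        sxy = sum(xi * yi for xi, yi in zip(x, y))
        beta = sxy / sxx
        # residual variance with n - 2 degrees of freedom (intercept and slope)
        rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
        sigma2 = rss / (n - 2)
        se = math.sqrt(sigma2 / sxx)
        results.append((beta, se, beta / se))
    return results
```

Only the marginal estimates survive this reduction; joint information such as the full genotype covariance is discarded, which is one source of the information loss the abstract mentions.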


2020 ◽  
Author(s):  
Leib Litman ◽  
Robert Hartman ◽  
Shalom Noach Jaffe ◽  
Jonathan Robinson

Thousands of readily downloadable county-level data sets offer untapped potential for linking geo-social influences to individual-level human behavior. In this study we describe a methodology for county-level sampling of online participants, allowing us to link the self-reported behavior of N = 1084 online respondents to contemporaneous county-level data on COVID-19 infection rate density. Using this approach, we show that infection rate density predicts person-level self-reported face mask wearing beyond multiple other demographic and attitudinal covariates. Using the present effort as a demonstration project, we describe the underlying sampling methodology and discuss the wider range of potential applications.
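
The linking step described above amounts to a join between respondent records and county-level tables on a county identifier. A minimal sketch, assuming each respondent record carries a county FIPS code under a hypothetical `county_fips` field and that county variables (for example a hypothetical `infection_density` column) are keyed by 5-digit FIPS strings:

```python
def link_to_county(respondents, county_data, key="county_fips"):
    """Attach county-level variables to each respondent record.

    respondents: list of dicts, each with a county identifier under `key`
    county_data: dict mapping 5-digit county FIPS string -> dict of county variables
    Returns (linked_records, unmatched_records).
    """
    linked, unmatched = [], []
    for r in respondents:
        # FIPS codes are 5-digit strings; leading zeros are often lost when stored as integers
        fips = str(r[key]).zfill(5)
        county = county_data.get(fips)
        if county is None:
            unmatched.append(r)
        else:
            # merged record: respondent fields plus county fields
            linked.append({**r, **county})
    return linked, unmatched
```

Keeping unmatched records separate, rather than silently dropping them, makes it easy to report how many respondents could not be linked to a county.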


2019 ◽  
Vol 3 (1) ◽  
pp. 81-93 ◽  
Author(s):  
Blakeley B. McShane ◽  
Ulf Böckenholt

Meta-analysis typically involves the analysis of summary data (e.g., means, standard deviations, and sample sizes) from a set of studies via a statistical model that is a special case of a hierarchical (or multilevel) model. Unfortunately, the common summary-data approach to meta-analysis used in psychological research is often employed in settings where the complexity of the data warrants alternative approaches. In this article, we propose a thought experiment that can lead meta-analysts to move away from the common summary-data approach to meta-analysis and toward richer and more appropriate summary-data approaches when the complexity of the data warrants it. Specifically, we propose that it can be extremely fruitful for meta-analysts to act as if they possess the individual-level data from the studies and consider what model specifications they might fit even when they possess only summary data. This thought experiment is justified because (a) the analysis of the individual-level data from the studies via a hierarchical model is considered the “gold standard” for meta-analysis and (b) for a wide variety of cases common in meta-analysis, the summary-data and individual-level-data approaches are, by a principle known as statistical sufficiency, equivalent when the underlying models are appropriately specified. We illustrate the value of our thought experiment via a case study that evolves across five parts that cover a wide variety of data settings common in meta-analysis.
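
The sufficiency argument can be checked numerically for the simplest case. Under a normal model with study-specific means and a common within-study variance, each study's (mean, SD, n) triple is sufficient, so the grand mean and pooled within-study variance computed from summaries match those computed from raw observations. The sketch below (function names are illustrative) does not fit a hierarchical model; it only demonstrates the equivalence for these two quantities:

```python
import statistics


def summarize(samples):
    """Reduce one study's raw observations to (mean, sd, n)."""
    return statistics.mean(samples), statistics.stdev(samples), len(samples)


def pooled_from_raw(studies):
    """Grand mean and pooled within-study variance computed from raw data."""
    all_obs = [x for s in studies for x in s]
    grand_mean = sum(all_obs) / len(all_obs)
    ss_within = 0.0
    for s in studies:
        m = sum(s) / len(s)
        ss_within += sum((x - m) ** 2 for x in s)
    df_within = len(all_obs) - len(studies)
    return grand_mean, ss_within / df_within


def pooled_from_summary(summaries):
    """The same two quantities computed only from (mean, sd, n) triples."""
    n_total = sum(n for _, _, n in summaries)
    grand_mean = sum(m * n for m, _, n in summaries) / n_total
    # (n - 1) * sd^2 recovers each study's within-study sum of squares
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in summaries)
    df_within = n_total - len(summaries)
    return grand_mean, ss_within / df_within
```

Both routes agree up to floating-point rounding; for this model nothing beyond (mean, SD, n) is needed.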


Author(s):  
Bernard Enjolras

Volunteer rates vary greatly across Europe despite the voluntary sector’s common history and tradition. This contribution advances a theoretical explanation for the variation in volunteering across Europe, the capability approach, and tests this approach by adopting a two-step strategy for modeling contextual effects. This approach, referring to the concept of capability introduced by Sen (Choice, welfare and measurement, Oxford University Press, 1980/1982), is based on the claim that the demand and supply sides of the voluntary sector can be expected to vary according to collective and individual capabilities to engage in volunteering. To empirically test the approach, the study relied on two data sources to operationalize economic, human, political, social, and religious contextual factors and assess their effects on individuals’ capability to volunteer: the 2015 European Union (EU) Survey on Income and Living Conditions (EU-SILC), including an ad hoc module on volunteering at the individual level, and the Quality of Government Institute and Pew Research Center macro-level data sets. The results support the capability hypothesis at both levels. At the individual level, indicators of human, economic, and social resources have a positive effect on the likelihood of volunteering. At the contextual level, macro-structural indicators of economic, political, social, and religious contexts affect individuals’ ability to transform resources into functioning, that is, volunteering.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Luke R. Lloyd-Jones ◽  
Jian Zeng ◽  
Julia Sidorenko ◽  
Loïc Yengo ◽  
Gerhard Moser ◽  
...  

Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful Bayesian multiple regression model for individual-level data (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
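
The R² comparison above can be made concrete. A minimal sketch, assuming per-variant weights have already been estimated (by SBayesR, LDpred, or clumping and thresholding; estimating them is out of scope here), that prediction R² is the squared correlation between score and phenotype in a held-out sample, and that the reported percentages are relative changes in R², which the abstract does not state explicitly. Function names are illustrative:

```python
def polygenic_score(genotypes, weights):
    """Score for one individual: sum over variants of weight * allele dosage."""
    return sum(w * g for w, g in zip(weights, genotypes))


def prediction_r2(scores, phenotypes):
    """Squared Pearson correlation between predicted scores and observed phenotypes."""
    n = len(scores)
    ms = sum(scores) / n
    mp = sum(phenotypes) / n
    cov = sum((s - ms) * (p - mp) for s, p in zip(scores, phenotypes))
    vs = sum((s - ms) ** 2 for s in scores)
    vp = sum((p - mp) ** 2 for p in phenotypes)
    return cov * cov / (vs * vp)


def relative_improvement(r2_new, r2_baseline):
    """Relative change in prediction R^2; 0.05 corresponds to a 5% improvement."""
    return (r2_new - r2_baseline) / r2_baseline
```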


2021 ◽  
Author(s):  
IS Arriaga-MacKenzie ◽  
G Matesi ◽  
S Chen ◽  
A Ronco ◽  
KM Marker ◽  
...  

Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. While several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies from summary data. Using five continental reference ancestries, African (AFR), Non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), and South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. With an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
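
The abstract does not spell out Summix's estimator. One natural formulation, assumed here, treats the observed allele frequencies as a mixture of reference-panel frequencies and finds the mixing proportions by least squares under the constraint that they are non-negative and sum to one. The sketch below solves that problem with projected gradient descent, a solver chosen for brevity; it is not the Summix R package's implementation, and the function names are hypothetical:

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_k >= 0, sum_k p_k = 1}."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        # the last j satisfying this condition determines the threshold
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_ancestry_proportions(observed_af, reference_afs, n_iter=5000):
    """Least-squares ancestry proportions from summary allele frequencies.

    observed_af: allele frequencies of the pooled sample, one per SNP
    reference_afs: list of K reference panels, each a list of per-SNP frequencies
    Minimizes sum_i (observed_i - sum_k pi_k * ref_k_i)^2 over the simplex.
    """
    k = len(reference_afs)
    n_snps = len(observed_af)
    # step 1 / (2 * trace(R^T R)) is conservative: the trace bounds the largest eigenvalue
    trace = sum(a * a for ref in reference_afs for a in ref)
    step = 1.0 / (2.0 * trace)
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        fitted = [sum(pi[c] * reference_afs[c][i] for c in range(k)) for i in range(n_snps)]
        resid = [observed_af[i] - fitted[i] for i in range(n_snps)]
        # gradient of the squared error: -2 * R^T * residual
        grad = [-2.0 * sum(reference_afs[c][i] * resid[i] for i in range(n_snps))
                for c in range(k)]
        pi = project_to_simplex([pi[c] - step * grad[c] for c in range(k)])
    return pi
```

Because the objective is convex and the step size is conservative, the iterates converge to the constrained minimum; a dedicated quadratic-programming solver would reach it in far fewer iterations.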


2020 ◽  
Author(s):  
Ciarrah Barry ◽  
Junxi Liu ◽  
Rebecca Richmond ◽  
Martin K Rutter ◽  
Deborah A Lawlor ◽  
...  

Over the last decade the availability of SNP-trait associations from genome-wide association studies has led to an array of methods for performing Mendelian randomization studies using only summary statistics. A common feature of these methods, besides their intuitive simplicity, is the ability to combine data from several sources, incorporate multiple variants and account for biases due to weak instruments and pleiotropy. With the advent of large and accessible fully genotyped cohorts such as UK Biobank, there is now increasing interest in understanding how best to apply these well-developed summary data methods to individual-level data, and to explore the use of more sophisticated causal methods allowing for non-linearity and effect modification. In this paper we describe a general procedure for optimally applying any two-sample summary data method using one-sample data. Our procedure first performs a meta-analysis of summary data estimates that are intentionally contaminated by collider bias between the genetic instruments and unmeasured confounders, due to conditioning on the observed exposure. A weighted sum of these estimates is then used to correct the standard observational association between an exposure and outcome. Simulations are conducted to demonstrate the method’s performance against naive applications of two-sample summary data MR. We apply the approach to the UK Biobank cohort to investigate the causal role of sleep disturbance on HbA1c levels, an important determinant of diabetes. Our approach is closely related to the work of Dudbridge et al. (Nat. Commun. 10: 1561), who developed a technique to adjust for index event bias when uncovering genetic predictors of disease progression based on case-only data. Our paper serves to clarify that in any one-sample MR analysis, it can be advantageous to estimate causal relationships by artificially inducing and then correcting for collider bias.
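
The summary-data MR methods referred to above most commonly include the fixed-effect inverse-variance weighted (IVW) estimator, which meta-analyses per-variant Wald ratios. A minimal sketch, assuming independent variants and ignoring uncertainty in the SNP-exposure estimates (first-order weights); this is a generic two-sample method, not the authors' one-sample correction procedure, and the function name is illustrative:

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    Equivalent to combining Wald ratios beta_outcome / beta_exposure with
    weights beta_exposure^2 / se_outcome^2. Returns (estimate, standard_error).
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, (1.0 / den) ** 0.5
```

As the abstract describes it, the one-sample procedure meta-analyses deliberately collider-biased estimates before a separate correction step; that correction is not reproduced here.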


PLoS Genetics ◽  
2021 ◽  
Vol 17 (8) ◽  
pp. e1009703
Author(s):  
Ciarrah Barry ◽  
Junxi Liu ◽  
Rebecca Richmond ◽  
Martin K. Rutter ◽  
Deborah A. Lawlor ◽  
...  

Over the last decade the availability of SNP-trait associations from genome-wide association studies has led to an array of methods for performing Mendelian randomization studies using only summary statistics. A common feature of these methods, besides their intuitive simplicity, is the ability to combine data from several sources, incorporate multiple variants and account for biases due to weak instruments and pleiotropy. With the advent of large and accessible fully genotyped cohorts such as UK Biobank, there is now increasing interest in understanding how best to apply these well-developed summary data methods to individual-level data, and to explore the use of more sophisticated causal methods allowing for non-linearity and effect modification. In this paper we describe a general procedure for optimally applying any two-sample summary data method using one-sample data. Our procedure first performs a meta-analysis of summary data estimates that are intentionally contaminated by collider bias between the genetic instruments and unmeasured confounders, due to conditioning on the observed exposure. These estimates are then used to correct the standard observational association between an exposure and outcome. Simulations are conducted to demonstrate the method’s performance against naive applications of two-sample summary data MR. We apply the approach to the UK Biobank cohort to investigate the causal role of sleep disturbance on HbA1c levels, an important determinant of diabetes. Our approach can be viewed as a generalization of Dudbridge et al. (Nat. Commun. 10: 1561), who developed a technique to adjust for index event bias when uncovering genetic predictors of disease progression based on case-only data. Our work serves to clarify that in any one-sample MR analysis, it can be advantageous to estimate causal relationships by artificially inducing and then correcting for collider bias.


2010 ◽  
Vol 29 (21) ◽  
pp. 2180-2193 ◽  
Author(s):  
Erol A. Peköz ◽  
Michael Shwartz ◽  
Cindy L. Christiansen ◽  
Dan Berlowitz
