summary statistics
Recently Published Documents


Total documents: 1158 (five years: 482)
H-index: 46 (five years: 10)

2022 ◽  
Author(s):  
Joanna von Berg ◽  
Michelle ten Dam ◽  
Sander W. van der Laan ◽  
Jeroen de Ridder

Pleiotropic SNPs are associated with multiple traits. Such SNPs can help pinpoint biological processes with an effect on multiple traits or point to a shared etiology between traits. We present PolarMorphism, a new method for the identification of pleiotropic SNPs from GWAS summary statistics. PolarMorphism can be readily applied to more than two traits or whole trait domains. PolarMorphism makes use of the fact that trait-specific SNP effect sizes can be seen as Cartesian coordinates and can thus be converted to polar coordinates r (distance from the origin) and theta (angle with the Cartesian x-axis). r describes the overall effect of a SNP, while theta describes the extent to which a SNP is shared. r and theta are used to determine the significance of SNP sharedness, resulting in a p-value per SNP that can be used for further analysis. We apply PolarMorphism to a large collection of publicly available GWAS summary statistics enabling the construction of a pleiotropy network that shows the extent to which traits share SNPs. This network shows how PolarMorphism can be used to gain insight into relationships between traits and trait domains. Furthermore, pathway analysis of the newly discovered pleiotropic SNPs demonstrates that analysis of more than two traits simultaneously yields more biologically relevant results than the combined results of pairwise analysis of the same traits. Finally, we show that PolarMorphism is more efficient and more powerful than previously published methods.
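The coordinate conversion described above can be sketched in Python for the two-trait case. This is an illustration, not the PolarMorphism implementation. It assumes the effect sizes are already standardized (for example, z-scores). The `sharedness_angle` folding, which measures the angular distance to the nearest axis, is a hypothetical helper introduced here. The method's extension to more than two traits and its significance test for r and theta are not reproduced.

```python
import math

def to_polar(x: float, y: float) -> tuple[float, float]:
    """Convert a SNP's standardized effects on trait 1 (x) and trait 2 (y)
    to polar coordinates: r = distance from the origin (overall effect),
    theta = angle with the x-axis in radians, in [-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)

def sharedness_angle(theta: float) -> float:
    """Hypothetical helper: angular distance from theta to the nearest axis,
    in [0, pi/4]. 0 = effect on one trait only; pi/4 = equal-sized effects
    on both traits (maximally shared), regardless of effect direction."""
    a = abs(theta) % (math.pi / 2)   # fold the four quadrants onto [0, pi/2)
    return min(a, math.pi / 2 - a)   # distance to the closer of the two axes

if __name__ == "__main__":
    # Toy z-scores: one SNP acting on both traits, one acting mostly on trait 1
    for x, y in [(5.0, 5.0), (6.0, 0.2)]:
        r, theta = to_polar(x, y)
        print(f"r = {r:.2f}, sharedness angle = {math.degrees(sharedness_angle(theta)):.1f} deg")
```

In this sketch, r plays the role of the overall effect and the folded angle plays the role of sharedness. Both SNPs can have a large r while differing sharply in angle.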


Author(s):  
Daria Martchenko ◽  
Aaron Shafer

Genomic approaches to the study of population demography rely on accurate SNP calling and, by proxy, on the site frequency spectrum (SFS). Two main questions for the design of such studies remain poorly investigated: do summary statistics from reduced-representation sequencing reflect those of whole genomes, and how do sequencing strategies and the derived summary statistics affect demographic inferences? To address these questions, we applied ddRAD sequencing to 254 individuals and whole-genome resequencing to 35 individuals of mountain goat (Oreamnos americanus) from across the species' range, which has a known demographic history. We identified SNPs with five different variant callers and used ANGSD to estimate genotype likelihoods (GLs). We tested combinations of SNP filtering by linkage disequilibrium (LD), minor allele frequency (MAF) and genomic region. We compared the resulting suite of summary statistics reflective of the SFS and quantified their relationship to demographic inferences. To do so, we estimated contemporary effective population size (Ne), isolation-by-distance, population structure and FST, and explicitly modelled the demographic history with δaδi. Filtering had a larger effect than sequencing strategy and strongly influenced summary statistics. Estimates of contemporary Ne and isolation-by-distance patterns were largely robust to the choice of sequencing strategy, pipeline and filtering. Despite the high variance in summary statistics, whole-genome and reduced-representation approaches were overall similar in supporting glacially induced vicariance and low Ne in mountain goats. We discuss why whole-genome resequencing data are preferable, and we reiterate support for the use of GLs, in part because they limit user-determined filters.
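A small sketch can illustrate how a MAF filter changes an SFS-based summary. It assumes hard-called diploid genotypes, coded as alternate-allele counts, with no missing data. The study itself worked with genotype likelihoods in ANGSD, which this sketch does not model. The function names and toy data are hypothetical.

```python
def minor_allele_count(site: list[int]) -> tuple[int, int]:
    """Return (minor-allele count, number of sampled chromosomes) for one site.
    Each entry is a diploid genotype coded as its alternate-allele count (0, 1, 2)."""
    n_chrom = 2 * len(site)
    alt = sum(site)
    return min(alt, n_chrom - alt), n_chrom

def folded_sfs(genotypes: list[list[int]]) -> list[int]:
    """Folded site frequency spectrum: entry k counts the sites whose minor
    allele is seen k times. Assumes every site has the same sample size."""
    n_chrom = 2 * len(genotypes[0])
    sfs = [0] * (n_chrom // 2 + 1)
    for site in genotypes:
        minor, _ = minor_allele_count(site)
        sfs[minor] += 1
    return sfs

def maf_filter(genotypes: list[list[int]], min_maf: float) -> list[list[int]]:
    """Keep only sites whose minor allele frequency is at least min_maf."""
    kept = []
    for site in genotypes:
        minor, n_chrom = minor_allele_count(site)
        if minor / n_chrom >= min_maf:
            kept.append(site)
    return kept

if __name__ == "__main__":
    # Toy data: 4 sites x 3 diploid individuals (6 chromosomes)
    G = [[0, 0, 1], [1, 1, 0], [2, 2, 2], [0, 1, 2]]
    print(folded_sfs(G))                   # → [1, 1, 1, 1]
    print(folded_sfs(maf_filter(G, 0.2)))  # → [0, 0, 1, 1]
```

In the toy data, a MAF threshold of 0.2 empties the monomorphic and singleton bins entirely. This shows, in miniature, how a filtering choice can reshape SFS-based summary statistics.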


Genes ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 112
Author(s):  
Georg Hahn ◽  
Dmitry Prokopenko ◽  
Sharon Lutz ◽  
Kristina Mullin ◽  
Rudolph Tanzi ◽  
...  

Polygenic risk scores are a popular means of predicting an individual's disease risk or susceptibility from their genotype. When other important epidemiological covariates such as age or sex are added, the result is called an integrated risk model. Methodological advances for fitting more accurate integrated risk models are of immediate importance for improving the precision of risk prediction. They could help identify high-risk patients early, while those patients can still benefit from preventive steps or interventions aimed at improving their odds of survival or at reducing their chance of developing a disease in the first place. This article proposes a smoothed version of the “Lassosum” penalty used to fit polygenic risk scores and integrated risk models from either summary statistics or raw data. The smoothing yields explicit gradients everywhere, which enables efficient minimization of the Lassosum objective function while guaranteeing bounds on the accuracy of the fit. Experiments on Alzheimer’s disease and COPD (chronic obstructive pulmonary disease) demonstrate the increased accuracy of the smoothed Lassosum penalty compared with the original Lassosum algorithm (for the datasets under consideration). This brings it on par with state-of-the-art methods such as LDpred2 when evaluated by the AUC (area under the ROC curve) metric.
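A smoothed L1 penalty of the kind described above can be sketched with Nesterov smoothing of |x| using a quadratic prox function, which reduces to the Huber function. That function is differentiable everywhere, and its gap to |x| is at most mu/2 per coefficient; this is the source of an accuracy bound. The objective below follows the usual lassosum form, with a shrunk LD matrix R_s = (1 - s)R + sI and SNP-trait correlations r. Whether it matches the paper's exact smoothing and optimizer is an assumption. Plain gradient descent is used only for simplicity.

```python
import numpy as np

def huber_abs(x, mu):
    """Smooth approximation of |x|: |x| - mu/2 <= huber_abs(x, mu) <= |x|."""
    ax = np.abs(x)
    return np.where(ax <= mu, x**2 / (2 * mu), ax - mu / 2)

def huber_abs_grad(x, mu):
    """Gradient of huber_abs, defined everywhere (including x = 0)."""
    return np.clip(x / mu, -1.0, 1.0)

def smoothed_lassosum(R, r, lam, s=0.5, mu=1e-3, n_iter=5000):
    """Minimise a lassosum-style objective with a smoothed L1 penalty:
        f(b) = b' R_s b - 2 b' r + 2 * lam * sum(huber_abs(b, mu)),
    where R_s = (1 - s) R + s I, by gradient descent with step 1/L."""
    p = len(r)
    Rs = (1 - s) * R + s * np.eye(p)
    # Lipschitz constant of the gradient: 2 * lambda_max(R_s) + 2 * lam / mu
    L = 2 * np.linalg.eigvalsh(Rs).max() + 2 * lam / mu
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = 2 * Rs @ b - 2 * r + 2 * lam * huber_abs_grad(b, mu)
        b -= grad / L
    return b

if __name__ == "__main__":
    # Toy inputs: three independent SNPs (R = I) with marginal correlations r
    r = np.array([0.5, -0.3, 0.05])
    print(smoothed_lassosum(np.eye(3), r, lam=0.1))
```

With R = I, the unsmoothed solution is the soft-thresholded vector sign(r) * max(|r| - lam, 0). The toy example should therefore land within about mu of [0.4, -0.2, 0]. Smaller mu tightens the bound but increases L and therefore slows gradient descent.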


Author(s):  
Oliver Pain ◽  
Alexandra C. Gillett ◽  
Jehannine C. Austin ◽  
Lasse Folkersen ◽  
Cathryn M. Lewis

There is growing interest in the clinical application of polygenic scores as their predictive utility increases for a range of health-related phenotypes. Providing polygenic score predictions on the absolute scale is an important step towards their safe interpretation. We have developed a method to convert polygenic scores to the absolute scale for binary and normally distributed phenotypes. This method uses summary statistics and requires only two inputs. The first is the area under the ROC curve (AUC) or the variance explained (R²) by the polygenic score. The second is the prevalence for binary phenotypes, or the mean and standard deviation for normally distributed phenotypes. Polygenic scores are converted using normal distribution theory. We also evaluate methods for estimating polygenic score AUC/R² from genome-wide association study (GWAS) summary statistics alone. We validate the absolute risk conversion and AUC/R² estimation using data for eight binary and three continuous phenotypes in the UK Biobank sample. When the AUC/R² of the polygenic score is known, the observed and estimated absolute values were highly concordant. Estimates of AUC/R² from the lassosum pseudovalidation method were most similar to the observed AUC/R² values, though estimated values deviated substantially from the observed values for autoimmune disorders. This study enables accurate interpretation of polygenic scores using only summary statistics, providing a useful tool for educational and clinical purposes. Furthermore, we have created interactive webtools implementing the conversion to the absolute scale (https://opain.github.io/GenoPred/PRS_to_Abs_tool.html). Several further barriers must be addressed before clinical implementation of polygenic scores, such as ensuring that target individuals are well represented by the GWAS sample.
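The normal-theory conversion can be sketched for both phenotype types. For a continuous trait, a standardized score z that explains a fraction R² of the variance shifts the conditional mean by sd * sqrt(R²) * z and leaves a residual SD of sd * sqrt(1 - R²). For a binary trait, the sketch uses a liability-threshold model. It assumes the variance explained on the liability scale is already known. The step that maps an AUC to that scale, which the described method handles, is not shown. Function names and example values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def continuous_absolute(z: float, r2: float, mean: float, sd: float) -> tuple[float, float]:
    """Conditional mean and SD of a normally distributed phenotype given a
    standardized polygenic score z that explains a fraction r2 of its variance."""
    cond_mean = mean + sd * (r2 ** 0.5) * z
    cond_sd = sd * (1 - r2) ** 0.5
    return cond_mean, cond_sd

def binary_absolute_risk(z: float, r2_liability: float, prevalence: float) -> float:
    """Absolute risk under a liability-threshold model: liability ~ N(0, 1) and
    disease occurs when liability exceeds T, where P(liability > T) = prevalence.
    r2_liability is the liability-scale variance explained by the score (assumed known)."""
    t = STD_NORMAL.inv_cdf(1 - prevalence)
    cond_mean = (r2_liability ** 0.5) * z
    cond_sd = (1 - r2_liability) ** 0.5
    return 1 - STD_NORMAL.cdf((t - cond_mean) / cond_sd)

if __name__ == "__main__":
    # Hypothetical: score 2 SD above the mean, 10% of liability explained, 5% prevalence
    print(binary_absolute_risk(2.0, 0.10, 0.05))
    # Hypothetical height-like trait: mean 170, SD 10, score explains 20% of variance
    print(continuous_absolute(1.0, 0.20, 170.0, 10.0))
```

When the score explains nothing (R² = 0), the binary function returns the population prevalence and the continuous function returns the population mean and SD. This is a useful sanity check on any implementation.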


Author(s):  
Ivan Wolansky

Deep learning is a type of machine learning (ML) that is growing in importance in the medical field. It can often outperform traditional ML models on various metrics, and its activation functions allow it to handle non-linear problems. Activation functions are non-linear functions applied to a layer's outputs; some, such as the sigmoid, also restrict the propagated values to a bounded interval. In deep learning, information propagates forward through successive layers of weights and activation functions before reaching the final layer. A cost function is then evaluated, and its gradient is propagated back through the network to adjust the weights. A convolutional neural network (CNN) is a form of deep learning used primarily in imaging. CNNs perform particularly well on grid-like inputs because they learn shapes well. In a convolutional layer, a CNN computes dot products between the layer's input and a set of kernels. A pooling step follows, which outputs summary statistics of nearby values. CNNs outperform plain fully connected neural networks for imaging for several reasons, such as sparse interactions and translation equivariance.
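The convolution, activation and pooling steps described above can be sketched with NumPy for a single-channel image, with no padding and a stride of 1. As in most deep-learning libraries, "convolution" here is implemented as cross-correlation, so the kernel is not flipped. The kernel values and image are arbitrary illustrative choices.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: each output cell is the dot product of
    the kernel with the image patch beneath it (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """ReLU activation: non-linear, but unbounded above."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling: each output is a summary statistic (the max)
    of a size x size block; rows/columns that do not fill a block are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

if __name__ == "__main__":
    image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)  # a vertical edge
    kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])        # responds to left-to-right increases
    features = relu(conv2d_valid(image, kernel))
    print(features)
    print(max_pool(features))
```

Shifting the input edge one column to the right shifts the feature map one column to the right as well. This is translation equivariance. Each output depends on only a 2 x 2 patch, which is the sparse interaction property.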


Author(s):  
Arunabha Majumdar ◽  
Preksha Patel ◽  
Bogdan Pasaniuc ◽  
Roel A. Ophoff

In genetic studies of psychiatric disorders in the pre-genome-wide association study (GWAS) era, one of the most commonly studied loci was the serotonin transporter (SLC6A4) promoter polymorphism (5-HTTLPR), a 43-base-pair insertion/deletion polymorphism in the promoter region. The genetic association signals between 5-HTTLPR and psychiatric phenotypes, however, have been inconsistent across many studies. Because the polymorphism cannot be tested via available SNP arrays, we previously proposed an efficient machine learning algorithm to predict 5-HTTLPR genotypes from the genotypes of eight nearby SNPs. That algorithm requires access to individual-level genotype and phenotype data. To take advantage of publicly available GWAS summary statistics from studies with very large sample sizes, we developed a GWAS summary-statistics-based approach for testing associations between variable number of tandem repeat (VNTR) polymorphisms and various phenotypes. We first verified the accuracy of the summary-statistics-based approach for 61 phenotypes in the UK Biobank. We observed strong similarity between the approach based on predicted individual-level 5-HTTLPR genotypes and the summary-statistics-based approach. We then applied our method to the available neurobehavioral summary statistics from large-scale GWAS. We found no genome-wide significant evidence for association between 5-HTTLPR and any of the neurobehavioral traits. We did, however, observe genome-wide significant evidence for association between this locus and human adult height, BMI and total cholesterol. Our summary-statistics-based approach provides a systematic way to examine the role of VNTRs and related types of genetic polymorphisms in disease risk and trait susceptibility for phenotypes with large-scale GWAS summary statistics available.
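A standard formula shows how SNP-level summary statistics can be turned into a test of a genotype predicted from nearby SNPs. The same idea underlies summary-statistics-based transcriptome-wide association tests. The sketch assumes the predictor is a linear combination of standardized SNP genotypes and that an LD reference matrix is available. The authors' predictor is a machine learning model, so treating it as linear is an assumption here, and their exact procedure may differ. All weights, z-scores and LD values are hypothetical.

```python
import numpy as np

def predictor_zscore(w, z, R) -> float:
    """Association z-score of g = sum_j w_j * x_j, where x_j are standardized
    genotypes of nearby SNPs, computed from summary data only:
        z_g = (w' z) / sqrt(w' R w)
    w: predictor weights, z: per-SNP GWAS z-scores, R: SNP-SNP LD correlations."""
    w = np.asarray(w, dtype=float)
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    return float(w @ z / np.sqrt(w @ R @ w))

if __name__ == "__main__":
    # Hypothetical weights, z-scores and LD for three SNPs near the repeat
    w = [0.6, 0.3, -0.2]
    z = [4.1, 3.5, -1.0]
    R = [[1.0, 0.4, 0.1],
         [0.4, 1.0, 0.2],
         [0.1, 0.2, 1.0]]
    print(predictor_zscore(w, z, R))
```

The denominator sqrt(w' R w) is the standard deviation of the predicted genotype. Ignoring LD (setting R = I) would therefore misstate its variance and the resulting z-score.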


2021 ◽  
Vol 23 ◽  
Author(s):  
Pei He ◽  
Rong-Rong Cao ◽
Fei-Yan Deng ◽
Shu-Feng Lei

Background: Immune and skeletal systems interact with each other both physiologically and pathologically. Immune and skeletal diseases may share pleiotropic genetic factors, but the specific shared genes are largely unknown. Objective: This study aimed to investigate the genetic factors shared among six diseases: rheumatoid arthritis (RA), psoriasis, osteoporosis, osteoarthritis, sarcopenia and fracture. Methods: The summary-statistics-based canonical correlation analysis (metaCCA) approach was used to identify genes shared by the six diseases by integrating genome-wide association study (GWAS) summary statistics. The Versatile Gene-based Association Study (VEGAS2) method was then applied to refine and validate the putative pleiotropic genes identified by metaCCA. Results: About 157 (p<8.19E-6), 319 (p<3.90E-6) and 77 (p<9.72E-6) potential pleiotropic genes were identified as shared by the two immune diseases, the four skeletal diseases, and all six diseases, respectively. The top three putative pleiotropic genes shared by both immune and skeletal diseases were HLA-B, TSBP1 and TSBP1-AS1 (p<1E-300), all located in the major histocompatibility complex (MHC) region. Nineteen of the 77 putative pleiotropic genes identified by metaCCA were associated with at least one disease in the VEGAS2 analysis. The majority (18) of these 19 validated genes were associated with RA. Conclusion: The metaCCA method identified some pleiotropic genes shared by immune and skeletal diseases. These findings help to improve our understanding of the shared genetic mechanisms and signaling pathways underlying immune and skeletal diseases.
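The simplest case of canonical correlation analysis from summary data is one SNP tested against several traits. The sketch below illustrates that case. The squared canonical correlation is c' R_YY^{-1} c, where c holds the SNP-trait correlations and R_YY is the phenotypic correlation matrix between traits. Bartlett's approximation then gives a chi-square statistic with q degrees of freedom. Several assumptions apply: genotypes and phenotypes are standardized, effects are small so that each correlation is approximately z/sqrt(N), and one sample size is shared by all traits. metaCCA's covariance shrinkage and its multi-SNP (gene-level) analysis are not shown. A p-value could be obtained with `scipy.stats.chi2.sf(stat, df)`.

```python
import numpy as np

def single_snp_cca(z, trait_corr, n):
    """Canonical correlation between one SNP and q traits from summary data.

    z          : per-trait GWAS z-scores for the SNP (length q)
    trait_corr : q x q phenotypic correlation matrix between the traits
    n          : sample size (assumed shared across traits)
    Returns (rho_squared, bartlett_statistic, degrees_of_freedom).
    """
    z = np.asarray(z, dtype=float)
    Ryy = np.asarray(trait_corr, dtype=float)
    q = len(z)
    c = z / np.sqrt(n)                      # approximate SNP-trait correlations
    rho2 = float(c @ np.linalg.solve(Ryy, c))
    # Bartlett: -(N - 1 - (p + q + 1) / 2) * ln(1 - rho^2), with p = 1 SNP
    stat = -(n - 1 - (1 + q + 1) / 2) * np.log(1 - rho2)
    return rho2, float(stat), q

if __name__ == "__main__":
    # Hypothetical z-scores for one SNP on two correlated traits
    print(single_snp_cca([3.0, 4.0], [[1.0, 0.3], [0.3, 1.0]], n=10000))
```

Accounting for the correlation between traits matters. The same z-scores yield a weaker multivariate signal when the traits are positively correlated, because part of the evidence is then redundant.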


2021 ◽  
Vol 8 ◽  
Author(s):  
Jake Labriola ◽  
Rebecca Garabed ◽  
Carly Sinclair ◽  
Antoinette E. Marsh

There is increasing concern within the veterinary medical community (veterinarians and veterinary students) that disgruntled clients are unfairly leveraging various legal tools against veterinarians. Clinical veterinarians and veterinary students should be aware of the most common types of problems arising within the clinic and how they can lead to formal consumer complaints. This study describes and categorizes in greater detail the types of violations, or “causes for discipline”, that occur. It also describes the specific sanctions imposed on veterinarians formally disciplined for standard-of-care-related violations in California between 2017 and 2019. In addition, the study calculated the frequency of disciplinary actions and basic summary statistics on the timeline over which lawsuits typically unfold. Using public documents from California, the study describes the analysis and trends in order to provide contextual evidence that can inform and guide potential veterinary educational interventions. Although specific to California, this study can serve as a template methodology for comparisons with other states.


2021 ◽  
Author(s):  
Gulnara R. Svishcheva ◽  
Evgeny S. Tiys ◽  
Elizaveta E. Elgaeva ◽  
Sofia G. Feoktistova ◽  
Paul R. H. J. Timmers ◽  
...  

We propose a novel, effective framework for analyzing the shared genetic background of a set of genetically correlated traits using SNP-level GWAS summary statistics. This framework, called SHAHER, is based on constructing a linear combination of traits that maximizes the proportion of its genetic variance explained by the shared genetic factors. SHAHER requires only full GWAS summary statistics and matrices of genetic and phenotypic correlations between traits as inputs. Our framework allows both shared and unshared genetic factors to be analyzed effectively. We tested our framework using simulation studies, compared it with previously developed methods, and assessed its performance using three real datasets: anthropometric traits, psychiatric conditions and lipid concentrations. SHAHER is versatile: it is applicable to summary statistics from GWASs with arbitrary sample sizes and sample overlaps, allows incorporation of different GWAS models (Cox, linear and logistic), and is computationally fast.
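Suppose the criterion above, the share of a trait combination's genetic variance attributable to shared factors, is written as a ratio of two quadratic forms in the trait weights w. That framing is an assumption made for this sketch. Maximizing such a ratio is a generalized eigenvalue problem. The generic sketch takes a symmetric numerator matrix A (for example, a hypothetical shared-factor genetic covariance) and a positive-definite denominator matrix B (for example, the total genetic covariance). How SHAHER actually estimates these quantities from summary statistics and correlation matrices is not reproduced. The matrices in the example are made up.

```python
import numpy as np

def max_ratio_weights(A, B):
    """Weights w maximizing w'Aw / w'Bw (A symmetric, B positive definite),
    i.e. the top generalized eigenvector of (A, B). Returns (w, max_ratio)."""
    L = np.linalg.cholesky(B)              # B = L L'
    Linv = np.linalg.inv(L)
    M = Linv @ A @ Linv.T                  # symmetric; same eigenvalues as B^-1 A
    vals, vecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    w = Linv.T @ vecs[:, -1]               # map the top eigenvector back to w
    return w / np.linalg.norm(w), float(vals[-1])

if __name__ == "__main__":
    # Hypothetical 3-trait matrices: A = shared-factor genetic covariance,
    # B = total genetic covariance (values invented for illustration)
    A = np.array([[0.30, 0.20, 0.10],
                  [0.20, 0.25, 0.10],
                  [0.10, 0.10, 0.10]])
    B = np.array([[0.50, 0.20, 0.10],
                  [0.20, 0.40, 0.10],
                  [0.10, 0.10, 0.30]])
    w, ratio = max_ratio_weights(A, B)
    print(w, ratio)
```

The Cholesky transform turns the two-matrix problem into an ordinary symmetric eigenproblem, so `eigh` can be used. The ratio is unchanged by rescaling w, which is why the weights are normalized to unit length.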


2021 ◽  
Author(s):  
Tham Hoang ◽  
Giang Vu ◽  
Mai Tran ◽  
Nam Vo ◽  
Quang Le ◽  
...  

Background: A global pandemic has been declared for coronavirus disease 2019 (COVID-19), which has serious impacts on human health and on healthcare systems in the affected areas, including Vietnam. No previous study has provided a framework that both summarizes the virus variants and assesses the severity associated with virus proteins and host cells in COVID-19 patients in Vietnam. Method: In this paper, we comprehensively investigated SARS-CoV-2 variants and immune responses in COVID-19 patients in Vietnam. We provided summary statistics of a target sequence of SARS-CoV-2 for data scientists to use in downstream analyses of therapeutic targets. For host cells, we proposed a predictive model of COVID-19 severity based on public datasets of hospitalization status in Vietnam, incorporating a polygenic risk score. This score uses immunogenic SNP biomarkers as indicators of COVID-19 severity. Result: Using various data sources, we found that the Delta variant of SARS-CoV-2 is the most prevalent in southern Vietnam and that this pattern differs from other areas of the world. Our predictive models of COVID-19 severity had high accuracy (Random Forest AUC = 0.81, Elastic Net AUC = 0.7, and SVM AUC = 0.69). They also showed that the use of polygenic risk scores increased the models’ predictive capabilities. Conclusion: We provided a comprehensive analysis of COVID-19 severity in Vietnam. This investigation could not only support therapeutic-target studies for COVID-19 treatment but also inform further research on disease progression and personalized clinical outcomes.
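In its most common form, a polygenic risk score of the kind used as a model feature above is a weighted sum of effect-allele dosages. The sketch assumes that form. The SNP identifiers and weights are hypothetical, because the study's immunogenic SNP set and weights are not given here. How the score was combined with the Random Forest, Elastic Net and SVM models is not shown.

```python
def polygenic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2 copies) for one individual.
    Only SNPs present in both `dosages` and `weights` contribute to the score."""
    return sum(weights[snp] * dose for snp, dose in dosages.items() if snp in weights)

if __name__ == "__main__":
    # Hypothetical SNP IDs and per-allele weights (e.g. log odds ratios)
    weights = {"snpA": 0.12, "snpB": -0.05, "snpC": 0.30}
    person = {"snpA": 2, "snpB": 1, "snpC": 0, "snpD": 1}  # snpD has no weight
    print(polygenic_risk_score(person, weights))
```

Skipping SNPs without a weight, rather than raising an error, mirrors the common practice of scoring only the overlap between genotyped variants and the weight set. In practice, that overlap should be reported.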

