Missingness Adapted Group Informed Clustered (MAGIC)-LASSO: A novel paradigm for prediction in data with widespread non-random missingness

2021 ◽  
Author(s):  
Amanda E. Gentry ◽  
Robert M. Kirkpatrick ◽  
Roseann E. Peterson ◽  
Bradley T. Webb

Abstract The availability of large-scale biobanks linking rich phenotypes and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive non-random missing data. Machine learning methods are able to predict missing data, but performance is significantly impaired by the block-wise missingness inherent to many biobanks. To address this, we developed Missingness Adapted Group-wise Informed Clustered LASSO (MAGIC-LASSO), which performs hierarchical clustering of variables based on missingness followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and for balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (n>500k) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were >0.86, demonstrating that the method has significant accuracy and utility.
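The first step described in the abstract, grouping variables by their missingness pattern, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a distance metric, linkage rule, or cut-off, so the Jaccard distance, average linkage, `threshold` value, function names, and toy data below are all hypothetical. The subsequent Group LASSO step is not shown.

```python
from itertools import combinations


def missingness_distance(col_a, col_b):
    """Jaccard distance between the sets of rows where each variable is missing.

    Assumption: None marks a missing value. The metric itself is an
    illustrative choice, not one stated in the abstract.
    """
    miss_a = {i for i, v in enumerate(col_a) if v is None}
    miss_b = {i for i, v in enumerate(col_b) if v is None}
    union = miss_a | miss_b
    if not union:
        return 0.0  # both variables fully observed: identical patterns
    return 1.0 - len(miss_a & miss_b) / len(union)


def cluster_by_missingness(columns, threshold):
    """Average-linkage agglomerative clustering of variables.

    columns:   dict mapping variable name -> list of values (None = missing)
    threshold: merging stops once the closest pair of clusters is farther
               apart than this distance (hypothetical tuning parameter)
    """
    names = list(columns)
    dist = {
        frozenset((a, b)): missingness_distance(columns[a], columns[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Toy data with block-wise missingness (entirely made up).
data = {
    "age":      [50, 61, 47, 58, 66, 52],
    "audit_q1": [1, 2, None, None, None, 0],
    "audit_q2": [0, 3, None, None, None, 1],
    "diet_q1":  [None, None, 2, 1, None, None],
    "diet_q2":  [None, None, 4, 3, 2, None],
}
groups = cluster_by_missingness(data, threshold=0.5)
print(groups)
```

Variables that are missing for the same participants end up in the same cluster, so a model fitted within a cluster can use nearly complete rows rather than discarding most of the sample.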

Author(s):  
Uriel Singer ◽  
Kira Radinsky ◽  
Eric Horvitz

Abstract Summary: How do nuances of scientists’ attention influence what they discover? We pursue an understanding of the influences of patterns of attention on discovery with a case study about confirmations of protein–protein interactions over time. We find that modeling and accounting for attention can help us to recognize and interpret biases in large-scale and widely used databases of confirmed interactions and to better understand missing data and unknowns. Additionally, we present an analysis of how awareness of patterns of attention and use of debiasing techniques can foster earlier discoveries. Availability and implementation: The data are freely available at https://github.com/urielsinger/PPI-unbias.


2019 ◽  
Author(s):  
Shareefa Dalvie ◽  
Adam X. Maihofer ◽  
Jonathan R.I. Coleman ◽  
Bekh Bradley ◽  
Gerome Breen ◽  
...  

Abstract Childhood maltreatment is highly prevalent and serves as a risk factor for mental and physical disorders. Self-reported childhood maltreatment appears heritable, but the specific genetic influences on this phenotype are largely unknown. The aims of this study were to 1) identify genetic variation associated with reported childhood maltreatment, 2) calculate the relevant SNP-based heritability estimates, and 3) quantify the genetic overlap of reported childhood maltreatment with mental and physical health-related phenotypes. Genome-wide association analysis for childhood maltreatment was undertaken, using a discovery sample from the UK Biobank (UKBB) (n=124,000) and a replication sample from the Psychiatric Genomics Consortium–posttraumatic stress disorder working group (PGC-PTSD) (n=26,290). Heritability estimates for childhood maltreatment and genetic correlations with mental/physical health traits were calculated using linkage disequilibrium score regression (LDSR). Two genome-wide significant loci associated with childhood maltreatment, located on chromosomes 3p13 (rs142346759, beta=0.015, p=4.35×10⁻⁸, FOXP1) and 7q31.1 (rs10262462, beta=-0.016, p=3.24×10⁻⁸, FOXP2), were identified in the discovery dataset but were not replicated in the PGC-PTSD sample. SNP-based heritability for childhood maltreatment was estimated to be ∼6%. Childhood maltreatment was most significantly genetically correlated with depressive symptoms (rg=0.70, p=4.65×10⁻⁴⁰). This is the first large-scale genetic study to identify specific variants associated with self-reported childhood maltreatment. FOXP genes could influence traits such as depression and thereby be relevant to childhood maltreatment. Alternatively, these variants may be associated with a greater likelihood of reporting maltreatment. A clearer understanding of the genetic relationships of childhood maltreatment, including particular abuse subtypes, with various psychiatric disorders may ultimately be useful in developing targeted treatment and prevention strategies.


2018 ◽  
Author(s):  
B.M.L. Baselmans ◽  
M. Bartels

Abstract Whether hedonism and eudaimonism are two distinguishable forms of well-being is a topic of ongoing debate. To shed light on the relation between the two, large-scale available molecular genetic data were leveraged to gain more insight into the genetic architecture of the overlap between hedonic and eudaimonic well-being. Hence, we conducted the first genome-wide association studies (GWAS) of eudaimonic well-being (N = ∼108K) and linked them to a GWAS of hedonic well-being (N = ∼222K). We identified the first two genome-wide significant independent loci for eudaimonic well-being and six independent loci for hedonic well-being. Joint analyses revealed a moderate phenotypic correlation (r = 0.53) but a high genetic correlation (rg = 0.78) between eudaimonic and hedonic well-being. For both traits, enrichment in the frontal cortex and cingulate cortex, as well as the cerebellum, was top ranked. Bi-directional Mendelian randomization analyses using two-sample MR indicated some evidence for a causal relationship from hedonic well-being to eudaimonic well-being, whereas no evidence was found for the reverse. Additionally, genetic correlation patterns with a range of positively and negatively related phenotypes were largely similar for hedonic and eudaimonic well-being. Our results reveal a large genetic overlap between hedonism and eudaimonism.


2019 ◽  
Vol 89 (10) ◽  
pp. 1055-1073 ◽  
Author(s):  
Nicolaas Molenaar ◽  
Marita Felder

ABSTRACT Dolomite is a common and volumetrically important mineral in many siliciclastic sandstones, including Permian Rotliegend sandstones (the Slochteren Formation). These sandstones form extensive gas reservoirs in the Southern Permian Basin in the Netherlands, Germany, Poland, and the UK. The reservoir quality of these sandstones is negatively influenced by the content and distribution of dolomite. The origin and the stratigraphic distribution of the dolomite are not yet fully understood. The aim of this study is to identify the origin of the carbonate. The main methods used to achieve that aim are a combination of thin-section petrography, scanning electron microscopy (SEM and EDX), and XRD analyses. The present study shows that the typical dispersed occurrence of the dolomite is a consequence of dispersed detrital carbonate grains that served both as nuclei and as a source for authigenic dolomite cement. The dolomite cement formed syntaxial outgrowths and overgrowths around detrital carbonate grains. The study also shows that dolomite cement, often in combination with ankerite and siderite, precipitated during burial after mechanical compaction. Most of the carbonate grains consisted of dolomite before deposition. The carbonate grains were affected by compaction and pressure dissolution, and commonly no longer have well-defined outlines. The distribution of dolomite cement in the Rotliegend sandstones was controlled by the presence of stable carbonate grains. Due to the restricted and variable content of carbonate grains and their dispersed occurrence, the cement is also dispersed and the degree of cementation is heterogeneous. Our findings have important implications for diagenesis modeling. The presence of detrital carbonate excludes the need for external supply by any large-scale advective flow of diagenetic fluids. By knowing that the carbonate source is local and related to detrital grains instead of being externally derived from an unknown source, the presence of carbonate cement can be linked to a paleogeographic and sedimentological model.


2021 ◽  
Author(s):  
Anik Dutta ◽  
Fanny E. Hartmann ◽  
Carolina Sardinha Francisco ◽  
Bruce A. McDonald ◽  
Daniel Croll

Abstract The adaptive potential of pathogens in novel or heterogeneous environments underpins the risk of disease epidemics. Antagonistic pleiotropy or differential resource allocation among life-history traits can constrain pathogen adaptation. However, we lack understanding of how the genetic architecture of individual traits can generate trade-offs. Here, we report a large-scale study based on 145 global strains of the fungal wheat pathogen Zymoseptoria tritici from four continents. We measured 50 life-history traits, including virulence and reproduction on 12 different wheat hosts and growth responses to several abiotic stressors. To elucidate the genetic basis of adaptation, we used genome-wide association mapping coupled with genetic correlation analyses. We show that most traits are governed by polygenic architectures and are highly heritable, suggesting that adaptation proceeds mainly through allele frequency shifts at many loci. We identified negative genetic correlations among traits related to host colonization and survival in stressful environments. Such genetic constraints indicate that pleiotropic effects could limit the pathogen’s ability to cause host damage. In contrast, adaptation to abiotic stress factors was likely facilitated by synergistic pleiotropy. Our study illustrates how comprehensive mapping of life-history trait architectures across diverse environments allows prediction of the evolutionary trajectories of pathogens confronted with environmental perturbations.


Genetics ◽  
1996 ◽  
Vol 143 (3) ◽  
pp. 1409-1416 ◽  
Author(s):  
Kenneth R Koots ◽  
John P Gibson

Abstract A data set of 1572 heritability estimates and 1015 pairs of genetic and phenotypic correlation estimates, constructed from a survey of published beef cattle genetic parameter estimates, provided a rare opportunity to study realized sampling variances of genetic parameter estimates. The distributions of heritability estimates and genetic correlation estimates, when plotted against estimated accuracy, were consistent with random error variance being some three times the sampling variance predicted from standard formulae. This result was consistent with the observation that the variance of estimates of heritabilities and genetic correlations between populations was about four times the predicted sampling variance, suggesting few real differences in genetic parameters between populations. Except where there was a strong biological or statistical expectation of a difference, there was little evidence for differences between genetic and phenotypic correlations for most trait combinations or for differences in genetic correlations between populations. These results suggest that, even for controlled populations, estimating genetic parameters specific to a given population is less useful than commonly believed. A serendipitous discovery was that, in the standard formula for the theoretical standard error of a genetic correlation estimate, the heritabilities refer to the estimated values and not, as seems generally assumed, the true population values.
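For orientation, one widely quoted textbook approximation for the standard error of a genetic correlation estimate is shown below. The abstract does not reproduce its formula, so treating this as the "standard formula" it refers to is an assumption. The hats mark estimated quantities, in line with the abstract's point that the heritabilities in the formula are the estimates rather than the true population values.

```latex
\sigma_{\hat{r}_g} \;\approx\; \frac{1 - \hat{r}_g^{\,2}}{\sqrt{2}}
\sqrt{\frac{\sigma_{\hat{h}_x^{2}}\,\sigma_{\hat{h}_y^{2}}}{\hat{h}_x^{2}\,\hat{h}_y^{2}}}
```

Here \hat{r}_g is the estimated genetic correlation between traits x and y, \hat{h}_x^2 and \hat{h}_y^2 are the estimated heritabilities, and \sigma denotes the standard error of the subscripted estimate.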


Science ◽  
2021 ◽  
pp. eabf2946
Author(s):  
Louis du Plessis ◽  
John T. McCrone ◽  
Alexander E. Zarebski ◽  
Verity Hill ◽  
Christopher Ruis ◽  
...  

The UK’s COVID-19 epidemic during early 2020 was one of the world’s largest and was unusually well represented by virus genomic sampling. Here we reveal the fine-scale genetic lineage structure of this epidemic through analysis of 50,887 SARS-CoV-2 genomes, including 26,181 from the UK sampled throughout the country’s first wave of infection. Using large-scale phylogenetic analyses, combined with epidemiological and travel data, we quantify the size, spatio-temporal origins and persistence of genetically distinct UK transmission lineages. Rapid fluctuations in virus importation rates resulted in >1000 lineages; those introduced prior to national lockdown tended to be larger and more dispersed. Lineage importation and regional lineage diversity declined after lockdown, while lineage elimination was size-dependent. We discuss the implications of our genetic perspective on transmission dynamics for COVID-19 epidemiology and control.


2021 ◽  
Author(s):  
Lianne P. de Vries ◽  
Toos C. E. M. van Beijsterveldt ◽  
Hermine Maes ◽  
Lucía Colodro-Conde ◽  
Meike Bartels

Abstract The distinction between genetic influences on the covariance (or bivariate heritability) and genetic correlations in bivariate twin models is often not well understood, or only one is reported, even though the two convey distinct information about the relation between traits. We applied bivariate twin models in a large sample of adolescent twins to disentangle the association between well-being (WB) and four complex traits (optimism, anxious-depressed symptoms (AD), aggressive behaviour (AGG), and educational achievement (EA)). Optimism and AD showed, respectively, a strong positive and a strong negative phenotypic correlation with WB; the negative correlation between WB and AGG was weaker, and the correlation with EA was nearly zero. All four traits showed a large genetic contribution to the covariance with well-being. The genetic correlations of well-being with optimism and AD were strong, and those with AGG and EA were smaller. We used the results of the models to explain what information is retrieved from the bivariate heritability versus the genetic correlations, and the (clinical) implications.
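The difference between the two quantities can be made concrete with a small calculation. The sketch below uses the standard decomposition in which the genetic part of the phenotypic correlation is sqrt(h2_x) * sqrt(h2_y) * r_g, and the bivariate heritability is that part divided by the phenotypic correlation. The function name and all input numbers are hypothetical and are not taken from the study.

```python
import math


def bivariate_heritability(h2_x, h2_y, r_g, r_p):
    """Share of the phenotypic correlation r_p attributable to genetic factors.

    h2_x, h2_y: heritabilities of the two traits
    r_g:        genetic correlation between the traits
    r_p:        phenotypic correlation between the traits (must be non-zero)
    """
    genetic_part = math.sqrt(h2_x) * math.sqrt(h2_y) * r_g
    return genetic_part / r_p


# Hypothetical pair A: modest genetic correlation, weak phenotypic correlation.
share_a = bivariate_heritability(h2_x=0.4, h2_y=0.4, r_g=0.5, r_p=0.25)

# Hypothetical pair B: strong genetic correlation, strong phenotypic correlation.
share_b = bivariate_heritability(h2_x=0.4, h2_y=0.4, r_g=0.9, r_p=0.6)

print(f"pair A: r_g = 0.5, bivariate heritability = {share_a:.2f}")
print(f"pair B: r_g = 0.9, bivariate heritability = {share_b:.2f}")
```

In this made-up example, pair A has the smaller genetic correlation but the larger genetic share of the covariance, which is the kind of pattern the abstract describes for traits with weak phenotypic correlations.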


Author(s):  
Prasad Nagakumar ◽  
Ceri-Louise Chadwick ◽  
Andrew Bush ◽  
Atul Gupta

Abstract The COVID-19 pandemic caused by the SARS-CoV-2 virus fortunately resulted in few children suffering from severe disease. However, the collateral effects of the COVID-19 pandemic appear to have had significant detrimental effects on affected children and young people. There are also some positive impacts in the form of reduced prevalence of viral bronchiolitis. The new strain of SARS-CoV-2 identified recently in the UK appears to have increased transmissibility to children. However, there are no large vaccine trials set up in children to evaluate safety and efficacy. In this short communication, we review the collateral effects of the COVID-19 pandemic in children and young people. We highlight the need for urgent strategies to mitigate the risks to children due to the COVID-19 pandemic.
What is Known:
• Children and young people account for <2% of all COVID-19 hospital admissions
• The collateral impact of the COVID-19 pandemic on children and young people is devastating
• Significant reduction in influenza and respiratory syncytial virus (RSV) infection in the southern hemisphere
What is New:
• The public health measures to reduce COVID-19 infection may have also resulted in near elimination of influenza and RSV infections across the globe
• A COVID-19 vaccine has been licensed for adults. However, large-scale vaccine studies are yet to be initiated, although there is emerging evidence of the new SARS-CoV-2 strain spreading more rapidly through young people
• Children and young people continue to bear the collateral effects of the COVID-19 pandemic

