RubricOE: a learning framework for genetic epidemiology

Author(s):  
Subrata Saha ◽  
Aldo Guzmán-Sáenz ◽  
Aritra Bose ◽  
Filippo Utro ◽  
Daniel E. Platt ◽  
...  

Abstract Genetic epidemiology has been a growing area of interest in recent years, owing to the availability of genetic data and the decreasing cost of sequencing. Machine learning (ML) algorithms can be a very useful tool for studying the effect of genetic factors on disease incidence or on different traits characterizing a population. Many challenges plague the field of genetic epidemiology, including unbalanced case-control data sets, the fallibility of standard genome-wide association studies based on single-marker analysis, and heavily underdetermined systems with millions of markers but only a few thousand samples, to name a few. Ensemble ML methods can help tackle many of these challenges, and thus we propose RubricOE, a pipeline of ML algorithms with error bar computations to obtain interpretable genetic and non-genetic features from genomic or transcriptomic data combined with clinical factors in the form of electronic health records. RubricOE is shown to be robust in simulation studies, detecting true associations with traits of interest in arbitrarily structured multi-ethnic populations.

2016 ◽  
Author(s):  
Piotr Szulc ◽  
Malgorzata Bogdan ◽  
Florian Frommlet ◽  
Hua Tang

Abstract In Genome-Wide Association Studies (GWAS), genetic loci that influence complex traits are localized by inspecting associations between genotypes of genetic markers and the values of the trait of interest. Admixture Mapping, on the other hand, which is performed for populations consisting of a recent mix of two ancestral groups, relies on the ancestry information at each locus (locus-specific ancestry). Recently it has been proposed to jointly model genotype and locus-specific ancestry within the framework of single-marker tests. Here we extend this approach for population-based GWAS in the direction of multi-marker models. A modified version of the Bayesian Information Criterion is developed for building a multi-locus model, which accounts for the differential correlation structure due to linkage disequilibrium and admixture linkage disequilibrium. Simulation studies and a real data example illustrate the advantages of this new approach compared to single-marker analysis and modern model selection strategies based on separately analyzing genotype and ancestry data, as well as to single-marker analysis combining genotypic and ancestry information. Depending on the signal strength, our procedure automatically chooses whether genotypic or locus-specific ancestry markers are added to the model. This results in a good compromise between the power to detect causal mutations and the precision of their localization. The proposed method has been implemented in R and is available at http://www.math.uni.wroc.pl/~mbogdan/admixtures/.


2018 ◽  
Author(s):  
Jianjun Zhang ◽  
Zihan Zhao ◽  
Xuan Guo ◽  
Bin Guo ◽  
Baolin Wu

Genome-wide association studies (GWAS) have thus far achieved substantial success. In the last decade a large number of common variants underlying complex diseases have been identified through GWAS. In most existing GWAS, the identified common variants are obtained by single-marker-based tests, that is, by testing one single nucleotide polymorphism (SNP) at a time. However, the basic functional unit of inheritance is generally a gene rather than a SNP. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation. In this paper, we propose a general gene-based p-value adaptive combination approach (GPA) which can integrate association evidence of multiple genetic variants using only GWAS summary statistics (either p-values or other test statistics). The proposed method can be used to test both continuous and binary traits, in a single study or across multiple studies, which helps overcome the limitation of existing methods that can only be applied to specific types of data. We conducted thorough simulation studies to verify that the proposed method controls type I errors well and performs favorably compared to single-marker analysis and other existing methods. We demonstrated the utility of our proposed method through analysis of GWAS meta-analysis results for fasting glucose and lipids from the international MAGIC consortium and Global Lipids Consortium, respectively. The proposed method identified some novel trait-associated genes which can improve our understanding of the mechanisms involved in β-cell function, glucose homeostasis and lipid traits.
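
To make the idea of combining SNP-level evidence into one gene-level p-value concrete, here is a minimal Python sketch. It does not implement GPA, whose details the abstract does not give; it uses the classical Fisher's method as a stand-in, which assumes the p-values are independent (an assumption that SNPs in linkage disequilibrium would violate). The function name `fisher_combine` and the example p-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (not the GPA method)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null hypothesis it
    # follows a chi-square distribution with 2k degrees of freedom.
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # For even degrees of freedom 2k the survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical single-SNP p-values for the SNPs mapped to one gene.
hypothetical_gene_snps = [0.04, 0.20, 0.03]
print(fisher_combine(hypothetical_gene_snps))
```

Because only p-values are needed, a combination of this kind works from summary statistics alone, which is the property the abstract emphasizes. A method meant for real GWAS data would also have to account for correlation between SNPs.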


Cephalalgia ◽  
2015 ◽  
Vol 36 (7) ◽  
pp. 658-668 ◽  
Author(s):  
Rainer Malik ◽  
Bendik Winsvold ◽  
Eva Auffenberg ◽  
Martin Dichgans ◽  
Tobias Freilinger

Background A complex relationship between migraine and vascular disease has long been recognized. The pathophysiological basis underlying this correlation is incompletely understood. Aim The aim of this review is to focus on the migraine–vascular disorders connection from a genetic perspective, illustrating potentially shared (molecular) mechanisms. Results We first summarize the clinical presentation and genetic basis of CADASIL and other monogenic vascular syndromes with migraine as a prominent disease manifestation. Based on data from transgenic mouse models for familial hemiplegic migraine, we then discuss cortical spreading depression as a potential mechanistic link between migraine and ischemic stroke. Finally, we review data from genome-wide association studies, with a focus on overlapping findings with cervical artery dissection, ischemic stroke in general and cardiovascular disease. Conclusion A wealth of data supports a genetic link between migraine and vascular disease. Based on growing high-throughput data-sets, new genotyping techniques and in-depth phenotyping, further insights are expected for the future.


2017 ◽  
Author(s):  
Lavinia Paternoster ◽  
Kate Tilling ◽  
George Davey Smith

The past decade has been proclaimed as a hugely successful era of gene discovery through the high yields of many genome-wide association studies (GWAS). However, much of the perceived benefit of such discoveries lies in the promise that the identification of genes that influence disease would directly translate into the identification of potential therapeutic targets (1-4), but this has yet to be realised at a level reflecting expectation. One reason for this, we suggest, is that GWAS to date have generally not focused on phenotypes that directly relate to the progression of disease, and thus speak to disease treatment.


2021 ◽  
Author(s):  
Guangchao Sun ◽  
Ravi V. Mural ◽  
Jonathan D. Turkus ◽  
James C. Schnable

Southern rust is a severe foliar disease of maize (Zea mays) resulting from infection with the obligate biotrophic fungus Puccinia polysora. This disease reduces photosynthetic productivity, which in turn reduces yields, with the greatest yield losses (up to 50%) associated with earlier onset infections. P. polysora urediniospores overwinter only in tropical and subtropical regions but cause outbreaks when environmental conditions favor initial infection. Increased temperatures and humidity during the growing season combined with an increased frequency of moderate winters are likely to increase the frequency of severe southern rust outbreaks in the US corn belt. In summer 2020, a severe outbreak of southern rust was observed in eastern Nebraska (NE), USA. We scored a replicated maize association panel planted in Lincoln, NE for disease severity and found that disease incidence and severity showed significant variation among maize genotypes. Genome-wide association studies identified four loci associated with significant quantitative variation in disease severity. These loci were associated with candidate genes with plausible links to quantitative disease resistance. A transcriptome-wide association study identified additional genes associated with disease severity. Together, these results indicate that substantial diversity in resistance to southern rust exists among current temperate-adapted maize germplasm, including several candidate loci that may explain the observed variation in resistance to southern rust.


2020 ◽  
Vol 4 (Supplement_1) ◽  
Author(s):  
Flávia Rezende Tinano ◽  
Ana Pinheiro Machado Canton ◽  
Luciana Ribeiro Montenegro ◽  
Andrea de Castro Leal ◽  
Carolina Ramos ◽  
...  

Abstract Context: The clinical recognition of familial central precocious puberty (CPP) has significantly increased in recent years. This fact can be related to the recent descriptions of genetic causes associated with this pediatric condition, such as loss-of-function mutations of two imprinted genes (MKRN3 and DLK1). Inherited defects in both genes cause paternally inherited CPP. However, no genetic abnormality has been described in families with maternally inherited CPP so far. Objectives: To characterize the clinical and genetic features of several families with maternally inherited CPP. Setting and Participants: We analyzed clinical and genetic features of children with familial CPP. No brain MRI alterations were detected in the selected patients with CPP. MKRN3 and DLK1 pathogenic mutations were excluded. Whole-exome sequencing was performed in selected cases. Results: We studied 177 children from 141 families with familial CPP. Paternal inheritance was evidenced in 44 families (31%), whereas 58 (41%) had maternal inheritance. Indeterminate inheritance was detected in the remaining families. Maternally inherited CPP affected mainly female patients (69 girls and two boys). Thelarche occurred at a mean age of 6.1 ± 1.9 years in this female group. Most girls had Tanner 3 (41%) or Tanner 4 (35%) breast development at first evaluation. One boy had additional syndromic features (macrosomia, autism, bilateral eyelid ptosis, high-arched palate, irregular teeth and abnormal gait). The pedigree analysis of patients with maternally inherited CPP revealed the following affected family members: 42 mothers, 10 grandmothers, 11 sisters, 12 aunts, and 11 female cousins. Most of the families (41) had two consecutive affected generations, while eight families had three affected generations. No consanguinity was reported. Ongoing molecular analysis revealed two rare heterozygous variants in the boy with syndromic CPP and three affected family members with precocious menarche (mother, maternal half-sister, and maternal aunt): a frameshift deletion (p.F144fs) in MKKS; and a missense variant (p.P267L) in UGT2B4, which encodes a protein involved in estrogen hydroxylation and has been related to menarche timing in genome-wide association studies. Conclusions: Maternally inherited CPP was diagnosed mainly in girls, who had thelarche at a mean age of 6 years. A dominant pattern of inheritance was more prevalent, with direct maternal transmission in 72% of the studied families. New candidate genes might be implicated in maternally inherited CPP.


2010 ◽  
Vol 49 (06) ◽  
pp. 625-631
Author(s):  
H. Schäfer ◽  
B. H. Greene

Summary Background: Genome-wide association studies (GWAS) have been used successfully to identify genetic loci associated with complex diseases and phenotypes. Often this association takes the form of several significant signals (such as small p-values) in a univariate analysis at various markers within a single genetic region. Once confirmed, these associations raise the question of whether a single marker tags the association signal of another, functionally relevant variant or whether the single marker tags a functionally relevant haplotype. To deal with this question, methods for family data based on logistic regression, adaptations of the transmission/disequilibrium test (TDT) or weighted haplotype likelihood (WHL) methods have been proposed in the literature. Objectives: To examine the effect of parameters such as sample size, inheritance model, and linkage disequilibrium (LD) in the region on the ability of a selection of methods to detect an independent effect from an additional locus. Methods: All methods tested were applied to simulated genetic data of trios comprising a single affected offspring and two parents. Results: While regression-based methods have advantages such as model flexibility, potentially increasing power, the WHL method was more robust against increasing LD in the scenarios analyzed. Conclusions: Simulation results suggest that the regression and WHL methods have greater statistical power than the adaptation of the TDT analyzed here to detect genetic effects at an additional locus while controlling for confounding at another locus.
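
For readers unfamiliar with the TDT, the following Python sketch shows the classical single-marker test statistic for trio data. It is not the adaptation evaluated in the study above, which the abstract does not specify. It assumes that, among heterozygous parents of affected offspring, `transmitted` counts those who passed on the allele of interest and `untransmitted` counts those who passed on the other allele. The function name `tdt` and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT: McNemar-type chi-square statistic with 1 degree of freedom."""
    b, c = transmitted, untransmitted
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    # Under the null hypothesis each heterozygous parent transmits either
    # allele with probability 1/2, so (b - c)^2 / (b + c) is approximately
    # chi-square distributed with 1 degree of freedom.
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts from heterozygous parents in a set of trios.
stat, p_value = tdt(transmitted=60, untransmitted=40)
print(stat, p_value)
```

Only heterozygous parents enter the statistic, which is why the test is robust to population stratification. It also means that power depends on the number of informative transmissions rather than on the number of trios.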


Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1300 ◽  
Author(s):  
Elisabetta Manca ◽  
Alberto Cesarani ◽  
Giustino Gaspa ◽  
Silvia Sorbolini ◽  
Nicolò P.P. Macciotta ◽  
...  

Genome-wide association studies (GWAS) are traditionally carried out using the single marker regression model, which often yields very few associations when only a small number of individuals is involved. Bayesian methods, such as BayesR, have obtained encouraging results when applied to GWAS. However, these approaches require that an a priori posterior inclusion probability threshold be fixed, thus arbitrarily affecting the obtained associations. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values of a specific trait carry different allelic combinations for the genes involved in its determination. Three multivariate techniques were used to highlight the differences between the individuals assembled in high and low phenotype groups: canonical discriminant analysis, discriminant analysis and stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The results from the simulation study highlighted that the multivariate GWAS detected a greater number of truly associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. On average, with the real data, the multivariate GWAS found 108 associated markers for each trait under study and, among them, around 63% of SNPs were also found with the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.
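
The single marker regression model that serves as the baseline above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: an additive 0/1/2 genotype coding, a quantitative phenotype, and no covariates, relatedness or multiple-testing correction. It is not the authors' pipeline, and the function name `single_marker_scan`, the SNP identifiers and the data are hypothetical.

```python
import math


def single_marker_scan(genotypes, phenotype):
    """Regress the phenotype on each SNP separately; return {snp: (slope, t) or None}."""
    n = len(phenotype)
    if n < 3:
        raise ValueError("need at least 3 individuals")
    y_mean = sum(phenotype) / n
    syy = sum((y - y_mean) ** 2 for y in phenotype)
    results = {}
    for snp, g in genotypes.items():
        if len(g) != n:
            raise ValueError(f"genotype length mismatch for {snp}")
        g_mean = sum(g) / n
        sxx = sum((x - g_mean) ** 2 for x in g)
        if sxx == 0:
            results[snp] = None  # monomorphic marker: slope is undefined
            continue
        sxy = sum((x - g_mean) * (y - y_mean) for x, y in zip(g, phenotype))
        slope = sxy / sxx
        # Residual sum of squares of the simple linear regression.
        rss = max(syy - slope * sxy, 0.0)
        se = math.sqrt(rss / (n - 2) / sxx)
        t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
        results[snp] = (slope, t_stat)
    return results


# Hypothetical toy data: six animals, two markers.
phenotype = [1.0, 2.1, 2.9, 1.1, 1.9, 3.0]
genotypes = {"snp_a": [0, 1, 2, 0, 1, 2], "snp_b": [1, 1, 1, 1, 1, 1]}
print(single_marker_scan(genotypes, phenotype))
```

Each marker is tested in isolation, so the per-marker t-statistic depends directly on the sample size. This is consistent with the abstract's observation that the single marker model finds few associations when few individuals are available.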


2017 ◽  
Vol 96 (11) ◽  
pp. 1192-1199 ◽  
Author(s):  
R. Grecco Machado ◽  
B. Frank Eames

Genome-wide association studies (GWASs) opened an innovative and productive avenue to investigate the molecular basis of human craniofacial disease. However, GWASs identify candidate genes only; they do not prove that any particular one is the functional villain underlying disease or just an unlucky genomic bystander. Genetic manipulation of animal models is the best approach to reveal which genetic loci identified from human GWASs are functionally related to specific diseases. The purpose of this review is to discuss the potential of zebrafish to resolve which candidate genetic loci are mechanistic drivers of craniofacial diseases. Many anatomic, embryonic, and genetic features of craniofacial development are conserved among zebrafish and mammals, making zebrafish a good model of craniofacial diseases. Also, the ability to manipulate gene function in zebrafish was greatly expanded over the past 20 y, enabling systems such as Gateway Tol2 and CRISPR-Cas9 to test gain- and loss-of-function alleles identified from human GWASs in coding and noncoding regions of DNA. With the optimization of genetic editing methods, large numbers of candidate genes can be efficiently interrogated. Finding the functional villains that underlie diseases will permit new treatments and prevention strategies and will increase understanding of how gene pathways operate during normal development.

