Machine learning based disease prediction from genotype data

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Nikoletta Katsaouni ◽  
Araek Tashkandi ◽  
Lena Wiese ◽  
Marcel H. Schulz

Abstract Using results from genome-wide association studies for understanding complex traits is a current challenge. Here we review how genotype data can be used with different machine learning (ML) methods to predict phenotype occurrence and severity from genotype data. We discuss common feature encoding schemes and how studies handle the often small number of samples compared to the huge number of variants. We compare which ML methods are being applied, including recent results using deep neural networks. Further, we review the application of methods for feature explanation and interpretation.

2020 ◽  
Author(s):  
Meng Luo ◽  
Shiliang Gu

AbstractAlthough genome-wide association studies have successfully identified thousands of markers associated with various complex traits and diseases, our ability to predict such phenotypes remains limited. A perhaps ignored explanation lies in the limitations of the genetic models and statistical techniques commonly used in association studies. However, using genotype data for individuals to perform accurate genetic prediction of complex traits can promote genomic selection in animal and plant breeding and can lead to the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling genetic variants together via polygenic methods. Here, we also utilize our proposed polygenic methods, which refer to as the iterative screen regression model (ISR) for genome prediction. We compared ISR with several commonly used prediction methods with simulations. We further applied ISR to predicting 15 traits, including the five species of cattle, rice, wheat, maize, and mice. The results of the study indicate that the ISR method performs well than several commonly used polygenic methods and stability.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Declan Bennett ◽  
Donal O’Shea ◽  
John Ferguson ◽  
Derek Morris ◽  
Cathal Seoighe

AbstractOngoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.


2021 ◽  
Author(s):  
Subrata Saha ◽  
Himanshu Narayan Singh ◽  
Ahmed Soliman ◽  
Sanguthevar Rajasekaran

Background: Current form of genome-wide association studies (GWAS) is inadequate to accurately explain the genetics of complex traits due to the lack of sufficient statistical power. It explores each variant individually, but current studies show that multiple variants with varying effect sizes actually act in a concerted way to develop a complex disease. To address this issue, we have developed an algorithmic framework that can effectively solve the multi-locus problem in GWAS with a very high level of confidence. Our methodology consists of three novel algorithms based on graph theory and machine learning. It identifies a set of highly discriminating variants that are stable and robust with little (if any) spuriousness. Consequently, likely these variants should be able to interpret missing heritability of a convoluted disease as an entity. Results: To demonstrate the efficacy of our proposed algorithms, we have considered astigmatism case-control GWAS dataset. Astigmatism is a common eye condition that causes blurred vision because of an error in the shape of the cornea. The cause of astigmatism is not entirely known but a sizable inheritability is assumed. Clinical studies show that developmental disorders (such as, autism) and astigmatism co-occur in a statistically significant number of individuals. By performing classical GWAS analysis, we didn't find any genome-wide statistically significant variants. Conversely, we have identified a set of stable, robust, and highly predictive variants that can together explain the genetics of astigmatism. We have performed a set of biological enrichment analyses based on gene ontology (GO) terms, disease ontology (DO) terms, biological pathways, network of pathways, and so forth to manifest the accuracy and novelty of our findings. Conclusions: Rigorous experimental evaluations show that our proposed methodology can solve GWAS multi-locus problem effectively and efficiently. It can identify signals from the GWAS dataset having small number of samples with a high level of accuracy. We believe that the proposed methodology based on graph theory and machine learning is the most comprehensive one compared to any other machine learning based tools in this domain.


2021 ◽  
Author(s):  
Declan Bennett ◽  
Dónal O'Shea ◽  
John Ferguson ◽  
Derek Morris ◽  
Cathal Seoighe

Abstract Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.


Author(s):  
Declan Bennett ◽  
Derek Morris ◽  
Cathal Seoighe

ABSTRACTOngoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data have brought forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Optimal realization of these objectives requires ongoing methodological developments, designed to identify true trait-associated variants and accurately predict phenotype from genotype. These methods must be computationally efficient, in order to remain tractable in the context of high variant densities and very large sample sizes. Here we show that the power of linear mixed models that are in widespread use for GWAS can be increased significantly by modeling off-target genetic effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of off-target genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.


2016 ◽  
Vol 283 (1835) ◽  
pp. 20160569 ◽  
Author(s):  
M. E. Goddard ◽  
K. E. Kemper ◽  
I. M. MacLeod ◽  
A. J. Chamberlain ◽  
B. J. Hayes

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.


2021 ◽  
Vol 42 (1) ◽  
Author(s):  
Dinesh K. Saini ◽  
Yuvraj Chopra ◽  
Jagmohan Singh ◽  
Karansher S. Sandhu ◽  
Anand Kumar ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document