Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

2020 ◽  
Author(s):  
Dmitry I. Ignatov ◽  
Gennady V. Khvorykh ◽  
Andrey V. Khrunin ◽  
Stefan Nikolić ◽  
Makhmud Shaban ◽  
...  

Abstract: Missing genotypes can reduce the efficacy of machine learning approaches for identifying genetic risk variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each characterised by its own pattern of uncalled (missing) genotypes. This can prevent a machine learning classifier from assigning classes correctly. To tackle this issue, we used the well-developed notions of object-attribute biclusters and formal concepts, which correspond to dense subrelations in the binary relation patients × SNPs. The paper contains experimental results on applying a biclustering algorithm to a large real-world dataset collected for studying the genetic bases of ischemic stroke. The algorithm identified large dense biclusters in the genotypic matrix for further processing, which in turn significantly improved the quality of machine learning classifiers. The proposed algorithm was also able to generate biclusters for the whole dataset without size constraints, in contrast to the In-Close4 algorithm for generating formal concepts.
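As an illustration only: the abstract does not spell out the algorithm, so the sketch below assumes the usual definition of an object-attribute bicluster from the formal concept analysis literature, in which each pair (g, m) of the relation generates the bicluster (m′, g′), that is, all patients in which SNP m is called and all SNPs called for patient g. The density is the share of filled cells inside that rectangle. The function name `oa_biclusters`, the `min_density` parameter, and the toy data are hypothetical.

```python
def oa_biclusters(relation, min_density=0.0):
    """Enumerate object-attribute biclusters of a binary relation.

    relation: set of (patient, snp) pairs meaning "this genotype was called".
    Returns a dict mapping (extent, intent) -> density, where extent is a
    frozenset of patients and intent is a frozenset of SNPs.
    """
    attrs_of = {}  # patient -> set of SNPs called for that patient (g')
    objs_of = {}   # SNP -> set of patients in which it is called (m')
    for g, m in relation:
        attrs_of.setdefault(g, set()).add(m)
        objs_of.setdefault(m, set()).add(g)

    result = {}
    seen = set()
    for g, m in relation:
        extent = frozenset(objs_of[m])
        intent = frozenset(attrs_of[g])
        key = (extent, intent)
        if key in seen:  # different pairs can generate the same bicluster
            continue
        seen.add(key)
        # Count the cells of the rectangle extent x intent that are filled.
        filled = sum((x, y) in relation for x in extent for y in intent)
        density = filled / (len(extent) * len(intent))
        if density >= min_density:
            result[key] = density
    return result


# Toy relation (hypothetical): three patients, three SNPs.
toy = {
    ("p1", "s1"), ("p1", "s2"),
    ("p2", "s1"), ("p2", "s2"),
    ("p3", "s1"), ("p3", "s3"),
}
dense = oa_biclusters(toy, min_density=0.8)
for (extent, intent), density in dense.items():
    print(sorted(extent), sorted(intent), round(density, 3))
```

Raising `min_density` keeps only the rectangles with few missing genotypes, which is the kind of dense submatrix the abstract describes passing on to the classifiers.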

2021 ◽  
Author(s):  
Yin Yeng Lee ◽  
Mehari Endale ◽  
Gang Wu ◽  
Marc D Ruben ◽  
Lauren J Francey ◽  
...  

Genetics influences sleep, yet the molecular mechanisms underlying sleep regulation remain elusive. We built machine learning (ML) models to predict genes based on their similarity to known sleep genes. Our predictions fit with prior knowledge of sleep regulation and also identify several key genes and pathways to pursue in follow-up studies. We tested one of our findings, the NF-κB pathway, and showed that its genetic alteration affects sleep duration in mice. Our study highlights the power of ML to integrate prior knowledge and genome-wide data to study the genetic regulation of sleep and other complex behaviors.


2021 ◽  
Author(s):  
M. W. Wojewodzic ◽  
J. P. Lavender

Abstract: Aberrant methylation patterns in human DNA have great potential for the discovery of novel diagnostic and disease progression biomarkers. In this paper, we used machine learning algorithms to identify promising methylation sites for diagnosing cancerous tissue and to classify patients based on methylation values at these sites. We used genome-wide DNA methylation patterns from both cancerous and normal tissue samples, obtained from the Genomic Data Commons consortium, and trialled our methods on three types of urological cancer. A decision tree was used to identify the methylation sites most useful for diagnosis. The identified locations were then used to train a neural network to classify samples as either cancerous or non-cancerous. Using this two-step approach, we found strong indicative biomarker panels for each of the three cancer types. These methods could likely be translated to other cancers and improved by using non-invasive liquid methods such as blood instead of biopsy tissue.


2019 ◽  
Author(s):  
Qi Yan ◽  
Yale Jiang ◽  
Heng Huang ◽  
Anand Swaroop ◽  
Emily Y. Chew ◽  
...  

Abstract: Numerous independent susceptibility variants have been identified for age-related macular degeneration (AMD) by genome-wide association studies (GWAS). Since advanced AMD is currently incurable, an accurate prediction of a person’s AMD risk using genetic information is desirable for early diagnosis and clinical management. In this study, genotype data of 32,215 Caucasian individuals aged over 50 years from the International AMD Genomics Consortium in dbGaP were used to establish and validate prediction models for AMD risk using four different machine learning approaches: neural network, lasso regression, support vector machine, and random forest. A standard logistic regression model using a genetic risk score was also considered. To identify feature SNPs for the AMD prediction models, we selected the genome-wide significant SNPs from GWAS. All methods achieved good performance for predicting normal controls versus advanced AMD cases (AUC = 0.81 to 0.82 in a separate test dataset) and normal controls versus any AMD (AUC = 0.78 to 0.79). By applying state-of-the-art machine learning approaches to the large AMD GWAS data, the predictive models we established can provide an accurate estimation of an individual’s AMD risk profile across the person’s lifespan based on comprehensive genetic information.
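To make the two quantities in this abstract concrete, here is a minimal sketch, not the authors' pipeline: a genetic risk score computed as a weighted sum of risk-allele counts, and the AUC computed as the probability that a randomly chosen case scores higher than a randomly chosen control. The SNP identifiers, weights, and scores are hypothetical placeholders; the abstract does not list them, and it does not say how the risk score was weighted.

```python
def genetic_risk_score(genotype, weights):
    """Weighted sum of risk-allele counts.

    genotype: dict mapping SNP id -> allele count (0, 1 or 2).
    weights:  dict mapping SNP id -> per-allele weight (for example a
              log odds ratio from a GWAS; hypothetical values here).
    """
    return sum(weights[snp] * count for snp, count in genotype.items())


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (case, control) pairs in which the case has the
    higher score, counting ties as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical example: two SNPs with made-up weights.
weights = {"rsA": 0.5, "rsB": -0.25}
person = {"rsA": 2, "rsB": 1}
print(genetic_risk_score(person, weights))

# Hypothetical scores for two cases (label 1) and two controls (label 0).
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The pairwise form is quadratic in the sample size, so it suits small illustrations; a rank-based computation would be the usual choice for tens of thousands of individuals.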


2016 ◽  
Vol 12 (S329) ◽  
pp. 422-422
Author(s):  
A. P. Marston ◽  
G. Morello ◽  
P. Morris ◽  
S. Van Dyk ◽  
J. Mauerhan

Abstract: The Wolf-Rayet (WR) stellar population can be distinguished, at least partially, from other stellar populations by broad-band IR colour selection. We present the use of a machine learning classifier to quantitatively improve the selection of Galactic WR candidates. These methods are used to separate out the other stellar populations that have similar IR colours. We show the results of the classifications obtained by using the 2MASS J, H and K photometric bands, and the Spitzer/IRAC bands at 3.6, 4.5, 5.8 and 8.0 μm. The k-Nearest Neighbour method has been used to select Galactic WR candidates for observational follow-up. A few candidates have been spectroscopically observed. Preliminary observations suggest that a detection rate of 50% can easily be achieved.
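A k-Nearest Neighbour classifier of the kind named above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' code: the two features stand in for IR colour indices, the training points and labels are invented, and plain Euclidean distance with majority voting is assumed.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs, features being a tuple of floats.
    query: tuple of floats with the same length as the training features.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set in a two-colour plane (values are made up).
train = [
    ((0.0, 0.0), "other"), ((0.1, 0.1), "other"), ((0.2, 0.0), "other"),
    ((1.0, 1.0), "WR"), ((1.1, 0.9), "WR"), ((0.9, 1.2), "WR"),
]
print(knn_predict(train, (1.0, 1.1)))
print(knn_predict(train, (0.05, 0.05)))
```

With real photometry the features would be several colours built from the 2MASS and Spitzer/IRAC bands, and `k` would be tuned on stars with known classifications.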


2020 ◽  
Vol 4 (5) ◽  
Author(s):  
Sangkyu Lee ◽  
Joseph O Deasy ◽  
Jung Hun Oh ◽  
Antonio Di Meglio ◽  
Agnes Dumas ◽  
...  

Abstract
Background: We aimed to predict fatigue after breast cancer treatment using machine learning on clinical covariates and germline genome-wide data.
Methods: We accessed germline genome-wide data of 2799 early-stage breast cancer patients from the Cancer Toxicity study (NCT01993498). The primary endpoint was defined as scoring zero at diagnosis and higher than quartile 3 at 1 year after primary treatment completion on European Organization for Research and Treatment of Cancer quality-of-life questionnaires for Overall Fatigue and on the multidimensional questionnaire for Physical, Emotional, and Cognitive Fatigue. First, we tested univariate associations of each endpoint with clinical variables and genome-wide variants. Then, using preselected clinical (false discovery rate < 0.05) and genomic (P < .001) variables, a multivariable preconditioned random-forest regression model was built and validated on a hold-out subset to predict fatigue. Gene set enrichment analysis identified key biological correlates (MetaCore). All statistical tests were 2-sided.
Results: Statistically significant clinical associations were found only with Emotional and Cognitive Fatigue, including receipt of chemotherapy, anxiety, and pain. Some single nucleotide polymorphisms had some degree of association (P < .001) with the different fatigue endpoints, although there were no genome-wide statistically significant (P < 5.00 × 10⁻⁸) associations. Only for Cognitive Fatigue was the predictive ability of the genomic multivariable model statistically significantly better than random (area under the curve = 0.59, P = .01), and it improved marginally with clinical variables (area under the curve = 0.60, P = .005). Single nucleotide polymorphisms found to be associated (P < .001) with Cognitive Fatigue belonged to genes linked to inflammation (false discovery rate adjusted P = .03), cognitive disorders (P = 1.51 × 10⁻¹²), and synaptic transmission (P = 6.28 × 10⁻⁸).
Conclusions: Genomic analyses in this large cohort of breast cancer survivors suggest a possible genetic role for severe Cognitive Fatigue that warrants further exploration.
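The preselection step in the Methods keeps clinical variables at a false discovery rate below 0.05. The abstract does not say which procedure was used; the sketch below assumes the Benjamini-Hochberg step-up procedure, which is the most common choice, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses kept at FDR level `alpha`.

    Sort the p-values, find the largest rank r such that
    p_(r) <= (r / m) * alpha, and keep the r smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank  # keep updating: the largest passing rank wins
    return sorted(order[:cutoff])


# Hypothetical univariate p-values for ten clinical variables.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))
```

The returned indices would then select the columns passed on to the multivariable model.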

