Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits

2014 ◽  
Vol 166 ◽  
pp. 217-231 ◽  
Author(s):  
Oscar González-Recio ◽  
Guilherme J.M. Rosa ◽  
Daniel Gianola
2020 ◽  
Vol 98 (6) ◽  
Author(s):  
Anderson Antonio Carvalho Alves ◽  
Rebeka Magalhães da Costa ◽  
Tiago Bresolin ◽  
Gerardo Alves Fernandes Júnior ◽  
Rafael Espigolan ◽  
...  

Abstract The aim of this study was to compare the predictive performance of the Genomic Best Linear Unbiased Predictor (GBLUP) and machine learning methods (Random Forest, RF; Support Vector Machine, SVM; Artificial Neural Network, ANN) in simulated populations presenting different levels of dominance effects. The simulated genome comprised 50k SNPs and 300 QTL, all biallelic and randomly distributed across 29 autosomes. Six traits were simulated with different values of narrow- and broad-sense heritability. In the purely additive scenario with low heritability (h² = 0.10), the predictive ability of GBLUP was slightly higher than that of the other methods, whereas ANN provided the highest accuracies in scenarios with moderate heritability (h² = 0.30). The accuracies of dominance-deviation predictions ranged from 0.180 to 0.350 with GBLUP extended for dominance effects (GBLUP-D) and from 0.06 to 0.185 with RF, and were null with the ANN and SVM methods. Although RF yielded higher accuracies for total genetic effect predictions, its mean-squared errors were worse than those of GBLUP-D in scenarios with large additive and dominance variances. When applied to prescreen important regions, RF detected QTL with large additive and/or dominance effects. Among the machine learning methods, only RF was able to capture dominance effects implicitly without increasing the number of covariates in the model, resulting in higher accuracies for total genetic and phenotypic values as the dominance ratio increased. Nevertheless, if the aim is to make inferences directly about dominance effects, GBLUP-D may be the more suitable method.
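As a rough illustration of the GBLUP baseline and of the two metrics discussed above (predictive ability as a correlation, and mean-squared error), the sketch below builds a genomic relationship matrix from 0/1/2 genotype codes and predicts additive genomic values by solving the BLUP equations with a fixed variance ratio. The VanRaden method-1 scaling, the assumption that heritability is known (no variance-component estimation), the use of the training mean in place of the GLS estimate of the intercept, and the toy simulated data are all assumptions for illustration; this is not the authors' pipeline, and the dominance extension (GBLUP-D) is not shown.

```python
import numpy as np


def vanraden_G(M):
    """Genomic relationship matrix (VanRaden method 1) from an n x m matrix of 0/1/2 genotype codes."""
    p = M.mean(axis=0) / 2.0                  # observed allele frequency of each SNP
    Z = M - 2.0 * p                           # centre each SNP column by twice its frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, h2):
    """BLUP of genomic values for `test` individuals given phenotypes of `train` individuals.

    Assumes y = mu + u + e with u ~ N(0, G*s2u), e ~ N(0, I*s2e) and a known variance
    ratio lam = s2e / s2u = (1 - h2) / h2; mu is approximated by the training mean.
    """
    lam = (1.0 - h2) / h2
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - y[train].mean())
    return G[np.ix_(test, train)] @ alpha


def predictive_ability(pred, obs):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(pred, obs)[0, 1])


def mse(pred, obs):
    """Mean-squared error between predicted and reference values."""
    return float(np.mean((pred - obs) ** 2))


# Toy data (hypothetical): 200 individuals, 500 SNPs, about 5% of SNPs with additive effects.
rng = np.random.default_rng(1)
n, m, h2 = 200, 500, 0.3
M = rng.binomial(2, 0.5, size=(n, m)).astype(float)
effects = rng.normal(0.0, 1.0, m) * (rng.random(m) < 0.05)
g = (M - M.mean(axis=0)) @ effects
g = g / g.std()                                      # true genomic values, mean 0 and variance 1
y = g + rng.normal(0.0, np.sqrt((1 - h2) / h2), n)   # phenotypes with heritability close to h2

train, test = np.arange(150), np.arange(150, 200)
G = vanraden_G(M)
pred = gblup_predict(G, y, train, test, h2)
print("predictive ability (vs. phenotypes):", predictive_ability(pred, y[test]))
print("MSE (vs. true genomic values):", mse(pred, g[test]))
```

Because the data are simulated, the true genomic values `g` are available, which is what allows accuracy and MSE to be measured against them rather than against noisy phenotypes.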


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 598.2-598
Author(s):  
E. Myasoedova ◽  
A. Athreya ◽  
C. S. Crowson ◽  
R. Weinshilboum ◽  
L. Wang ◽  
...  

Background: Methotrexate (MTX) is the most common anchor drug for rheumatoid arthritis (RA), but given the delayed onset of MTX action and its 30-40% inadequate response rate, the risk of missing the opportunity for early effective treatment with alternative medications is substantial. There is a compelling need to predict MTX response accurately before treatment initiation, so that patients who are likely to respond to MTX can be identified at RA onset. Objectives: To test the ability of machine learning approaches using clinical and genomic biomarkers to predict MTX response, with replication in independent samples. Methods: Age, sex, clinical, serological and genome-wide association study (GWAS) data from 647 patients of European ancestry with early RA (336 recruited in the United Kingdom [UK]; 307 recruited across Europe; 70% female; 72% rheumatoid factor [RF] positive; mean age 54 years; mean baseline Disease Activity Score with 28-joint count [DAS28] 5.65) in the PhArmacogenetics of Methotrexate in RA (PAMERA) consortium were used in this study. The genomic data comprised 160 genome-wide significant single nucleotide polymorphisms (SNPs) (p < 1×10⁻⁵) associated with risk of RA and with MTX metabolism. DAS28 scores were available at baseline and at the 3-month follow-up visit. Response to MTX monotherapy at a dose of ≥15 mg/week was defined as a good or moderate response by the EULAR response criteria at the 3-month follow-up visit. Supervised machine-learning methods were trained with five repeats of 10-fold cross-validation using data from PAMERA's 336 UK patients. Class imbalance (a higher proportion of MTX responders) in training was accounted for using the synthetic minority oversampling technique. Prediction performance was validated in PAMERA's 307 European patients, who were not used in training. Results: Age, sex, RF positivity and baseline DAS28 data predicted MTX response with 58% accuracy in UK and European patients (p = 0.7).
However, supervised machine-learning methods that combined demographics, RF positivity, baseline DAS28 and genomic SNPs predicted EULAR response at 3 months with an area under the receiver operating characteristic curve (AUC) of 0.83 (p = 0.051) in UK patients, and achieved a prediction accuracy (fraction of correctly predicted outcomes) of 76.2% (p = 0.054) in European patients, with a sensitivity of 72% and a specificity of 77%. Adding genomic data improved the accuracy of MTX response prediction by 19% and achieved cross-site replication. Baseline DAS28 scores and the SNPs rs12446816, rs13385025, rs113798271 and rs2372536 were among the top predictors of MTX response. Conclusion: Pharmacogenomic biomarkers combined with DAS28 scores predicted MTX response in patients with early RA more reliably than demographics and DAS28 scores alone. Using pharmacogenomic biomarkers to identify MTX responders at early stages of RA may help to guide effective RA treatment choices, including timely escalation of RA therapies. Further studies on personalized prediction of response to MTX and other anti-rheumatic treatments are warranted to optimize control of RA and improve outcomes in patients with RA. Disclosure of Interests: Elena Myasoedova: None declared, Arjun Athreya: None declared, Cynthia S. Crowson Grant/research support from: Pfizer research grant, Richard Weinshilboum Shareholder of: co-founder and stockholder in OneOme, Liewei Wang: None declared, Eric Matteson Grant/research support from: Pfizer, Consultant of: Boehringer Ingelheim, Gilead, TympoBio, Arena Pharmaceuticals, Speakers bureau: Simply Speaking
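The MTX abstract describes training with five repeats of 10-fold cross-validation and reports accuracy, sensitivity, specificity and AUC. The sketch below shows, in plain Python, how such splits and metrics are commonly computed. The abstract does not name the models, the decision threshold, or whether folds were stratified, so the plain shuffled folds, the 0.5 threshold, the label coding (1 = responder, 0 = non-responder) and the example scores are all assumptions for illustration.

```python
import random


def repeated_kfold_indices(n, k=10, repeats=5, seed=0):
    """Yield (repeat, train_idx, test_idx) for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]       # k disjoint folds covering all samples
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield r, train, sorted(test)


def auc(y_true, scores):
    """ROC AUC: probability that a random positive scores above a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity (at `threshold`) and AUC; labels 1 = responder, 0 = non-responder."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    return {
        "accuracy": (tp + tn) / len(pairs),   # fraction of correctly predicted outcomes
        "sensitivity": tp / (tp + fn),        # share of responders predicted as responders
        "specificity": tn / (tn + fp),        # share of non-responders predicted as non-responders
        "auc": auc(y_true, scores),
    }


# Hypothetical example: true labels and model scores for eight patients.
y = [1, 1, 1, 0, 0, 1, 0, 1]
s = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
print(classification_metrics(y, s))  # accuracy 0.75, sensitivity 0.8, specificity 2/3, AUC 13/15
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason studies often report both kinds of metric.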


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Alberto Romagnoni ◽  
Simon Jégou ◽  
Kristel Van Steen ◽  
Gilles Wainrib ◽  
...  

2021 ◽  
Author(s):  
Jiangong Zhu ◽  
Yuan Huang ◽  
Michael Knapp ◽  
Xinhua Liu ◽  
Yixiu Wang ◽  
...  

Abstract Accurate capacity estimation is critical for the reliable and safe operation of lithium-ion batteries. The proposed approach exploits features from the relaxation voltage curve and uses machine learning methods to estimate battery capacity without requiring previous cycling information. A dataset of 27,330 data units was collected from batteries with a LiNi0.86Co0.11Al0.03O2 cathode (NCA battery) cycled at different temperatures and currents until they reached about 71% of their nominal capacity. Each data unit comprises three statistical features (variance, skewness and maximum) derived from the relaxation voltage curve after full charging, together with the subsequent discharge capacity used for verification. Models based on ElasticNet, XGBoost, Support Vector Regression (SVR) and a Deep Neural Network (DNN) are compared for estimating battery capacity. Both XGBoost and SVR show good predictive ability, with a root-mean-square error (RMSE) of 1.1%. The DNN yields an RMSE of 1.5%, higher than those obtained with ElasticNet and SVR. A further 30,312 data units were extracted from batteries with a LiNi0.83Co0.11Mn0.07O2 cathode (NCM battery). The model trained on the NCA battery dataset is verified on the NCM battery dataset without changing the model weights. The test RMSE is 3.1% for XGBoost and 1.8% for DNN, indicating the generalizability of the capacity estimation approach based on battery voltage relaxation.
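To make the feature definition above concrete, the sketch below computes variance, skewness and maximum from a sampled relaxation voltage curve, and an RMSE expressed as a percentage. The abstract does not give the exact definitions, so the use of population (biased) moments, skewness as the third standardized moment, normalizing RMSE by nominal capacity, and the exponential relaxation curve in the example are all assumptions; the capacity-estimation models themselves (ElasticNet, XGBoost, SVR, DNN) are not reproduced here.

```python
import math


def relaxation_features(voltages):
    """Variance, skewness and maximum of a sampled relaxation voltage curve.

    Uses population (biased) moments; skewness is the third standardized moment."""
    n = len(voltages)
    mean = sum(voltages) / n
    m2 = sum((v - mean) ** 2 for v in voltages) / n
    m3 = sum((v - mean) ** 3 for v in voltages) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {"variance": m2, "skewness": skewness, "max": max(voltages)}


def rmse_percent(estimated, measured, nominal):
    """Root-mean-square error of capacity estimates, expressed as % of nominal capacity (assumption)."""
    mse = sum((e - m) ** 2 for e, m in zip(estimated, measured)) / len(measured)
    return 100.0 * math.sqrt(mse) / nominal


# Hypothetical rest period after full charge: voltage relaxes exponentially towards 4.10 V,
# sampled every 10 s for 30 min.
curve = [4.10 + 0.08 * math.exp(-t / 600.0) for t in range(0, 1800, 10)]
print(relaxation_features(curve))
print(rmse_percent([2.9, 3.1], [3.0, 3.0], nominal=3.0))
```

Each curve is thus reduced to a fixed-length vector of three numbers, which is what lets standard regressors map a relaxation curve to a capacity estimate regardless of curve length.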


Author(s):  
Yumiao Wang ◽  
Xueling Wu ◽  
Zhangjian Chen ◽  
Fu Ren ◽  
Luwei Feng ◽  
...  

The main goal of this study was to use the synthetic minority oversampling technique (SMOTE) to increase the number of landslide samples available to machine learning methods (i.e., support vector machine (SVM), logistic regression (LR), artificial neural network (ANN), and random forest (RF)) in order to produce high-quality landslide susceptibility maps for Lishui City in Zhejiang Province, China. Landslide-related factors were extracted from topographic maps, geological maps, and satellite images. Twelve factors were selected as independent variables using correlation coefficient analysis and the neighborhood rough set (NRS) method. In total, 288 soil landslides were mapped using field surveys, historical records, and satellite images. The landslides were randomly divided into two datasets: 70% were used as the original training dataset and 30% for validation. SMOTE was then employed to generate datasets ranging in size from two to thirty times that of the training dataset, which were used to build and compare the four machine learning methods for landslide susceptibility mapping. In addition, slope units were used to subdivide the terrain for the susceptibility assessment. Finally, the landslide susceptibility maps were validated using statistical indexes and the area under the curve (AUC). The results indicated that all four machine learning methods improved, to different degrees, as the sample size increased. The RF model exhibited a more substantial improvement (AUC improved by 24.12%) than the ANN (18.94%), SVM (17.77%), and LR (3.00%) models. Furthermore, the ANN model achieved the highest predictive ability (AUC = 0.98), followed by the RF (AUC = 0.96), SVM (AUC = 0.94), and LR (AUC = 0.79) models. This approach significantly improves the performance of machine learning techniques for landslide susceptibility mapping, thereby providing a better tool for reducing the impacts of landslide disasters.
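SMOTE, as used above to enlarge the landslide training set, creates each synthetic sample by interpolating between a real sample of the class being expanded and one of its k nearest neighbours within that class. The NumPy sketch below illustrates that idea with brute-force Euclidean distances. The study's implementation and parameters are not given in the abstract: k = 5, the reading of "two to thirty times" as the total size of the augmented set, and the random 20 x 12 training matrix are assumptions for illustration only.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Generate `n_new` synthetic samples in the style of SMOTE.

    Each synthetic row lies on the segment between a randomly chosen sample of the
    class being expanded and one of its k nearest same-class neighbours (Euclidean)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d, np.inf)                                      # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]                        # k nearest per sample
    base = rng.integers(0, n, size=n_new)                            # seed sample for each new row
    nb = neighbours[base, rng.integers(0, k, size=n_new)]            # one of its k neighbours
    gap = rng.random((n_new, 1))                                     # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Hypothetical training set: 20 landslide samples described by 12 conditioning factors.
X_train = np.random.default_rng(0).normal(size=(20, 12))
factor = 5                                   # augmented set = 5x the original size (assumption)
X_syn = smote(X_train, n_new=(factor - 1) * len(X_train), k=5, rng=1)
X_aug = np.vstack([X_train, X_syn])
print(X_aug.shape)
```

Because every synthetic row is a convex combination of two real samples, the new points stay inside the range of the observed factor values; they densify the existing landslide class rather than adding information from outside it.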


2014 ◽  
Vol 12 (2) ◽  
pp. 313 ◽  
Author(s):  
Alberto Gonzalez-Sanchez ◽  
Juan Frausto-Solis ◽  
Waldo Ojeda-Bustamante
