Machine Learning for Prediction of Stable Warfarin Dose in US Latinos and Latin Americans

Populations used to create warfarin dose prediction algorithms largely lacked participants reporting Hispanic or Latino ethnicity. While previous research suggests nonlinear modeling improves warfarin dose prediction, this research has mainly focused on populations with primarily European ancestry. We compare the accuracy of stable warfarin dose prediction using linear and nonlinear machine learning models in a large cohort enriched for US Latinos and Latin Americans (ULLA). Each model was tested using the same variables as published by the International Warfarin Pharmacogenetics Consortium (IWPC) and using an expanded set of variables including ethnicity and warfarin indication. We utilized a multiple linear regression model and three nonlinear regression models: Bayesian Additive Regression Trees, Multivariate Adaptive Regression Splines, and Support Vector Regression. We compared each model’s ability to predict stable warfarin dose within 20% of actual stable dose, confirming trained models in a 30% testing dataset with 100 rounds of resampling. In all patients (n = 7,030), inclusion of additional predictor variables led to a small but significant improvement in prediction of dose relative to the IWPC algorithm (47.8 versus 46.7% in IWPC, p = 1.43 × 10⁻¹⁵). Nonlinear models using IWPC variables did not significantly improve prediction of dose over the linear IWPC algorithm. In ULLA patients alone (n = 1,734), IWPC performed similarly to all other linear and nonlinear pharmacogenetic algorithms. Our results reinforce the validity of IWPC in a large, ethnically diverse population and suggest that additional variables that capture warfarin dose variability may improve warfarin dose prediction algorithms.
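The accuracy criterion described above (share of patients whose predicted dose lies within 20% of the actual stable dose, checked on a 30% hold-out set) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `fraction_within_tolerance` and `split_indices` are hypothetical, and the abstract does not state how the resampling was implemented.

```python
import random


def fraction_within_tolerance(actual, predicted, tolerance=0.20):
    """Share of cases whose predicted dose is within +/- tolerance of the actual dose."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * a)
    return hits / len(actual)


def split_indices(n, test_fraction=0.30, rng=None):
    """Shuffle indices 0..n-1 and return (train, test) index lists (assumed scheme)."""
    rng = rng or random.Random()
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]
```

Repeating `split_indices` with different seeds and averaging `fraction_within_tolerance` over the test sets would approximate the repeated-resampling evaluation the abstract describes.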

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging methods in this study are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods are Cubist and gradient boosting machine (GBM). Finally, stacking is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was divided by stratified random sampling into a 70/30 split of training and test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three attempts. GBM performed best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as a group, the bagging and boosting methods performed equally well, and the stacking method ranked third for the erosion pin dataset considered in this study.
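The three scores reported above can be computed as sketched below. The formulas are the commonly used textbook definitions of RMSE, Nash-Sutcliffe efficiency, and Willmott's index of agreement; the abstract does not spell out its exact formulas, so treat that correspondence as an assumption. The function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nash_sutcliffe(observed, predicted):
    """NSE = 1 - SSE / sum of squared deviations of observations from their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst  # undefined when all observations are identical


def index_of_agreement(observed, predicted):
    """Willmott's d = 1 - SSE / sum((|p - mean_obs| + |o - mean_obs|)^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(observed, predicted))
    return 1.0 - sse / potential
```

A perfect model gives RMSE = 0, NSE = 1, and d = 1; a model that always predicts the observed mean gives NSE = 0.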

Performance Evaluation of Warfarin Dose Prediction Algorithms and Effects of Clinical Factors on Warfarin Dose in Chinese Patients

Therapeutic Drug Monitoring ◽

10.1097/ftd.0000000000000880 ◽

2021 ◽

Vol Publish Ahead of Print ◽

Author(s):

Weiqi Gao ◽

Zhihong Li ◽

Weihong Chen ◽

Shuqiu Zhang

Keyword(s):

Performance Evaluation ◽

Warfarin Dose ◽

Chinese Patients ◽

Clinical Factors ◽

Dose Prediction ◽

Prediction Algorithms

Predicting Benzene Concentration Using Machine Learning and Time Series Algorithms

Mathematics ◽

10.3390/math8122205 ◽

2020 ◽

Vol 8 (12) ◽

pp. 2205

Author(s):

Luis Alfonso Menéndez García ◽

Fernando Sánchez Lasheras ◽

Paulino José García Nieto ◽

Laura Álvarez de Prado ◽

Antonio Bernardo Sánchez

Keyword(s):

Machine Learning ◽

Time Series ◽

Moving Average ◽

Environmental Pollutants ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Learning Models ◽

Vector Autoregressive ◽

Benzene Concentration ◽

Machine Learning Models

Benzene is a pollutant that is very harmful to human health, so models are needed to predict its concentration and its relationship with other air pollutants. Data collected by eight stations in Madrid (Spain) over nine years were analyzed using the following models: multivariate linear regression (MLR), multivariate adaptive regression splines (MARS), multilayer perceptron neural network (MLP), support vector machines (SVM), autoregressive integrated moving-average (ARIMA) and vector autoregressive moving-average (VARMA). Benzene concentration was predicted from the concentrations of four environmental pollutants: nitrogen dioxide (NO₂), nitrogen oxides (NOₓ), particulate matter (PM₁₀) and toluene (C₇H₈), and performance measures were computed for each of the proposed models. In general, regression-based machine learning models are more effective at predicting than time series models.

High-dimensional pharmacogenetic prediction of a continuous trait using machine learning techniques with application to warfarin dose prediction in African Americans

Bioinformatics ◽

10.1093/bioinformatics/btr159 ◽

2011 ◽

Vol 27 (10) ◽

pp. 1384-1389 ◽

Cited By ~ 37

Author(s):

Erdal Cosgun ◽

Nita A. Limdi ◽

Christine W. Duarte

Keyword(s):

Machine Learning ◽

African Americans ◽

Warfarin Dose ◽

Machine Learning Techniques ◽

High Dimensional ◽

Continuous Trait ◽

Dose Prediction ◽

Learning Techniques

Latest Advances in Fractional Snow Cover Mapping on MODIS Data by Machine Learning Algorithms

10.5194/egusphere-egu2020-13193 ◽

2020 ◽

Author(s):

Semih Kuter ◽

Zuhal Akyurek

Keyword(s):

Machine Learning ◽

Snow Cover ◽

General Circulation ◽

Snow Water Equivalent ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Landsat 8 ◽

European Alps ◽

Fractional Snow Cover

The spatial extent of snow has been declared an essential climate variable. Accurate modeling of snow cover is crucial for better prediction of snow water equivalent and, consequently, for the success of general circulation and weather forecasting models as well as climate change and hydrological studies. This presentation focuses on our latest findings in fractional snow cover (FSC) mapping on MODIS images using data-driven machine learning methodologies. For this purpose, a dataset composed of 20 MODIS - Landsat 8 image pairs acquired between Apr 2013 and Dec 2016 over the European Alps was employed. Artificial neural network (ANN), multivariate adaptive regression splines (MARS), support vector regression (SVR) and random forest (RF) models were trained and tested using reference FSC maps generated from higher spatial resolution Landsat 8 binary snow maps. The ANN, MARS, SVR and RF models exhibited quite good performance with average R ≈ 0.93, whereas the agreement between the reference FSC maps and MODIS's own product MOD10A1 (C5) was slightly poorer with R ≈ 0.88.

High Spatial Resolution Topsoil Organic Matter Content Mapping Across Desertified Land in Northern China

Frontiers in Environmental Science ◽

10.3389/fenvs.2021.668912 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yang Junting ◽

Li Xiaosong ◽

Wu Bo ◽

Wu Junjun ◽

Sun Bin ◽

...

Keyword(s):

Machine Learning ◽

Organic Matter ◽

Spatial Resolution ◽

High Spatial Resolution ◽

Organic Matter Content ◽

Multiple Linear Regression Model ◽

Northern China ◽

Google Earth ◽

Support Vector ◽

Combating Desertification

Soil organic matter (SOM) content is an effective indicator of desertification; thus, monitoring its spatial-temporal changes on a large scale is important for combating desertification. However, mapping SOM content in desertified land is challenging owing to the heterogeneous landscape, relatively low SOM content and vegetation coverage. Here, we modeled the SOM content in topsoil (0–20 cm) of desertified land in northern China by employing a high spatial resolution dataset and machine learning methods, with an emphasis on quarterly green and non-photosynthetic vegetation information, based on the Google Earth Engine (GEE). The results show: 1) the machine learning models performed better than the traditional multiple linear regression (MLR) model for SOM content estimation, and the Random Forest (RF) model was more accurate than the Support Vector Machine (SVM) model; 2) the quarterly information on green and non-photosynthetic vegetation was identified as a key covariate for estimating the SOM content in desertified land, and an obvious improvement could be observed after simultaneously combining the Dead Fuel Index (DFI) and Normalized Difference Vegetation Index (NDVI) of the four quarters (R² increased by 0.06, the root mean square error decreased by 0.05, the ratio of prediction deviation increased by 0.2, and the ratio of performance to interquartile distance increased by 0.5). In particular, the effects of the DFI in Q1 (the first quarter) and Q2 (the second quarter) on estimating low SOM content (<1%) were identified; finally, a timely (2019) and high spatial resolution (30 m) SOM content map for the desertified land in northern China was drawn, which shows obvious advantages over existing SOM products, thus providing key data support for monitoring and combating desertification.
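The two ratio metrics mentioned above are often defined as the standard deviation of the observations divided by RMSE (RPD) and the interquartile range of the observations divided by RMSE (RPIQ). The sketch below uses those common definitions, which is an assumption since the abstract gives no formulas; quartile conventions also differ between tools, and the "inclusive" method is chosen here only for illustration.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rpd(observed, predicted):
    """Sample standard deviation of the observations divided by RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


def rpiq(observed, predicted):
    """Interquartile range (Q3 - Q1) of the observations divided by RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)
```

Both ratios grow as the prediction error shrinks relative to the spread of the observed values, so higher is better.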

Applied Machine Learning for Spine Surgeons: Predicting Outcome for Patients Undergoing Treatment for Lumbar Disc Herniation Using PRO Data

Global Spine Journal ◽

10.1177/2192568220967643 ◽

2020 ◽

pp. 219256822096764

Author(s):

Casper Friis Pedersen ◽

Mikkel Østerheden Andersen ◽

Leah Yacat Carreon ◽

Søren Eiskjær

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Lumbar Disc Herniation ◽

Disc Herniation ◽

Lumbar Disc ◽

Multivariate Adaptive Regression Splines ◽

Superior Performance ◽

Support Vector ◽

Conventional Models

Study Design: Retrospective/prospective study. Objective: Models based on preoperative factors can predict patients’ outcome at 1-year follow-up. This study measures the performance of several machine learning (ML) models and compares the results with conventional methods. Methods: Patients who had lumbar disc herniation (LDH) surgery, identified in the Danish national registry for spine surgery, were included. Initial training of models included 16 independent variables, including demographics and presurgical patient-reported measures. Patients were grouped by whether or not they reached the minimal clinically important difference for EuroQol, Oswestry Disability Index, Visual Analog Scale (VAS) Leg, and VAS Back, and by their ability to return to work at 1-year follow-up. Data were randomly split into training, validation, and test sets by 50%/35%/15%. Deep learning, decision tree, random forest, boosted tree, and support vector machine models were trained, and for comparison, multivariate adaptive regression splines (MARS) and logistic regression models were used. Model fit was evaluated by inspecting the area under the curve (AUC) and performance during validation. Results: Seven models were arrived at. Classification errors were within ±1% to 4% SD across validation folds. ML did not yield superior performance compared with conventional models. MARS and deep learning performed consistently well. Discrepancy was greatest among VAS Leg models. Conclusions: Five predictive ML and 2 conventional models were developed, predicting improvement for LDH patients at the 1-year follow-up. We demonstrate that it is possible to build an ensemble of models with little effort as a starting point for further model optimization and selection.
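A random 50%/35%/15% partition like the one described in the Methods could look like the sketch below. The helper name `three_way_split` and the fixed seed are illustrative assumptions; the abstract does not say which tooling performed the split or whether it was stratified.

```python
import random


def three_way_split(items, fractions=(0.50, 0.35, 0.15), seed=0):
    """Shuffle items and cut them into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after the first two cuts
    return train, validation, test
```

Keeping the test portion untouched until model selection on the validation portion is finished is what lets the final test score serve as an unbiased estimate.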

Genotype-Guided vs Clinically-Guided Stable Warfarin Dose Prediction and Stable Dose Establishment In A Predominantly Non-European Ancestry Population

Expert Review of Precision Medicine and Drug Development ◽

10.1080/23808993.2021.1989303 ◽

2021 ◽

pp. 1-5

Author(s):

Annesti F. Elmasri ◽

Heejin Hur ◽

Jin Han ◽

James C. Lee

Keyword(s):

Warfarin Dose ◽

European Ancestry ◽

Stable Dose ◽

Dose Prediction

Developing a Novel Machine Learning-Based Classification Scheme for Predicting SPCs in Colorectal Cancer Survivors

Applied Sciences ◽

10.3390/app10041355 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1355 ◽

Cited By ~ 1

Author(s):

Wen-Chien Ting ◽

Horng-Rong Chang ◽

Chi-Chang Chang ◽

Chi-Jie Lu

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Risk Factors ◽

Primary Site ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Second Primary ◽

Primary Tumors ◽

Machine Learning Classification ◽

Extreme Gradient Boosting

Colorectal cancer is ranked third and fourth in terms of mortality and cancer incidence in the world. While advances in treatment strategies have provided cancer patients with longer survival, potentially harmful second primary cancers can occur. Therefore, second primary colorectal cancer analysis is an important issue with regard to clinical management. In this study, a novel predictive scheme was developed for predicting the risk factors associated with second colorectal cancer in patients with colorectal cancer by integrating five machine learning classification techniques, including support vector machine, random forest, multivariate adaptive regression splines, extreme learning machine, and extreme gradient boosting. A total of 4287 patients in the datasets provided by three hospital tumor registries were used. Our empirical results revealed that this proposed predictive scheme provided promising classification results and the identification of important risk factors for predicting second colorectal cancer based on accuracy, sensitivity, specificity, and area under the curve metrics. Collectively, our clinical findings suggested that the most important risk factors were the combined stage, age at diagnosis, BMI, surgical margins of the primary site, tumor size, sex, regional lymph nodes positive, grade/differentiation, primary site, and drinking behavior. Accordingly, these risk factors should be monitored for the early detection of second primary tumors in order to improve treatment and intervention strategies.
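Three of the classification metrics named above (accuracy, sensitivity, specificity) follow directly from the confusion-matrix counts. The sketch below assumes binary labels where 1 marks a patient with a second primary cancer and 0 a patient without; the label coding and the function name are illustrative, not taken from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true positives that were detected
        "specificity": tn / (tn + fp),  # share of true negatives that were cleared
    }
```

When positive cases are rare, accuracy alone can look high even if sensitivity is poor, which is why the three are usually reported together.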

An Evaluation of Eight Machine Learning Regression Algorithms for Forest Aboveground Biomass Estimation from Multiple Satellite Data Products

Remote Sensing ◽

10.3390/rs12244015 ◽

2020 ◽

Vol 12 (24) ◽

pp. 4015

Author(s):

Yuzhen Zhang ◽

Jun Ma ◽

Shunlin Liang ◽

Xisheng Li ◽

Manyao Li

Keyword(s):

Machine Learning ◽

Satellite Data ◽

Aboveground Biomass ◽

Random Sampling ◽

Stratified Sampling ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Regression Algorithms ◽

Ensemble Algorithms ◽

Forest Aboveground Biomass

This study provided a comprehensive evaluation of eight machine learning regression algorithms for forest aboveground biomass (AGB) estimation from satellite data based on leaf area index, canopy height, net primary production, and tree cover data, as well as climatic and topographical data. Some of these algorithms, such as extremely randomized trees, stochastic gradient boosting, and categorical boosting (CatBoost) regression, have not been commonly used for forest AGB estimation. For each algorithm, the hyperparameters were optimized using grid search with cross-validation, the optimal AGB model was developed using the training dataset (80%), and AGB was predicted on the test dataset (20%). Performance metrics, feature importance, and overestimation and underestimation were considered as indicators for evaluating the performance of an algorithm. To reduce the impact of the random training-test data split and sampling method on performance, the above procedures were repeated 50 times for each algorithm under the random sampling, stratified sampling, and separate modeling scenarios. The results showed that the five tree-based ensemble algorithms performed better than the three nonensemble algorithms (multivariate adaptive regression splines, support vector regression, and multilayer perceptron), and the CatBoost algorithm outperformed the other algorithms for AGB estimation. Compared with the random sampling scenario, the stratified sampling scenario and separate modeling did not significantly improve the AGB estimates, but modeling AGB for each forest type separately provided stable results in terms of the contributions of the predictor variables to the AGB estimates. All the algorithms underestimated forest AGB when the AGB values were larger than 210 Mg/ha and overestimated it when the AGB values were less than 120 Mg/ha. This study highlighted the capability of ensemble algorithms to improve AGB estimates and the necessity of improving AGB estimates for high and low AGB levels in future studies.
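The evaluation loop described above (grid search with cross-validation on an 80% training set, scoring on the 20% test set, repeated over many random splits) can be sketched in plain Python. Everything model-specific here is a stand-in: a one-feature k-nearest-neighbour regressor replaces the eight algorithms, the grid of `k` values is arbitrary, and the fold assignment is a simple interleaving rather than whatever scheme the study used.

```python
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


def cv_rmse(xs, ys, k, n_folds=5):
    """Mean RMSE of the stand-in model over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        held_out = list(range(fold, len(xs), n_folds))
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        preds = [knn_predict(tx, ty, xs[i], k) for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return sum(scores) / len(scores)


def grid_search(xs, ys, grid=(1, 3, 5, 9)):
    """Pick the hyperparameter value with the lowest cross-validated RMSE."""
    return min(grid, key=lambda k: cv_rmse(xs, ys, k))


def repeated_holdout(xs, ys, repeats=50, test_fraction=0.20, seed=0):
    """Tune on each random training split, then score on its held-out test split."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test, train = idx[:n_test], idx[n_test:]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        best_k = grid_search(tx, ty)
        preds = [knn_predict(tx, ty, xs[i], best_k) for i in test]
        scores.append(rmse([ys[i] for i in test], preds))
    return scores
```

Tuning inside each repetition, using only that repetition's training data, keeps the test split out of hyperparameter selection; the spread of the returned scores then reflects sensitivity to the random split.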
