Seasonal Forecast of Non-monsoonal Winter Precipitation over the Eurasian Continent using Machine Learning Models

Journal of Climate ◽

10.1175/jcli-d-21-0113.1 ◽

2021 ◽

pp. 1-42

Author(s):

QiFeng Qian ◽

XiaoJing Jia ◽

Hai Lin ◽

Ruizhi Zhang

Keyword(s):

Machine Learning ◽

Dynamic Models ◽

Winter Precipitation ◽

Seasonal Forecast ◽

Forecast Skill ◽

The Arctic ◽

Gradient Boosting ◽

Seasonal Forecasts ◽

Eurasian Continent ◽

Extreme Gradient Boosting

Abstract In this study, four machine learning (ML) models (gradient boosting decision tree (GBDT), light gradient boosting machine (LightGBM), categorical boosting (CatBoost) and extreme gradient boosting (XGBoost)) are used to perform seasonal forecasts of non-monsoonal winter precipitation over the Eurasian continent (NWPE; 30-60°N, 30-105°E). The results are compared with seasonal forecasts from a traditional linear regression (LR) model and two dynamic models. The ML and LR models are trained using data for the period 1979-2010, and these empirical models are then used to perform seasonal forecasts of the NWPE for 2011-2018. Our results show that the four ML models have reasonable seasonal forecast skill for the NWPE and clearly outperform the LR model. The ML models and the dynamic models produce skillful forecasts of the NWPE over different regions. The ensemble mean of the forecasts that includes both the ML models and the dynamic models shows higher forecast skill for the NWPE than the ensemble mean of the dynamic models alone. The forecast skill of the ML models mainly benefits from a skillful forecast of the third empirical orthogonal function (EOF) mode (EOF3) of the NWPE, which is predicted well and consistently across the ML models. Our results also illustrate that Arctic sea ice in the previous autumn is the most important predictor in the ML models for forecasting the NWPE. This study suggests that ML models may be useful tools to help improve seasonal forecasts of the NWPE.

Forecasting North America Winter Surface Air Temperature Using Machine Learning Methods

10.5194/egusphere-egu2020-4465 ◽

2020 ◽

Author(s):

Qifeng Qian ◽

Xiaojing Jia ◽

Hai Lin

Keyword(s):

Machine Learning ◽

North America ◽

Air Temperature ◽

Climate Models ◽

Surface Air Temperature ◽

Seasonal Forecast ◽

Forecast Skill ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

Two machine learning (ML) models (Support Vector Regression and Extreme Gradient Boosting; SVR and XGBoost hereafter) have been developed in this study to perform seasonal forecasts of the winter (December–January–February, DJF) surface air temperature (SAT) in North America (NA). The seasonal forecast skill of the two ML models is evaluated in a cross-validated fashion. Forecast results from one Linear Regression (LR hereafter) model and two Canadian dynamic climate models are used for comparison. In the take-one-out hindcast experiment, the two ML models and the LR model show reasonable seasonal forecast skill for the winter SAT in NA. Compared with the two Canadian dynamic models, the two ML models and the LR model have better forecast skill for the winter SAT over central NA, which mainly results from a skillful forecast of the second Empirical Orthogonal Function (EOF) mode of winter SAT over NA. In general, the SVR and XGBoost hindcasts show better forecast performance than the LR model. However, the LR model depends less on the size of the training dataset than the SVR and XGBoost models do. In the real forecast experiments during the period 2011-2017, compared with the two Canadian dynamic climate models, the two ML models clearly improve the forecast skill of winter SAT over northern and central NA. The results of this study suggest that ML models may provide real-time supplementary forecast tools to improve forecast skill and may operationally facilitate the seasonal forecast of the winter climate of NA.

Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict, during treatment, which patients will not benefit from brief mental health treatment, and we present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not differ significantly on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61–0.77), with a sensitivity of 0.35 (0.29–0.41) and an area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
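
The trade-off between ppv and sensitivity described in this abstract can be illustrated with a minimal Python sketch. The function name `ppv_and_sensitivity` and the toy labels and probabilities are hypothetical and invented for illustration; they are not the study's data or code, and only the three cut-off values are taken from the abstract.

```python
def ppv_and_sensitivity(y_true, y_prob, cutoff):
    """Return (ppv, sensitivity) when probabilities >= cutoff are classed positive."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= cutoff and y == 1)  # true positives
    fp = sum(1 for y, p in pairs if p >= cutoff and y == 0)  # false positives
    fn = sum(1 for y, p in pairs if p < cutoff and y == 1)   # false negatives
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Toy data (assumption): 1 = undesired outcome, with a made-up predicted probability.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.55, 0.4, 0.6, 0.45, 0.3, 0.2, 0.35, 0.1]

for cutoff in (0.38, 0.5, 0.63):
    ppv, sens = ppv_and_sensitivity(y_true, y_prob, cutoff)
    print(f"cut-off {cutoff:.2f}: ppv={ppv:.2f}, sensitivity={sens:.2f}")
```

On this toy data, raising the cut-off from 0.38 to 0.63 raises the ppv from about 0.67 to 1.0 while the sensitivity falls from 0.8 to 0.4, which is the same direction of trade-off the abstract reports.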

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Yearly evolution of Euro-Atlantic weather regimes and of their sub-seasonal predictability

Climate Dynamics ◽

10.1007/s00382-021-05679-y ◽

2021 ◽

Author(s):

Nicola Cortesi ◽

Verónica Torralba ◽

Llorenç Lledó ◽

Andrea Manrique-Suñén ◽

Nube Gonzalez-Reviriego ◽

...

Keyword(s):

Annual Cycle ◽

Winter Season ◽

Skill Score ◽

Seasonal Forecast ◽

Forecast Skill ◽

Seasonal Forecasts ◽

Sea Level Pressure ◽

Weather Regimes ◽

Start Dates ◽

Weekly Frequency

Abstract It is often assumed that weather regimes adequately characterize atmospheric circulation variability. However, regime classifications spanning many months and with a low number of regimes may not satisfy this assumption. The first aim of this study is to test this hypothesis for the Euro-Atlantic region. The second is to extend the assessment of sub-seasonal forecast skill in predicting the frequencies of occurrence of the regimes beyond the winter season. Two regime classifications of four regimes each were obtained from sea level pressure anomalies clustered from October to March and from April to September, respectively. Their spatial patterns were compared with those representing the annual cycle. Results highlight that the two regime classifications are able to reproduce most of the patterns of the annual cycle, except during the transition weeks between the two periods, when patterns of the annual cycle resembling the Atlantic Low regime are not observed in either of the two classifications. Forecast skill of Atlantic Low was found to be similar to that of NAO+, the regime replacing Atlantic Low in the two classifications. Thus, although clustering yearly circulation data in two periods of 6 months each introduces a few deviations from the annual cycle of the regime patterns, it does not negatively affect sub-seasonal forecast skill. Beyond the winter season and the first ten forecast days, sub-seasonal forecasts of ECMWF are still able to achieve weekly frequency correlations of r = 0.5 for some regimes and start dates, including summer ones. ECMWF forecasts beat climatological forecasts in the case of long-lasting regime events, and when measured by the fair continuous ranked probability skill score, but not when measured by the Brier skill score. Thus, more effort is still needed to achieve the minimum skill necessary to develop forecast products based on weather regimes outside the winter season.
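
The abstract compares forecasts with climatology using the Brier skill score but gives no formula. The sketch below uses the standard textbook definition, BSS = 1 - BS_forecast / BS_reference, where the Brier score is the mean squared difference between forecast probability and binary outcome. The function names and the toy numbers are assumptions for illustration, not the study's verification code or data.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(probs, outcomes, ref_probs):
    """Skill relative to a reference forecast: > 0 beats it, < 0 is worse."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(ref_probs, outcomes)


# Toy example (assumption): did a given regime occur in each of four weeks?
outcomes = [1, 0, 0, 1]
climatology = [0.5, 0.5, 0.5, 0.5]   # reference: constant climatological frequency
forecast = [0.8, 0.2, 0.3, 0.6]      # hypothetical model probabilities

bss = brier_skill_score(forecast, outcomes, climatology)
print(f"Brier skill score vs climatology: {bss:.2f}")
```

A positive value means the forecast has a lower Brier score than the climatological reference; the abstract's finding corresponds to this score not being positive for the ECMWF regime forecasts outside the cases it lists.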

A Machine Learning Method for Predicting Vegetation Indices in China

Remote Sensing ◽

10.3390/rs13061147 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1147

Author(s):

Xiangqian Li ◽

Wenping Yuan ◽

Wenjie Dong

Keyword(s):

Machine Learning ◽

Growing Season ◽

Crop Growth ◽

Spatiotemporal Distribution ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Severe Drought ◽

Vegetation Growth ◽

Extreme Gradient Boosting ◽

Boosting Method

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO₂ concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R²) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R² > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Prediction of population behavior of Listeria monocytogenes in food using machine learning and a microbial growth and survival database

Scientific Reports ◽

10.1038/s41598-021-90164-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Satoko Hiura ◽

Shige Koseki ◽

Kento Koyama

Keyword(s):

Machine Learning ◽

Data Mining ◽

Listeria Monocytogenes ◽

Water Activity ◽

Bacterial Population ◽

Gradient Boosting ◽

Initial Cell ◽

Data Mining Approach ◽

Cell Counts ◽

Extreme Gradient Boosting

Abstract In predictive microbiology, statistical models are employed to predict bacterial population behavior in food using environmental factors such as temperature, pH, and water activity. As the amount and complexity of data increase, handling all data with high-dimensional variables becomes a difficult task. We propose a data mining approach to predict bacterial behavior using a database of microbial responses to food environments. Population growth and inactivation data for the pathogen Listeria monocytogenes under 1,007 environmental conditions, including five food categories (beef, culture medium, pork, seafood, and vegetables) and temperatures ranging from 0 to 25 °C, were obtained from the ComBase database (www.combase.cc). We used the eXtreme gradient boosting tree, a machine learning algorithm, to predict bacterial population behavior from eight explanatory variables: ‘time’, ‘temperature’, ‘pH’, ‘water activity’, ‘initial cell counts’, ‘whether the viable count is initial cell number’, and two types of categories regarding food. The root mean square error between the observed and predicted values was approximately 1.0 log CFU regardless of food category, which suggests the possibility of predicting viable bacterial counts in various foods. The data mining approach examined here will enable the prediction of bacterial population behavior in food by identifying hidden patterns within a large amount of data.

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Abstract Background Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. Five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine) were trained on the training set. The prediction models were validated on the test set. Results The model using random forest performed best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable, and a combination of models.
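
The stratified 80/20 split mentioned in the Methods can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the function name `stratified_split`, the seed, and the toy labels (10 positives in 100 cases) are hypothetical, and the abstract does not say which tool the authors actually used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps roughly the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                           # random order within each class
        n_test = round(len(indices) * test_fraction)   # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumption): 1 = difficult laryngoscopy, 0 = not difficult.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```

Splitting within each class, rather than over the pooled data, is what keeps a rare outcome represented at the same rate in both sets.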

Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

Scientific Reports ◽

10.1038/s41598-021-86327-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Arturo Moncada-Torres ◽

Marissa C. van Maaren ◽

Mathijs P. Hendriks ◽

Sabine Siesling ◽

Gijs Geleijnse

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Explicit Knowledge ◽

Cox Regression ◽

Metastatic Breast ◽

Gradient Boosting ◽

Support Vector ◽

Netherlands Cancer Registry ◽

Extreme Gradient Boosting ◽

The Impact

Abstract Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the c-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression (c-index ∼0.63), and in the case of XGB even better (c-index ∼0.73). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
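
The c-index used to compare the models can be illustrated with a simplified version of Harrell's concordance index. The sketch below is an assumption-laden illustration, not the paper's implementation: the function name and toy data are hypothetical, pairs with tied survival times are skipped, and ties in risk score count as half-concordant.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk subject fails first.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0      # model ranks the earlier failure as riskier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5      # tie in predicted risk
    return concordant / comparable if comparable else float("nan")


# Toy data (assumption): follow-up times, event flags (0 = censored), model risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]

print(f"c-index: {concordance_index(times, events, risk):.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of about 0.63 for CPH and about 0.73 for XGB.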

Seasonal Forecasts of the Twentieth Century

Bulletin of the American Meteorological Society ◽

10.1175/bams-d-19-0019.1 ◽

2020 ◽

Vol 101 (8) ◽

pp. E1413-E1426 ◽

Cited By ~ 2

Author(s):

Antje Weisheimer ◽

Daniel J. Befort ◽

Dave MacLeod ◽

Tim Palmer ◽

Chris O’Reilly ◽

...

Keyword(s):

Twentieth Century ◽

Large Scale ◽

Seasonal Forecast ◽

Forecast Skill ◽

Seasonal Forecasts ◽

Climate Anomalies ◽

Global Circulation Models ◽

Skill Scores ◽

Physically Based ◽

Forecast Models

Abstract Forecasts of seasonal climate anomalies using physically based global circulation models are routinely made at operational meteorological centers around the world. A crucial component of any seasonal forecast system is the set of retrospective forecasts, or hindcasts, from past years that are used to estimate skill and to calibrate the forecasts. Hindcasts are usually produced over a period of around 20–30 years. However, recent studies have demonstrated that seasonal forecast skill can undergo pronounced multidecadal variations. These results imply that relatively short hindcasts are not adequate for reliably testing seasonal forecasts and that small hindcast sample sizes can potentially lead to skill estimates that are not robust. Here we present new and unprecedented 110-year-long coupled hindcasts of the next season over the period 1901–2010. Their performance for the recent period is in good agreement with those of operational forecast models. While skill for ENSO is very high during recent decades, it is markedly reduced during the 1930s–1950s. Skill at the beginning of the twentieth century is, however, as high as for recent high-skill periods. Consistent with findings in atmosphere-only hindcasts, a midcentury drop in forecast skill is found for a range of atmospheric fields, including large-scale indices such as the NAO and the PNA patterns. As with ENSO, skill scores for these indices recover in the early twentieth century, suggesting that the midcentury drop in skill is not due to a lack of good observational data. A public dissemination platform for our hindcast data is available, and we invite the scientific community to explore them.
