Seasonal Forecast of Non-monsoonal Winter Precipitation over the Eurasian Continent using Machine Learning Models

2021 ◽  
pp. 1-42
Author(s):  
QiFeng Qian ◽  
XiaoJing Jia ◽  
Hai Lin ◽  
Ruizhi Zhang

Abstract In this study, four machine learning (ML) models (gradient boosting decision tree (GBDT), light gradient boosting machine (LightGBM), categorical boosting (CatBoost) and extreme gradient boosting (XGBoost)) are used to perform seasonal forecasts of non-monsoonal winter precipitation over the Eurasian continent (NWPE; 30-60°N, 30-105°E). The results are compared with seasonal forecasts from a traditional linear regression (LR) model and two dynamic models. The ML and LR models are trained using data for the period 1979-2010, and these empirical models are then used to perform seasonal forecasts of the NWPE for 2011-2018. Our results show that the four ML models have reasonable seasonal forecast skill for the NWPE and clearly outperform the LR model. The ML models and the dynamic models produce skillful forecasts of the NWPE over different regions. The ensemble mean of forecasts that include both the ML models and the dynamic models shows higher forecast skill for the NWPE than the ensemble mean of the dynamic models alone. The forecast skill of the ML models mainly benefits from a skillful forecast of the third empirical orthogonal function (EOF) mode (EOF3) of the NWPE, which is predicted well and consistently across the ML models. Our results also illustrate that Arctic sea ice in the previous autumn is the most important predictor in the ML models for forecasting the NWPE. This study suggests that ML models may be useful tools to help improve seasonal forecasts of the NWPE.

2020 ◽  
Author(s):  
Qifeng Qian ◽  
Xiaojing Jia ◽  
Hai Lin

Two machine learning (ML) models (Support Vector Regression and Extreme Gradient Boosting; SVR and XGBoost hereafter) have been developed in this study to perform seasonal forecasts of the winter (December–January–February, DJF) surface air temperature (SAT) in North America (NA). The seasonal forecast skill of the two ML models is evaluated in a cross-validated fashion. Forecast results from one Linear Regression (LR hereafter) model and two Canadian dynamic climate models are used for comparison. In the take-one-out hindcast experiment, the two ML models and the LR model show reasonable seasonal forecast skill for the winter SAT in NA. Compared with the two Canadian dynamic models, the two ML models and the LR model have better forecast skill for the winter SAT over central NA, which mainly results from a skillful forecast of the second Empirical Orthogonal Function (EOF) mode of winter SAT over NA. In general, the SVR and XGBoost hindcasts show better forecast performance than the LR model. However, the LR model depends less on the size of the training dataset than the SVR and XGBoost models. In the real forecast experiments for the period 2011-2017, the two ML models clearly improve the forecast skill of winter SAT over northern and central NA compared with the two Canadian dynamic climate models. The results of this study suggest that ML models may provide real-time supplementary forecast tools to improve forecast skill and may operationally facilitate the seasonal forecast of the winter climate of NA.
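
The take-one-out hindcast described above can be illustrated with a minimal sketch. It assumes a single predictor and an ordinary least-squares line in place of the SVR, XGBoost and LR models of the study; the function names and the toy series are hypothetical and are not taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def take_one_out_hindcast(x, y):
    """Predict each year from a model fitted on all the other years."""
    predictions = []
    for k in range(len(x)):
        # Withhold year k, fit on the remaining years, then predict year k.
        x_train = x[:k] + x[k + 1:]
        y_train = y[:k] + y[k + 1:]
        slope, intercept = fit_line(x_train, y_train)
        predictions.append(slope * x[k] + intercept)
    return predictions


# Synthetic toy series (illustrative only, not data from the study).
PREDICTOR = [0.0, 1.0, 2.0, 3.0, 4.0]
TARGET = [1.0, 3.0, 5.0, 7.0, 9.0]
HINDCAST = take_one_out_hindcast(PREDICTOR, TARGET)
```

Because each year is predicted by a model that never saw it, the hindcast skill is not inflated by fitting and testing on the same sample.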


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
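
The trade-off between ppv and sensitivity at different cut-offs can be sketched as follows. This is a minimal illustration on synthetic probabilities and labels; the helper name and the toy data are hypothetical and do not reproduce the study's results.

```python
def ppv_and_sensitivity(probs, labels, cutoff):
    """Return (ppv, sensitivity) when predicting positive for prob >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    # ppv: share of predicted positives that are truly positive.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # sensitivity: share of true positives that the model flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Synthetic predicted probabilities and true labels (illustrative only).
PROBS = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
LABELS = [1, 1, 0, 1, 0, 1, 0, 0]

for cutoff in (0.35, 0.5, 0.75):
    ppv, sensitivity = ppv_and_sensitivity(PROBS, LABELS, cutoff)
    print(f"cut-off {cutoff:.2f}: ppv={ppv:.2f}, sensitivity={sensitivity:.2f}")
```

On this toy sample, raising the cut-off makes the classifier more selective (higher ppv, lower sensitivity), which is the same direction of trade-off the abstract reports.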


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.


2021 ◽  
Author(s):  
Nicola Cortesi ◽  
Verónica Torralba ◽  
Llorenç Lledó ◽  
Andrea Manrique-Suñén ◽  
Nube Gonzalez-Reviriego ◽  
...  

Abstract It is often assumed that weather regimes adequately characterize atmospheric circulation variability. However, regime classifications spanning many months and with a low number of regimes may not satisfy this assumption. The first aim of this study is to test this hypothesis for the Euro-Atlantic region. The second is to extend the assessment of sub-seasonal forecast skill in predicting the frequencies of occurrence of the regimes beyond the winter season. Two regime classifications of four regimes each were obtained from sea level pressure anomalies clustered from October to March and from April to September, respectively. Their spatial patterns were compared with those representing the annual cycle. Results highlight that the two regime classifications are able to reproduce most of the patterns of the annual cycle, except during the transition weeks between the two periods, when patterns of the annual cycle resembling the Atlantic Low regime are not observed in either of the two classifications. Forecast skill of Atlantic Low was found to be similar to that of NAO+, the regime replacing Atlantic Low in the two classifications. Thus, although clustering yearly circulation data in two periods of 6 months each introduces a few deviations from the annual cycle of the regime patterns, it does not negatively affect sub-seasonal forecast skill. Beyond the winter season and the first ten forecast days, sub-seasonal forecasts of ECMWF are still able to achieve weekly frequency correlations of r = 0.5 for some regimes and start dates, including summer ones. ECMWF forecasts beat climatological forecasts in the case of long-lasting regime events, and when measured by the fair continuous ranked probability skill score, but not when measured by the Brier skill score. Thus, more effort is still needed to achieve the minimum skill necessary to develop forecast products based on weather regimes outside the winter season.


2021 ◽  
Vol 13 (6) ◽  
pp. 1147
Author(s):  
Xiangqian Li ◽  
Wenping Yuan ◽  
Wenjie Dong

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO₂ concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R²) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R² > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.
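
A coefficient of determination between observed and predicted values can be computed as in the sketch below. It assumes the common definition 1 - SS_res / SS_tot; the abstract does not state which definition the authors used (a squared correlation coefficient is another common choice), and the NDVI values are synthetic.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations about their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic monthly NDVI values (illustrative only, not data from the study).
OBSERVED_NDVI = [0.2, 0.4, 0.6, 0.8]
PREDICTED_NDVI = [0.25, 0.35, 0.65, 0.75]
print(round(r_squared(OBSERVED_NDVI, PREDICTED_NDVI), 3))
```

Under this definition a perfect prediction gives 1, and a prediction no better than the observed mean gives 0.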


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Satoko Hiura ◽  
Shige Koseki ◽  
Kento Koyama

Abstract In predictive microbiology, statistical models are employed to predict bacterial population behavior in food using environmental factors such as temperature, pH, and water activity. As the amount and complexity of data increase, handling all data with high-dimensional variables becomes a difficult task. We propose a data mining approach to predict bacterial behavior using a database of microbial responses to food environments. Population growth and inactivation data for the pathogen Listeria monocytogenes under 1,007 environmental conditions, including five food categories (beef, culture medium, pork, seafood, and vegetables) and temperatures ranging from 0 to 25 °C, were obtained from the ComBase database (www.combase.cc). We used eXtreme gradient boosting tree, a machine learning algorithm, to predict bacterial population behavior from eight explanatory variables: ‘time’, ‘temperature’, ‘pH’, ‘water activity’, ‘initial cell counts’, ‘whether the viable count is initial cell number’, and two types of categories regarding food. The root mean square error between the observed and predicted values was approximately 1.0 log CFU regardless of food category, which suggests the possibility of predicting viable bacterial counts in various foods. The data mining approach examined here will enable the prediction of bacterial population behavior in food by identifying hidden patterns within a large amount of data.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jong Ho Kim ◽  
Haewon Kim ◽  
Ji Su Jang ◽  
Sung Mi Hwang ◽  
So Young Lim ◽  
...  

Abstract Background Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy using machine learning with neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. Five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine) were trained on the training set. The prediction models were validated on the test set. Results The model’s performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and combination of models.
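
A stratified 80/20 split of the kind described in the Methods can be sketched as below. This is one plausible way to do it, assuming a per-class shuffle with a fixed seed; the function name, the seed and the toy labels are hypothetical and the authors' actual procedure is not given in the abstract.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)

    train, test = [], []
    for indices in indices_by_class.values():
        # Shuffle within each class, then carve off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic labels: 10 difficult (1) and 90 non-difficult (0) cases.
TOY_LABELS = [1] * 10 + [0] * 90
TRAIN_IDX, TEST_IDX = stratified_split(TOY_LABELS)
```

Stratifying matters when the positive class is rare, because a plain random split can leave the test set with too few positive cases to evaluate on.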


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arturo Moncada-Torres ◽  
Marissa C. van Maaren ◽  
Mathijs P. Hendriks ◽  
Sabine Siesling ◽  
Gijs Geleijnse

Abstract Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the $c$-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression ($c$-index $\sim 0.63$), and in the case of XGB even better ($c$-index $\sim 0.73$). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
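
The c-index used for the comparison can be illustrated with a minimal sketch of Harrell's concordance for right-censored data. It assumes higher risk scores mean shorter expected survival and, as a simplification, skips pairs with tied survival times; the function name and the toy data are hypothetical, and this is not the authors' implementation.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable pairs ranked in the right order."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable


# Synthetic follow-up times, event flags (1 = event, 0 = censored) and risks.
TIMES = [2, 4, 6, 8]
EVENTS = [1, 1, 0, 1]
RISKS = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(TIMES, EVENTS, RISKS))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a sense of scale for the reported values of about 0.63 and 0.73.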


2020 ◽  
Vol 101 (8) ◽  
pp. E1413-E1426 ◽  
Author(s):  
Antje Weisheimer ◽  
Daniel J. Befort ◽  
Dave MacLeod ◽  
Tim Palmer ◽  
Chris O’Reilly ◽  
...  

Abstract Forecasts of seasonal climate anomalies using physically based global circulation models are routinely made at operational meteorological centers around the world. A crucial component of any seasonal forecast system is the set of retrospective forecasts, or hindcasts, from past years that are used to estimate skill and to calibrate the forecasts. Hindcasts are usually produced over a period of around 20–30 years. However, recent studies have demonstrated that seasonal forecast skill can undergo pronounced multidecadal variations. These results imply that relatively short hindcasts are not adequate for reliably testing seasonal forecasts and that small hindcast sample sizes can potentially lead to skill estimates that are not robust. Here we present new and unprecedented 110-year-long coupled hindcasts of the next season over the period 1901–2010. Their performance for the recent period is in good agreement with those of operational forecast models. While skill for ENSO is very high during recent decades, it is markedly reduced during the 1930s–1950s. Skill at the beginning of the twentieth century is, however, as high as for recent high-skill periods. Consistent with findings in atmosphere-only hindcasts, a midcentury drop in forecast skill is found for a range of atmospheric fields, including large-scale indices such as the NAO and the PNA patterns. As with ENSO, skill scores for these indices recover in the early twentieth century, suggesting that the midcentury drop in skill is not due to a lack of good observational data. A public dissemination platform for our hindcast data is available, and we invite the scientific community to explore them.

