Machine learning-based dynamical seasonal prediction of summer rainfall in China

Author(s):  
Jialin Wang ◽  
Jing Yang ◽  
Hongli Ren ◽  
Jinxiao Li ◽  
Qing Bao ◽  
...  

The seasonal prediction of summer rainfall is crucial for regional disaster reduction, but its prediction skill is currently low. This study developed a machine learning (ML)-based dynamical (MLD) seasonal prediction method for summer rainfall in China based on suitable circulation fields from the operational dynamical prediction model CAS FGOALS-f2. After the hyperparameters of three ML methods were tuned for the best fit with the least overfitting, gradient boosting regression trees exhibited the highest prediction skill, with averaged values of 0.33 in the reference training period (1981-2010) and 0.19 in eight individual years (2011-2018) of independent prediction, improving on the previous dynamical prediction skill by more than 300%. Further study suggests that both reducing overfitting and using the best available dynamical prediction are imperative for MLD applications, which warrants further investigation.

2021 ◽  
Vol 13 (11) ◽  
pp. 2096
Author(s):  
Zhongqi Yu ◽  
Yuanhao Qu ◽  
Yunxin Wang ◽  
Jinghui Ma ◽  
Yu Cao

A visibility forecast model called the boosting-based fusion model (BFM) was established in this study. The model is a fusion machine learning model based on multisource data, including air pollutants, meteorological observations, moderate resolution imaging spectroradiometer (MODIS) aerosol optical depth (AOD) data, and outputs of an operational Regional Atmospheric Environmental Modeling System for eastern China (RAEMS). Extreme gradient boosting (XGBoost), a light gradient boosting machine (LightGBM), and a numerical prediction method (RAEMS) were fused to establish this prediction model. Three sets of prediction models, namely BFM, LightGBM based on multisource data (LGBM), and RAEMS, were used to conduct visibility prediction tasks. The training set covered 1 January 2015 to 31 December 2018 and was prepared with several data pre-processing methods, including synthetic minority over-sampling technique (SMOTE) resampling, a loss function adjustment, and 10-fold cross-validation. Moreover, in addition to the basic features (variables), spatial and temporal gradient features were considered. The testing set covered 1 January to 31 December 2019 and was used to validate the feasibility of BFM, LGBM, and RAEMS. Statistical indicators confirmed that the machine learning methods improved the RAEMS forecast significantly and consistently. The root mean square error and correlation coefficient of BFM for the next 24/48 h were 5.01/5.47 km and 0.80/0.77, respectively, which were much better than those of RAEMS. The statistics and binary score analysis for different areas in Shanghai also confirmed the reliability and accuracy of BFM, particularly in low-visibility forecasting. Overall, BFM is a suitable tool for predicting visibility. It provides a more accurate visibility forecast for the next 24 and 48 h in Shanghai than LGBM and RAEMS. The results of this study provide support for real-time operational visibility forecasts.
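The two headline statistics in this abstract, root mean square error and correlation coefficient, can be sketched as follows. This is a minimal illustration that assumes the correlation is the Pearson coefficient; the function names and the visibility values are hypothetical and are not taken from the study.

```python
import math


def rmse(forecast, observed):
    """Root mean square error between forecasts and observations."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(forecast, observed):
    """Pearson correlation coefficient between forecasts and observations."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


# Hypothetical visibility values in km; not data from the study.
observed = [2.0, 5.0, 10.0, 15.0, 8.0]
forecast_a = [3.0, 6.0, 9.0, 14.0, 9.0]  # close to the observations
forecast_b = [8.0, 8.0, 8.0, 12.0, 4.0]  # further from the observations

for name, forecast in (("A", forecast_a), ("B", forecast_b)):
    print(name, rmse(forecast, observed), pearson_r(forecast, observed))
```

A lower RMSE and a higher correlation both indicate a better forecast, so the two numbers move in opposite directions when one model outperforms another.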


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Min Huang ◽  
Dandan Liu ◽  
Liyun Ma ◽  
Jingyang Wang ◽  
Yuming Wang ◽  
...  

With the rapid development of science and technology, UAVs (Unmanned Aerial Vehicles) have become a new type of weapon on the informatization battlefield owing to their low loss and zero casualty rate. In recent years, UAV crashes caused by navigation electromagnetic decoys and electromagnetic interference have attracted widespread international attention. The UAV LiDAR detection system is susceptible to electromagnetic interference in a complex electromagnetic environment, which results in inaccurate detection and causes the mission to fail. It is therefore necessary to predict the effects of the electromagnetic environment. Traditional electromagnetic environment effect prediction methods mostly use a single mathematical or machine learning model, which handles nonlinearity poorly and generalizes weakly. Therefore, this paper proposes a Stacking fusion model based on machine learning to predict electromagnetic environment effects. The method consists of the Extreme Gradient Boosting algorithm (XGB), the Gradient Boosting Decision Tree algorithm (GBDT), the K Nearest Neighbor algorithm (KNN), and the Decision Tree algorithm (DT). Experimental results show that, compared with the other seven machine learning algorithms, the Stacking fusion model has a better classification prediction accuracy of 0.9762, a lower Hamming code distance of 0.0336, and a higher Kappa coefficient of 0.955. The fusion model proposed in this paper predicts electromagnetic environment effects more accurately and is of great significance for improving the accuracy and safety of UAV LiDAR detection systems in the complex electromagnetic environment of the battlefield.
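Two of the classification metrics reported in this abstract can be sketched as follows. This is a minimal illustration that assumes the "Kappa coefficient" is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance; the function names and the labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical three-class labels; not data from the paper.
y_true = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2]
y_pred = [0, 0, 1, 1, 2, 2, 1, 1, 0, 2]

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", cohen_kappa(y_true, y_pred))
```

Kappa is lower than accuracy whenever some agreement would be expected by chance, which is why the two are often reported together.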


2021 ◽  
Vol 35 (4) ◽  
pp. 583-593
Author(s):  
Jialin Wang ◽  
Jing Yang ◽  
Hong-Li Ren ◽  
Jinxiao Li ◽  
Qing Bao ◽  
...  

2021 ◽  
Vol 9 (4) ◽  
pp. 376
Author(s):  
Yunfei Yang ◽  
Haiwen Tu ◽  
Lei Song ◽  
Lin Chen ◽  
De Xie ◽  
...  

Resistance is one of the important performance indicators of ships. In this paper, a prediction method based on the Radial Basis Function neural network (RBFNN) is proposed to predict the resistance of a 13,500 twenty-foot equivalent unit (13500TEU) container ship at different drafts. Prediction at a draft within the known range is called interpolation prediction; prediction outside that range is called extrapolation prediction. First, ship features are extracted to predict the resistance Rt. The results show that the RBFNN performs significantly better than four other machine learning models: backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). Then, the ship data are nondimensionalized, and the models mentioned above are used to predict the total resistance coefficient Ct of the container ship. The prediction results show that the RBFNN prediction model still performs well. Good results can be obtained by RBFNN in interpolation prediction, even when using only part of the dimensionless features. Finally, the accuracy of the prediction method based on RBFNN is greatly improved compared with the modified admiralty coefficient.
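The step from the resistance Rt to the total resistance coefficient Ct can be sketched with the conventional naval-architecture definition Ct = Rt / (0.5 · ρ · S · V²), together with the Froude number as a typical dimensionless speed. Whether the paper uses exactly these definitions is an assumption, and the function names and input values below are hypothetical.

```python
import math

SEAWATER_DENSITY = 1025.0  # kg/m^3, a typical value (assumption)
GRAVITY = 9.81             # m/s^2


def total_resistance_coefficient(resistance_n, speed_ms, wetted_area_m2,
                                 density=SEAWATER_DENSITY):
    """Ct = Rt / (0.5 * rho * S * V^2), with Rt in newtons."""
    if speed_ms <= 0 or wetted_area_m2 <= 0:
        raise ValueError("speed and wetted area must be positive")
    return resistance_n / (0.5 * density * wetted_area_m2 * speed_ms ** 2)


def froude_number(speed_ms, length_m):
    """Fr = V / sqrt(g * L), a dimensionless measure of ship speed."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return speed_ms / math.sqrt(GRAVITY * length_m)


# Hypothetical inputs; not values from the paper.
rt = 2.0e6        # total resistance, N
speed = 10.0      # ship speed, m/s
area = 20000.0    # wetted surface area, m^2
length = 350.0    # waterline length, m

ct = total_resistance_coefficient(rt, speed, area)
fr = froude_number(speed, length)
print("Ct:", ct, "Fr:", fr)
```

Working with Ct rather than Rt removes most of the dependence on ship size and speed, which makes the target easier to compare across operating conditions.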


Improving link prediction performance plays a significant role in the evaluation of social networks. Link prediction is one of the primary tasks in recommender systems, bioinformatics, and the web. Most machine learning methods that depend on social network analysis (SNA) model metrics use supervised learning to develop link prediction models. Supervised learning needs a large dataset to train a link prediction model to an optimal level of performance. In recent years, Deep Reinforcement Learning (DRL) has achieved excellent success in various domains, including SNA. In this paper, we use DRL to improve the performance and accuracy of the link prediction model on the applied datasets. The experiments show that the data generated by the DRL model through self-play or auto-simulation can be used to improve the link prediction model. We used three datasets: JUNANES, MAMBO, and JAKE. Experimental results show that the proposed DRL method achieves an accuracy of 85% for JUNANES, 87% for MAMBO, and 78% for JAKE, which outperforms the next-highest accuracy, that of GBM (75% for JUNANES, 79% for MAMBO, and 71% for JAKE), trained with 2,500 iterations; it also performs better in terms of AUC. The DRL model performs better than traditional machine learning strategies such as random forest and the gradient boosting machine (GBM).


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forests) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with models previously developed in the literature.


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R²=0.84, MSE=0.55 for the training set and R²=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
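The idea behind a y-randomization test can be sketched as follows: the response values are shuffled, the model is refitted, and the resulting R² should collapse if the original fit reflected a real relationship. This is only an illustration; it swaps the study's SVM and selected descriptors for a one-descriptor least-squares line, and the data, function names, and seed are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=100, seed=0):
    """Refit on shuffled responses and return the R² of each round."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        slope, intercept = fit_line(x, shuffled)
        predictions = [slope * xi + intercept for xi in x]
        scores.append(r_squared(shuffled, predictions))
    return scores


# Hypothetical descriptor and response values; not data from the study.
x = [float(i) for i in range(20)]
y = [0.5 * xi + 1.0 + 0.3 * (-1) ** i for i, xi in enumerate(x)]

slope, intercept = fit_line(x, y)
true_r2 = r_squared(y, [slope * xi + intercept for xi in x])
shuffled_scores = y_randomization(x, y)
mean_shuffled = sum(shuffled_scores) / len(shuffled_scores)
print("R² on real responses:", true_r2)
print("mean R² on shuffled responses:", mean_shuffled)
```

A large gap between the two values suggests the model is not simply fitting noise; a small gap would be a warning sign of chance correlation.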


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
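The trade-off between ppv and sensitivity described in the results can be sketched by recomputing both at several cut-off probabilities. The cut-offs below mirror those in the abstract, but the labels, predicted probabilities, and function name are hypothetical and are not the study's data.

```python
def ppv_and_sensitivity(y_true, probabilities, cutoff):
    """Return (ppv, sensitivity) when probability >= cutoff counts as positive."""
    tp = fp = fn = 0
    for label, probability in zip(y_true, probabilities):
        predicted_positive = probability >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    # Undefined ratios are reported as NaN rather than raising an error.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical outcomes (1 = positive class) and predicted probabilities.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.7, 0.55, 0.45, 0.3, 0.65, 0.5, 0.4, 0.2, 0.1]

for cutoff in (0.38, 0.5, 0.63):
    ppv, sensitivity = ppv_and_sensitivity(y_true, probabilities, cutoff)
    print(cutoff, round(ppv, 2), round(sensitivity, 2))
```

In this toy data, raising the cut-off makes positive predictions rarer but more often correct, so ppv rises while sensitivity falls, which is the same direction of change the abstract reports.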

