ML-SWAN-v1: a hybrid machine learning framework for the concentration prediction and discovery of transport pathways of surface water nutrients

2020 ◽  
Vol 13 (9) ◽  
pp. 4253-4270
Author(s):  
Benya Wang ◽  
Matthew R. Hipsey ◽  
Carolyn Oldham

Abstract. Nutrient data from catchments discharging to receiving waters are monitored for catchment management. However, nutrient data are often sparse in time and space and have non-linear responses to environmental factors, making it difficult to systematically analyse long- and short-term trends and undertake nutrient budgets. To address these challenges, we developed a hybrid machine learning (ML) framework that first separated baseflow and quickflow from total flow, generated data for missing nutrient species, and then utilised the pre-generated nutrient data as additional variables in a final simulation of tributary water quality. Hybrid random forest (RF) and gradient boosting machine (GBM) models were employed and their performance compared with a linear model, a multivariate weighted regression model, and stand-alone RF and GBM models that did not pre-generate nutrient data. The six models were used to predict six different nutrients discharged from two study sites in Western Australia: Ellen Brook (small and ephemeral) and the Murray River (large and perennial). Our results showed that the hybrid RF and GBM models had significantly higher accuracy and lower prediction uncertainty for almost all nutrient species across the two sites. The pre-generated nutrient and hydrological data were highlighted as the most important components of the hybrid model. The model results also indicated different hydrological transport pathways for total nitrogen (TN) export from two tributary catchments. We demonstrated that the hybrid model provides a flexible method to combine data of varied resolution and quality and is accurate for the prediction of responses of surface water nutrient concentrations to hydrologic variability.
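The abstract does not name the baseflow-separation method used in the framework's first step. As a minimal sketch, a one-parameter Lyne-Hollick digital filter (a standard baseflow-separation technique, shown here as an illustration rather than as the authors' actual choice) splits a total-flow series into quickflow and baseflow:

```python
def lyne_hollick(flow, alpha=0.925):
    """One-pass Lyne-Hollick digital filter.

    Splits a total streamflow series into (baseflow, quickflow).
    alpha is the filter parameter; values around 0.9-0.95 are typical.
    """
    quick = [0.0]  # assume the series starts on baseflow only
    for i in range(1, len(flow)):
        q = alpha * quick[-1] + 0.5 * (1 + alpha) * (flow[i] - flow[i - 1])
        quick.append(min(max(q, 0.0), flow[i]))  # constrain 0 <= quickflow <= total
    base = [f - q for f, q in zip(flow, quick)]
    return base, quick
```

The two separated components (plus the pre-generated nutrient data) then enter the final RF/GBM simulation as additional predictor variables.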

2020 ◽  
Author(s):  
Benya Wang ◽  
Matthew R. Hipsey ◽  
Carolyn Oldham

Abstract. Nutrient data from catchments discharging to receiving waters are necessary to monitor and manage water quality; however, they are often sparse in time and space and have non-linear responses to environmental factors, making it difficult to systematically analyse long- and short-term trends and undertake nutrient budgets. To address these challenges, we developed a hybrid machine learning (ML) framework that first separated baseflow and quickflow from total flow, and then generated data for missing nutrient species, using relationships with hydrological data, rainfall, and temporal data. The generated nutrient data were then included as additional variables in a final simulation of tributary water quality. Hybrid random forest (RF) and gradient boosting machine (GBM) models were employed and their performance compared with a linear model, a multivariate weighted regression model, and stand-alone RF and GBM models that did not pre-generate nutrient data. The six models were used to predict total nitrogen (TN), total phosphorus (TP), ammonia (NH3), dissolved organic carbon (DOC), dissolved organic nitrogen (DON), and filterable reactive phosphorus (FRP) discharged from two study sites in Western Australia: Ellen Brook (small and ephemeral) and the Murray River (large and perennial). Our results showed that the hybrid RF and GBM models had significantly higher accuracy and lower prediction uncertainty for almost all nutrient species across the two sites. We demonstrated that the hybrid model provides a flexible method to combine data of varied resolution and quality and is accurate for the prediction of responses of surface water nutrient concentrations to hydrologic variability.


2021 ◽  
Author(s):  
Zhihao Song ◽  
Bin Chen ◽  
Yue Huang ◽  
Li Dong ◽  
Tingting Yang

Abstract. Satellite remote-sensing aerosol optical depth (AOD) and meteorological elements were employed to invert PM2.5 in order to control air pollution more effectively. This paper proposes a restricted gradient-descent linear hybrid machine learning model (RGD-LHMLM) by integrating a random forest (RF), a gradient boosting regression tree (GBRT), and a deep neural network (DNN) to estimate the concentration of PM2.5 in China in 2019. The research data included Himawari-8 AOD with high spatiotemporal resolution, ERA5 meteorological data, and geographic information. The results showed that, in the hybrid model developed by linear fitting, the DNN accounted for the largest proportion, with a weight coefficient of 0.62. The R2 values of RF, GBRT, and DNN were reported as 0.79, 0.81, and 0.80, respectively. The generalization ability of the mixed model was better than that of each sub-model: R2 reached 0.84, while RMSE and MAE were reported as 12.92 µg/m3 and 8.01 µg/m3, respectively. For the RGD-LHMLM, R2 was above 0.7, and RMSE and MAE were below 20 µg/m3 and 15 µg/m3, respectively, in more than 70 % of the sites. Owing to the seasonal difference in the correlation between meteorological factors and PM2.5, the hybrid model performed best in winter (mean R2 of 0.84) and worst in summer (mean R2 of 0.71). The spatiotemporal distribution characteristics of PM2.5 in China were then estimated and analyzed. According to the results, there was severe pollution in winter, with an average PM2.5 concentration reported as 62.10 µg/m3, but only slight pollution in summer, with an average concentration of 47.39 µg/m3. The findings also indicate that North China and East China are more polluted than other areas, with an average annual PM2.5 concentration reported as 82.68 µg/m3. Moreover, pollution was relatively low in Inner Mongolia, Qinghai, and Tibet, where average PM2.5 concentrations were below 40 µg/m3.


2021 ◽  
Vol 14 (8) ◽  
pp. 5333-5347
Author(s):  
Zhihao Song ◽  
Bin Chen ◽  
Yue Huang ◽  
Li Dong ◽  
Tingting Yang

Abstract. Satellite remote sensing aerosol optical depth (AOD) and meteorological elements were employed to invert PM2.5 (fine particulate matter with a diameter below 2.5 µm) in order to control air pollution more effectively. This paper proposes a restricted gradient-descent linear hybrid machine learning model (RGD-LHMLM) by integrating a random forest (RF), a gradient boosting regression tree (GBRT), and a deep neural network (DNN) to estimate the concentration of PM2.5 in China in 2019. The research data included Himawari-8 AOD with high spatiotemporal resolution, ERA5 meteorological data, and geographic information. The results showed that, in the hybrid model developed by linear fitting, the DNN accounted for the largest proportion, with a weight coefficient of 0.62. The R2 values of RF, GBRT, and DNN were reported as 0.79, 0.81, and 0.80, respectively. The generalization ability of the mixed model was better than that of each sub-model: R2 (determination coefficient) reached 0.84, and RMSE (root mean square error) and MAE (mean absolute error) were reported as 12.92 and 8.01 µg m−3, respectively. For the RGD-LHMLM, R2 was above 0.7, and RMSE and MAE were below 20 and 15 µg m−3, respectively, in more than 70 % of the sites. Owing to the seasonal difference in the correlation between meteorological factors and PM2.5, the hybrid model performed best in winter (mean R2 of 0.84) and worst in summer (mean R2 of 0.71). The spatiotemporal distribution characteristics of PM2.5 in China were then estimated and analyzed. According to the results, there was severe pollution in winter, with an average PM2.5 concentration reported as 62.10 µg m−3, but only slight pollution in summer, with an average concentration of 47.39 µg m−3. The period from 10:00 to 15:00 LT (Beijing time, UTC+8) every day is the best time for model inversion; at this time the pollution is also high. The findings also indicate that North China and East China are more polluted than other areas, with an average annual PM2.5 concentration reported as 82.68 µg m−3. Moreover, pollution was relatively low in Inner Mongolia, Qinghai, and Tibet, where average PM2.5 concentrations were below 40 µg m−3.
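The abstract does not spell out what the "restriction" in the gradient-descent linear fit is. One plausible reading, sketched below as an assumption, is that the sub-model weights are constrained to be non-negative and sum to one (so the DNN's reported 0.62 would be one such weight), fitted by projected gradient descent on the mean squared error:

```python
def fit_restricted_weights(preds, target, lr=0.05, steps=2000):
    """Fit a restricted (non-negative, sum-to-one) linear blend of
    sub-model predictions by projected gradient descent on the MSE.

    preds  : list of prediction lists, one per sub-model (e.g. RF, GBRT, DNN)
    target : observed values
    """
    m, n = len(preds), len(target)
    w = [1.0 / m] * m  # start from a uniform blend
    for _ in range(steps):
        blend = [sum(w[k] * preds[k][i] for k in range(m)) for i in range(n)]
        grad = [2.0 / n * sum((blend[i] - target[i]) * preds[k][i] for i in range(n))
                for k in range(m)]
        w = [max(w[k] - lr * grad[k], 0.0) for k in range(m)]  # step, then clip
        s = sum(w) or 1.0
        w = [x / s for x in w]  # renormalise onto the simplex (approximate projection)
    return w
```

The clip-and-renormalise step is a common approximation to an exact simplex projection; the authors' actual optimisation details may differ.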


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
B. A Omodunbi

Diabetes mellitus is a health disorder that occurs when the blood sugar level becomes extremely high because the body cannot produce or properly use the required amount of insulin. The ailment is among the major causes of death in Nigeria and the world at large. This study was carried out to detect diabetes mellitus by developing a hybrid model that comprises two machine learning models, namely the Light Gradient Boosting Machine (LGBM) and K-Nearest Neighbors (KNN). This research is aimed at developing a machine learning model for detecting the occurrence of diabetes in patients. The performance metrics employed in evaluating the findings of this study are the Receiver Operating Characteristic (ROC) curve, five-fold cross-validation, precision, and accuracy score. The proposed system had an accuracy of 91%, and the area under the ROC curve was 93%. The experimental results show that the prediction accuracy of the hybrid model is better than that of traditional machine learning models.
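The evaluation above relies on five-fold cross-validation. Independently of the LGBM/KNN models themselves (which would come from their respective libraries), the fold-splitting step can be sketched as:

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Every sample appears in exactly one test fold, so metrics averaged
    over the folds use each observation for validation exactly once.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

In practice the data would be shuffled (or stratified by class) before splitting; this sketch keeps the indices contiguous for clarity.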


2021 ◽  
Vol 13 (19) ◽  
pp. 3838
Author(s):  
Yan Liu ◽  
Sha Zhang ◽  
Jiahua Zhang ◽  
Lili Tang ◽  
Yun Bai

Accurate estimates of evapotranspiration (ET) over croplands on a regional scale can provide useful information for agricultural management. Hybrid ET models that combine a physical framework, namely the Penman-Monteith equation, with machine learning (ML) algorithms have proven to be effective for ET estimation. However, few studies have compared the performance of multiple hybrid model versions using different ML algorithms. In this study, we constructed six different hybrid ET models based on six classical ML algorithms, namely the K-nearest neighbor algorithm, random forest, support vector machine, extreme gradient boosting algorithm, artificial neural network (ANN), and long short-term memory (LSTM), using observed data from 17 eddy covariance flux sites over cropland across the globe. Each hybrid model was assessed for estimating ET with ten different input data combinations. In each hybrid model, the ML algorithm was used to model the stomatal conductance (Gs), and ET was then estimated using the Penman-Monteith equation along with the ML-based Gs. The results showed that all hybrid models can reasonably reproduce cropland ET when using two or more remote sensing (RS) factors. The results also showed that although including RS factors can remarkably improve ET estimates, hybrid models (except for LSTM) using three or more RS factors were only marginally better than those using two. We also showed that the ANN-based model exhibits the optimal performance among all ML-based models in modeling daily ET, as indicated by the lower root-mean-square error (RMSE, 18.67–21.23 W m−2) and higher correlation coefficient (r, 0.90–0.94). The ANN is thus more suitable for modeling Gs than the other ML algorithms under investigation, and it can provide methodological support for accurate estimation of cropland ET on a regional scale.
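The physical half of the hybrid models is the Penman-Monteith equation, which turns an ML-predicted surface conductance Gs into a latent heat flux. A minimal sketch (the exact variable set and unit conventions the authors use may differ):

```python
def penman_monteith_le(rn, g, delta, vpd, ga, gs,
                       rho_a=1.2, cp=1013.0, gamma=0.066):
    """Latent heat flux LE (W m-2) from the Penman-Monteith equation.

    rn, g  : net radiation and ground heat flux (W m-2)
    delta  : slope of the saturation vapour pressure curve (kPa K-1)
    vpd    : vapour pressure deficit (kPa)
    ga, gs : aerodynamic and surface (stomatal) conductance (m s-1)
    rho_a  : air density (kg m-3); cp : specific heat of air (J kg-1 K-1)
    gamma  : psychrometric constant (kPa K-1)
    """
    return (delta * (rn - g) + rho_a * cp * vpd * ga) / \
           (delta + gamma * (1.0 + ga / gs))
```

In the hybrid setup described above, gs is the output of the trained ML model while the remaining inputs come from meteorological forcing and remote sensing data.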


Telecom ◽  
2022 ◽  
Vol 3 (1) ◽  
pp. 52-69
Author(s):  
Jabed Al Faysal ◽  
Sk Tahmid Mostafa ◽  
Jannatul Sultana Tamanna ◽  
Khondoker Mirazul Mumenin ◽  
Md. Mashrur Arifin ◽  
...  

In the past few years, Internet of Things (IoT) devices have evolved rapidly, and their use is increasing dramatically to make our daily activities easier than ever. However, numerous security flaws persist on IoT devices because the majority of them lack the memory and computing resources necessary for adequate security operations. As a result, IoT devices are affected by a variety of attacks. A single attack on network systems or devices can lead to significant damage to data security and privacy. However, machine learning techniques can be applied to detect IoT attacks. In this paper, a hybrid machine learning scheme called XGB-RF is proposed for detecting intrusion attacks. The proposed hybrid method was applied to the N-BaIoT dataset containing hazardous botnet attacks. Random forest (RF) was used for feature selection, and an eXtreme Gradient Boosting (XGB) classifier was used to detect different types of attacks on IoT environments. The performance of the proposed XGB-RF scheme was evaluated on several metrics, and the model successfully detects 99.94% of the attacks. After comparison with state-of-the-art algorithms, our proposed model achieved better performance on every metric. As the proposed scheme is capable of detecting botnet attacks effectively, it can significantly contribute to reducing the security concerns associated with IoT systems.
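The first stage of a scheme like XGB-RF reduces to ranking features by RF importance and keeping the top k before fitting the boosted classifier. Given an importance vector (the values below are made up for illustration; in the real pipeline they would come from a trained random forest), the selection step is:

```python
def select_top_k(rows, importances, k):
    """Keep only the k feature columns with the highest importance scores.

    rows        : list of feature rows (list of lists)
    importances : one score per column, e.g. from a trained random forest
    Returns the reduced rows and the kept column indices (original order).
    """
    keep = sorted(range(len(importances)),
                  key=lambda j: importances[j], reverse=True)[:k]
    keep.sort()  # preserve the original column order
    return [[row[j] for j in keep] for row in rows], keep
```

The XGB classifier would then be trained on the reduced feature matrix.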


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1143
Author(s):  
Kalaiarasi Sonai Muthu Anbananthen ◽  
Sridevi Subbiah ◽  
Deisy Chelliah ◽  
Prithika Sivakumar ◽  
Varsha Somasundaram ◽  
...  

Background: In recent times, digitization is gaining importance in different domains of knowledge such as agriculture, medicine, recommendation platforms, the Internet of Things (IoT), and weather forecasting. In agriculture, crop yield estimation is essential for improving productivity and for decision-making processes such as financial market forecasting and addressing food security issues. The main objective of this article is to predict and improve the accuracy of crop yield forecasting using hybrid machine learning (ML) algorithms. Methods: This article proposes hybrid ML algorithms that use specialized ensembling methods such as stacked generalization, gradient boosting, random forest, and least absolute shrinkage and selection operator (LASSO) regression. Stacked generalization is a model that learns how best to combine the predictions from two or more models trained on the dataset. To demonstrate the applications of the proposed algorithm, aerial-intel datasets from the GitHub data science repository are used. Results: Based on the experiments on the agricultural data, the following observations were made. The performance of the individual algorithms and the hybrid ML algorithms was compared using cross-validation to identify the most promising performers for the agricultural dataset. The accuracies of the random forest regressor, gradient boosted tree regression, and the stacked generalization ensemble method are 87.71%, 86.98%, and 88.89%, respectively. Conclusions: The proposed stacked generalization ML algorithm statistically outperforms the others with an accuracy of 88.89%, demonstrating that the proposed approach is effective for predicting crop yield. The system also gives fast and accurate responses to farmers.
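Stacked generalization, as described above, trains base models, collects their out-of-fold predictions, and fits a meta-learner on those predictions. A toy sketch with two deliberately simple stand-in base learners (the paper's actual bases are random forest, gradient boosting, and LASSO) and a convex-blend meta-learner fitted by grid search:

```python
def fit_mean(xs, ys):
    """Stand-in base learner: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_1nn(xs, ys):
    """Stand-in base learner: predict the target of the nearest training x."""
    def predict(x):
        j = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[j]
    return predict

def stacked_blend_weight(xs, ys, fit_a, fit_b, k=3):
    """Stacked generalization: build out-of-fold base predictions, then
    fit a convex blend weight w (the meta-learner) on those predictions."""
    n = len(xs)
    oof_a, oof_b = [0.0] * n, [0.0] * n
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        ma = fit_a([xs[i] for i in train], [ys[i] for i in train])
        mb = fit_b([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == f:  # predict only on the held-out fold
                oof_a[i], oof_b[i] = ma(xs[i]), mb(xs[i])
    def mse(w):
        return sum((w * oof_a[i] + (1 - w) * oof_b[i] - ys[i]) ** 2
                   for i in range(n)) / n
    return min((i / 100 for i in range(101)), key=mse)
```

Using out-of-fold rather than in-sample predictions is the point of the technique: it stops the meta-learner from rewarding base models that merely memorise the training data.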


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before, or at the early stages of, its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees, and random forest) are employed, and the impact of each variable group is observed by comparing the performance of the resulting models. The number of target classes is then increased to five and eight, and the results are compared with previously developed models in the literature.
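The abstract does not say how attendance is discretized into three classes; one common choice, sketched here as an assumption, is equal-frequency (tercile) binning:

```python
def discretize_terciles(values):
    """Map each value to class 0, 1, or 2 by equal-frequency binning.

    The lower and upper tercile boundaries are taken from the sorted
    values, so each class receives roughly a third of the samples.
    """
    ranked = sorted(values)
    n = len(ranked)
    lo, hi = ranked[n // 3], ranked[(2 * n) // 3]
    return [0 if v < lo else (1 if v < hi else 2) for v in values]
```

Extending the same idea to quintiles or octiles would give the five- and eight-class targets the study also explores.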


2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Ferdinand Filip ◽  
...  

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning models, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.

