Surface Shortwave Net Radiation Estimation from Landsat TM/ETM+ Data Using Four Machine Learning Algorithms

2019, Vol 11 (23), pp. 2847
Author(s): Yezhe Wang, Bo Jiang, Shunlin Liang, Dongdong Wang, Tao He, ...

Surface shortwave net radiation (SSNR) flux is essential for determining the radiation energy balance between the atmosphere and the Earth’s surface. Satellite-derived SSNR data at intermediate resolution are needed to bridge the gap between existing coarse-resolution SSNR products and point-based measurements. In this study, four machine learning (ML) algorithms were tested for estimating SSNR from Landsat Thematic Mapper (TM)/Enhanced Thematic Mapper Plus (ETM+) top-of-atmosphere (TOA) reflectance and other ancillary information (i.e., clearness index, water vapor) at instantaneous and daily scales under all sky conditions. The four ML algorithms are multivariate adaptive regression splines (MARS), backpropagation neural network (BPNN), support vector regression (SVR), and gradient boosting regression tree (GBRT). In-situ measurements were used to train a global model (using all data) and conditional models (in which the data were divided into subsets and a model was fitted to each subset separately). The validation results indicated that the GBRT-based global model (GGM) performs best at both the instantaneous and daily scales. For example, the GGM based on the TM data yielded coefficient of determination (R²) values of 0.88 and 0.94, average root mean square errors (RMSE) of 73.23 W·m⁻² (15.09%) and 18.76 W·m⁻² (11.2%), and biases of 0.64 W·m⁻² and −1.74 W·m⁻² for instantaneous and daily SSNR, respectively. Compared to the Global LAnd Surface Satellite (GLASS) daily SSNR product, the daily TM-SSNR showed a very similar spatial distribution but with more detail. Further analysis also demonstrated the robustness of the GGM across land cover types, elevations, general atmospheric conditions, and seasons.
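
The validation statistics quoted above (R², RMSE, relative RMSE, and bias) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes R² is defined as 1 − SS_res/SS_tot and that the percentage in parentheses is the RMSE relative to the mean observed value, neither of which the abstract states. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE, relative RMSE (%), and bias for paired samples."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,          # assumed definition of R^2
        "rmse": rmse,
        "rmse_pct": 100.0 * rmse / mean_obs,  # assumed: relative to mean observation
        "bias": sum(errors) / n,              # mean of (predicted - observed)
    }


# Made-up values in W·m⁻², for illustration only.
obs = [400.0, 520.0, 610.0, 300.0]
pred = [410.0, 500.0, 630.0, 310.0]
metrics = regression_metrics(obs, pred)
print(metrics)
```

A positive bias under this sign convention means the model overestimates on average.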

Materials, 2021, Vol 14 (15), pp. 4068
Author(s): Xu Huang, Mirna Wasouf, Jessada Sresakoolchai, Sakdirat Kaewunruen

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance of autogenous healing concrete in order to assess the extent of cracking and to predict the extent of healing. In self-healing concrete research, testing the healing performance of concrete in a laboratory is costly, and a large number of instances may be needed to arrive at a reliable concrete design. This study is thus the world’s first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete. The algorithms are an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR), and random forest (RF). Their parameters are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, measured by the coefficient of determination (R²) and root mean square error (RMSE), is evaluated on 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R² = 0.958) and stronger robustness (RMSE = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction of the healing performance and efficient assistance in the design of autogenous healing concrete can be achieved.
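
The grid search tuning mentioned above can be sketched as an exhaustive loop over parameter combinations, each scored on held-out data. This is only an illustration under stated assumptions: the abstract gives neither the search space nor the validation scheme, so the toy one-dimensional kNN regressor, the parameter grid, and the synthetic data below are all hypothetical stand-ins for the six models and the concrete data set.

```python
import itertools
import math


def knn_predict(train_x, train_y, x, k, weighted):
    """Predict y at x from the k nearest 1-D training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    if weighted:
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * py for w, (_, py) in zip(weights, nearest)) / sum(weights)


def rmse(observed, predicted):
    """Root mean square error of paired samples."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def grid_search(train_x, train_y, valid_x, valid_y, grid):
    """Try every combination in the grid; return (best_rmse, best_params)."""
    best_score, best_params = None, None
    for k, weighted in itertools.product(grid["k"], grid["weighted"]):
        preds = [knn_predict(train_x, train_y, x, k, weighted) for x in valid_x]
        score = rmse(valid_y, preds)
        if best_score is None or score < best_score:
            best_score, best_params = score, {"k": k, "weighted": weighted}
    return best_score, best_params


# Synthetic data (y = 2x), made up for illustration.
train_x = [float(i) for i in range(10)]
train_y = [2.0 * x for x in train_x]
valid_x = [0.5, 3.5, 7.5]
valid_y = [2.0 * x for x in valid_x]

best_score, best_params = grid_search(
    train_x, train_y, valid_x, valid_y,
    grid={"k": [1, 2, 4], "weighted": [False, True]},
)
print(best_score, best_params)
```

A genetic algorithm explores the same kind of parameter space but samples and recombines candidates instead of enumerating all of them, which matters once the grid becomes large.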


2021, Vol 10 (1), pp. 42
Author(s): Kieu Anh Nguyen, Walter Chen, Bor-Shiun Lin, Uma Seeboonruang

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging methods in this study are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods are Cubist and gradient boosting machine (GBM). Finally, stacking is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was sampled using stratified random sampling to achieve a 70/30 split between training and test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three attempts. GBM performed best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as groups, the bagging and boosting methods performed equally well, and the stacking method ranked third for the erosion pin dataset considered in this study.
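
The two agreement measures reported above can be sketched as follows. The Nash-Sutcliffe efficiency is 1 minus the ratio of the residual sum of squares to the variance of the observations about their mean. For the index of agreement, the abstract does not say which variant was used; this sketch assumes Willmott's original (1981) form. The function names and sample values are hypothetical.

```python
def nash_sutcliffe(observed, predicted):
    """NSE = 1 - sum((o - p)^2) / sum((o - mean_o)^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def index_of_agreement(observed, predicted):
    """Willmott's d = 1 - sum((o - p)^2) / sum((|p - mean_o| + |o - mean_o|)^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - ss_res / potential


# Made-up erosion depths in mm/year, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 2.0, 2.5, 4.0, 5.5]
nse = nash_sutcliffe(obs, pred)
d = index_of_agreement(obs, pred)
print(nse, d)
```

NSE equals 1 for a perfect fit and drops to 0 when the model is no better than predicting the observed mean, which is why an NSE of 0.54 can coexist with a fairly high d of 0.81.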


Water, 2020, Vol 12 (3), pp. 713
Author(s): Aliva Nanda, Sumit Sen, Awshesh Nath Sharma, K. P. Sudheer

Soil temperature plays an important role in understanding hydrological, ecological, meteorological, and land surface processes. However, studies of soil temperature variability are very scarce in various parts of the world, especially in the Indian Himalayan Region (IHR). Thus, this study aims to analyze the spatio-temporal variability of soil temperature in two nested hillslopes of the lesser Himalaya and to check the efficiency of different machine learning algorithms for estimating soil temperature in this data-scarce region. To accomplish this goal, grassed (GA) and agro-forested (AgF) hillslopes were instrumented with Odyssey water level sensors and Decagon soil moisture and temperature sensors. The average soil temperature of the south aspect hillslope (i.e., GA hillslope) was higher than that of the north aspect hillslope (i.e., AgF hillslope). After analyzing 40 rainfall events from both hillslopes, it was observed that a rainfall duration greater than 7.5 h or an event with an average rainfall intensity greater than 7.5 mm/h results in a soil temperature drop of more than 2 °C. Further, a soil temperature drop of less than 1 °C was observed during very high-intensity rainfall events of very short duration. During the rainy season, the soil temperature drop on the GA hillslope is larger than on the AgF hillslope because the former infiltrates more water. This observation indicates a significant correlation between soil moisture rise and soil temperature drop. The potential of four machine learning algorithms for predicting soil temperature under data-scarce conditions was also explored. Among the four, extreme gradient boosting (XGBoost) performed best for both hillslopes, followed by random forests (RF), multilayer perceptron (MLP), and support vector machines (SVMs). The addition of rainfall to the meteorological and meteorological + soil moisture datasets did not improve the models considerably. However, the addition of soil moisture to the meteorological parameters improved the models significantly.


Water, 2020, Vol 12 (12), pp. 3490
Author(s): Noor Hafsa, Sayeed Rushd, Mohammed Al-Yaari, Muhammad Rahman

Applications of machine learning algorithms (MLAs) to modeling the adsorption efficiencies of different heavy metals have been limited by the adsorbate–adsorbent pair and the selection of specific MLAs. In the current study, the adsorption efficiencies of fourteen heavy metal–adsorbent (HM-AD) pairs were modeled with a variety of ML models: support vector regression with polynomial and radial basis function kernels, random forest (RF), stochastic gradient boosting, and Bayesian additive regression tree (BART). The actual measurements from wet experiments were supplemented with synthetic data samples. The first batch of dry experiments was performed to model the removal efficiency of an HM with a specific AD. The ML modeling was then implemented on the whole dataset to develop a generalized model. Ten-fold cross-validation was used for model selection, while the comparative performance of the MLAs was evaluated with statistical metrics comprising Spearman’s rank correlation coefficient, coefficient of determination (R²), mean absolute error, and root-mean-squared error. The regression tree methods, BART and RF, demonstrated the most robust and optimum performance, with 0.96 ≤ R² ≤ 0.99. The current study provides a generalized methodology for implementing ML in modeling the efficiency of not only a specific adsorption process but also a group of comparable processes involving multiple HM-AD pairs.
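
Two of the ingredients named above, ten-fold cross-validation and Spearman's rank correlation coefficient, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the abstract does not say whether the folds were shuffled or stratified, so shuffling with a fixed seed is assumed here, and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # assumed: shuffled folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


splits = list(k_fold_indices(25, k=10))
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 8.0, 27.0, 64.0])
print(len(splits), rho)
```

Because Spearman's coefficient depends only on ranks, it rewards any monotonic relationship between predicted and measured efficiency, which complements R² and the error metrics.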


Author(s):  
Gudipally Chandrashakar

In this article, we use historical time series data on the gold price up to the present day. To predict the gold price, we consider a few correlated factors: silver price, copper price, the Standard and Poor’s 500 index value, the dollar-rupee exchange rate, and the Dow Jones Industrial Average value. The data cover the price of each correlated factor and of gold from January 2008 to February 2021. The machine learning algorithms used to analyze the time series data are Random Forest Regression, Support Vector Regressor, Linear Regressor, ExtraTrees Regressor, and Gradient Boosting Regression. Of these, the ExtraTrees Regressor predicts gold prices most accurately.


2021
Author(s): Polash Banerjee

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the wildfires of 2019 in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ, and anthropogenic factors were considered in preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Model (GBM), was performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all of these machine learning methods are good at predicting wildfires; RF performed best, followed by GBM. Environmental features such as average temperature, average wind speed, proximity to roadways, and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can serve as a decision support tool for preparedness, efficient resource allocation, and sensitization of people towards wildfire mitigation in Sikkim.


Author(s):  
Harsha A K

Since the advent of encryption, there has been a steady increase in malware transmitted over encrypted networks. Traditional approaches to malware detection, such as packet content analysis, are ineffective on encrypted data. In the absence of actual packet contents, we can use other features such as packet size, arrival time, source and destination addresses, and similar metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms: support vector machine, random forest, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. The machine learning algorithms are trained on the training set, and the resulting models are evaluated against the testing set to assess their respective performances. We further tune the hyperparameters of the algorithms to achieve better results. The random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, with area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
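
The area under the curve values reported above can be understood through the rank interpretation of the ROC AUC: it is the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one, with ties counted as one half. The sketch below assumes the reported value is the ROC AUC (the abstract does not name the curve); the function name and the sample scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up classifier scores: 1 = malicious, 0 = benign.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

This pairwise form is quadratic in the number of samples, so it suits small illustrations; library implementations compute the same quantity from sorted scores.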


2020, Vol 9 (9), pp. 507
Author(s): Sanjiwana Arjasakusuma, Sandiaga Swahyu Kusuma, Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection on data with high spatial and spectral dimensionality, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) were evaluated in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGBtree and XGBdart) and linear (XGBlin) classifiers. The results demonstrated that the combinations BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height from combined lidar and hyperspectral data, with R² = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R² = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection; it reduced the data by 95%, selecting the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.


2019, Vol 9 (19), pp. 4069
Author(s): Huixiang Liu, Qing Li, Dongbing Yu, Yu Gu

Air pollution has become an important environmental issue in recent decades. Air quality forecasts play an important role in warning people about air pollution and in controlling it. We used support vector regression (SVR) and random forest regression (RFR) to build regression models for predicting the Air Quality Index (AQI) in Beijing and the nitrogen oxides (NOx) concentration in an Italian city, based on two publicly available datasets. The root-mean-square error (RMSE), correlation coefficient (r), and coefficient of determination (R²) were used to evaluate the performance of the regression models. Experimental results showed that the SVR-based model performed better in predicting the AQI (RMSE = 7.666, R² = 0.9776, and r = 0.9887), and the RFR-based model performed better in predicting the NOx concentration (RMSE = 83.6716, R² = 0.8401, and r = 0.9180). This work also illustrates that applying machine learning to air quality prediction is an efficient and convenient way to solve some related environmental problems.


2020
Author(s): Semih Kuter, Zuhal Akyurek

The spatial extent of snow has been declared an essential climate variable. Accurate modeling of snow cover is crucial for better prediction of snow water equivalent and, consequently, for the success of general circulation and weather forecasting models as well as climate change and hydrological studies. This presentation focuses on the latest findings of our efforts in fractional snow cover (FSC) mapping on MODIS images using data-driven machine learning methodologies. For this purpose, a dataset composed of 20 MODIS - Landsat 8 image pairs acquired between Apr 2013 and Dec 2016 over the European Alps was employed. Artificial neural network (ANN), multivariate adaptive regression splines (MARS), support vector regression (SVR), and random forest (RF) models were trained and tested using reference FSC maps generated from higher spatial resolution Landsat 8 binary snow maps. The ANN, MARS, SVR, and RF models exhibited quite good performance, with average R ≈ 0.93, whereas the agreement between the reference FSC maps and MODIS’ own product MOD10A1 (C5) was slightly poorer, with R ≈ 0.88.
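
One plausible way to derive a reference fractional snow cover map from a higher-resolution binary snow map is to take, for each coarse cell, the fraction of fine pixels flagged as snow. The sketch below shows only that block-averaging idea and is an assumption, since the abstract does not describe the aggregation procedure. It uses a hypothetical integer block size and ignores reprojection and the non-integer ratio between Landsat and MODIS pixel sizes.

```python
def fractional_snow_cover(binary_map, block):
    """Aggregate a 0/1 snow map into coarse cells holding the snow fraction."""
    rows, cols = len(binary_map), len(binary_map[0])
    if rows % block or cols % block:
        raise ValueError("map dimensions must be multiples of the block size")
    fsc = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            snow_pixels = sum(
                binary_map[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            coarse_row.append(snow_pixels / (block * block))
        fsc.append(coarse_row)
    return fsc


# Made-up 4x4 binary snow map (1 = snow), aggregated into 2x2 coarse cells.
fine = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
coarse = fractional_snow_cover(fine, block=2)
print(coarse)
```

The resulting values in [0, 1] are the regression targets that a model such as ANN, MARS, SVR, or RF would then learn to predict from the coarse-resolution reflectances.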

