MACHINE LEARNING APPLIED TO SENTINEL-2 AND LANDSAT-8 MULTISPECTRAL AND MEDIUM-RESOLUTION SATELLITE IMAGERY FOR THE DETECTION OF RICE PRODUCTION AREAS IN NGANJUK, EAST JAVA, INDONESIA

Statistics Indonesia (BPS) has been introducing Area Sampling Frame (ASF) surveys since 2018 to estimate rice production areas, but the process remains costly in human and other resources. To support this type of conventional field survey, a more scalable and inexpensive approach using publicly available remote sensing data, for example from the Sentinel-2 and Landsat-8 satellites, has been explored. In this research, we compare the performance of Sentinel-2 and Landsat-8 images using a machine learning classifier enriched with multiple composite indices to detect rice production areas in Nganjuk, East Java, Indonesia, as a case study area. We build a detection model from a set of machine learning classifiers: Decision Tree (CART), Support Vector Machine, Logistic Regression, Ensemble Bagging Methods (Random Forest and Extra Trees), and Ensemble Boosting Methods (AdaBoost and XGBoost). The composite indices consist of NDVI and EVI for agricultural and forest areas, NDWI for water and cloud, and NDBI, NDTI, and BSI for built-up areas, fallows, and asphalt-based roads. Validated by k-fold cross-validation, Sentinel-2 and Landsat-8 achieved F1-scores of 0.930 and 0.919, respectively, at a scale of 30 meters per pixel. Using the 10-meter-per-pixel resolution of the Sentinel-2 imagery increased the F1-score to as much as 0.971. Our evaluation shows that the higher spatial resolution imagery of Sentinel-2 achieves better prediction, not only in terms of performance but also as a better representation of actual conditions.
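
The abstract names the composite indices but does not give their formulas. As a rough illustration only, the sketch below computes four of them (NDVI, EVI, NDWI, NDBI) for a single pixel using their commonly cited definitions. The band names, the choice of the green/NIR form of NDWI, and the sample reflectance values are assumptions made for this example, not details taken from the paper; NDTI and BSI are left out because several variants exist.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    # Normalized Difference Vegetation Index
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Normalized Difference Water Index (green/NIR form, assumed here)
    return normalized_difference(green, nir)


def ndbi(swir, nir):
    # Normalized Difference Built-up Index
    return normalized_difference(swir, nir)


def evi(nir, red, blue):
    # Enhanced Vegetation Index with the commonly used coefficients
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denom if denom != 0 else 0.0


def index_features(pixel):
    """Build a feature dict from surface reflectances (hypothetical band keys)."""
    return {
        "ndvi": ndvi(pixel["nir"], pixel["red"]),
        "evi": evi(pixel["nir"], pixel["red"], pixel["blue"]),
        "ndwi": ndwi(pixel["green"], pixel["nir"]),
        "ndbi": ndbi(pixel["swir"], pixel["nir"]),
    }


# Made-up reflectances for a densely vegetated pixel
sample = {"blue": 0.04, "green": 0.08, "red": 0.05, "nir": 0.45, "swir": 0.20}
features = index_features(sample)
```

Features of this kind would be appended to the raw band values before a classifier is trained; how the authors actually assembled their feature set is not stated in the abstract.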

Spatial Prediction of Agrochemical Properties on the Scale of a Single Field Using Machine Learning Methods Based on Remote Sensing Data

Agronomy ◽

10.3390/agronomy11112266 ◽

2021 ◽

Vol 11 (11) ◽

pp. 2266

Author(s):

Ilnas Sahabiev ◽

Elena Smirnova ◽

Kamil Giniyatullin

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Remote Sensing Data ◽

Spatial Prediction ◽

Landsat 8 ◽

Landsat 8 Oli ◽

Learning Methods ◽

Machine Learning Methods ◽

Agrochemical Properties ◽

Sentinel 2

Creating accurate digital maps of the agrochemical properties of soils on a field scale with a limited data set is a problem that slows down the introduction of precision farming. Machine learning methods that use direct and indirect predictors of spatial changes in the agrochemical properties of soils are promising. Spectral indicators of open soil based on remote sensing data, as well as soil properties, were used to create digital maps of available forms of nitrogen, phosphorus, and potassium. It was shown that support vector (SVMr) and random forest (RF) machine learning methods using spectral reflectance data are similarly accurate at spatial prediction. An acceptable prediction was obtained for available nitrogen and available potassium; the variability of available phosphorus was modeled less accurately. The coefficient of determination (R²) of the best model, SVMr, is 0.90 (Landsat 8 OLI) and 0.79 (Sentinel 2) for nitrogen, 0.82 (Landsat 8 OLI) and 0.77 (Sentinel 2) for potassium, and 0.68 (Landsat 8 OLI) and 0.64 (Sentinel 2) for phosphorus. The models based on remote sensing data were refined when soil organic matter (SOC) and fractions of texture (Silt, Clay) were included as predictors. The SVMr models were the most accurate. For Landsat 8 OLI, the SVMr model gives R² = 0.95 for nitrogen, R² = 0.89 for potassium, and R² = 0.65 for phosphorus. For Sentinel 2, it gives R² = 0.92 for nitrogen, R² = 0.88 for potassium, and R² = 0.72 for phosphorus. The spatial prediction of nitrogen content is influenced by SOC, that of potassium by SOC and texture, and that of phosphorus by texture. The final models were validated on an independent sample of soils from a chernozem zone. Based on Landsat 8 OLI, R² = 0.88 for nitrogen, R² = 0.65 for potassium, and R² = 0.31 for phosphorus. Based on Sentinel 2, R² = 0.85 for nitrogen, R² = 0.62 for potassium, and R² = 0.71 for phosphorus. The inclusion of SOC and texture in remote sensing-based machine learning models makes it possible to improve the spatial prediction of nitrogen, phosphorus and potassium availability of soils in chernozem zones and can potentially be widely used to create digital agrochemical maps on the scale of a single field.
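
Every result in this abstract is reported as a coefficient of determination. The abstract does not say which definition was used, so the sketch below assumes the common form, one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample values are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the model
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviation from the observed mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up soil nutrient values, not data from the study
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
score = r_squared(observed, predicted)
```

Under this definition a model that always predicts the observed mean scores 0, and a model can score below 0 on an independent sample if it does worse than that baseline.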

Delineating Smallholder Maize Farms from Sentinel-1 Coupled with Sentinel-2 Data Using Machine Learning

Sustainability ◽

10.3390/su13094728 ◽

2021 ◽

Vol 13 (9) ◽

pp. 4728

Author(s):

Zinhle Mashaba-Munghemezulu ◽

George Johannes Chirima ◽

Cilence Munghemezulu

Keyword(s):

Machine Learning ◽

Food Security ◽

Rural Communities ◽

Machine Learning Algorithms ◽

Support Vector ◽

Subsistence Agriculture ◽

Smallholder Farms ◽

Main Driver ◽

Sentinel 2

Rural communities rely on smallholder maize farms for subsistence agriculture, the main driver of local economic activity and food security. However, the planted area of these farms is unknown in most developing countries. This study explores the use of Sentinel-1 and Sentinel-2 data to map smallholder maize farms. Random forest (RF) and support vector machine (SVM) algorithms and model stacking (ST) were applied. Results show that classifying combined Sentinel-1 and Sentinel-2 data improved the RF, SVM and ST algorithms by 24.2%, 8.7%, and 9.1%, respectively, compared with classifying Sentinel-1 data alone. Similarities in the estimated areas (7001.35 ± 1.2 ha for RF, 7926.03 ± 0.7 ha for SVM and 7099.59 ± 0.8 ha for ST) show that machine learning can estimate smallholder maize areas with high accuracy. The study concludes that single-date Sentinel-1 data were insufficient to map smallholder maize farms, whereas single-date Sentinel-1 data combined with Sentinel-2 data were sufficient. These results can be used to support the generation and validation of national crop statistics, thus contributing to food security.

Mapping Allochemical Limestone Formations in Hazara, Pakistan Using Google Cloud Architecture: Application of Machine-Learning Algorithms on Multispectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020058 ◽

2021 ◽

Vol 10 (2) ◽

pp. 58

Author(s):

Muhammad Fawad Akbar Khan ◽

Khan Muhammad ◽

Shahid Bashir ◽

Shahab Ud Din ◽

Muhammad Hanif

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Kappa Coefficient ◽

Machine Learning Algorithms ◽

Landsat 8 ◽

Sensing Data ◽

Fossiliferous Limestone

Low-resolution Geological Survey of Pakistan (GSP) maps surrounding the region of interest show oolitic limestone occurrences in the Samanasuk formation and fossiliferous limestone occurrences in the Lockhart and Margalla hill formations in the Hazara division, Pakistan. Machine-learning algorithms (MLAs) have rarely been applied to multispectral remote sensing data to differentiate between limestone formations formed in different depositional environments, such as oolitic or fossiliferous. Unlike previous studies, which mostly report lithological classification by MLAs of rock types with different chemical compositions, this paper investigates the potential of MLAs for mapping subclasses within the same lithology, i.e., limestone. The selection of appropriate data labels, training algorithms, hyperparameters, and remote sensing data sources was also investigated while applying these MLAs. First, oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) limestone-bearing formations, along with the adjoining Hazara formation, were mapped using random forest (RF), support vector machine (SVM), classification and regression tree (CART), and naïve Bayes (NB) MLAs. The RF algorithm reported the best accuracy of 83.28% and a Kappa coefficient of 0.78. To further improve the targeted allochemical limestone formation map, annotation labels were generated by fusing maps obtained from principal component analysis (PCA), decorrelation stretching (DS), and X-means clustering applied to ASTER-L1T, Landsat-8, and Sentinel-2 datasets. These labels were used to train and validate SVM, CART, NB, and RF MLAs to obtain a binary classification map of limestone occurrences in the Hazara division, Pakistan, using the Google Earth Engine (GEE) platform. The classification of Landsat-8 data by CART reported 99.63% accuracy, with a Kappa coefficient of 0.99, and was in good agreement with the field validation. This binary limestone map was further classified into oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) formations by all four MLAs; in this case, RF surpassed all the other algorithms with an improved accuracy of 96.36%. This improvement can be attributed to better annotation, resulting in a binary limestone classification map, which formed a mask for improved classification of oolitic and fossiliferous limestone in the area.
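
The results above are reported as overall accuracy together with a Kappa coefficient. As a minimal sketch, assuming the usual Cohen's kappa computed from a confusion matrix (rows as reference classes, columns as predicted classes), the two quantities can be derived as follows. The matrix values are invented for illustration and are not taken from the paper.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(row[j] for row in matrix) for j in range(len(matrix))]
    # Agreement expected if reference and prediction were independent
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class matrix: limestone vs. non-limestone
confusion = [
    [40, 10],
    [5, 45],
]
accuracy = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

Kappa is lower than raw accuracy whenever some agreement would be expected by chance, which is why the two numbers are usually reported together.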

MACHINE LEARNING METHODS IN MONITORING OPERATING BEHAVIOUR OF MARINE TWO-STROKE DIESEL ENGINE

Transport ◽

10.3846/transport.2020.14038 ◽

2020 ◽

Vol 35 (5) ◽

pp. 462-473

Author(s):

Aleksandar Vorkapić ◽

Radoslav Radonja ◽

Karlo Babić ◽

Sanda Martinčić-Ipšić

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Fuel Consumption ◽

Performance Monitoring ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Support Vector ◽

Operating Parameters ◽

Detection Model ◽

Modelling Framework

The aim of this article is to enhance performance monitoring of a two-stroke electronically controlled ship propulsion engine across its operating envelope. This is achieved by setting up a machine learning model capable of monitoring influential operating parameters and predicting fuel consumption. The model is tested with different machine learning algorithms, namely linear regression, multilayer perceptron, Support Vector Machines (SVM) and Random Forests (RF). After the modelling framework is verified and the results are analysed to improve prediction accuracy, the best algorithm is selected based on standard evaluation metrics, i.e. Root Mean Square Error (RMSE) and Relative Absolute Error (RAE). Experimental results show that, with an adequate combination and processing of relevant sensory data, SVM exhibits the lowest RMSE of 7.1032 and RAE of 0.5313%. RF achieves the lowest RMSE of 22.6137 and RAE of 3.8545% when a minimal number of input variables is considered, i.e. cylinder indicated pressures and propulsion engine revolutions. Further, the article deals with the detection of anomalies in operating parameters, which enables the evaluation of the propulsion engine condition and the early identification of failures and deterioration. Such a time-dependent, self-adapting anomaly detection model can be used for comparison with the initial condition recorded during the test and sea run or after survey and docking. Finally, we propose a unified model structure that incorporates the fuel consumption prediction and anomaly detection models into the on-board decision-making process regarding navigation and maintenance.
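
The two selection metrics named in this abstract can be sketched as below. The abstract does not define them, so this assumes the standard forms: RMSE as the square root of the mean squared error, and RAE as the total absolute error divided by the total absolute error of a predictor that always outputs the mean of the actual values. The sample values are made up and have no relation to the engine data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rae(actual, predicted):
    """Relative absolute error as a fraction (multiply by 100 for percent)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    # Error of the naive baseline that always predicts the mean
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    if baseline_error == 0:
        raise ValueError("actual values have zero spread")
    return model_error / baseline_error


# Illustrative fuel-consumption style values (hypothetical)
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
error_rmse = rmse(actual, predicted)
error_rae = rae(actual, predicted)
```

RMSE is expressed in the units of the predicted quantity, while RAE is unitless, which makes RAE easier to compare across models trained on different input sets.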

Possibilities of milpa identification in Yucatan through remote sensing techniques and Sentinel-2 data

10.29007/hbs2 ◽

2019 ◽

Author(s):

Juan Carlos Valdiviezo-Navarro ◽

Adan Salazar-Garibay ◽

Karla Juliana Rodríguez-Robayo ◽

Lilián Juárez ◽

María Elena Méndez-López ◽

...

Keyword(s):

Remote Sensing ◽

Urban Areas ◽

Food Sovereignty ◽

Region Of Interest ◽

Remote Sensing Data ◽

Support Vector ◽

Multispectral Images ◽

Crop Fields ◽

Remote Sensing Techniques ◽

Sentinel 2

Maya milpa is one of the most important agrifood systems in Mesoamerica, not only because of its ancient origin but also because it increases landscape diversity and is a relevant source of food security and food sovereignty for families. Nowadays, satellite remote sensing data, such as the multispectral images of the Sentinel-2 platforms, allow us to monitor different kinds of structures such as water bodies, urban areas, and particularly agricultural fields. Mono-crop fields or homogeneous vegetation zones, such as corn or barley fields, have been successfully detected from their multispectral signatures by applying classification techniques to multispectral images. However, Maya milpa is a complex field composed of different plant species and fragments of natural vegetation that together cannot be considered a mono-crop field. In this work, we present preliminary studies on the feasibility of monitoring this complex system in a region of interest in Yucatan using a support vector machine (SVM) approach.

IntruDTree: A Machine Learning-Based Cyber Security Intrusion Detection Model

10.20944/preprints202004.0481.v1 ◽

2020 ◽

Author(s):

Iqbal H. Sarker ◽

Yoosef B. Abushark ◽

Fawaz Alsolami ◽

Asif Irshad Khan

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Cyber Security ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Techniques ◽

Support Vector ◽

Security Model ◽

K Nearest Neighbor ◽

Detection Model

Cyber security has recently received enormous attention due to the popularity of the Internet-of-Things (IoT), the tremendous growth of computer networks, and the huge number of relevant applications. Detecting cyber-attacks or anomalies in a network and building an effective intrusion detection system, which plays an essential role in today’s security, is therefore becoming more important. Artificial intelligence, particularly machine learning techniques, can be used to build such a data-driven intelligent intrusion detection system. To this end, we present an Intrusion Detection Tree (“IntruDTree”) machine-learning-based security model that first ranks security features according to their importance and then builds a tree-based generalized intrusion detection model from the selected important features. This model is not only effective in terms of prediction accuracy for unseen test cases but also reduces the computational complexity of the model by reducing the feature dimensions. Finally, the effectiveness of our IntruDTree model was examined by conducting experiments on cybersecurity datasets and computing the precision, recall, F-score, accuracy, and ROC values. We also compare the results of the IntruDTree model with several popular traditional machine learning methods, such as the naive Bayes classifier, logistic regression, support vector machines, and k-nearest neighbor, to analyze the effectiveness of the resulting security model.
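
Precision, recall, F-score, and accuracy, as listed in the evaluation above, can all be derived from the four confusion counts of a binary classifier. The sketch below assumes the F1 form of the F-score (the harmonic mean of precision and recall) and uses invented labels where 1 marks an attack and 0 marks normal traffic. It is not the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1 and accuracy for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    # Each ratio falls back to 0.0 when its denominator is empty
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = attack, 0 = normal
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

On intrusion data where attacks are rare, accuracy alone can look high for a model that misses most attacks, so precision and recall are the more informative pair.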

Modeling of Aboveground Biomass with Landsat 8 OLI and Machine Learning in Temperate Forests

Forests ◽

10.3390/f11010011 ◽

2019 ◽

Vol 11 (1) ◽

pp. 11

Author(s):

Pablito M. López-Serrano ◽

José Luis Cárdenas Domínguez ◽

José Javier Corral-Rivas ◽

Enrique Jiménez ◽

Carlos A. López-Sánchez ◽

...

Keyword(s):

Machine Learning ◽

Aboveground Biomass ◽

Goodness Of Fit ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Sensing Applications ◽

Learning Techniques ◽

Physical Variables ◽

Selection Of

An accurate estimation of forests’ aboveground biomass (AGB) is required because of its relevance to the carbon cycle, and because of its economic and ecological importance. The selection of appropriate variables from satellite information and physical variables is important for precise AGB prediction mapping. Because of the complex relationships involved in AGB prediction, non-parametric machine-learning techniques are potentially useful for AGB estimation, but their use and comparison in forest remote-sensing applications is still relatively limited. The objective of the present study was to evaluate the performance of two machine learning techniques, support vector regression (SVR) and random forest (RF), in predicting the observed AGB (from 318 permanent sampling plots) from the Landsat 8 Operational Land Imager (OLI) sensor, spectral indexes, texture indexes and physical variables in the Sierra Madre Occidental in Mexico. The results showed that the best SVR model explained 80% of the total variance (root mean square error (RMSE) = 8.20 Mg ha−1). The variables that best predicted AGB, in order of importance, were the bands in the red, near-infrared and middle-infrared regions, and the average temperature. The results show that the SVR technique has good potential for AGB estimation and that the selection of the model hyperparameters has important implications for optimizing the goodness of fit.

Mapping of the Canopy Openings in Mixed Beech–Fir Forest at Sentinel-2 Subpixel Level Using UAV and Machine Learning Approach

Remote Sensing ◽

10.3390/rs12233925 ◽

2020 ◽

Vol 12 (23) ◽

pp. 3925

Author(s):

Ivan Pilaš ◽

Mateo Gašparović ◽

Alan Novkinić ◽

Damir Klobučar

Keyword(s):

Machine Learning ◽

Forest Canopy ◽

Vegetation Index ◽

Predictive Performance ◽

Spatial Extent ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Sentinel 2

The presented study demonstrates a bi-sensor approach suitable for rapid, precise and up-to-date mapping of forest canopy gaps over a larger spatial extent. The approach uses Unmanned Aerial Vehicle (UAV) red, green and blue (RGB) images of smaller areas to create a highly precise forest canopy mask. Sentinel-2 was used as a scaling platform for transferring information from the UAV to a wider spatial extent. Various approaches to improving the predictive performance were examined: (I) the highest R2 of a single satellite index was 0.57, (II) the highest R2 using multiple features obtained from a single-date S-2 image was 0.624, and (III) the highest R2 on the multitemporal set of S-2 images was 0.697. Satellite indices such as the Atmospherically Resistant Vegetation Index (ARVI), Infrared Percentage Vegetation Index (IPVI), Normalized Difference Index (NDI45), Pigment-Specific Simple Ratio Index (PSSRa), Modified Chlorophyll Absorption Ratio Index (MCARI), Color Index (CI), Redness Index (RI), and Normalized Difference Turbidity Index (NDTI) were the dominant predictors in most of the Machine Learning (ML) algorithms. The more complex ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGBoost), and Catboost, provided the best performance on the training set but exhibited weaker generalization capabilities. Therefore, the simpler and more robust Elastic Net (ENET) algorithm was chosen for the final map creation.

Comparison of Machine Learning Algorithms for Wildland-Urban Interface Fuelbreak Planning Integrating ALS and UAV-Borne LiDAR Data and Multispectral Images

Drones ◽

10.3390/drones4020021 ◽

2020 ◽

Vol 4 (2) ◽

pp. 21 ◽

Cited By ~ 1

Author(s):

Francisco Rodríguez-Puerta ◽

Rafael Alonso Ponce ◽

Fernando Pérez-Rodríguez ◽

Beatriz Águeda ◽

Saray Martín-García ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Random Forest ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Machine Learning Algorithms ◽

Data Sources ◽

Lidar Data ◽

Sensing Data ◽

Sentinel 2

Controlling vegetation fuels around human settlements is a crucial strategy for reducing fire severity in forests, buildings and infrastructure, as well as for protecting human lives. Each country has its own regulations in this respect, but they share the principle that reducing fuel load in turn reduces the intensity and severity of the fire. Data acquired by Unmanned Aerial Vehicles (UAV), combined with other passive and active remote sensing data, perform best for planning Wildland-Urban Interface (WUI) fuelbreaks through machine learning algorithms. Nine remote sensing data sources (active and passive) and four supervised classification algorithms (Random Forest, Linear and Radial Support Vector Machine and Artificial Neural Networks) were tested to classify five fuel-area types. We used very high-density Light Detection and Ranging (LiDAR) data acquired by UAV (154 returns·m−2 and ortho-mosaic of 5-cm pixel), multispectral data from the satellites Pleiades-1B and Sentinel-2, and low-density LiDAR data acquired by Airborne Laser Scanning (ALS) (0.5 returns·m−2, ortho-mosaic of 25 cm pixels). Through the Variable Selection Using Random Forest (VSURF) procedure, a pre-selection of final variables was carried out to train the model. The four algorithms were compared, and it was concluded that the differences among them in overall accuracy (OA) on training datasets were negligible. Although the highest accuracy in the training step was obtained with SVML (OA=94.46%) and in testing with ANN (OA=91.91%), Random Forest was considered the most reliable algorithm, since it produced more consistent predictions, with smaller differences between training and testing performance. Using a combination of Sentinel-2 and the two LiDAR datasets (UAV and ALS), Random Forest obtained an OA of 90.66% on training and 91.80% on testing datasets. The differences in accuracy between the data sources used are much greater than those between algorithms. LiDAR growth metrics calculated from point clouds acquired on different dates and multispectral information from different seasons of the year are the most important variables in the classification. Our results support the essential role of UAVs in fuelbreak planning and management and thus in the prevention of forest fires.

A Machine Learning Approach for Mapping Forest Vegetation in Riparian Zones in an Atlantic Biome Environment Using Sentinel-2 Imagery

Remote Sensing ◽

10.3390/rs12244086 ◽

2020 ◽

Vol 12 (24) ◽

pp. 4086

Author(s):

Danielle Elis Garcia Furuya ◽

João Alex Floriano Aguiar ◽

Nayara V. Estrabis ◽

Mayara Maezano Faita Pinheiro ◽

Michelle Taís Garcia Furuya ◽

...

Keyword(s):

Machine Learning ◽

Environmental Planning ◽

Riparian Zone ◽

Learning Algorithms ◽

Vegetation Mapping ◽

Forest Vegetation ◽

Machine Learning Algorithms ◽

Riparian Zones ◽

Support Vector ◽

Sentinel 2

Riparian zones are important environmental regions, particularly for maintaining the quality of water resources. Accurately mapping forest vegetation in riparian zones is an important issue, since it may provide information about numerous surface processes that occur in these areas. Recently, machine learning algorithms have gained attention as an innovative approach to extracting information from remote sensing imagery, including support for the task of mapping vegetation areas. Nonetheless, studies applying machine learning to forest vegetation mapping exclusively in riparian zones are still limited. Therefore, this paper presents a framework for forest vegetation mapping in riparian zones based on machine learning models using orbital multispectral images. A total of 14 Sentinel-2 images acquired throughout the year, covering a large riparian zone along a portion of a wide river in the Pontal do Paranapanema region, São Paulo state, Brazil, was adopted as the dataset. This area is mainly composed of Atlantic Biome vegetation and is near the last primary fragment of its biome, which makes it an important region from the environmental planning point of view. We compared the performance of multiple machine learning algorithms: decision tree (DT), random forest (RF), support vector machine (SVM), and normal Bayes (NB). We evaluated different dates and locations with all models. Our results demonstrated that the DT learner has, overall, the highest accuracy in this task. The DT algorithm also showed high accuracy when applied to different dates and to the riparian zone of another river. We conclude that the proposed approach is appropriate for accurately mapping forest vegetation in riparian zones, including temporal context.
