Fusion of Multispectral Aerial Imagery and Vegetation Indices for Machine Learning-Based Ground Classification

Unmanned Aerial Vehicles (UAVs) are emerging and promising platforms for carrying different types of cameras for remote sensing. The application of multispectral vegetation indices for ground cover classification has been widely adopted and has proved reliable. However, the fusion of spectral bands and vegetation indices for machine learning-based land surface investigation has hardly been studied. In this paper, we studied the fusion of spectral band information from UAV multispectral images and derived vegetation indices for almond plantation classification using several machine learning methods. We acquired multispectral images over an almond plantation using a UAV. First, a multispectral orthoimage was generated from the acquired multispectral images using Structure from Motion (SfM) photogrammetry methods. Eleven types of vegetation indices were proposed based on the multispectral orthoimage. Then, 593 data points that contained multispectral bands and vegetation indices were randomly collected and prepared for this study. After comparing six machine learning algorithms (Support Vector Machine, K-Nearest Neighbor, Linear Discriminant Analysis, Decision Tree, Random Forest, and Gradient Boosting), we selected three (SVM, KNN, and LDA) to study the fusion of multispectral band information and derived vegetation indices for classification. As more vegetation indices were added, the classification accuracy of all three selected machine learning methods gradually increased, then dropped. Our results revealed that: (1) spectral information from multispectral images can be used for machine learning-based ground classification, and among all methods, SVM had the best performance; (2) combining multispectral bands and vegetation indices can improve the classification accuracy compared to using spectral bands alone for all three selected methods; (3) among all vegetation indices (VIs), NDEGE, NDVIG, and NDVGE consistently improved classification accuracy, while others may reduce it. Machine learning methods (SVM, KNN, and LDA) can be used for classifying almond plantations using multispectral orthoimages, and fusion of multispectral bands with vegetation indices can improve machine learning-based classification accuracy if the vegetation indices are properly selected.
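
The band-plus-index fusion described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the band names, the reflectance values, and the two indices shown (the standard NDVI and a hypothetical NIR/red-edge normalized difference) are assumptions chosen for illustration, and the abstract's own indices (NDEGE, NDVIG, NDVGE) are not defined here.

```python
# Illustrative only: band names and reflectance values are made up.

def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total


def fuse_features(bands: dict) -> list:
    """Concatenate raw band values with indices derived from them."""
    raw = [bands[name] for name in ("green", "red", "red_edge", "nir")]
    # NDVI: the standard (NIR - Red) / (NIR + Red) index.
    ndvi = normalized_difference(bands["nir"], bands["red"])
    # Hypothetical second index: NIR against the red-edge band.
    nd_red_edge = normalized_difference(bands["nir"], bands["red_edge"])
    return raw + [ndvi, nd_red_edge]


pixel = {"green": 0.08, "red": 0.05, "red_edge": 0.30, "nir": 0.45}
features = fuse_features(pixel)
print(features)  # four raw bands followed by two derived indices
```

The resulting feature vector is what a classifier such as SVM, KNN, or LDA would receive for each data point; adding or dropping indices changes only the length of that vector.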

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the wildfire episodes of 2019 in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors were considered in preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM), was performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF performed best, followed by GBM. Environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can serve as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Landslide susceptibility mapping using machine learning for Wenchuan County, Sichuan province, China

E3S Web of Conferences ◽

10.1051/e3sconf/202019803023 ◽

2020 ◽

Vol 198 ◽

pp. 03023

Author(s):

Xin Yang ◽

Rui Liu ◽

Luyao Li ◽

Mei Yang ◽

Yuantao Yang

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Machine Learning Algorithms ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Roc Curve Analysis ◽

Learning Methods ◽

Machine Learning Methods ◽

Boosted Decision Tree

Landslide susceptibility mapping is a method used to assess the probability and spatial distribution of landslide occurrences. Machine learning methods have been widely used in landslide susceptibility assessment in recent years. In this paper, six popular machine learning algorithms, namely logistic regression, multi-layer perceptron, random forests, support vector machine, AdaBoost, and gradient boosted decision tree, were used to construct landslide susceptibility models with a total of 1365 landslide points and 14 predisposing factors. Subsequently, landslide susceptibility maps (LSMs) were generated by the trained models. The LSMs show that the main landslide zone is concentrated in the southeastern area of Wenchuan County. ROC curve analysis shows that all models fitted the training datasets and achieved satisfactory results on the validation datasets. The results of this paper show that machine learning methods are a feasible way to build robust landslide susceptibility models.
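
As a hedged illustration of the ROC curve analysis mentioned in this abstract, the sketch below computes the area under the ROC curve (AUC) from labels and model scores using its rank interpretation: the fraction of positive/negative pairs in which the positive sample scores higher. The function name and the toy data are assumptions, not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = landslide point, 0 = non-landslide point.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a sorted-rank formulation would be the usual choice for large datasets.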

APPLICATION OF MACHINE LEARNING ALGORITHMS FOR PROCESSING COMMENTS FROM THE YOUTUBE VIDEO HOSTING UNDER TRAINING VIDEOS

Science and Transport Progress Bulletin of Dnipropetrovsk National University of Railway Transport ◽

10.15802/stp2020/225264 ◽

2021 ◽

pp. 33-42

Author(s):

L. S. Koriashkina ◽

H. V. Symonets

Keyword(s):

Machine Learning ◽

Gradient Descent ◽

Russian Language ◽

Stochastic Gradient ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Gradient Enhancement

Purpose. To detect toxic comments under training videos on the YouTube video hosting service by classifying unstructured text using a combination of machine learning methods. Methodology. To work with this type of data, machine learning methods were used for cleaning, normalizing, and representing textual data in a form suitable for computer processing. To classify comments as “toxic”, we used a logistic regression classifier, a linear support vector classifier with and without stochastic gradient descent training, a random forest classifier, and a gradient boosting classifier. To assess the classifiers, we computed the confusion matrix, accuracy, recall, and F-measure. For a more general assessment, cross-validation was used. The work was implemented in Python. Findings. Based on the assessment indicators, the best-performing methods were selected: the linear support vector machine (Linear SVM), with and without training by stochastic gradient descent. The described technologies can be used to analyze textual comments under any training videos to detect toxic reviews. The approach can also be useful for identifying unwanted or even aggressive information on social networks or services where reviews are provided. Originality. The originality lies in combining methods for preprocessing a specific type of text, taking into account features such as timecodes, emoji, and links, and in adapting machine learning classification methods to the analysis of Russian-language comments. Practical value. The practical value lies in optimizing (simplifying) the comment analysis process. This processing is needed because of the growing volume of text data, especially in education as a result of quarantine conditions and the transition to distance learning. The volume of educational Internet content already calls for automated processing and analysis of feedback, and this need will only grow over time.
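
The evaluation measures named in this abstract (error matrix, accuracy, completeness, usually called recall, and F-measure) can be sketched in a few lines. This is a generic illustration with made-up labels, not the authors' code; precision is included because the F-measure is defined from it.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Binary confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows: actual, cols: predicted
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: 1 = toxic comment, 0 = non-toxic comment.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```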

Hydraulic Flow Unit Classification and Prediction Using Machine Learning Techniques: A Case Study from the Nam Con Son Basin, Offshore Vietnam

Energies ◽

10.3390/en14227714 ◽

2021 ◽

Vol 14 (22) ◽

pp. 7714

Author(s):

Ha Quang Man ◽

Doan Huy Hien ◽

Kieu Duy Thong ◽

Bui Viet Dung ◽

Nguyen Minh Hoa ◽

...

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Flow Unit ◽

Supervised Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Log Data ◽

Hydraulic Flow ◽

Core Data ◽

Machine Learning Methods

The test study area is the Miocene reservoir of the Nam Con Son Basin, offshore Vietnam. In the study we used unsupervised learning to automatically cluster hydraulic flow units (HU) based on flow zone indicators (FZI) in a core plug dataset. Then we applied supervised learning to predict HU by combining core and well log data. We tested several machine learning algorithms. In the first phase, we derived hydraulic flow unit clusters from the porosity and permeability of core data using unsupervised machine learning methods such as Ward’s method, K-means, Self-Organizing Map (SOM) and Fuzzy C-means (FCM). Then we applied supervised machine learning methods including Artificial Neural Networks (ANN), Support Vector Machines (SVM), Boosted Tree (BT) and Random Forest (RF). We combined both core and log data to predict HU logs for the full well section of the wells without core data. We used four wells with six logs (GR, DT, NPHI, LLD, LSS and RHOB) and 578 cores from the Miocene reservoir to train, validate and test the models. Our goal was to show that the correct combination of core and well log data would provide reservoir engineers with a tool for HU classification and estimation of permeability in a continuous geological profile. Our research showed that machine learning effectively improves the prediction of permeability, reduces uncertainty in reservoir modeling, and improves project economics.
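
The abstract does not state how the flow zone indicator is computed. The sketch below assumes the commonly cited definition, in which the reservoir quality index is RQI = 0.0314 * sqrt(k / phi) (k in millidarcies, phi as a fraction), the normalized porosity is phi_z = phi / (1 - phi), and FZI = RQI / phi_z. The function name and sample values are hypothetical.

```python
import math


def flow_zone_indicator(permeability_md: float, porosity: float) -> float:
    """FZI in micrometres from core permeability (mD) and porosity (fraction).

    Assumes the commonly cited RQI / normalized-porosity definition; the
    source abstract does not give the formula.
    """
    if not 0.0 < porosity < 1.0:
        raise ValueError("porosity must be a fraction strictly between 0 and 1")
    if permeability_md < 0:
        raise ValueError("permeability must be non-negative")
    rqi = 0.0314 * math.sqrt(permeability_md / porosity)
    phi_z = porosity / (1.0 - porosity)
    return rqi / phi_z


# Hypothetical core plug: 100 mD permeability, 20% porosity.
print(round(flow_zone_indicator(100.0, 0.20), 4))
```

Core plugs with similar FZI values would then be grouped into the same hydraulic flow unit by whichever clustering method is chosen.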

Using Machine Learning Methods To Identify Coal Pay Zones from Drilling and Logging-While-Drilling (LWD) Data

SPE Journal ◽

10.2118/198288-pa ◽

2020 ◽

Vol 25 (03) ◽

pp. 1241-1258 ◽

Cited By ~ 2

Author(s):

Ruizhi Zhong ◽

Raymond L. Johnson ◽

Zhongwei Chen

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Well Completion ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Logging While Drilling

Accurate coal identification is critical in coal seam gas (CSG) (also known as coalbed methane or CBM) developments because it determines well completion design and directly affects gas production. Density logging using radioactive source tools is the primary tool for coal identification, adding well trips to condition the hole and additional well costs for logging runs. In this paper, machine learning methods are applied to identify coals from drilling and logging-while-drilling (LWD) data to reduce overall well costs. Machine learning algorithms include logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGBoost). The precision, recall, and F1 score are used as evaluation metrics. Because coal identification is an imbalanced data problem, the performance on the minority class (i.e., coals) is limited. To enhance the performance on coal prediction, two data manipulation techniques [naive random oversampling (NROS) technique and synthetic minority oversampling technique (SMOTE)] are separately coupled with machine learning algorithms. Case studies are performed with data from six wells in the Surat Basin, Australia. For the first set of experiments (single-well experiments), both the training data and test data are in the same well. The machine learning methods can identify coal pay zones for sections with poor or missing logs. It is found that rate of penetration (ROP) is the most important feature. The second set of experiments (multiple-well experiments) uses the training data from multiple nearby wells, which can predict coal pay zones in a new well. The most important feature is gamma ray. After placing slotted casings, all wells have coal identification rates greater than 90%, and three wells have coal identification rates greater than 99%. This indicates that machine learning methods (either XGBoost or ANN/RF with NROS/SMOTE) can be an effective way to identify coal pay zones and reduce coring or logging costs in CSG developments.
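
Naive random oversampling, one of the two rebalancing techniques named in this abstract, can be sketched as follows. This is a generic illustration rather than the authors' implementation: minority-class rows are duplicated at random until every class matches the majority count. The function name, the seed parameter, and the toy data are assumptions.

```python
import random
from collections import Counter


def naive_random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 8 non-coal samples (0) and 2 coal samples (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
new_rows, new_labels = naive_random_oversample(rows, labels)
print(Counter(new_labels))
```

Oversampling should be applied to the training split only, so that duplicated rows never leak into the test data.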

Result and Performance Analysis of Rainfall Prediction System Based on Deep Neural Network

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit2063165 ◽

2020 ◽

pp. 633-638

Author(s):

Akshay Rajendra Naik ◽

A. V. Deorankar ◽

P. B. Ambhore

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Neural Network ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Methods ◽

Rainfall Prediction ◽

Machine Learning Methods ◽

Vector Machines ◽

And Performance

Rainfall prediction supports decision making in many fields, such as outdoor games, farming, travel, and factory operations. We studied various methods for rainfall prediction, including machine learning and neural networks. Previous methods have used various machine learning algorithms, such as naïve Bayes, support vector machines, random forests, decision trees, and ensemble learning methods. We used a deep neural network for rainfall prediction, with the Adam optimizer used to set the model parameters; as a result, our method gives better results than the other machine learning methods.

A Very Large-Scale Bioactivity Comparison of Deep Learning and Multiple Machine Learning Algorithms for Drug Discovery

10.26434/chemrxiv.12781241 ◽

2020 ◽

Author(s):

Thomas R. Lane ◽

Daniel H. Foil ◽

Eni Minerali ◽

Fabio Urbina ◽

Kimberley M. Zorn ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Drug Discovery ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms, modeling metrics and in some cases compared molecular descriptors to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and comparison of our proprietary software Assay Central™ with random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area-under-the-curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets all methods appeared comparable while the distance from the top indicated Assay Central™ and support vector classification were comparable. Unlike prior studies which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case where minimal tuning was performed of any of the methods. If anything, Assay Central™ may have been at a slight advantage as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay Central™ performance, but support vector classification seems to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refining methods for evaluating and comparing models.
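
Two of the metrics listed in this abstract, Cohen's kappa and the Matthews correlation coefficient, are less familiar than AUC or F1. The sketch below computes both from binary confusion-matrix counts using their standard definitions; the function names and the sample counts are illustrative and not taken from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def cohen_kappa(tp: int, tn: int, fp: int, fn: int) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    if n == 0:
        return 0.0
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and labels.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for one active/inactive dataset.
print(round(matthews_corrcoef(tp=6, tn=3, fp=1, fn=2), 4))
print(round(cohen_kappa(tp=6, tn=3, fp=1, fn=2), 4))
```

Both scores are near zero for chance-level predictions and reach 1 for perfect agreement, which makes them more informative than raw accuracy on imbalanced activity data.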

Data Driven Natural Gas Spot Price Prediction Models Using Machine Learning Methods

Energies ◽

10.3390/en12091680 ◽

2019 ◽

Vol 12 (9) ◽

pp. 1680 ◽

Cited By ~ 7

Author(s):

Moting Su ◽

Zongyi Zhang ◽

Ye Zhu ◽

Donglan Zha ◽

Wenying Wen

Keyword(s):

Machine Learning ◽

Natural Gas ◽

Machine Learning Algorithms ◽

Data Driven ◽

Spot Price ◽

Support Vector ◽

Price Forecasting ◽

Learning Methods ◽

Machine Learning Methods ◽

Gas Price

Natural gas has been proposed as a solution to increase the security of energy supply and reduce environmental pollution around the world. Being able to forecast natural gas price benefits various stakeholders and has become a very valuable tool for all market participants in competitive natural gas markets. Machine learning algorithms have gradually become popular tools for natural gas price forecasting. In this paper, we investigate data-driven predictive models for natural gas price forecasting based on common machine learning tools, i.e., artificial neural networks (ANN), support vector machines (SVM), gradient boosting machines (GBM), and Gaussian process regression (GPR). We harness the method of cross-validation for model training and monthly Henry Hub natural gas spot price data from January 2001 to October 2018 for evaluation. Results show that these four machine learning methods have different performance in predicting natural gas prices. However, overall ANN reveals better prediction performance compared with SVM, GBM, and GPR.

Cyber-attack method and perpetrator prediction using machine learning algorithms

PeerJ Computer Science ◽

10.7717/peerj-cs.475 ◽

2021 ◽

Vol 7 ◽

pp. e475

Author(s):

Abdulkadir Bilen ◽

Ahmet Bedri Özer

Keyword(s):

Machine Learning ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Support Vector ◽

Cyber Crime ◽

Cyber Attack ◽

Accuracy Rate ◽

Learning Methods ◽

Machine Learning Methods ◽

Cyber Crimes

Cyber-attacks have become one of the biggest problems in the world. They cause serious financial damage to countries and people every day. The increase in cyber-attacks also brings along cyber-crime. The key factors in the fight against crime and criminals are identifying the perpetrators of cyber-crime and understanding the methods of attack. Detecting and avoiding cyber-attacks are difficult tasks. However, researchers have recently been solving these problems by developing security models and making predictions through artificial intelligence methods. A large number of crime prediction methods are available in the literature. However, they fall short in predicting cyber-crime and cyber-attack methods. This problem can be tackled by identifying an attack and the perpetrator of such an attack, using actual data. The data include the type of crime, gender of the perpetrator, damage and methods of attack. The data can be acquired from the applications that persons exposed to cyber-attacks submit to the forensic units. In this paper, we analyze cyber-crimes in two different models with machine-learning methods and predict the effect of the defined features on the detection of the cyber-attack method and the perpetrator. We used eight machine-learning methods in our approach and concluded that their accuracy ratios were close. The linear Support Vector Machine was found to be the most successful at predicting the cyber-attack method, with an accuracy rate of 95.02%. In the first model, we could predict the types of attacks that the victims were likely to be exposed to with high accuracy. Logistic Regression was the leading method in detecting attackers, with an accuracy rate of 65.42%. In the second model, we predicted whether the perpetrators could be identified by comparing their characteristics. Our results revealed that the probability of cyber-attack decreases as the education and income level of the victim increases. We believe that cyber-crime units will use the proposed model. It will also facilitate the detection of cyber-attacks and make the fight against these attacks easier and more effective.

A Comparison of Machine Learning Methods to Forecast Tropospheric Ozone Levels in Delhi

Atmosphere ◽

10.3390/atmos13010046 ◽

2021 ◽

Vol 13 (1) ◽

pp. 46

Author(s):

Eliana Kai Juarez ◽

Mark R. Petersen

Keyword(s):

Machine Learning ◽

Short Term Memory ◽

Ground Level ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Ground Level Ozone ◽

Machine Learning Methods ◽

Hourly Data

Ground-level ozone is a pollutant that is harmful to urban populations, particularly in developing countries where it is present in significant quantities. It greatly increases the risk of heart and lung diseases and harms agricultural crops. This study hypothesized that, as a secondary pollutant, ground-level ozone is amenable to 24 h forecasting based on measurements of weather conditions and primary pollutants such as nitrogen oxides and volatile organic compounds. We developed software to analyze hourly records of 12 air pollutants and 5 weather variables over the course of one year in Delhi, India. To determine the best predictive model, eight machine learning algorithms were tuned, trained, tested, and compared using cross-validation with hourly data for a full year. The algorithms, ranked by R2 values, were XGBoost (0.61), Random Forest (0.61), K-Nearest Neighbor Regression (0.55), Support Vector Regression (0.48), Decision Trees (0.43), AdaBoost (0.39), and linear regression (0.39). When trained by separate seasons across five years, the predictive capabilities of all models increased, with a maximum R2 of 0.75 during winter. Bidirectional Long Short-Term Memory was the least accurate model for annual training, but had some of the best predictions for seasonal training. Out of five air quality index categories, the XGBoost model was able to predict the correct category 24 h in advance 90% of the time when trained with full-year data. Separated by season, winter is considerably more predictable (97.3%), followed by post-monsoon (92.8%), monsoon (90.3%), and summer (88.9%). These results show the importance of training machine learning methods with season-specific data sets and comparing a large number of methods for specific applications.
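
The models in this abstract are ranked by R2, the coefficient of determination. As a minimal sketch under the standard definition (one minus the ratio of the residual sum of squares to the total sum of squares), it can be computed as below; the function name and the sample values are illustrative and unrelated to the Delhi data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up observed and forecast values.
observed = [3.0, -0.5, 2.0, 7.0]
forecast = [2.5, 0.0, 2.0, 8.0]
print(round(r_squared(observed, forecast), 4))
```

A value of 1 means the forecasts match the observations exactly, while 0 means the model does no better than always predicting the mean.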
