Comparative Assessment of Machine Learning Methods for Urban Vegetation Mapping Using Multitemporal Sentinel-1 Imagery

Mapping green vegetation in urban areas with remote sensing techniques can serve as a tool for integrated spatial planning that addresses urban challenges. In this context, multitemporal (MT) synthetic aperture radar (SAR) data have been investigated less than optical satellite data. This research compared various machine learning methods using single-date and MT Sentinel-1 (S1) imagery, focusing on vegetation mapping in urban areas across Europe. Urban vegetation was classified using six classifiers: random forests (RF), support vector machine (SVM), extreme gradient boosting (XGB), multi-layer perceptron (MLP), AdaBoost.M1 (AB), and extreme learning machine (ELM). Whereas SVM showed the best performance in the single-date image analysis, the MLP classifier yielded the highest overall accuracy in the MT classification scenario. Mean overall accuracy (OA) values for all machine learning methods increased from 57% to 77% with speckle filtering. Using MT SAR data, i.e., three and five S1 images, yielded an additional increase in OA of 8.59% and 13.66%, respectively. Additionally, when three and five S1 images were used for classification, the F1 measure for the forest and low vegetation land-cover classes exceeded 90%. This research confirmed the potential of MT C-band SAR imagery for urban vegetation mapping.

Prediction of Hanwoo Cattle Phenotypes from Genotypes Using Machine Learning Methods

Animals ◽

10.3390/ani11072066 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2066

Author(s):

Swati Srivastava ◽

Bryan Irvine Lopez ◽

Himansu Kumar ◽

Myoungjin Jang ◽

Han-Ha Chai ◽

...

Keyword(s):

Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Eye Muscle ◽

Important Species ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Boosting Method ◽

Predictive Correlation ◽

Hanwoo Cattle

Hanwoo was originally raised for draft purposes, but the increase in local demand for red meat shifted that purpose to full-scale meat-type cattle rearing; it is now considered one of the most economically important species and a vital food source for Koreans. The application of genomic selection in Hanwoo breeding programs in recent years was expected to lead to higher genetic progress. However, better statistical methods that can improve genomic prediction accuracy are required. Hence, this study aimed to compare the predictive performance of three machine learning methods, namely random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM), when predicting carcass weight (CWT), marbling score (MS), backfat thickness (BFT) and eye muscle area (EMA). Phenotypic and genotypic data (53,866 SNPs) from 7324 commercial Hanwoo cattle slaughtered at around 30 months of age were used. XGB had the highest predictive correlation for CWT and MS, followed by genomic best linear unbiased prediction (GBLUP), SVM, and RF. Meanwhile, the best predictive correlation for BFT and EMA was delivered by GBLUP, followed by SVM, RF, and XGB. Although XGB presented the highest predictive correlations for some traits, we did not find an advantage of XGB or any other machine learning method over GBLUP according to the mean squared error of prediction. Thus, we still recommend the use of GBLUP in the prediction of genomic breeding values for carcass traits in Hanwoo cattle.

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors were considered in preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM), was performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires; RF performed best, followed by GBM. Also, environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Machine Learning to Forecast Medical Attentions of Pneumonia Cases in Colombian Cities: An implementation with Air Quality, Meteorological and Admission Data

10.21203/rs.3.rs-53367/v1 ◽

2020 ◽

Author(s):

Juan David Gutiérrez

Keyword(s):

Public Health ◽

Machine Learning ◽

Air Pollution ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods ◽

Health Authorities ◽

Admission Data ◽

Extreme Gradient Boosting ◽

Public Health Authorities

Background: Previous authors have reported a relationship between air pollution-aerosols, meteorological variables and the occurrence of pneumonia. Forecasting the number of attentions of pneumonia cases may be useful to optimize the allocation of healthcare resources and to support public health authorities in implementing emergency plans to face an increase in patients. The purpose of this study is to implement four machine-learning methods to forecast the number of attentions of pneumonia cases in the five largest cities of Colombia using air pollution-aerosols, meteorological and admission data. Methods: The number of attentions of pneumonia cases in the five most populated Colombian cities between January 2009 and December 2019 was provided by public health authorities. Air pollution-aerosols and meteorological data were obtained from remote sensors. Four machine-learning methods were implemented for each city. We selected the machine-learning method with the best performance in each city and implemented two techniques to identify the most relevant variables in the forecasts produced by the best-performing models. Results: According to the R² metric, random forest was the machine-learning method with the best performance for Bogotá, Medellín and Cali, whereas for Barranquilla the best performance was obtained with Bayesian adaptive regression trees, and for Cartagena extreme gradient boosting had the best performance. The most important variables for the forecasting were related to the admission data. Conclusions: The results of this study suggest that machine learning can be used to efficiently forecast the number of attentions of pneumonia cases and can therefore be a useful decision-making tool for public health authorities.

Using Machine Learning Methods To Identify Coal Pay Zones from Drilling and Logging-While-Drilling (LWD) Data

SPE Journal ◽

10.2118/198288-pa ◽

2020 ◽

Vol 25 (03) ◽

pp. 1241-1258 ◽

Cited By ~ 2

Author(s):

Ruizhi Zhong ◽

Raymond L. Johnson ◽

Zhongwei Chen

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Well Completion ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Logging While Drilling

Accurate coal identification is critical in coal seam gas (CSG) (also known as coalbed methane or CBM) developments because it determines well completion design and directly affects gas production. Density logging using radioactive source tools is the primary tool for coal identification, adding well trips to condition the hole and additional well costs for logging runs. In this paper, machine learning methods are applied to identify coals from drilling and logging-while-drilling (LWD) data to reduce overall well costs. Machine learning algorithms include logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGBoost). The precision, recall, and F1 score are used as evaluation metrics. Because coal identification is an imbalanced data problem, the performance on the minority class (i.e., coals) is limited. To enhance the performance on coal prediction, two data manipulation techniques [naive random oversampling (NROS) technique and synthetic minority oversampling technique (SMOTE)] are separately coupled with machine learning algorithms. Case studies are performed with data from six wells in the Surat Basin, Australia. For the first set of experiments (single-well experiments), both the training data and test data are in the same well. The machine learning methods can identify coal pay zones for sections with poor or missing logs. It is found that rate of penetration (ROP) is the most important feature. The second set of experiments (multiple-well experiments) uses the training data from multiple nearby wells, which can predict coal pay zones in a new well. The most important feature is gamma ray. After placing slotted casings, all wells have coal identification rates greater than 90%, and three wells have coal identification rates greater than 99%. This indicates that machine learning methods (either XGBoost or ANN/RF with NROS/SMOTE) can be an effective way to identify coal pay zones and reduce coring or logging costs in CSG developments.
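
As an illustration of the ideas named in this abstract, the following is a minimal Python sketch of naive random oversampling and of the precision, recall, and F1 metrics for a binary problem. It is not the authors' implementation: the function names `naive_random_oversample` and `precision_recall_f1`, the toy feature values, and the choice to duplicate minority rows until the two classes are equal in size are all assumptions made for the example.

```python
import random
from collections import Counter


def naive_random_oversample(features, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Hypothetical helper for a binary problem; not taken from the paper.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")
    majority_count = len(labels) - len(minority)
    n_extra = max(0, majority_count - len(minority))
    # Sample with replacement from the minority rows only.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    indices = list(range(len(labels))) + extra
    return [features[i] for i in indices], [labels[i] for i in indices]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: four non-coal samples (0) and one coal sample (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_balanced, y_balanced = naive_random_oversample(X, y, minority_label=1)
print(Counter(y_balanced))
print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

In practice oversampling would be applied to the training split only, so that duplicated minority rows do not leak into the test data.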

ML-Based Analysis of Particle Distributions in High-Intensity Laser Experiments: Role of Binning Strategy

Entropy ◽

10.3390/e23010021 ◽

2020 ◽

Vol 23 (1) ◽

pp. 21

Author(s):

Yury Rodimkov ◽

Evgeny Efimenko ◽

Valentin Volokitin ◽

Elena Panova ◽

Alexey Polovinkin ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Quantum Electrodynamics ◽

Strong Field ◽

Experimental Studies ◽

Research Area ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

When entering the phase of big data processing and statistical inference in experimental physics, the efficient use of machine learning methods may require optimal data preprocessing and, in particular, an optimal balance between detail and noise. In experimental studies of strong-field quantum electrodynamics with intense lasers, this balance concerns data binning for the observed distributions of particles and photons. Here we analyze binning with respect to different machine learning methods (Support Vector Machine (SVM), Gradient Boosting Trees (GBT), Fully-Connected Neural Network (FCNN), Convolutional Neural Network (CNN)) using numerical simulations that mimic expected properties of upcoming experiments. We see that binning can crucially affect the performance of SVM and GBT and, to a lesser extent, FCNN and CNN. This can be interpreted as the latter methods being able to effectively learn the optimal binning, discarding unnecessary information. Nevertheless, given limited training sets, the results indicate that the efficiency can be increased by optimizing the binning scale along with other hyperparameters. We present specific measurements of accuracy that can be useful for planning experiments in the specified research area.
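
To make the notion of a binning scale concrete, here is a small Python sketch that histograms the same samples at two resolutions. It assumes equal-width bins over a fixed range, which the abstract does not specify; the function name `histogram` and the sample values are hypothetical.

```python
def histogram(samples, n_bins, lo, hi):
    """Count samples in n_bins equal-width bins spanning [lo, hi].

    Samples outside the range are dropped; a sample equal to hi
    is counted in the last bin.
    """
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        if x < lo or x > hi:
            continue
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical particle energies in arbitrary units.
energies = [0.1, 0.2, 0.6, 0.9]

# The number of bins acts as a preprocessing hyperparameter:
# coarse bins average out noise, fine bins keep more detail.
coarse = histogram(energies, n_bins=2, lo=0.0, hi=1.0)
fine = histogram(energies, n_bins=4, lo=0.0, hi=1.0)
print(coarse, fine)
```

The resulting count vectors would be the feature vectors passed to a classifier or regressor, so the bin count can be tuned alongside the model's own hyperparameters.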

Prediction of Liver Weight Recovery by an Integrated Metabolomics and Machine Learning Approach After 2/3 Partial Hepatectomy

Frontiers in Pharmacology ◽

10.3389/fphar.2021.760474 ◽

2021 ◽

Vol 12 ◽

Author(s):

Runbin Sun ◽

Haokai Zhao ◽

Shuzhen Huang ◽

Ran Zhang ◽

Zhenyao Lu ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Liver Regeneration ◽

Partial Hepatectomy ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Liver Index ◽

Extreme Gradient Boosting

The mammalian liver can regenerate itself, but the mechanism has not been fully explained. Here we used a GC/MS-based metabolomic method to profile the dynamic endogenous metabolic changes in the serum of C57BL/6J mice at different times after 2/3 partial hepatectomy (PHx). Nine machine learning methods, namely Least Absolute Shrinkage and Selection Operator Regression (LASSO), Partial Least Squares Regression (PLS), Principal Components Regression (PCR), k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest (RF), eXtreme Gradient Boosting (xgbDART), Neural Network (NNET) and Bayesian Regularized Neural Network (BRNN), were used for regression between the liver index and metabolomic data at different stages of liver regeneration. The tree-based random forest method had the minimum average Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) and the maximum R-squared (R²), and was also time-saving. Furthermore, variable importance in projection (VIP) analysis of the RF method was performed, and the metabolites with the top 20 VIP ranks were selected as the most critical metabolites contributing to the model. Ornithine, phenylalanine, 2-hydroxybutyric acid, lysine, etc. were chosen as the most important metabolites, which had strong correlations with the liver index. Further pathway analysis found that arginine biosynthesis, pantothenate and CoA biosynthesis, galactose metabolism, and valine, leucine and isoleucine degradation were the most influenced pathways. In summary, several amino acid metabolic pathways and the glucose metabolism pathway changed dynamically during liver regeneration. The RF method showed advantages over the other machine learning methods used for predicting the liver index after PHx, and a metabolic clock containing four metabolites was established to predict the liver index during liver regeneration.

Predicting algal biochar yield using eXtreme Gradient Boosting (XGB) algorithm of machine learning methods

Algal Research ◽

10.1016/j.algal.2020.102006 ◽

2020 ◽

Vol 50 ◽

pp. 102006 ◽

Cited By ~ 3

Author(s):

Abhijeet Pathy ◽

Saswat Meher ◽

Balasubramanian P

Keyword(s):

Machine Learning ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Algal Biochar

Estimating Surface Downward Longwave Radiation Using Machine Learning Methods

Atmosphere ◽

10.3390/atmos11111147 ◽

2020 ◽

Vol 11 (11) ◽

pp. 1147 ◽

Cited By ~ 1

Author(s):

Chunjie Feng ◽

Xiaotong Zhang ◽

Yu Wei ◽

Weiyu Zhang ◽

Ning Hou ◽

...

Keyword(s):

Machine Learning ◽

Land Cover ◽

Time Scale ◽

Longwave Radiation ◽

Surface Radiation ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Downward Longwave Radiation

The downward longwave radiation (Ld, 4–100 μm) is a key quantity in research on the surface radiation energy budget and balance. In this study, we applied five machine learning methods, namely artificial neural network (ANN), support vector regression (SVR), gradient boosting regression tree (GBRT), random forest (RF), and multivariate adaptive regression spline (MARS), to estimate Ld using ground measurements collected from 27 Baseline Surface Radiation Network (BSRN) stations. In situ Ld measurements were used to validate the accuracy of the Ld estimation models on daily and monthly time scales. A comparison of the results demonstrated that the estimates based on the GBRT method had the highest accuracy, with an overall root-mean-square error (RMSE) of 17.50 W m−2 and an R value of 0.96 for the test dataset on a daily time scale. These values were 11.19 W m−2 and 0.98, respectively, on a monthly time scale. The effects of land cover and elevation were further studied to comprehensively evaluate the performance of each machine learning method. All machine learning methods achieved better results over the grass land cover type but relatively worse results over the tundra. The GBRT, RF, and MARS methods showed good performance at both the high- and low-altitude sites.
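
The two validation statistics quoted above can be computed as in the following minimal Python sketch. It assumes that the "R value" is the Pearson correlation coefficient, which the abstract does not state explicitly; the function names and the sample Ld values (in W m−2) are made up for illustration.

```python
import math


def rmse(observed, estimated):
    """Root-mean-square error between paired observations and estimates."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical in situ measurements and model estimates of Ld.
measured = [310.0, 325.0, 340.0, 355.0]
estimated = [312.0, 321.0, 345.0, 350.0]
print(rmse(measured, estimated), pearson_r(measured, estimated))
```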

Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

BMC Medical Research Methodology ◽

10.1186/s12874-021-01441-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Cindy Feng ◽

George Kephart ◽

Elizabeth Juarez-Colunga

Keyword(s):

Machine Learning ◽

Mortality Risk ◽

Predictive Accuracy ◽

Classification Tree ◽

Superior Performance ◽

Gradient Boosting ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods ◽

Extreme Gradient Boosting

Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the healthcare system’s burden. The present study aimed to assess the predictive accuracy of machine learning methods for predicting COVID-19 mortality risk. Methods: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and the predictive accuracy of the methods on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier’s score, calibration intercept and calibration slope. Results: XGBoost was highly discriminative, with an AUC of 0.9669, and outperformed conventional tree-based methods, i.e., classification tree or RF, in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier’s scores. Conclusions: XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
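
For readers unfamiliar with two of the accuracy measures named above, the following Python sketch computes the Brier score and the area under the ROC curve for binary outcomes. It is an illustration under stated assumptions, not the study's code: the AUC is computed by the pairwise-comparison definition (probability that a random positive case scores higher than a random negative case, ties counted as one half), and the function names and toy data are hypothetical.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy outcomes (1 = died, 0 = survived) and predicted mortality risks.
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.1, 0.4, 0.6]
print(brier_score(outcomes, risks), auc(outcomes, risks))
```

A lower Brier score indicates better-calibrated probabilities, while a higher AUC indicates better ranking of cases by risk, which is why the two measures can disagree about which model is preferable.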

Using machine learning methods to determine Myers-Briggs Type Index (Mbti) types of people

Bulletin of the National Engineering Academy of the Republic of Kazakhstan ◽

10.47533/2020.1606-146x.58 ◽

2021 ◽

Vol 1 (79) ◽

pp. 32-39

Author(s):

A. Myngzhassar ◽

A. B. Kuldzhabekov ◽

S. Daribayev ◽

A. N. Temirbekov ◽

...

Keyword(s):

Machine Learning ◽

Social Networks ◽

Text Messages ◽

Gradient Boosting ◽

Learning Methods ◽

Psychological Types ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Computer Linguistics ◽

Myers Briggs

The article addresses machine learning problems in the field of computer linguistics, in particular the identification of people's psychological types from their text messages on social networks. Its purpose is to study the machine learning methods Naive Bayes and Extreme Gradient Boosting (XGBoost) in order to create a classifier for the Kazakh language that determines the Myers-Briggs Type Index (MBTI) type from text samples of people’s posts on social networks. The course of the experiments with these machine learning methods is presented, and the results obtained are compared.
