BENCHMARK OF MACHINE LEARNING METHODS FOR CLASSIFICATION OF A SENTINEL-2 IMAGE

2016 ◽

Vol XLI-B7 ◽

pp. 335-340 ◽

Cited By ~ 12

Author(s):

F. Pirotti ◽

F. Sunar ◽

M. Piragnolo

Keyword(s):

Machine Learning ◽

Land Cover ◽

Cross Validation ◽

Training Dataset ◽

Support Vector ◽

Machine Learning Methods ◽

Control Dataset ◽

Vector Machines ◽

Sentinel 2

Thanks to mainly ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the random forests method with the highest values; kappa index ranging from 0.55 to 0.42 respectively with the most and least number pixels for training. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines - with default radial basis function kernel - methods follow closely with comparable performance.

Download Full-text

EEG-Based Classification of the Driver Alertness State

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2020-3091 ◽

2020 ◽

Vol 6 (3) ◽

pp. 353-356

Author(s):

Martin Golz ◽

Sebastian Thomas ◽

Adolf Schenka

Keyword(s):

Machine Learning ◽

Gradient Boosting ◽

Support Vector ◽

Weighting Matrix ◽

Machine Learning Methods ◽

Young Drivers ◽

Eeg Data ◽

Vector Machines ◽

Generalized Matrix

AbstractGMLVQ (Generalized Matrix Relevance Learning Vector Quantization) is a method of machine learning with an adaptive metric. While training, the prototype vectors as well as the weight matrix of the metric are adapted simultaneously. The method is presented in more detail and compared with other machine learning methods employing a fixed metric. It was investigated how accurately the methods can assign the 6-channel EEG of 25 young drivers, who drove overnight in the simulation lab, to the two classes of mild and severe drowsiness. Results of cross-validation show that GMLVQ is at 81.7 ± 1.3 % mean classification accuracy. It is not as accurate as support-vector machines (SVM) and gradient boosting machines (GBM) and cannot exploit the potential of learning adaptive metrics in the case of EEG data. However, information is provided on the relevance of each signal feature from the weighting matrix.

Download Full-text

Identifying Cancer Targets Based on Machine Learning Methods via Chou’s 5-steps Rule and General Pseudo Components

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666191016155543 ◽

2019 ◽

Vol 19 (25) ◽

pp. 2301-2317 ◽

Cited By ~ 2

Author(s):

Ruirui Liang ◽

Jiayang Xie ◽

Chi Zhang ◽

Mengying Zhang ◽

Hai Huang ◽

...

Keyword(s):

Machine Learning ◽

Growth Rate ◽

Big Data ◽

Human Genome Project ◽

Genome Project ◽

Support Vector ◽

Successful Implementation ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines

In recent years, the successful implementation of human genome project has made people realize that genetic, environmental and lifestyle factors should be combined together to study cancer due to the complexity and various forms of the disease. The increasing availability and growth rate of ‘big data’ derived from various omics, opens a new window for study and therapy of cancer. In this paper, we will introduce the application of machine learning methods in handling cancer big data including the use of artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.

Download Full-text

Predictive modeling for peri-implantitis by using machine learning techniques

Scientific Reports ◽

10.1038/s41598-021-90642-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tomoaki Mameno ◽

Masahiro Wada ◽

Kazunori Nozaki ◽

Toshihito Takahashi ◽

Yoshitaka Tsujioka ◽

...

Keyword(s):

Machine Learning ◽

Demographic Data ◽

Risk Indicators ◽

Machine Learning Techniques ◽

Support Vector ◽

Machine Learning Methods ◽

Complex Interactions ◽

Learning Techniques ◽

Increased Risk ◽

Vector Machines

AbstractThe purpose of this retrospective cohort study was to create a model for predicting the onset of peri-implantitis by using machine learning methods and to clarify interactions between risk indicators. This study evaluated 254 implants, 127 with and 127 without peri-implantitis, from among 1408 implants with at least 4 years in function. Demographic data and parameters known to be risk factors for the development of peri-implantitis were analyzed with three models: logistic regression, support vector machines, and random forests (RF). As the results, RF had the highest performance in predicting the onset of peri-implantitis (AUC: 0.71, accuracy: 0.70, precision: 0.72, recall: 0.66, and f1-score: 0.69). The factor that had the most influence on prediction was implant functional time, followed by oral hygiene. In addition, PCR of more than 50% to 60%, smoking more than 3 cigarettes/day, KMW less than 2 mm, and the presence of less than two occlusal supports tended to be associated with an increased risk of peri-implantitis. Moreover, these risk indicators were not independent and had complex effects on each other. The results of this study suggest that peri-implantitis onset was predicted in 70% of cases, by RF which allows consideration of nonlinear relational data with complex interactions.

Download Full-text

Delineating Smallholder Maize Farms from Sentinel-1 Coupled with Sentinel-2 Data Using Machine Learning

Sustainability ◽

10.3390/su13094728 ◽

2021 ◽

Vol 13 (9) ◽

pp. 4728

Author(s):

Zinhle Mashaba-Munghemezulu ◽

George Johannes Chirima ◽

Cilence Munghemezulu

Keyword(s):

Machine Learning ◽

Food Security ◽

Rural Communities ◽

Machine Learning Algorithms ◽

Support Vector ◽

Subsistence Agriculture ◽

Smallholder Farms ◽

Main Driver ◽

Sentinel 2

Rural communities rely on smallholder maize farms for subsistence agriculture, the main driver of local economic activity and food security. However, their planted area estimates are unknown in most developing countries. This study explores the use of Sentinel-1 and Sentinel-2 data to map smallholder maize farms. The random forest (RF), support vector (SVM) machine learning algorithms and model stacking (ST) were applied. Results show that the classification of combined Sentinel-1 and Sentinel-2 data improved the RF, SVM and ST algorithms by 24.2%, 8.7%, and 9.1%, respectively, compared to the classification of Sentinel-1 data individually. Similarities in the estimated areas (7001.35 ± 1.2 ha for RF, 7926.03 ± 0.7 ha for SVM and 7099.59 ± 0.8 ha for ST) show that machine learning can estimate smallholder maize areas with high accuracies. The study concludes that the single-date Sentinel-1 data were insufficient to map smallholder maize farms. However, single-date Sentinel-1 combined with Sentinel-2 data were sufficient in mapping smallholder farms. These results can be used to support the generation and validation of national crop statistics, thus contributing to food security.

Download Full-text

A machine learning based method for classification of fractal features of forearm sEMG using Twin Support vector machines

2010 Annual International Conference of the IEEE Engineering in Medicine and Biology ◽

10.1109/iembs.2010.5627902 ◽

2010 ◽

Cited By ~ 12

Author(s):

S P Arjunan ◽

D K Kumar ◽

G R Naik

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Support Vector ◽

Twin Support Vector Machines ◽

Vector Machines

Download Full-text

KDClassifier: Urinary Proteomic Spectra Analysis Based on Machine Learning for Classification of Kidney Diseases

10.1101/2020.12.01.20242198 ◽

2020 ◽

Author(s):

Wanjun Zhao ◽

Yong Zhang ◽

Xinming Li ◽

Yonghong Mao ◽

Changwei Wu ◽

...

Keyword(s):

Machine Learning ◽

Mass Spectrum ◽

Kidney Disease ◽

Kidney Diseases ◽

Training Dataset ◽

Validation Dataset ◽

Support Vector ◽

Urinary Proteomics ◽

Diagnosis Model

AbstractBackgroundBy extracting the spectrum features from urinary proteomics based on an advanced mass spectrometer and machine learning algorithms, more accurate reporting results can be achieved for disease classification. We attempted to establish a novel diagnosis model of kidney diseases by combining machine learning with an extreme gradient boosting (XGBoost) algorithm with complete mass spectrum information from the urinary proteomics.MethodsWe enrolled 134 patients (including those with IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as a control, and for training and validation of the diagnostic model, applied a total of 610,102 mass spectra from their urinary proteomics produced using high-resolution mass spectrometry. We divided the mass spectrum data into a training dataset (80%) and a validation dataset (20%). The training dataset was directly used to create a diagnosis model using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). The diagnostic accuracy was evaluated using a confusion matrix. We also constructed the receiver operating-characteristic, Lorenz, and gain curves to evaluate the diagnosis model.ResultsCompared with RF, the SVM, and ANNs, the modified XGBoost model, called a Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the diagnostic XGBoost model was 96.03% (CI = 95.17%-96.77%; Kapa = 0.943; McNemar’s Test, P value = 0.00027). The area under the curve of the XGBoost model was 0.952 (CI = 0.9307-0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model.ConclusionsThis study presents the first XGBoost diagnosis model, i.e., the KDClassifier, combined with complete mass spectrum information from the urinary proteomics for distinguishing different kidney diseases. KDClassifier achieves a high accuracy and robustness, providing a potential tool for the classification of all types of kidney diseases.

Download Full-text

Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Scientific Programming ◽

10.1155/2021/7998417 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yao Huimin

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Support Vector Machines ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lambda Architecture ◽

Vector Machines ◽

Data Platform

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot solve the problem of effective parallelization, so a parallelization support vector machine based on Spark big data platform is proposed. Firstly, the big data platform is designed with Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, in order to improve the training efficiency of support vector machines on large-scale data, when merging two support vector machines, the “special points” other than support vectors are considered, that is, the points where the nonsupport vectors in one subset violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization process of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine has outstanding performance in speed-up ratio, training time, and prediction accuracy.

Download Full-text

The impact of different parameter sets on the classification of asteroid types

10.5194/epsc2021-807 ◽

2021 ◽

Author(s):

Hanna Klimczak ◽

Wojciech Kotłowski ◽

Dagmara Oszkiewicz ◽

Francesca DeMeo ◽

Agnieszka Kryszczyńska ◽

...

Keyword(s):

Gradient Boosting ◽

Support Vector ◽

Multilayer Perceptrons ◽

Machine Learning Methods ◽

Vector Machines ◽

Science Centre ◽

The Difference ◽

The Impact

The aim of the project is the classification of asteroids according to the most commonly used asteroid taxonomy (Bus-Demeo et al. 2009) with the use of various machine learning methods like Logistic Regression, Naive Bayes, Support Vector Machines, Gradient Boosting and Multilayer Perceptrons. Different parameter sets are used for classification in order to compare the quality of prediction with limited amount of data, namely the difference in performance between using the 0.45mu to 2.45mu spectral range and multiple spectral features, as well as performing the Prinicpal Component Analysis to reduce the dimensions of the spectral data. &#160; This work has been supported by grant&#160;No. 2017/25/B/ST9/00740 from the National Science Centre, Poland.

Download Full-text

Using Remote Monitoring and Machine Learning to Classify Slam Events of Wave Piercing Catamarans

The International Journal of Maritime Engineering ◽

10.5750/ijme.v163ia3.797 ◽

2021 ◽

Vol 163 (A3) ◽

Author(s):

B Shabani ◽

J Ali-Lavroff ◽

D S Holloway ◽

S Penev ◽

D Dessi ◽

...

Keyword(s):

Machine Learning ◽

Monitoring System ◽

Probability Distributions ◽

Feature Space ◽

Support Vector ◽

Acceleration Threshold ◽

Vector Machines ◽

Structural Responses ◽

As Stress

An onboard monitoring system can measure features such as stress cycles counts and provide warnings due to slamming. Considering current technology trends there is the opportunity of incorporating machine learning methods into monitoring systems. A hull monitoring system has been developed and installed on a 111 m wave piercing catamaran (Hull 091) to remotely monitor the ship kinematics and hull structural responses. Parallel to that, an existing dataset of a similar vessel (Hull 061) was analysed using unsupervised and supervised learning models; these were found to be beneficial for the classification of bow entry events according to key kinematic parameters. A comparison of different algorithms including linear support vector machines, naïve Bayes and decision tree for the bow entry classification were conducted. In addition, using empirical probability distributions, the likelihood of wet-deck slamming was estimated given a vertical bow acceleration threshold of 1 in head seas, clustering the feature space with the approximate probabilities of 0.001, 0.030 and 0.25.

Download Full-text