The impact of different parameter sets on the classification of asteroid types

GMLVQ (Generalized Matrix Relevance Learning Vector Quantization) is a machine learning method with an adaptive metric. During training, the prototype vectors and the weight matrix of the metric are adapted simultaneously. The method is presented in more detail and compared with other machine learning methods that employ a fixed metric. It was investigated how accurately the methods can assign the 6-channel EEG of 25 young drivers, who drove overnight in the simulation lab, to the two classes of mild and severe drowsiness. Cross-validation results show that GMLVQ achieves a mean classification accuracy of 81.7 ± 1.3 %. It is less accurate than support-vector machines (SVM) and gradient boosting machines (GBM) and cannot exploit the potential of learning adaptive metrics in the case of EEG data. However, the weighting matrix provides information on the relevance of each signal feature.
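As an illustration of what an adaptive metric means here, the following minimal sketch computes the distance commonly used in GMLVQ, d(x, w) = (x - w)^T Ω^T Ω (x - w), and reads feature relevances from the diagonal of Λ = Ω^T Ω. It covers prediction only, not the training updates. The function names, the toy prototypes, and the matrix values are assumptions made for this example and are not taken from the study.

```python
def gmlvq_distance(x, w, omega):
    """Squared adaptive distance (x - w)^T Omega^T Omega (x - w)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    # Project the difference with Omega, then take the squared Euclidean norm.
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)


def relevance_diagonal(omega):
    """Diagonal of Lambda = Omega^T Omega; entry j is read as the relevance of feature j."""
    n_features = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(n_features)]


def classify(x, prototypes, labels, omega):
    """Assign x the label of the nearest prototype under the adaptive distance."""
    distances = [gmlvq_distance(x, w, omega) for w in prototypes]
    return labels[distances.index(min(distances))]


# Hypothetical toy setup: two features, one prototype per class.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
labels = ["mild", "severe"]
identity = [[1.0, 0.0], [0.0, 1.0]]   # fixed Euclidean metric
adapted = [[1.0, 0.0], [0.0, 0.1]]    # a metric that down-weights feature 2

sample = [0.9, 0.0]
print(classify(sample, prototypes, labels, identity))  # Euclidean decision
print(classify(sample, prototypes, labels, adapted))   # decision under the adapted metric
print(relevance_diagonal(adapted))
```

With the identity matrix the sample falls to the first prototype, while the adapted matrix, which nearly ignores the second feature, assigns it to the second. This is the sense in which the learned matrix both changes the classification and exposes per-feature relevance.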

Comparison of Supervised Classification Models on Textual Data

Mathematics ◽

10.3390/math8050851 ◽

2020 ◽

Vol 8 (5) ◽

pp. 851 ◽

Cited By ~ 1

Author(s):

Bi-Min Hsu

Keyword(s):

Text Classification ◽

Comprehensive Evaluation ◽

Model Performance ◽

Gradient Boosting ◽

Support Vector ◽

Multilayer Perceptrons ◽

Machine Learning Methods ◽

Textual Data ◽

Textual Classification

Text classification is an essential aspect of many applications, such as spam detection and sentiment analysis. With the growing number of textual documents and datasets generated through social media and news articles, an increasing number of machine learning methods are required for accurate textual classification. For this paper, a comprehensive evaluation of the performance of multiple supervised learning models, such as logistic regression (LR), decision trees (DT), support vector machine (SVM), AdaBoost (AB), random forest (RF), multinomial naive Bayes (NB), multilayer perceptrons (MLP), and gradient boosting (GB), was conducted to assess the efficiency and robustness, as well as limitations, of these models on the classification of textual data. SVM, LR, and MLP performed better in general, with SVM being the best, while DT and AB had the lowest accuracies among the tested models. Further exploration of different SVM kernels was performed, demonstrating the advantage of linear kernels over polynomial, sigmoid, and radial basis function kernels for text classification. The effect of removing stop words on model performance was also investigated; DT performed better with stop words removed, while all other models were relatively unaffected by the presence or absence of stop words.

Classification of the fragrant styles and evaluation of the aromatic quality of flue-cured tobacco leaves by machine-learning methods

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016500335 ◽

2016 ◽

Vol 14 (06) ◽

pp. 1650033 ◽

Cited By ~ 1

Author(s):

Li Gu ◽

Lichun Xue ◽

Qi Song ◽

Fengji Wang ◽

Huaqin He ◽

...

Keyword(s):

Evaluation System ◽

Chemical Compounds ◽

Support Vector ◽

Tobacco Leaves ◽

Machine Learning Methods ◽

Online Tools ◽

Svm Algorithm ◽

Assessment Performance

During commercial transactions, the quality of flue-cured tobacco leaves must be characterized efficiently, and the evaluation system should be easily transferable across different traders. However, there are over 3000 chemical compounds in flue-cured tobacco leaves; thus, it is impossible to evaluate the quality of flue-cured tobacco leaves using all the chemical compounds. In this paper, we used the Support Vector Machine (SVM) algorithm together with 22 chemical compounds selected by ReliefF-Particle Swarm Optimization (R-PSO) to classify the fragrant style of flue-cured tobacco leaves, where the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) were 90.95% and 0.80, respectively. The SVM algorithm combined with 19 chemical compounds selected by R-PSO achieved the best assessment performance of the aromatic quality of tobacco leaves, where the PCC and MSE were 0.594 and 0.263, respectively. Finally, we constructed two online tools to classify the fragrant style and evaluate the aromatic quality of flue-cured tobacco leaf samples. These tools can be accessed at http://bioinformatics.fafu.edu.cn/tobacco.
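For readers unfamiliar with the two classification metrics quoted above, the sketch below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix. The binary form is an assumption made for illustration (the abstract does not state how many fragrant styles were distinguished), and the counts are hypothetical, not the study's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # A row or column of the confusion matrix is empty; 0.0 is the usual convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts, chosen only to exercise the formulas.
tp, tn, fp, fn = 45, 40, 10, 5
print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.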

Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm: Extended Results

Evolutionary Computation ◽

10.1162/evco_a_00155 ◽

2016 ◽

Vol 24 (3) ◽

pp. 385-409 ◽

Cited By ~ 6

Author(s):

Fernando E. B. Otero ◽

Alex A. Freitas

Keyword(s):

Ant Colony Algorithm ◽

Predictive Accuracy ◽

Ant Colony ◽

Support Vector ◽

Classification Rules ◽

Class Prediction ◽

Vector Machines ◽

Size Measure ◽

The Impact

Most ant colony optimization (ACO) algorithms for inducing classification rules use an ACO-based procedure to create a rule in a one-at-a-time fashion. An improved search strategy has been proposed in the cAnt-MinerPB algorithm, where an ACO-based procedure is used to create a complete list of rules (ordered rules), i.e., the ACO search is guided by the quality of a list of rules instead of an individual rule. In this paper we propose an extension of the cAnt-MinerPB algorithm to discover a set of rules (unordered rules). The main motivations for this work are to improve the interpretation of individual rules by discovering a set of rules and to evaluate the impact on the predictive accuracy of the algorithm. We also propose a new measure to evaluate the interpretability of the discovered rules to mitigate the fact that the commonly used model size measure ignores how the rules are used to make a class prediction. Comparisons with state-of-the-art rule induction algorithms, support vector machines, and the cAnt-MinerPB producing ordered rules are also presented.
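The difference between an ordered rule list and an unordered rule set shows up at prediction time, which the following sketch tries to make concrete. It is an illustration only: the rule representation, the quality scores, and the conflict strategy for the unordered case (pick the matching rule with the highest quality) are assumptions for this example, not the procedure described in the paper.

```python
def predict_ordered(rules, default_label, example):
    """Ordered list: the first rule whose condition matches decides the class."""
    for condition, label, _quality in rules:
        if condition(example):
            return label
    return default_label


def predict_unordered(rules, default_label, example):
    """Unordered set: every matching rule is considered on its own merits.

    Conflicts are resolved here by the highest rule quality (an assumed strategy).
    """
    matching = [(quality, label) for condition, label, quality in rules if condition(example)]
    if not matching:
        return default_label
    return max(matching, key=lambda item: item[0])[1]


# Hypothetical rules: (condition, predicted class, quality score).
rules = [
    (lambda e: e["x"] > 5, "A", 0.6),
    (lambda e: e["y"] < 2, "B", 0.9),
]

example = {"x": 7, "y": 1}  # both rules match this example
print(predict_ordered(rules, "C", example))
print(predict_unordered(rules, "C", example))
```

In the ordered list, the second rule only fires when the first one does not, so its real meaning is "not (x > 5) and y < 2". In the unordered set each rule can be read in isolation, which is the interpretability argument made in the abstract.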

A pairwise output coding method for multi-class EEG classification of a self-induced BCI

An International Journal of Optimization and Control Theories & Applications (IJOCTA) ◽

10.11121/ijocta.01.2018.00516 ◽

2018 ◽

Vol 8 (2) ◽

pp. 216-227

Author(s):

Nurhan Gursel Ozmen ◽

Levent Gumusel

Keyword(s):

Classical Method ◽

Support Vector ◽

High Temporal Resolution ◽

Motor Tasks ◽

Coding Method ◽

Vector Machines ◽

The Difference ◽

Repetition Number ◽

Output Coding

In brain-computer interface (BCI) research, electroencephalography (EEG) is the most widely used method due to its noninvasiveness, high temporal resolution and portability. Most EEG-based BCI studies aim at developing methodologies for signal processing, feature extraction and classification. In this study, an experimental EEG study was carried out with six subjects performing mental imagery and motor tasks. We present multi-class EEG decoding with a novel pairwise output coding method to improve the performance of self-induced BCI systems. This method involves an augmented one-versus-one multiclass classification that takes less time and uses a reduced number of electrodes. Furthermore, a train repetition number is introduced in the training step to optimize the data selection. The difference between the right and left hemispheres is also investigated, as is the difference between experienced and novice subjects. The experimental results demonstrate that the proposed classification algorithm produces high classification accuracies (98%) with nine channels. With a reduced number of channels (four), Support Vector Machines (SVM) reach 100% accuracy for mental tasks and 87% accuracy for motor tasks. The classification accuracies are quite high, and the proposed one-versus-one technique worked well compared to the classical method. The results are promising for a real-time study.
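For orientation, plain one-versus-one multiclass classification can be sketched as follows: one binary classifier is trained per pair of classes, and the class that wins the most pairwise contests is predicted. This is the standard voting scheme, not the augmented pairwise output coding method proposed in the paper. The class names and the threshold-based stand-in classifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(classes, pairwise_classifiers, x):
    """Predict by majority vote over all pairwise (class_a, class_b) classifiers."""
    votes = Counter()
    for pair in combinations(classes, 2):
        winner = pairwise_classifiers[pair](x)  # each classifier returns one of its two classes
        votes[winner] += 1
    # Ties fall to the class that received its first vote earliest (Counter ordering).
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained binary classifiers on a single 1-D feature.
classes = ["rest", "mental", "motor"]
pairwise_classifiers = {
    ("rest", "mental"): lambda x: "rest" if x < 1.0 else "mental",
    ("rest", "motor"): lambda x: "rest" if x < 2.0 else "motor",
    ("mental", "motor"): lambda x: "mental" if x < 3.0 else "motor",
}

for feature in (0.5, 1.5, 3.5):
    print(feature, one_vs_one_predict(classes, pairwise_classifiers, feature))
```

With k classes this scheme needs k(k - 1)/2 binary classifiers, which is why reducing training time and electrode count matters for the pairwise approach.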

BENCHMARK OF MACHINE LEARNING METHODS FOR CLASSIFICATION OF A SENTINEL-2 IMAGE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b7-335-2016 ◽

2016 ◽

Vol XLI-B7 ◽

pp. 335-340 ◽

Cited By ~ 4

Author(s):

F. Pirotti ◽

F. Sunar ◽

M. Piragnolo

Keyword(s):

Machine Learning ◽

Land Cover ◽

Cross Validation ◽

Training Dataset ◽

Support Vector ◽

Machine Learning Methods ◽

Control Dataset ◽

Vector Machines ◽

Sentinel 2

Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds.
Results from validation of predictions of the whole dataset (full) show the random forests method with the highest values, with a kappa index ranging from 0.55 to 0.42 for the largest and smallest numbers of training pixels, respectively. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines, with the default radial basis function kernel, follow closely with comparable performance.
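The kappa index reported above measures agreement between predicted and reference labels after correcting for agreement expected by chance. The sketch below is a minimal implementation of Cohen's kappa, offered as an illustration. The pixel labels are hypothetical, and the study's exact computation is not given in the abstract.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(reference)
    p_observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    p_expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_expected == 1.0:
        # Degenerate case (a single class everywhere): kappa is undefined.
        # Returning 1.0 is a convention chosen for this sketch only.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical control labels and model predictions for four pixels.
reference = ["urban", "urban", "water", "water"]
predicted = ["urban", "water", "water", "water"]
print(cohen_kappa(reference, predicted))
```

In this toy case three of four pixels agree (75% accuracy), but half of that agreement is expected by chance, so kappa is noticeably lower than the raw accuracy.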

Misfire-misfuel classification using support vector machines

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1243/09544070jauto301 ◽

2007 ◽

Vol 221 (9) ◽

pp. 1183-1195 ◽

Cited By ~ 6

Author(s):

E Gani ◽

C Manzie

Keyword(s):

Support Vector Machines ◽

Oxygen Sensor ◽

Control Unit ◽

Support Vector ◽

Exhaust Gas ◽

Vector Machines ◽

Recovery Strategies ◽

Different Types ◽

The Impact

This paper proposes the use of support vector machines to perform classification between different types of missed combustion event in a six-cylinder engine. On-board diagnostics regulations require the detection of missed combustion events, which is possible through interpretation of crankshaft speed information. However, current approaches provide no information on the actual cause of the event, in particular whether it was caused by a misfuel (absence of fuel) or a misfire (absence of spark) event. Whilst the impact on the environment and emission treatment systems due to misfuel is minimal, misfire events are detrimental to both. Consequently information regarding the causes of missing combustion events potentially allows the development of unique recovery strategies particular to the source of the problem. In this paper, an approach is proposed that will provide the potential for, firstly, detection of a missing combustion event and, secondly, real-time classification of the event into either misfuel or misfire events using feedback from a heated universal exhaust gas oxygen sensor. In order to evaluate the potential of such a system in an engine control unit, a computational complexity measure is also presented.

Identification of Predictor Genes for Feed Efficiency in Beef Cattle by Applying Machine Learning Methods to Multi-Tissue Transcriptome Data

Frontiers in Genetics ◽

10.3389/fgene.2021.619857 ◽

2021 ◽

Vol 12 ◽

Author(s):

Weihao Chen ◽

Pâmela A. Alexandre ◽

Gabriela Ribeiro ◽

Heidge Fukumasu ◽

Wei Sun ◽

...

Keyword(s):

Machine Learning ◽

Feed Efficiency ◽

Gradient Boosting ◽

Support Vector ◽

Sequencing Data ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

High Feed ◽

Differential Gene

Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no attempt has been made to compare the performance of combining different ML methods together in the prediction of high feed efficiency (HFE) and low feed efficiency (LFE) animals. In this study, using RNA sequencing data of five tissues (adrenal gland, hypothalamus, liver, skeletal muscle, and pituitary) from nine HFE and nine LFE Nellore bulls, we evaluated the prediction accuracies of five analytical methods in classifying FE animals. These included two conventional methods for differential gene expression (DGE) analysis (t-test and edgeR) as benchmarks, and three ML methods: Random Forests (RFs), Extreme Gradient Boosting (XGBoost), and combination of both RF and XGBoost (RX). Utility of a subset of candidate genes selected from each method for classification of FE animals was assessed by support vector machine (SVM). Among all methods, the smallest subsets of genes (117) identified by RX outperformed those chosen by t-test, edgeR, RF, or XGBoost in classification accuracy of animals. Gene co-expression network analysis confirmed the interactivity existing among these genes and their relevance within the network related to their prediction ranking based on ML. The results demonstrate a great potential for applying a combination of ML methods to large transcriptome datasets to identify biologically important genes for accurately classifying FE animals.

The Tomatoes and Chilies Type Classifications by Using Machine Learning Methods

Journal of Development Research ◽

10.28926/jdr.v4i1.93 ◽

2020 ◽

Vol 4 (1) ◽

pp. 1-6

Author(s):

Irzal Ahmad Sabilla ◽

Chastine Fatichah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Support Vector ◽

Staple Food ◽

K Nearest Neighbor ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods

Vegetables such as tomatoes and chilies are used as flavoring ingredients. Both are processed into sauces and seasonings to accompany people's staple food. In supermarkets, these vegetables can be found easily, but many people do not know how to choose the type and quality of chilies and tomatoes. This study discusses the classification of cayenne, curly, green, and red chilies and of tomatoes in good and bad condition using machine learning and contrast enhancement techniques. The machine learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Linear Discriminant Analysis (LDA), and Random Forest (RF). The best method is determined by its accuracy. In addition to accuracy, this study also measures computation speed to assess the efficiency of the methods.
