The impact of different parameter sets on the classification of asteroid types

2021 ◽  
Author(s):  
Hanna Klimczak ◽  
Wojciech Kotłowski ◽  
Dagmara Oszkiewicz ◽  
Francesca DeMeo ◽  
Agnieszka Kryszczyńska ◽  
...  

The aim of the project is to classify asteroids according to the most commonly used asteroid taxonomy, the Bus-DeMeo taxonomy (DeMeo et al. 2009), using several machine learning methods: Logistic Regression, Naive Bayes, Support Vector Machines, Gradient Boosting and Multilayer Perceptrons. Different parameter sets are used for classification in order to compare the quality of prediction with a limited amount of data. In particular, the study compares performance when using the 0.45 µm to 2.45 µm spectral range with performance when using multiple spectral features, and examines the effect of applying Principal Component Analysis to reduce the dimensionality of the spectral data.

This work has been supported by grant No. 2017/25/B/ST9/00740 from the National Science Centre, Poland.
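The PCA step mentioned above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline. The spectra are synthetic stand-ins. The 41-band grid from 0.45 to 2.45 µm is an assumed sampling. The function name `pca_reduce` is hypothetical. The reduced scores could then be fed to any of the listed classifiers in place of the full spectrum.

```python
import numpy as np


def pca_reduce(spectra, n_components):
    """Project each spectrum onto the leading principal components.

    spectra: array of shape (n_asteroids, n_bands), one reflectance spectrum per row.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = centered @ vt[:n_components].T
    return scores, explained[:n_components]


# Hypothetical synthetic spectra, NOT real asteroid data: 200 objects sampled on
# an assumed 41-band grid from 0.45 to 2.45 um, each with a random linear slope
# and a random-depth absorption band near 1 um, plus a little noise.
rng = np.random.default_rng(42)
wavelengths = np.linspace(0.45, 2.45, 41)
slopes = rng.uniform(-0.2, 0.4, size=(200, 1))
depths = rng.uniform(0.0, 0.3, size=(200, 1))
band = np.exp(-((wavelengths - 1.0) / 0.1) ** 2)
spectra = (1.0 + slopes * (wavelengths - 0.55) - depths * band
           + rng.normal(0.0, 0.005, size=(200, 41)))

scores, explained = pca_reduce(spectra, n_components=3)
print(scores.shape)  # (200, 3): 3 features per asteroid instead of 41
print(explained)     # fraction of variance captured by each component
```

Using the SVD of the centred data avoids forming the covariance matrix explicitly. That choice is numerically more stable when the number of bands is close to the number of objects.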

2020 ◽  
Vol 6 (3) ◽  
pp. 353-356
Author(s):  
Martin Golz ◽  
Sebastian Thomas ◽  
Adolf Schenka

GMLVQ (Generalized Matrix Relevance Learning Vector Quantization) is a machine learning method with an adaptive metric. During training, the prototype vectors and the weight matrix of the metric are adapted simultaneously. The method is presented in more detail and compared with other machine learning methods that employ a fixed metric. The study investigated how accurately the methods can assign the 6-channel EEG of 25 young drivers, who drove overnight in the simulation lab, to the two classes of mild and severe drowsiness. Cross-validation shows that GMLVQ reaches a mean classification accuracy of 81.7 ± 1.3 %. It is less accurate than support vector machines (SVM) and gradient boosting machines (GBM), and cannot exploit the potential of learning adaptive metrics in the case of EEG data. However, its weighting matrix provides information on the relevance of each signal feature.
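The simultaneous adaptation of prototypes and metric described above can be sketched as one stochastic gradient step of the standard GMLVQ cost μ = (d_J − d_K)/(d_J + d_K). Here d(x, w) = (x − w)ᵀ Ωᵀ Ω (x − w). J is the closest prototype with the correct label, and K is the closest prototype with a wrong label. This is a minimal sketch, not the study's implementation. The learning rates, the toy data (not EEG) and all function names are assumptions. The diagonal of Λ = ΩᵀΩ is the per-feature relevance that the abstract refers to.

```python
import numpy as np


def gmlvq_distance(x, w, omega):
    """Adaptive squared distance d(x, w) = (x - w)^T Omega^T Omega (x - w)."""
    diff = omega @ (x - w)
    return float(diff @ diff)


def gmlvq_step(x, y, prototypes, proto_labels, omega, lr_w=0.05, lr_omega=0.005):
    """One stochastic gradient step on mu = (d_J - d_K) / (d_J + d_K).

    Prototypes and Omega are updated in the same step.
    """
    dists = np.array([gmlvq_distance(x, w, omega) for w in prototypes])
    same = proto_labels == y
    j = np.flatnonzero(same)[np.argmin(dists[same])]    # closest correct prototype
    k = np.flatnonzero(~same)[np.argmin(dists[~same])]  # closest wrong prototype
    d_j, d_k = dists[j], dists[k]
    dmu_dj = 2.0 * d_k / (d_j + d_k) ** 2
    dmu_dk = -2.0 * d_j / (d_j + d_k) ** 2

    lam = omega.T @ omega
    diff_j, diff_k = x - prototypes[j], x - prototypes[k]
    # d/dw of d(x, w) is -2 * Lambda (x - w): w_J moves towards x, w_K away from it.
    prototypes[j] += lr_w * dmu_dj * 2.0 * (lam @ diff_j)
    prototypes[k] += lr_w * dmu_dk * 2.0 * (lam @ diff_k)
    # d/dOmega of d(x, w) is 2 * Omega (x - w)(x - w)^T.
    grad = 2.0 * (dmu_dj * omega @ np.outer(diff_j, diff_j)
                  + dmu_dk * omega @ np.outer(diff_k, diff_k))
    omega -= lr_omega * grad
    omega /= np.sqrt(np.sum(omega**2))  # normalise so that trace(Lambda) = 1
    return prototypes, omega


def gmlvq_predict(x, prototypes, proto_labels, omega):
    dists = [gmlvq_distance(x, w, omega) for w in prototypes]
    return proto_labels[int(np.argmin(dists))]


# Hypothetical toy data (not EEG): only feature 0 separates the two classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0, 0.0], 1.0, size=(50, 3)),
               rng.normal([2.0, 0.0, 0.0], 1.0, size=(50, 3))])
y = np.array([0] * 50 + [1] * 50)
proto_labels = np.array([0, 1])
prototypes = np.array([X[y == c].mean(axis=0) for c in proto_labels])
omega = np.eye(3) / np.sqrt(3.0)

for epoch in range(20):
    for i in rng.permutation(len(X)):
        prototypes, omega = gmlvq_step(X[i], y[i], prototypes, proto_labels, omega)

accuracy = np.mean([gmlvq_predict(x, prototypes, proto_labels, omega) == t
                    for x, t in zip(X, y)])
relevances = np.diag(omega.T @ omega)  # per-feature relevance, sums to 1
print(accuracy, relevances)
```

On this toy problem, the relevance of feature 0 should grow while the two noise features shrink. That is the kind of interpretable output the weighting matrix provides.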


Mathematics ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. 851 ◽  
Author(s):  
Bi-Min Hsu

Text classification is an essential part of many applications, such as spam detection and sentiment analysis. With the growing number of textual documents and datasets generated through social media and news articles, accurate machine learning methods for text classification are increasingly needed. In this paper, a comprehensive evaluation of multiple supervised learning models, namely logistic regression (LR), decision trees (DT), support vector machine (SVM), AdaBoost (AB), random forest (RF), multinomial naive Bayes (NB), multilayer perceptrons (MLP), and gradient boosting (GB), was conducted to assess their efficiency, robustness, and limitations on the classification of textual data. SVM, LR, and MLP performed better in general, with SVM being the best, while DT and AB had much lower accuracies than the other tested models. Different SVM kernels were further explored, showing the advantage of linear kernels over polynomial, sigmoid, and radial basis function kernels for text classification. The effect of removing stop words on model performance was also investigated: DT performed better with stop words removed, while all other models were relatively unaffected by the presence or absence of stop words.
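To make the stop-word experiment concrete, here is a minimal pure-Python sketch of one of the evaluated models, multinomial naive Bayes with Laplace smoothing. Stop-word removal is an optional preprocessing switch. The tiny stop-word list, the toy spam/ham corpus and the function names are all assumptions for illustration. Real studies typically use a standard stop-word list and a library implementation.

```python
import math
from collections import Counter

# Small, hypothetical stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "of", "to", "was"}


def tokenize(text, remove_stop_words=False):
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def train_multinomial_nb(docs, labels, remove_stop_words=False, alpha=1.0):
    """Fit multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc, remove_stop_words))
    vocab = set().union(*word_counts.values())
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(word_counts[c].values())
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return classes, log_prior, log_lik, remove_stop_words


def predict(model, text):
    classes, log_prior, log_lik, remove_stop_words = model
    scores = {}
    for c in classes:
        # Words never seen in training are ignored.
        scores[c] = log_prior[c] + sum(log_lik[c][w]
                                       for w in tokenize(text, remove_stop_words)
                                       if w in log_lik[c])
    return max(scores, key=scores.get)


docs = ["free prize click now", "win a free prize",
        "meeting at noon", "see the report at noon"]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels, remove_stop_words=True)
print(predict(model, "claim your free prize"))  # spam
print(predict(model, "report at noon"))         # ham
```

Training the same model with `remove_stop_words=False` and comparing held-out accuracy is the shape of the ablation described in the abstract.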


2016 ◽  
Vol 14 (06) ◽  
pp. 1650033 ◽  
Author(s):  
Li Gu ◽  
Lichun Xue ◽  
Qi Song ◽  
Fengji Wang ◽  
Huaqin He ◽  
...  

During commercial transactions, the quality of flue-cured tobacco leaves must be characterized efficiently, and the evaluation system should be easily transferable across different traders. However, flue-cured tobacco leaves contain over 3000 chemical compounds, so it is not feasible to evaluate their quality using all of these compounds. In this paper, we used the Support Vector Machine (SVM) algorithm together with 22 chemical compounds selected by ReliefF-Particle Swarm Optimization (R-PSO) to classify the fragrant style of flue-cured tobacco leaves, achieving an Accuracy (ACC) of 90.95% and a Matthews Correlation Coefficient (MCC) of 0.80. The SVM algorithm combined with 19 chemical compounds selected by R-PSO achieved the best assessment performance for the aromatic quality of tobacco leaves, with a Pearson correlation coefficient (PCC) of 0.594 and a mean squared error (MSE) of 0.263. Finally, we constructed two online tools to classify the fragrant style and evaluate the aromatic quality of flue-cured tobacco leaf samples. These tools can be accessed at http://bioinformatics.fafu.edu.cn/tobacco .
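The four evaluation metrics quoted above (ACC and MCC for classification, PCC and MSE for quality regression) follow standard definitions. A minimal pure-Python sketch of them is below. The function names and the example confusion counts are hypothetical and do not come from the paper. MCC is (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0 by convention if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical counts: 90 TP, 80 TN, 10 FP, 20 FN.
print(accuracy(90, 80, 10, 20))  # 0.85
print(mcc(90, 80, 10, 20))       # about 0.704
```

MCC is less optimistic than accuracy when the classes are imbalanced, which is why reporting both is informative.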


2016 ◽  
Vol 24 (3) ◽  
pp. 385-409 ◽  
Author(s):  
Fernando E. B. Otero ◽  
Alex A. Freitas

Most ant colony optimization (ACO) algorithms for inducing classification rules use an ACO-based procedure to create one rule at a time. An improved search strategy has been proposed in the cAnt-MinerPB algorithm. There, an ACO-based procedure creates a complete list of rules (ordered rules), i.e., the ACO search is guided by the quality of a list of rules instead of an individual rule. In this paper we propose an extension of the cAnt-MinerPB algorithm to discover a set of rules (unordered rules). The main motivations for this work are to improve the interpretability of individual rules by discovering a set of rules, and to evaluate the impact on the predictive accuracy of the algorithm. We also propose a new measure of the interpretability of the discovered rules. This measure addresses the fact that the commonly used model size measure ignores how the rules are used to make a class prediction. Comparisons with state-of-the-art rule induction algorithms, support vector machines, and cAnt-MinerPB producing ordered rules are also presented.
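The interpretability argument above hinges on how rules are applied at prediction time. The sketch below contrasts the two settings. In an ordered list, the first matching rule wins, so each rule implicitly includes "and no earlier rule fired". In an unordered set, every matching rule is considered on its own. This is an illustration only, not cAnt-MinerPB. The attribute names, thresholds and quality scores are hypothetical. Resolving conflicts by the highest-quality rule is one common strategy among several and is an assumption here.

```python
# Each hypothetical rule is (conditions, predicted_class, quality); conditions map
# an attribute name to a predicate on that attribute's value.
RULES = [
    ({"petal_length": lambda v: v < 2.5}, "setosa", 0.95),
    ({"petal_length": lambda v: v >= 2.5}, "versicolor", 0.70),
    ({"petal_width": lambda v: v > 1.7}, "virginica", 0.90),
]


def matches(conditions, example):
    return all(test(example[attr]) for attr, test in conditions.items())


def predict_ordered(rules, default_class, example):
    """Decision list: the first rule whose antecedent matches decides."""
    for conditions, cls, _ in rules:
        if matches(conditions, example):
            return cls
    return default_class


def predict_unordered(rules, default_class, example):
    """Rule set: all matching rules fire; the highest-quality one decides."""
    fired = [(quality, cls) for conditions, cls, quality in rules
             if matches(conditions, example)]
    if not fired:
        return default_class
    return max(fired, key=lambda f: f[0])[1]


example = {"petal_length": 5.0, "petal_width": 2.0}
print(predict_ordered(RULES, "setosa", example))    # versicolor: rule 3 is never reached
print(predict_unordered(RULES, "setosa", example))  # virginica: best of rules 2 and 3
```

The same rules give different predictions depending on how they are combined. This is why a measure based only on model size can miss differences in how interpretable the two kinds of model are.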


Author(s):  
Nurhan Gursel Ozmen ◽  
Levent Gumusel

In brain computer interface (BCI) research, electroencephalography (EEG) is the most widely used method due to its noninvasiveness, high temporal resolution and portability. Most EEG-based BCI studies aim to develop methodologies for signal processing, feature extraction and classification. In this study, an experimental EEG study was carried out with six subjects performing mental and motor imagery tasks. We present multi-class EEG decoding with a novel pairwise output coding method to improve the performance of self-induced BCI systems. The method uses an augmented one-versus-one multiclass classification that requires less time and fewer electrodes. Furthermore, a train repetition number is introduced in the training step to optimize data selection. Differences between the right and left hemispheres are also examined, as are differences between experienced and novice subjects. The experimental results demonstrate that the proposed classification algorithm achieves high classification accuracy (98%) with nine channels. With a reduced set of four channels, Support Vector Machines (SVM) achieve 100% accuracy for mental tasks and 87% for motor tasks. The classification accuracies are high, and the proposed one-versus-one technique performed well compared with the classical method. The results are promising for a real-time study.
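For reference, the classical one-versus-one scheme that the proposed method augments trains one binary classifier per pair of classes, K(K−1)/2 in total. It then predicts by majority vote. The sketch below shows only that classical voting step. The authors' pairwise output coding is not specified in the abstract and is not reproduced. The three class labels and the nearest-centre "classifiers" are hypothetical stand-ins for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Majority vote over all pairwise classifiers; ties go to the earlier class."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    best = max(votes.values())
    return next(c for c in classes if votes[c] == best)


classes = [0, 1, 2]                  # hypothetical task labels
centres = {0: 0.0, 1: 1.0, 2: 2.0}   # hypothetical 1-D feature centres
# Stand-in binary classifiers (in practice, e.g., one SVM per pair).
pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centres[a]) <= abs(x - centres[b]) else b)
    for a, b in combinations(classes, 2)
}
print(len(pairwise))                               # 3 = K(K-1)/2 for K = 3
print(one_vs_one_predict(1.9, classes, pairwise))  # 2
```

Each pairwise classifier only ever sees data from two classes. That keeps the individual problems small, which is one reason one-versus-one is popular with SVMs.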


Author(s):  
F. Pirotti ◽  
F. Sunar ◽  
M. Piragnolo

Thanks mainly to ESA and USGS, a large volume of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging task, since land cover of a specific class may present large spatial and spectral variability, and objects may appear at different scales and orientations.

In this study, we report the results of benchmarking 9 machine learning algorithms, tested for accuracy and for speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multilayer perceptron, multilayer perceptron ensemble, ctree, boosting, and logarithmic regression. Validation is carried out using a control dataset consisting of an independent classification into 11 land-cover classes of an area of about 60 km². This control dataset was obtained by experts through manual visual interpretation of high-resolution images (20 cm ground sampling distance). Five of the eleven classes are used in this study, since the others have too few samples (pixels) for the testing and validation subsets. The classes used are: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations, and (v) grasslands.

Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying k-fold cross-validation with ten folds (kfold), and (iii) using all pixels from the control dataset (full). Five accuracy indices are calculated to compare the values predicted by each model with the control values over these three sets of data.
Results from validating predictions on the whole dataset (full) show the random forests method with the highest values, with the kappa index ranging from 0.55 to 0.42 with the most and the fewest training pixels, respectively. The two neural networks (multilayer perceptron and its ensemble) and the support vector machines (with the default radial basis function kernel) follow closely with comparable performance.
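Two of the validation ingredients above, the kappa index and k-fold splitting, can be sketched in pure Python. Cohen's kappa is κ = (p_o − p_e)/(1 − p_e). Here p_o is the observed agreement (the confusion-matrix diagonal over the total), and p_e is the agreement expected by chance from the row and column totals. The abstract does not list its five indices beyond kappa, so this is an illustration of the standard definition only. The function names and the example matrix are hypothetical.

```python
import random


def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, cols: predicted)."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Hypothetical 2-class matrix: 70% raw agreement, 50% expected by chance.
print(cohen_kappa([[20, 5], [10, 15]]))  # approximately 0.4
```

Kappa discounts the agreement a classifier would reach by chance given the class proportions. That makes it a stricter summary than overall accuracy when land-cover classes are unevenly represented.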


Author(s):  
E Gani ◽  
C Manzie

This paper proposes the use of support vector machines to classify different types of missed combustion events in a six-cylinder engine. On-board diagnostics regulations require the detection of missed combustion events, which is possible through interpretation of crankshaft speed information. However, current approaches provide no information on the actual cause of the event, in particular whether it was caused by a misfuel (absence of fuel) or a misfire (absence of spark). Whilst misfuel events have minimal impact on the environment and on emission treatment systems, misfire events are detrimental to both. Consequently, information on the causes of missed combustion events could allow the development of recovery strategies specific to the source of the problem. In this paper, an approach is proposed that has the potential, firstly, to detect a missed combustion event and, secondly, to classify the event in real time as either a misfuel or a misfire, using feedback from a heated universal exhaust gas oxygen sensor. To evaluate the potential of such a system in an engine control unit, a computational complexity measure is also presented.
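Why computational complexity matters for on-board use can be seen from the SVM decision function f(x) = Σᵢ αᵢ yᵢ K(sᵢ, x) + b. Each prediction needs one kernel evaluation per support vector sᵢ, so the cost grows with the number of support vectors times the feature dimension. The sketch below is a hypothetical RBF-kernel classifier, not the paper's model. The two-dimensional features, the support vectors, the coefficients and the mapping of the positive side to "misfire" are all assumptions.

```python
import math


def rbf_kernel(a, b, gamma):
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    # One kernel evaluation per support vector, each costing O(d) for d features,
    # so a single decision costs O(n_sv * d) operations.
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def classify_event(x, model):
    # Hypothetical convention: positive side = misfire, negative side = misfuel.
    return "misfire" if svm_decision(x, **model) > 0 else "misfuel"


model = {
    "support_vectors": [[0.0, 0.0], [1.0, 1.0]],  # hypothetical sensor features
    "dual_coefs": [1.0, -1.0],                    # alpha_i * y_i
    "bias": 0.0,
    "gamma": 1.0,
}
print(classify_event([0.1, 0.0], model))
print(classify_event([0.9, 1.0], model))
```

On an engine control unit, keeping the number of support vectors small, or using a linear kernel when it suffices, is the main lever for bounding the per-event cost.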


2021 ◽  
Vol 12 ◽  
Author(s):  
Weihao Chen ◽  
Pâmela A. Alexandre ◽  
Gabriela Ribeiro ◽  
Heidge Fukumasu ◽  
Wei Sun ◽  
...  

Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no attempt has been made to compare the performance of combinations of different ML methods in predicting high feed efficiency (HFE) and low feed efficiency (LFE) animals. In this study, using RNA sequencing data of five tissues (adrenal gland, hypothalamus, liver, skeletal muscle, and pituitary) from nine HFE and nine LFE Nellore bulls, we evaluated the prediction accuracies of five analytical methods in classifying FE animals. These included two conventional methods for differential gene expression (DGE) analysis (t-test and edgeR) as benchmarks, and three ML methods: Random Forests (RFs), Extreme Gradient Boosting (XGBoost), and a combination of RF and XGBoost (RX). The utility of the subset of candidate genes selected by each method for classifying FE animals was assessed with a support vector machine (SVM). Among all methods, the smallest gene subset (117 genes), identified by RX, outperformed the subsets chosen by t-test, edgeR, RF, or XGBoost in classification accuracy. Gene co-expression network analysis confirmed the interactions among these genes and their relevance within the network, in relation to their ML-based prediction ranking. The results demonstrate great potential for applying a combination of ML methods to large transcriptome datasets to identify biologically important genes for accurately classifying FE animals.
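The abstract does not state how RF and XGBoost are combined in RX. One plausible way to merge two importance-based gene rankings is to average each gene's rank across the two models and keep the top k. The sketch below implements that assumption purely for illustration and is not the authors' method. The gene names and importance values are hypothetical. The selected subset would then be passed to an SVM for evaluation, as described above.

```python
def ranks(importances):
    """Map each gene to its rank (1 = most important)."""
    order = sorted(importances, key=importances.get, reverse=True)
    return {gene: r for r, gene in enumerate(order, start=1)}


def combine_rankings(imp_a, imp_b, top_k):
    """Assumed combination rule: mean rank across two models, lowest first."""
    ra, rb = ranks(imp_a), ranks(imp_b)
    genes = set(ra) & set(rb)
    mean_rank = {g: (ra[g] + rb[g]) / 2 for g in genes}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(genes, key=lambda g: (mean_rank[g], g))[:top_k]


# Hypothetical importance scores from two models.
rf_importance = {"g1": 0.50, "g2": 0.30, "g3": 0.10, "g4": 0.05}
xgb_importance = {"g1": 0.20, "g2": 0.40, "g3": 0.05, "g4": 0.35}
print(combine_rankings(rf_importance, xgb_importance, top_k=2))  # ['g2', 'g1']
```

Averaging ranks rather than raw scores sidesteps the fact that RF and XGBoost importances are on different scales.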


2020 ◽  
Vol 4 (1) ◽  
pp. 1-6
Author(s):  
Irzal Ahmad Sabilla ◽  
Chastine Fatichah

Vegetables such as tomatoes and chilies are used as flavoring ingredients. Both are processed into sauces and seasonings that accompany people's staple food. These vegetables are easy to find in supermarkets, but many people do not know how to choose the type and quality of chilies and tomatoes. This study addresses the classification of cayenne, curly, green, and red chilies, and of tomatoes in good and bad condition, using machine learning and contrast enhancement techniques. The machine learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Linear Discriminant Analysis (LDA), and Random Forest (RF). The methods are compared by accuracy, and computation speed is also measured to assess their efficiency.
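The abstract does not say which contrast enhancement technique is applied. Global histogram equalization is a common choice and is sketched here as an assumption, on a flat list of 8-bit grey levels. It maps each level v to round((cdf(v) − cdf_min)/(N − cdf_min) · (L − 1)). Here cdf is the cumulative histogram, N is the number of pixels and L is the number of grey levels. The function name and the example pixel values are hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Global histogram equalization for a flat list of grey levels in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]


# A low-contrast patch squeezed into levels 50..52 is stretched to the full range.
print(equalize_histogram([50, 50, 51, 52, 52]))  # [0, 0, 85, 255, 255]
```

For colour images, equalization is often applied to a luminance channel only, so that hue, which matters for telling chili and tomato types apart, is not distorted.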

