Geographical Classification of Tannat Wines Based on Support Vector Machines and Feature Selection

Beverages ◽  
2018 ◽  
Vol 4 (4) ◽  
pp. 97 ◽  
Author(s):  
Nattane Costa ◽  
Laura Llobodanin ◽  
Inar Castro ◽  
Rommel Barbosa

Geographical product recognition has become an issue for researchers and food industries. One way to obtain useful information about the fingerprint of a wine is to examine its chemical components. In this paper, we present a data mining and predictive analysis approach to classify Brazilian and Uruguayan Tannat wines from the South region, using the support vector machine (SVM) classification algorithm with the radial basis function kernel and the F-score feature selection method. A total of 37 Tannat wines differing in geographical origin (9 Brazilian samples and 28 Uruguayan samples) were analyzed. We concluded that, given at least one anthocyanin (peon-3-glu) and the radical scavenging activity (DPPH), the Tannat wines can be classified with 94.64% accuracy and a Matthews correlation coefficient (MCC) of 0.90. Furthermore, the combination of SVM and feature selection proved useful for determining the main chemical parameters that discriminate the origin of Tannat wines and for classifying them with a high degree of accuracy. Additionally, to our knowledge, this is the first study to classify the Tannat wine variety in the context of two countries in South America.
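A minimal sketch of the pipeline this abstract describes (F-score feature ranking followed by an RBF-kernel SVM), using scikit-learn and synthetic data in place of the wines' chemical measurements, which are not reproduced here; the feature count, SVM parameters and cross-validation scheme are illustrative assumptions:

```python
# Sketch: F-score feature selection + RBF-kernel SVM, scored by accuracy and MCC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Placeholder for 37 wines x chemical parameters (anthocyanins, DPPH, ...).
X, y = make_classification(n_samples=37, n_features=20, n_informative=4,
                           weights=[0.25, 0.75], random_state=0)

# Rank features by ANOVA F-score, keep the top k, then classify with an RBF SVM.
pipe = make_pipeline(StandardScaler(),
                     SelectKBest(f_classif, k=2),
                     SVC(kernel="rbf", C=1.0, gamma="scale"))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pred = cross_val_predict(pipe, X, y, cv=cv)
print("accuracy:", accuracy_score(y, pred))
print("MCC:", matthews_corrcoef(y, pred))
```

With only 37 samples and a 9/28 class split, a stratified (or leave-one-out) scheme and the MCC are reasonable choices, since plain accuracy can look optimistic on imbalanced data.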

2014 ◽  
Vol 618 ◽  
pp. 573-577 ◽  
Author(s):  
Yu Qiang Qin ◽  
Yu Dong Qi ◽  
Hui Ying

The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit rating to determine the likelihood of default based on consumer application and credit reference agency data. We test support vector machines (SVM) against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover the features that are most significant in determining risk of default.
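A hedged sketch of the kind of comparison described above: logistic regression versus a linear SVM on synthetic tabular credit data, with the SVM's weights also driving recursive feature elimination (RFE) to surface the most influential variables. The data, the ROC AUC metric and the model settings are assumptions, not the study's actual database or protocol:

```python
# Sketch: baseline comparison plus SVM-based feature selection via RFE.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic, imbalanced stand-in for application/bureau variables.
X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("linear SVM", LinearSVC(C=1.0, dual=False))]:
    model = make_pipeline(StandardScaler(), clf)
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")

# Use the linear SVM's weights to rank features and keep the ten most discriminative.
rfe = RFE(LinearSVC(C=1.0, dual=False), n_features_to_select=10)
rfe.fit(StandardScaler().fit_transform(X), y)
print("selected feature indices:", np.where(rfe.support_)[0])
```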


2011 ◽  
Vol 10 ◽  
pp. CIN.S7111 ◽  
Author(s):  
Sandra L. Taylor ◽  
Kyoungmi Kim

With technological advances now allowing measurement of thousands of genes, proteins and metabolites, researchers are using this information to develop diagnostic and prognostic tests and discern the biological pathways underlying diseases. Often, an investigator's objective is to develop a classification rule, based on a small set of features, that predicts group membership of unknown samples and could ultimately be used in a clinical setting. While common classification methods such as random forest and support vector machines are effective at separating groups, they do not directly translate into a clinically applicable classification rule based on a small number of features. We present a simple feature selection and classification method for biomarker detection that is intuitively understandable and can be directly extended for application to a clinical setting. We first use a jackknife procedure to identify important features and then, for classification, we use voting classifiers, which are simple and easy to implement. We compared our method to random forest and support vector machines using three benchmark cancer ‘omics datasets with different characteristics. We found our jackknife procedure and voting classifier to perform comparably to these two methods in terms of accuracy. Further, the jackknife procedure yielded stable feature sets. Voting classifiers in combination with a robust feature selection method such as our jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification, with a clear extension to clinical applications.
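A simplified, illustrative reading of the jackknife-plus-voting idea: features are re-ranked with each sample left out, features that persist across replicates form a stable panel, and each retained feature then votes for the class whose training mean is closer to the sample's value. The ranking statistic, stability threshold and vote rule below are assumptions, not the authors' exact implementation:

```python
# Sketch: jackknife feature stability followed by a per-feature voting classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=60, n_features=200, n_informative=10,
                           random_state=0)
n, top_k = X.shape[0], 10

# Jackknife: leave one sample out at a time and rank features on the rest.
selection_counts = np.zeros(X.shape[1])
for i in range(n):
    mask = np.arange(n) != i
    F, _ = f_classif(X[mask], y[mask])
    selection_counts[np.argsort(F)[::-1][:top_k]] += 1

# Keep features selected in most jackknife replicates -> a stable panel.
stable = np.where(selection_counts >= 0.8 * n)[0]
print("stable features:", stable)

# Voting: each selected feature votes for the class whose training-set mean is
# closer to the sample's value; the majority of votes decides the label.
means = np.array([[X[y == c, j].mean() for j in stable] for c in (0, 1)])

def predict(x):
    votes = np.argmin(np.abs(x[stable][None, :] - means), axis=0)  # 0/1 per feature
    return int(votes.mean() > 0.5)

preds = np.array([predict(x) for x in X])
print("training accuracy:", (preds == y).mean())
```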


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Mustafa Serter Uzer ◽  
Nihat Yilmaz ◽  
Onur Inan

This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines (SVM) for classification. The purpose of this paper is to test the effect of eliminating unimportant and obsolete features of the datasets on classification success when using the SVM classifier. The developed approach is applied to the diagnosis of liver diseases and diabetes, which are commonly observed and reduce quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained using the 10-fold cross-validation method. The results show that the method performs very well compared with other reported results and seems very promising for pattern recognition applications.
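A heavily reduced sketch of wrapper feature selection in the spirit of ABC + SVM: candidate feature subsets ("food sources") are binary masks, fitness is the 10-fold cross-validated accuracy of an SVM trained on those features, and the search loop loosely mimics the employed/onlooker/scout phases by flipping bits. The data set (scikit-learn's breast cancer data), colony size, abandonment limit and iteration count are placeholder assumptions, not the authors' settings:

```python
# Sketch: bee-colony-style wrapper search with 10-fold CV SVM accuracy as fitness.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_feat, colony, limit, iters = X.shape[1], 10, 5, 10

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return cross_val_score(clf, X[:, mask], y, cv=10).mean()

def neighbour(mask):
    new = mask.copy()
    new[rng.integers(n_feat)] ^= True   # flip one randomly chosen feature bit
    return new

sources = rng.random((colony, n_feat)) < 0.5      # initial random feature subsets
fits = np.array([fitness(m) for m in sources])
trials = np.zeros(colony, dtype=int)

for _ in range(iters):
    # Employed + onlooker phases (collapsed): probe a neighbour of each source,
    # favouring sources with higher fitness, and keep improvements greedily.
    probs = fits / fits.sum()
    for i in rng.choice(colony, size=colony, p=probs):
        cand = neighbour(sources[i])
        f = fitness(cand)
        if f > fits[i]:
            sources[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1
    # Scout phase: abandon sources that stopped improving and restart them randomly.
    for i in np.where(trials > limit)[0]:
        sources[i] = rng.random(n_feat) < 0.5
        fits[i], trials[i] = fitness(sources[i]), 0

best = fits.argmax()
print("best CV accuracy:", fits[best], "with", int(sources[best].sum()), "features")
```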


2017 ◽  
Vol 4 (1) ◽  
pp. 12-17 ◽ 
Author(s):  
Ahmad Firdaus

The classification of hoax news, that is, news containing incorrect information, is one application of text categorization. Like machine text-categorization applications in general, the system consists of pre-processing followed by execution of a classification model. In this study, experiments were conducted to select the best technique for each sub-process, using 1,200 hoax articles and 600 non-hoax articles collected manually. The experiments first compared pre-processing stages (stop-word removal versus stemming); the Decision Tree algorithm achieved an accuracy of 100%, while Naive Bayes showed a more stable level of accuracy across the dataset sizes used for all candidates. With Information Gain and TF-IDF feature selection, the Naive Bayes, Support Vector Machine and Decision Tree algorithms showed no significant change in accuracy for any candidate, but after applying GGA (Optimize Generation) feature selection the accuracy increased. Comparing the Naive Bayes, Decision Tree and Support Vector Machine classifiers combined with the GGA feature selection method, the best result was obtained by GGA + Decision Tree on candidate 2 (Paslon 2) at 100%, and the lowest accuracy by Information Gain + Decision Tree on candidate 3 at 36.67%. Overall, accuracy improved for all algorithms after feature selection, and Naive Bayes remained the most stable across the datasets used for all candidates.
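A hedged sketch of the kind of pipeline compared above: TF-IDF features, a chi-squared selector standing in for the Information Gain and GGA selectors (which are not reproduced here), and Naive Bayes, Decision Tree and linear SVM classifiers. The toy documents and labels are placeholders for the hoax corpus:

```python
# Sketch: text pre-processing + feature selection + three classifiers, compared by CV accuracy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

docs = ["vaccine contains microchips claims viral post",
        "official statement released by the ministry of health",
        "celebrity secretly replaced by clone says anonymous source",
        "election results certified after routine audit"] * 50
labels = [1, 0, 1, 0] * 50   # 1 = hoax, 0 = not hoax

for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("Linear SVM", LinearSVC())]:
    pipe = make_pipeline(TfidfVectorizer(stop_words="english"),  # stop-word removal step
                         SelectKBest(chi2, k=10),                # keep the 10 strongest terms
                         clf)
    acc = cross_val_score(pipe, docs, labels, cv=5).mean()
    print(f"{name}: mean accuracy = {acc:.2f}")
```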


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
M. A. Duarte-Mermoud ◽  
N. H. Beltrán ◽  
S. A. Salah

Recently, a new crossover technique for genetic algorithms has been proposed. The technique, called probabilistic adaptive crossover (PAX), includes the estimation of the probability distribution of the population, storing information about the best and the worst solutions of the problem being solved in a probability vector. This paper reports the use of the proposed technique for Chilean wine classification based on chromatograms obtained from an HPLC system. PAX is used in a first stage as the feature selection method, and then support vector machines (SVM) and linear discriminant analysis (LDA) are used as classifiers. The results are compared with those obtained using the standard uniform (discrete) crossover technique and a variant of PAX called mixed crossover.
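PAX itself is not reproduced below; the sketch is a probability-vector-guided feature search in a similar spirit, where a vector of per-feature inclusion probabilities is nudged toward the best and away from the worst candidate of each generation, and candidate subsets are scored with an LDA classifier. The data set (scikit-learn's wine data, standing in for the HPLC chromatogram features) and all parameters are illustrative assumptions:

```python
# Sketch: probability-vector-guided feature selection with an LDA fitness function.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)          # stand-in for chromatogram-derived features
n_feat, pop, gens, lr = X.shape[1], 20, 15, 0.2

def score(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(LinearDiscriminantAnalysis(), X[:, mask], y, cv=5).mean()

prob = np.full(n_feat, 0.5)                # per-feature inclusion probabilities
for _ in range(gens):
    population = rng.random((pop, n_feat)) < prob   # sample candidate feature masks
    fitness = np.array([score(m) for m in population])
    best, worst = population[fitness.argmax()], population[fitness.argmin()]
    # Pull probabilities toward the best solution and away from the worst one.
    prob += lr * (best.astype(float) - prob)
    prob -= 0.5 * lr * (worst.astype(float) - prob) * (best != worst)
    prob = prob.clip(0.05, 0.95)

final = prob > 0.5
print("selected features:", np.where(final)[0], "CV accuracy:", score(final))
```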


2021 ◽  
pp. 016555152199103 ◽ 
Author(s):  
Bekir Parlak ◽  
Alper Kursat Uysal

As the huge dimensionality of textual data restrains classification accuracy, it is essential to apply feature selection (FS) methods as a dimension reduction step in the text classification (TC) domain. Most FS methods for TC combine several probabilities in their scores. In this study, we propose a new FS method named the Extensive Feature Selector (EFS), which benefits from both corpus-based and class-based probabilities in its calculations. The performance of EFS is compared with nine well-known FS methods, namely Chi-Squared (CHI2), Class Discriminating Measure (CDM), Discriminative Power Measure (DPM), Odds Ratio (OR), Distinguishing Feature Selector (DFS), Comprehensively Measure Feature Selection (CMFS), Discriminative Feature Selection (DFSS), Normalised Difference Measure (NDM) and Max–Min Ratio (MMR), using Multinomial Naive Bayes (MNB), Support Vector Machine (SVM) and k-Nearest Neighbour (KNN) classifiers on four benchmark data sets: Reuters-21578, 20-Newsgroup, Mini 20-Newsgroup and Polarity. The experiments were carried out for six feature sizes: 10, 30, 50, 100, 300 and 500. The experimental results show that EFS is more successful than the other nine methods in most cases according to micro-F1 and macro-F1 scores.
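The EFS scoring formula is not reproduced here; the sketch below only shows the experimental harness the abstract implies: rank terms with a feature selector (chi-squared as a stand-in for EFS), keep the top-k terms for each of the study's feature sizes, and score MNB, SVM and KNN classifiers with macro-F1. The data set choice and cross-validation setup are assumptions:

```python
# Sketch: comparing classifiers across feature sizes with a term-ranking selector.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

data = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"),
                          categories=["sci.med", "rec.autos", "talk.politics.misc"])

classifiers = {"MNB": MultinomialNB(), "SVM": LinearSVC(), "KNN": KNeighborsClassifier()}
for k in (10, 30, 50, 100, 300, 500):          # feature sizes used in the study
    for name, clf in classifiers.items():
        pipe = make_pipeline(TfidfVectorizer(stop_words="english"),
                             SelectKBest(chi2, k=k),   # keep the k highest-scoring terms
                             clf)
        f1 = cross_val_score(pipe, data.data, data.target, cv=3,
                             scoring="f1_macro").mean()
        print(f"k={k:3d}  {name}: macro-F1 = {f1:.3f}")
```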

