Fuzzy One-Class Classification Model Using Contamination Neighborhoods

2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Lev V. Utkin

A fuzzy classification model is studied in the paper. It is based on the contaminated (robust) model which produces fuzzy expected risk measures characterizing classification errors. Optimal classification parameters of the models are derived by minimizing the fuzzy expected risk. It is shown that an algorithm for computing the classification parameters is reduced to a set of standard support vector machine tasks with weighted data points. Experimental results with synthetic data illustrate the proposed fuzzy model.
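To make the idea of a contaminated model concrete, here is a minimal sketch. It assumes the standard ε-contamination neighborhood, (1 - ε)·P + ε·Q, where P is the empirical distribution over the training points and Q is any distribution over the same points. The abstract does not give the paper's exact formulation, so the function name `contaminated_risk_bounds` and the toy losses are illustrative assumptions, not the author's algorithm.

```python
def contaminated_risk_bounds(losses, eps):
    """Lower and upper expected risk over the set (1 - eps) * P + eps * Q.

    Assumption: P is uniform on the training points and Q ranges over all
    distributions on those points, so the extremes are reached when Q puts
    all of its mass on the smallest or the largest per-point loss.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    mean_loss = sum(losses) / len(losses)
    lower = (1.0 - eps) * mean_loss + eps * min(losses)
    upper = (1.0 - eps) * mean_loss + eps * max(losses)
    return lower, upper


# Hypothetical per-point losses of some fixed classifier.
example_losses = [0.0, 0.0, 1.0, 3.0]
for eps in (0.0, 0.25, 0.5):
    print(eps, contaminated_risk_bounds(example_losses, eps))
```

Each bound is a weighted average of the per-point losses, and the interval between them widens as ε grows.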

Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3673 ◽  
Author(s):  
Zhili Long ◽  
Ronghua He ◽  
Yuxiang He ◽  
Haoyao Chen ◽  
Zuohua Li

This paper presents a modeling approach to feature classification and environment mapping for indoor mobile robotics via a rotary ultrasonic array and fuzzy modeling. To compensate for the distance error detected by the ultrasonic sensor, a novel feature extraction approach termed “minimum distance of point” (MDP) is proposed to determine the accurate distance and location of target objects. A fuzzy model is established to recognize and classify object features such as flat surfaces, corners, and cylinders. An environmental map is constructed for automated robot navigation based on this fuzzy classification, combined with a cluster algorithm and least-squares fitting. First, the platform of the rotary ultrasonic array is built from four low-cost ultrasonic sensors and a motor. Fundamental measurements, such as the distance of objects at different rotary angles and with different object materials, are carried out. Second, the MDP feature extraction algorithm is proposed to extract precise object locations. Compared with the conventional range of constant distance (RCD) method, the MDP method can compensate for errors in feature location and feature matching. With the data clustering algorithm, a range of ultrasonic distances is obtained and used as the input dataset. The fuzzy classification model, which includes rules for data fuzzification, reasoning, and defuzzification, is established to recognize and classify the object feature types. Finally, accurate environment mapping of a service robot, based on MDP and fuzzy modeling of the measurements from the ultrasonic array, is demonstrated. Experimentally, the approach achieves environment mapping for mobile robotics with acceptable accuracy and low cost.
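The three stages named in the abstract (fuzzification, reasoning, defuzzification) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the single scalar input `x` on a 0 to 100 scale, the triangular membership breakpoints, and the rule table mapping fuzzy sets to feature types are all hypothetical and are not taken from the paper.

```python
def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets over one scalar feature assumed to lie in [0, 100].
FUZZY_SETS = {
    "low": (-1.0, 0.0, 40.0),
    "medium": (20.0, 50.0, 80.0),
    "high": (60.0, 100.0, 101.0),
}

# Hypothetical rule base: one rule per fuzzy set.
RULES = {"low": "corner", "medium": "cylinder", "high": "flat surface"}


def classify(x):
    # Fuzzification: degree of membership of x in every fuzzy set.
    degrees = {name: triangular(x, *abc) for name, abc in FUZZY_SETS.items()}
    # Reasoning: each rule fires with the degree of its antecedent.
    activation = {RULES[name]: degree for name, degree in degrees.items()}
    # Defuzzification: pick the class with the largest activation.
    label = max(activation, key=activation.get)
    return label, activation


for value in (0.0, 30.0, 100.0):
    print(value, classify(value))
```

A real system would use several inputs and combine rule antecedents with min or product, but the flow from crisp input to fuzzy degrees to a crisp class is the same.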


2015 ◽  
Vol 25 (07) ◽  
pp. 1550029 ◽  
Author(s):  
Enrique Castillo ◽  
Diego Peteiro-Barral ◽  
Bertha Guijarro Berdiñas ◽  
Oscar Fontenla-Romero

This paper presents a novel distributed one-class classification approach based on an extension of the ν-SVM method, thus permitting its application to Big Data sets. In our method we consider several one-class classifiers, each determined from a local data partition on one processor, and the goal is to find a global model. The cornerstone of the method is a novel mathematical formulation that makes the optimization problem separable while excluding data points considered outliers from the final solution. This matters because the decision region generated by the method is unaffected by the position of the outliers and therefore fits the shape of the data more precisely. Another interesting property is that, although built in parallel, the classifiers exchange data during learning in order to improve their individual specialization. Experimental results on different datasets show that the decision regions of the proposed method compare well in accuracy with other well-known classifiers, while its distributed nature saves training time.
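For orientation, the standard (non-distributed) one-class ν-SVM problem that such methods extend can be written as below. This is the textbook formulation, not the separable variant proposed in the paper, which the abstract does not spell out; the symbols n (number of training points) and φ (feature map) are the usual notation and are assumptions here.

```latex
\min_{w,\,\xi,\,\rho}\quad
  \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i
  - \rho
\qquad \text{s.t.}\quad
  \langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\quad
  \xi_i \ge 0,\quad i = 1,\dots,n
```

A new point x is accepted when ⟨w, φ(x)⟩ ≥ ρ. The parameter ν ∈ (0, 1] is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors.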


Author(s):  
XULEI YANG ◽  
QING SONG ◽  
YUE WANG

This paper presents a weighted support vector machine (WSVM) that reduces the outlier sensitivity of the standard support vector machine (SVM) in two-class data classification. The basic idea is to assign different weights to different data points so that the WSVM training algorithm learns the decision surface according to the relative importance of the data points in the training set. The weights used in WSVM are generated by a robust fuzzy clustering algorithm, the kernel-based possibilistic c-means (KPCM) algorithm, whose partition yields relatively high values for important data points and low values for outliers. Experimental results indicate that the proposed method reduces the effect of outliers and yields a higher classification rate than the standard SVM when outliers exist in the training data set.
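The effect of per-point weights is easiest to see in the objective itself. The sketch below evaluates a linear weighted soft-margin objective, 0.5·‖w‖² + C·Σ sᵢ·max(0, 1 - yᵢ(w·xᵢ + b)), on toy data. The data, the weight values, and the function name `weighted_svm_objective` are assumptions for illustration; the weights are set by hand here rather than produced by KPCM.

```python
def weighted_svm_objective(w, b, X, y, weights, C=1.0):
    """Primal objective of a linear soft-margin SVM with per-point weights."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    weighted_hinge = 0.0
    for xi, yi, si in zip(X, y, weights):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # Hinge loss of this point, scaled by its importance weight si.
        weighted_hinge += si * max(0.0, 1.0 - yi * score)
    return margin_term + C * weighted_hinge


# Toy 1-D data: the third point is labeled +1 but lies deep in the -1 region.
X = [[2.0], [-2.0], [-3.0]]
y = [1, -1, 1]
w, b = [1.0], 0.0

uniform = weighted_svm_objective(w, b, X, y, weights=[1.0, 1.0, 1.0])
downweighted = weighted_svm_objective(w, b, X, y, weights=[1.0, 1.0, 0.25])
print(uniform, downweighted)
```

With uniform weights the outlier dominates the objective, so an optimizer would bend the decision surface toward it. Giving it a small weight shrinks its contribution, and that is how the weighting reduces outlier sensitivity.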


Water ◽  
2021 ◽  
Vol 13 (17) ◽  
pp. 2387
Author(s):  
Fernando Salazar ◽  
André Conde ◽  
Joaquín Irazábal ◽  
David J. Vicente

Dam safety assessment is typically made by comparison between the outcome of some predictive model and measured monitoring data. This is done separately for each response variable, and the results are later interpreted before decision making. In this work, three approaches based on machine learning classifiers are evaluated for the joint analysis of a set of monitoring variables: multi-class, two-class and one-class classification. Support vector machines are applied to all prediction tasks, and random forest is also used for multi-class and two-class. The results show high accuracy for multi-class classification, although the approach has limitations for practical use. The performance in two-class classification is strongly dependent on the features of the anomalies to detect and their similarity to those used for model fitting. The one-class classification model based on support vector machines showed high prediction accuracy, while avoiding the need for correctly selecting and modelling the potential anomalies. A criterion for anomaly detection based on model predictions is defined, which results in a decrease in the misclassification rate. The possibilities and limitations of all three approaches for practical use are discussed.


2018 ◽  
Vol 17 ◽  
pp. 117693511881021 ◽  
Author(s):  
Melissa Zhao ◽  
Yushi Tang ◽  
Hyunkyung Kim ◽  
Kohei Hasegawa

Objective: Despite existing prognostic markers, breast cancer prognosis remains a difficult subject due to the complex relationships between many contributing factors and survival. This study seeks to integrate multiple clinicopathological and genomic factors with dimensional reduction across machine learning algorithms to compare survival predictions. Methods: This is a secondary analysis of the data from a prospective cohort study of female patients with breast cancer enrolled in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). We constructed a series of predictive models: ensemble models (Gradient Boosting and Random Forest), support vector machine (SVM), and artificial neural networks (ANN) for 5-year survival based on clinicopathological and gene expression data after K-means clustering with K-nearest-neighbor (KNN) classification. Model performance was evaluated by receiver operating characteristic (ROC) curve, accuracy, and calibration slope (CS). Model stability was assessed over 10 random runs in terms of ROC, accuracy, CS, and variable importance. Results: The analytic cohort is composed of 1874 patients with breast cancer. Overall, the median age was 62 years; the 5-year survival rate was 75%. ROC and accuracy were not significantly different between models (ROC and accuracy around 0.67 and 0.72 across models, respectively). However, ensemble methods resulted in better fit (CS) with stable measures of variable importance across 10 random training/validation splits. K-means clustering of gene expression profiles on training data points along with KNN classification of validation data points was a robust method of dimensional reduction. Furthermore, the gene expression cluster with the highest mortality risk was an influential factor in model prediction. 
Conclusions: Using machine learning methods to construct predictive models for 5-year survival in patients with breast cancer, we demonstrated discrimination ability across models with new insight into the stability and utility of dimensional reduction on genomic features in breast cancer survival prediction.
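The dimensional reduction step described in the Results (K-means on training points, then KNN to place validation points into the learned clusters) can be sketched as follows. This is a toy pure-Python version under stated assumptions: two-dimensional points stand in for gene expression profiles, the centroids are initialized from the first k points for determinism, and the helper names are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def knn_cluster(point, train_points, train_labels, n_neighbors=3):
    """Assign a held-out point to a cluster by majority vote of its neighbors."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: squared_distance(point, train_points[i]),
    )
    votes = [train_labels[i] for i in order[:n_neighbors]]
    return max(set(votes), key=votes.count)


train = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
train_labels, _ = kmeans(train, k=2)
validation = [[0.5, 0.5], [9.0, 9.0]]
print(train_labels, [knn_cluster(v, train, train_labels) for v in validation])
```

The cluster index then replaces the full profile as a single categorical feature. Fitting the clusters on training data only keeps validation data out of the feature construction.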


Author(s):  
Zida Ziyan Azkiya ◽  
Fatma Indriani ◽  
Heru Kartika Chandra

Abstract (translated from Indonesian): In detecting patients with dengue hemorrhagic fever (DHF), the available training data generally consists only of positive patients, while data on healthy people (negative data) is not specifically available. This paper describes the construction of a classification model for DHF detection using the One Class Classification (OCC) approach. The data used in this study are laboratory blood test results from patients with dengue fever. The methods studied are One-Class Support Vector Machine and K-Means. The SVM method achieved precision = 1.0, recall = 0.993, F1 score = 0.997, and accuracy of 99.7%, while the K-Means method achieved precision = 0.901, recall = 0.973, F1 score = 0.936, and accuracy of 93.3%. This shows that the SVM method is slightly superior to K-Means for this case. Keywords: dengue fever, Dengue Hemorrhagic Fever, K-Means, One Class Classification, OSVM

Abstract: A two-class classification problem maps inputs into two target classes. In certain cases, training data is available for only a single class, as with Dengue Hemorrhagic Fever (DHF) patients, where only data from positive patients is available. In this paper, we report our experiment in building a classification model for detecting DHF infection using the One Class Classification (OCC) approach. The data in this study come from laboratory tests of patients with dengue fever. The OCC methods compared are One-Class Support Vector Machine and One-Class K-Means. The SVM method obtained precision = 1.0, recall = 0.993, F1 score = 0.997, and accuracy of 99.7%, while the K-Means method obtained precision = 0.901, recall = 0.973, F1 score = 0.936, and accuracy of 93.3%.
This indicates that the SVM method is slightly superior to K-Means for One-Class Classification of DHF patients. Keywords: Dengue Hemorrhagic Fever, K-Means, One Class Classification, OSVM
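The four metrics reported above all derive from a confusion matrix. The sketch below shows the standard definitions; the counts in the example are made up for illustration and are not the study's data, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives with respect to the positive (target) class.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall, written in terms of counts.
    f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts, chosen only to illustrate the formulas.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```

Note that evaluating a one-class model this way still requires some negative examples at test time, even though none are used for training.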


2020 ◽  
Author(s):  
Lewis Mervin ◽  
Avid M. Afzal ◽  
Ola Engkvist ◽  
Andreas Bender

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three calibration methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating prediction scores for ligand-target prediction with the Naïve Bayes, Support Vector Machine and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million compound-target data points across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
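Of the three calibration methods named, isotonic regression is the simplest to show in full. The sketch below fits a non-decreasing step function from raw scores to probabilities with the pool-adjacent-violators algorithm. It is a minimal illustration, not the study's pipeline: the function names and toy scores are assumptions, and tied scores are not pooled as a production implementation would do.

```python
import bisect


def isotonic_fit(scores, labels):
    """Pool-adjacent-violators fit of binary labels against sorted scores."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum_of_labels, count]
    for _, label in pairs:
        blocks.append([float(label), 1])
        # Merge backwards while a block's mean exceeds the mean after it.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


def isotonic_predict(score, sorted_scores, fitted):
    """Step-function lookup: value of the closest training score at or below."""
    idx = bisect.bisect_right(sorted_scores, score) - 1
    return fitted[max(idx, 0)]


raw_scores = [0.1, 0.2, 0.3, 0.4]
observed = [0, 1, 0, 1]
xs, ps = isotonic_fit(raw_scores, observed)
print(ps, isotonic_predict(0.25, xs, ps))
```

In practice the calibration map is fitted on held-out predictions rather than on the data used to train the underlying model, to avoid overconfident probabilities.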


2020 ◽  
Vol 4 (2) ◽  
pp. 329-335
Author(s):  
Rusydi Umar ◽  
Imam Riadi ◽  
Purwono

Most startup failures in Indonesia are caused by teams that are neither solid nor competent. Programmers are an integral part of a startup team. Social media can serve as a strategic tool for recruiting the best programmer candidates for a company, in the form of an automatic system that classifies the social media posts of prospective programmers. The classification results are expected to predict each candidate's performance pattern as either good or bad. To obtain an effective tool, the classification method with the best accuracy must be chosen, so a comparison of several methods is needed. This study compares the Support Vector Machine (SVM), Random Forest (RF) and Stochastic Gradient Descent (SGD) algorithms. With 10-fold cross-validation (k = 10), accuracy reaches 81.3% for SVM, 74.4% for RF, and 80.1% for SGD, so SVM is chosen as the model for classifying programmer performance from social media activity.
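The k = 10 cross-validation used to compare the methods can be sketched without any library. The code below splits the data into k contiguous folds and averages the per-fold accuracy. The `fit` and `predict` callables, the majority-class baseline, and the toy labels are hypothetical stand-ins for the SVM, RF and SGD models; shuffling and stratification are left out for brevity.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Mean accuracy over k folds for any model given as fit/predict callables."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores)


# Hypothetical baseline: always predict the most frequent training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


X = list(range(20))
y = [1] * 15 + [0] * 5
print(cross_val_accuracy(fit_majority, predict_majority, X, y, k=10))
```

Because the folds are contiguous and the toy labels are sorted, some folds contain only the minority class. That is why real comparisons usually shuffle or stratify the data first.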

