Comparison of Tree Based Supervised Classification Methods with Mammogram Data Set

Abstract: Predicting defect-prone modules when the previous defect labels of modules are limited is a challenging problem encountered in the software industry. Supervised classification approaches cannot build high-performance prediction models with few defect data, leading to the need for new methods, techniques, and tools. One solution is to combine labeled data points with unlabeled data points during the learning phase. Semi-supervised classification methods use not only labeled data points but also unlabeled ones to improve the generalization capability. In this study, we evaluated four semi-supervised classification methods for semi-supervised defect prediction. Low-density separation (LDS), support vector machine (SVM), expectation-maximization (EM-SEMI), and class mass normalization (CMN) methods have been investigated on the NASA data sets CM1, KC1, KC2, and PC1. Experimental results showed that the SVM and LDS algorithms outperform the CMN and EM-SEMI algorithms. In addition, the LDS algorithm performs much better than SVM when the data set is large. In this study, the LDS-based prediction approach is suggested for software defect prediction when there are limited fault data.

Comparison of Supervised Classification Methods for Protein Profiling in Cancer Diagnosis

Cancer Informatics ◽ 10.1177/117693510700300023 ◽ 2007 ◽ Vol 3 ◽ pp. 117693510700300 ◽ Cited By ~ 6

Author(s): Nadège Dossat ◽ Alain Mangé ◽ Jérôme Solassol ◽ William Jacot ◽ Ludovic Lhermitte ◽ ...

Keyword(s): Mass Spectrometry ◽ Discriminant Analysis ◽ Linear Discriminant Analysis ◽ Supervised Classification ◽ Protein Profiling ◽ Clinical Proteomics ◽ High Dimensional ◽ Classification Methods ◽ Linear Discriminant ◽ Supervised Classification Methods

A key challenge in clinical proteomics of cancer is the identification of biomarkers that could allow detection, diagnosis and prognosis of the disease. Recent advances in mass spectrometry and proteomic instrumentation offer a unique chance to rapidly identify these markers. These advances pose considerable challenges, similar to those created by microarray-based investigation, for the discovery of patterns of markers from high-dimensional data that are specific to each pathologic state (e.g. normal vs cancer). We propose a three-step strategy to select important markers from high-dimensional mass spectrometry data obtained with surface enhanced laser desorption/ionization (SELDI) technology. The first two steps select the most discriminating biomarkers and construct different classifiers. Finally, we compare and validate their performance and robustness using different supervised classification methods such as Support Vector Machine, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Neural Networks, Classification Trees and Boosting Trees. We show that the proposed method is suitable for analysing high-throughput proteomics data and that the combination of logistic regression and Linear Discriminant Analysis outperforms the other methods tested.
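The abstract does not say how performance and robustness were validated. As an illustration only, the sketch below assumes k-fold cross-validation, a common way to compare classifiers, and uses a simple nearest-centroid classifier as a stand-in for the methods named above. The function names, the synthetic two-feature data, and the class labels are all hypothetical and are not taken from the paper.

```python
import random
from statistics import mean

def nearest_centroid_fit(X, y):
    """Return one mean vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds of a shuffled index list."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return mean(scores)

# Synthetic, well-separated two-class data (hypothetical, not SELDI spectra).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + \
    [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]
y = ["normal"] * 50 + ["cancer"] * 50

cv_accuracy = k_fold_accuracy(X, y, k=5)
print(f"5-fold cross-validated accuracy: {cv_accuracy:.2f}")
```

Running the same `k_fold_accuracy` loop with a different fit/predict pair on identical folds would give scores that can be compared directly across methods.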

Supervised Classification Methods for Fake News Identification

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-61534-5_40 ◽ 2020 ◽ pp. 445-454

Author(s): Thanh Cong Truong ◽ Quoc Bao Diep ◽ Ivan Zelinka ◽ Roman Senkerik

Keyword(s): Supervised Classification ◽ Classification Methods ◽ Fake News ◽ Supervised Classification Methods

Supervised classification methods applied to airborne hyperspectral images: comparative study using mutual information

Procedia Computer Science ◽ 10.1016/j.procs.2019.01.013 ◽ 2019 ◽ Vol 148 ◽ pp. 97-106

Author(s): Hasna Nhaila ◽ Asma Elmaizi ◽ Elkebir Sarhrouni ◽ Ahmed Hammouch

Keyword(s): Mutual Information ◽ Comparative Study ◽ Supervised Classification ◽ Hyperspectral Images ◽ Classification Methods ◽ Supervised Classification Methods

Accuracy Analysis Comparison of Supervised Classification Methods for Anomaly Detection on Levees Using SAR Imagery

Electronics ◽ 10.3390/electronics6040083 ◽ 2017 ◽ Vol 6 (4) ◽ pp. 83

Author(s): Ramakalavathi Marapareddy ◽ James Aanstoos ◽ Nicolas Younan

Keyword(s): Anomaly Detection ◽ Supervised Classification ◽ Accuracy Analysis ◽ Classification Methods ◽ SAR Imagery ◽ Supervised Classification Methods

A comparison of supervised classification methods for a statistical set of features: Application: Amazigh OCR

2015 Intelligent Systems and Computer Vision (ISCV) ◽ 10.1109/isacv.2015.7106171 ◽ 2015 ◽ Cited By ~ 6

Author(s): Nabil Aharrane ◽ Karim El Moutaouakil ◽ Khalid Satori

Keyword(s): Supervised Classification ◽ Classification Methods ◽ Supervised Classification Methods

A Comparison of Supervised Classification Methods for the Prediction of Substrate Type Using Multibeam Acoustic and Legacy Grain-Size Data

PLoS ONE ◽ 10.1371/journal.pone.0093950 ◽ 2014 ◽ Vol 9 (4) ◽ pp. e93950 ◽ Cited By ~ 73

Author(s): David Stephens ◽ Markus Diesing

Keyword(s): Grain Size ◽ Supervised Classification ◽ Classification Methods ◽ Substrate Type ◽ Supervised Classification Methods ◽ Size Data

Supervised Classification Methods in Condition Monitoring of Rolling Element Bearings

Applied Condition Monitoring - Advances in Condition Monitoring of Machinery in Non-Stationary Operations ◽ 10.1007/978-3-319-61927-9_13 ◽ 2017 ◽ pp. 133-145

Author(s): Paweł Różak ◽ Jakub Zieliński ◽ Piotr Czop ◽ Adam Jabłoński ◽ Tomasz Barszcz ◽ ...

Keyword(s): Condition Monitoring ◽ Supervised Classification ◽ Rolling Element Bearings ◽ Classification Methods ◽ Rolling Element ◽ Supervised Classification Methods

Four fuzzy supervised classification methods for discriminating classes of non-convex shape

Fuzzy Sets and Systems ◽ 10.1016/s0165-0114(03)00265-3 ◽ 2004 ◽ Vol 141 (2) ◽ pp. 219-240 ◽ Cited By ~ 7

Author(s): A. Devillez

Keyword(s): Supervised Classification ◽ Classification Methods ◽ Convex Shape ◽ Supervised Classification Methods

Automatic landslide detection using the Random Forest classification - the importance of the train-test split ratio

10.5194/egusphere-egu21-12046 ◽ 2021

Author(s): Kamila Pawluszek-Filipiak ◽ Andrzej Borkowski

Keyword(s): Random Forest ◽ Supervised Classification ◽ Research Question ◽ Remote Sensing Data ◽ Classification Methods ◽ Testing Area ◽ Split Ratio ◽ Training Samples ◽ Landslide Mapping ◽ Supervised Classification Methods

Landslide identification is the fundamental step in reducing the potentially damaging effects of landslide activity. A variety of techniques and approaches has been developed to detect landslides. Conventional landslide identification is a complex and laborious task because of the large amount of field work and material that has to be investigated. Additionally, conventional geomorphological mapping mainly provides a subjective representation of landscape complexities at different scales. In certain conditions, such as densely vegetated terrain, conventional landslide mapping is ineffective or even impossible.

Therefore, innovative methods that reduce subjectivity, time, and effort have increasingly become the subject of interest in landslide research. These methods mainly focus on semi-automated or automatic landslide mapping and include the analysis of remote sensing data, such as optical images and Digital Elevation Models (DEMs) derived from Light Detection and Ranging (LiDAR). Among them, the pixel-based approach (PBA) and object-based image analysis (OBIA) can be distinguished, for which supervised classification methods are usually utilized.

The accuracy of supervised classification methods depends strongly on the quality and quantity of the training samples. Supervised classification methods require the collection of training as well as testing data to generate the classification results and assess their accuracy. It is a challenging task, especially in forested areas, to capture ground truth of good quality to train the classifier and to identify landslides. Considering this, we decided to investigate the following research question: What is the appropriate training–testing dataset split ratio in supervised classification to detect landslides in a testing area based on DEMs? Since the PBA and OBIA approaches are nowadays widely utilized, we investigated this issue for both methods. The Random Forest classifier was implemented for both methods. The experiments were performed in Poland in the Outer Carpathians.

Accuracy measures calculated for the region growing validation indicated that the training area should be similar in size to the testing area in DEM-based automatic landslide detection. Additionally, we found that the OBIA approach performs slightly better than PBA when the quantity of training samples is lower. Besides this, we also attempted to increase the detection performance and to generate a final landslide inventory. For this purpose, the OBIA and PBA results were intersected, median filtering was applied, and small elongated objects were removed. We achieved an Overall Accuracy of 80% and an F1 Score of 0.50.
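The two reported metrics can differ considerably when landslide pixels are a minority class. The sketch below uses the standard definitions of overall accuracy and F1 score; the confusion-matrix counts are hypothetical, chosen only because they reproduce the reported pair of values, and are not taken from the study.

```python
def overall_accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts (assumption, not data from the study):
# 20 true landslide samples, 80 non-landslide samples.
tp, fp, fn, tn = 10, 10, 10, 70

oa = overall_accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"Overall Accuracy = {oa:.2f}, F1 = {f1:.2f}")
```

With these counts the many correctly classified non-landslide samples lift overall accuracy, while F1 ignores true negatives and reflects only how well the landslide class itself is detected.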
