GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of an ANN is not trivial, as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms cannot cope efficiently with the intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software, implemented in C++, to considerably reduce the running time of the ANN pruning algorithms we implemented. In addition to evaluating the performance of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up ANN pruning by up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of ANN pruning by the DANNP tool, we used 16 datasets from different domains. In eight of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99%, while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to that of classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows users to identify the most discriminant features of the problem at hand.
To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and on-line accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.
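
As a rough illustration of what weight pruning means, the following sketch zeroes out the smallest-magnitude weights of a single layer. Magnitude-based pruning is used here only as a simple stand-in; it is not claimed to be one of the algorithms implemented in DANNP, and the function names and the example layer are hypothetical.

```python
def prune_by_magnitude(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.

    `weights` is a list of rows (one layer's weight matrix); `fraction` is the
    share of weights to remove. Hypothetical helper, not part of DANNP.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    # Collect (magnitude, row, column) for every weight and sort ascending.
    ranked = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    n_prune = int(len(ranked) * fraction)
    pruned = [list(row) for row in weights]  # leave the input untouched
    for _, i, j in ranked[:n_prune]:
        pruned[i][j] = 0.0
    return pruned


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Illustrative 2 x 3 layer; the values are made up.
layer = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.02]]
pruned = prune_by_magnitude(layer, 0.5)
print(pruned, sparsity(pruned))
```

An input feature whose outgoing weights are all pruned no longer influences the network, which is one way a pruned ANN can yield a feature selection as a byproduct.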

Semi-Supervised Learning

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch192 ◽

2011 ◽

pp. 1022-1027

Author(s):

Tobias Scheffer

Keyword(s):

Supervised Learning ◽

Supervised Classification ◽

Unlabeled Data ◽

Training Data ◽

Classification Algorithms ◽

Classification Problems

For many classification problems, unlabeled training data are inexpensive and readily available, whereas labeling training data imposes costs. Semi-supervised classification algorithms aim at utilizing information contained in unlabeled data in addition to the (few) labeled data.
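
One simple way to use unlabeled data is self-training, sketched below with a one-dimensional nearest-centroid classifier. This is only one of many semi-supervised strategies and is not taken from the encyclopedia entry; the function names, the confidence margin, and the data are assumptions made for illustration.

```python
def centroids(points, labels):
    """Mean of the points belonging to each label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Label of the nearest centroid."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=5, margin=1.0):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        cents = centroids(xs, ys)
        confident, rest = [], []
        for x in pool:
            dists = sorted(abs(x - c) for c in cents.values())
            # Confident when the nearest centroid clearly beats the runner-up.
            if len(dists) < 2 or dists[1] - dists[0] >= margin:
                confident.append(x)
            else:
                rest.append(x)
        if not confident:
            break
        for x in confident:
            xs.append(x)
            ys.append(predict(cents, x))
        pool = rest
    return centroids(xs, ys)


# Two labeled points, five unlabeled ones (made-up data).
model = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 8.0, 9.0, 5.2])
print(model, predict(model, 4.0))
```

The ambiguous point 5.2 never receives a pseudo-label, because it lies almost equally far from both centroids; only confidently classified points are allowed to influence the model.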

Learning Diatoms Classification from a Dry Test Slide by Holographic Microscopy

Sensors ◽

10.3390/s20216353 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6353

Author(s):

Pasquale Memmolo ◽

Pierluigi Carcagnì ◽

Vittorio Bianco ◽

Francesco Merola ◽

Andouglas Goncalves da Silva Junior ◽

...

Keyword(s):

Data Augmentation ◽

Imaging Modality ◽

Training Data ◽

Classification Problems ◽

Deep Convolutional Neural Networks ◽

Live Diatoms ◽

Freshwater Habitats ◽

Commercial Glass ◽

Holographic Microscopy

Diatoms are among the dominant phytoplankters in marine and freshwater habitats and are important biomarkers of water quality, making their identification and classification one of the current challenges for environmental monitoring. To date, the taxonomy of the species populating a water column is still conducted by marine biologists on the basis of their own experience. On the other hand, deep learning is recognized as the technique of choice for solving image classification problems. However, a large amount of training data is usually needed, thus requiring the synthetic enlargement of the dataset through data augmentation. In the case of microalgae, the large variety of species that populate marine environments makes it arduous to perform an exhaustive training that considers all the possible classes. However, commercial test slides containing one diatom element per class, fixed between two glasses, are available on the market. These are usually prepared by expert diatomists for taxonomy purposes, thus constituting libraries of the populations that can be found in oceans. Here we show that such test slides are very useful for training accurate deep Convolutional Neural Networks (CNNs). We demonstrate the successful classification of diatoms based on a suitable ensemble of CNNs and a fully augmented dataset, i.e., one created starting from a single image per class available from a commercial glass slide containing 50 fixed species in a dry setting. This approach avoids the time-consuming steps of water sampling and labeling by skilled marine biologists. To accomplish this goal, we exploit the holographic imaging modality, which gives access to quantitative phase-contrast maps and allows flexible a posteriori refocusing due to its intrinsic 3D imaging capability. The network model is then validated using holographic recordings of live diatoms imaged in water samples, i.e., in their natural wet environmental condition.

CharTeC-Net: An Efficient and Lightweight Character-Based Convolutional Network for Text Classification

Journal of Electrical and Computer Engineering ◽

10.1155/2020/9701427 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Aboubakar Nasser Samatin Njikam ◽

Huan Zhao

Keyword(s):

Text Classification ◽

Building Block ◽

Large Scale ◽

State Of The Art ◽

Building Blocks ◽

Training Data ◽

Superior Performance ◽

Classification Problems ◽

Computationally Efficient ◽

Convolutional Network

This paper introduces an extremely lightweight (around two hundred thousand parameters) and computationally efficient CNN architecture, named CharTeC-Net (Character-based Text Classification Network), for character-based text classification problems. This new architecture is composed of four building blocks for feature extraction. Each of these building blocks, except the last one, uses 1 × 1 pointwise convolutional layers to add more nonlinearity to the network and to increase the dimensions within each building block. In addition, shortcut connections are used in each building block to facilitate the flow of gradients over the network, but more importantly to ensure that the original signal present in the training data is shared across each building block. Experiments on eight standard large-scale text classification and sentiment analysis datasets demonstrate CharTeC-Net’s superior performance over baseline methods and competitive accuracy compared with state-of-the-art methods, although CharTeC-Net has only between 181,427 and 225,323 parameters and weighs less than 1 megabyte.

An Algorithm of Edge Detection Based on FSVM

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.321-324.1046 ◽

2013 ◽

Vol 321-324 ◽

pp. 1046-1050

Author(s):

Ai Ping Cai

Keyword(s):

Target Identification ◽

Main Idea ◽

Detection Algorithm ◽

Training Data ◽

Classification Model ◽

Detection Methods ◽

Support Vector ◽

Classification Problems ◽

Tracking Motion ◽

Image Edge

The support vector machine (SVM) has been shown to be an efficient approach for a variety of classification problems. It has also been widely used in target identification and tracking, motion analysis, and image segmentation. Traditional edge detection methods mostly suffer from pseudo-edges and poor noise robustness, so a more effective method is needed. In this paper, we propose a new detection algorithm based on the fuzzy SVM (FSVM). The main idea is to assign every training sample a degree of membership and to increase the penalty on misclassified samples. The FSVM classification model is then trained and tested, and finally the edges of the image are extracted using the trained model. Experimental results show that the new algorithm detects clear image edges and has good noise robustness.
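
The abstract does not state how the degrees of membership are computed. A minimal sketch of one common heuristic is given below: each sample's membership decreases with its distance from the centre of its class, so that outliers and noisy samples contribute a smaller penalty. The function name, the parameter `delta`, the constant `C`, and the data are assumptions, not details from the paper.

```python
import math


def fuzzy_memberships(points, delta=1e-6):
    """Membership in (0, 1] for each point of ONE class.

    s_i = 1 - d_i / (r + delta), where d_i is the distance of point i to the
    class centre and r is the largest such distance. `delta` keeps s_i > 0.
    """
    dim = len(points[0])
    centre = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    dists = [math.dist(p, centre) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


# Three clustered points and one outlier (made-up data).
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (1.0, 10.0)]
memberships = fuzzy_memberships(points)

# In a fuzzy SVM the slack of sample i is penalised by C * s_i rather than C,
# so the outlier barely influences the decision boundary.
C = 10.0
penalties = [C * s for s in memberships]
print(memberships, penalties)
```

Per-sample penalties of this kind can be passed to any SVM solver that accepts sample weights.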

A New Opinion Mining Method based on Fuzzy Classifier and Particle Swarm Optimization (PSO) Algorithm

Cybernetics and Information Technologies ◽

10.2478/cait-2018-0026 ◽

2018 ◽

Vol 18 (2) ◽

pp. 36-50

Author(s):

Samira Bordbar ◽

Pirooz Shamsinejad

Keyword(s):

Fuzzy Logic ◽

Particle Swarm Optimization ◽

Opinion Mining ◽

Particle Swarm ◽

Pso Algorithm ◽

Fuzzy Rules ◽

Training Data ◽

Fuzzy Classifier ◽

Swarm Optimization ◽

Main Challenge

Opinion Mining, or Sentiment Analysis, is the task of extracting people's final opinion about something from their unstructured sentiments. The Opinion Mining process is as follows: first, the product features that are most important to a user are extracted from his or her comments. Then, sentiments are classified according to their emotional implications. In this paper we propose an opinion classification method based on Fuzzy Logic. Up to now, a few methods have taken advantage of fuzzy logic in opinion classification, and all of them have imported fuzzy rules into the system as background knowledge. The main challenge, however, is finding the fuzzy rules. Our contribution is to automatically extract fuzzy rules and their parameters from training data, using the Particle Swarm Optimization (PSO) algorithm. For better results we have also devised a mutation-based PSO. All proposed methods have been implemented and tested on relevant data. Results confirm that our method can reach better accuracy than current state-of-the-art methods in this domain.
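
For readers unfamiliar with PSO, a minimal sketch of the plain algorithm follows. It is not the mutation-based variant proposed in the paper, and the objective is a made-up stand-in: it pretends that the error of a fuzzy rule is smallest when its membership function has centre 3.0 and width 1.5. All names, bounds, and hyperparameters are assumptions.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over the box `bounds` with plain PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def rule_error(params):
    """Hypothetical error of a fuzzy rule with the given (centre, width)."""
    centre, width = params
    return (centre - 3.0) ** 2 + (width - 1.5) ** 2


best, err = pso(rule_error, bounds=[(0.0, 10.0), (0.0, 5.0)])
print(best, err)
```

In a real rule-extraction setting the objective would instead measure the classification error of the fuzzy classifier on the training data.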

Selecting fuzzy rules by genetic algorithm for classification problems

[Proceedings 1993] Second IEEE International Conference on Fuzzy Systems ◽

10.1109/fuzzy.1993.327358 ◽

2002 ◽

Cited By ~ 28

Author(s):

H. Ishibuchi ◽

K. Nozaki ◽

N. Yamamoto

Keyword(s):

Genetic Algorithm ◽

Fuzzy Rules ◽

Classification Problems

Few-Shot Learning for Post-Earthquake Urban Damage Detection

Remote Sensing ◽

10.3390/rs14010040 ◽

2021 ◽

Vol 14 (1) ◽

pp. 40

Author(s):

Eftychia Koukouraki ◽

Leonardo Vanneschi ◽

Marco Painho

Keyword(s):

Damage Assessment ◽

Binary Classification ◽

Training Data ◽

Classification Problems ◽

Deep Convolutional Neural Networks ◽

Damage Classification ◽

Emergency Relief ◽

Cost Sensitive Learning ◽

Urban Structures ◽

Address Data

Among natural disasters, earthquakes are recorded to have the highest rates of human loss in the past 20 years. Their unexpected nature has severe consequences on both human lives and material infrastructure, demanding urgent action to be taken. For effective emergency relief, it is necessary to gain awareness about the level of damage in the affected areas. The use of remotely sensed imagery is popular in damage assessment applications; however, it requires a considerable amount of labeled data, which are not always easy to obtain. Taking into consideration the recent developments in the fields of Machine Learning and Computer Vision, this study investigates and employs several Few-Shot Learning (FSL) strategies in order to address data insufficiency and imbalance in post-earthquake urban damage classification. While small datasets have been tested against binary classification problems, which usually divide the urban structures into collapsed and non-collapsed, the potential of limited training data in multi-class classification has not been fully explored. To tackle this gap, four models were created, following different data balancing methods, namely cost-sensitive learning, oversampling, undersampling and Prototypical Networks. After a quantitative comparison among them, the best performing model was found to be the one based on Prototypical Networks, and it was used for the creation of damage assessment maps. The contribution of this work is twofold: we show that oversampling is the most suitable data balancing method for training Deep Convolutional Neural Networks (CNN) when compared to cost-sensitive learning and undersampling, and we demonstrate the appropriateness of Prototypical Networks in the damage classification context.
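
The classification step of a Prototypical Network can be sketched as follows: each class is represented by the mean (prototype) of its few support embeddings, and a query is assigned to the class with the nearest prototype. The sketch assumes that embeddings have already been computed by some network; the embedding values and function names are made up, and only the class labels echo the collapsed/non-collapsed split mentioned in the abstract.

```python
def class_prototypes(support):
    """Mean embedding per class; `support` maps label -> list of vectors."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[k] for v in vectors) / len(vectors) for k in range(dim)
        ]
    return protos


def classify(protos, query):
    """Label of the prototype closest to `query` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sq_dist(protos[label], query))


# Two support embeddings per class (hypothetical 2-D embeddings).
support = {
    "collapsed": [[0.9, 0.1], [0.7, 0.3]],
    "non_collapsed": [[0.1, 0.9], [0.3, 0.7]],
}
protos = class_prototypes(support)
print(classify(protos, [0.6, 0.4]))
```

Because a prototype is just an average, a new class can be added from a handful of examples without retraining the classifier head, which is what makes the approach attractive when labeled data are scarce.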

Assessing the Impact of Expert Labelling of Training Data on the Quality of Automatic Classification of Lithological Groups Using Artificial Neural Networks

Applied Computer Systems ◽

10.2478/acss-2020-0016 ◽

2020 ◽

Vol 25 (2) ◽

pp. 145-152

Author(s):

Yan Kuchin ◽

Ravil Mukhamediev ◽

Kirill Yakunin ◽

Janis Grundspenkis ◽

Adilkhan Symagulov

Keyword(s):

Neural Network ◽

Automatic Classification ◽

Training Data ◽

Classification Problems ◽

Expert Opinions ◽

Input Parameters ◽

Artificial Neural ◽

Artificial Neural Network Ann ◽

The Impact

Machine learning (ML) methods are nowadays widely used to automate geophysical studies. Some ML algorithms are used to solve lithological classification problems during the uranium mining process. One of the key aspects of using classical ML methods is selecting data features and estimating their influence on the classification. This paper presents a quantitative assessment of the impact of expert opinions on the classification process. In other words, we prepared the data, identified the experts, and performed a series of experiments with and without supplying the expert identifier as an input to the automatic classifier during training and testing. A feedforward artificial neural network (ANN) was used as the classifier. The results of the experiments show that the ANN's "knowledge" of which expert interpreted the data improves the quality of the automatic classification in terms of accuracy (by 5%) and recall (by 20%). However, because the input parameters of the model may depend on each other, the SHapley Additive exPlanations (SHAP) method was used to further assess the impact of the expert identifier. SHAP made it possible to assess the degree of influence of each parameter. It revealed that the expert ID is at least two times more influential than any of the other input parameters of the neural network. This circumstance imposes significant restrictions on the application of ANNs to the task of lithological classification at uranium deposits.

An Indoor Scene Classification Method for Service Robot Based on CNN Feature

Journal of Robotics ◽

10.1155/2019/8591035 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 5

Author(s):

Shaopeng Liu ◽

Guohui Tian

Keyword(s):

Feature Matching ◽

Training Data ◽

Fine Tuning ◽

Classification Method ◽

Service Robot ◽

Test Accuracy ◽

Scene Classification ◽

Classification Problems ◽

Matching Algorithm ◽

Indoor Scene

Indoor scene classification plays a vital part in the environment cognition of service robots. With the development of deep learning, fine-tuning a CNN (Convolutional Neural Network) on target datasets has become a popular way to solve classification problems. However, this method cannot obtain satisfactory indoor scene classification results because of overfitting when scene training datasets are insufficient. To solve this problem, this paper proposes an indoor scene classification method that uses CNN features of scene images to generate scene category features and classifies scenes with a novel feature matching algorithm. The novel feature matching algorithm can further improve the speed of scene classification. In addition, our method eliminates overfitting even when the training data is limited. The presented method was evaluated on two benchmark scene datasets, the Scene 15 dataset and the MIT 67 dataset, achieving 96.49% and 81.69% accuracy, respectively. The experimental results showed that our method was superior to other scene classification methods in terms of accuracy, speed, and robustness. To further evaluate our method, test experiments were conducted on unknown scene images from the SUN 397 dataset, and the models based on the different training datasets obtained 94.34% and 79.80% test accuracy, respectively, which shows that the proposed method performs well in indoor scene classification.
