GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS

2002 ◽  
Vol 33 (7) ◽  
pp. 723-748 ◽  
Author(s):  
SHYI-MING CHEN ◽  
CHENG-HSUAN KAO ◽  
CHENG-HAO YU
2017 ◽  
Vol 3 ◽  
pp. e137 ◽  
Author(s):  
Mona Alshahrani ◽  
Othman Soufan ◽  
Arturo Magana-Mora ◽  
Vladimir B. Bajic

Background Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of the ANNs is not trivial as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to efficiently cope with intricate ANN structures required for complex classification problems. Methods We developed DANNP, a web-based tool, that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software implemented in C++ to considerably enhance the running time of the ANN pruning algorithms we implemented. In addition to the performance evaluation of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up the ANN pruning up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of the ANN pruning by DANNP tool, we used 16 datasets from different domains. In eight out of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99%, while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to those obtained by the classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows the users to identify the most discriminant features of the problem at hand. To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and on-line accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.


Author(s):  
Tobias Scheffer

For many classification problems, unlabeled training data are inexpensive and readily available, whereas labeling training data imposes costs. Semi-supervised classification algorithms aim at utilizing information contained in unlabeled data in addition to the (few) labeled data.


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6353
Author(s):  
Pasquale Memmolo ◽  
Pierluigi Carcagnì ◽  
Vittorio Bianco ◽  
Francesco Merola ◽  
Andouglas Goncalves da Silva Junior ◽  
...  

Diatoms are among the dominant phytoplankters in marine and freshwater habitats, and important biomarkers of water quality, making their identification and classification one of the current challenges for environmental monitoring. To date, taxonomy of the species populating a water column is still conducted by marine biologists on the basis of their own experience. On the other hand, deep learning is recognized as the elective technique for solving image classification problems. However, a large amount of training data is usually needed, thus requiring the synthetic enlargement of the dataset through data augmentation. In the case of microalgae, the large variety of species that populate the marine environments makes it arduous to perform an exhaustive training that considers all the possible classes. However, commercial test slides containing one diatom element per class fixed in between two glasses are available on the market. These are usually prepared by expert diatomists for taxonomy purposes, thus constituting libraries of the populations that can be found in oceans. Here we show that such test slides are very useful for training accurate deep Convolutional Neural Networks (CNNs). We demonstrate the successful classification of diatoms based on a proper CNNs ensemble and a fully augmented dataset, i.e., creation starting from one single image per class available from a commercial glass slide containing 50 fixed species in a dry setting. This approach avoids the time-consuming steps of water sampling and labeling by skilled marine biologists. To accomplish this goal, we exploit the holographic imaging modality, which permits the accessing of a quantitative phase-contrast maps and a posteriori flexible refocusing due to its intrinsic 3D imaging capability. The network model is then validated by using holographic recordings of live diatoms imaged in water samples i.e., in their natural wet environmental condition.


2020 ◽  
Vol 2020 ◽  
pp. 1-7 ◽  
Author(s):  
Aboubakar Nasser Samatin Njikam ◽  
Huan Zhao

This paper introduces an extremely lightweight (with just over around two hundred thousand parameters) and computationally efficient CNN architecture, named CharTeC-Net (Character-based Text Classification Network), for character-based text classification problems. This new architecture is composed of four building blocks for feature extraction. Each of these building blocks, except the last one, uses 1 × 1 pointwise convolutional layers to add more nonlinearity to the network and to increase the dimensions within each building block. In addition, shortcut connections are used in each building block to facilitate the flow of gradients over the network, but more importantly to ensure that the original signal present in the training data is shared across each building block. Experiments on eight standard large-scale text classification and sentiment analysis datasets demonstrate CharTeC-Net’s superior performance over baseline methods and yields competitive accuracy compared with state-of-the-art methods, although CharTeC-Net has only between 181,427 and 225,323 parameters and weighs less than 1 megabyte.


2013 ◽  
Vol 321-324 ◽  
pp. 1046-1050
Author(s):  
Ai Ping Cai

The support vector machine (SVM) has been shown to be an efficient approach for a variety of classification problems. It has also been widely used in target identification and tracking, motion analysis, image segmentation technology. Traditional detection methods mostly exist pseudo-edge and poor anti-noise capability. Under these circumstances, developing an efficient method is necessary. In this paper, we propose a new detection algorithm based on FSVM, the main idea is to train classified sample and give all training data a degree of membership, increase punishment to the wrong sub-sample. Then training and testing the FSVM classification model. Finally, extract edge of the image by using FSVM classification model. Experimental results show that the new algorithm can detect a clear image edge and have a good anti-noise nature.


2018 ◽  
Vol 18 (2) ◽  
pp. 36-50
Author(s):  
Samira Bordbar ◽  
Pirooz Shamsinejad

Abstract Opinion Mining or Sentiment Analysis is the task of extracting people final opinion about something through their unstructured sentiments. The Opinion Mining process is as follows: first, product features which are most important to a user are extracted from his/her comments. Then, sentiments will be emotionally classified using their emotional implications. In this paper we propose an opinion classification method based on Fuzzy Logic. Up to now, a few methods have taken advantage of fuzzy logic in opinion classification and all of them have imported fuzzy rules into system as background knowledge. But the main challenge here is finding the fuzzy rules. Our contribution is to automatically extract fuzzy rules and their parameters from training data. Here we have used the Particle Swarm Optimization (PSO) algorithm to extract fuzzy rules from training data. Also, for better results we have devised a mutation-based PSO. All proposed methods have been implemented and tested on relevant data. Results confirm that our method can reach better accuracy than current state of the art methods in this domain.


2021 ◽  
Vol 14 (1) ◽  
pp. 40
Author(s):  
Eftychia Koukouraki ◽  
Leonardo Vanneschi ◽  
Marco Painho

Among natural disasters, earthquakes are recorded to have the highest rates of human loss in the past 20 years. Their unexpected nature has severe consequences on both human lives and material infrastructure, demanding urgent action to be taken. For effective emergency relief, it is necessary to gain awareness about the level of damage in the affected areas. The use of remotely sensed imagery is popular in damage assessment applications; however, it requires a considerable amount of labeled data, which are not always easy to obtain. Taking into consideration the recent developments in the fields of Machine Learning and Computer Vision, this study investigates and employs several Few-Shot Learning (FSL) strategies in order to address data insufficiency and imbalance in post-earthquake urban damage classification. While small datasets have been tested against binary classification problems, which usually divide the urban structures into collapsed and non-collapsed, the potential of limited training data in multi-class classification has not been fully explored. To tackle this gap, four models were created, following different data balancing methods, namely cost-sensitive learning, oversampling, undersampling and Prototypical Networks. After a quantitative comparison among them, the best performing model was found to be the one based on Prototypical Networks, and it was used for the creation of damage assessment maps. The contribution of this work is twofold: we show that oversampling is the most suitable data balancing method for training Deep Convolutional Neural Networks (CNN) when compared to cost-sensitive learning and undersampling, and we demonstrate the appropriateness of Prototypical Networks in the damage classification context.


2020 ◽  
Vol 25 (2) ◽  
pp. 145-152
Author(s):  
Yan Kuchin ◽  
Ravil Mukhamediev ◽  
Kirill Yakunin ◽  
Janis Grundspenkis ◽  
Adilkhan Symagulov

AbstractMachine learning (ML) methods are nowadays widely used to automate geophysical study. Some of ML algorithms are used to solve lithological classification problems during uranium mining process. One of the key aspects of using classical ML methods is causing data features and estimating their influence on the classification. This paper presents a quantitative assessment of the impact of expert opinions on the classification process. In other words, we have prepared the data, identified the experts and performed a series of experiments with and without taking into account the fact that the expert identifier is supplied to the input of the automatic classifier during training and testing. Feedforward artificial neural network (ANN) has been used as a classifier. The results of the experiments show that the “knowledge” of the ANN of which expert interpreted the data improves the quality of the automatic classification in terms of accuracy (by 5 %) and recall (by 20 %). However, due to the fact that the input parameters of the model may depend on each other, the SHapley Additive exPlanations (SHAP) method has been used to further assess the impact of expert identifier. SHAP has allowed assessing the degree of parameter influence. It has revealed that the expert ID is at least two times more influential than any of the other input parameters of the neural network. This circumstance imposes significant restrictions on the application of ANNs to solve the task of lithological classification at the uranium deposits.


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Shaopeng Liu ◽  
Guohui Tian

Indoor scene classification plays a vital part in environment cognition of service robot. With the development of deep learning, fine-tuning CNN (Convolutional Neural Network) on target datasets has become a popular way to solve classification problems. However, this method cannot obtain satisfying indoor scene classification results because of overfitting when scene training datasets are insufficient. To solve this problem, an indoor scene classification method is proposed in this paper, which utilizes CNN feature of scene images to generate scene category features to classify scenes by a novel feature matching algorithm. The novel feature matching algorithm can further improve the speed of scene classification. In addition, overfitting is eliminated by our method even though the training data is limited. The presented method was evaluated on two benchmark scene datasets, Scene 15 dataset and MIT 67 dataset, acquiring 96.49% and 81.69% accuracy, respectively. The experiment results showed that our method was superior to other scene classification methods in terms of accuracy, speed, and robustness. To further evaluate our method, test experiments on unknown scene images from SUN 397 dataset had been done, and the models based on different training datasets obtained 94.34% and 79.80% test accuracy severally, which proved that the proposed method owned good performance in indoor scene classification.


Sign in / Sign up

Export Citation Format

Share Document