A new data classification improvement approach based on kernel clustering

2021 ◽  
Vol 2082 (1) ◽  
pp. 012021
Author(s):  
Bingsen Guo

Data classification is one of the most critical issues in data mining, with a large number of real-life applications. In many practical classification problems, the real dataset contains various forms of anomalies; for example, outliers in the training set can confuse the classifier and reduce its ability to learn from the data. In this paper, we propose a new data classification improvement approach based on kernel clustering. The proposed method improves classification performance by optimizing the training set. We first apply an existing kernel clustering method to cluster the training set and optimize it based on the similarity between the training samples in each class and the corresponding class center. A standard classifier is then trained in the kernel space on the optimized, more reliable training set and used to classify each query sample. Extensive performance analysis shows that the proposed method achieves high performance, thus improving the classifier's effectiveness.
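The training-set optimization step described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the RBF kernel, the `tau` similarity threshold, and the toy data are all assumptions. The key idea is that a sample's similarity to its class centre in the kernel feature space equals the mean of its kernel values with its class mates.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise RBF kernel values between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def filter_training_set(X, y, tau=0.5, gamma=0.5):
    """Keep, per class, the samples that are sufficiently similar (in
    kernel space) to their class centre; drop likely outliers.
    <phi(x_i), mean_j phi(x_j)> = mean_j K(x_i, x_j)."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        K = rbf_kernel(X[idx], X[idx], gamma)
        sim = K.mean(axis=1)               # similarity to the class centre
        keep.extend(idx[sim >= tau * sim.max()])
    return np.sort(np.array(keep))

# Toy data: class 0 clusters near the origin but contains one far outlier.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
y = np.array([0, 0, 0, 0, 1, 1, 1])
kept = filter_training_set(X, y)           # the outlier at index 3 is dropped
```

The filtered index set would then feed a standard kernel-space classifier in place of the raw training set.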

Author(s):  
Shuhuan Zhao

Face recognition (FR) is a hotspot in pattern recognition and image processing because of its wide applications in real life. One of the most challenging problems in FR is single sample face recognition (SSFR). In this paper, we propose a novel algorithm based on nonnegative sparse representation, collaborative representation, and probabilistic graph estimation to address SSFR. The proposed algorithm is named Nonnegative Sparse Probabilistic Estimation (NNSPE). To extract variation information from the generic training set, we first select some neighbor samples from the generic training set for each sample in the gallery set, so that the generic training set can be partitioned into reference subsets. To make the reconstruction more meaningful, the proposed method adopts nonnegative sparse representation to reconstruct training samples, and according to the reconstruction coefficients, NNSPE computes probabilistic label estimates for the samples of the generic training set. Then, for a given test sample, collaborative representation (CR) is used to acquire an adaptive variation subset. Finally, NNSPE classifies the test sample using the adaptive variation subset and the probabilistic label estimates. Experiments on the AR and PIE databases verify the effectiveness of the proposed method in both recognition rate and time cost.
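The nonnegative sparse reconstruction step can be sketched with a simple projected-gradient solver. This is a minimal stand-in for NNSPE's solver, not the authors' algorithm; under the nonnegativity constraint the l1 penalty reduces to a linear term, which makes the projection trivial. The dictionary and signal below are synthetic.

```python
import numpy as np

def nonneg_sparse_code(D, y, lam=0.05, lr=0.1, steps=500):
    """Projected gradient for  min_x 0.5||y - Dx||^2 + lam * sum(x),  x >= 0.
    Each step takes a gradient step on the smooth objective and then
    projects back onto the nonnegative orthant."""
    x = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ x - y) + lam
        x = np.maximum(0.0, x - lr * grad)
    return x

rng = np.random.default_rng(0)
D = rng.normal(size=(10, 5))
D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary atoms
y = 0.7 * D[:, 1] + 0.3 * D[:, 3]     # true code is nonnegative and 2-sparse
x = nonneg_sparse_code(D, y)
```

The recovered coefficients concentrate on atoms 1 and 3; in NNSPE these coefficients would then weight the probabilistic label estimation.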


Author(s):  
Haoliang Yuan

Sparse representation classification (SRC) has been successfully applied to hyperspectral image (HSI) classification. A test sample (pixel) can be linearly represented by a few samples of the training set, and the class label of the test sample is then decided by the reconstruction residuals. To incorporate spatial information and improve classification performance, a patch matrix, which includes a spatial neighborhood set, is used to replace the original pixel. Generally, the objective function for the reconstruction residuals uses the Frobenius norm, which treats all elements of the residuals in the same way. However, when a patch lies on an image edge, the samples in the patch may belong to different classes, and the Frobenius norm is no longer suitable for computing the reconstruction residuals. In this paper, we propose a robust patch-based sparse representation classification (RPSRC) based on the [Formula: see text]-norm. An iterative algorithm is given to compute RPSRC efficiently. Extensive experimental results on two real-life HSI datasets demonstrate the effectiveness of RPSRC.
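The residual-based decision rule at the heart of SRC can be sketched as follows. Plain per-class least squares stands in for the sparse (and, in RPSRC, robust) solver here, and the toy dictionary is an assumption; only the "smallest class-wise reconstruction residual wins" rule is taken from the abstract.

```python
import numpy as np

def residual_label(y, D, labels):
    """Represent y using each class's training columns in turn and
    return the class with the smallest reconstruction residual."""
    best_c, best_r = None, np.inf
    for c in np.unique(labels):
        Dc = D[:, labels == c]
        coef, *_ = np.linalg.lstsq(Dc, y, rcond=None)
        r = np.linalg.norm(y - Dc @ coef)
        if r < best_r:
            best_c, best_r = c, r
    return best_c

# Toy dictionary: class 0 atoms span the z=0 plane, class 1 atoms the x=0 plane.
D = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.1]])
labels = np.array([0, 0, 1, 1])
pred = residual_label(np.array([1.0, 0.05, 0.0]), D, labels)  # near class 0
```

In the patch-based setting, `y` becomes a patch matrix and the residual norm is what RPSRC replaces with a more robust one.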


2020 ◽  
Vol 12 (3) ◽  
pp. 504 ◽  
Author(s):  
Tengfei Su ◽  
Shengwei Zhang ◽  
Tingxi Liu

In remote sensing, active learning (AL) is considered an effective solution to the problem of producing sufficient classification accuracy with a limited number of training samples. Though this field has been extensively studied, most work follows the pixel-based paradigm; in object-based image analysis (OBIA), AL has been comparatively less studied. This paper proposes a new AL method for selecting object-based samples. The proposed method solves the problem of identifying the most informative segment-samples so that classification performance can be optimized, and its advantage is that informativeness can be estimated from various object-based features. The new approach has three key steps. First, a series of one-against-one binary random forest (RF) classifiers is initialized using a small initial training set; this strategy allows the classification uncertainty to be estimated in great detail. Second, each tested sample is processed by the binary RFs, and a classification uncertainty value reflecting informativeness is derived. Third, the samples with high uncertainty values are selected, labeled by a supervisor, and added to the training set, on which the binary RFs are re-trained for the next iteration. The whole procedure is iterated until a stopping criterion is met. To validate the proposed method, three pairs of multi-spectral remote sensing images with different landscape patterns were used. The results indicate that the proposed method outperforms other state-of-the-art AL methods: the highest overall accuracies for the three datasets were all obtained with the proposed AL method, at 88.32%, 85.77%, and 93.12% for “T1,” “T2,” and “T3,” respectively. Furthermore, since object-based features strongly affect the performance of AL, eight combinations of four feature types were investigated. The results show that the best feature combination differs across the three datasets due to variation in feature separability.
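The select-label-retrain loop above can be sketched with margin-based uncertainty sampling. This is an illustrative sketch, not the paper's method: a softmax over distances to class centroids stands in for the one-against-one binary random forests, and the toy points are assumptions. Only the pattern (score the pool, pick the most ambiguous samples, add them to the training set) is taken from the text.

```python
import numpy as np

def class_probs(Xtr, ytr, Xpool):
    """Softmax over negative distances to class centroids; a simple
    stand-in for the paper's one-against-one binary RF classifiers."""
    classes = np.unique(ytr)
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(Xpool[:, None, :] - cents[None, :, :], axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def most_uncertain(probs, k=1):
    """Margin sampling: the smaller the gap between the two most
    likely classes, the more informative the sample."""
    s = np.sort(probs, axis=1)
    return np.argsort(s[:, -1] - s[:, -2])[:k]

Xtr = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 0.0], [4.2, 0.0]])
ytr = np.array([0, 0, 1, 1])
Xpool = np.array([[0.1, 0.5], [2.1, 0.0], [4.1, 0.5]])  # middle point is ambiguous
pick = most_uncertain(class_probs(Xtr, ytr, Xpool), k=1)
```

In the full loop, the picked samples would be labeled by a supervisor, appended to `Xtr`/`ytr`, and the classifiers retrained until the stopping criterion is met.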


2019 ◽  
Vol 11 (16) ◽  
pp. 1933 ◽  
Author(s):  
Yangyang Li ◽  
Ruoting Xing ◽  
Licheng Jiao ◽  
Yanqiao Chen ◽  
Yingte Chai ◽  
...  

Polarimetric synthetic aperture radar (PolSAR) image classification is a recent technology with great practical value in the field of remote sensing. However, because data collection is time-consuming and labor-intensive, few labeled datasets are available. Furthermore, most available state-of-the-art classification methods suffer heavily from speckle noise. To solve these problems, this paper proposes a novel semi-supervised algorithm based on self-training and superpixels. First, the Pauli-RGB image is over-segmented into superpixels to obtain a large number of homogeneous areas. Then, features that can mitigate the effects of speckle noise are obtained by spatial weighting within each superpixel. Next, the training set is expanded iteratively using a semi-supervised unlabeled-sample selection strategy that carefully exploits the spatial relations provided by superpixels. Finally, a stacked sparse auto-encoder is self-trained on the expanded training set to obtain classification results. Experiments on two typical PolSAR datasets verify the algorithm's ability to suppress speckle noise and show excellent classification performance with limited labeled data.
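The speckle-suppression step can be sketched as pooling features over each superpixel. This is a simplification: an unweighted mean is used here, whereas the paper applies spatial weighting within the superpixel, and the toy feature array is an assumption.

```python
import numpy as np

def superpixel_average(features, seg):
    """Replace each pixel's feature vector with the mean over its
    superpixel. Averaging within a homogeneous region keeps the class
    signal while washing out per-pixel speckle noise."""
    out = np.empty_like(features, dtype=float)
    for s in np.unique(seg):
        m = seg == s
        out[m] = features[m].mean(axis=0)
    return out

# Four pixels, two superpixels (labels 0 and 1), two features per pixel.
feats = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]])
seg = np.array([0, 0, 1, 1])
smoothed = superpixel_average(feats, seg)
```

The smoothed features would then feed the stacked sparse auto-encoder in the self-training loop.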


Diagnostics ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 574
Author(s):  
Gennaro Tartarisco ◽  
Giovanni Cicceri ◽  
Davide Di Pietro ◽  
Elisa Leonardi ◽  
Stefania Aiello ◽  
...  

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic, in both clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties across different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from non-autistic ones. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, able to classify autism with 95% accuracy. Furthermore, using the SVM-recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best-discriminating items common to our study and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument that maintain high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.
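The SVM-RFE item-selection procedure can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the study's pipeline: a linear SVM trained by hinge-loss subgradient descent stands in for the SVM used in the paper, and the data generator is an assumption. RFE repeatedly retrains the model and discards the feature (item) with the smallest absolute weight.

```python
import numpy as np

def linear_svm_weights(X, y, lr=0.1, lam=0.01, epochs=300):
    """Linear SVM via subgradient descent on the regularized hinge loss;
    a tiny stand-in for a full SVM solver. Labels must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        viol = y * (X @ w) < 1          # margin-violating samples
        grad = lam * w
        if viol.any():
            grad = grad - (y[viol, None] * X[viol]).mean(axis=0)
        w = w - lr * grad
    return w

def svm_rfe(X, y, n_keep=1):
    """Recursive feature elimination: retrain, drop the feature with the
    smallest |weight|, repeat until n_keep features remain."""
    feats = list(range(X.shape[1]))
    while len(feats) > n_keep:
        w = linear_svm_weights(X[:, feats], y)
        feats.pop(int(np.argmin(np.abs(w))))
    return feats

rng = np.random.default_rng(0)
y = np.where(rng.random(40) < 0.5, -1.0, 1.0)
# Feature 0 carries the label; features 1 and 2 are pure noise.
X = np.column_stack([y, 0.1 * rng.normal(size=40), 0.1 * rng.normal(size=40)])
kept = svm_rfe(X, y, n_keep=1)
```

On the real Q-CHAT data the same loop, run down to 14 and then 3 items, is what produces the shortened instruments the abstract reports.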


Minerals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 282
Author(s):  
Darya Urupina ◽  
Manolis N. Romanias ◽  
Frederic Thevenet

The experimental investigation of heterogeneous atmospheric processes involving mineral aerosols is extensively performed in the literature using proxy materials. In this work, we questioned the validity of using proxies such as Fe2O3, FeOOH, Al2O3, MgO, CaO, TiO2, MnO2, SiO2, and CaCO3 to represent the behavior of complex mixtures of minerals such as natural desert and volcanic dusts. Five volcanic dusts and three desert dusts were compared with a number of metal oxides, commonly used in the literature to mimic the behavior of desert dusts, with respect to their ability to form sulfites and sulfates on surfaces exposed to SO2 gas. First, all samples were aged at room temperature and atmospheric pressure under controlled conditions of 175 ppm SO2 for 1 h at 30% relative humidity. Second, they were extracted with 1% formalin and analyzed by High-Performance Liquid Chromatography (HPLC) to quantify and compare the amounts of sulfites and sulfates formed on their surfaces. Under the experimental conditions of this study, neither a single pure oxide nor a mixture of oxides could adequately typify the behavior of complex mixtures of natural minerals. Therefore, to evaluate the real-life impact of natural dust on atmospheric processes, it is of vital importance to work directly with natural samples, both to observe the real effects of desert and volcanic dusts and to evaluate the relevance of proposed proxies.


2021 ◽  
Vol 13 (4) ◽  
pp. 547
Author(s):  
Wenning Wang ◽  
Xuebin Liu ◽  
Xuanqin Mou

For both traditional classification methods and currently popular deep learning methods, the limited-sample classification problem is very challenging, and the lack of samples is an important factor affecting classification performance. Our work includes two aspects. First, unsupervised data augmentation of all hyperspectral samples not only greatly improves classification accuracy through the newly added training samples, but also further improves the classifier's accuracy by optimizing the augmented test samples. Second, an effective spectral structure extraction method is designed; the extracted spectral structure features achieve better classification accuracy than the raw spectral features.


Symmetry ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 60
Author(s):  
Md Arifuzzaman ◽  
Muhammad Aniq Gul ◽  
Kaffayatullah Khan ◽  
S. M. Zakir Hossain

Several environmental factors, such as temperature differentials, moisture, and oxidation, affect the service life of modified asphalt by influencing its desired adhesive properties. Knowledge of the properties of asphalt adhesives can help provide a more resilient and durable asphalt surface. In this study, a hybrid of a Bayesian optimization algorithm and support vector regression is recommended to predict the adhesion force of asphalt. The effects of three important variables, namely condition (fresh, wet, and aged), binder type (base, 4% SB, 5% SB, 4% SBS, and 5% SBS), and carbon nanotube dose (0.5%, 1.0%, and 1.5%), on adhesive force are taken into consideration. Real-life experimental data (405 specimens) are used for model development. Using atomic force microscopy, the nanoscale adhesive strength of the test specimens is determined according to the functional groups on the asphalt. The model predictions overlap with the experimental data with a high R2 of 90.5%, and the relative deviations are scattered around the zero line. Besides, the mean, median, and standard deviation of the experimental and predicted values are very close. In addition, the mean absolute error, root mean square error, and fractional bias values were found to be low, indicating the high performance of the developed model.
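The overall pattern (a kernel regressor whose hyperparameters are tuned against validation error) can be sketched as follows. This sketch deviates from the paper in two labeled ways: kernel ridge regression stands in for support vector regression, and a plain grid search stands in for Bayesian optimization; the synthetic data and the hyperparameter grid are likewise assumptions.

```python
import numpy as np

def krr_fit_predict(Xtr, ytr, Xte, gamma, alpha):
    """Kernel ridge regression with an RBF kernel: solve
    (K + alpha*I) c = y, then predict with K(test, train) @ c."""
    def K(A, B):
        d2 = ((A[:, None] - B[None, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    coef = np.linalg.solve(K(Xtr, Xtr) + alpha * np.eye(len(Xtr)), ytr)
    return K(Xte, Xtr) @ coef

# Synthetic regression task standing in for the adhesion-force data.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.normal(size=80)
Xtr, ytr, Xva, yva = X[:60], y[:60], X[60:], y[60:]

# Hyperparameter search over (gamma, alpha) by validation MSE.
grid = [(g, a) for g in (0.1, 1.0, 10.0) for a in (1e-3, 1e-1)]
mses = {p: np.mean((krr_fit_predict(Xtr, ytr, Xva, *p) - yva) ** 2) for p in grid}
best = min(mses, key=mses.get)
```

A Bayesian optimizer would replace the exhaustive `grid` loop with a surrogate model that proposes the next (gamma, alpha) to evaluate, which matters when each model fit is expensive.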


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Weikang Xu ◽  
Zhentao Zhang ◽  
Xiaomei Cai ◽  
Yazhen Hong ◽  
Tianliang Lin ◽  
...  

Effective treatment of frequent oil spills and endless discharged oily wastewater is crucial for the ecosystem and human health. In the past two decades, the collection of oil from the water surface has been widely studied through the simple fabrication of superhydrophobic meshes with various coating materials, but little attention has been paid to the design of mesh-based oil-collecting devices and to practical oil collection. Here, 3D-printed devices with different configurations of (super)hydrophobic meshes, circular truncated cone (CTC), cylinder, and inverted CTC, sharing the same inverted cone-shaped structure below the meshes for temporary oil storage, are investigated. Results demonstrate that the CTC-mesh-based device, especially an oblate one, not only shows higher stability and discharge of the collected oils than previous reports, but also allows floating oils to enter the (super)hydrophobic mesh faster. We anticipate that future success in developing high-performance (super)hydrophobic meshes, together with further optimization of the CTC-mesh-based device parameters, will make our proposed device more practical for the treatment of real-life oil spills.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 898
Author(s):  
Marta Saiz-Vivó ◽  
Adrián Colomer ◽  
Carles Fonfría ◽  
Luis Martí-Bonmatí ◽  
Valery Naranjo

Atrial fibrillation (AF) is the most common cardiac arrhythmia. At present, cardiac ablation is the main treatment procedure for AF. To guide and plan this procedure, it is essential for clinicians to obtain patient-specific 3D geometrical models of the atria. For this, there is interest in automatic image segmentation algorithms, such as deep learning (DL) methods, as opposed to manual segmentation, an error-prone and time-consuming approach. However, optimizing DL algorithms requires many annotated examples, which increases acquisition costs. The aim of this work is to develop automatic, high-performance computational models for left and right atrium (LA and RA) segmentation from a few labelled volumetric MRI images with a 3D Dual U-Net algorithm. For this, a supervised domain adaptation (SDA) method is introduced to transfer knowledge from late gadolinium enhanced (LGE) MRI volumetric training samples (80 LA-annotated samples) to a network trained on balanced steady-state free precession (bSSFP) MR images with a limited number of annotations (19 RA- and LA-annotated samples). The resulting knowledge-transferred SDA model outperformed the same network trained from scratch in both RA (Dice of 0.9160) and LA (Dice of 0.8813) segmentation tasks.
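The Dice scores quoted above are overlap ratios between predicted and ground-truth masks. A minimal implementation of the metric, on assumed toy masks rather than the paper's MRI volumes, looks like this:

```python
import numpy as np

def dice(pred, target):
    """Dice coefficient between two binary masks:
    Dice = 2|A intersect B| / (|A| + |B|), ranging from 0 (disjoint) to 1
    (identical). This is the metric reported for the LA/RA segmentations."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum())

d = dice([1, 1, 0, 0], [1, 0, 1, 0])  # one shared voxel out of 2 + 2
```

For 3D volumes the same function applies unchanged, since the sums run over all voxels regardless of shape.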

