CGD: Multi-View Clustering via Cross-View Graph Diffusion

2020 ◽  
Vol 34 (04) ◽  
pp. 5924-5931
Author(s):  
Chang Tang ◽  
Xinwang Liu ◽  
Xinzhong Zhu ◽  
En Zhu ◽  
Zhigang Luo ◽  
...  

Graph-based multi-view clustering has received great attention for exploring the neighborhood relationships among data points from multiple views. Though such methods have achieved great success in various applications, we observe that most previous methods learn a consensus graph by building certain data representation models, which bears at least the following drawbacks. First, their clustering performance highly depends on the data representation capability of the model. Second, solving the resultant optimization models usually incurs high computational complexity. Third, these models often contain hyper-parameters that must be tuned to obtain optimal results. In this work, we propose a general, effective and parameter-free method with a convergence guarantee to learn a unified graph for multi-view data clustering via cross-view graph diffusion (CGD), which is the first attempt to employ a diffusion process for multi-view clustering. The proposed CGD takes the traditional predefined graph matrices of different views as input, and learns an improved graph for each single view via an iterative cross-diffusion process by 1) capturing the underlying manifold geometry of the original data points, and 2) leveraging the complementary information among multiple graphs. The final unified graph used for clustering is obtained by averaging the improved view-associated graphs. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method in terms of seven clustering evaluation metrics.
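For illustration, below is a minimal numpy sketch of the cross-view diffusion idea described above: each view's graph is iteratively diffused through the average of the other views' graphs, and the improved graphs are averaged into a unified graph. The row normalization, kNN sparsification, and iteration count are assumptions for the sketch, not the authors' exact formulation.

```python
import numpy as np

def row_normalize(W):
    """Row-normalize a non-negative affinity matrix into a transition matrix."""
    return W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

def knn_sparsify(W, k):
    """Keep only each row's k largest affinities (a common locality assumption)."""
    P = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    P[rows, idx] = W[rows, idx]
    return row_normalize(P)

def cross_view_diffusion(graphs, k=10, iters=20):
    """Iteratively diffuse each view's graph through the average of the other views."""
    S = [row_normalize(W) for W in graphs]      # dense status matrices
    P = [knn_sparsify(W, k) for W in graphs]    # sparse local kernels
    for _ in range(iters):
        S_new = []
        for v in range(len(graphs)):
            others = [S[u] for u in range(len(graphs)) if u != v]
            M = np.mean(others, axis=0)
            S_new.append(P[v] @ M @ P[v].T)
        S = S_new
    U = np.mean(S, axis=0)                      # unified graph for clustering
    return (U + U.T) / 2                        # symmetrize before spectral clustering

# toy usage: three random symmetric views over 50 points
views = [np.random.rand(50, 50) for _ in range(3)]
views = [(W + W.T) / 2 for W in views]
unified = cross_view_diffusion(views)
```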

Author(s):  
Ahmed Fahim ◽  

The k-means algorithm is the most well-known algorithm for data clustering in data mining. Its simplicity and speed of convergence to local minima are its most important advantages, in addition to its linear time complexity. The most important open problems in this algorithm are the selection of initial centers and the determination of the exact number of clusters in advance. This paper proposes a solution for both problems together, by adding a preprocessing step that estimates the expected number of clusters in the data and provides better initial centers. Many studies address each of these problems separately, but none addresses both together. The preprocessing step requires O(n log n) time, where n is the size of the dataset. It produces an initial partitioning of the data without requiring the number of clusters in advance, and then computes the means of the initial clusters. After that we apply k-means to the original data, using the information from the preprocessing step, to obtain the final clusters. We use many benchmark datasets to test the proposed method. The experimental results show the efficiency of the proposed method.
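As a rough illustration of the pipeline (estimate k and initial centers in a cheap preprocessing step, then refine with standard k-means), here is a hedged sketch; the gap-based preprocessing heuristic below is a hypothetical stand-in, since the paper's actual preprocessing procedure is not detailed in this abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_k_and_centers(X, gap_factor=5.0):
    """Hypothetical O(n log n) preprocess: sort points by distance from the
    coordinate-wise minimum and split where consecutive gaps are unusually large.
    This is only a crude stand-in for the paper's preprocessing step."""
    d = np.linalg.norm(X - X.min(axis=0), axis=1)
    order = np.argsort(d)                                 # O(n log n)
    gaps = np.diff(d[order])
    cuts = gaps > gap_factor * (gaps.mean() + 1e-12)      # unusually large gaps
    labels = np.empty(len(X), dtype=int)
    labels[order] = np.concatenate(([0], np.cumsum(cuts)))
    k = labels.max() + 1
    centers = np.vstack([X[labels == c].mean(axis=0) for c in range(k)])
    return k, centers

# toy usage: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)), rng.normal(size=(100, 2)) + 8.0])
k, centers = estimate_k_and_centers(X)
km = KMeans(n_clusters=k, init=centers, n_init=1).fit(X)  # refine with standard k-means
```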


Author(s):  
V. H. Ayma ◽  
V. A. Ayma ◽  
J. Gutierrez

Abstract. Nowadays, the increasing amount of information provided by hyperspectral sensors requires optimal solutions to ease the subsequent analysis of the produced data. A common issue in this matter relates to the hyperspectral data representation for classification tasks. Existing approaches address the data representation problem by performing a dimensionality reduction over the original data. However, mining complementary features that reduce the redundancy in hyperspectral images remains challenging. Thus, exploiting the representation power of neural-network-based techniques becomes an attractive alternative. In this work, we propose a novel dimensionality reduction implementation for hyperspectral imaging based on autoencoders, ensuring orthogonality among features to reduce the redundancy in hyperspectral data. Experiments conducted on the Pavia University, Kennedy Space Center, and Botswana hyperspectral datasets evidence the representation power of our approach, leading to better classification performance compared to traditional hyperspectral dimensionality reduction algorithms.
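A minimal PyTorch sketch of the general idea, assuming a soft orthogonality penalty on the latent features of an autoencoder; the penalty form, network sizes, and weighting are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class OrthoAE(nn.Module):
    """Autoencoder whose latent features are pushed towards mutual orthogonality."""
    def __init__(self, n_bands, n_latent):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(),
                                     nn.Linear(64, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                     nn.Linear(64, n_bands))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def ortho_penalty(z):
    """Soft orthogonality: covariance of latent dimensions pushed towards identity."""
    z = z - z.mean(dim=0, keepdim=True)
    c = (z.T @ z) / z.shape[0]
    eye = torch.eye(c.shape[0], device=z.device)
    return ((c - eye) ** 2).sum()

# toy training loop on random "pixels" with 103 bands (as in Pavia University)
model = OrthoAE(n_bands=103, n_latent=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(256, 103)
for _ in range(100):
    z, x_hat = model(x)
    loss = nn.functional.mse_loss(x_hat, x) + 0.1 * ortho_penalty(z)
    opt.zero_grad()
    loss.backward()
    opt.step()
```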


Author(s):  
Tianhang Zheng ◽  
Changyou Chen ◽  
Kui Ren

Recent work on adversarial attacks has shown that the Projected Gradient Descent (PGD) adversary is a universal first-order adversary, and that a classifier adversarially trained with PGD is robust against a wide range of first-order attacks. It is worth noting that the original objective of an attack/defense model relies on a data distribution p(x), typically in the form of risk maximization/minimization, e.g., max/min E_{p(x)}[L(x)], with p(x) some unknown data distribution and L(·) a loss function. However, since PGD generates attack samples independently for each data sample based on L(·), the procedure does not necessarily lead to good generalization in terms of risk optimization. In this paper, we address this issue by proposing the distributionally adversarial attack (DAA), a framework that solves for an optimal adversarial-data distribution: a perturbed distribution that satisfies the L∞ constraint but deviates from the original data distribution so as to maximally increase the generalization risk. Algorithmically, DAA performs optimization over the space of potential data distributions, which introduces direct dependency between all data points when generating adversarial samples. DAA is evaluated by attacking state-of-the-art defense models, including the adversarially trained models provided by MIT MadryLab. Notably, DAA ranks first on MadryLab's white-box leaderboards, reducing the accuracy of their secret MNIST model to 88.56% (with l∞ perturbations of ε = 0.3) and the accuracy of their secret CIFAR model to 44.71% (with l∞ perturbations of ε = 8.0). Code for the experiments is released at https://github.com/tianzheng4/Distributionally-Adversarial-Attack.
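For reference, a minimal PyTorch sketch of the standard L∞ PGD attack that DAA builds on; DAA itself additionally couples samples through a distribution-level objective, which is not reproduced here.

```python
import torch

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """Standard L_inf PGD: iteratively step along the sign of the per-sample
    loss gradient and project back into the eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

# toy usage on a small MNIST-shaped classifier
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x, y = torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,))
x_adv = pgd_attack(model, x, y)
```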


2019 ◽  
Vol 13 (S1) ◽  
Author(s):  
Na Yu ◽  
Ying-Lian Gao ◽  
Jin-Xing Liu ◽  
Juan Wang ◽  
Junliang Shang

Abstract Background As one of the most popular data representation methods, non-negative matrix factorization (NMF) has received wide attention in clustering and feature selection tasks. However, most previously proposed NMF-based methods do not adequately explore the hidden geometric structure in the data. At the same time, noise and outliers are inevitably present in the data. Results To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, hypergraph Laplacian regularization is imposed to capture the geometric information of the original data. Unlike graph Laplacian regularization, which captures the relationship between pairs of sample points, it captures high-order relationships among larger groups of sample points. Moreover, the robustness of RHNMF is enhanced by using the L2,1-norm when estimating the residual, because the L2,1-norm is insensitive to noise and outliers. Conclusions Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that our proposed model outperforms other state-of-the-art methods.
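To make the two named ingredients concrete, here is a minimal numpy sketch of a kNN-based hypergraph Laplacian and the L2,1 norm of the residual, combined into an objective of the general form ||X - WV||_{2,1} + λ·tr(V L Vᵀ); the unit hyperedge weights and the optimization procedure (not shown) are assumptions of this sketch.

```python
import numpy as np

def hypergraph_laplacian(samples, k=5):
    """Build one hyperedge per sample (the sample plus its k nearest neighbours), then
    form the normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    n = samples.shape[0]
    d2 = ((samples[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k + 1]          # each row: the sample and its k neighbours
    H = np.zeros((n, n))                            # vertex-by-hyperedge incidence matrix
    for e, members in enumerate(nn):
        H[members, e] = 1.0
    w = np.ones(n)                                  # unit hyperedge weights (assumption)
    Dv = H @ w                                      # vertex degrees
    De = H.sum(axis=0)                              # hyperedge degrees
    Dv_isqrt = np.diag(1.0 / np.sqrt(Dv))
    Theta = Dv_isqrt @ H @ np.diag(w) @ np.diag(1.0 / De) @ H.T @ Dv_isqrt
    return np.eye(n) - Theta

def l21_norm(E):
    """L2,1 norm: sum of the Euclidean norms of the residual's columns."""
    return np.linalg.norm(E, axis=0).sum()

def rhnmf_objective(X, W, V, L, lam):
    """Robust hypergraph-regularized NMF objective: ||X - W V||_{2,1} + lam * tr(V L V^T)."""
    return l21_norm(X - W @ V) + lam * np.trace(V @ L @ V.T)

# toy usage: 100 samples with 20 features, factor rank 5
X = np.random.rand(20, 100)
L = hypergraph_laplacian(X.T, k=5)
W, V = np.random.rand(20, 5), np.random.rand(5, 100)
print(rhnmf_objective(X, W, V, L, lam=0.1))
```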


Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1072 ◽  
Author(s):  
Shifeng Xia ◽  
Jiexian Zeng ◽  
Lu Leng ◽  
Xiang Fu

Recently, convolutional neural networks (CNNs) have achieved great success in scene recognition. Compared with traditional hand-crafted features, CNNs can extract more robust and generalized features for scene recognition. However, existing CNN-based scene recognition methods do not sufficiently take into account the relationship between image regions and categories when choosing local regions, which results in many redundant local regions and degrades recognition accuracy. In this paper, we propose an effective method for exploring discriminative regions of a scene image. Our method utilizes the gradient-weighted class activation mapping (Grad-CAM) technique and weakly supervised information to generate the attention map (AM) of scene images, dubbed WS-AM (weakly supervised attention map). Regions where both the local mean and the local center value are large in the AM correspond to discriminative regions helpful for scene recognition. We sample discriminative regions at multiple scales and extract the features of large-scale and small-scale regions with two different pre-trained CNNs, respectively. The features from the two scales are aggregated by improved vector of locally aggregated descriptors (VLAD) coding and by max pooling, respectively. Finally, a pre-trained CNN is used to extract the global feature of the image at the fully-connected (fc) layer, and the local features are combined with the global feature to obtain the image representation. We validated the effectiveness of our method on three benchmark datasets: MIT Indoor 67, Scene 15, and UIUC Sports, obtaining 85.67%, 94.80%, and 95.12% accuracy, respectively. Compared with some state-of-the-art methods, WS-AM requires fewer local regions, so it has better real-time performance.
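The region-selection rule (keep locations where both the local mean and the local center value of the attention map are large) can be sketched as follows; the window size and thresholds are assumptions, and the Grad-CAM map itself is taken as given.

```python
import numpy as np

def select_discriminative_regions(att_map, win=7, mean_thr=0.5, center_thr=0.5):
    """Slide a window over a normalized attention map and keep positions where both
    the local mean and the local center value are large. Window size and thresholds
    are assumptions; the paper samples such regions at multiple scales."""
    H, W = att_map.shape
    r = win // 2
    keep = []
    for i in range(r, H - r):
        for j in range(r, W - r):
            patch = att_map[i - r:i + r + 1, j - r:j + r + 1]
            if patch.mean() > mean_thr and att_map[i, j] > center_thr:
                keep.append((i, j))
    return keep

# toy usage on a random "Grad-CAM" map rescaled to [0, 1]
am = np.random.rand(32, 32)
am = (am - am.min()) / (am.max() - am.min())
regions = select_discriminative_regions(am)
```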


2020 ◽  
Vol 12 (9) ◽  
pp. 1366 ◽  
Author(s):  
Jun Li ◽  
Daoyu Lin ◽  
Yang Wang ◽  
Guangluan Xu ◽  
Yunyan Zhang ◽  
...  

In recent years, convolutional neural networks (CNNs) have shown great success in the scene classification of computer vision images. Although these CNNs can achieve excellent classification accuracy, the discriminative ability of the feature representations they extract is still limited in distinguishing more complex remote sensing images. Therefore, in this paper we propose a unified feature fusion framework based on an attention mechanism, called Deep Discriminative Representation Learning with Attention Map (DDRL-AM). First, by applying the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm, attention maps associated with the predicted results are generated so that the CNN focuses on the most salient parts of the image. Second, a spatial feature transformer (SFT) is designed to extract discriminative features from the attention maps. An innovative two-channel CNN architecture is then proposed that fuses the features extracted from the attention maps with the RGB (red green blue) stream. A new objective function that combines center loss and cross-entropy loss is optimized to reduce within-class variance while preserving inter-class dispersion. To show its effectiveness in classifying remote sensing images, the proposed DDRL-AM method is evaluated on four public benchmark datasets. The experimental results demonstrate the competitive scene classification performance of the DDRL-AM approach. Moreover, the visualization of features extracted by DDRL-AM shows that the discriminative ability of the features has been increased.
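A minimal PyTorch sketch of the kind of joint objective described, combining cross-entropy with a standard center loss; the weighting and the paper's exact formulation are assumptions here.

```python
import torch
import torch.nn as nn

class CenterCrossEntropyLoss(nn.Module):
    """Joint objective L = CE(logits, y) + lam * 0.5 * ||f - c_y||^2, the usual way a
    center loss is combined with cross-entropy; the weighting lam is an assumption."""
    def __init__(self, num_classes, feat_dim, lam=0.01):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lam = lam
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits, features, targets):
        center_loss = 0.5 * ((features - self.centers[targets]) ** 2).sum(dim=1).mean()
        return self.ce(logits, targets) + self.lam * center_loss

# toy usage with random features/logits for a 4-class problem
crit = CenterCrossEntropyLoss(num_classes=4, feat_dim=128)
logits, feats = torch.randn(8, 4), torch.randn(8, 128)
y = torch.randint(0, 4, (8,))
loss = crit(logits, feats, y)
loss.backward()
```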


Author(s):  
Kai Xiong ◽  
Feiping Nie ◽  
Junwei Han

Many previous graph-based methods perform dimensionality reduction on a pre-defined graph. However, due to noise and redundant information in the original data, the pre-defined graph has no clear structure and may not be appropriate for the subsequent task. To overcome these drawbacks, in this paper we propose a novel approach called linear manifold regularization with adaptive graph (LMRAG) for semi-supervised dimensionality reduction. LMRAG directly incorporates graph construction into the objective function, so the projection matrix and the optimal graph can be optimized simultaneously. Due to the structure constraint, the learned graph is sparse and has a clear structure. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.
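As a generic stand-in (not the authors' exact updates), the sketch below alternates between re-estimating a sparse graph from distances in the projected space and recomputing an LPP-style projection from that graph, illustrating how graph construction and projection learning can be optimized jointly.

```python
import numpy as np
from scipy.linalg import eigh

def adaptive_graph(Z, k=5):
    """Re-estimate a sparse graph from pairwise distances in the projected space Z (n x m):
    each point connects to its k nearest neighbours with distance-decaying weights."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    S = np.zeros_like(d2)
    idx = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(Z.shape[0])[:, None]
    S[rows, idx] = np.exp(-d2[rows, idx] / (d2[rows, idx].mean() + 1e-12))
    return (S + S.T) / 2

def projection_from_graph(X, S, dim=2):
    """LPP-style projection: smallest generalized eigenvectors of X^T L X w = lam X^T D X w."""
    D = np.diag(S.sum(axis=1))
    L = D - S
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])
    vals, vecs = eigh(A, B)
    return vecs[:, :dim]

# alternate graph learning and projection learning (a generic stand-in for LMRAG's joint model)
X = np.random.rand(80, 10)
W = np.eye(10)[:, :2]
for _ in range(5):
    S = adaptive_graph(X @ W)
    W = projection_from_graph(X, S)
```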


2019 ◽  
Author(s):  
Abdul Karim ◽  
Vahid Riahi ◽  
Avinash Mishra ◽  
Abdollah Dehzangi ◽  
M. A. Hakim Newton ◽  
...  

Abstract Representing molecules with only one type of features and using those features to predict their activities is one of the most important approaches in machine-learning-based chemical activity prediction. For molecular activities such as quantitative toxicity, prediction performance depends on the type of features extracted and on the machine learning approach used. Using a single type of features and a single machine learning model restricts the prediction performance to that specific representation and model. In this paper, we study quantitative toxicity prediction and propose a machine learning model for it. Our model uses an ensemble of heterogeneous predictors instead of the typical homogeneous predictors. The predictors vary either in the type of features used or in the deep learning architecture employed. Each of these predictors presumably has its own strengths and weaknesses in terms of toxicity prediction. Our motivation is to build a combined model that utilizes different types of features and architectures to obtain better collective performance than each individual predictor. We use six predictors in our model and test it on four standard quantitative toxicity benchmark datasets. Experimental results show that our model outperforms state-of-the-art toxicity prediction models in 8 out of 12 accuracy measures. Our experiments show that ensembling heterogeneous predictors improves performance over single predictors and over homogeneous ensembles of single predictors. The results show that each data representation or deep-learning-based predictor has its own strengths and weaknesses, so a model ensembling multiple heterogeneous predictors can go beyond the individual performance of each data representation or predictor type.
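A minimal scikit-learn sketch of the general strategy: heterogeneous base predictors trained on different feature blocks and combined by averaging. The toy features, model families, and averaging rule are illustrative assumptions; the actual model uses six deep predictors over different molecular representations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# toy stand-in data: rows are molecules, columns are two different feature blocks
rng = np.random.default_rng(0)
fingerprints = rng.random((500, 64))    # e.g. fingerprint-style features
descriptors = rng.random((500, 16))     # e.g. physicochemical descriptors
y = rng.random(500)                     # toxicity endpoint (e.g. an LC50-like value)

Xf_tr, Xf_te, Xd_tr, Xd_te, y_tr, y_te = train_test_split(
    fingerprints, descriptors, y, test_size=0.2, random_state=0)

# heterogeneous predictors: different model families on different representations
models = [
    (RandomForestRegressor(n_estimators=100, random_state=0), Xf_tr, Xf_te),
    (GradientBoostingRegressor(random_state=0), Xf_tr, Xf_te),
    (Ridge(alpha=1.0), Xd_tr, Xd_te),
]
preds = []
for model, X_tr, X_te in models:
    model.fit(X_tr, y_tr)
    preds.append(model.predict(X_te))

ensemble_pred = np.mean(preds, axis=0)  # simple averaging of the heterogeneous predictors
```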


2021 ◽  
Vol 15 ◽  
Author(s):  
Zhikui Chen ◽  
Shan Jin ◽  
Runze Liu ◽  
Jianing Zhang

Nowadays, deep representations have attracted much attention owing to their great performance in various tasks. However, the interpretability of deep representations poses a major challenge for real-world applications. To alleviate this challenge, in this paper a deep matrix factorization method with non-negative constraints is proposed to learn interpretable deep part-based representations of big data. Specifically, a deep architecture is designed with a supervisor network that suppresses noise in the data and a student network that learns interpretable deep representations, forming an end-to-end framework for pattern mining. Furthermore, to train the deep matrix factorization architecture, an interpretability loss is defined, including a symmetric loss, an apposition loss, and a non-negative constraint loss, which ensures knowledge transfer from the supervisor network to the student network and enhances the robustness of the deep representations. Finally, extensive experimental results on two benchmark datasets demonstrate the superiority of the deep matrix factorization method.
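Two of the named ingredients can be sketched generically as below: a non-negativity penalty on learned factors and a supervisor-to-student feature-matching term for knowledge transfer. The symmetric and apposition losses are not reproduced, since their exact forms are not given here; everything in this sketch is a generic stand-in.

```python
import torch
import torch.nn as nn

def nonneg_penalty(H):
    """Penalize negative entries so learned factors stay (approximately) non-negative."""
    return torch.relu(-H).pow(2).mean()

def transfer_loss(student_feat, teacher_feat):
    """Match student representations to the (noise-suppressing) supervisor's representations."""
    return nn.functional.mse_loss(student_feat, teacher_feat.detach())

# toy usage with random student/supervisor features
student = torch.randn(32, 10, requires_grad=True)
teacher = torch.randn(32, 10)
loss = transfer_loss(student, teacher) + 0.1 * nonneg_penalty(student)
loss.backward()
```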


Author(s):  
Yonghao Xu ◽  
Bo Du ◽  
Lefei Zhang ◽  
Qian Zhang ◽  
Guoli Wang ◽  
...  

Recent years have witnessed the great success of deep learning models in semantic segmentation. Nevertheless, these models may not generalize well to unseen image domains due to the phenomenon of domain shift. Since pixel-level annotations are laborious to collect, developing algorithms that can adapt labeled data from a source domain to a target domain is of great significance. To this end, we propose self-ensembling attention networks to reduce the domain gap between different datasets. To the best of our knowledge, the proposed method is the first attempt to introduce a self-ensembling model into domain adaptation for semantic segmentation, which provides a different view on how to learn domain-invariant features. Besides, since different regions of an image usually correspond to different levels of domain gap, we introduce an attention mechanism into the proposed framework to generate attention-aware features, which are further utilized to guide the calculation of the consistency loss in the target domain. Experiments on two benchmark datasets demonstrate that the proposed framework yields competitive performance compared with state-of-the-art methods.
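A minimal PyTorch sketch of the self-ensembling (mean-teacher) ingredient: the teacher is an exponential moving average of the student, and a consistency loss aligns their predictions on target-domain images. The attention weighting of the consistency loss described above is omitted in this sketch, and the EMA rate is an assumption.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Teacher weights are an exponential moving average of the student weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)

def consistency_loss(student_logits, teacher_logits):
    """Pixel-wise consistency between student and teacher predictions on target images."""
    return F.mse_loss(F.softmax(student_logits, dim=1),
                      F.softmax(teacher_logits, dim=1).detach())

# toy usage with a tiny segmentation head (2 classes, 1x1 conv)
student = torch.nn.Conv2d(3, 2, kernel_size=1)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

x_target = torch.rand(4, 3, 32, 32)
loss = consistency_loss(student(x_target), teacher(x_target))
loss.backward()
ema_update(teacher, student)
```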

