Auto-weighted concept factorization for joint feature map and data representation learning

2021, pp. 1-13
Author(s): Yikai Zhang, Yong Peng, Hongyu Bian, Yuan Ge, Feiwei Qin, ...

Concept factorization (CF) is an effective matrix factorization model which has been widely used in many applications. In CF, a linear combination of the data points serves as the dictionary, so CF can be performed both in the original feature space and in the reproducing kernel Hilbert space (RKHS). Conventional CF treats each dimension of the feature vector equally during data reconstruction, which conflicts with the common observation that different features have different discriminative abilities and therefore contribute differently to pattern recognition. In this paper, we introduce an auto-weighting variable into the conventional CF objective function to adaptively learn the contributions of different features, and propose a new model termed Auto-Weighted Concept Factorization (AWCF). In AWCF, on one hand, feature importance can be quantitatively measured by the auto-weighting variable, with features of better discriminative ability assigned larger weights; on the other hand, we obtain a more effective data representation of the semantic information. A detailed optimization procedure for the AWCF objective function is derived, and its complexity and convergence are analyzed. Experiments are conducted on both synthetic and representative benchmark data sets, and the clustering results demonstrate the effectiveness of AWCF in comparison with related models.
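The reconstruction model described above can be illustrated with a minimal sketch of plain (unweighted) CF — not the authors' AWCF, whose auto-weighting variable would additionally reweight each feature dimension. The multiplicative updates below assume only a nonnegative data matrix `X` whose columns are samples; function names are illustrative:

```python
import numpy as np

def concept_factorization(X, k, n_iter=200, seed=0):
    """Plain concept factorization X ~ X W H^T via multiplicative updates.

    X: (d, n) nonnegative data matrix with columns as samples; k: #concepts.
    Only the inner-product matrix K = X^T X is needed, which is why CF also
    works in an RKHS where only kernel values are available.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X
    W = rng.random((n, k))          # mixing coefficients of the dictionary
    H = rng.random((n, k))          # new representations of the samples
    eps = 1e-10                     # guard against division by zero
    for _ in range(n_iter):
        W *= (K @ H) / (K @ W @ (H.T @ H) + eps)
        H *= (K @ W) / (H @ (W.T @ K @ W) + eps)
    return W, H

def cf_loss(X, W, H):
    """Frobenius reconstruction error ||X - X W H^T||^2."""
    return float(np.linalg.norm(X - X @ W @ H.T) ** 2)
```

The updates keep `W` and `H` nonnegative and monotonically decrease the reconstruction error when `K` is nonnegative.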

2021, Vol 15
Author(s): Zhikui Chen, Shan Jin, Runze Liu, Jianing Zhang

Nowadays, deep representations have been attracting much attention owing to their strong performance in various tasks. However, the interpretability of deep representations poses a major challenge for real-world applications. To alleviate this challenge, a deep matrix factorization method with non-negative constraints is proposed in this paper to learn interpretable, deep part-based representations of big data. Specifically, a deep architecture is designed with a supervisor network that suppresses noise in the data and a student network that learns interpretable deep representations, forming an end-to-end framework for pattern mining. Furthermore, to train the deep matrix factorization architecture, an interpretability loss is defined, comprising a symmetric loss, an apposition loss, and a non-negative constraint loss, which ensures knowledge transfer from the supervisor network to the student network and enhances the robustness of the deep representations. Finally, extensive experimental results on two benchmark datasets demonstrate the superiority of the deep matrix factorization method.
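As a rough illustration of the part-based idea only (not the supervisor–student architecture described above), the sketch below stacks ordinary Lee–Seung NMF layers greedily — a common pretraining scheme for deep matrix factorization. All function names are illustrative:

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0):
    """One layer of NMF, V ~ W H, via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-10
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def deep_nmf(V, layer_sizes, n_iter=200):
    """Greedy layer-wise deep NMF: V ~ W1 W2 ... WL H.

    Each layer's coefficient matrix H is factorized again; this is only the
    pretraining pass, with no joint fine-tuning of the layers.
    """
    Ws, H = [], V
    for k in layer_sizes:
        W, H = nmf(H, k, n_iter=n_iter)
        Ws.append(W)
    return Ws, H
```

Non-negativity of every factor is what makes the learned parts additive and hence interpretable.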


Author(s): Tingting Ren, Xiuyi Jia, Weiwei Li, Lei Chen, Zechao Li

Label distribution learning (LDL) is a novel machine learning paradigm to deal with label ambiguity issues by placing more emphasis on how relevant each label is to a particular instance. Many LDL algorithms have been proposed and most of them concentrate on the learning models, while few of them focus on the feature selection problem. All existing LDL models are built on a simple feature space in which all features are shared by all the class labels. However, this kind of traditional data representation strategy tends to select features that are distinguishable for all labels, but ignores label-specific features that are pertinent and discriminative for each class label. In this paper, we propose a novel LDL algorithm by leveraging label-specific features. The common features for all labels and specific features for each label are simultaneously learned to enhance the LDL model. Moreover, we also exploit the label correlations in the proposed LDL model. The experimental results on several real-world data sets validate the effectiveness of our method.
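For context, a minimal LDL baseline (unrelated to the label-specific feature construction the paper proposes) can be sketched as a softmax model fitted by minimizing the KL divergence to the ground-truth label distributions; the function names are illustrative:

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_ldl(X, D, lr=0.5, n_iter=500):
    """Maximum-entropy LDL baseline.

    Model each instance's label distribution as softmax(x W) and minimise the
    KL divergence to the true distributions D by gradient descent.
    X: (n, d) features; D: (n, m) label distributions, rows summing to 1.
    """
    W = np.zeros((X.shape[1], D.shape[1]))
    for _ in range(n_iter):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - D) / len(X)   # gradient of the cross-entropy
    return W

def kl(D, P):
    """Total KL divergence between true and predicted distributions."""
    return float((D * np.log((D + 1e-12) / (P + 1e-12))).sum())
```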


Author(s): K Sobha Rani

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of SVD++, a state-of-the-art recommendation algorithm that inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, this work is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach TrustSVD achieves better accuracy than ten other counterparts and can better handle the data sparsity and cold-start issues.
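The biased matrix factorization predictor that SVD++ (and hence TrustSVD) builds upon can be sketched as follows. This simplified version omits the implicit-feedback and trust terms that the paper adds; all names are illustrative:

```python
import numpy as np

def train_biased_mf(ratings, n_users, n_items, k=8, lr=0.01, reg=0.02,
                    epochs=30, seed=0):
    """SGD training of the biased MF predictor r_ui = mu + b_u + b_i + p_u.q_i.

    `ratings` is a list of (user, item, rating) triples.
    """
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))   # global rating mean
    bu = np.zeros(n_users)                            # user biases
    bi = np.zeros(n_items)                            # item biases
    P = 0.1 * rng.standard_normal((n_users, k))       # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, k))       # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])   # prediction error
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                          Q[i] + lr * (e * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

def rmse(ratings, mu, bu, bi, P, Q):
    errs = [r - (mu + bu[u] + bi[i] + P[u] @ Q[i]) for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))
```

TrustSVD extends the prediction rule with sums over rated items and trusted users, regularized by the trust network.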


2021, Vol 16 (1), pp. 1-24
Author(s): Yaojin Lin, Qinghua Hu, Jinghua Liu, Xingquan Zhu, Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and uses features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is based on a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets manifest the effectiveness of MULFE.
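The per-label clustering step can be sketched in the general spirit of label-specific feature construction: cluster a label's positive and negative instances separately and re-represent every sample by its distances to the resulting cluster centres. This is a generic illustration of the idea, not MULFE's exact procedure, and the names are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Tiny k-means: returns the (k, d) cluster centres."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=min(k, len(X)), replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        a = d.argmin(1)                     # assign points to nearest centre
        for j in range(len(C)):
            if np.any(a == j):
                C[j] = X[a == j].mean(0)    # recompute each centre
    return C

def label_specific_features(X, y, k=2, seed=0):
    """For one label: cluster positives and negatives separately, then
    re-represent all samples by distances to the 2k cluster centres."""
    pos, neg = X[y == 1], X[y == 0]
    centres = np.vstack([kmeans(pos, k, seed=seed), kmeans(neg, k, seed=seed)])
    return np.sqrt(((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1))
```

MULFE then trains one base classifier per label-specific space and weights them via the label correlations.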


Sensors, 2021, Vol 21 (10), pp. 3536
Author(s): Jakub Górski, Adam Jabłoński, Mateusz Heesch, Michał Dziendzikowski, Ziemowit Dworakowski

Condition monitoring is an indispensable element of the operation of rotating machinery. In this article, a monitoring system for a parallel gearbox is proposed. A novelty detection approach is used to develop the condition assessment support system, which requires data collected from a healthy structure. The measured signals were processed to extract quantitative indicators sensitive to the types of damage occurring in this type of structure. The indicator values were used to develop four different novelty detection algorithms. The presented novelty detection models operate on three principles: feature space distance, probability distribution, and input reconstruction. One of the distance-based models is adaptive, adjusting to new data arriving in the form of a stream. The authors test the developed algorithms on experimental and simulation data with a similar distribution, using a training set consisting mainly of samples generated by the simulator. The results presented in the article demonstrate the effectiveness of the trained models on both data sets.
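A minimal example of the feature-space-distance principle mentioned above (not the authors' implementation): fit a centroid and a distance threshold on healthy-condition indicator vectors, then flag samples whose distance exceeds the threshold:

```python
import numpy as np

class DistanceNoveltyDetector:
    """Distance-based novelty detection.

    Learn the centroid of healthy-state feature vectors and flag samples
    whose Euclidean distance exceeds a chosen quantile of the training
    distances.
    """

    def fit(self, X, quantile=0.99):
        self.centroid = X.mean(axis=0)
        d = np.linalg.norm(X - self.centroid, axis=1)
        self.threshold = np.quantile(d, quantile)   # healthy-data envelope
        return self

    def predict(self, X):
        d = np.linalg.norm(X - self.centroid, axis=1)
        return d > self.threshold                   # True = potential damage
```

The adaptive streaming variant described in the article would additionally update the centroid and threshold as new healthy data arrive.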


2020, pp. 1-11
Author(s): Mayamin Hamid Raha, Tonmoay Deb, Mahieyin Rahmun, Tim Chen

Face recognition is one of the most widely used image analysis applications, and dimensionality reduction is an essential requirement for it. The curse of dimensionality arises because, as dimensionality increases, sample density decreases exponentially. Dimensionality reduction is the process of reducing the dimensionality of the feature space by obtaining a set of principal features. The purpose of this manuscript is to present a comparative study of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), two of the most popular appearance-based face recognition projection methods. PCA creates a low-dimensional data representation that captures as much of the data variance as possible, while LDA finds the vectors that best discriminate between classes in the underlying space. The main idea of PCA is to transform the high-dimensional input space into a lower-dimensional space that preserves the maximum variance. Traditional LDA feature extraction is obtained by maximizing between-class separation while minimizing within-class scatter.
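The PCA projection described above can be sketched directly from the eigendecomposition of the covariance matrix (a generic implementation, not tied to any particular face data set):

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    X: (n, d) data matrix with rows as samples. Projects the centred data
    onto the n_components directions of maximum variance.
    """
    Xc = X - X.mean(axis=0)                   # centre the data
    cov = Xc.T @ Xc / (len(X) - 1)            # sample covariance
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, order]                # projected coordinates
```

LDA differs in that it would use class labels to maximize the ratio of between-class to within-class scatter rather than total variance.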


Sensors, 2021, Vol 21 (5), pp. 1573
Author(s): Loris Nanni, Giovanni Minchio, Sheryl Brahnam, Gianluca Maguolo, Alessandra Lumini

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space built by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids is calculated from the patterns in the training data sets with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the dissimilarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach to image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system is competitive with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad hoc optimization of the clustering methods on the tested data sets.
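The core re-representation step can be sketched with plain Euclidean distances standing in for the SNN-learned dissimilarities — so this illustrates only the general dissimilarity-space idea, not the paper's system:

```python
import numpy as np

def dissimilarity_descriptor(X, prototypes):
    """Re-represent each pattern as its vector of distances to a set of
    prototypes (e.g. k-means centroids). Any vector-space classifier, such
    as an SVM, can then be trained on these descriptors.

    X: (n, d) patterns; prototypes: (p, d); returns (n, p) descriptors.
    """
    return np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
```

In the paper, the distance function itself is learned by the Siamese networks, so the descriptor reflects semantic rather than raw geometric dissimilarity.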


2021, Vol 58 (3), pp. 102534
Author(s): Zhao Zhang, Yan Zhang, Mingliang Xu, Li Zhang, Yi Yang, ...

Author(s): Deepika Saini, Sanoj Kumar, Manoj K. Singh, Musrrat Ali

Abstract: The key task of the presented work is to investigate the performance of the Generalized Ant Colony Optimizer (GACO) model in evolving the shape of a three-dimensional free-form Non-Uniform Rational B-Spline (NURBS) curve from stereo (two) views. The GACO model is a blend of two well-known meta-heuristic optimization algorithms, Simple Ant Colony Optimization and Global Ant Colony Optimization. The work addresses the NURBS-fitting-based reconstruction process: the GACO model is used to optimize the NURBS parameters (control points and weights) by minimizing the weighted least-squares error between the data points and the fitted NURBS curve. The algorithm is applied by first assuming some pre-fixed values of the NURBS parameters. The experiments clearly show that the optimization procedure performs better when good initial parameter locations are selected. A detailed experimental analysis is given in support of the algorithm. The implemented error analysis shows that the proposed methodology performs better than conventional methods.
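As a worked illustration of the quantity being optimized, the sketch below evaluates a rational Bézier curve (a single-segment special case of NURBS) and the least-squares fitting error a GACO-style optimizer would minimize over the control points `P` and weights `w`; the helper names are illustrative:

```python
import numpy as np
from math import comb

def rational_bezier(P, w, u):
    """Evaluate a rational Bezier curve (single-segment NURBS) at parameters u.

    P: (m+1, 3) control points; w: (m+1,) positive weights; u: (k,) in [0, 1].
    C(u) = sum_i B_{i,m}(u) w_i P_i / sum_i B_{i,m}(u) w_i.
    """
    m = len(P) - 1
    # Bernstein basis matrix, shape (k, m+1)
    B = np.array([[comb(m, i) * t**i * (1 - t)**(m - i) for i in range(m + 1)]
                  for t in u])
    num = (B * w) @ P                           # weighted control-point blend
    den = (B * w).sum(axis=1, keepdims=True)    # rational normalization
    return num / den

def fitting_error(P, w, data, u):
    """Least-squares error between data points and the fitted curve --
    the objective a GACO-style optimiser would minimise over P and w."""
    return float(((rational_bezier(P, w, u) - data) ** 2).sum())
```

A full NURBS curve would use a B-spline basis over a knot vector instead of the Bernstein basis, but the rational weighting and the fitting objective have the same form.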


2014, Vol 2014, pp. 1-9
Author(s): Yu Wang

Feature space heterogeneity often exists in many real world data sets so that some features are of different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might dynamically change over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that samples in each cluster are from the same class. After the removal of outliers, relevance of features in each cluster is calculated based on their variations in this cluster. The feature relevance is incorporated into distance calculation for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and application to a database marketing problem show the efficiency and effectiveness of the proposed approach.
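A stripped-down sketch of the weighted-distance idea (one cluster per class rather than the paper's supervised clustering, and all names are illustrative): per-cluster feature relevance is taken as the inverse within-cluster variance and folded into the distance used for classification:

```python
import numpy as np

def fit_clusters(X, y):
    """Simplified SCCFSH-style model: one cluster per class.

    Stores each cluster's centroid and a per-feature relevance weight that is
    inversely proportional to the feature's variance inside the cluster, so
    locally stable features dominate the distance.
    """
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        var = Xc.var(axis=0) + 1e-8            # avoid division by zero
        model[c] = (Xc.mean(axis=0), 1.0 / var)
    return model

def predict(model, X):
    """Assign each sample to the class with the nearest relevance-weighted
    centroid."""
    labels = list(model)
    dists = np.stack([(((X - mu) ** 2) * w).sum(axis=1)
                      for mu, w in (model[c] for c in labels)], axis=1)
    return np.array(labels)[dists.argmin(axis=1)]
```

The incremental version described in the abstract would update the centroids, variances, and cluster memberships as new samples stream in.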

