Out-of-sample data visualization using bi-kernel t-SNE

2020 ◽  
pp. 147387162097820
Author(s):  
Haili Zhang ◽  
Pu Wang ◽  
Xuejin Gao ◽  
Yongsheng Qi ◽  
Huihui Gao

T-distributed stochastic neighbor embedding (t-SNE) is an effective visualization method. However, it is non-parametric and cannot be applied to streaming data or online scenarios. Although kernel t-SNE provides an explicit projection from a high-dimensional data space to a low-dimensional feature space, some outliers are not well projected. In this paper, bi-kernel t-SNE is proposed for out-of-sample data visualization. Gaussian kernel matrices of the input and feature spaces are used to approximate the explicit projection, and principal component analysis is then applied to reduce the dimensionality of the feature kernel matrix. Thus, the difference between inliers and outliers is revealed, and any new sample can be well mapped. The performance of the proposed method for out-of-sample projection is tested on several benchmark datasets by comparing it with other state-of-the-art algorithms.
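
As a rough illustration of the explicit-map idea, the sketch below fits the kernel-based out-of-sample projection that kernel t-SNE relies on and that the bi-kernel variant extends: a row-normalized Gaussian kernel over the training data is regressed onto a t-SNE embedding, and unseen samples are mapped through the learned coefficients. The helper names, the single kernel width `sigma`, and the use of scikit-learn's t-SNE are assumptions for illustration; the paper's additional feature-space kernel and its PCA step are not reproduced.

```python
# Hedged sketch of the kernel-based out-of-sample map behind kernel t-SNE.
# The training embedding comes from ordinary t-SNE; new samples are projected
# through a row-normalized Gaussian kernel whose coefficients are fit by
# least squares.  `sigma` and the function names are illustrative.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import rbf_kernel

def fit_kernel_map(X_train, n_components=2, sigma=1.0, random_state=0):
    """Learn alpha so that y(x) ~= K_norm(x, X_train) @ alpha."""
    Y_train = TSNE(n_components=n_components,
                   random_state=random_state).fit_transform(X_train)
    K = rbf_kernel(X_train, X_train, gamma=1.0 / (2 * sigma ** 2))
    K /= K.sum(axis=1, keepdims=True)          # row-normalized Gaussian kernel
    alpha, *_ = np.linalg.lstsq(K, Y_train, rcond=None)
    return alpha, Y_train

def project_out_of_sample(X_new, X_train, alpha, sigma=1.0):
    """Map unseen samples with the learned explicit projection."""
    K_new = rbf_kernel(X_new, X_train, gamma=1.0 / (2 * sigma ** 2))
    K_new /= K_new.sum(axis=1, keepdims=True)
    return K_new @ alpha

# Example: train on 500 samples, project 100 held-out samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
alpha, Y_train = fit_kernel_map(X[:500])
Y_new = project_out_of_sample(X[500:], X[:500], alpha)
```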

2019 ◽  
Vol 2019 ◽  
pp. 1-19
Author(s):  
Mingai Li ◽  
Hongwei Xi ◽  
Xiaoqing Zhu

Due to the nonlinear and high-dimensional characteristics of motor imagery electroencephalography (MI-EEG), it can be challenging to achieve high online accuracy. As a nonlinear dimension reduction method, landmark maximum variance unfolding (L-MVU) can fully retain the nonlinear features of MI-EEG. However, L-MVU still requires considerable computation for out-of-sample data. An incremental version of L-MVU (denoted IL-MVU) is proposed in this paper. The low-dimensional representation of the training data is generated by L-MVU. For each out-of-sample datum, its nearest neighbors are found among the high-dimensional training samples, and the corresponding reconstruction weight matrix is calculated to generate its low-dimensional representation. IL-MVU is further combined with the dual-tree complex wavelet transform (DTCWT) to form a hybrid feature extraction method (named IL-MD). IL-MVU is applied to extract the nonlinear features of the specific subband signals, which are reconstructed by DTCWT and exhibit the obvious event-related synchronization/event-related desynchronization phenomenon. The average energy features of the α and β waves are calculated simultaneously. The two types of features are fused and evaluated by a linear discriminant analysis classifier. Extensive experiments were conducted on two public datasets with 12 subjects. The average recognition accuracies of 10-fold cross-validation are 92.50% on Dataset 3b and 88.13% on Dataset 2b, gains of at least 1.43% and 3.45%, respectively, over existing methods. The experimental results show that IL-MD can extract more accurate features at relatively lower computational cost, and it also adapts better to individual subjects and yields better feature visualization. The t-test results and Kappa values suggest that the proposed feature extraction method reaches statistical significance and has high consistency in classification.
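
A minimal sketch of the incremental out-of-sample step described above, assuming a precomputed low-dimensional embedding `Y_train` of the training data (from L-MVU or any other method): each new sample is reconstructed from its k nearest high-dimensional neighbors, and the same weights are applied to the neighbors' embedded coordinates. The neighbor count and regularization constant are illustrative.

```python
# Hedged sketch of the out-of-sample idea: reconstruct a new sample from its
# k nearest high-dimensional training neighbours, then apply the same weights
# to the neighbours' low-dimensional coordinates.  `k` and `reg` are
# illustrative choices, not values from the paper.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def incremental_embed(x_new, X_train, Y_train, k=10, reg=1e-3):
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    idx = nn.kneighbors(x_new.reshape(1, -1), return_distance=False)[0]
    Z = X_train[idx] - x_new                     # centre neighbours on x_new
    G = Z @ Z.T                                  # local Gram matrix (k x k)
    G += reg * np.trace(G) * np.eye(k)           # regularize for stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                                 # reconstruction weights sum to one
    return w @ Y_train[idx]                      # low-dimensional estimate
```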


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Jimmy C. Azar ◽  
Martin Simonsson ◽  
Ewert Bengtsson ◽  
Anders Hast

Comparing the staining patterns of paired antibodies designed against the same protein but toward different epitopes provides quality control over the binding and over the antibodies' ability to identify the target protein correctly and exclusively. We present a method for automated quantification of immunostaining patterns for antibodies in breast tissue using the Human Protein Atlas database. In such tissue, the dark brown dye 3,3′-diaminobenzidine is used as an antibody-specific stain, whereas the blue dye hematoxylin is used as a counterstain. The proposed method is based on clustering and relative scaling of features following principal component analysis. Our method is able (1) to accurately segment and identify staining patterns and quantify the amount of staining and (2) to detect paired antibodies by correlating the segmentation results among different cases. Moreover, the method is simple, operates in a low-dimensional feature space, and is computationally efficient, which makes it suitable for high-throughput processing of tissue microarrays.
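
As a hedged illustration of the general pixel-clustering idea (not the authors' exact features or scaling), the sketch below clusters pixel colors after PCA and reports the fraction of the image assigned to the brown DAB cluster; the cluster assignment heuristic is an assumption.

```python
# Minimal sketch: segment DAB (brown) versus hematoxylin (blue) pixels by
# clustering colour features after PCA, and report the stained fraction.
# Illustration of the approach only, not the authors' exact pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def stain_fractions(rgb_image, n_clusters=3, random_state=0):
    """rgb_image: (H, W, 3) float array with values in [0, 1]."""
    pixels = rgb_image.reshape(-1, 3)
    feats = PCA(n_components=2).fit_transform(pixels)   # decorrelate colours
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(feats)
    # Heuristic assumption: the cluster with the lowest mean blue value is the
    # brown DAB stain; the remaining clusters are hematoxylin and background.
    means = np.array([pixels[labels == c].mean(axis=0) for c in range(n_clusters)])
    dab = int(np.argmin(means[:, 2]))
    frac = float((labels == dab).mean())
    return frac, labels.reshape(rgb_image.shape[:2])
```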


2021 ◽  
Author(s):  
Faizan Ur Rahman ◽  
Soosan Beheshti

Transforming data to feature space using a kernel function can result in better expression of its features, and hence better separability for some datasets. The parameters of the kernel function govern the structure of the data in feature space and need to be optimized while the number of clusters in a dataset is estimated simultaneously. The proposed method, denoted kernel k-Minimum Average Central Error (kernel k-MACE), estimates the number of clusters in a dataset while simultaneously clustering the dataset in feature space by finding the optimum value of the Gaussian kernel parameter σk. A cluster initialization technique is also proposed, based on an existing method for k-means clustering. Simulations show that for self-generated datasets with Gaussian clusters having 10%–50% overlap and for real benchmark datasets, the proposed method outperforms multiple state-of-the-art unsupervised clustering methods, including k-MACE, the clustering scheme that inspired kernel k-MACE.
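
The following is a heavily hedged sketch of the outer search structure only: a grid over the Gaussian kernel width and the cluster count, with kernel clustering performed via RBF-affinity spectral clustering and a generic internal validity index (silhouette) standing in for the MACE criterion. The actual kernel k-MACE objective and its cluster initialization are defined in the paper and are not reproduced here.

```python
# Hedged sketch of the joint search over the kernel width sigma and the
# cluster count k.  Spectral clustering with an RBF affinity stands in for
# kernel clustering, and the silhouette score stands in for the MACE error.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

def grid_search_clusters(X, sigmas=(0.5, 1.0, 2.0), ks=range(2, 8)):
    best = (None, -np.inf, None, None)           # labels, score, k, sigma
    for sigma in sigmas:
        gamma = 1.0 / (2 * sigma ** 2)
        for k in ks:
            labels = SpectralClustering(n_clusters=k, affinity="rbf",
                                        gamma=gamma,
                                        random_state=0).fit_predict(X)
            score = silhouette_score(X, labels)
            if score > best[1]:
                best = (labels, score, k, sigma)
    return best
```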


Author(s):  
S. Schmitz ◽  
U. Weidner ◽  
H. Hammer ◽  
A. Thiele

Abstract. In this paper, the nonlinear dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) is investigated to visualize the information contained in high-dimensional feature representations of Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) data. Based on polarimetric parameters, target decomposition methods, and interferometric coherences, a wide range of features is extracted that spans the high-dimensional feature space. UMAP is applied to determine a representation of the data in 2D and 3D Euclidean space that preserves local and global structures of the data and is still suited for classification. The performance of UMAP in terms of generating expressive visualizations is evaluated on PolInSAR data acquired by the F-SAR sensor and compared to that of Principal Component Analysis (PCA), Laplacian Eigenmaps (LE), and t-distributed Stochastic Neighbor Embedding (t-SNE). For this purpose, a visual analysis of 2D embeddings is performed. In addition, a quantitative analysis is provided to evaluate how well the low-dimensional representations preserve information with respect to the separability of different land cover classes. The results show that UMAP exceeds the capability of PCA and LE in these regards and is competitive with t-SNE.
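
A minimal sketch of the comparison workflow, assuming the PolInSAR features are already stacked into a sample-by-feature matrix: the same standardized vectors are embedded into 2D with UMAP, PCA, Laplacian Eigenmaps (via scikit-learn's SpectralEmbedding), and t-SNE. Parameter values are the libraries' common defaults, not necessarily those used in the paper.

```python
# Hedged sketch: embed the same standardized feature vectors with the four
# methods compared above, for side-by-side 2D visualization.
import umap                                      # pip install umap-learn
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, SpectralEmbedding
from sklearn.preprocessing import StandardScaler

def embed_all(features, n_neighbors=15, min_dist=0.1, random_state=0):
    """features: (n_samples, n_features) matrix of PolInSAR-derived features."""
    X = StandardScaler().fit_transform(features)
    return {
        "UMAP": umap.UMAP(n_components=2, n_neighbors=n_neighbors,
                          min_dist=min_dist,
                          random_state=random_state).fit_transform(X),
        "PCA": PCA(n_components=2).fit_transform(X),
        "LE": SpectralEmbedding(n_components=2,
                                n_neighbors=n_neighbors).fit_transform(X),
        "t-SNE": TSNE(n_components=2,
                      random_state=random_state).fit_transform(X),
    }
```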


2020 ◽  
Vol 2020 ◽  
pp. 1-21 ◽  
Author(s):  
Hong Yang ◽  
Yasheng Zhang ◽  
Wenzhe Ding

Feature extraction is the key step of Inverse Synthetic Aperture Radar (ISAR) image recognition. However, limited by the cost and conditions of ISAR image acquisition, it is relatively difficult to obtain large-scale sample data, which makes it hard to learn deep target features with good discriminability using currently popular deep learning methods. In this paper, a new method for low-dimensional, strongly robust, and fast space target ISAR image recognition based on local and global structural feature fusion is proposed. The method performs the trace transform along the longest axis of the ISAR image to generate the global trace feature of the space target ISAR image. By introducing a local structural feature, the Local Binary Pattern (LBP), complementary fusion of the global and local features is achieved, which compensates for the structural information missing from the trace feature and ensures the integrity of the ISAR image feature information. A representation of the trace and LBP features in a low-dimensional mapping feature space is found using a manifold learning method. Under the condition of maintaining the local neighborhood relationships of the original feature space, the effective fusion of trace and LBP features is achieved. Thus, in practical applications, recognition accuracy is no longer affected by the choice of trace functional, the number of LBP feature blocks, or other factors, giving the algorithm high robustness. To verify the effectiveness of the proposed algorithm, an ISAR image database containing 1325 samples of 5 types of space targets is used for experiments. The results show that the classification accuracy for the 5 types of space targets exceeds 99% and that the recognition accuracy is no longer affected by the trace feature and LBP feature selection, demonstrating strong robustness. The proposed method provides a fast and effective high-precision model for space target feature extraction and offers a reference for efficient identification of space objects under small-sample conditions.
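
As a hedged illustration of the local/global fusion idea, the sketch below concatenates an LBP histogram with a crude radial intensity profile (a placeholder for the trace feature, which is not reproduced) and maps the fused vectors to a low-dimensional space, with locally linear embedding standing in for the paper's manifold learning step.

```python
# Hedged sketch of the feature-fusion idea: local LBP histogram + a simple
# global descriptor, reduced with a manifold learner.  The radial profile is
# only a placeholder for the trace feature.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.manifold import LocallyLinearEmbedding

def lbp_histogram(image, P=8, R=1.0):
    codes = local_binary_pattern(image, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def radial_profile(image, n_bins=16):
    """Crude global descriptor: mean intensity in concentric rings."""
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    idx = np.minimum((r / (r.max() + 1e-9) * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(idx.ravel(), weights=image.ravel(), minlength=n_bins)
    counts = np.bincount(idx.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)

def fuse_and_embed(images, n_components=10, n_neighbors=12):
    feats = np.array([np.concatenate([lbp_histogram(im), radial_profile(im)])
                      for im in images])
    return LocallyLinearEmbedding(n_components=n_components,
                                  n_neighbors=n_neighbors).fit_transform(feats)
```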


Author(s):  
Dongjing Shan ◽  
Chao Zhang

In this paper, we propose a prior fusion and feature transformation-based principal component analysis (PCA) method for saliency detection. It relies on the internal statistics of the patches in the image to identify unique patterns, and all processing is done only once. First, three low-level priors are incorporated and act as guidance cues in the model; second, to ensure the validity of the PCA distinctness model, a linear transform of the feature space is designed and trained; furthermore, an extended optimization framework is utilized to generate a smoothed saliency map based on the consistency of adjacent patches. We compare three versions of our model with seven previous methods on several benchmark datasets. Different evaluation strategies are adopted, and the results demonstrate that our model achieves state-of-the-art performance.
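
A minimal sketch of the core patch-distinctness computation that PCA-based saliency models build on: patches are projected onto their principal components and scored by their L1 distance from the average patch. The paper's prior fusion, learned feature transform, and smoothing optimization are not reproduced.

```python
# Hedged sketch of patch distinctness via PCA: score each patch by its L1
# distance from the average patch in PCA coordinates.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.image import extract_patches_2d

def pca_distinctness(gray_image, patch_size=(8, 8), n_components=16):
    patches = extract_patches_2d(gray_image, patch_size)   # all overlapping patches
    flat = patches.reshape(len(patches), -1)
    flat = flat - flat.mean(axis=0)               # centre on the average patch
    coords = PCA(n_components=n_components).fit_transform(flat)
    return np.abs(coords).sum(axis=1)             # L1 distinctness per patch
```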


2018 ◽  
Author(s):  
Toni Bakhtiar

Kernel Principal Component Analysis (kernel PCA) is a generalization of ordinary PCA that maps the original data into a high-dimensional feature space. The mapping is expected to address nonlinearity among variables and poor separation among classes in the original data space. The key problem in using kernel PCA is estimating the parameter of the kernel function, for which there is so far no clear guidance; parameter selection largely remains at the discretion of the researcher. This study exploits the Gaussian kernel function and focuses on the ability of kernel PCA to visualize the separation of classified data. Assessment was based on the misclassification rate obtained by Fisher Linear Discriminant Analysis on the first two principal components. The results suggest that, for visualization, kernel PCA with the parameter selected in the interval between the closest and the farthest distances among the objects of the original data performs better than ordinary PCA.
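
A minimal sketch of the selection rule suggested above: Gaussian-kernel PCA embeddings are computed with the kernel parameter drawn from the interval between the smallest and largest pairwise distances of the original data, keeping two components for visualization. The mapping gamma = 1/(2s²) between a distance-scale parameter s and scikit-learn's gamma is an assumption.

```python
# Hedged sketch: Gaussian-kernel PCA with candidate kernel widths taken from
# the range of pairwise distances in the original data.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics import pairwise_distances

def kpca_2d(X, n_candidates=5):
    d = pairwise_distances(X)
    d_min = d[d > 0].min()                       # closest pair of distinct objects
    d_max = d.max()                              # farthest pair
    embeddings = {}
    for s in np.linspace(d_min, d_max, n_candidates):
        kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0 / (2 * s ** 2))
        embeddings[float(s)] = kpca.fit_transform(X)
    return embeddings
```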


2021 ◽  
Author(s):  
Usman Muhammad ◽  
Md Ziaul Hoque ◽  
Mourad Oussalah ◽  
Anja Keskinarkaus ◽  
Tapio Seppänen ◽  
...  

COVID-19 is a rapidly spreading viral disease that has affected over 100 countries worldwide. The numbers of casualties and infected cases have escalated, particularly in vulnerable states with weakened healthcare systems. Currently, reverse transcription-polymerase chain reaction (RT-PCR) is the test of choice for diagnosing COVID-19. However, current evidence suggests that COVID-19 patients mostly develop a lung infection after coming into contact with the virus. Therefore, chest X-ray (i.e., radiography) and chest CT can serve as surrogates in countries where PCR is not readily available. This has prompted the scientific community to detect COVID-19 infection from X-ray images, and recently proposed machine learning methods offer great promise for fast and accurate detection. Deep learning with convolutional neural networks (CNNs) has been successfully applied to radiological imaging to improve diagnostic accuracy. However, performance remains limited by the lack of representative X-ray images in public benchmark datasets. To alleviate this issue, we propose an attention mechanism for data augmentation in the feature space rather than in the data space using reconstruction independent component analysis (RICA). Specifically, a unified architecture is proposed that contains a deep convolutional neural network (CNN), an attention mechanism, and a bidirectional LSTM (BiLSTM). The CNN provides high-level features extracted at the pooling layer, from which the attention mechanism chooses the most relevant features and generates low-dimensional augmented features. Finally, the BiLSTM is used to classify the processed sequential information. Experiments on two publicly available databases show that the proposed approach achieves state-of-the-art results, with accuracies of 97% and 84%, while also generating explanations. Explainability analysis was carried out using feature visualization through PCA projection and t-SNE plots.
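
A heavily hedged Keras sketch of the overall CNN-attention-BiLSTM pipeline described above; the backbone, layer sizes, and the soft attention layer are illustrative, and the RICA-based feature-space augmentation is not reproduced.

```python
# Hedged sketch of a CNN -> soft attention -> BiLSTM classifier.  All layer
# choices are assumptions for illustration, not the paper's architecture.
from tensorflow.keras import layers, models

def build_model(input_shape=(224, 224, 3), n_classes=2):
    inp = layers.Input(shape=input_shape)
    # CNN backbone: pooled feature maps serve as a sequence of region features.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(4)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(4)(x)                    # (14, 14, 64)
    seq = layers.Reshape((14 * 14, 64))(x)           # 196 region descriptors
    # Soft attention: score each region and reweight its features.
    scores = layers.Dense(1)(seq)
    weights = layers.Softmax(axis=1)(scores)
    attended = layers.Multiply()([seq, weights])
    # Bidirectional LSTM over the attended sequence, then classification.
    h = layers.Bidirectional(layers.LSTM(64))(attended)
    out = layers.Dense(n_classes, activation="softmax")(h)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```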


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Junlin Hu ◽  
Liang Wang ◽  
Fuqing Duan ◽  
Ping Guo

Scene classification is a challenging problem in computer vision and can be used to model and analyze a particular complex system, the internet community. Spatial PACT (Principal component Analysis of Census Transform histograms) is a promising representation for recognizing instances and categories of scenes. However, because the original spatial PACT simply concatenates the compact census transform histograms of all levels, every level contributes equally, which ignores the differences among levels. To address this, we propose an adaptive multilevel kernel machine method for scene classification. First, it computes a set of base kernels at each level. Second, an effective adaptive weight learning scheme is employed to find the optimal weights for fusing these base kernels. Finally, a support vector machine with the optimal kernel is used for scene classification. Experiments on two popular benchmark datasets demonstrate that the proposed adaptive multilevel kernel machine outperforms the original spatial PACT. Moreover, the proposed method is simple and easy to implement.
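
As a hedged illustration of the multilevel-kernel idea: one RBF base kernel per pyramid level is combined with per-level weights into a single kernel for an SVM with a precomputed kernel. The simple heuristic below (weights proportional to each level's cross-validated accuracy) stands in for the paper's adaptive weight learning scheme.

```python
# Hedged sketch: per-level RBF base kernels, heuristic per-level weights,
# and an SVM trained on the weighted kernel combination.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def combined_kernel_svm(level_features, y, gamma=0.5):
    """level_features: list of (n_samples, d_l) arrays, one per pyramid level."""
    kernels = [rbf_kernel(F, F, gamma=gamma) for F in level_features]
    # Heuristic weights from each level's own cross-validated accuracy.
    accs = [cross_val_score(SVC(kernel="precomputed"), K, y, cv=3).mean()
            for K in kernels]
    w = np.array(accs) / np.sum(accs)
    K_combined = sum(wi * Ki for wi, Ki in zip(w, kernels))
    clf = SVC(kernel="precomputed").fit(K_combined, y)
    return clf, w
```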

