A Novel Approach to Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE

Author(s):  
Mujtaba Husnain ◽  
Malik Muhammad Saad Missen ◽  
Shahzad Mumtaz ◽  
Muhammad Muzzamil Luqman ◽  
Mickaël Coustaty ◽  
...  
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Olivier B. Simon ◽  
Isabelle Buard ◽  
Donald C. Rojas ◽  
Samantha K. Holden ◽  
Benzi M. Kluger ◽  
...  

Graph theory-based approaches are efficient tools for detecting clustering and group-wise differences in high-dimensional data across a wide range of fields, such as gene expression analysis and neural connectivity. Here, we examine data from a cross-sectional, resting-state magnetoencephalography study of 89 Parkinson’s disease patients, and use minimum-spanning tree (MST) methods to relate severity of Parkinsonian cognitive impairment to neural connectivity changes. In particular, we implement the two-sample multivariate-runs test of Friedman and Rafsky (Ann Stat 7(4):697–717, 1979) and find it to be a powerful paradigm for distinguishing highly significant deviations from the null distribution in high-dimensional data. We also generalize this test for use with more than two classes, and show its ability to localize significance to particular sub-classes. We observe multiple indications of altered connectivity in Parkinsonian dementia that may be of future use in diagnosis and prediction.
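The Friedman–Rafsky statistic described above can be sketched briefly: pool both samples, build an MST on the pooled points, and count the edges that join points from different samples; far fewer cross-sample edges than expected under the null indicates the two distributions differ. The following is a minimal sketch of that count (the significance calibration against the permutation null is omitted, and the function name is ours):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def friedman_rafsky_runs(X, labels):
    """Count MST edges joining points with different sample labels.

    Under the null (both samples from one distribution) this count is
    large; well-separated samples yield very few cross-sample edges.
    """
    D = squareform(pdist(X))              # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).tocoo()  # MST on the pooled sample
    labels = np.asarray(labels)
    # edges whose two endpoints carry different labels
    return int(np.sum(labels[mst.row] != labels[mst.col]))
```

For two well-separated point clouds the MST bridges them exactly once, so the count drops to 1, whereas two halves of one distribution share many cross edges.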


Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 107 ◽  
Author(s):  
Mujtaba Husnain ◽  
Malik Missen ◽  
Shahzad Mumtaz ◽  
Muhammad Luqman ◽  
Mickaël Coustaty ◽  
...  

We applied t-distributed stochastic neighbor embedding (t-SNE) to visualize Urdu handwritten numerals (or digits). The data set used consists of 28 × 28 images of handwritten Urdu numerals. The data set was created by inviting writers from different categories of native Urdu speakers. One of the challenging and critical issues for the correct visualization of Urdu numerals is the shape similarity between some of the digits. This issue was resolved using t-SNE, by exploiting local and global structures of the large data set at different scales. The global structure consists of geometrical features, and the local structure is the pixel-based information for each class of Urdu digits. We introduce a novel approach that allows the fusion of these two independent spaces using Euclidean pairwise distances in a highly organized and principled way. The fusion matrix embedded with t-SNE helps to locate each data point in a two- (or three-) dimensional map in a very different way. Furthermore, our proposed approach focuses on preserving the local structure of the high-dimensional data while mapping to a low-dimensional plane. The visualizations produced by t-SNE outperformed other classical techniques like principal component analysis (PCA) and auto-encoders (AE) on our handwritten Urdu numeral dataset.
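The pairwise-fusion idea can be sketched as follows: each feature space (global geometric, local pixel-based) contributes a Euclidean distance matrix, the two matrices are blended into one fusion matrix, and t-SNE is run on it with a precomputed metric. The blending weight `alpha` and the min–max scaling below are our assumptions, not the paper's exact construction:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

def fused_tsne(features_global, features_local, alpha=0.5, random_state=0):
    """Embed points by fusing two pairwise-distance matrices, then t-SNE.

    Each feature space contributes a Euclidean distance matrix; both are
    scaled to [0, 1] and blended before a precomputed-metric t-SNE run.
    (alpha and the scaling are illustrative assumptions.)
    """
    Dg = squareform(pdist(features_global))  # global-feature distances
    Dl = squareform(pdist(features_local))   # local pixel-space distances
    Dg = Dg / Dg.max()                       # put both on a common scale
    Dl = Dl / Dl.max()
    D = alpha * Dg + (1 - alpha) * Dl        # pairwise fusion matrix
    tsne = TSNE(n_components=2, metric="precomputed", init="random",
                perplexity=5, random_state=random_state)
    return tsne.fit_transform(D)
```

Note that scikit-learn's `TSNE` requires `init="random"` when the metric is precomputed, since PCA initialization needs raw feature vectors.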


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Jan Kalina ◽  
Anna Schlenker

The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, the usual relevance and redundancy criteria have the disadvantages of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. In particular, redundancy is measured by a new regularized version of the coefficient of multiple correlation, and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we also perform the computations for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
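For orientation, the baseline MRMR criterion that MRRMRR robustifies can be sketched as a greedy search: pick the variable most correlated with the response, then repeatedly add the variable maximizing relevance minus its mean correlation with the already-selected set. This sketch uses plain Pearson correlation for both terms; the paper replaces them with regularized and robust estimators:

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy baseline MRMR: maximize relevance minus redundancy.

    Relevance: |Pearson correlation| with the response; redundancy:
    mean |correlation| with already-selected variables. MRRMRR swaps
    both for regularized/robust counterparts.
    """
    n, p = X.shape
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    selected = [int(np.argmax(rel))]       # start with the most relevant
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            score = rel[j] - red           # relevance penalized by redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

The redundancy penalty is what keeps a near-duplicate of an already-selected variable from being chosen ahead of an independent, moderately relevant one.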


2011 ◽  
Vol 366 ◽  
pp. 456-459 ◽  
Author(s):  
Jun Yang ◽  
Ying Long Wang

Detecting outliers in a large set of data objects is a major data mining task, aiming at finding the different mechanisms responsible for different groups of objects in a data set. In high-dimensional data, such approaches are bound to deteriorate due to the notorious “curse of dimensionality”. In this paper, we propose a novel approach named ODMC (Outlier Detection Based on Markov Chain), in which the effects of the “curse of dimensionality” are alleviated compared with purely distance-based approaches. A main advantage of the new approach is that it uses a key feature of an undirected weighted graph to calculate the outlier degree of each node. In a thorough experimental evaluation, we compare ODMC with ABOD and FindFPOF on various artificial and real data sets and show that ODMC performs especially well on high-dimensional data.
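The abstract does not spell out the exact ODMC construction, but the general idea of scoring outliers through a Markov chain on a weighted graph can be sketched generically: connect points by Gaussian similarities, run a random walk to its stationary distribution, and flag nodes carrying little stationary mass. The kernel bandwidth and scoring below are our assumptions, not the paper's method:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def markov_outlier_scores(X, sigma=1.0):
    """Outlier degree from a random walk on a similarity graph.

    Edges carry Gaussian similarities; the walk's stationary
    distribution concentrates on densely connected nodes, so low
    stationary mass marks outliers. (A generic sketch only; the exact
    ODMC graph construction is not reproduced here.)
    """
    W = np.exp(-squareform(pdist(X)) ** 2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)               # no self-loops
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transitions
    pi = np.full(len(X), 1.0 / len(X))     # start from the uniform law
    for _ in range(200):                   # power iteration to stationarity
        pi = pi @ P
    return 1.0 - pi / pi.max()             # higher score = more outlying
```

On an undirected weighted graph the stationary probability of a node is proportional to its total edge weight, so isolated points receive the highest scores.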


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Tuba Koç

High-dimensional data sets frequently occur in several scientific areas, and special techniques are required to analyze these types of data sets. In particular, it becomes important to apply a suitable model in classification problems. In this study, a novel approach is proposed to estimate a statistical model for high-dimensional data sets. The proposed method uses the analytic hierarchy process (AHP) and information criteria to determine the optimal principal components (PCs) for the classification model. The high-dimensional “colon” and “gravier” datasets were used in the evaluation. Application results demonstrate that the proposed approach can be successfully used for modeling purposes.
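The information-criterion half of this procedure can be sketched as follows: fit a classifier on the first m principal components for each candidate m and keep the m minimizing BIC. The AHP weighting of candidate models is omitted here, and the logistic model and BIC form are our assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def select_pcs_by_bic(X, y, max_pcs=10):
    """Pick the number of PCs for a classifier via an information criterion.

    Fits a logistic model on the first m principal components for each m
    and keeps the m minimizing BIC. (The paper additionally combines
    criteria with AHP; that step is omitted in this sketch.)
    """
    n = len(y)
    Z = PCA(n_components=min(max_pcs, X.shape[1], n)).fit_transform(X)
    best_m, best_bic = 1, np.inf
    for m in range(1, Z.shape[1] + 1):
        clf = LogisticRegression(max_iter=1000).fit(Z[:, :m], y)
        # log-likelihood of each observed label under the fitted model
        p = clf.predict_proba(Z[:, :m])[np.arange(n), y]
        loglik = np.sum(np.log(p))
        bic = -2 * loglik + (m + 1) * np.log(n)  # m slopes + intercept
        if bic < best_bic:
            best_m, best_bic = m, bic
    return best_m
```

BIC's log(n) penalty per extra component keeps uninformative trailing PCs out of the final model, which matters when p far exceeds n as in the colon and gravier data.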

