Dimensionality reduction techniques for multivariate data classification, interactive visualization, and analysis-systematic feature selection vs. extraction

Author(s):  
A. Konig
2011 ◽  
Vol 08 (02) ◽  
pp. 161-169
Author(s):  
E. SIVASANKAR ◽  
H. SRIDHAR ◽  
V. BALAKRISHNAN ◽  
K. ASHWIN ◽  
R. S. RAJESH

Data mining methods are used to extract useful information from voluminous data. The data to be mined may have a large number of dimensions, which makes the mining process slow. In general, the computation time is an exponential function of the number of dimensions. It is in this context that we use dimensionality reduction techniques to speed up the decision-making process. Dimensionality reduction techniques can be categorized as feature selection and feature extraction techniques. In this paper we compare the two categories. Feature selection has been implemented using the Information Gain and Goodman–Kruskal measures. Principal Component Analysis has been used for feature extraction. To compare the accuracy of the methods, we have also implemented a classifier using a back-propagation neural network. In general, it is found that feature extraction methods are more accurate than feature selection methods in the framework of credit risk analysis.
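As a rough illustration of the information-gain criterion named in the abstract above, the following minimal sketch ranks discrete features by how much they reduce the entropy of the class label. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `select_top_k`) and the toy data are assumptions made for this example, and continuous features would need to be discretized first.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    # Weighted entropy of the labels within each group of equal feature values.
    conditional = sum(
        w * entropy(labels[feature == v]) for v, w in zip(values, weights)
    )
    return entropy(labels) - conditional

def select_top_k(X, y, k):
    """Return the indices of the k highest-gain columns and all gains."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-gains, kind="stable")
    return order[:k], gains

# Hypothetical toy data: column 0 determines the label, columns 1 and 2 do not.
X = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
])
y = np.array([0, 0, 1, 1])

selected, gains = select_top_k(X, y, k=1)
print("selected columns:", selected.tolist())
print("gains:", gains.tolist())
```

Feature selection of this kind keeps a subset of the original columns unchanged, which is what distinguishes it from extraction methods such as Principal Component Analysis.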


2020 ◽  
Vol 1 (2) ◽  
pp. 56-70 ◽  
Author(s):  
Rizgar Zebari ◽  
Adnan Abdulazeez ◽  
Diyar Zeebaree ◽  
Dilovan Zebari ◽  
Jwan Saeed

Due to sharp increases in data dimensionality, data mining and machine learning (ML) tasks require more efficient techniques to achieve the desired results. In recent years, researchers have therefore proposed and developed many methods to reduce the high dimensionality of data while attaining the required accuracy. To improve the accuracy of learned features and to decrease training time, dimensionality reduction is used as a pre-processing step that can eliminate irrelevant data, noise, and redundant features. Dimensionality reduction (DR) is performed with two main methods: feature selection (FS) and feature extraction (FE). FS is considered important because data is generated continuously at an ever-increasing rate; it can mitigate serious dimensionality problems by effectively decreasing redundancy, eliminating irrelevant data, and improving the comprehensibility of results. FE, in turn, deals with the problem of finding the most distinctive, informative, and reduced set of features to improve the efficiency of both processing and storage of data. This paper offers a comprehensive approach to FS and FE in the scope of DR. The details of each reviewed paper, such as the algorithms/approaches used, datasets, classifiers, and achieved results, are analyzed and summarized. In addition, a systematic discussion of all reviewed methods highlights authors' trends, identifies the method(s) that significantly reduced computational time, and selects the most accurate classifiers. Finally, the different types of both methods are discussed and the findings analyzed.
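To make the FE side of the distinction above concrete, here is a minimal sketch of one common extraction method, PCA computed through a singular value decomposition. It is a generic illustration rather than any method taken from the reviewed papers; the function name `pca_extract` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def pca_extract(X, n_components):
    """Project X onto its top principal components.

    Returns the projected data and the fraction of total variance
    explained by each retained component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / (S ** 2).sum()
    return X_centered @ components.T, explained[:n_components]

# Hypothetical data: column 1 is nearly a multiple of column 0 (redundant),
# and column 2 is low-variance noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.01 * rng.normal(size=200),
    0.01 * rng.normal(size=200),
])

Z, ratio = pca_extract(X, n_components=1)
print("reduced shape:", Z.shape)
print("explained variance ratio:", ratio.tolist())
```

Unlike FS, the single extracted feature here is a linear combination of all original columns, so it trades interpretability of individual features for a more compact representation.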


2015 ◽  
Vol 294 ◽  
pp. 553-564 ◽  
Author(s):  
Manuel Domínguez ◽  
Serafín Alonso ◽  
Antonio Morán ◽  
Miguel A. Prada ◽  
Juan J. Fuertes

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Van Hoan Do ◽  
Stefan Canzar

Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.


2019 ◽  
Vol 165 ◽  
pp. 104-111 ◽  
Author(s):  
S. Velliangiri ◽  
S. Alagumuthukrishnan ◽  
S. Iwin Thankumar Joseph
