Dimensionality Reduction Techniques for High-dimensional Data in Precision Agriculture

High Dimensional Data ◽

Principal Component ◽

Component Analysis ◽

High Dimensional ◽

Reduction Techniques ◽

Probabilistic Principal Component Analysis

Low Dimensional ◽

Data mining is a major area of research, and clustering is one of its main functions. High dimensionality is a central obstacle to clustering, and dimensionality reduction can be used to address it. The present work compares two dimensionality reduction techniques, t-distributed stochastic neighbour embedding (t-SNE) and probabilistic principal component analysis (PPCA), in the context of clustering. High-dimensional data were reduced to low-dimensional data using each technique. Cluster analysis was then performed on the high-dimensional data and on the low-dimensional data sets obtained through t-SNE and PPCA, with a varying number of clusters. Mean squared error, time, and space were used as the comparison criteria. The results show that converting the high-dimensional data into low-dimensional data takes longer with PPCA than with t-SNE. The data set reduced through PPCA requires less storage space than the data set reduced through t-SNE.
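As an illustration of the mean squared error criterion mentioned in this abstract, the following is a minimal sketch, not the authors' code. It assumes the error is the average squared Euclidean distance from each point to the centroid of its assigned cluster; the function name `clustering_mse` and the toy data are hypothetical.

```python
from collections import defaultdict


def clustering_mse(points, labels):
    """Mean squared distance from each point to the centroid of its cluster.

    Assumption: this is one common reading of "mean squared error" for a
    clustering result; the abstract does not define the measure.
    """
    if not points or len(points) != len(labels):
        raise ValueError("points and labels must be non-empty and equal in length")

    # Group the points by their cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Accumulate squared Euclidean distances to that centroid.
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total / len(points)


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
two_clusters = clustering_mse(points, [0, 0, 1, 1])
one_cluster = clustering_mse(points, [0, 0, 0, 0])
```

The same function could be applied to cluster assignments obtained on the original data and on each reduced data set, which is the kind of comparison the abstract describes.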

2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) ◽

Performance Evaluation of Dimensionality Reduction Techniques on High Dimensional Data

10.1109/icoei.2019.8862526 ◽

2019 ◽

Author(s):

Mandikal Vikram ◽

Rakesh Pavan ◽

Navadiya Dhruvikkumar Dineshbhai ◽

Biju Mohan

Keyword(s):

Performance Evaluation ◽

High Dimensional Data ◽

High Dimensional ◽

Reduction Techniques ◽

Overview and comparative study of dimensionality reduction techniques for high dimensional data

Information Fusion ◽

10.1016/j.inffus.2020.01.005 ◽

2020 ◽

Vol 59 ◽

pp. 44-58 ◽

Cited By ~ 9

Author(s):

Shaeela Ayesha ◽

Muhammad Kashif Hanif ◽

Ramzan Talib

Keyword(s):

Comparative Study ◽

High Dimensional Data ◽

High Dimensional ◽

Reduction Techniques ◽

A generalization of t-SNE and UMAP to single-cell multimodal omics

Genome Biology ◽

10.1186/s13059-021-02356-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Van Hoan Do ◽

Stefan Canzar

Keyword(s):

Single Cell ◽

Cell Types ◽

High Dimensional ◽

Omics Data ◽

Relative Contribution ◽

Reduction Techniques ◽

Concise Representation ◽

Cellular Identity

Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.

Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Scientific Programming ◽

10.1155/2015/180214 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Sai Kiranmayee Samudrala ◽

Jaroslaw Zola ◽

Srinivas Aluru ◽

Baskar Ganapathysubramanian

Keyword(s):

Organic Solar Cells ◽

Large Scale ◽

Parallel Implementation ◽

High Dimensional Data ◽

Real Life ◽

Processing Parameters ◽

High Dimensional ◽

Morphology Evolution ◽

Reduction Techniques

Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

Recent Dimensionality Reduction Techniques for Visualizing High-Dimensional Parkinson’s Disease Omics Data

10.1109/bigdata52589.2021.9671736 ◽

2021 ◽

Author(s):

Marios G. Krokidis ◽

Georgios Dimitrakopoulos ◽

Aristidis G. Vrahatis ◽

Themis P. Exarchos ◽

Panagiotis Vlamos

Keyword(s):

Parkinson's Disease ◽

High Dimensional ◽

Omics Data ◽

Reduction Techniques ◽

A SURVEY ON THE CURES FOR THE CURSE OF DIMENSIONALITY IN BIG DATA

Asian Journal of Pharmaceutical and Clinical Research ◽

10.22159/ajpcr.2017.v10s1.19755 ◽

2017 ◽

Vol 10 (13) ◽

pp. 355 ◽

Cited By ~ 1

Author(s):

Reshma Remesh ◽

Pattabiraman. V

Keyword(s):

Input Data ◽

Principal Component ◽

Kernel Principal Component Analysis ◽

High Dimensional ◽

Data Sets ◽

Learning Approaches ◽

Data Set ◽

Reduction Techniques ◽

Dimensionality reduction techniques are used to reduce the complexity of analysing high-dimensional data sets. The raw input data set may have many dimensions, and analysis can be slow and lead to wrong predictions if unnecessary data attributes are considered. Using dimensionality reduction techniques, one can reduce the dimensions of the input data and obtain accurate predictions at lower cost. This paper studies different machine learning approaches used for dimensionality reduction, such as PCA, SVD, LDA, Kernel Principal Component Analysis, and Artificial Neural Networks.
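To make the PCA step named in this abstract concrete, here is a minimal sketch using only the Python standard library. It is not taken from the paper: it estimates the first principal component by power iteration on the sample covariance matrix, and the function name `first_principal_component` and the toy data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component.

    Assumption: power iteration on the sample covariance matrix; it can
    stall if the start vector is orthogonal to the leading eigenvector.
    """
    n, dim = len(rows), len(rows[0])

    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]

    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]

    # Power iteration: repeatedly multiply by cov and renormalize.
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # zero covariance: no direction to find
        vec = [x / norm for x in nxt]

    # One-dimensional representation: projection onto the direction.
    scores = [sum(c[j] * vec[j] for j in range(dim)) for c in centered]
    return vec, scores


# Toy data (hypothetical): points lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
```

For this toy data the two-dimensional points are fully described by a single score per point, since all variance lies along one direction. In practice a library routine based on SVD would normally be preferred over hand-written power iteration.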

Analysis of unsupervised dimensionality reduction techniques

Computer Science and Information Systems ◽

10.2298/csis0902217k ◽

2009 ◽

Vol 6 (2) ◽

pp. 217-227 ◽

Cited By ~ 29

Author(s):

Aswani Kumar

Keyword(s):

Approximation Error ◽

High Dimensional ◽

Retrieval Task ◽

Document Collections ◽

Noise Effects ◽

Reduction Techniques ◽

Text Images ◽

High Dimensional Datasets

Domains such as text and images contain large amounts of redundancy and ambiguity among the attributes, which result in considerable noise effects (i.e., the data is high-dimensional). Retrieving data from high-dimensional datasets is a big challenge. Dimensionality reduction techniques have been a successful avenue for automatically extracting the latent concepts by removing the noise and reducing the complexity of processing high-dimensional data. In this paper we conduct a systematic study comparing unsupervised dimensionality reduction techniques for the text retrieval task. We analyze these techniques in terms of complexity, approximation error, and retrieval quality, with experiments on four test document collections.

Feature Selection

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch135 ◽

2011 ◽

pp. 878-882

Author(s):

Damien François

Keyword(s):

Feature Selection ◽

Time Series Prediction ◽

High Dimensional Data ◽

Principal Component ◽

Point Of View ◽

High Dimensional ◽

Feature Subset ◽

Selection Methods ◽

Reduction Techniques ◽

In many applications, such as function approximation, pattern recognition, time series prediction, and data mining, one has to build a model relating some features describing the data to some response value. Often, the features that are relevant for building the model are not known in advance. Feature selection methods remove irrelevant and/or redundant features, keeping only the feature subset that is most useful for building a prediction model. The resulting model is simpler and easier to interpret, which reduces the risks of overfitting, non-convergence, etc. In contrast to other dimensionality reduction techniques such as principal component analysis or more recent nonlinear projection techniques (Lee & Verleysen 2007), which build a new, smaller set of features, the features selected by feature selection methods preserve their initial meaning, potentially bringing extra information about the process being modeled (Guyon 2006). Recently, the advent of high-dimensional data has raised new challenges for feature selection methods, from both the algorithmic and the conceptual point of view (Liu & Motoda 2007). The problem of feature selection is exponential in nature, and many approximate algorithms are cubic with respect to the initial number of features, which may be intractable when the dimensionality of the data is large. Furthermore, high-dimensional data are often highly redundant, and two distinct subsets of features may have very similar predictive power, which can make it difficult to identify the best subset.
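The point that selected features keep their original meaning can be illustrated with a small filter-style sketch. It is only one simple possibility, not a method from the entry: it ranks features by absolute Pearson correlation with the response and keeps the top k. The names `pearson` and `select_top_k` and the toy columns are hypothetical, and a univariate filter like this does not address the redundancy between features that the abstract mentions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features, response, k):
    """Return the names of the k features most correlated with the response.

    `features` maps a feature name to its column of values, so the result
    is a list of original, interpretable feature names.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], response)),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): one informative, one noisy, one constant feature.
response = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
    "noisy": [1.0, 3.0, 2.0, 5.0, 4.0],
    "signal": [2.0, 4.0, 6.0, 8.0, 10.0],
}
selected = select_top_k(features, response, k=2)
```

Unlike a projection such as PCA, the output here is a subset of the input column names, so each retained feature can still be interpreted directly.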

A Review on Dimensionality Reduction Techniques

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419500174 ◽

2019 ◽

Vol 33 (10) ◽

pp. 1950017 ◽

Cited By ~ 5

Author(s):

Xuan Huang ◽

Lei Wu ◽

Yinsong Ye

Keyword(s):

Pattern Recognition ◽

Implementation Process ◽

Recognition System ◽

Research Trend ◽

High Dimensional ◽

Open Problems ◽

Reduction Techniques ◽

Scope Of Application

High-dimensional data is ubiquitous in scientific research and industrial production. It brings a lot of information, but because of its sparsity and redundancy it also poses great challenges to data mining and pattern recognition. Dimensionality reduction can reduce redundancy and noise, reduce the complexity of learning algorithms, and improve classification accuracy; it is a key step in a pattern recognition system. In this paper, we give an overview of the classical techniques for dimensionality reduction, review their properties, and categorize them according to their implementation process. We derive each algorithm in detail and show its underlying mathematical principles intuitively. The focus is on uncovering the optimization process for each technique. We compare the characteristics and limitations of each technique, summarize its scope of application, and discuss a number of open problems and future research trends.