Nearest Neighbors Search Algorithm for High Dimensional Data

Locally Linear Embedding (LLE) is widely regarded as the first manifold learning algorithm. In general, the relation between a data point and its nearest neighbors is nonlinear, and LLE captures only its linear part. Local nonlinear embedding is therefore an important direction for improving LLE, but attempts in this direction can significantly increase computational complexity. In this paper, a novel algorithm called local quasi-linear embedding (LQLE) is proposed. In LQLE, each high-dimensional data vector is first expanded using the Kronecker product. The expanded vector contains not only the components of the original vector but also polynomials of those components. Each expanded vector is then linearly approximated by the expanded vectors of its nearest neighbors. In this way, LQLE achieves a degree of local nonlinearity and learns a dimensionality reduction that preserves this local nonlinear structure. More importantly, LQLE does not increase computational complexity, because it only replaces the data vectors with their Kronecker product expansions in the original LLE procedure. Experiments comparing the proposed method with four other algorithms on various datasets demonstrate its good performance.
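The two steps described in this abstract (Kronecker expansion, then the usual LLE neighbor reconstruction) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact expansion, so the second-order form used here (the original components followed by all pairwise products), the helper names `kron_expand` and `lle_weights`, and the regularization constant are all assumptions made for illustration.

```python
import numpy as np

def kron_expand(x):
    """Assumed expansion: original components followed by all products x_i * x_j."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, np.kron(x, x)])

def lle_weights(point, neighbors, reg=1e-3):
    """Standard LLE reconstruction weights of `point` from the rows of `neighbors`.

    Minimizes ||point - sum_k w_k * neighbors[k]||^2 subject to sum(w) == 1.
    """
    diffs = neighbors - point                      # (k, d) offsets to each neighbor
    gram = diffs @ diffs.T                         # (k, k) local Gram matrix
    k = gram.shape[0]
    # Regularize so the system stays solvable when k exceeds the dimension.
    gram = gram + (reg * np.trace(gram) + 1e-12) * np.eye(k)
    w = np.linalg.solve(gram, np.ones(k))
    return w / w.sum()

# Hypothetical data: 6 points in 3 dimensions, expanded to 3 + 9 = 12 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
Z = np.array([kron_expand(x) for x in X])

# Reconstruct the first expanded point from four other expanded points
# (chosen arbitrarily here; a real pipeline would use a nearest-neighbor search).
w = lle_weights(Z[0], Z[1:5])
print(Z.shape, w)
```

In this sketch the weight computation is ordinary LLE applied to the expanded vectors, which mirrors the abstract's description of swapping in the expansions while leaving the rest of the LLE procedure unchanged.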


Array-index: a plug&search K nearest neighbors method for high-dimensional data

Data & Knowledge Engineering, 2005, Vol 52 (3), pp. 333-352

DOI: 10.1016/s0169-023x(04)00126-0

Cited By ~ 14

Author(s): Z AGHBARI

Keyword(s): High Dimensional Data, Nearest Neighbors, High Dimensional, K Nearest Neighbors


Nearest neighbors in high-dimensional data

Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09, 2009

DOI: 10.1145/1553374.1553485

Cited By ~ 34

Author(s): Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović

Keyword(s): High Dimensional Data, Nearest Neighbors, High Dimensional


Missing value imputation for gene expression data by tailored nearest neighbors

Statistical Applications in Genetics and Molecular Biology, 2017, Vol 16 (2)

DOI: 10.1515/sagmb-2015-0098

Cited By ~ 5

Author(s): Shahla Faisal, Gerhard Tutz

Keyword(s): Gene Expression, Gene Expression Data, Missing Values, Human Cancer, High Dimensional Data, Nearest Neighbors, High Dimensional, Expression Data, Missing Value, Missing Value Imputation

Abstract: High-dimensional data such as gene expression and RNA-sequence data often contain missing values. Subsequent analyses based on these incomplete data can suffer strongly from the missing values. Several approaches to imputing missing values in gene expression data have been developed, but the task is difficult because of the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes, the distance is computed only over genes that are apt to contribute to the accuracy of the imputed values. The method aims to avoid the curse of dimensionality, which typically occurs when local methods such as nearest neighbors are applied in high-dimensional settings. The proposed weighted nearest neighbors algorithm is compared with existing missing value imputation techniques such as mean imputation, KNNimpute, and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values in high-dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
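The idea of measuring distance only over informative genes can be illustrated with a small sketch. It is not the procedure from the paper: the abstract does not say how genes are selected or how neighbors are weighted, so the correlation-based selection rule, the Gaussian kernel weights, and the names `impute_entry`, `n_genes`, `k`, and `bandwidth` are assumptions chosen for illustration. Rows are samples and columns are genes.

```python
import numpy as np

def impute_entry(X, i, j, n_genes=2, k=3, bandwidth=1.0):
    """Impute X[i, j] from the k samples closest to sample i on a few selected genes."""
    X = np.asarray(X, dtype=float)
    # Donors: other samples in which gene j is observed.
    donors = np.where(~np.isnan(X[:, j]))[0]
    donors = donors[donors != i]
    # Candidate genes: observed for sample i and for every donor.
    candidates = [g for g in range(X.shape[1])
                  if g != j
                  and not np.isnan(X[i, g])
                  and not np.isnan(X[donors, g]).any()]
    # Assumed selection rule: keep the genes most correlated with gene j.
    corr = np.nan_to_num([abs(np.corrcoef(X[donors, g], X[donors, j])[0, 1])
                          for g in candidates])
    selected = [candidates[t] for t in np.argsort(corr)[::-1][:n_genes]]
    # Distance uses the selected genes only, not all genes.
    diffs = X[donors][:, selected] - X[i, selected]
    dist = np.sqrt(np.mean(diffs ** 2, axis=1))
    nearest = np.argsort(dist)[:k]
    # Assumed weighting: Gaussian kernel, so closer donors count more.
    weights = np.exp(-(dist[nearest] / bandwidth) ** 2)
    return float(weights @ X[donors[nearest], j] / weights.sum())

# Hypothetical toy matrix: gene 2 tracks gene 0, while gene 1 is unrelated noise.
X = np.array([[1.0, 5.0, 1.0],
              [2.0, 1.0, 2.0],
              [3.0, 4.0, 3.0],
              [2.0, 9.0, np.nan]])
print(impute_entry(X, i=3, j=2, n_genes=1, k=2, bandwidth=0.1))
```

With `n_genes=1` the distance ignores the noisy gene 1, so the donor that matches sample 3 on gene 0 dominates the weighted average. A distance over all genes would instead be pulled around by the unrelated column, which is the effect the abstract attributes to high dimensionality.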
