On GPU-Based Nearest Neighbor Queries for Large-Scale Photometric Catalogs in Astronomy

Semi-Convex Hull Tree: Fast Nearest Neighbor Queries for Large Scale Data on GPUs

2018 IEEE International Conference on Data Mining (ICDM) ◽

10.1109/icdm.2018.00110 ◽

2018 ◽

Cited By ~ 5

Author(s):

Yewang Chen ◽

Lida Zhou ◽

Nizar Bouguila ◽

Bineng Zhong ◽

Fei Wu ◽

...

Keyword(s):

Convex Hull ◽

Large Scale ◽

Nearest Neighbor ◽

Large Scale Data ◽

Nearest Neighbor Queries ◽

Scale Data

Continuous k Nearest Neighbor Queries over Large-Scale Spatial–Textual Data Streams

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9110694 ◽

2020 ◽

Vol 9 (11) ◽

pp. 694

Author(s):

Rong Yang ◽

Baoning Niu

Keyword(s):

Data Streams ◽

Large Scale ◽

Nearest Neighbor ◽

Cost Model ◽

Inverted Index ◽

K Nearest Neighbor ◽

Search Range ◽

Textual Data ◽

Block Based ◽

Nearest Neighbor Queries

Continuous k nearest neighbor queries over spatial–textual data streams (abbreviated as CkQST) are the core operations of numerous location-based publish/subscribe systems. Such a system typically holds millions of subscribed CkQST, which are evaluated simultaneously whenever new objects arrive and old objects expire. To evaluate CkQST efficiently, we extend a quadtree with an ordered, inverted index as the spatial–textual index for subscribed queries to match the incoming objects, and exploit it with three key techniques. (1) A memory-based cost model is proposed to find the optimal quadtree nodes covering the spatial search range of CkQST, which minimize the cost of searching and updating the index. (2) An adaptive block-based ordered, inverted index is proposed to organize the keywords of CkQST; it adaptively arranges queries in spatial nodes and allows objects containing common keywords to be processed in a batch with a shared scan, yielding a significant performance gain. (3) A cost-based k-skyband technique is proposed to judiciously determine an optimal search range for CkQST according to the workload of objects, reducing the re-evaluation cost due to the expiration of objects. Experiments on real-world and synthetic datasets demonstrate that the proposed techniques can efficiently evaluate CkQST.
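The query-side indexing idea in this abstract can be illustrated with a much simplified sketch. It is not the authors' index: there is no quadtree, no cost model, no block ordering, and no object expiration. It assumes boolean AND matching (an object matches a query only if it contains all of the query's keywords), which the abstract does not state, and every name in it (`Query`, `QueryIndex`, `subscribe`, `publish`) is hypothetical.

```python
import heapq
import math
from collections import defaultdict


class Query:
    """A subscribed kNN query with a location, a keyword set, and a result size k."""

    def __init__(self, qid, x, y, keywords, k):
        self.qid = qid
        self.loc = (x, y)
        self.keywords = frozenset(keywords)  # assumed non-empty
        self.k = k
        self.heap = []  # max-heap on distance, stored as (-distance, object_id)

    def offer(self, oid, dist):
        """Keep the object only if it is among the k nearest seen so far."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (-dist, oid))
        elif dist < -self.heap[0][0]:
            heapq.heapreplace(self.heap, (-dist, oid))

    def result(self):
        """Current answer, nearest object first."""
        return [oid for _, oid in sorted((-d, oid) for d, oid in self.heap)]


class QueryIndex:
    """Inverted index over queries (not objects): keyword -> subscribed query ids."""

    def __init__(self):
        self.queries = {}
        self.postings = defaultdict(set)

    def subscribe(self, query):
        self.queries[query.qid] = query
        # Under AND semantics, one posting per query is enough to find candidates.
        self.postings[min(query.keywords)].add(query.qid)

    def publish(self, oid, x, y, terms):
        """Match an incoming object against subscribed queries; return matched ids."""
        terms = set(terms)
        matched = []
        for term in terms:
            for qid in self.postings.get(term, ()):
                query = self.queries[qid]
                if query.keywords <= terms:  # object contains all query keywords
                    query.offer(oid, math.dist(query.loc, (x, y)))
                    matched.append(qid)
        return sorted(matched)
```

Indexing the queries rather than the objects is what lets each arriving object touch only the subscriptions that share a keyword with it. A spatial structure such as the quadtree described above would further restrict candidates to queries whose search range covers the object's location.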

Continuous K Nearest Neighbor Queries of Moving Objects in Road Networks

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2010.01396 ◽

2010 ◽

Vol 33 (8) ◽

pp. 1396-1404 ◽

Cited By ~ 1

Author(s):

Liang ZHAO ◽

Luo CHEN ◽

Ning JING ◽

Wei LIAO

Keyword(s):

Moving Objects ◽

Nearest Neighbor ◽

Road Networks ◽

Nearest Neighbor Queries

A dynamic data structure for 3-D convex hulls and 2-D nearest neighbor queries

Journal of the ACM ◽

10.1145/1706591.1706596 ◽

2010 ◽

Vol 57 (3) ◽

pp. 1-15 ◽

Cited By ~ 15

Author(s):

Timothy M. Chan

Keyword(s):

Data Structure ◽

Nearest Neighbor ◽

Convex Hulls ◽

Dynamic Data ◽

Dynamic Data Structure ◽

Nearest Neighbor Queries

Maximum Variance Hashing via Column Generation

Mathematical Problems in Engineering ◽

10.1155/2013/379718 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10

Author(s):

Lei Luo ◽

Chao Zhang ◽

Yongrui Qin ◽

Chunyuan Zhang

Keyword(s):

Column Generation ◽

Large Scale ◽

Web Search ◽

Nearest Neighbor ◽

Computational Cost ◽

Multimedia Retrieval ◽

Training Data ◽

Nonlinear Dimensionality Reduction ◽

Maximum Variance ◽

Data Volume

With the explosive growth of data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by maximum variance unfolding, a classic nonlinear dimensionality reduction algorithm, we propose a novel unsupervised hashing method, named maximum variance hashing. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
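For readers unfamiliar with hashing-based nearest neighbor search, the following minimal sketch shows the general mechanism: vectors are mapped to short binary codes, and candidates are ranked by Hamming distance between codes. It deliberately swaps in random hyperplane hashing, a data-independent technique, and is not the maximum variance hashing method or its column generation solver. All function names are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """Pack one bit per hyperplane: 1 if the vector lies on its positive side."""
    code = 0
    for w in planes:
        bit = 1 if sum(wi * vi for wi, vi in zip(w, vec)) > 0 else 0
        code = (code << 1) | bit
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database, planes):
    """Return database indices ordered by Hamming distance to the query code."""
    q = hash_code(query, planes)
    codes = [hash_code(v, planes) for v in database]
    return sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
```

A learned, data-dependent method such as the one described above would replace `make_hyperplanes` with hash functions fitted to the training data. The code comparison and ranking steps stay the same.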

Aggregate keyword nearest neighbor queries on road networks

GeoInformatica ◽

10.1007/s10707-017-0315-0 ◽

2017 ◽

Vol 22 (2) ◽

pp. 237-268 ◽

Cited By ~ 6

Author(s):

Pengfei Zhang ◽

Huaizhong Lin ◽

Yunjun Gao ◽

Dongming Lu

Keyword(s):

Nearest Neighbor ◽

Road Networks ◽

Nearest Neighbor Queries

An Efficient Incremental Nearest Neighbor Algorithm for Processing k-Nearest Neighbor Queries with Visual and Semantic Predicates in Multimedia Information Retrieval System

Information Retrieval Technology - Lecture Notes in Computer Science ◽

10.1007/11562382_63 ◽

2005 ◽

pp. 653-658 ◽

Cited By ~ 1

Author(s):

Dong-Ho Lee ◽

Dong-Joo Park

Keyword(s):

Information Retrieval ◽

Multimedia Information ◽

Retrieval System ◽

Nearest Neighbor ◽

Information Retrieval System ◽

Multimedia Information Retrieval ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Nearest Neighbor Queries

Fast approximate search algorithm for nearest neighbor queries in high dimensions

Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337) ◽

10.1109/icde.1999.754931 ◽

1999 ◽

Cited By ~ 7

Author(s):

S. Pramanik ◽

J. Li

Keyword(s):

Nearest Neighbor ◽

Search Algorithm ◽

High Dimensions ◽

Approximate Search ◽

Nearest Neighbor Queries

Continuous obstructed nearest neighbor queries in spatial databases

Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD '09 ◽

10.1145/1559845.1559906 ◽

2009 ◽

Cited By ~ 32

Author(s):

Yunjun Gao ◽

Baihua Zheng

Keyword(s):

Spatial Databases ◽

Nearest Neighbor ◽

Nearest Neighbor Queries

Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning

F1000Research ◽

10.12688/f1000research.14048.1 ◽

2018 ◽

Vol 7 ◽

pp. 233

Author(s):

Jonathan Z.L. Zhao ◽

Eliseos J. Mucaki ◽

Peter K. Rogan

Keyword(s):

Machine Learning ◽

Ionizing Radiation ◽

Radiation Exposure ◽

Large Scale ◽

Nearest Neighbor ◽

Error Rates ◽

Support Vector ◽

Dose Estimation ◽

Gene Signatures ◽

Ionizing Radiation Exposure

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches.
Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets.
Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
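The nearest neighbor imputation step mentioned in the Methods can be sketched as follows. The abstract does not give the study's exact procedure, so this is an assumption-laden illustration: missing values are marked as `None`, distances use only the columns that both rows have observed, and each gap is filled with the mean of that column over the k nearest rows. The function name `knn_impute` and its parameters are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest rows."""

    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have column j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable donor exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

For example, a sample with expression values `[1.0, 2.0, None]` would take its third value from the samples whose first two values are closest to `1.0` and `2.0`.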
