Grid-R-tree: a data structure for efficient neighborhood and nearest neighbor queries in data mining

Reverse nearest neighbor queries are defined as follows: given an input point set P and a query point q, find all points p in P whose nearest point in (P ∪ {q}) \ {p} is q. We give a data structure to answer reverse nearest neighbor queries in fixed-dimensional Euclidean space. Our data structure uses O(n) space, its preprocessing time is O(n log n), and its query time is O(log n).
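
The definition above can be made concrete with a brute-force check. The sketch below is only an O(n^2) illustration of the query semantics, not the O(log n) data structure described in the abstract; the function name `reverse_nearest_neighbors` and the strict tie-breaking rule are assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def reverse_nearest_neighbors(points, q):
    """Brute-force reverse nearest neighbor query.

    Returns every p in `points` for which q is the nearest point among
    all other input points plus q. Assumption: ties are resolved against
    q, so q must be strictly closer to p than every other input point.
    """
    result = []
    for i, p in enumerate(points):
        d_q = dist(p, q)
        # Compare against every other input point (skip p itself by index).
        others = (o for j, o in enumerate(points) if j != i)
        if all(dist(p, o) > d_q for o in others):
            result.append(p)
    return result


if __name__ == "__main__":
    pts = [(0, 0), (10, 0), (11, 0)]
    print(reverse_nearest_neighbors(pts, (1, 0)))
```

In this example only (0, 0) has the query point as its nearest neighbor, because (10, 0) and (11, 0) are each other's nearest neighbors.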

Skip Quadtrees: Dynamic Data Structures for Multidimensional Point Sets

International Journal of Computational Geometry & Applications ◽ 10.1142/s0218195908002568 ◽ 2008 ◽ Vol 18 (01n02) ◽ pp. 131-160 ◽ Cited By ~ 14

Author(s): David Eppstein ◽ Michael T. Goodrich ◽ Jonathan Z. Sun

Keyword(s): Data Structure ◽ Hierarchical Structure ◽ Data Structures ◽ Nearest Neighbor ◽ Point Location ◽ Dynamic Data ◽ Higher Dimensional ◽ Point Data ◽ Logarithmic Height ◽ Nearest Neighbor Queries

We present a new multi-dimensional data structure, which we call the skip quadtree (for point data in R^2) or the skip octree (for point data in R^d, with constant d > 2). Our data structure combines the best features of two well-known data structures, in that it has the well-defined "box"-shaped regions of region quadtrees and the logarithmic-height search and update hierarchical structure of skip lists. Indeed, the bottom level of our structure is exactly a region quadtree (or octree for higher dimensional data). We describe efficient algorithms for inserting and deleting points in a skip quadtree, as well as fast methods for performing point location, approximate range, and approximate nearest neighbor queries.

On Top-k Weighted Sum Aggregate Nearest and Farthest Neighbors in the L1 Plane

International Journal of Computational Geometry & Applications ◽ 10.1142/s0218195919500055 ◽ 2019 ◽ Vol 29 (03) ◽ pp. 189-218

Author(s): Haitao Wang ◽ Wuzhou Zhang

Keyword(s): Data Structure ◽ Nearest Neighbor ◽ Weighted Sum ◽ Aggregate Nearest Neighbor ◽ Dimensional Version ◽ Text Query ◽ The One ◽ Nearest Neighbor Queries ◽ Size Data ◽ Better Than

In this paper, we study top-k aggregate (or group) nearest neighbor queries using the weighted Sum operator under the L1 metric in the plane. Given a set [Formula: see text] of [Formula: see text] points, for any query consisting of a set [Formula: see text] of [Formula: see text] weighted points and an integer k, [Formula: see text], the top-k aggregate nearest neighbor query asks for the k points of [Formula: see text] whose aggregate distances to [Formula: see text] are the smallest, where the aggregate distance of each point [Formula: see text] of [Formula: see text] to [Formula: see text] is the sum of the weighted distances from [Formula: see text] to all points of [Formula: see text]. We build an [Formula: see text]-size data structure in [Formula: see text] time, such that each top-k query can be answered in [Formula: see text] time. We also obtain other results with trade-offs between preprocessing and query time. Even for the special case where [Formula: see text], our results are better than the previously best work, which requires [Formula: see text] preprocessing time, [Formula: see text] space, and [Formula: see text] query time. In addition, for the one-dimensional version of this problem, our approach can build an [Formula: see text]-size data structure in [Formula: see text] time that can support [Formula: see text] time queries. Further, we extend our techniques to answer the top-k aggregate farthest neighbor queries, with the same bounds.
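
The query described above can be illustrated with a brute-force baseline. This sketch only shows what a top-k weighted-Sum aggregate query under the L1 metric computes; it scans every input point and is not the data structure from the paper. All names (`l1`, `aggregate_distance`, `top_k_aggregate_nearest`, `top_k_aggregate_farthest`) and the representation of the query as `(point, weight)` pairs are assumptions made for this example.

```python
import heapq


def l1(a, b):
    """L1 (Manhattan) distance between two points in the plane."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def aggregate_distance(p, query):
    """Sum of weighted L1 distances from p to every (point, weight) in query."""
    return sum(w * l1(p, q) for q, w in query)


def top_k_aggregate_nearest(points, query, k):
    """The k input points with the smallest aggregate distance to the query."""
    return heapq.nsmallest(k, points, key=lambda p: aggregate_distance(p, query))


def top_k_aggregate_farthest(points, query, k):
    """The k input points with the largest aggregate distance to the query."""
    return heapq.nlargest(k, points, key=lambda p: aggregate_distance(p, query))


if __name__ == "__main__":
    pts = [(0, 0), (5, 5), (1, 1), (10, 0)]
    qry = [((0, 0), 1), ((2, 2), 2)]  # hypothetical weighted query points
    print(top_k_aggregate_nearest(pts, qry, 2))
    print(top_k_aggregate_farthest(pts, qry, 1))
```

Each query costs time proportional to the number of input points times the number of query points, which is the cost a preprocessed data structure aims to avoid.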

A dynamic data structure for 3-d convex hulls and 2-d nearest neighbor queries

Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm - SODA '06 ◽ 10.1145/1109557.1109689 ◽ 2006 ◽ Cited By ~ 10

Author(s): Timothy M. Chan

Keyword(s): Data Structure ◽ Nearest Neighbor ◽ Convex Hulls ◽ Dynamic Data ◽ Dynamic Data Structure ◽ Nearest Neighbor Queries

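Perancangan Aplikasi Prediksi Kelulusan Tepat Waktu Bagi Mahasiswa Baru Dengan Teknik Data Mining (Studi Kasus: Data Akademik Mahasiswa STMIK Dipanegara Makassar) [Designing an Application to Predict On-Time Graduation of New Students Using Data Mining Techniques (Case Study: Student Academic Data of STMIK Dipanegara Makassar)]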

Creative Information Technology Journal ◽ 10.24076/citec.2014v1i4.27 ◽ 2015 ◽ Vol 1 (4) ◽ pp. 270

Author(s): Muhammad Syukri Mustafa ◽ I. Wayan Simpen

Keyword(s): Data Mining ◽ Nearest Neighbor ◽ Test Results ◽ K Nearest Neighbor ◽ Accuracy Rate ◽ Sample Data ◽ New Students ◽ K Nearest Neighbor Algorithm ◽ Using Data ◽ Existing Data

This research aims to predict whether new students are likely to complete their studies on time, using data mining with the K-Nearest Neighbor (KNN) algorithm to explore accumulated historical data. The resulting application uses a variety of attributes, including national examination (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, and number of siblings, among others. By applying KNN analysis, it predicts whether a student is likely to complete their studies on time, based on how close the new data is to the existing historical data. In tests applying the KNN algorithm, with sample data from alumni who graduated in 2004 to 2010 as the old cases and data from alumni who graduated in 2011 as the new cases, the accuracy rate was 83.36%.
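
The KNN prediction step described above can be sketched as a majority vote over the closest historical cases. This is a minimal illustration, not the application from the study: the function name `knn_predict`, the two numeric features, the labels, and the toy records are all hypothetical, and the sketch assumes the attributes have already been encoded as numbers on comparable scales.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, x, k=3):
    """Predict a label for x by majority vote among its k nearest cases.

    `train` is a list of (feature_vector, label) pairs; `x` is a feature
    vector of the same length.
    """
    # Keep the k historical cases closest to the new case.
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    # Majority vote over the labels of those cases.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, already-normalized records: (features, outcome).
    history = [
        ((1.0, 1.0), "on_time"),
        ((1.2, 0.9), "on_time"),
        ((0.9, 1.1), "on_time"),
        ((5.0, 5.0), "late"),
        ((5.2, 4.8), "late"),
    ]
    print(knn_predict(history, (1.1, 1.0), k=3))
```

Categorical attributes such as gender or school of origin would need an encoding and a suitable distance before a sketch like this applies; the choice of k also affects the result.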

Continuous K Nearest Neighbor Queries of Moving Objects in Road Networks

Chinese Journal of Computers ◽ 10.3724/sp.j.1016.2010.01396 ◽ 2010 ◽ Vol 33 (8) ◽ pp. 1396-1404 ◽ Cited By ~ 1

Author(s): Liang ZHAO ◽ Luo CHEN ◽ Ning JING ◽ Wei LIAO

Keyword(s): Moving Objects ◽ Nearest Neighbor ◽ Road Networks ◽ Nearest Neighbor Queries

Aggregate keyword nearest neighbor queries on road networks

GeoInformatica ◽ 10.1007/s10707-017-0315-0 ◽ 2017 ◽ Vol 22 (2) ◽ pp. 237-268 ◽ Cited By ~ 6

Author(s): Pengfei Zhang ◽ Huaizhong Lin ◽ Yunjun Gao ◽ Dongming Lu

Keyword(s): Nearest Neighbor ◽ Road Networks ◽ Nearest Neighbor Queries

Data Mining Approach to Analyze COVID-19 Clinical Dataset

10.53350/pjmhs211561812 ◽ 2021 ◽ Vol 15 (6) ◽ pp. 1812-1819

Author(s): Azita Yazdani ◽ Ramin Ravangard ◽ Roxana Sharifian

Keyword(s): Artificial Intelligence ◽ Data Mining ◽ Support Vector Machine ◽ Nearest Neighbor ◽ Clinical Signs ◽ Study Data ◽ Mining Machine ◽ Support Vector ◽ K Nearest Neighbor ◽ Data Mining Approach

The new coronavirus has been spreading since the beginning of 2020, and many efforts have been made to develop vaccines to help patients recover. It is now clear that the world needs a rapid solution to curb the spread of COVID-19 with non-clinical approaches such as data mining, enhanced intelligence, and other artificial intelligence techniques. These approaches can reduce the burden on the health care system by supporting the diagnosis and prediction of COVID-19. In this study, data mining models for early detection of COVID-19 were developed using an epidemiological dataset of patients and individuals suspected of having COVID-19 in Iran. C4.5, support vector machine, Naive Bayes, logistic regression, random forest, and k-nearest neighbor algorithms were applied directly to the dataset using RapidMiner to develop the models. Given a patient's clinical signs, the models estimate the risk of contracting the COVID-19 virus. Evaluation of the models showed that the support vector machine, with 93.41% accuracy, was the best of the developed models for diagnosing patients with COVID-19. Keywords: COVID-19, Data mining, Machine Learning, Artificial Intelligence, Classification
