Grid-R-tree: a data structure for efficient neighborhood and nearest neighbor queries in data mining

2020 ◽  
Vol 10 (1) ◽  
pp. 25-47
Author(s):  
Poonam Goyal ◽  
Jagat Sesh Challa ◽  
Dhruv Kumar ◽  
Anuvind Bhat ◽  
Sundar Balasubramaniam ◽  
...  
2011 ◽  
Vol 21 (02) ◽  
pp. 179-188 ◽  
Author(s):  
OTFRIED CHEONG ◽  
ANTOINE VIGNERON ◽  
JUYOUNG YON

Reverse nearest neighbor queries are defined as follows: given an input point set P and a query point q, find all points p in P whose nearest point in (P ∪ {q}) \ {p} is q. We give a data structure to answer reverse nearest neighbor queries in fixed-dimensional Euclidean space. Our data structure uses O(n) space, its preprocessing time is O(n log n), and its query time is O(log n).
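The definition can be checked directly by brute force. The sketch below is not the O(n)-space, O(log n)-query structure described in the abstract. It is a quadratic-time check of the definition, which can serve as a reference when testing a faster implementation. The function name and the tie-breaking convention (a point equidistant from q and from another point of P counts q as nearest) are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def reverse_nearest_neighbors(P: Sequence[Point], q: Point) -> List[Point]:
    """Return every p in P whose nearest point in (P ∪ {q}) \\ {p} is q.

    Brute force, O(n^2) distance evaluations. Ties are counted in favor
    of q; this convention is an assumption, not taken from the paper.
    """
    result = []
    for i, p in enumerate(P):
        d_q = math.dist(p, q)
        # Distance from p to its nearest *other* point of P (inf if none).
        d_other = min(
            (math.dist(p, r) for j, r in enumerate(P) if j != i),
            default=math.inf,
        )
        if d_q <= d_other:
            result.append(p)
    return result
```

For example, with P = [(0, 0), (10, 0), (4, 0)] and q = (2, 0), the points (0, 0) and (4, 0) are reverse nearest neighbors of q, while (10, 0) is not, because (4, 0) is closer to it than q is.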


2008 ◽  
Vol 18 (01n02) ◽  
pp. 131-160 ◽  
Author(s):  
DAVID EPPSTEIN ◽  
MICHAEL T. GOODRICH ◽  
JONATHAN Z. SUN

We present a new multi-dimensional data structure, which we call the skip quadtree (for point data in ℝ²) or the skip octree (for point data in ℝᵈ, with constant d > 2). Our data structure combines the best features of two well-known data structures, in that it has the well-defined “box”-shaped regions of region quadtrees and the logarithmic-height search and update hierarchical structure of skip lists. Indeed, the bottom level of our structure is exactly a region quadtree (or octree for higher dimensional data). We describe efficient algorithms for inserting and deleting points in a skip quadtree, as well as fast methods for performing point location, approximate range, and approximate nearest neighbor queries.
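To make the "box"-shaped regions concrete, here is a minimal sketch of only the bottom level mentioned above: an uncompressed point region quadtree over a square root box, where each leaf holds at most one point and a full leaf splits into four equal quadrants, plus point location by descending to the leaf box that contains a query. It does not implement the skip-list levels, compression, deletion, or approximate queries of the skip quadtree, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    """Square cell [x0, x0+size) x [y0, y0+size) of a point region quadtree."""
    x0: float
    y0: float
    size: float
    point: Optional[Point] = None  # a leaf stores at most one point
    children: List["QuadNode"] = field(default_factory=list)  # empty for leaves

    def is_leaf(self) -> bool:
        return not self.children

    def contains(self, p: Point) -> bool:
        return (self.x0 <= p[0] < self.x0 + self.size
                and self.y0 <= p[1] < self.y0 + self.size)

    def child_for(self, p: Point) -> "QuadNode":
        """Return the quadrant whose box contains p (index 2*iy + ix)."""
        half = self.size / 2
        ix = 1 if p[0] >= self.x0 + half else 0
        iy = 1 if p[1] >= self.y0 + half else 0
        return self.children[2 * iy + ix]

    def split(self) -> None:
        """Turn this leaf into four equal quadrants, ordered SW, SE, NW, NE."""
        half = self.size / 2
        self.children = [QuadNode(self.x0 + ix * half, self.y0 + iy * half, half)
                         for iy in (0, 1) for ix in (0, 1)]

    def locate(self, p: Point) -> "QuadNode":
        """Point location: descend from this node to the leaf box containing p."""
        node = self
        while not node.is_leaf():
            node = node.child_for(p)
        return node

    def insert(self, p: Point) -> None:
        """Insert p; a full leaf keeps splitting until its old point and p separate."""
        if not self.contains(p):
            raise ValueError(f"{p} lies outside this node's box")
        leaf = self.locate(p)
        while leaf.point is not None and leaf.point != p:
            old, leaf.point = leaf.point, None
            leaf.split()
            leaf.child_for(old).point = old  # freshly created child is empty
            leaf = leaf.child_for(p)         # continue splitting toward p
        leaf.point = p  # inserting a duplicate just rewrites the same point
```

Inserting (1, 1), (6, 6), and (1.5, 1.5) into a root box of side 8 splits the south-west corner repeatedly until the two nearby points land in different leaves; locating (1.4, 1.4) then returns the leaf box with corner (1, 1) and side 0.5. Each descent step halves the box, so depth depends on how close points are, which is the unbounded-height issue that the skip-list levels are meant to address.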


2019 ◽  
Vol 29 (03) ◽  
pp. 189-218
Author(s):  
Haitao Wang ◽  
Wuzhou Zhang

In this paper, we study top-k aggregate (or group) nearest neighbor queries using the weighted Sum operator under the L1 metric in the plane. Given a set P of n points, for any query consisting of a set Q of m weighted points and an integer k, 1 ≤ k ≤ n, the top-k aggregate nearest neighbor query asks for the k points of P whose aggregate distances to Q are the smallest, where the aggregate distance of each point p of P to Q is the sum of the weighted distances from p to all points of Q. We build an [Formula: see text]-size data structure in [Formula: see text] time, such that each top-k query can be answered in [Formula: see text] time. We also obtain other results with trade-offs between preprocessing and query time. Even for the special case where k = 1, our results are better than the previous best work, which requires [Formula: see text] preprocessing time, [Formula: see text] space, and [Formula: see text] query time. In addition, for the one-dimensional version of this problem, our approach can build an [Formula: see text]-size data structure in [Formula: see text] time that can support [Formula: see text] time queries. Further, we extend our techniques to answer top-k aggregate farthest neighbor queries with the same bounds.
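The query semantics can be illustrated with a brute-force evaluation: compute every point's weighted-sum aggregate distance to the query set and keep the k smallest (or, for the farthest-neighbor variant, the k largest). This sketch assumes the L1 (Manhattan) metric in the plane and represents each weighted query point as a (point, weight) pair; the function names are hypothetical. It does no preprocessing and spends O(nm) distance evaluations per query, so it is a reference for correctness rather than the data structure described in the abstract.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
WeightedPoint = Tuple[Point, float]  # (query point q, weight w); layout is an assumption


def l1_dist(a: Point, b: Point) -> float:
    """L1 (Manhattan) distance in the plane."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def aggregate_distance(p: Point, Q: Sequence[WeightedPoint]) -> float:
    """Weighted Sum aggregate: sum of w * d(p, q) over all weighted query points."""
    return sum(w * l1_dist(p, q) for q, w in Q)


def top_k_aggregate_nn(P: Sequence[Point], Q: Sequence[WeightedPoint], k: int) -> List[Point]:
    """The k points of P with the smallest aggregate distance, nearest first."""
    return heapq.nsmallest(k, P, key=lambda p: aggregate_distance(p, Q))


def top_k_aggregate_fn(P: Sequence[Point], Q: Sequence[WeightedPoint], k: int) -> List[Point]:
    """The k points of P with the largest aggregate distance, farthest first."""
    return heapq.nlargest(k, P, key=lambda p: aggregate_distance(p, Q))
```

With P = [(0, 0), (5, 5), (2, 1), (10, 0)] and Q = [((1, 1), 1), ((3, 1), 2)], the aggregate distances are 10, 20, 3, and 26, so the top-2 nearest query returns (2, 1) then (0, 0), and the top-1 farthest query returns (10, 0).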


2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

This research aims to predict whether new students are likely to complete their studies on time, using data mining to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The application developed in this study uses a variety of attributes classified in the data mining process, including national examination (Ujian Nasional, UN) scores, school of origin/region, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made, based on the proximity of the existing historical data to the new data, of whether a student is likely to complete the study on time or not. Testing the KNN algorithm with sample data from alumni who graduated from 2004 through 2010 as the old cases and alumni who graduated in 2011 as the new cases yielded an accuracy of 83.36%.
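A minimal sketch of the KNN classification step described above, assuming every attribute has already been encoded as a number (categorical attributes such as school of origin or parents' occupation would need an encoding that the abstract does not specify) and using Euclidean distance with a plain majority vote. The feature layout, the labels "on_time"/"late", and the function names are hypothetical; the 2004 through 2010 alumni would play the role of `train` and the 2011 alumni the role of `test`.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Features = Tuple[float, ...]
Record = Tuple[Features, str]  # (numeric feature vector, label such as "on_time" / "late")


def knn_predict(train: Sequence[Record], x: Features, k: int = 3) -> str:
    """Label x by majority vote among its k nearest training records (Euclidean)."""
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common keeps first-encountered order for equal counts; neighbors are
    # sorted nearest first, so a tie favors the label with the closest neighbor.
    return votes.most_common(1)[0][0]


def accuracy(train: Sequence[Record], test: Sequence[Record], k: int = 3) -> float:
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)
```

Because distances mix attributes with different ranges, the attributes would in practice be normalized before classification; otherwise the attribute with the largest range dominates the neighbor ranking.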


2010 ◽  
Vol 33 (8) ◽  
pp. 1396-1404 ◽  
Author(s):  
Liang ZHAO ◽  
Luo CHEN ◽  
Ning JING ◽  
Wei LIAO

2017 ◽  
Vol 22 (2) ◽  
pp. 237-268 ◽  
Author(s):  
Pengfei Zhang ◽  
Huaizhong Lin ◽  
Yunjun Gao ◽  
Dongming Lu

2021 ◽  
Vol 15 (6) ◽  
pp. 1812-1819
Author(s):  
Azita Yazdani ◽  
Ramin Ravangard ◽  
Roxana Sharifian

The new coronavirus has been spreading since the beginning of 2020, and many efforts have been made to develop vaccines to help patients recover. It is now clear that the world needs a rapid solution to curb the spread of COVID-19 worldwide through non-clinical approaches such as data mining, enhanced intelligence, and other artificial intelligence techniques. These approaches can help reduce the burden on the health care system by providing the best possible means to diagnose and predict the COVID-19 epidemic. In this study, data mining models for the early detection of COVID-19 were developed using an epidemiological dataset of patients and individuals suspected of having COVID-19 in Iran. C4.5, support vector machine, Naive Bayes, logistic regression, Random Forest, and k-nearest neighbor algorithms were applied directly to the dataset in RapidMiner to develop the models. Given a patient's clinical signs, the model assesses the risk of contracting the COVID-19 virus. Evaluation of the models showed that the support vector machine, with 93.41% accuracy, was the most effective at diagnosing COVID-19 and the best of the developed models.
Keywords: COVID-19, Data mining, Machine Learning, Artificial Intelligence, Classification

