Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark

Designing and implementing new distributed spatial query algorithms remains a current challenge for spatial query processing in distributed computing systems. Apache Spark is an in-memory framework suitable for real-time and batch processing. Spark-based systems let users work on distributed in-memory data without worrying about the data distribution mechanism or fault tolerance. Given two datasets of points (called Query and Training), the group K nearest-neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been actively studied in centralized environments, where several performance-improving techniques and pruning heuristics have been proposed, and our team recently proposed a distributed algorithm in Apache Hadoop. Since Apache Hadoop generally exhibits lower performance than Spark, in this paper we present the first distributed GKNN query algorithm in Apache Spark and compare it against the one in Apache Hadoop. The algorithm uses programming features and facilities specific to Apache Spark and incorporates performance-improving techniques that are applicable in Spark. An extensive set of experiments with real-world spatial datasets demonstrates that our Apache Spark GKNN solution, with its improvements, is efficient and clearly outperforms processing this query in Apache Hadoop.
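
To make the GKNN definition concrete, the following is a minimal single-machine, brute-force sketch in Python. It is not the distributed Spark algorithm of the paper and uses none of its pruning heuristics; the function name `gknn`, the use of Euclidean distance, and the sample points are assumptions made only for illustration.

```python
import heapq
import math


def gknn(query, training, k):
    """Return the k Training points with the smallest sum of
    Euclidean distances to every Query point (brute force)."""
    def sum_of_distances(point):
        # Aggregate distance from one Training point to the whole Query set.
        return sum(math.dist(point, q) for q in query)

    # nsmallest keeps only the k best candidates by aggregate distance.
    return heapq.nsmallest(k, training, key=sum_of_distances)


# Hypothetical sample data.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (-3.0, 0.0)]
result = gknn(query, training, 2)
```

The cost of this sketch is proportional to the product of the two dataset sizes, which is the kind of expense that pruning and distribution are meant to reduce.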


k-Nearest Neighbor Search based on Node Density in MANETs

Mobile Information Systems ◽

10.1155/2014/158737 ◽

2014 ◽

Vol 10 (4) ◽

pp. 385-405 ◽

Cited By ~ 3

Author(s):

Yuka Komai ◽

Yuya Sasaki ◽

Takahiro Hara ◽

Shojiro Nishio

Keyword(s):

Query Processing ◽

Nearest Neighbor ◽

Nearest Neighbor Search ◽

K Nearest Neighbor ◽

Node Density ◽

Neighbor Search ◽

Query Result ◽

Knn Query ◽

The One ◽

K Nearest Neighbor Search

In a kNN query processing method, it is important to estimate appropriately the range that includes the kNNs. The range could be estimated from the node density of the entire network, but this is not always appropriate because the density of nodes in the network is not uniform. In this paper, we propose two kNN query processing methods for MANETs in which the node density is non-uniform: the One-Hop (OH) method and the Query Log (QL) method. In the OH method, the node nearest to the point specified by the query acquires its neighbors' locations and then determines the size of a circular region (the estimated kNN circle) that includes the kNNs with high probability. In the QL method, a node that relays the reply to a kNN query stores information on the query result for use in future queries.
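
The abstract does not give the formula the OH method uses to size the estimated kNN circle. As a rough illustration of the general idea of estimating a range from local node density, the sketch below assumes the density equals the number of nodes seen within one hop divided by the area of the communication disc, and picks the radius of a disc expected to contain k nodes at that density. The function name, the `comm_range` parameter, and the `margin` safety factor are all hypothetical and are not taken from the paper.

```python
import math


def estimate_knn_radius(num_neighbors, comm_range, k, margin=1.2):
    """Estimate a search radius expected to contain k nodes,
    based on the density observed within one hop (assumed model)."""
    # Local density: one-hop neighbors plus the node itself,
    # divided by the area of the communication disc.
    density = (num_neighbors + 1) / (math.pi * comm_range ** 2)
    # Radius of a disc that holds k nodes on average at this density.
    base_radius = math.sqrt(k / (math.pi * density))
    # Enlarge by a safety margin (an assumption) so that the circle
    # is more likely to cover all k nearest neighbors.
    return margin * base_radius
```

For example, under these assumptions a node that sees 3 neighbors within a 100 m range and is asked for k = 4 gets a base radius of about 100 m before the margin is applied.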


Algorithms for processing closest-pairs and nearest-neighbors queries on big spatial data in parallel and distributed frameworks

10.12681/eadd/49345 ◽

2021 ◽

Author(s):

Παναγιώτης Μουτάφης

Keyword(s):

Spatial Data ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

Apache Spark ◽

K Nearest Neighbor ◽

Apache Hadoop ◽

Nearest Neighbor Query ◽

Join Queries ◽

Nearest Neighbor Queries

Spatial data are data related to the position or geographic location of objects and features above, below, or on the surface of the earth. Such data, often called geospatial data, appear in geography-related applications. Every day, numerous applications and sources generate explosive volumes of data with spatial characteristics or related geospatial information. Sensors, mobile phone applications, cars, GPS devices, unmanned aerial vehicles (UAVs), ships, airplanes, telescopes, medical devices, web applications, social networks, and Internet of Things (IoT) devices are examples of such applications and sources.

Processing spatial data is harder than processing the data of traditional applications (e.g., names, numbers, dates) and has higher computational requirements. Moreover, the large volume of spatial data in modern applications requires multi-node systems for its processing. Among these, shared-nothing parallel and distributed systems based on the MapReduce model and/or on Resilient Distributed Datasets (RDDs) are frequently encountered in research efforts.

Efficient management of big spatial data requires efficient processing of computationally demanding spatial queries. The following spatial queries are applied to two datasets and combine join queries, since all possible combinations formed from these datasets are candidates for the final result, with nearest neighbor queries, since the final result is formed according to a proximity criterion.

1. The K Closest-Pairs Query (KCPQ): among all possible pairs of elements from the two datasets, it finds the K pairs with the smallest distances between their elements.
2. The Distance Join Query (DJQ): a kind of closest-pairs query that, among all possible pairs of elements from the two datasets, returns the pairs with distances smaller than a given distance.
3. The All K Nearest Neighbor Query (AKNNQ), also called K Nearest Neighbor Join: it returns the K nearest neighbors in one dataset for each element of the other dataset.
4. The Group (K) Nearest-Neighbor(s) Query (GKNNQ): it returns K elements from one dataset with the smallest sum of distances to every element of the other dataset.

Although the naive algorithms for the above queries are simple, they suffer from excessive computation, intermediate-result storage, and network communication costs, and from poor load balancing among the computing nodes, especially in a distributed environment. In this thesis, we focus on point data and use techniques for faster and fewer computations, pruning of unnecessary computations, exploitation of data locality and distribution, better load balancing among the computing nodes, and optimization of the amount of data transferred between nodes. With these tools, we:

1. develop the first KCPQ and DJQ algorithms for Apache Spark, a popular parallel and distributed processing system that has attracted attention because of its in-memory computing capabilities;
2. develop AKNNQ algorithms for Apache Hadoop, the first widely adopted system that implements the MapReduce model;
3. develop the first GKNNQ algorithms for Apache Hadoop and SpatialHadoop, an extension specifically designed to handle large spatial datasets;
4. for each of the above queries, conduct extensive experiments to derive the best parameter settings for each algorithm and to compare the efficiency of the various alternative algorithms we developed with those from the literature (in the cases where such algorithms already existed).
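
The naive forms of the first three queries defined in this abstract (KCPQ, DJQ, AKNNQ) can be sketched on a single machine as follows. This is only an illustration of the definitions, not one of the thesis algorithms: the function names, the choice of Euclidean distance, and the representation of points as tuples are assumptions. Each function examines every pair of points, which is exactly the cost the abstract describes as excessive.

```python
import heapq
import math
from itertools import product


def kcpq(P, Q, k):
    """K Closest-Pairs Query: the k pairs (p, q) with the smallest distances."""
    return heapq.nsmallest(k, product(P, Q), key=lambda pair: math.dist(*pair))


def djq(P, Q, eps):
    """Distance Join Query: all pairs (p, q) with distance smaller than eps."""
    return [(p, q) for p, q in product(P, Q) if math.dist(p, q) < eps]


def aknnq(P, Q, k):
    """All K Nearest Neighbor Query: for each p in P, its k nearest points in Q."""
    return {p: heapq.nsmallest(k, Q, key=lambda q: math.dist(p, q)) for p in P}


# Hypothetical sample data.
P = [(0.0, 0.0), (10.0, 0.0)]
Q = [(1.0, 0.0), (9.0, 0.0), (4.0, 0.0)]
closest_pairs = kcpq(P, Q, 2)
joined_pairs = djq(P, Q, 4.5)
neighbors = aknnq(P, Q, 1)
```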


Spatial query processing for location based application on Hbase

2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA) ◽

10.1109/icbda.2017.8078787 ◽

2017 ◽

Cited By ~ 2

Author(s):

Shouwu He ◽

Longxian Chu ◽

Xiaoying Li

Keyword(s):

Query Processing ◽

Spatial Query ◽

Spatial Query Processing


Spatial query processing in geographic database systems

Proceedings of Twentieth Euromicro Conference. System Architecture and Integration ◽

10.1109/eurmic.1994.390406 ◽

2002 ◽

Author(s):

Byungyeon Hwang ◽

Taekyung Byun ◽

Songchun Moon

Keyword(s):

Query Processing ◽

Database Systems ◽

Spatial Query ◽

Spatial Query Processing


Spatial Query Processing on Distributed Databases

Advances in Intelligent Systems and Applications - Volume 1 - Smart Innovation, Systems and Technologies ◽

10.1007/978-3-642-35452-6_27 ◽

2013 ◽

pp. 251-260 ◽

Cited By ~ 3

Author(s):

Jiun-Wen Bai ◽

Jun-Zhe Wang ◽

Jiun-Long Huang

Keyword(s):

Query Processing ◽

Distributed Databases ◽

Spatial Query ◽

Spatial Query Processing


A Novel Algorithm for Sentiment Analysis of Online Movie Reviews

Advances in Business Information Systems and Analytics - Social Network Analytics for Contemporary Business Organizations ◽

10.4018/978-1-5225-5097-6.ch007 ◽

2018 ◽

pp. 106-140

Author(s):

Bisma Shah ◽

Farheen Siddiqui

Keyword(s):

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Learning Approaches ◽

World Knowledge ◽

K Nearest Neighbor ◽

Customer Feedback ◽

Novel Approach ◽

The One ◽

Novel Algorithm

Others' opinions can be decisive when choosing among options, especially when the choice involves valuable resources such as the time and money spent on products or services. Customers' reliance on their peers' past reviews on e-commerce websites and social media has drawn considerable interest to sentiment analysis because of its commercial and business benefits. Sentiment analysis can be applied to movie reviews, blogs, customer feedback, and similar text. This chapter presents a novel approach to sentiment analysis of movie reviews posted by users on different websites. It also addresses challenges such as the presence of thwarted words, world knowledge, and subjectivity detection in sentiments. The results are validated using two supervised machine learning approaches, k-nearest neighbor and naive Bayes, applied both to a sentiment analysis method that does not address these challenges and to the proposed method that addresses all of them. Empirical results show that the proposed method outperforms the one that leaves the challenges unaddressed.


Spatial Data on the Move

Handbook of Research on Mobile Multimedia ◽

10.4018/978-1-59140-866-6.ch008 ◽

2011 ◽

pp. 103-118

Author(s):

Wee Hyong Tok ◽

Stéphane Bressan ◽

Panagiotis Kalnis ◽

Baihua Zheng

Keyword(s):

Query Processing ◽

Spatial Data ◽

Location Based Services ◽

Spatial Query ◽

Spatial Query Processing ◽

Wide Availability ◽

Research Problems ◽

Processing Techniques ◽

Exciting Area ◽

State Of Art

The pervasiveness of mobile computing devices and the wide availability of wireless networking infrastructure have empowered users with applications that provide location-based services and with the ability to pose queries to remote servers. This creates a need for adaptive, robust, and efficient techniques for processing these queries. In this chapter, we identify the issues and challenges of processing spatial data on the move. Next, we present insights on state-of-the-art spatial query processing techniques used in these dynamic, mobile environments. We conclude with several potential open research problems in this exciting area.
