Distributed Approach to Continuous Queries with kNN Join Processing in Spatial Telemetric Data Warehouse

Author(s):  
Marcin Gorawski ◽  
Wojciech Gebczyk

This chapter describes the implementation of a distributed approach to continuous queries with kNN join processing in a spatial telemetric data warehouse. Because the system is distributed, it is divided into new structural components: the mobile object simulator, the kNN join processing service, and the query manager. The distributed tasks communicate through Java RMI. A kNN (k Nearest Neighbours) join pairs every point in one dataset with its k nearest neighbours in the other dataset. Our approach uses the Gorder method, a block nested loop join algorithm that exploits sorting, join scheduling, and distance computation filtering to reduce CPU and I/O costs.
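To make the kNN join definition concrete, here is a minimal brute-force sketch in Python. It is not the Gorder method: it omits the sorting, join scheduling, and distance filtering that Gorder adds, and it only shows what result a kNN join is expected to produce. The names `knn_join`, `meters`, and `collectors`, and the coordinates, are hypothetical and chosen for illustration.

```python
import heapq
import math

def knn_join(r_points, s_points, k):
    """Brute-force kNN join: map each point of r_points to its k nearest points in s_points."""
    result = {}
    for r in r_points:
        # nsmallest returns the k closest candidates, nearest first (Euclidean distance)
        result[r] = heapq.nsmallest(k, s_points, key=lambda s: math.dist(r, s))
    return result

# Hypothetical sample data: two query points and four candidate points
meters = [(0.0, 0.0), (5.0, 5.0)]
collectors = [(1.0, 0.0), (0.0, 2.0), (6.0, 5.0), (9.0, 9.0)]

pairs = knn_join(meters, collectors, k=2)
for point, neighbours in pairs.items():
    print(point, "->", neighbours)
```

This version computes every pairwise distance, which is the cost that a block nested loop method with filtering is designed to reduce.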

2020 ◽  
Vol 16 (4) ◽  
pp. 26-43
Author(s):  
Noura Azaiez ◽  
Jalel Akaichi

Business Intelligence relies on data warehousing to support decision making. The ETL process is the core of warehousing technology: it pulls data out of the source systems and places it into a data warehouse. With advances in geographical information systems, pervasive systems, and positioning systems, traditional warehouse features can no longer handle the mobility aspect integrated in the warehousing chain. Therefore, the trajectory (mobility) data gathered from mobile object movements has to be managed through what is called trajectory ETL. For this purpose, the authors emphasize the power of the model-driven architecture approach to carry out the whole transformation task, in this case transforming the trajectory data source model, which describes the resulting trajectories, into trajectory data mart models. The authors illustrate the proposed approach with an epilepsy patient state case study.


2011 ◽  
Vol 7 (4) ◽  
pp. 21-42
Author(s):  
M. Asif Naeem ◽  
Gillian Dobbie ◽  
Gerald Weber

An important component of near-real-time data warehouses is the near-real-time integration layer. One important element in near-real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms such as Mesh Join (MESHJOIN) can be used. However, the performance of MESHJOIN is inversely proportional to the size of the disk-based relation. The Index Nested Loop Join (INLJ) can be set up to process stream input and can deal with intermittency in the update stream, but it has low throughput. This paper introduces a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. A theoretical result shows that HYBRIDJOIN is asymptotically as fast as the faster of the two algorithms. The authors present performance measurements of the implementation. In experiments using synthetic data based on a Zipfian distribution, HYBRIDJOIN performs significantly better for typical parameters of the Zipfian distribution and in general performs in accordance with the theoretical model, whereas the other two algorithms are each unacceptably slow under different settings.
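The stream-relation join described above can be illustrated with a minimal Python sketch of the index nested loop baseline. This is not HYBRIDJOIN or MESHJOIN: it only shows the operation all three algorithms perform, which is matching each arriving stream tuple against a stored relation by key. An in-memory dictionary stands in for the disk-based index, and the names `index_nested_loop_join`, `master`, and `updates`, and their contents, are hypothetical.

```python
def index_nested_loop_join(stream, index):
    """Yield (key, payload, master_row) for each stream tuple whose key is found in the index."""
    for key, payload in stream:
        row = index.get(key)  # one index probe per stream tuple
        if row is not None:
            yield (key, payload, row)

# Hypothetical master relation keyed by id (a dict stands in for a disk-based index)
master = {101: "Auckland", 102: "Wellington"}
# Hypothetical update stream of (key, payload) tuples; key 103 has no match
updates = [(101, 9.5), (103, 1.0), (102, 4.2)]

joined = list(index_nested_loop_join(updates, master))
print(joined)
```

Each tuple is processed as soon as it arrives, so gaps in the stream cause no problem, but every tuple pays for its own probe. With a real disk-based index, that per-tuple cost is what limits throughput.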


KURVATEK ◽  
2018 ◽  
Vol 3 (1) ◽  
pp. 63-69
Author(s):  
Siti Jamilah Tarigan ◽  
Wing Wahyu Winarno ◽  
Henderi Safei

Decision making and planning in academic affairs are often not based on complete information. Decision makers (the rectorate or executive level) can view data in only one dimension. Decision making would improve if information could be presented from multiple dimensions. The university already holds complete operational data on academic activities, staffing, and student admissions, collected over more than 4 years. A data warehouse is a collection of databases optimized to support decision making. The concept integrates legacy and new systems so that data is not duplicated. The integrated data can be processed into various report formats as needed. The aim of this research is to determine how the existing data can yield accurate, multidimensional information so that decisions can be made faster and more accurately. The research uses OLAP data analysis and a star schema. It concludes that the resulting design can help academic staff make decisions based on multidimensional data and information.
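As a rough illustration of viewing the same data along several dimensions, the Python sketch below rolls up a tiny star schema. It is not the authors' design. The tables `dim_program`, `dim_year`, and `fact_enrollment`, the helper `roll_up`, and all values are hypothetical and invented for the example.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attribute
dim_program = {1: "Informatics", 2: "Geology"}
dim_year = {10: 2016, 11: 2017}

# Hypothetical fact table rows: (program_key, year_key, enrolled_students)
fact_enrollment = [
    (1, 10, 120),
    (1, 11, 135),
    (2, 10, 80),
    (2, 11, 75),
]

def roll_up(facts, key_index, dimension):
    """Sum the measure (column 2) grouped by the attribute of one dimension."""
    totals = defaultdict(int)
    for row in facts:
        totals[dimension[row[key_index]]] += row[2]
    return dict(totals)

by_program = roll_up(fact_enrollment, 0, dim_program)
by_year = roll_up(fact_enrollment, 1, dim_year)
print(by_program)
print(by_year)
```

The same fact rows answer both questions: grouping by the program key gives totals per study program, and grouping by the year key gives totals per year.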

