Spatial Data Sequence Selection Based on a User-Defined Condition Using GPGPU

2021 · Vol 10 (12) · pp. 816
Author(s): Driss En-Nejjary, François Pinet, Myoung-Ah Kang

The size of spatial data is growing rapidly due to the emergence of, and tremendous advances in, technologies such as sensors and the Internet of Things. Supporting high-performance queries on this large volume of data has become essential in many data- and compute-intensive applications. Unfortunately, most existing methods and approaches are based on a traditional uniprocessor computing framework, which makes them neither scalable nor adequate for large-scale data. In this work, we present a high-performance query for massive spatio-temporal data. The query selects fixed-size raster subsequences from a spatio-temporal raster sequence, based on the average over their region of interest, such that the average satisfies a user-defined threshold condition. For simplicity, we take the region of interest to be the entire raster rather than a subregion. Our aim is to speed up execution using parallel primitives and pure CUDA. Furthermore, we propose a new method based on a sorting step that saves computations and boosts the speed of query execution. Test results show that the proposed methods are faster and perform well even on large-scale rasters and datasets.
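A minimal NumPy sketch of the query (not the authors' CUDA code): per-raster means stand in for the parallel reduction, and the sliding-window average is obtained with a prefix sum, the same scan primitive a GPU implementation would use. Function and parameter names are illustrative.

```python
import numpy as np

def select_subsequences(rasters, window, threshold):
    """Return start indices of fixed-size raster subsequences whose
    mean value (over the whole raster, the 'region of interest'
    simplification used in the paper) meets `threshold`.

    rasters: array of shape (T, H, W); window: subsequence length.
    A CUDA version would replace the two steps below with parallel
    reductions and a prefix-sum (scan) primitive.
    """
    # Step 1: one mean per raster -- a parallel reduction on GPU.
    means = rasters.mean(axis=(1, 2))                    # shape (T,)
    # Step 2: sliding-window average via a prefix sum (scan).
    csum = np.concatenate(([0.0], np.cumsum(means)))
    win_avg = (csum[window:] - csum[:-window]) / window  # shape (T-window+1,)
    return np.nonzero(win_avg >= threshold)[0]

# Example: 100 random 64x64 rasters, subsequences of length 5.
starts = select_subsequences(np.random.rand(100, 64, 64), 5, 0.51)
```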

2012 · Vol 209-211 · pp. 252-255
Author(s): Li Guo, Hai Ying Zheng, Yong Hong Wang, Bin Zhang

Data matching is a key technology for spatial data integration and fusion. This paper presents a solution for complex polygon areas: it defines the area overlap rate as a geometric measure and builds a data matching approach on it. The paper then discusses and implements the matching relations of area elements, including one-to-one, many-to-one, and many-to-many. Finally, taking region targets as the study object and large-scale data as the test case, we conclude that the algorithm is efficient.
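A small Shapely-based sketch of the matching idea. The paper's exact definition of the area overlap rate may differ; intersection-over-union is assumed here, and all names are illustrative.

```python
from shapely.geometry import Polygon

def area_overlap_rate(a: Polygon, b: Polygon) -> float:
    """Overlap rate in the geometric-measure sense; taken here as
    intersection area over union area (an assumption, since the
    paper's precise definition is not reproduced in the abstract)."""
    inter = a.intersection(b).area
    union = a.union(b).area
    return inter / union if union > 0 else 0.0

def match_areas(src, dst, threshold=0.5):
    """Collect candidate matches between two polygon datasets.
    Grouping the resulting pairs by source/target index yields the
    one-to-one, many-to-one, and many-to-many relations discussed."""
    return [(i, j)
            for i, a in enumerate(src)
            for j, b in enumerate(dst)
            if area_overlap_rate(a, b) >= threshold]
```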


2018 · Vol 7 (12) · pp. 467
Author(s): Mengyu Ma, Ye Wu, Wenze Luo, Luo Chen, Jun Li, ...

Buffer analysis, a fundamental function in a geographic information system (GIS), identifies the areas within a given distance of specified geographic features. Real-time buffer analysis for large-scale spatial data remains challenging because the computational cost of conventional data-oriented methods grows rapidly with data volume. In this paper, we introduce HiBuffer, a visualization-oriented model for real-time buffer analysis. We propose an efficient buffer generation method that introduces spatial indexes and a corresponding query strategy. Buffer results are organized into a tile-pyramid structure to enable stepless zooming. Moreover, a fully optimized hybrid parallel processing architecture is proposed for real-time buffer analysis of large-scale spatial data. Experiments using real-world datasets show that our approach can reduce computation time by up to several orders of magnitude while preserving superior visualization effects. Additional experiments analyzing the influence of spatial data density, buffer radius, and request rate on HiBuffer performance demonstrate its adaptability and stability, and tests of parallel scalability show that it achieves good parallel acceleration. Overall, the experimental results verify that HiBuffer is capable of handling 10-million-scale data.
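A single-threaded sketch of the visualization-oriented idea, assuming Shapely 2.x: each pixel of a display tile is classified through a spatial-index lookup plus an exact distance test, instead of computing buffer polygons. HiBuffer's hybrid CPU/GPU parallelism and tile pyramid are omitted, and all names are illustrative.

```python
import numpy as np
from shapely import Point, STRtree   # Shapely >= 2.0 assumed

def render_buffer_tile(features, radius, x0, y0, size, step):
    """Rasterize one display tile of a buffer, HiBuffer-style:
    rather than materializing buffer polygons, each pixel is
    classified via an index lookup plus an exact distance test.
    (A sketch of the idea only; the real system runs this in a
    hybrid CPU/GPU parallel architecture.)"""
    tree = STRtree(features)             # spatial index over input features
    tile = np.zeros((size, size), dtype=bool)
    for row in range(size):
        for col in range(size):
            p = Point(x0 + col * step, y0 + row * step)
            # The index narrows candidates to features whose bounding
            # box meets the search square; the exact test confirms.
            for i in tree.query(p.buffer(radius)):
                if features[i].distance(p) <= radius:
                    tile[row, col] = True
                    break
    return tile
```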


2022 · Vol 16 (4) · pp. 1-33
Author(s): Danlu Liu, Yu Li, William Baskett, Dan Lin, Chi-Ren Shyu

Risk patterns are crucial in biomedical research and have served as an important factor in precision health and disease prevention. Despite recent developments in parallel and high-performance computing, existing risk pattern mining methods still struggle with problems caused by large-scale datasets, such as redundant candidate generation, inability to discover long significant patterns, and prolonged post-mining pattern filtering. In this article, we propose a novel dynamic tree structure, the Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which together can efficiently analyze large volumes of data and overcome the limitations of previous works. The dynamic nature of the RHPTree avoids costly tree reconstruction during the iterative search process and dataset updates. We also introduce two specialized search methods, an extended target search (RHPSearch-TS) and a parallel search approach (RHPSearch-SD), to further speed up the retrieval of items of interest. Experiments on both UCI machine learning datasets and sampled datasets from the Simons Foundation Autism Research Initiative (SFARI) Simons Simplex Collection (SSC) demonstrate that our method is not only faster but also more effective than existing works in identifying comprehensive long risk patterns. Moreover, the proposed tree structure is generic and applicable to other pattern mining problems.
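The sketch below is not RHPTree/RHPSearch itself, only a generic top-down risk-pattern search with support-based pruning, to illustrate the search space that the proposed tree structure and its reuse make tractable. The risk score and all names are simplified assumptions.

```python
def risk(pattern, cases, controls):
    """Relative-risk-style score: exposure rate of the item set among
    cases over its rate among controls (the paper's statistical
    criteria are more involved; this is a stand-in)."""
    rate = lambda group: sum(pattern <= row for row in group) / len(group)
    r_case, r_ctrl = rate(cases), rate(controls)
    return r_case / r_ctrl if r_ctrl > 0 else float("inf")

def search(prefix, items, cases, controls, min_support, results):
    """Top-down depth-first growth of candidate risk patterns. A branch
    is cut as soon as case support drops below threshold, since support
    can only shrink as a pattern grows (anti-monotonicity)."""
    for k, item in enumerate(items):
        pat = prefix | {item}
        if sum(pat <= row for row in cases) / len(cases) < min_support:
            continue                      # prune this whole branch
        results.append((pat, risk(pat, cases, controls)))
        search(pat, items[k + 1:], cases, controls, min_support, results)

# Rows are item sets, e.g. {"smoker", "hypertension"}; patterns found
# by search(set(), items, ...) accumulate in `results` with scores.
```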


2018 · Vol 35 (3) · pp. 380-388
Author(s): Wei Zheng, Qi Mao, Robert J Genco, Jean Wactawski-Wende, Michael Buck, ...

Motivation: The rapid development of sequencing technology has led to an explosive accumulation of genomic data. Clustering is often the first step in sequence analysis, yet existing methods scale poorly with the unprecedented growth of input data size. As high-performance computing systems become widely accessible, a clustering method that can easily scale to large sequence datasets by leveraging parallel computing is highly desirable.
Results: In this paper, we introduce SLAD (Separation via Landmark-based Active Divisive clustering), a generic computational framework that can be used to parallelize various de novo operational taxonomic unit (OTU) picking methods and comes with theoretical guarantees on both accuracy and efficiency. The framework was implemented on Apache Spark, which allows easy and efficient use of parallel computing resources. Experiments on various datasets demonstrate that SLAD can significantly speed up a number of popular de novo OTU picking methods while maintaining the same level of accuracy. In particular, an experiment on the Earth Microbiome Project dataset (∼2.2B reads, 437 GB) demonstrated the excellent scalability of the proposed method.
Availability and implementation: Open-source software for the proposed method is freely available at https://www.acsu.buffalo.edu/~yijunsun/lab/SLAD.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
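A rough single-machine sketch of the landmark-based divisive step (names and parameters are illustrative, not SLAD's API): sequences are embedded by their distances to a few random landmarks, split in two, and recursed. SLAD distributes exactly this kind of recursion as independent Spark tasks.

```python
import numpy as np

def divide(seq_ids, distance, n_landmarks=8, leaf_size=100, rng=None):
    """Recursive landmark-based divisive clustering, loosely after
    SLAD: embed each sequence by its distances to a few random
    landmark sequences, 2-means the embedding, and recurse.
    `distance` is any pairwise sequence distance function."""
    rng = rng or np.random.default_rng(0)
    if len(seq_ids) <= leaf_size:
        return [seq_ids]                 # small enough for exact OTU picking
    landmarks = rng.choice(seq_ids, n_landmarks, replace=False)
    emb = np.array([[distance(s, l) for l in landmarks] for s in seq_ids])
    # Crude 2-means on the landmark embedding.
    centers = emb[rng.choice(len(emb), 2, replace=False)]
    for _ in range(10):
        labels = np.argmin(((emb[:, None] - centers) ** 2).sum(-1), axis=1)
        if 0 in np.bincount(labels, minlength=2):
            return [seq_ids]             # degenerate split: stop here
        centers = np.array([emb[labels == k].mean(0) for k in (0, 1)])
    halves = ([s for s, l in zip(seq_ids, labels) if l == k] for k in (0, 1))
    # Each recursive call is independent -- the parallelism Spark exploits.
    return [leaf for half in halves
            for leaf in divide(half, distance, n_landmarks, leaf_size, rng)]
```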


Author(s): Krzysztof Jurczuk, Marcin Czajkowski, Marek Kretowski

Decision trees (DTs) are one of the most popular white-box machine-learning techniques. Traditionally, DTs are induced using a top-down greedy search that may lead to sub-optimal solutions. One emerging alternative is evolutionary induction, inspired by biological evolution, which searches the tree structure and node tests simultaneously and yields less complex DTs with at least comparable prediction performance. However, the evolutionary search is computationally expensive, and its effective application to big data mining requires algorithmic and technological progress. In this paper, noting that many trees or their parts reappear during the evolution, we propose a reuse strategy. A fixed number of recently processed individuals (DTs) is stored in a so-called repository. The part of each repository entry related to fitness calculations is kept on the CPU side to limit CPU/GPU memory transfers, while the tree structures are kept on the GPU side to speed up the search for similar DTs. Since DT evaluation is the most time-consuming task of the induction, the GPU first searches the repository for a similar DT to reuse; only if this fails does it evaluate the DT from scratch. Large artificial and real-life datasets and various repository strategies are tested. Results show that reusing information from previous generations can further accelerate the original GPU-based solution, especially for large-scale data. To give an idea of the overall acceleration, the proposed solution can process even billions of objects in a few hours on a single GPU workstation.
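A minimal sketch of the repository idea, assuming a hypothetical tree.signature() that canonically encodes structure and tests; the paper's CPU/GPU split of repository entries is folded into one in-memory store for clarity.

```python
from collections import OrderedDict

class DTRepository:
    """Fixed-capacity store of recently evaluated decision trees,
    keyed by a canonical encoding of structure and node tests. In the
    paper the fitness part lives on the CPU side and the structures on
    the GPU side; both are folded into one dict here for clarity."""
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.entries = OrderedDict()     # signature -> cached fitness

    def fitness(self, tree, evaluate):
        sig = tree.signature()           # hypothetical canonical key
        if sig in self.entries:          # hit: reuse, skip evaluation
            self.entries.move_to_end(sig)
            return self.entries[sig]
        value = evaluate(tree)           # miss: full (GPU) evaluation
        self.entries[sig] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the oldest entry
        return value
```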


2018 · Vol 13 (2) · pp. 338-346
Author(s): Yusuke Kawai, Jing Zhao, Kento Sugiura, Yoshiharu Ishikawa, Yukiko Wakita, ...

Today, large-scale simulations are thriving thanks to increases in computing performance and storage capacity. Understanding the results of these simulations is not easy, so support for interactive and exploratory analysis is becoming more important. This study focuses on spatio-temporal simulations and develops an analysis technology to support them, using a database system for interactive analysis of large-scale data. Since data obtained from spatio-temporal simulations are not well suited to management in a relational DBMS (RDBMS), this study uses an array DBMS, a type of DBMS that has attracted increasing attention in recent years. An array DBMS is designed for managing large-scale array data: it provides a logical model for array data while supporting efficient query processing. SciDB is used as the specific array DBMS in this paper. The study targets disaster evacuation simulation data and demonstrates experimentally that the query-processing functions offered by an array DBMS provide effective analysis support.
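For illustration only, a NumPy stand-in for the kind of slice-and-aggregate query an array DBMS evaluates declaratively over stored arrays; SciDB would express the same operation in its own query language rather than in Python, and all numbers below are made up.

```python
import numpy as np

# Hypothetical simulation output as a dense array: one evacuee count
# per (time step, grid x, grid y) cell -- the natural shape for an
# array DBMS such as SciDB, mimicked here with NumPy.
evac = np.random.poisson(2.0, size=(288, 100, 100))  # 24 h at 5-min steps

# "Window + aggregate", the style of query an array DBMS runs directly
# on the stored array: evacuees remaining in one district
# (x in [20, 40), y in [50, 70)) over the first simulated hour.
district_hour = evac[:12, 20:40, 50:70]
remaining_per_step = district_hour.sum(axis=(1, 2))   # shape (12,)
```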


Author(s): Gordon Bell, David H Bailey, Jack Dongarra, Alan H Karp, Kevin Walsh

The Gordon Bell Prize is awarded each year by the Association for Computing Machinery to recognize outstanding achievement in high-performance computing (HPC). The purpose of the award is to track the progress of parallel computing with particular emphasis on rewarding innovation in applying HPC to applications in science, engineering, and large-scale data analytics. Prizes may be awarded for peak performance or special achievements in scalability and time-to-solution on important science and engineering problems. Financial support for the US$10,000 award is provided through an endowment by Gordon Bell, a pioneer in high-performance and parallel computing. This article examines the evolution of the Gordon Bell Prize and the impact it has had on the field.

