CUDA Programming
Recently Published Documents


TOTAL DOCUMENTS: 44 (FIVE YEARS: 10)

H-INDEX: 10 (FIVE YEARS: 1)

Author(s):  
М.Л. Цымблер ◽  
А.И. Гоглачев

Discovery of typical subsequences in a time series is one of the topical problems of time series mining. In this problem, the task is to find a set of subsequences that adequately represents the process or phenomenon specified by the time series. Solving this problem makes it possible to summarize and visualize large time series in a wide range of applications: monitoring the technical condition of complex machines and mechanisms, intelligent management of life support systems, monitoring indicators of functional diagnostics of the human body, and others. The recently proposed snippet concept formalizes a typical time series subsequence as follows. A snippet of a time series is a subsequence that many other subsequences of the given series are similar to, with respect to a specialized similarity measure based on the Euclidean distance. Although snippet-based discovery of typical subsequences shows adequate results for time series from a wide range of subject domains, the corresponding algorithm has a high computational complexity. In this article, we propose a novel parallel algorithm for snippet discovery on a GPU. Parallelization is performed with the CUDA programming technology. We developed data structures that allow the computations to be parallelized efficiently on the graphics processor. The experimental results confirm the high performance of the proposed algorithm.
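The core of the snippet similarity measure is the Euclidean distance between fixed-length subsequences, which maps naturally onto one GPU thread per subsequence. The following minimal CUDA sketch is illustrative only and is not the authors' algorithm or data structures; it computes the distance profile between a hypothetical candidate subsequence and every other subsequence of the series.

// Illustrative sketch, not the authors' implementation: one thread per
// subsequence start position computes the Euclidean distance between that
// subsequence and a candidate subsequence of length m.
// d_series: time series of length n (device memory),
// cand: start index of the candidate, d_dist: output profile of length n-m+1.
__global__ void distance_profile(const float *d_series, int n, int m,
                                 int cand, float *d_dist)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int profileLen = n - m + 1;
    if (i >= profileLen) return;

    float sum = 0.0f;
    for (int j = 0; j < m; ++j) {
        float diff = d_series[i + j] - d_series[cand + j];
        sum += diff * diff;
    }
    d_dist[i] = sqrtf(sum);
}

// Typical launch: distance_profile<<<(profileLen + 255) / 256, 256>>>(...);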


2021 ◽  
Author(s):  
Randa Khemiri ◽  
Soulef Bouaafia ◽  
Asma Bahba ◽  
Maha Nasr ◽  
Fatma Ezahra Sayadi

In motion estimation (ME), block matching algorithms offer great potential for parallelism. The search for the best match is performed by computing the similarity of each block position inside the search area using a similarity metric such as the Sum of Absolute Differences (SAD), which is used in the various steps of motion estimation algorithms. Moreover, since the computation is the same for every block of pixels, it can be parallelized on a Graphics Processing Unit (GPU), offering better results. In this work, a single OpenCL code was first run on several architectures, namely CPU and GPU; then a parallel GPU implementation of the SAD process was proposed with both CUDA and OpenCL for block sizes from 4x4 to 64x64. A comparative study of the GPU execution times was carried out on the same video sequence. The experimental results indicate that the OpenCL execution time on the GPU was better than the CUDA time, with a performance ratio reaching a factor of two.
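As a concrete illustration of how the SAD step parallelizes, the sketch below is a hypothetical CUDA kernel (not the authors' code): each thread evaluates one candidate displacement inside the search window; frame-border handling and the final minimum search over d_sad are omitted.

// Hypothetical sketch: one thread per candidate displacement (dx, dy).
// d_cur / d_ref: current and reference frames (row-major, stride 'width'),
// (bx, by): top-left corner of the current block, blockSize: 4..64,
// range: search range in pixels, d_sad: one SAD value per displacement.
__global__ void block_sad(const unsigned char *d_cur, const unsigned char *d_ref,
                          int width, int bx, int by, int blockSize,
                          int range, int *d_sad)
{
    int dx = (int)(blockIdx.x * blockDim.x + threadIdx.x) - range;
    int dy = (int)(blockIdx.y * blockDim.y + threadIdx.y) - range;
    if (dx > range || dy > range) return;

    int sum = 0;
    for (int r = 0; r < blockSize; ++r)
        for (int c = 0; c < blockSize; ++c) {
            int cur = d_cur[(by + r) * width + (bx + c)];
            int ref = d_ref[(by + dy + r) * width + (bx + dx + c)];
            sum += abs(cur - ref);
        }
    // The host (or a reduction kernel) then takes the minimum over d_sad
    // to obtain the best matching motion vector.
    d_sad[(dy + range) * (2 * range + 1) + (dx + range)] = sum;
}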


2021 ◽  
Vol 54 (4) ◽  
Author(s):  
Pranay Reddy Kommera ◽  
Vinay Ramakrishnaiah ◽  
Christine Sweeney ◽  
Jeffrey Donatelli ◽  
Petrus H. Zwart

The multitiered iterative phasing (MTIP) algorithm is used to determine the biological structures of macromolecules from fluctuation scattering data. It is an iterative algorithm that reconstructs the electron density of the sample by matching the computed fluctuation X-ray scattering data to the external observations, and by simultaneously enforcing constraints in real and Fourier space. This paper presents the first efforts to accelerate the MTIP algorithm on contemporary graphics processing units (GPUs). The Compute Unified Device Architecture (CUDA) programming model is used to accelerate the MTIP algorithm on NVIDIA GPUs. The CUDA-based MTIP implementation outperforms the CPU-based version by an order of magnitude. Furthermore, the Heterogeneous-Compute Interface for Portability (HIP) runtime APIs are used to demonstrate portability by accelerating the MTIP algorithm across NVIDIA and AMD GPUs.
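The portability argument rests on the fact that the HIP runtime API mirrors the CUDA runtime call-for-call. The sketch below is purely illustrative; the support-constraint step and all names are assumptions, not the MTIP code. It shows a generic real-space constraint kernel with its CUDA host wrapper; replacing the cuda* runtime calls with their hip* counterparts (hipMalloc, hipMemcpy, hipFree) and including hip/hip_runtime.h gives a version that builds for both NVIDIA and AMD GPUs.

#include <cuda_runtime.h>

// Illustrative only, not the authors' MTIP code: zero the electron density
// outside a binary support mask, a common real-space constraint in
// iterative phasing algorithms.
__global__ void apply_support(float *rho, const unsigned char *support, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && support[i] == 0)
        rho[i] = 0.0f;
}

void apply_support_on_gpu(float *h_rho, const unsigned char *h_support, int n)
{
    float *d_rho;
    unsigned char *d_support;
    cudaMalloc((void **)&d_rho, n * sizeof(float));
    cudaMalloc((void **)&d_support, n * sizeof(unsigned char));
    cudaMemcpy(d_rho, h_rho, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_support, h_support, n, cudaMemcpyHostToDevice);

    // The HIP equivalents (hipMalloc, hipMemcpy, kernel<<<...>>>, hipFree)
    // have the same signatures, which is what makes the port mechanical.
    int threads = 256;
    apply_support<<<(n + threads - 1) / threads, threads>>>(d_rho, d_support, n);

    cudaMemcpy(h_rho, d_rho, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_rho);
    cudaFree(d_support);
}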


2021 ◽  
Author(s):  
Zixiong Zhao ◽  
Peng Hu ◽  
Wei Li ◽  
Zhixian Cao ◽  
Zhiguo He

In recent decades, computational hydraulics and sediment modelling have developed greatly thanks to advances in computing technology. Applying a finite-volume Godunov-type hydrodynamic shallow water model with hydro-sediment-morphodynamic processes, this work demonstrates and analyses the capability of single-host parallel computing combined with algorithmic acceleration. The model is implemented for high-performance computing in three ways: on the GPU, using NVIDIA's Compute Unified Device Architecture (CUDA) programming framework; across multiple CPU cores, using a domain decomposition technique and an efficient Open Multi-Processing (OpenMP) implementation; and with an algorithmic acceleration technique, the local time stepping (LTS) scheme, which gains much efficiency by using different time step sizes for different grid sizes. The model is applied to three cases, through which we compare the effectiveness of CPU, OpenMP, OpenMP+LTS, CUDA, and CUDA+LTS, demonstrating both the high computational performance of CUDA+LTS, which leads to speedups of up to 40 times with respect to the CPU, and its high-precision results.

KEY WORDS: hydro-sediment-morphological modeling; local time step; OpenMP; CUDA.
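To illustrate the LTS idea in GPU form, the following is a hypothetical sketch under assumed names and data layout, not the authors' solver: each cell carries an integer refinement level; within one global cycle, a cell is updated only at the substeps matching its level, so cells with restrictive stability limits take many small steps while the rest take few large ones.

// Hypothetical LTS sketch, not the authors' scheme. level[i] = 0 means the
// cell advances with the global maximum step dtMax; level[i] = L means it
// advances with dtMax / 2^L. A cycle consists of 2^maxLevel substeps.
__global__ void lts_update(float *u, const float *flux, const int *level,
                           int nCells, int substep, int maxLevel, float dtMax)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nCells) return;

    int stride = 1 << (maxLevel - level[i]);   // substeps between updates
    if (substep % stride != 0) return;         // not this cell's turn

    float dtLocal = dtMax / (float)(1 << level[i]);
    u[i] += dtLocal * flux[i];                 // explicit finite-volume update
}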


Author(s):  
Teamsar Muliadi Panggabean ◽  
Mario Elyezer Simaremare ◽  
Rusmina Siahaan ◽  
Chandro Pardede ◽  
Wiwin Putri Gurning

Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 48 ◽  
Author(s):  
Stefano Quer ◽  
Andrea Marcelli ◽  
Giovanni Squillero

The maximum common subgraph of two graphs is the largest possible common subgraph, i.e., the common subgraph with as many vertices as possible. Although this problem is very challenging, having long been proven NP-hard, its countless practical applications still motivate the search for exact solutions. This work discusses the possibility of extending an existing, very effective branch-and-bound procedure to parallel multi-core and many-core architectures. We analyze a parallel multi-core implementation that exploits a divide-and-conquer approach based on a thread pool, which does not deteriorate the original algorithmic efficiency and minimizes data structure repetition. We also extend the original algorithm to many-core GPU architectures using the CUDA programming framework, and we show how to handle the heavy workload unbalance and the massive data dependencies. Then, we suggest new heuristics to reorder the adjacency matrix, to deal with "dead-ends", and to randomize the search with automatic restarts. These heuristics can achieve significant speed-ups on specific instances, even if they may not be competitive with the original strategy on average. Finally, we propose a portfolio approach, which integrates all the different local search algorithms as component tools; such a portfolio, rather than choosing the best tool for a given instance up front, takes the decision online. The proposed approach drastically limits memory bandwidth constraints and avoids other typical portfolio fragilities, as the CPU and GPU versions often show complementary efficiency and run on separate platforms. Experimental results support the claims and motivate further research to better exploit GPUs in embedded task-intensive and multi-engine parallel applications.
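For context, the pruning test at the heart of such branch-and-bound solvers can itself be evaluated for many search-tree nodes in parallel. The sketch below is a hypothetical CUDA kernel; the flat per-node label-class layout and all names are assumptions, not the authors' data structures.

// Hypothetical sketch: one thread per branch-and-bound node computes the
// standard upper bound (matched vertices plus, for each label class, the
// smaller of the remaining vertex counts in the two graphs) and marks the
// node as prunable if the bound cannot beat the incumbent solution.
__global__ void prune_nodes(const int *mappedSize, const int *leftCount,
                            const int *rightCount, const int *numClasses,
                            int maxClasses, int nNodes, int bestSoFar,
                            unsigned char *pruned)
{
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= nNodes) return;

    int bound = mappedSize[n];
    for (int c = 0; c < numClasses[n]; ++c) {
        int l = leftCount[n * maxClasses + c];
        int r = rightCount[n * maxClasses + c];
        bound += (l < r) ? l : r;
    }
    pruned[n] = (bound <= bestSoFar) ? 1 : 0;
}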


Author(s):  
Can Yang ◽  
Yin Li ◽  
Fenhua Cheng

2019 ◽  
Vol 184 ◽  
pp. 99-106 ◽  
Author(s):  
Rex Kuan-Shuo Liu ◽  
Cheng-Tao Wu ◽  
Neo Shih-Chao Kao ◽  
Tony Wen-Hann Sheu

Author(s):  
Naajil Aamir Khan ◽  
Muhammad Bilal Latif ◽  
Nida Pervaiz ◽  
Mubashir Baig ◽  
Hasina Khatoon ◽  
...  
