Graphics Processing Units
Recently Published Documents

Total documents: 1288 (five years: 314)
H-index: 47 (five years: 8)

2022 · Vol. 54 (9) · pp. 1-35
Author(s): Lázaro Bustio-Martínez, René Cumplido, Martín Letras, Raudel Hernández-León, Claudia Feregrino-Uribe, et al.

In data mining, Frequent Itemset Mining is a technique used in several domains with notable results. However, the large volume of data in modern datasets increases the processing time of Frequent Itemset Mining algorithms, making them unsuitable for many real-world applications. Accordingly, proposing new methods for Frequent Itemset Mining that obtain frequent itemsets in a realistic amount of time is still an open problem. A successful alternative is to employ hardware acceleration using Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). In this article, a comprehensive review of the state of the art in hardware acceleration of Frequent Itemset Mining is presented. Several approaches (FPGA- and GPU-based) are contrasted to show their weaknesses and strengths. This survey gathers the most relevant and the latest research efforts for improving the performance of Frequent Itemset Mining with respect to both algorithmic advances and modern development platforms. Furthermore, it organizes the current research on Frequent Itemset Mining from the hardware perspective, considering the source of the data, the development platform, and the baseline algorithm.
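To make the acceleration strategy concrete, the sketch below shows a minimal GPU support-counting step of the kind such surveys compare: transactions are encoded as item bitmaps, one CUDA thread tests one transaction against a candidate itemset, and an atomic counter accumulates the support. The data layout, names, and sizes are illustrative assumptions, not the implementation of any specific surveyed work.

```cuda
// Hypothetical sketch: GPU support counting for one candidate itemset.
// Each transaction is a bitmap of ITEM_WORDS 32-bit words (bit i = item i present).
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int ITEM_WORDS = 4;   // up to 128 distinct items (assumption)

__global__ void countSupport(const uint32_t* transactions, int numTransactions,
                             const uint32_t* candidate, unsigned int* support)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numTransactions) return;

    // The candidate is contained in the transaction iff all candidate bits are set.
    bool contained = true;
    for (int w = 0; w < ITEM_WORDS; ++w) {
        uint32_t c = candidate[w];
        if ((transactions[t * ITEM_WORDS + w] & c) != c) { contained = false; break; }
    }
    if (contained) atomicAdd(support, 1u);
}

int main()
{
    const int n = 1 << 20;                      // synthetic transaction count
    uint32_t *dTx, *dCand;
    unsigned int *dSupport, hSupport = 0;
    cudaMalloc(&dTx, n * ITEM_WORDS * sizeof(uint32_t));
    cudaMalloc(&dCand, ITEM_WORDS * sizeof(uint32_t));
    cudaMalloc(&dSupport, sizeof(unsigned int));
    cudaMemset(dTx, 0xFF, n * ITEM_WORDS * sizeof(uint32_t));   // every item present
    uint32_t hCand[ITEM_WORDS] = {0x5u, 0u, 0u, 0u};            // candidate = {item 0, item 2}
    cudaMemcpy(dCand, hCand, sizeof(hCand), cudaMemcpyHostToDevice);
    cudaMemcpy(dSupport, &hSupport, sizeof(hSupport), cudaMemcpyHostToDevice);

    countSupport<<<(n + 255) / 256, 256>>>(dTx, n, dCand, dSupport);
    cudaMemcpy(&hSupport, dSupport, sizeof(hSupport), cudaMemcpyDeviceToHost);
    printf("support = %u of %d transactions\n", hSupport, n);

    cudaFree(dTx); cudaFree(dCand); cudaFree(dSupport);
    return 0;
}
```

Real FPGA- and GPU-based miners differ mainly in how candidates are generated and how the dataset is laid out in memory, which is exactly the design space the survey maps.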


2022 · Vol. 11 (1) · pp. 55
Author(s): Guiming Zhang

Volunteer-contributed geographic data (VGI) is an important source of geospatial big data that supports research and applications. A major concern about VGI data quality is that the underlying observation processes are inherently biased; detecting observation hot-spots thus helps better understand that bias. Enabled by a parallel kernel density estimation (KDE) computational tool that can run on multiple GPUs (graphics processing units), this study conducted point pattern analyses on tens of millions of iNaturalist observations to detect and visualize volunteers’ observation hot-spots across spatial scales. This was achieved by setting varying KDE bandwidths in accordance with the spatial scales at which hot-spots are to be detected. The succession of estimated density surfaces was then rendered at a sequence of map scales for visual detection of hot-spots. This study offers an effective geovisualization scheme for hierarchically detecting hot-spots in massive VGI datasets, which is useful for understanding the pattern-shaping drivers that operate at multiple spatial scales. The research exemplifies a computational tool, supported by high-performance computing, that can efficiently detect and visualize multi-scale hot-spots in geospatial big data, and it contributes to expanding the toolbox for geospatial big data analytics.
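The core scale-dependent step is a kernel density estimate recomputed with different bandwidths. The sketch below is a minimal single-GPU version, assuming a brute-force Gaussian KDE on a regular grid with one thread per grid cell; all names, grid sizes, and bandwidths are illustrative assumptions, and the study's multi-GPU tool is considerably more elaborate.

```cuda
// Hypothetical sketch: brute-force 2D Gaussian KDE on a regular grid.
// Re-launching with different `bandwidth` values yields density surfaces
// matched to different spatial scales, as described above.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void kde2d(const float* px, const float* py, int nPoints,
                      float* density, int nx, int ny,
                      float x0, float y0, float cell, float bandwidth)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx || iy >= ny) return;

    float gx = x0 + ix * cell;               // grid-cell centre
    float gy = y0 + iy * cell;
    float h2 = bandwidth * bandwidth;
    float norm = 1.0f / (2.0f * 3.14159265f * h2 * nPoints);

    float sum = 0.0f;
    for (int p = 0; p < nPoints; ++p) {      // sum contributions of all observations
        float dx = gx - px[p];
        float dy = gy - py[p];
        sum += __expf(-0.5f * (dx * dx + dy * dy) / h2);
    }
    density[iy * nx + ix] = norm * sum;
}

int main()
{
    const int nPoints = 10000, nx = 512, ny = 512;
    float *px, *py, *dens;
    cudaMallocManaged(&px, nPoints * sizeof(float));
    cudaMallocManaged(&py, nPoints * sizeof(float));
    cudaMallocManaged(&dens, nx * ny * sizeof(float));
    for (int i = 0; i < nPoints; ++i) {      // synthetic cluster near (0.5, 0.5)
        px[i] = 0.5f + 0.1f * sinf(i * 0.37f);
        py[i] = 0.5f + 0.1f * cosf(i * 0.91f);
    }
    dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
    float bandwidths[3] = {0.2f, 0.05f, 0.01f};      // coarse-to-fine spatial scales
    for (int k = 0; k < 3; ++k) {
        kde2d<<<grid, block>>>(px, py, nPoints, dens, nx, ny,
                               0.0f, 0.0f, 1.0f / nx, bandwidths[k]);
        cudaDeviceSynchronize();
        printf("bandwidth %.2f: density at domain centre = %g\n",
               bandwidths[k], dens[(ny / 2) * nx + nx / 2]);
    }
    cudaFree(px); cudaFree(py); cudaFree(dens);
    return 0;
}
```

A multi-GPU version would partition the grid (or the observation points) across devices; choosing the bandwidth sequence from the target map scales is what gives the multi-scale hot-spot workflow described above.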


2022
Author(s): Christoph Schär

Currently, major efforts are underway to refine the horizontal grid spacing of climate models to about 1 km, using both global and regional climate models. There is well-founded hope that this increase in resolution will improve climate models, as it enables replacing the parameterizations of moist convection and gravity-wave drag with explicit treatments. Results suggest that this approach has high potential to improve the representation of the water cycle and extreme events, and to reduce uncertainties in climate change projections. The presentation will provide examples of these developments in the areas of heavy precipitation and severe weather events over Europe. In addition, it will be argued that km-resolution is a promising approach toward constraining uncertainties in global climate change projections, owing to improvements in the representation of tropical and subtropical clouds. Work in the latter area has only recently started, and results are highly encouraging.

For a few years there have also been attempts to make km-resolution available in global climate models for decade-long simulations. Developing this approach requires a concerted effort. Key challenges include the exploitation of next-generation hardware architectures using accelerators (e.g. graphics processing units, GPUs), the development of suitable approaches to overcome the output avalanche, and the maintenance of rapidly developing model source codes on a number of different compute architectures. Despite these challenges, it will be argued that km-resolution GCMs with a capacity to run at 1 SYPD (simulated year per day) might be much closer than commonly believed.

The presentation is largely based on a recent collaborative paper (Schär et al., 2020, BAMS, https://doi.org/10.1175/BAMS-D-18-0167.1) and ongoing studies. It will also present aspects of a recent Swiss project in this area (EXCLAIM, https://exclaim.ethz.ch/).


Urban Climate · 2022 · Vol. 41 · pp. 101063
Author(s): Mohammad Mortezazadeh, Liangzhu Leon Wang, Maher Albettar, Senwen Yang

Author(s): Liam Dunn, Patrick Clearwater, Andrew Melatos, Karl Wette

Abstract. The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.
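A large share of the resampling algorithm's work consists of Fourier transforms over many independent data segments, which is the kind of workload batched GPU FFT libraries handle well. The sketch below, using cuFFT with assumed segment lengths and batch counts, only illustrates that pattern; it is not the search pipeline's actual implementation.

```cuda
// Hypothetical sketch: many independent FFTs executed as one batched cuFFT call,
// the kind of operation that dominates FFT-based continuous-wave searches.
// Compile with: nvcc fstat_fft.cu -lcufft
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main()
{
    const int fftLen = 1 << 16;   // samples per segment (assumption)
    const int batch  = 256;       // independent segments transformed at once (assumption)

    cufftComplex* data;
    cudaMalloc(&data, sizeof(cufftComplex) * fftLen * batch);
    cudaMemset(data, 0, sizeof(cufftComplex) * fftLen * batch);   // placeholder input

    cufftHandle plan;
    cufftPlan1d(&plan, fftLen, CUFFT_C2C, batch);    // one plan covering all segments
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);   // in-place batched transform
    cudaDeviceSynchronize();

    printf("executed %d FFTs of length %d in a single batched call\n", batch, fftLen);

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
```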


2021 · Vol. 14 (12) · pp. 7749-7774
Author(s): Emmanuel Wyser, Yury Alkhimenkov, Michel Jaboyedoff, Yury Y. Podladchikov

Abstract. We propose an explicit solver within the material point method (MPM) framework that uses graphics processing units (GPUs) to resolve elastoplastic problems in two- and three-dimensional configurations (i.e. granular collapses and slumping mechanics). Modern GPU architectures, including Ampere, Turing and Volta, provide a computational framework that is well suited to the locality of the material point method in view of high-performance computing. For intense and non-local computational aspects (i.e. the back-and-forth mapping between the nodes of the background mesh and the material points), we use straightforward atomic operations (the scattering paradigm). We select the generalized interpolation material point method (GIMPM) to resolve the cell-crossing error, which typically arises in the original MPM because of the C0 continuity of the linear basis function. We validate our GPU-based in-house solver by comparing numerical results for granular collapses with the available experimental data sets. Good agreement is found between the numerical and experimental results for the free surface and failure surface. We further evaluate the performance of our GPU-based implementation on a three-dimensional elastoplastic slumping mechanics problem. We report (i) a maximum 200-fold performance gain between a CPU- and a single-GPU-based implementation, provided that (ii) the hardware limit (i.e. the peak memory bandwidth) of the device is reached. Furthermore, our multi-GPU implementation can resolve models with nearly a billion material points. We finally showcase an application to slumping mechanics and demonstrate the importance of a three-dimensional configuration coupled with heterogeneous properties for resolving complex material behaviour.
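The "scattering paradigm" mentioned above means that each material point writes its contributions to the grid nodes it overlaps, with atomic operations resolving concurrent writes. Below is a deliberately simplified one-dimensional sketch with linear shape functions (the paper itself uses GIMPM basis functions in 2D/3D); all names and sizes are illustrative assumptions.

```cuda
// Hypothetical 1D sketch of particle-to-grid "scattering": each material point
// deposits mass and momentum onto its two neighbouring grid nodes; atomicAdd
// resolves races when many points touch the same node.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void p2gScatter(const float* xp, const float* mp, const float* vp,
                           int nPoints, float dx,
                           float* nodeMass, float* nodeMomentum, int nNodes)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPoints) return;

    int   i = (int)(xp[p] / dx);         // index of the left grid node
    float w = xp[p] / dx - i;            // fractional position within the cell
    if (i < 0 || i + 1 >= nNodes) return;

    float wl = 1.0f - w, wr = w;         // linear shape-function weights
    atomicAdd(&nodeMass[i],         wl * mp[p]);
    atomicAdd(&nodeMass[i + 1],     wr * mp[p]);
    atomicAdd(&nodeMomentum[i],     wl * mp[p] * vp[p]);
    atomicAdd(&nodeMomentum[i + 1], wr * mp[p] * vp[p]);
}

int main()
{
    const int nPoints = 1 << 20, nNodes = 1024;
    const float dx = 1.0f / (nNodes - 1);
    float *xp, *mp, *vp, *mass, *mom;
    cudaMallocManaged(&xp, nPoints * sizeof(float));
    cudaMallocManaged(&mp, nPoints * sizeof(float));
    cudaMallocManaged(&vp, nPoints * sizeof(float));
    cudaMallocManaged(&mass, nNodes * sizeof(float));
    cudaMallocManaged(&mom,  nNodes * sizeof(float));
    for (int p = 0; p < nPoints; ++p) {  // synthetic points spread over [0, 1)
        xp[p] = (p + 0.5f) / nPoints; mp[p] = 1.0f; vp[p] = 0.1f;
    }
    cudaMemset(mass, 0, nNodes * sizeof(float));
    cudaMemset(mom,  0, nNodes * sizeof(float));

    p2gScatter<<<(nPoints + 255) / 256, 256>>>(xp, mp, vp, nPoints, dx, mass, mom, nNodes);
    cudaDeviceSynchronize();
    printf("node 512: mass %.1f, momentum %.2f\n", mass[512], mom[512]);

    cudaFree(xp); cudaFree(mp); cudaFree(vp); cudaFree(mass); cudaFree(mom);
    return 0;
}
```

The alternative "gathering" approach, where each node loops over nearby points, avoids atomics but requires sorting points by cell; the scattering variant shown here corresponds to the straightforward atomic-operation strategy referred to in the abstract.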


2021 · Vol. 28 (4) · pp. 338-355
Author(s): Natalia Olegovna Garanina, Sergei Petrovich Gorlatch

The paper presents a new approach to autotuning data-parallel programs. Autotuning is a search for the optimal program settings that maximize its performance. The novelty of the approach lies in the use of model checking to find the optimal tuning parameters via counterexamples. In our work, we abstract from specific programs and specific processors by defining representative abstract patterns for them. Our method of counterexamples comprises the following four steps. At the first step, an execution model of an abstract program on an abstract processor is described in the language of a model checking tool. At the second step, in the same language, we formulate the optimality property that depends on the constructed model. At the third step, we find the optimal values of the tuning parameters by using a counterexample constructed during the verification of the optimality property. At the fourth step, we extract the information about these optimal parameter values from the counterexample. We apply the approach to autotuning parallel programs written in OpenCL, a popular modern language that extends the C language for programming both standard multi-core processors (CPUs) and massively parallel graphics processing units (GPUs). As a verification tool, we use the SPIN verifier and its model representation language Promela, whose formal semantics is well suited to modelling the execution of parallel programs on processors with different architectures.


2021 · Vol. 7 (12) · pp. 274
Author(s): Dominique Franson, Andrew Dupuis, Vikas Gulani, Mark Griswold, Nicole Seiberlich

Image-guided cardiovascular interventions are rapidly evolving procedures that necessitate imaging systems capable of rapid data acquisition and low-latency image reconstruction and visualization. Compared to alternative modalities, Magnetic Resonance Imaging (MRI) is attractive for guidance in complex interventional settings thanks to excellent soft tissue contrast and large fields-of-view without exposure to ionizing radiation. However, most clinically deployed MRI sequences and visualization pipelines exhibit poor latency characteristics, and spatial integration of complex anatomy and device orientation can be challenging on conventional 2D displays. This work demonstrates a proof-of-concept system linking real-time cardiac MR image acquisition, online low-latency reconstruction, and a stereoscopic display to support further development in real-time MR-guided intervention. Data are acquired using an undersampled, radial trajectory and reconstructed via parallelized through-time radial generalized autocalibrating partially parallel acquisition (GRAPPA) implemented on graphics processing units. Images are rendered for display in a stereoscopic mixed-reality head-mounted display. The system is successfully tested by imaging standard cardiac views in healthy volunteers. Datasets comprising one slice (46 ms), two slices (92 ms), and three slices (138 ms) are collected, with the acquisition time of each listed in parentheses. Images are displayed with latencies of 42 ms/frame or less for all three conditions. Volumetric data are acquired at one volume per heartbeat, with acquisition times of 467 ms and 588 ms when 8 and 12 partitions are acquired, respectively. Volumes are displayed with a latency of 286 ms or less. The faster-than-acquisition latencies for both planar and volumetric display enable real-time 3D visualization of the heart.


Author(s): Pascal R. Bähr, Bruno Lang, Peer Ueberholz, Marton Ady, Roberto Kersevan

Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing-based simulation unit for Nvidia RTX Graphics Processing Units (GPUs). The GPU simulation kernel was designed with Nvidia’s OptiX 7 API to make use of modern hardware-accelerated ray-tracing units, found in recent RTX-series GPUs based on the Turing and Ampere architectures. Even with the challenges posed by switching to 32-bit computations, our kernel runs much faster on these GPUs than on comparable CPUs, at the expense of a marginal drop in calculation precision.


2021 · Vol. 4 · pp. 16-22
Author(s): Mykola Semylitko, Gennadii Malaschonok

The SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and various other algorithms for working with matrices, which can be very large (Big Data); given the peculiarities of this algorithm, it can be executed on the large number of computing threads that only video cards provide. CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit for general-purpose processing, an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope, and many applications leverage these higher capabilities to run faster on the GPU than on the CPU. Other computing devices, like FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs. The developed modification uses the CUDA architecture, which is intended for a large number of simultaneous calculations and thus allows matrices of very large sizes to be processed quickly. The parallel SVD algorithm for a tridiagonal matrix, based on the Givens rotation, provides high accuracy of calculations. The algorithm also includes a number of memory and multiplication optimizations that can significantly reduce the computation time by discarding empty iterations. This article proposes an approach that reduces the computation time and, consequently, resources and costs. The developed algorithm can be used through a simple and convenient API in C++ and Java, and it can be further improved by using dynamic parallelism or by parallelizing the multiplication operations. The obtained results can also be used by other developers for comparison, as all conditions of the research are described in detail and the code is freely available.
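The data-parallel core of a Givens-rotation-based SVD is the application of one rotation to a pair of rows (or columns), with one thread per element. The sketch below, with an assumed row-major layout and a hand-picked rotation, shows only that elementary step; it is not the article's full algorithm, which chains many such rotations together with its memory and multiplication optimizations.

```cuda
// Hypothetical sketch: apply one Givens rotation (c, s) to rows i and j of a
// dense row-major matrix, one thread per column. A full Givens-based SVD
// repeats this over many row/column pairs per sweep.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void applyGivensRows(float* A, int nCols, int i, int j, float c, float s)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= nCols) return;
    float ai = A[i * nCols + col];
    float aj = A[j * nCols + col];
    A[i * nCols + col] =  c * ai + s * aj;
    A[j * nCols + col] = -s * ai + c * aj;
}

int main()
{
    const int n = 4;
    float h[n * n] = { 4, 1, 0, 0,
                       1, 3, 1, 0,
                       0, 1, 2, 1,
                       0, 0, 1, 1 };      // small tridiagonal example matrix
    float* d;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    // Rotation chosen to zero A[1][0] against A[0][0]: c = a00/r, s = a10/r.
    float r = sqrtf(4.f * 4.f + 1.f * 1.f), c = 4.f / r, s = 1.f / r;
    applyGivensRows<<<1, 32>>>(d, n, 0, 1, c, s);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    printf("A[1][0] after rotation: %g (should be ~0)\n", h[1 * n + 0]);

    cudaFree(d);
    return 0;
}
```

In a full sweep, rotations acting on disjoint row pairs are independent and can be launched concurrently, which is where the large number of GPU threads pays off.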

