parallel graphics
Recently Published Documents


TOTAL DOCUMENTS

49
(FIVE YEARS 4)

H-INDEX

7
(FIVE YEARS 1)

2021 ◽  
Vol 28 (4) ◽  
pp. 338-355
Author(s):  
Natalia Olegovna Garanina ◽  
Sergei Petrovich Gorlatch

The paper presents a new approach to autotuning data-parallel programs. Autotuning is the search for the program settings that maximize its performance. The novelty of the approach lies in using the model checking method to find the optimal tuning parameters via counterexamples. In our work, we abstract from specific programs and specific processors by defining representative abstract patterns for them. Our method of counterexamples comprises the following four steps. At the first step, an execution model of an abstract program on an abstract processor is described in the language of a model checking tool. At the second step, we formulate, in the language of the model checking tool, an optimality property that depends on the constructed model. At the third step, we verify the optimality property, obtaining a counterexample whenever it is violated. At the fourth step, we extract the optimal values of the tuning parameters from the constructed counterexample. We apply this approach to autotuning parallel programs written in OpenCL, a popular modern language that extends the C language for programming both standard multi-core processors (CPUs) and massively parallel graphics processing units (GPUs). As the verification tool, we use the SPIN verifier and its model representation language Promela, whose formal semantics is well suited to modelling the execution of parallel programs on processors with different architectures.
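To make the counterexample idea concrete, the following minimal Python sketch replaces the Promela execution model with a hypothetical analytic cost function and a brute-force checker: the claim "no configuration runs faster than the current threshold" is checked, and every counterexample to it yields better tuning parameters until none remains. It only illustrates the principle; it is not the authors' SPIN/Promela model, and the cost function and parameter names are assumptions.

from itertools import product

def cost(work_group_size, vector_width):
    # Hypothetical analytic model standing in for the abstract program/processor model.
    return 1e6 / (work_group_size * vector_width) + 2.0 * work_group_size

def find_counterexample(threshold, space):
    """Return a configuration violating the claim 'cost >= threshold', if any."""
    for cfg in space:
        if cost(*cfg) < threshold:
            return cfg  # the counterexample encodes candidate tuning parameters
    return None

space = list(product([32, 64, 128, 256], [1, 2, 4, 8]))
threshold = float("inf")
best = None
while (cex := find_counterexample(threshold, space)) is not None:
    best, threshold = cex, cost(*cex)  # tighten the claim and search again

print("tuned parameters:", best, "modelled cost:", threshold)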


2021 ◽  
Vol 11 (2) ◽  
pp. 818
Author(s):  
Hector Rico-Garcia ◽  
Jose-Luis Sanchez-Romero ◽  
Antonio Jimeno-Morenilla ◽  
Hector Migallon-Gomis

The development of the smart city concept and inhabitants' need to reduce travel time, together with society's awareness of the importance of reducing fuel consumption and respecting the environment, have led to a new approach to the classic travelling salesman problem (TSP) applied to urban environments. The problem can be formulated as follows: "Given a list of geographic points and the distances between each pair of points, what is the shortest possible route that visits each point and returns to the departure point?" At present, with the development of Internet of Things (IoT) devices and the increased capabilities of sensors, a large amount of data and measurements are available, allowing researchers to model candidate routes accurately. This work aims to provide a solution to the TSP in smart city environments using a modified version of the metaheuristic optimization algorithm Teacher Learner Based Optimization (TLBO). In addition, to improve performance, the solution is implemented on a parallel graphics processing unit (GPU) architecture, specifically as a Compute Unified Device Architecture (CUDA) implementation.
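As a rough illustration of how TLBO can be applied to the TSP, the NumPy sketch below runs only the teacher phase of TLBO on a population of random-key vectors that are decoded into tours by sorting. The random-key encoding, the parameters, and the omission of the learner phase and of the CUDA parallelisation are simplifying assumptions, not the paper's modified algorithm.

import numpy as np

rng = np.random.default_rng(0)
n_cities, pop_size = 20, 30
coords = rng.random((n_cities, 2))        # hypothetical city coordinates

def tour_length(keys):
    tour = np.argsort(keys)               # decode random keys into a tour
    pts = coords[tour]
    return np.sum(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1))

pop = rng.random((pop_size, n_cities))    # continuous "learner" positions
for _ in range(200):
    fitness = np.apply_along_axis(tour_length, 1, pop)
    teacher = pop[np.argmin(fitness)]
    tf = rng.integers(1, 3)               # teaching factor in {1, 2}
    # Teacher phase: move every learner toward the teacher, away from the mean.
    candidate = pop + rng.random((pop_size, 1)) * (teacher - tf * pop.mean(axis=0))
    improved = np.apply_along_axis(tour_length, 1, candidate) < fitness
    pop[improved] = candidate[improved]

best = pop[np.argmin(np.apply_along_axis(tour_length, 1, pop))]
print("best tour:", np.argsort(best), "length:", tour_length(best))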


Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1479 ◽  
Author(s):  
Michael Losh ◽  
Daniel Llamocca

Modern massively parallel Graphics Processing Units (GPUs) and Machine Learning (ML) frameworks enable neural network implementations of unprecedented performance and sophistication. However, state-of-the-art GPU hardware platforms are extremely power-hungry, while microprocessors cannot meet the performance requirements. Biologically inspired Spiking Neural Networks (SNN) have inherent characteristics that lead to lower power consumption. We therefore present a bit-serial SNN-like hardware architecture. By using counters, comparators, and an indexing scheme, the design effectively implements the sum-of-products inherent in neurons. In addition, we experimented with various strength-reduction methods to lower neural network resource usage. The proposed Spiking Hybrid Network (SHiNe), validated on an FPGA, achieves reasonable performance with low resource utilization, at some cost in hardware throughput and signal representation.
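The behavioural Python sketch below illustrates the counter-plus-comparator idea: spikes conditionally increment a counter through small integer weights, and a comparator fires the neuron when the count crosses a threshold, so the sum-of-products needs no multipliers. It only mimics the spirit of such a design; it is not the SHiNe FPGA architecture, and all names and values are illustrative.

def shine_like_neuron(spike_frames, weights, threshold):
    """spike_frames: list of per-step binary input vectors (bit-serial activity)."""
    count = 0
    out_spikes = []
    for frame in spike_frames:
        # Sum-of-products realised as conditional increments (no multipliers).
        count += sum(w for spike, w in zip(frame, weights) if spike)
        fired = count >= threshold   # comparator
        out_spikes.append(int(fired))
        if fired:
            count = 0                # reset the counter after firing
    return out_spikes

frames = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
print(shine_like_neuron(frames, weights=[2, 1, 3], threshold=5))  # [1, 0, 1, 0]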


2019 ◽  
Vol 9 (9) ◽  
pp. 1871 ◽  
Author(s):  
Chanrith Poleak ◽  
Jangwoo Kwon

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in both computer vision and natural language processing. In recent years, image captioning has significantly improved its performance by using long short-term memory (LSTM) as the decoder of the language model. However, despite this improvement, LSTM has its own shortcomings as a model: its structure is complicated and its computation is inherently sequential. This paper proposes a model that uses a simple convolutional network for both the encoder and the decoder of an image captioning system, instead of the current state-of-the-art approach. Our experiments with this model on the Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results competitive with the state-of-the-art image captioning model across different evaluation metrics, while using a much simpler model that enables parallel graphics processing unit (GPU) computation during training, resulting in a shorter training time.
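A minimal PyTorch sketch of a fully convolutional captioner is given below: a small convolutional encoder pools the image into a feature vector, and a causal 1-D convolutional decoder predicts the next token at every position, so all caption positions are processed in parallel on the GPU. The layer sizes and the left-padding trick are illustrative assumptions, not the paper's exact architecture or training setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvCaptioner(nn.Module):
    def __init__(self, vocab_size, emb=256, kernel=3):
        super().__init__()
        self.encoder = nn.Sequential(               # stand-in image encoder
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, emb, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                # (B, emb, 1, 1)
        )
        self.embed = nn.Embedding(vocab_size, emb)
        self.pad = kernel - 1                       # left-pad so the conv stays causal
        self.decoder = nn.Sequential(
            nn.Conv1d(emb, emb, kernel), nn.ReLU(),
            nn.Conv1d(emb, vocab_size, 1),
        )

    def forward(self, images, tokens):
        img = self.encoder(images).flatten(1)       # (B, emb)
        x = self.embed(tokens).transpose(1, 2)      # (B, emb, T)
        x = x + img.unsqueeze(-1)                   # condition every step on the image
        x = F.pad(x, (self.pad, 0))                 # causal left padding
        return self.decoder(x)                      # (B, vocab, T) next-token logits

model = ConvCaptioner(vocab_size=1000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)                                 # torch.Size([2, 1000, 12])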


2013 ◽  
Vol 753-755 ◽  
pp. 2912-2915
Author(s):  
Wei Cao ◽  
Zheng Hua Wang ◽  
Chuan Fu Xu

In recent years, the highly parallel graphics processing unit (GPU) has rapidly gained maturity as a powerful engine for high-performance computing. However, in most computational fluid dynamics (CFD) simulations, the computational capacity of the CPU is ignored. In this paper, we propose a hybrid parallel programming model that utilizes the computational capacity of both the CPU and the GPU. Taking the memory capacities of the CPU and the GPU into account, we also propose an out-of-core method to increase the simulation scale on a single node. The experimental results show that the programming model utilizes the computational capacity of both the CPU and the GPU efficiently, and that the out-of-core method increases the simulation scale on a single node.
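The schematic Python sketch below illustrates the two ideas at the level of control flow: mesh blocks are split between the CPU and the GPU by a fixed ratio, and each GPU block is streamed through device memory in chunks when it does not fit ("out-of-core"). The compute stubs and the memory figure are hypothetical placeholders; the paper's CFD kernels and CUDA code are not reproduced.

def cpu_compute(block):   # stand-in for the CPU kernel
    return sum(block)

def gpu_compute(chunk):   # stand-in for the CUDA kernel launch plus host/device copies
    return sum(chunk)

def hybrid_step(blocks, gpu_ratio=0.7, gpu_mem_items=1_000_000):
    n_gpu = int(len(blocks) * gpu_ratio)            # static CPU/GPU workload split
    gpu_blocks, cpu_blocks = blocks[:n_gpu], blocks[n_gpu:]

    results = [cpu_compute(b) for b in cpu_blocks]  # CPU share of the work

    for block in gpu_blocks:                        # GPU share, streamed out-of-core
        for start in range(0, len(block), gpu_mem_items):
            chunk = block[start:start + gpu_mem_items]   # host-to-device chunk sized to fit
            results.append(gpu_compute(chunk))
    return results

blocks = [list(range(10_000)) for _ in range(8)]
print(len(hybrid_step(blocks)))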


2013 ◽  
Vol 712-715 ◽  
pp. 2538-2541
Author(s):  
Cao Wei ◽  
Zheng Hua Wang ◽  
Chuan Fu Xu

In recent years, the highly parallel graphics processing unit (GPU) has rapidly gained maturity as a powerful engine for high-performance computing, and more and more researchers are trying to port computational fluid dynamics (CFD) simulations to heterogeneous computers. However, most of them focus on exploiting the computational capability of the GPU while ignoring that of the CPU. To utilize the computational capability of both the CPU and the GPU, we propose a hybrid CUDA/OpenMP parallel programming model, together with an adaptive load-balancing scheme that distributes the workload among CPUs and GPUs. With this programming model, we implement a high-order CFD program on the “Tianhe-1A” supercomputer system. The performance results validate the workload distribution scheme.
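A small Python sketch of one possible adaptive load-balancing rule is shown below: after each iteration, the GPU's share of the workload is re-derived from the measured CPU and GPU times so that both sides finish at roughly the same time. The update formula and the timings are illustrative assumptions, not the exact scheme used in the paper.

def rebalance(gpu_ratio, t_gpu, t_cpu):
    """Return a new GPU work fraction from the last iteration's timings."""
    gpu_speed = gpu_ratio / t_gpu              # work per second on the GPU share
    cpu_speed = (1.0 - gpu_ratio) / t_cpu      # work per second on the CPU share
    return gpu_speed / (gpu_speed + cpu_speed) # give each side work proportional to its speed

ratio = 0.5                                    # start with an even split
for t_gpu, t_cpu in [(2.0, 6.0), (3.1, 3.3), (3.2, 3.1)]:   # hypothetical timings
    ratio = rebalance(ratio, t_gpu, t_cpu)
    print(f"next GPU share: {ratio:.2f}")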

