parallel graphics
Recently Published Documents


TOTAL DOCUMENTS

49
(FIVE YEARS 4)

H-INDEX

7
(FIVE YEARS 1)

2021 ◽  
Vol 28 (4) ◽  
pp. 338-355
Author(s):  
Natalia Olegovna Garanina ◽  
Sergei Petrovich Gorlatch

The paper presents a new approach to autotuning data-parallel programs. Autotuning is the search for the program settings that maximize its performance. The novelty of the approach lies in using the model checking method to find the optimal tuning parameters via counterexamples. In our work, we abstract from specific programs and specific processors by defining representative abstract patterns for them. Our method of counterexamples comprises the following four steps. At the first step, an execution model of an abstract program on an abstract processor is described in the language of a model checking tool. At the second step, we formulate, in the language of the model checking tool, an optimality property that depends on the constructed model. At the third step, we verify the optimality property, obtaining a counterexample whenever it is violated. At the fourth step, we extract the optimal values of the tuning parameters from the constructed counterexample. We apply this approach to autotuning parallel programs written in OpenCL, a popular modern language that extends the C language for programming both standard multi-core processors (CPUs) and massively parallel graphics processing units (GPUs). As the verification tool, we use the SPIN verifier and its model representation language Promela, whose formal semantics is well suited to modelling the execution of parallel programs on processors with different architectures.
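To make the counterexample idea concrete, the following minimal Python sketch replaces the Promela execution model with a hypothetical analytic cost function and a brute-force checker: the claim "no configuration runs faster than the current threshold" is checked, and every counterexample to it yields better tuning parameters until none remains. It only illustrates the principle; it is not the authors' SPIN/Promela model, and the cost function and parameter names are assumptions.

from itertools import product

def cost(work_group_size, vector_width):
    # Hypothetical analytic model standing in for the abstract program/processor model.
    return 1e6 / (work_group_size * vector_width) + 2.0 * work_group_size

def find_counterexample(threshold, space):
    """Return a configuration violating the claim 'cost >= threshold', if any."""
    for cfg in space:
        if cost(*cfg) < threshold:
            return cfg  # the counterexample encodes candidate tuning parameters
    return None

space = list(product([32, 64, 128, 256], [1, 2, 4, 8]))
threshold = float("inf")
best = None
while (cex := find_counterexample(threshold, space)) is not None:
    best, threshold = cex, cost(*cex)  # tighten the claim and search again

print("tuned parameters:", best, "modelled cost:", threshold)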


2021 ◽  
Vol 11 (2) ◽  
pp. 818
Author(s):  
Hector Rico-Garcia ◽  
Jose-Luis Sanchez-Romero ◽  
Antonio Jimeno-Morenilla ◽  
Hector Migallon-Gomis

The development of the smart city concept and inhabitants' need to reduce travel time, together with society's awareness of the importance of reducing fuel consumption and respecting the environment, have led to a new approach to the classic travelling salesman problem (TSP) applied to urban environments. The problem can be formulated as follows: "Given a list of geographic points and the distances between each pair of points, what is the shortest possible route that visits each point and returns to the departure point?" At present, with the development of Internet of Things (IoT) devices and the increased capabilities of sensors, a large amount of data and measurements are available, allowing researchers to model candidate routes accurately. This work aims to provide a solution to the TSP in smart city environments using a modified version of the metaheuristic optimization algorithm Teacher Learner Based Optimization (TLBO). In addition, to improve performance, the solution is implemented on a parallel graphics processing unit (GPU) architecture, specifically as a Compute Unified Device Architecture (CUDA) implementation.
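As a rough illustration of how TLBO can be applied to the TSP, the NumPy sketch below runs only the teacher phase of TLBO on a population of random-key vectors that are decoded into tours by sorting. The random-key encoding, the parameters, and the omission of the learner phase and of the CUDA parallelisation are simplifying assumptions, not the paper's modified algorithm.

import numpy as np

rng = np.random.default_rng(0)
n_cities, pop_size = 20, 30
coords = rng.random((n_cities, 2))        # hypothetical city coordinates

def tour_length(keys):
    tour = np.argsort(keys)               # decode random keys into a tour
    pts = coords[tour]
    return np.sum(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1))

pop = rng.random((pop_size, n_cities))    # continuous "learner" positions
for _ in range(200):
    fitness = np.apply_along_axis(tour_length, 1, pop)
    teacher = pop[np.argmin(fitness)]
    tf = rng.integers(1, 3)               # teaching factor in {1, 2}
    # Teacher phase: move every learner toward the teacher, away from the mean.
    candidate = pop + rng.random((pop_size, 1)) * (teacher - tf * pop.mean(axis=0))
    improved = np.apply_along_axis(tour_length, 1, candidate) < fitness
    pop[improved] = candidate[improved]

best = pop[np.argmin(np.apply_along_axis(tour_length, 1, pop))]
print("best tour:", np.argsort(best), "length:", tour_length(best))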


Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1479 ◽  
Author(s):  
Michael Losh ◽  
Daniel Llamocca

Modern massively parallel Graphics Processing Units (GPUs) and Machine Learning (ML) frameworks enable neural network implementations of unprecedented performance and sophistication. However, state-of-the-art GPU hardware platforms are extremely power-hungry, while microprocessors cannot meet the performance requirements. Biologically inspired Spiking Neural Networks (SNN) have inherent characteristics that lead to lower power consumption. We therefore present a bit-serial SNN-like hardware architecture. By using counters, comparators, and an indexing scheme, the design effectively implements the sum-of-products inherent in neurons. In addition, we experimented with various strength-reduction methods to lower neural network resource usage. The proposed Spiking Hybrid Network (SHiNe), validated on an FPGA, achieves reasonable performance with low resource utilization, at some cost in hardware throughput and signal representation.
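The behavioural Python sketch below illustrates the counter-plus-comparator idea: spikes conditionally increment a counter through small integer weights, and a comparator fires the neuron when the count crosses a threshold, so the sum-of-products needs no multipliers. It only mimics the spirit of such a design; it is not the SHiNe FPGA architecture, and all names and values are illustrative.

def shine_like_neuron(spike_frames, weights, threshold):
    """spike_frames: list of per-step binary input vectors (bit-serial activity)."""
    count = 0
    out_spikes = []
    for frame in spike_frames:
        # Sum-of-products realised as conditional increments (no multipliers).
        count += sum(w for spike, w in zip(frame, weights) if spike)
        fired = count >= threshold   # comparator
        out_spikes.append(int(fired))
        if fired:
            count = 0                # reset the counter after firing
    return out_spikes

frames = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
print(shine_like_neuron(frames, weights=[2, 1, 3], threshold=5))  # [1, 0, 1, 0]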


2019 ◽  
Vol 9 (9) ◽  
pp. 1871 ◽  
Author(s):  
Chanrith Poleak ◽  
Jangwoo Kwon

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in both computer vision and natural language processing. In recent years, image captioning has significantly improved its performance by using long short-term memory (LSTM) as the decoder of the language model. However, despite this improvement, LSTM has its own shortcomings as a model: its structure is complicated and its computation is inherently sequential. This paper proposes a model that uses a simple convolutional network for both the encoder and the decoder of an image captioning system, instead of the current state-of-the-art approach. Our experiments with this model on the Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results competitive with the state-of-the-art image captioning model across different evaluation metrics, while using a much simpler model that enables parallel graphics processing unit (GPU) computation during training, resulting in a shorter training time.
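A minimal PyTorch sketch of a fully convolutional captioner is given below: a small convolutional encoder pools the image into a feature vector, and a causal 1-D convolutional decoder predicts the next token at every position, so all caption positions are processed in parallel on the GPU. The layer sizes and the left-padding trick are illustrative assumptions, not the paper's exact architecture or training setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvCaptioner(nn.Module):
    def __init__(self, vocab_size, emb=256, kernel=3):
        super().__init__()
        self.encoder = nn.Sequential(               # stand-in image encoder
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, emb, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                # (B, emb, 1, 1)
        )
        self.embed = nn.Embedding(vocab_size, emb)
        self.pad = kernel - 1                       # left-pad so the conv stays causal
        self.decoder = nn.Sequential(
            nn.Conv1d(emb, emb, kernel), nn.ReLU(),
            nn.Conv1d(emb, vocab_size, 1),
        )

    def forward(self, images, tokens):
        img = self.encoder(images).flatten(1)       # (B, emb)
        x = self.embed(tokens).transpose(1, 2)      # (B, emb, T)
        x = x + img.unsqueeze(-1)                   # condition every step on the image
        x = F.pad(x, (self.pad, 0))                 # causal left padding
        return self.decoder(x)                      # (B, vocab, T) next-token logits

model = ConvCaptioner(vocab_size=1000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)                                 # torch.Size([2, 1000, 12])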


2013 ◽  
Vol 753-755 ◽  
pp. 2912-2915
Author(s):  
Wei Cao ◽  
Zheng Hua Wang ◽  
Chuan Fu Xu

In recent years, the highly parallel graphics processing unit (GPU) has rapidly gained maturity as a powerful engine for high-performance computing. However, in most computational fluid dynamics (CFD) simulations, the computational capacity of the CPU is ignored. In this paper, we propose a hybrid parallel programming model that utilizes the computational capacity of both the CPU and the GPU. Taking the memory capacities of the CPU and the GPU into account, we also propose an out-of-core method to increase the simulation scale on a single node. The experimental results show that the programming model utilizes the computational capacity of both the CPU and the GPU efficiently, and that the out-of-core method increases the simulation scale on a single node.
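The schematic Python sketch below illustrates the two ideas at the level of control flow: mesh blocks are split between the CPU and the GPU by a fixed ratio, and each GPU block is streamed through device memory in chunks when it does not fit ("out-of-core"). The compute stubs and the memory figure are hypothetical placeholders; the paper's CFD kernels and CUDA code are not reproduced.

def cpu_compute(block):   # stand-in for the CPU kernel
    return sum(block)

def gpu_compute(chunk):   # stand-in for the CUDA kernel launch plus host/device copies
    return sum(chunk)

def hybrid_step(blocks, gpu_ratio=0.7, gpu_mem_items=1_000_000):
    n_gpu = int(len(blocks) * gpu_ratio)            # static CPU/GPU workload split
    gpu_blocks, cpu_blocks = blocks[:n_gpu], blocks[n_gpu:]

    results = [cpu_compute(b) for b in cpu_blocks]  # CPU share of the work

    for block in gpu_blocks:                        # GPU share, streamed out-of-core
        for start in range(0, len(block), gpu_mem_items):
            chunk = block[start:start + gpu_mem_items]   # host-to-device chunk sized to fit
            results.append(gpu_compute(chunk))
    return results

blocks = [list(range(10_000)) for _ in range(8)]
print(len(hybrid_step(blocks)))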


2013 ◽  
Vol 712-715 ◽  
pp. 2538-2541
Author(s):  
Cao Wei ◽  
Zheng Hua Wang ◽  
Chuan Fu Xu

In recent years, the highly parallel graphics processing unit (GPU) has rapidly gained maturity as a powerful engine for high-performance computing, and more and more researchers are trying to port computational fluid dynamics (CFD) simulations to heterogeneous computers. However, most of them focus on exploiting the computational capability of the GPU while ignoring that of the CPU. To utilize the computational capability of both the CPU and the GPU, we propose a hybrid CUDA/OpenMP parallel programming model, together with an adaptive load-balancing scheme that distributes the workload among CPUs and GPUs. With this programming model, we implement a high-order CFD program on the “Tianhe-1A” supercomputer system. The performance results validate the workload distribution scheme.
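A small Python sketch of one possible adaptive load-balancing rule is shown below: after each iteration, the GPU's share of the workload is re-derived from the measured CPU and GPU times so that both sides finish at roughly the same time. The update formula and the timings are illustrative assumptions, not the exact scheme used in the paper.

def rebalance(gpu_ratio, t_gpu, t_cpu):
    """Return a new GPU work fraction from the last iteration's timings."""
    gpu_speed = gpu_ratio / t_gpu              # work per second on the GPU share
    cpu_speed = (1.0 - gpu_ratio) / t_cpu      # work per second on the CPU share
    return gpu_speed / (gpu_speed + cpu_speed) # give each side work proportional to its speed

ratio = 0.5                                    # start with an even split
for t_gpu, t_cpu in [(2.0, 6.0), (3.1, 3.3), (3.2, 3.1)]:   # hypothetical timings
    ratio = rebalance(ratio, t_gpu, t_cpu)
    print(f"next GPU share: {ratio:.2f}")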

