High Performance Parallel Programming of a GA Using Multi-core Technology

Author(s):  
Rogelio Serrano ◽  
Juan Tapia ◽  
Oscar Montiel ◽  
Roberto Sepúlveda ◽  
Patricia Melin
2012 ◽  
pp. 278-296
Author(s):  
Dake Liu ◽  
Joar Sohl ◽  
Jian Wang

A novel master-multi-SIMD architecture and its kernel (template) based parallel programming flow is introduced as a parallel signal processing platform. The name of the platform is ePUMA (embedded Parallel DSP processor architecture with Unique Memory Access). The essential technology is to separate data accessing kernels from arithmetic computing kernels so that the run-time cost of data access can be minimized by running it in parallel with algorithm computing. The SIMD memory subsystem architecture based on the proposed flow dramatically improves the total computing performance. The hardware system and programming flow introduced in this article will primarily aim at low-power high-performance embedded parallel computing with low silicon cost for communications and similar real-time signal processing.


1997 ◽  
Vol 6 (2) ◽  
pp. 215-227 ◽  
Author(s):  
Guy Edjlali ◽  
Gagan Guyagrawal ◽  
Alan Sussman ◽  
Jim Humphries ◽  
Joel Saltz

For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF)-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.


Author(s):  
Miaoqing Huang ◽  
Chenggang Lai ◽  
Xuan Shi ◽  
Zhijun Hao ◽  
Haihang You

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve the parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors is typically better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance for parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes to as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.


2018 ◽  
Vol 29 (2) ◽  
pp. 377-390 ◽  
Author(s):  
Fady Ghanim ◽  
Uzi Vishkin ◽  
Rajeev Barua

2012 ◽  
Vol 2012 ◽  
pp. 1-11 ◽  
Author(s):  
Oscar Montiel-Ross ◽  
Jorge Quiñones ◽  
Roberto Sepúlveda

This paper presents a methodology to integrate a fuzzy coprocessor described in VHDL (VHSIC Hardware Description Language) to a soft processor embedded into an FPGA, which increases the throughput of the whole system, since the controller uses parallelism at the circuitry level for high-speed-demanding applications, the rest of the application can be written in C/C++. We used the ARM 32-bit soft processor, which allows sequential and parallel programming. The FLC coprocessor incorporates a tuning method that allows to manipulate the system response. We show experimental results using a fuzzy PD+I controller as the embedded coprocessor.


Sign in / Sign up

Export Citation Format

Share Document