A DESIGN METHODOLOGY FOR VERY LARGE ARRAY PROCESSORS—PART 1: GIPOP PROCESSOR ARRAY

Author(s):  
N. VENKATESWARAN ◽  
S. PATTABIRAMAN ◽  
R. DEVANATHAN ◽  
B. KUMARAN ◽  
ASHRAF AHMED ◽  
...  

Very Large Array Processors (VLAPs) will be needed to solve the computationally intensive Very Large Problems (VLPs) common in pattern recognition, image processing and related areas of digital signal processing. The design methodology for such VLAPs, whether for dedicated or general-purpose massively parallel applications, is highly complex. Two companion papers (Part 1 and Part 2) on VLAPs are presented in this issue. In Part 1, we propose a VLAP called the Reconfigurable GIPOP Processor Array (RGPA). The RGPA is made up of high-performance processing elements called Generalized Inner Product Outer Product (GIPOP) processors. Unlike traditional special-purpose or general-purpose processors, the GIPOP processor has a new architecture and organization built around higher-level functional units that match the complex computational structures of numeric algorithms and suit massively parallel processing. We also present a strategy for mapping VLPs onto VLAPs. In Part 2, we propose a novel VLSI design methodology for implementing cost-effective, very high performance processors for special-purpose applications and, in particular, for VLAPs.

2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Mouna Baklouti ◽  
Mohamed Abid

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on a custom-logic design methodology. Designing parallel multicore systems from available standard intellectual property (IP) blocks while maintaining high performance is also challenging. Soft-core processors and field-programmable gate arrays (FPGAs) are a cheap and fast option for developing and testing such systems. This paper describes an FPGA-based design methodology for rapidly prototyping parametric multicore systems. A study of the viability of building the SoC from the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and several parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).
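The speedup and efficiency figures mentioned in this abstract are conventionally defined as S = T_serial / T_parallel and E = S / p for p cores. The following is a minimal sketch of those textbook definitions only; the function names and the timing values are hypothetical and are not taken from the paper, whose exact measurement procedure is not given here.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup: single-core run time divided by multicore run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cores):
    """Conventional parallel efficiency: speedup divided by the number of cores."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return speedup(t_serial, t_parallel) / n_cores


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
    t1, t4 = 8.0, 4.0
    print("speedup:", speedup(t1, t4))
    print("efficiency on 4 cores:", efficiency(t1, t4, 4))
```

An efficiency of 1.0 corresponds to ideal linear scaling; values below 1.0 indicate overhead such as communication or idle cores.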


VLSI Design ◽  
1995 ◽  
Vol 2 (4) ◽  
pp. 305-314 ◽  
Author(s):  
Peter W. Thompson ◽  
Julian D. Lewis

High-performance parallel systems demand a high-performance interconnect so that their component parts can exchange data and synchronise efficiently. The interconnect must be cheap, and must also scale well in both performance and cost relative to the system size. In this paper we describe the rationale, architecture and operation of the STC104, the first commercially available, general-purpose interconnect chip. The serial protocols used by the device are described, followed by an overview of the microarchitecture. The operation of the fundamental block is outlined, including the response to error conditions. Chip-wide design issues and design methodology are discussed, and finally various aspects of performance are calculated.


2021 ◽  
Vol 247 ◽  
pp. 03016
Author(s):  
Raffi Yessayan ◽  
Yousry Y. Azmy ◽  
R. Joseph Zerr

The PIDOTS neutral particle transport code utilizes a red/black implementation of the Parallel Gauss-Seidel algorithm to solve the SN approximation of the neutron transport equation on 3D Cartesian meshes. PIDOTS is designed for execution on massively parallel platforms and is capable of using the full resources of modern, leadership class high performance computers. Initial testing revealed that some configurations of PIDOTS’s Integral Transport Matrix Method solver demonstrated unexpectedly poor parallel scaling. Work at Idaho and Los Alamos National Laboratories then revealed that this inefficiency was a result of the accumulation of high-cost latency events in the complex blocking communication networks employed during each PIDOTS iteration. That work explored the possibility of minimizing those inefficiencies while maintaining a blocking communications model. While significant speedups were obtained, it was shown that fully mitigating the problem on general-purpose platforms was highly unlikely for a blocking code. This work continues that analysis by implementing a deeply interleaved non-blocking communication model into PIDOTS. This new model benefits from the optimization work performed on the blocking model while also providing significant opportunities to overlap the remaining un-mitigated communication costs with computation. Additionally, our new approach is easily transferable to other similarly spatially decomposed codes. The resulting algorithm was tested on LANL’s Trinity system at up to 32,768 processors and was found at that processor count to effectively hide 100% of MPI communication cost – equivalently 20% of the red/black phase time. It is expected that the implemented interleaving algorithm can fully support far higher processor counts and completely hide communication costs up to ~50% of total iteration time.
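To illustrate the red/black ordering named in this abstract, the sketch below applies it to a much simpler stand-in problem: the 2D Laplace equation on a small grid with fixed boundary values. It is not the PIDOTS solver or its Integral Transport Matrix Method; the function names and the test grid are hypothetical. The point it shows is that cells of one color depend only on cells of the other color, so each half-sweep can be updated independently, which is what makes the scheme attractive for parallel execution.

```python
def red_black_sweep(u, color):
    """Update, in place, every interior cell whose (i + j) parity equals `color`.

    Each updated cell reads only its four neighbours, which all have the
    opposite parity, so the cells of one color do not depend on each other.
    """
    n_rows, n_cols = len(u), len(u[0])
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            if (i + j) % 2 == color:
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1])


def solve_laplace(u, iterations):
    """Run `iterations` full Gauss-Seidel iterations, each a red then a black half-sweep."""
    for _ in range(iterations):
        red_black_sweep(u, 0)  # "red" cells: (i + j) even
        red_black_sweep(u, 1)  # "black" cells: (i + j) odd
    return u


if __name__ == "__main__":
    # Hypothetical 5x5 grid: boundary held at 1.0, interior starts at 0.0.
    n = 5
    grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]
    solve_laplace(grid, 200)
    print(grid[2][2])  # approaches the boundary value as the iteration converges
```

In a distributed code, each process would exchange boundary ("halo") values with its neighbours between half-sweeps; that exchange is the communication step the abstract describes overlapping with computation.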


2014 ◽  
Vol 36 (4) ◽  
pp. 790-798
Author(s):  
Kai ZHANG ◽  
Shu-Ming CHEN ◽  
Yao-Hua WANG ◽  
Xi NING

2011 ◽  
Vol 28 (1) ◽  
pp. 1-14 ◽  
Author(s):  
W. van Straten ◽  
M. Bailes

dspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.


2020 ◽  
Vol 15 (S359) ◽  
pp. 347-349
Author(s):  
Carpes P. Hekatelyne ◽  
Thaisa Storchi-Bergmann

We present Gemini Multi-Object Spectrograph (GMOS) Integral Field Unit (IFU), Hubble Space Telescope (HST) and Very Large Array (VLA) observations of the inner kpc of the OH Megamaser galaxy IRAS 11506-3851. In this work we discuss the kinematics and excitation of the gas as well as its radio emission. The HST images reveal an isolated spiral galaxy, and the combination with the GMOS-IFU flux distributions allowed us to identify a partial ring of star-forming regions surrounding the nucleus with a radius of ≈500 pc. The emission-line ratios and excitation map reveal that the region inside the ring presents mixed/transition excitation between those of Starbursts and Active Galactic Nuclei (AGN), while regions along the ring are excited by Starbursts. We suggest that we are probing a buried or fading AGN that could be both exciting the gas and originating an outflow.

