A DESIGN METHODOLOGY FOR VERY LARGE ARRAY PROCESSORS—PART 1: GIPOP PROCESSOR ARRAY

Very Large Array Processors (VLAPs) will be needed to solve the computationally intensive Very Large Problems (VLPs) common in pattern recognition, image processing, and related areas of digital signal processing. The design methodology for such VLAPs, whether for dedicated or general-purpose massively parallel applications, is highly complex. Two companion papers (Part 1 and Part 2) on VLAPs are presented in this issue. In Part 1, we propose a VLAP called the Reconfigurable GIPOP Processor Array (RGPA). The RGPA is built from high-performance processing elements called Generalized Inner Product Outer Product (GIPOP) processors. Unlike traditional special-purpose or general-purpose processors, the GIPOP processor has a new architecture and organization built around higher-level functional units that match the complex computational structures of numeric algorithms and suit massively parallel processing. We also present a strategy for mapping VLPs onto VLAPs. In Part 2, we propose a novel VLSI design methodology for implementing cost-effective, very high-performance processors for special-purpose applications, and in particular for VLAPs.
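The abstract does not specify which operations a GIPOP processor implements. As a rough illustration only, the sketch below shows the two textbook formulations that the name alludes to: matrix multiplication computed as a set of inner products and as a sum of outer products. The function names `matmul_inner` and `matmul_outer` are hypothetical and are not taken from the paper.

```python
def matmul_inner(a, b):
    """Inner-product form: C[i][j] is the dot product of row i of A and column j of B."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_outer(a, b):
    """Outer-product form: C is the sum over t of (column t of A) times (row t of B)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for t in range(k):
        # Rank-1 update: every C[i][j] in this step is independent of the others.
        for i in range(n):
            for j in range(m):
                c[i][j] += a[i][t] * b[t][j]
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul_inner(a, b))
    print(matmul_outer(a, b))
```

Both forms produce the same product. They differ in which partial results can be computed independently, which is the kind of structural choice an array architecture has to map onto its processing elements.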


VLSI and Parallel Computing for Pattern Recognition and Artificial Intelligence - Series in Machine Perception and Artificial Intelligence ◽

10.1142/9789812797766_0004 ◽

1995 ◽

pp. 59-90

Author(s):

N. VENKATESWARAN ◽

S. PATTABIRAMAN ◽

R. DEVANATHAN ◽

B. KUMARAN ◽

ASHRAF AHMED ◽

...

Keyword(s):

Design Methodology ◽

Processor Array ◽

Large Array ◽

Very Large Array ◽

Array Processors


Multi-Softcore Architecture on FPGA

International Journal of Reconfigurable Computing ◽

10.1155/2014/979327 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Mouna Baklouti ◽

Mohamed Abid

Keyword(s):

High Performance ◽

Design Methodology ◽

Matrix Multiplication ◽

Rapid Prototype ◽

General Purpose ◽

Parallel Applications ◽

Multicore Systems ◽

Processor Core ◽

Nios II ◽

Wide Range

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on a custom-logic design methodology. Designing parallel multicore systems from available standard intellectual property (IP) blocks while maintaining high performance is also challenging. Softcore processors and field-programmable gate arrays (FPGAs) are a cheap and fast option for developing and testing such systems. This paper describes an FPGA-based design methodology for implementing a rapid prototype of parametric multicore systems. A study of the viability of building the SoC with the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and several parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).


The STC104 Packet Routing Chip

VLSI Design ◽

10.1155/1995/92096 ◽

1995 ◽

Vol 2 (4) ◽

pp. 305-314 ◽

Cited By ~ 8

Author(s):

Peter W. Thompson ◽

Julian D. Lewis

Keyword(s):

High Performance ◽

Design Methodology ◽

Parallel Systems ◽

System Size ◽

General Purpose ◽

Packet Routing ◽

Design Issues ◽

Exchange Data

High-performance parallel systems demand a high-performance interconnect so that their component parts can exchange data and synchronise efficiently. The interconnect must be cheap, and must also scale well in both performance and cost relative to the system size. In this paper we describe the rationale, architecture and operation of the STC104, the first commercially available, general-purpose interconnect chip. The serial protocols used by the device are described, followed by an overview of the microarchitecture. The operation of the fundamental block is outlined, including the response to error conditions. Chip-wide design issues and design methodology are discussed, and finally various aspects of performance are calculated.


ITERATIVE AND PARALLEL PERFORMANCE ANALYSIS OF NON-BLOCKING COMMUNICATION ALGORITHMS IN THE MASSIVELY PARALLEL NEUTRON TRANSPORT CODE PIDOTS

EPJ Web of Conferences ◽

10.1051/epjconf/202124703016 ◽

2021 ◽

Vol 247 ◽

pp. 03016

Author(s):

Raffi Yessayan ◽

Yousry Y. Azmy ◽

R. Joseph Zerr

Keyword(s):

Communication Networks ◽

High Performance ◽

Neutron Transport ◽

General Purpose ◽

Massively Parallel ◽

Communication Model ◽

Neutron Transport Equation ◽

Communication Costs ◽

Communication Algorithms ◽

Transport Code

The PIDOTS neutral particle transport code utilizes a red/black implementation of the Parallel Gauss-Seidel algorithm to solve the SN approximation of the neutron transport equation on 3D Cartesian meshes. PIDOTS is designed for execution on massively parallel platforms and is capable of using the full resources of modern, leadership-class high performance computers. Initial testing revealed that some configurations of PIDOTS’s Integral Transport Matrix Method solver demonstrated unexpectedly poor parallel scaling. Work at Idaho and Los Alamos National Laboratories then revealed that this inefficiency resulted from the accumulation of high-cost latency events in the complex blocking communication networks employed during each PIDOTS iteration. That work explored the possibility of minimizing those inefficiencies while maintaining a blocking communications model. While significant speedups were obtained, it was shown that fully mitigating the problem on general-purpose platforms was highly unlikely for a blocking code. This work continues that analysis by implementing a deeply interleaved non-blocking communication model in PIDOTS. The new model benefits from the optimization work performed on the blocking model while also providing significant opportunities to overlap the remaining unmitigated communication costs with computation. Additionally, our new approach is easily transferable to other similarly spatially decomposed codes. The resulting algorithm was tested on LANL’s Trinity system at up to 32,768 processors and was found at that processor count to effectively hide 100% of MPI communication cost (equivalently, 20% of the red/black phase time). It is expected that the implemented interleaving algorithm can fully support far higher processor counts and completely hide communication costs of up to ~50% of total iteration time.
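For readers unfamiliar with red/black ordering, the following is a minimal generic sketch, not PIDOTS code. It applies red/black Gauss-Seidel to Laplace's equation on a small 2D grid, whereas PIDOTS applies the ordering to spatial subdomains of a 3D transport problem. The function names `red_black_sweep` and `solve` and the model problem are assumptions chosen for illustration.

```python
def red_black_sweep(u):
    """One red/black Gauss-Seidel sweep for Laplace's equation on a 2D grid.

    The outer ring of u holds fixed boundary values; interior values are
    updated in place.
    """
    rows, cols = len(u), len(u[0])
    for color in (0, 1):  # 0 = red points (i + j even), 1 = black points (i + j odd)
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if (i + j) % 2 == color:
                    # All four neighbours have the opposite colour, so every
                    # point of one colour could be updated concurrently.
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])
    return u


def solve(u, sweeps=200):
    """Apply a fixed number of red/black sweeps (no convergence test, for brevity)."""
    for _ in range(sweeps):
        red_black_sweep(u)
    return u


if __name__ == "__main__":
    n = 5
    # Boundary fixed at 1.0, interior initialised to 0.0.
    grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]
    solve(grid)
    print(grid[2][2])
```

In a distributed setting, each colour phase ends with an exchange of updated boundary values between neighbouring subdomains. That exchange is the communication a non-blocking model can overlap with computation.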


A DESIGN METHODOLOGY FOR VERY LARGE ARRAY PROCESSORS—PART 2: PACUBE VLSI ARRAYS

VLSI and Parallel Computing for Pattern Recognition and Artificial Intelligence - Series in Machine Perception and Artificial Intelligence ◽

10.1142/9789812797766_0005 ◽

1995 ◽

pp. 91-129

Author(s):

N. VENKATESWARAN ◽

S. PATTABIRAMAN ◽

J. DESOUZA ◽

G. SRIRAM ◽

R. SRINIVASAN ◽

...

Keyword(s):

Large Array ◽

Very Large Array ◽

Array Processors


High Performance Image Processing on a Massively Parallel Processor Array

2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools ◽

10.1109/dsd.2009.166 ◽

2009 ◽

Author(s):

Roberto R. Osorio ◽

César Diaz-Resco ◽

Javier D. Bruguera

Keyword(s):

Image Processing ◽

High Performance ◽

Massively Parallel ◽

Processor Array ◽

Parallel Processor ◽

Massively Parallel Processor


Design Tradeoffs of High Performance DSPs for General-Purpose HPC

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2013.00790 ◽

2014 ◽

Vol 36 (4) ◽

pp. 790-798

Author(s):

Kai ZHANG ◽

Shu-Ming CHEN ◽

Yao-Hua WANG ◽

Xi NING

Keyword(s):

High Performance ◽

General Purpose ◽

Design Tradeoffs


The design of a high-performance image processor using the general systems design methodology

10.22215/etd/1985-12341 ◽

1985 ◽

Author(s):

John Beaton

Keyword(s):

High Performance ◽

Design Methodology ◽

Systems Design ◽

Image Processor ◽

General Systems


DSPSR: Digital Signal Processing Software for Pulsar Astronomy

Publications of the Astronomical Society of Australia ◽

10.1071/as10021 ◽

2011 ◽

Vol 28 (1) ◽

pp. 1-14 ◽

Cited By ~ 172

Author(s):

W. van Straten ◽

M. Bailes

Keyword(s):

Signal Processing ◽

Digital Signal Processing ◽

Graphics Processing Units ◽

High Performance ◽

Digital Signal ◽

General Purpose ◽

Design Decisions ◽

Extensive Range ◽

Processing Software ◽

Graphics Processing

dspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.


Multiwavelength analysis of OH Megamaser galaxies: The case of IRAS11506-3851

Proceedings of the International Astronomical Union ◽

10.1017/s1743921320004317 ◽

2020 ◽

Vol 15 (S359) ◽

pp. 347-349

Author(s):

Carpes P. Hekatelyne ◽

Thaisa Storchi-Bergmann

Keyword(s):

Active Galactic Nuclei ◽

Hubble Space Telescope ◽

Galactic Nuclei ◽

Large Array ◽

Very Large Array ◽

Star Forming ◽

Star Forming Regions ◽

Field Unit ◽

Mixed Transition ◽

Integral Field

We present Gemini Multi-Object Spectrograph (GMOS) Integral Field Unit (IFU), Hubble Space Telescope (HST) and Very Large Array (VLA) observations of the inner kpc of the OH Megamaser galaxy IRAS 11506-3851. In this work we discuss the kinematics and excitation of the gas as well as its radio emission. The HST images reveal an isolated spiral galaxy, and combining them with the GMOS-IFU flux distributions allowed us to identify a partial ring of star-forming regions surrounding the nucleus with a radius of ≈500 pc. The emission-line ratios and excitation map reveal that the region inside the ring presents mixed/transition excitation between those of Starbursts and Active Galactic Nuclei (AGN), while regions along the ring are excited by Starbursts. We suggest that we are probing a buried or fading AGN that could be both exciting the gas and originating an outflow.
