Design of partially-asynchronous parallel processing elements for setting up Benes networks in O(log₂ N) time

Abstract. This paper presents the mapping of a video-based method for real-time evaluation of angular histograms onto a modular coprocessor architecture. The architecture comprises several dedicated processing elements for parallel processing of computation-intensive image processing tasks and is coupled with a RISC processor. A configurable architecture extension, namely a processing element for evaluating angular histograms of objects, enables real-time classification in conjunction with the RISC processor. Depending on its configuration, the architecture extension requires between 3,300 and 12,000 look-up tables on a Xilinx Virtex-5 FPGA. At a clock frequency of 100 MHz, and independently of the image resolution, up to 100 objects of size 256×256 pixels can be analyzed per frame in a 25 Hz video stream.
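To put the quoted figures in perspective, the short Python sketch below works out the per-frame cycle budget and the implied average pixel rate. It assumes, which the abstract does not state, that every pixel of every 256×256 object is processed exactly once and that all processing for a frame must fit within one 25 Hz frame period; the variable names are illustrative only.

```python
# Back-of-the-envelope throughput check for the figures quoted in the abstract.
# ASSUMPTION: each object pixel is processed exactly once, and all work for a
# frame must complete within one frame period (not stated in the abstract).
CLOCK_HZ = 100_000_000    # 100 MHz clock
FRAME_RATE_HZ = 25        # 25 Hz video stream
OBJECTS_PER_FRAME = 100   # up to 100 objects per frame
OBJECT_SIDE = 256         # objects of 256x256 pixels

cycles_per_frame = CLOCK_HZ // FRAME_RATE_HZ             # 4,000,000 cycles
pixels_per_frame = OBJECTS_PER_FRAME * OBJECT_SIDE ** 2  # 6,553,600 pixels
pixels_per_cycle = pixels_per_frame / cycles_per_frame   # 1.6384 pixels/cycle

print(f"cycles per frame: {cycles_per_frame}")
print(f"object pixels per frame: {pixels_per_frame}")
print(f"average pixels per clock cycle: {pixels_per_cycle}")
```

Under these assumptions the extension would have to handle more than one object pixel per clock cycle on average, which is consistent with the abstract's emphasis on parallel processing elements.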


Architecture and circuit design of parallel processing elements for de novo sequence assembly

2013 IEEE International SOC Conference, 2013. DOI: 10.1109/socc.2013.6749659

Authors: Yu-Long Huang, Chun-Shen Liu, Yu-Cheng Li, Yi-Chang Lu

Keywords: Parallel Processing, Circuit Design, De Novo, Sequence Assembly, Processing Elements, De Novo Sequence Assembly


A Viterbi decoder architecture based on parallel processing elements

[Proceedings] GLOBECOM '90: IEEE Global Telecommunications Conference and Exhibition, 2002. DOI: 10.1109/glocom.1990.116709. Cited by ~2

Authors: S.R. Meier

Keywords: Parallel Processing, Viterbi Decoder, Processing Elements, Decoder Architecture


Event-driven relaxation method based on asynchronous parallel processing

Systems and Computers in Japan, Vol. 17 (9), pp. 67-77, 1986. DOI: 10.1002/scj.4690170908. Cited by ~1

Authors: Toshikazu Kato, Toshiyuki Sakai

Keywords: Parallel Processing, Relaxation Method, Event Driven, Asynchronous Parallel


Determinacy problem in relaxation method-analysis of asynchronous parallel processing

Systems and Computers in Japan, Vol. 17 (6), pp. 85-94, 1986. DOI: 10.1002/scj.4690170610

Authors: Toshikazu Kato, Koji Wakimoto, Kosaku Inagaki, Toshiyuki Sakai

Keywords: Parallel Processing, Relaxation Method, Method Analysis, Asynchronous Parallel


Algorithms for asynchronous parallel processing of object-oriented databases

IEEE Transactions on Knowledge and Data Engineering, Vol. 7 (3), pp. 487-504, 1995. DOI: 10.1109/69.390252. Cited by ~10

Authors: A.K. Thakore, S.Y.W. Su, H.X. Lam

Keywords: Parallel Processing, Object Oriented, Object Oriented Databases, Asynchronous Parallel


Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation. Appendix G. On the Design and Modeling of Special Purpose Parallel Processing Systems.

1985. DOI: 10.21236/ada167622

Authors: Bradley W. Smith

Keywords: Signal Processing, Parallel Processing, Distributed Computing, Parallel Computation, Asynchronous Parallel


Area-delay efficient FFT architecture using parallel processing and new memory sharing technique

Journal of Circuits, Systems and Computers, Vol. 21 (06), article 1240018, 2012. DOI: 10.1142/s021812661240018x. Cited by ~5

Authors: Yousri Ouerhani, Maher Jridi, Ayman Alfalou

Keywords: Parallel Processing, Execution Time, Area Ratio, Experimental Results, IP Core, Hardware Complexity, Processing Elements, Low Area, Memory Sharing, Delay Elements

In this paper we present a novel architecture for FFT implementation on FPGA. The proposed architecture, based on the radix-4 algorithm, offers higher throughput and a low area-delay product. The novelty lies in combining a memory sharing and dividing technique with parallel-in parallel-out processing elements (PEs). The proposed architecture can compute an N-point FFT using only 4N/3 delay elements, with a latency of N/4 cycles. Comparisons of hardware complexity and area-delay product with recent designs in the literature and with commercial IP cores demonstrate the efficiency of the proposed design. Moreover, experimental results from an FPGA prototype show that, for a 256-point FFT, the proposed design has a 56% lower execution time than the Xilinx IP core and a 19% higher throughput-to-area ratio.
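For readers unfamiliar with the radix-4 algorithm the abstract builds on, the following is a minimal software sketch of a recursive radix-4 decimation-in-time FFT in Python. It is not the paper's hardware architecture: it has no memory sharing, delay elements, or parallel-in parallel-out pipelining, and the function names `fft_radix4` and `dft` are hypothetical. It only illustrates the 4-point butterfly that a radix-4 processing element evaluates, where apart from the twiddle factors the only multiplications are by ±1 and ±j.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    # Split into four interleaved subsequences x[4m + r], r = 0..3,
    # and transform each one recursively (each has length n/4).
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        # Apply twiddle factors W_N^(r*k) to the r-th sub-transform output.
        a = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(4)]
        # 4-point DFT butterfly: coefficients are powers of W_4 = -j.
        out[k] = a[0] + a[1] + a[2] + a[3]
        out[k + q] = a[0] - 1j * a[1] - a[2] + 1j * a[3]
        out[k + 2 * q] = a[0] - a[1] + a[2] - a[3]
        out[k + 3 * q] = a[0] + 1j * a[1] - a[2] - 1j * a[3]
    return out


def dft(x):
    """Direct O(N^2) DFT, used only as a reference to check fft_radix4."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

As a quick check, `fft_radix4([1, 0, 0, 0])` returns four values equal to 1, the transform of a unit impulse. The recursion depth is log₄ N, loosely corresponding to the stages of a pipelined radix-4 hardware design, which would stream samples through those stages instead of recursing.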


Design of partially-asynchronous parallel processing elements for setting up Benes networks in O(log₂ N) time

Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks

Low latency modular multiplication for public-key cryptosystems using a scalable array of parallel processing elements

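Hardware-Abbildung eines videobasierten Verfahrens zur echtzeitfähigen Auswertung von Winkelhistogrammen auf eine modulare Coprozessor-Architektur (Hardware mapping of a video-based method for real-time evaluation of angular histograms onto a modular coprocessor architecture)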

Architecture and circuit design of parallel processing elements for de novo sequence assembly

A Viterbi decoder architecture based on parallel processing elements

Event-driven relaxation method based on asynchronous parallel processing

Determinacy problem in relaxation method-analysis of asynchronous parallel processing

Algorithms for asynchronous parallel processing of object-oriented databases

Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation. Appendix G. On the Design and Modeling of Special Purpose Parallel Processing Systems.

Area-delay efficient FFT architecture using parallel processing and new memory sharing technique
