Design of partially-asynchronous parallel processing elements for setting up Benes networks in O(log2N) time

Author(s):  
Y. Kai ◽  
K. Hamada ◽  
Y. Miao ◽  
H. Obara
2010 ◽  
Vol 8 ◽  
pp. 135-142 ◽  
Author(s):  
H. Flatt ◽  
A. Tarnowsky ◽  
H. Blume ◽  
P. Pirsch

Abstract. Dieser Beitrag behandelt die Abbildung eines videobasierten Verfahrens zur echtzeitfähigen Auswertung von Winkelhistogrammen auf eine modulare Coprozessor-Architektur. Die Architektur besteht aus mehreren dedizierten Recheneinheiten zur parallelen Verarbeitung rechenintensiver Bildverarbeitungsverfahren und ist mit einem RISC-Prozessor verbunden. Eine konfigurierbare Architekturerweiterung um eine Recheneinheit zur Auswertung von Winkelhistogrammen von Objekten ermöglicht in Verbindung mit dem RISC eine echtzeitfähige Klassifikation. Je nach Konfiguration sind für die Architekturerweiterung auf einem Xilinx Virtex-5-FPGA zwischen 3300 und 12 000 Lookup-Tables erforderlich. Bei einer Taktfrequenz von 100 MHz können unabhängig von der Bildauflösung pro Einzelbild in einem 25-Hz-Videodatenstrom bis zu 100 Objekte der Größe 256×256 Pixel analysiert werden. This paper presents the mapping of a video-based approach for real-time evaluation of angular histograms on a modular coprocessor architecture. The architecture comprises several dedicated processing elements for parallel processing of computation-intensive image processing tasks and is coupled with a RISC processor. A configurable architecture extension, especially a processing element for evaluating angular histograms of objects in conjunction with a RISC processor, provides a real-time classification. Depending on the configuration of the architecture extension, 3 300 to 12 000 look-up tables are required for a Xilinx Virtex-5 FPGA implementation. Running at a clock frequency of 100 MHz and independently of the image resolution per frame, 100 objects of size 256×256 pixels are analyzed in a 25 Hz video stream by the architecture.


1986 ◽  
Vol 17 (6) ◽  
pp. 85-94
Author(s):  
Toshikazu Kato ◽  
Koji Wakimoto ◽  
Kosaku Inagaki ◽  
Toshiyuki Sakai

2012 ◽  
Vol 21 (06) ◽  
pp. 1240018 ◽  
Author(s):  
YOUSRI OUERHANI ◽  
MAHER JRIDI ◽  
AYMAN ALFALOU

In this paper we present a novel architecture for FFT implementation on FPGA. The proposed architecture based on radix-4 algorithm presents the advantage of a higher throughput and low area-delay product. In fact, the novelty consists on using a memory sharing and dividing technique along with parallel-in parallel-out Processing Elements (PE). The proposed architecture can perform N-point FFT using only 4/3N delay elements and involves a latency of N/4 cycles. Comparison in terms of hardware complexity and area-delay product with recent works presented in the literature and commercial IPs has been made to show the efficiency of the proposed design. Moreover, from the experimental results obtained from a FPGA prototype we find that the proposed design involves an execution time of 56% lower than that obtained with Xilinx IP core and an increase of 19% in the throughput by area ratio for 256-point FFT.


Sign in / Sign up

Export Citation Format

Share Document