parallel pipelined
Recently Published Documents


TOTAL DOCUMENTS: 146 (FIVE YEARS 15)

H-INDEX: 15 (FIVE YEARS 1)

2021 ◽  
Vol 26 (2) ◽  
pp. 172-183
Author(s):  
E.S. Yanakova ◽  
G.T. Macharadze ◽  
L.G. Gagarina ◽  
A.A. Shvachko ◽  
...  

The shift from homogeneous to heterogeneous architectures brings gains in efficiency, size, weight and power consumption, which is especially important for embedded solutions. However, developing parallel software for heterogeneous computer systems is a rather complex task because of the requirements for high efficiency, ease of programming and scalability. This paper investigates the efficiency of parallel-pipelined processing of video information in multiprocessor heterogeneous systems on a chip (SoC) that include DSP, GPU, ISP, VDP, VPU and other accelerators. A typical scheme of parallel-pipelined processing of video data using various accelerators is presented, and a scheme of parallel-pipelined video data processing on the heterogeneous SoC 1892VM248 has been developed. Methods for efficient parallel-pipelined processing of video data in heterogeneous computers (SoC) are proposed, covering the operating system level, the programming technologies level and the application level. A comparative analysis of the most common programming technologies, such as OpenCL, OpenMP, MPI and OpenAMP, has been performed. The analysis shows that, depending on the intended purpose of the device, two programming paradigms should be applied: one based on OpenCL (for embedded systems) and one based on MPI (for inter-cell and inter-processor interaction). Results obtained for parallel-pipelined processing in a face recognition task confirm the effectiveness of the chosen solutions.
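The idea of parallel-pipelined processing described in this abstract can be illustrated with a minimal software sketch. It is not the paper's implementation and does not use OpenCL, MPI or the 1892VM248 SoC: it assumes each accelerator can be modelled as a Python thread and each inter-stage buffer as a bounded FIFO queue. The names `stage`, `run_pipeline` and `depth`, and the three stage functions in the usage example, are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the frame stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(frames, funcs, depth=4):
    """Push frames through funcs, one thread per stage, and return the results in order."""
    # One bounded queue in front of every stage plus one for the final output.
    queues = [queue.Queue(maxsize=depth) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    results = []

    def collect():
        # Drain the last queue so that upstream stages never block forever.
        while True:
            item = queues[-1].get()
            if item is SENTINEL:
                return
            results.append(item)

    collector = threading.Thread(target=collect)
    for t in workers + [collector]:
        t.start()
    for frame in frames:
        queues[0].put(frame)  # blocks when the first stage falls behind
    queues[0].put(SENTINEL)
    for t in workers + [collector]:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for accelerator work on a "frame" (a list of pixel values).
    def denoise(frame):
        return [min(p, 200) for p in frame]

    def downscale(frame):
        return frame[::2]

    def brightness(frame):
        return sum(frame) / len(frame)

    frames = [[10, 250, 30, 40], [5, 5, 5, 5]]
    print(run_pipeline(frames, [denoise, downscale, brightness]))
```

Because every stage is a single thread reading from a FIFO queue, frame order is preserved, and while one frame is in the last stage the next frame can already be in an earlier one. The bounded queues provide back-pressure, which is one simple way to keep a fast stage from running far ahead of a slow one.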


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 532
Author(s):  
Lan Huang ◽  
Teng Gao ◽  
Dalin Li ◽  
Zihao Wang ◽  
Kangping Wang

FPGAs have recently played an increasingly important role in heterogeneous computing, but Register Transfer Level (RTL) design flows are inefficient and require designers to be familiar with the circuit architecture. High-level synthesis (HLS) allows developers to design FPGA circuits more efficiently with a more familiar programming language, a higher level of abstraction, and automatic adaptation of timing constraints. When using HLS tools, such as Xilinx Vivado HLS, specific design patterns and techniques are required in order to create high-performance circuits. Moreover, designing efficient concurrency and data flow structures requires a deep understanding of the hardware, which raises the learning cost for programmers. In this paper, we propose a functional pattern library based on the MapReduce model and implemented with C++ templates, which can quickly implement high-performance parallel pipelined computing models on FPGA from a few simple parameters. Using this pattern library allows flexible adaptation of the parallel and flow structures in algorithms, which greatly improves coding efficiency. The contributions of this paper are as follows. (1) Four standard functional operators suitable for hardware parallel computing are defined. (2) Functional concurrent programming patterns are described based on C++ templates and Xilinx HLS. (3) The efficiency of this programming paradigm is verified with two algorithms of different complexity.
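The MapReduce pattern with a user-chosen degree of parallelism can be sketched in software as follows. This is only an analogy in Python, not the paper's C++ template library or its four operators, which the abstract does not name. The function `map_reduce` and its `lanes` parameter are hypothetical, and the sketch assumes that `reduce_fn` is associative and commutative and that `initial` is its identity element.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_reduce(data, map_fn, reduce_fn, initial, lanes=4):
    """Map every element, then reduce, using `lanes` parallel partial reductions.

    Assumes reduce_fn is associative and commutative and that `initial`
    is its identity element, so the partial results can be combined freely.
    """
    lanes = max(1, min(lanes, len(data)))
    # Interleaved partitioning: lane i receives elements i, i + lanes, i + 2*lanes, ...
    chunks = [data[i::lanes] for i in range(lanes)]

    def lane(chunk):
        # Each lane is a small map-then-reduce chain over its own chunk.
        return reduce(reduce_fn, map(map_fn, chunk), initial)

    with ThreadPoolExecutor(max_workers=lanes) as pool:
        partials = list(pool.map(lane, chunks))
    # Final reduction that merges the per-lane partial results.
    return reduce(reduce_fn, partials, initial)


if __name__ == "__main__":
    # Sum of squares of 0..9 computed on three lanes.
    print(map_reduce(list(range(10)), lambda v: v * v, lambda a, b: a + b, 0, lanes=3))
```

The single `lanes` argument plays the role that a template parameter might play in an HLS library: the algorithm is written once and the amount of replicated hardware (here, worker threads) is chosen separately.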


The high-throughput programmable Fast Fourier Transform (FFT) processor supports 2-stream 1024/2048/4096-point FFTs and 1- to 4-stream 64/128-point FFTs for 4G, wireless local area networks and 5G. The proposed architecture uses a four-bank single-port SRAM with a four-word data width, which gives sixteen memory paths for data access, a bandwidth suited to upcoming 5G use. The radix-16 butterfly processing element consists of two cascaded, parallel, pipelined radix-4 butterfly units. The proposed memory-addressing scheme makes effective use of single-port, merged-bank memory with high-radix processing elements. Compared with typical memory-based FFT designs, the proposed design performs better in terms of area and power consumption. The proposed architecture occupies an area of only about 1.21 mm². The processor supports a 1966 MS/s 4096-point FFT at a frequency of 1 GHz. Electronic design automation synthesis results show a power consumption of 32.16 mW. The SQNR is 42.14 dB.
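The statement that a radix-16 butterfly can be built from two cascaded radix-4 butterfly units follows from the standard Cooley-Tukey factorization 16 = 4 x 4. The sketch below checks this numerically in Python. It is a floating-point illustration, not the paper's hardware design, memory-addressing scheme or fixed-point arithmetic, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def radix4_butterfly(a):
    """4-point DFT using only additions and multiplications by +-1 and +-j."""
    a0, a1, a2, a3 = a
    s02, d02 = a0 + a2, a0 - a2
    s13, d13 = a1 + a3, a1 - a3
    return [s02 + s13, d02 - 1j * d13, s02 - s13, d02 + 1j * d13]


def radix16_butterfly(x):
    """16-point DFT as two cascaded radix-4 stages with twiddles in between."""
    if len(x) != 16:
        raise ValueError("expected exactly 16 samples")
    # Stage 1: four radix-4 butterflies over inputs taken with stride 4.
    # stage1[n2][k1] = sum over n1 of x[4*n1 + n2] * W4^(n1*k1)
    stage1 = [
        radix4_butterfly([x[4 * n1 + n2] for n1 in range(4)]) for n2 in range(4)
    ]
    # Inter-stage twiddle factors W16^(n2*k1).
    twiddled = [
        [stage1[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / 16) for k1 in range(4)]
        for n2 in range(4)
    ]
    # Stage 2: four more radix-4 butterflies, one per k1; output index is k1 + 4*k2.
    out = [0j] * 16
    for k1 in range(4):
        column = radix4_butterfly([twiddled[n2][k1] for n2 in range(4)])
        for k2 in range(4):
            out[k1 + 4 * k2] = column[k2]
    return out


if __name__ == "__main__":
    samples = [complex(i, (i * 7) % 5) for i in range(16)]
    error = max(abs(a - b) for a, b in zip(dft(samples), radix16_butterfly(samples)))
    print(error < 1e-9)
```

Each stage consists of four independent radix-4 butterflies, so within a stage they can run in parallel, and the two stages can be pipelined so that stage 1 works on the next block of 16 samples while stage 2 finishes the current one.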

