Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks

2021 ◽  
Vol 18 (4) ◽  
pp. 1-24
Author(s):  
Rui Xu ◽  
Sheng Ma ◽  
Yaohua Wang ◽  
Xinhai Chen ◽  
Yang Guo

The systolic array architecture is one of the most popular choices for convolutional neural network hardware accelerators. The biggest advantage of the systolic array architecture is its simple and efficient design principle. Without complicated control and dataflow, hardware accelerators with the systolic array can calculate traditional convolution very efficiently. However, this advantage also brings new challenges to the systolic array. When computing special types of convolution, such as the small-scale convolution or depthwise convolution, the processing element (PE) utilization rate of the array decreases sharply. The main reason is that the simple architecture design limits the flexibility of the systolic array. In this article, we design a configurable multi-directional systolic array (CMSA) to address these issues. First, we add a data path to the systolic array that allows users to split the array through configuration to speed up the calculation of small-scale convolution. Second, we redesign the PE unit so that the array has multiple data transmission modes and dataflow strategies. This allows users to switch the dataflow of the PE array to speed up the calculation of depthwise convolution. In addition, unlike other works, we make only a few modifications to the existing systolic array architecture. This avoids additional hardware overheads and allows CMSA to be easily deployed in application scenarios that require small systolic arrays, such as mobile terminals. Based on our evaluation, CMSA can increase the PE utilization rate by up to 1.6 times compared to the typical systolic array when running the last layers of ResNet-18. When running depthwise convolution in MobileNet, CMSA can increase the utilization rate by up to 14.8 times. At the same time, CMSA and the traditional systolic arrays are similar in area and energy consumption.

Author(s):  
Rui Xu ◽  
Sheng Ma ◽  
Yaohua Wang ◽  
Yang Guo ◽  
Dongsheng Li ◽  
...  

2011 ◽  
Vol 1 (2) ◽  
Author(s):  
Doru Chiper

Abstract. A new VLSI algorithm and its associated systolic array architecture for a prime length type IV discrete cosine transform is presented. They represent the basis of an efficient design approach for deriving a linear systolic array architecture for type IV DCT. The proposed algorithm uses a regular computational structure called pseudoband correlation structure that is appropriate for a VLSI implementation. The proposed algorithm is then mapped onto a linear systolic array with a small number of I/O channels and low I/O bandwidth. The proposed architecture can be unified with that obtained for type IV DST due to a similar kernel. A highly efficient VLSI chip can be thus obtained with good performance in the architectural topology, computing parallelism, processing speed, hardware complexity and I/O costs similar to those obtained for circular correlation and cyclic convolution computational structures.
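For orientation, the transform this abstract targets can be written as a direct reference computation. The sketch below is not the paper's pseudoband correlation algorithm or its systolic mapping. It only evaluates the standard unnormalised type IV DCT definition in O(N^2) time, and the function names and the length-7 test vector are illustrative assumptions.

```python
import math


def dct4(x):
    """Direct O(N^2) type IV DCT, unnormalised:
    X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5)) for i in range(n))
        for k in range(n)
    ]


def idct4(y):
    """Inverse transform: the DCT-IV is its own inverse up to a 2/N scale."""
    n = len(y)
    return [2.0 / n * value for value in dct4(y)]


# Hypothetical prime-length input (N = 7), chosen only for illustration.
signal = [1.0, -2.0, 3.5, 0.0, 4.0, -1.5, 2.0]
recovered = idct4(dct4(signal))
```

A direct evaluation like this is mainly useful as a correctness reference when checking a faster or hardware-oriented formulation.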


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-20
Author(s):  
Hyungmin Cho

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computation loads and the number of parameters compared to the conventional convolution layers. Many deep neural network (DNN) accelerators adopt an architecture that exploits the high data-reuse factor of DNN computations, such as a systolic array. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts the PE utilization for depthwise convolutions on a systolic array with minimal overheads. In addition, the PEs in systolic arrays can be efficiently used only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange the data items and manage data movements during DNN computations. RiSA provides a lightweight set of tensor management tasks within the PE array itself that eliminates the need for an additional module for tensor reshaping tasks. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining a high area efficiency. Compared to Eyeriss v2, RiSA improves the area and energy efficiency for MobileNet-V1 inference by 1.91× and 1.31×, respectively.
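The reduction in computation and parameters mentioned in this abstract follows from the standard cost formulas for the two layer types. The sketch below is a back-of-the-envelope count only, not a model of RiSA. The layer shape (3×3 kernel, 112×112 output, 32 channels) is a hypothetical example, and bias terms are ignored.

```python
def standard_conv_cost(h_out, w_out, k, c_in, c_out):
    """Weights and multiply-accumulates (MACs) of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = h_out * w_out * params
    return params, macs


def depthwise_conv_cost(h_out, w_out, k, channels):
    """Weights and MACs of a depthwise k x k convolution (one filter per channel)."""
    params = k * k * channels
    macs = h_out * w_out * params
    return params, macs


# Hypothetical layer shape, chosen only for illustration.
std_params, std_macs = standard_conv_cost(112, 112, 3, 32, 32)
dw_params, dw_macs = depthwise_conv_cost(112, 112, 3, 32)
mac_ratio = std_macs // dw_macs  # equals c_out for matching shapes
```

In the standard layer each input activation contributes to every output channel, while in the depthwise layer it contributes to a single one, which is the lower data reuse the abstract refers to.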


Energies ◽  
2020 ◽  
Vol 13 (18) ◽  
pp. 4903
Author(s):  
Yasutsugu Baba ◽  
Andante Hadi Pandyaswargo ◽  
Hiroshi Onoda

Forests cover two-thirds of Japan’s land area, and woody biomass is attracting attention as one of the most promising renewable energy sources in the country. The Feed-in Tariff (FIT) Act came into effect in 2012, and since then, woody biomass power generation has spread rapidly. Gasification power generation, which can generate electricity on a relatively small scale, has attracted a lot of attention. However, the technical issues of this technology remain poorly defined. This paper aims to clarify the problems of woody biomass gasification power generation in Japan, specifically the challenges of improving the energy utilization rate, the problem of controlling the moisture content, and the differing performance of power generation facilities that use different tree species. We also describe the technological development of a 2 MW updraft reactor for gasification and bio-oil coproduction to improve the energy utilization rate. The lower heating value of bio-oil, which was obtained in the experiment, was found to be about 70% of that of A-fuel oil. Among the results, the importance of controlling the moisture content of wood chips is identified from the measurement evaluation of a 0.36 MW-scale downdraft gasifier’s actual operation. We discuss the effects of tree species variation and ash on gasification power generation based on the results of pyrolysis analysis and industry analysis for each tree species. These results indicate the necessity of building a system specifically suited to Japan’s climate and forestry industry to allow woody biomass gasification power generation to become widespread in Japan.


2011 ◽  
Vol 4 (3) ◽  
pp. 835-844 ◽  
Author(s):  
P. Hanappe ◽  
A. Beurivé ◽  
F. Laguzet ◽  
L. Steels ◽  
N. Bellouin ◽  
...  

Abstract. We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL's Synergistic Processing Element than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60 % of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
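The scheduling idea in this abstract (independent air columns, packed in groups of four, dispatched to a thread pool) can be sketched as follows. This is an illustrative Python analogue under stated assumptions, not the authors' C code. `compute_column` is a hypothetical stand-in for the radiation calculation, and the packed columns are processed with a plain loop rather than SIMD instructions.

```python
from concurrent.futures import ThreadPoolExecutor

PACK_SIZE = 4  # the abstract packs four air columns into one data structure


def compute_column(column):
    # Hypothetical placeholder for the per-column radiation computation.
    return sum(level * level for level in column)


def compute_pack(pack):
    # The paper computes the packed columns simultaneously with SIMD;
    # this sketch simply loops over them.
    return [compute_column(column) for column in pack]


def run_columns(columns, workers=4):
    """Split columns into packs and schedule each pack as a pool task."""
    packs = [columns[i:i + PACK_SIZE] for i in range(0, len(columns), PACK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so output order
        # matches the input column order.
        pack_results = list(pool.map(compute_pack, packs))
    return [value for pack in pack_results for value in pack]


# Hypothetical input: 10 columns with 5 levels each.
columns = [[float(i + j) for j in range(5)] for i in range(10)]
parallel = run_columns(columns)
serial = [compute_column(column) for column in columns]
```

Because each column is computed by the same code regardless of which worker runs it, the parallel and serial results in this sketch match exactly, which mirrors the bit-wise identical results the abstract reports.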


1990 ◽  
Vol 38 (8) ◽  
pp. 1310-1313 ◽  
Author(s):  
M. Ueno ◽  
K. Kawabata ◽  
T. Morooka

2015 ◽  
Vol 785 ◽  
pp. 310-314 ◽  
Author(s):  
Norzanah Rosmin ◽  
N.A. Rahman ◽  
A.H. Mustaamal

Vertical-Axis Wind Turbines (VAWTs) are known as the most suitable wind turbines for small-scale electrical generation. There are many types of VAWTs, and each has different performance and efficiency. In this work, three types of VAWT systems (Savo-B2, Savo-B4 and Giro-B3) were designed, constructed and tested to investigate the amount of electrical power that could be generated under several constant wind speeds. The blade rotors were designed and built using 2 mm thick aluminum plate. The tip speed ratios, power coefficients and blade rotations for each blade rotor, and the simplicity of the proposed designs, were studied via an experimental setup. The experimental work demonstrates that Savo-B2 provides the highest power coefficient, up to 0.32. Meanwhile, Giro-B3 offers the fastest rotational blade speed, up to 20.53 rad/s, among the three designs.
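The two figures of merit named in this abstract have standard definitions: the tip speed ratio is λ = ωR / v and the power coefficient is Cp = P / (½ ρ A v³). The sketch below evaluates them. The rotor radius, wind speed and air density are hypothetical values chosen for illustration, since the abstract does not give them. Only the 20.53 rad/s rotational speed comes from the text.

```python
AIR_DENSITY = 1.225  # kg/m^3, standard sea-level value (assumption)


def tip_speed_ratio(omega, radius, wind_speed):
    """lambda = omega * R / v, with omega in rad/s, R in m, v in m/s."""
    return omega * radius / wind_speed


def power_coefficient(power, swept_area, wind_speed, rho=AIR_DENSITY):
    """Cp = P / (0.5 * rho * A * v^3): extracted power over available wind power."""
    return power / (0.5 * rho * swept_area * wind_speed ** 3)


# 20.53 rad/s is the Giro-B3 figure from the abstract; the 0.25 m radius
# and 6 m/s wind speed are hypothetical.
example_tsr = tip_speed_ratio(20.53, 0.25, 6.0)
```

A power coefficient of 0.32 means the rotor converts 32% of the kinetic power of the wind passing through its swept area.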


2001 ◽  
Vol 105 (1050) ◽  
pp. 419-426 ◽  
Author(s):  
G. Barakos ◽  
M. Vahdati ◽  
A.I. Sayma ◽  
C. Bréard ◽  
M. Imregun

Abstract. This paper presents the development and validation of a parallel unsteady flow and aeroelasticity code for large-scale numerical models used in turbo machinery applications. The work is based on an existing unstructured Navier-Stokes solver developed over the past ten years by the Aeroelasticity Research Group at Imperial College Vibration University Technology Centre. The single-process multiple-data paradigm was adopted for the parallelisation of the solver and several validation cases were considered. The computational mesh was divided into several sub-sections using a domain decomposition technique. The performance and numerical accuracy of the parallel solver were validated across several computer platforms for various problem sizes. In cases where the solution could be obtained on a single CPU, the serial and parallel versions of the code were found to produce identical results. Studies on up to 32 CPUs showed varying levels of parallelisation efficiency, an almost linear speed-up being obtained in some cases. Finally, an industrial configuration, a 17 blade row turbine with a 47 million point mesh, was discussed to illustrate the potential of the proposed large-scale modelling methodology.
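The speed-up and parallelisation efficiency mentioned in this abstract are conventionally defined as S = T_serial / T_parallel and E = S / p for p CPUs. The sketch below computes both. The timings are hypothetical, as the abstract reports no wall-clock figures.

```python
def speedup(t_serial, t_parallel):
    """S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cpus):
    """E = S / p; a value of 1.0 corresponds to linear speed-up."""
    return speedup(t_serial, t_parallel) / n_cpus


# Hypothetical timings in seconds for a run on 32 CPUs.
ideal = efficiency(640.0, 20.0, 32)      # linear speed-up case
realistic = efficiency(640.0, 25.0, 32)  # some parallel overhead
```

An "almost linear" speed-up in the abstract's sense corresponds to an efficiency close to 1.0.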

