A high-throughput scalable BNN accelerator with fully pipelined architecture

Author(s):  
Zhe Han ◽  
Jingfei Jiang ◽  
Jinwei Xu ◽  
Peng Zhang ◽  
Xiaoqiang Zhao ◽  
...  
2016 ◽  
Vol 25 (09) ◽  
pp. 1650113 ◽  
Author(s):  
Hadi Mardani Kamali ◽  
Shaahin Hessabi

Advanced Encryption Standard (AES) is the most popular symmetric encryption method, which encrypts streams of data by using symmetric keys. The current preferable AES architectures employ effective methods to achieve two important goals: protection against power analysis attacks and high-throughput. Based on a different architectural point of view, we implement a particular parallel architecture for the latter goal, which is capable of implementing a more efficient pipelining in field-programmable gate array (FPGA). In this regard, all intermediate registers which have a role for unrolling the main loop will be removed. Also, instead of unrolling the main loop of AES algorithm, we implement pipelining structure by replicating nonpipelined AES architectures and using an auto-assigner mechanism for each AES block. By implementing the new pipelined architecture, we achieve two valuable advantages: (a) solving single point of failure problem when one of the replicated parts is faulty and (b) deploying the proposed design as a fault tolerant AES architecture. In addition, we put emphasis on area optimization for all four AES main functions to reduce the overhead associated with AES block replication. The simulation results show that the maximum frequency of our proposed AES architecture is 675.62[Formula: see text]MHz, and for AES128 the throughput is 86.5[Formula: see text]Gbps which is 30.9% better than its closest existing competitor.


2017 ◽  
Vol 26 (11) ◽  
pp. 1750178 ◽  
Author(s):  
Atif Raza Jafri ◽  
Muhammad Najam ul Islam ◽  
Malik Imran ◽  
Muhammad Rashid

Applying unified formula while computing point addition and doubling provides immunity to Elliptic Curve Cryptography (ECC) against power analysis attacks (a type of side channel attack). One of the popular techniques providing this unifiedness is the Binary Huff Curves (BHC) which got attention in 2011. In this paper we are presenting highly optimized architectures to implement point multiplication (PM) on the standard NIST curves over [Formula: see text] and [Formula: see text] using BHC. To achieve a high throughput over area ratio, first of all, we have used a simplified arithmetic and logic unit. Secondly, we have reduced the time to compute PM through Double and Add algorithm. This is achieved by increasing the frequency of operation through a 2-stage pipelined architecture. The increase in clock cycles caused by consequent pipeline hazards is controlled through optimal scheduling of computations involved in PM. The synthesis results show that our designs can work up to a frequency of 377[Formula: see text]MHz on Xilinx Virtex 7 FPGA. Moreover, the overall throughput/area ratio achieved through the adopted approach is up to 20% higher while comparing with available state-of-the-art solutions.


2013 ◽  
Vol 3 (4) ◽  
Author(s):  
K. Rahimunnisa ◽  
P. Karthigaikumar ◽  
N. Christy ◽  
S. Kumar ◽  
J. Jayakumar

AbstractAs the technology is growing day by day, information security plays a very important role in our lives. In order to protect the information, several cryptographic algorithms have been proposed. The aim of this paper is to present an effective Advanced Encryption Standard (AES) architecture to achieve high throughput for security applications. The Parallel Sub-Pipelined architecture (PSP) is proposed in order to obtain high throughput. The proposed architecture is also compared with loop unrolled, pipelined, sub-pipelined, parallel and parallel pipelined architecture in terms of throughput. The AES algorithm using Parallel Sub-Pipelined architecture was prototyped in FPGA (Field Programmable Gate Array) and ASIC (Application Specific Integrated Circuit).The proposed architecture yielded a throughput of 59.59 Gbps at a frequency of 450.045 MHz on FPGA Virtex XC6VLX75T which is higher than the throughput yielded in other architectures. In ASIC 0.13 µm technology, the proposed architecture yielded a throughput of 25.60 Gbps and in 0.18 µm, it yielded a throughput of 20.56 Gbps.


Author(s):  
TONY TSANG

This paper presents a high-throughput memory efficient decoder for low density parity check (LDPC) codes in the high-rate wireless personal area network application. The novel techniques which can apply to our selected LDPC code is proposed, including parallel blocked layered decoding architecture and simplification of the WiGig networks. State-of-the-art flexible LDPC decoders cannot simultaneously achieve the high throughput mandated by these standards and the low power needed for mobile applications. This work develops a flexible, fully pipelined architecture for the IEEE 802.11ad standard capable of achieving both goals. We use Real Time–Performance Evaluation Process Algebra (RT-PEPA) to evaluate a typical LDPC Decoder system's performance. The approach is more convenient, flexible, and lower cost than the former simulation method which needs to develop special hardware and software tools. Moreover, we can easily analyze how changes in performance depend on changes in a particular mode by supplying ranges for parameter values.


2013 ◽  
Vol 100 (8) ◽  
pp. 1033-1045 ◽  
Author(s):  
Meeturani Sharma ◽  
Honey Durga Tiwari ◽  
Yong Beom Cho

Sign in / Sign up

Export Citation Format

Share Document