An Application-Specific VLIW Processor with Vector Instruction Set for CNN Acceleration

Author(s):  
Andreas Bytyn ◽  
Rainer Leupers ◽  
Gerd Ascheid
2019 ◽  
Vol 13 (2) ◽  
pp. 174-180
Author(s):  
Poonam Sharma ◽  
Ashwani Kumar Dubey ◽  
Ayush Goyal

Background: With the growing demand for image processing and the widespread use of Digital Signal Processors (DSPs), the efficiency of multipliers and accumulators has become a bottleneck. We reviewed a few patents on Application Specific Instruction Set Processors (ASIPs) in which design considerations are proposed for efficient application-specific computing with enhanced throughput. Objective: The study aims to develop and analyze a computationally efficient method to optimize the speed of the Multiplier Accumulator (MAC). Methods: The work presented here proposes the design of an Application Specific Instruction Set Processor that integrates a Multiplier Accumulator as dedicated hardware. This MAC is optimized for high-speed performance and forms the application-specific part of the processor; here it can serve as the DSP block of an image processor, while a 16-bit Reduced Instruction Set Computer (RISC) processor core gives the design the flexibility to handle general-purpose computing. The design was emulated on a Xilinx Field Programmable Gate Array (FPGA) and tested on various real-time computing tasks. Results: Synthesis of the hardware logic with FPGA tools gave the operating frequencies of the legacy methods and of the proposed method, and simulation of the logic verified the functionality. Conclusion: With the proposed method, a 16% increase in throughput was observed for 256-step iterations of multiply and accumulate on 8-bit sample data. Such an improvement can help reduce computation time in many digital signal processing applications where multiplication and addition are performed iteratively.
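
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of an iterative multiply-accumulate over 256 unsigned 8-bit samples. It is not the authors' hardware design: the function name `mac`, the 24-bit accumulator width, and the unsigned interpretation of the samples are all assumptions made for illustration.

```python
def mac(samples, coeffs, acc_bits=24):
    """Iteratively multiply and accumulate unsigned 8-bit operands.

    Hypothetical illustration only: acc_bits=24 is an assumed accumulator
    width, chosen because 256 products of at most 255 * 255 sum to
    16,646,400, which is below 2**24.
    """
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")

    mask = (1 << acc_bits) - 1  # models a fixed-width accumulator register
    acc = 0
    for x, c in zip(samples, coeffs):
        if not (0 <= x <= 0xFF and 0 <= c <= 0xFF):
            raise ValueError("operands must fit in 8 unsigned bits")
        # One MAC step: multiply, add to the accumulator, wrap to acc_bits.
        acc = (acc + x * c) & mask
    return acc


if __name__ == "__main__":
    # Worst case for 256 steps on 8-bit data: every operand is 255.
    print(mac([255] * 256, [255] * 256))
```

Each loop iteration corresponds to one multiply-and-add step, so dedicated hardware that shortens this step shortens the whole 256-step run, which is the kind of iterative workload the abstract refers to.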


2017 ◽  
Vol 66 (4) ◽  
pp. 647-660 ◽  
Author(s):  
Tuo Li ◽  
Muhammad Shafique ◽  
Jude Angelo Ambrose ◽  
Jorg Henkel ◽  
Sri Parameswaran

2021 ◽  
Vol 11 (18) ◽  
pp. 8379
Author(s):  
Seongmin Kim

Recent innovations in trusted execution environment (TEE) technologies enable the delegation of privacy-preserving computation to the cloud. In particular, Intel SGX, an extension of the x86 instruction set architecture (ISA), accelerates this trend by offering hardware-protected isolation with near-native performance. However, SGX inherently suffers from workload-dependent performance degradation caused by hardware restrictions and by design decisions that primarily serve the security guarantee. System-level optimizations of the SGX runtime and kernel module have been proposed to address this, but they cannot effectively reflect the application-specific characteristics that largely determine the performance of legacy SGX applications. This work presents an application-level optimization strategy that uses asynchronous switchless calls to reduce enclave transitions, one of the dominant overheads of using SGX. Based on a systematic analysis, our methodology examines the performance benefit for each enclave transition wrapper and selectively applies switchless calls without modifying the legacy codebase. The evaluation shows that our optimization strategy improves the end-to-end performance of our showcase application, an SGX-enabled network middlebox.

