Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators

Author(s):  
Daniel Christopher Powell ◽  
Björn Franke

2019 ◽
Vol 13 (2) ◽  
pp. 174-180
Author(s):  
Poonam Sharma ◽  
Ashwani Kumar Dubey ◽  
Ayush Goyal

Background: With the growing demand for image processing and the use of Digital Signal Processors (DSPs), the efficiency of Multiplier-Accumulator (MAC) units has become a performance bottleneck. We reviewed several patents on Application Specific Instruction Set Processors (ASIPs), in which design considerations are proposed for efficient application-specific computing to enhance throughput. Objective: The study aims to develop and analyze a computationally efficient method to optimize the speed performance of the MAC. Methods: The work presented here proposes the design of an Application Specific Instruction Set Processor that exploits a Multiplier-Accumulator integrated as dedicated hardware. This MAC is optimized for high-speed performance and forms the application-specific part of the processor; here it can serve as the DSP block of an image processor, while a 16-bit Reduced Instruction Set Computer (RISC) processor core gives the design the flexibility for general-purpose computing. The design was emulated on a Xilinx Field Programmable Gate Array (FPGA) and tested on various real-time computing tasks. Results: Synthesis of the hardware logic with FPGA tools yielded the operating frequencies of the legacy methods and the proposed method, and simulation of the logic verified its functionality. Conclusion: With the proposed method, a significant 16% increase in throughput was observed for 256 multiply-accumulate iterations on 8-bit sample data. Such an improvement can reduce the computation time in many digital signal processing applications where multiplication and addition are performed iteratively.
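
As a minimal software sketch of the iterative multiply-accumulate workload benchmarked above (256 iterations on 8-bit sample data), the loop below models a MAC unit's behavior. The 24-bit accumulator width and the function name are illustrative assumptions, not details of the authors' hardware design.

```python
# Software model of an iterative multiply-accumulate (MAC) loop.
# The 256-iteration, 8-bit operand case mirrors the benchmark in the
# abstract; the 24-bit accumulator is an assumption, chosen so that
# 256 products of 8-bit operands cannot overflow (256 * 255 * 255 < 2**24).

def mac_256(samples, coeffs):
    """Accumulate 256 products of 8-bit samples and coefficients."""
    assert len(samples) == len(coeffs) == 256
    acc = 0
    for s, c in zip(samples, coeffs):
        assert 0 <= s <= 255 and 0 <= c <= 255  # 8-bit unsigned operands
        acc = (acc + s * c) & 0xFFFFFF          # wrap to 24-bit accumulator
    return acc

# Example: a dot product, as used in FIR filtering on a DSP block.
print(mac_256(list(range(256)), [1] * 256))     # sum of 0..255 = 32640
```

In a hardware MAC this loop body is a single-cycle fused multiply-add, so the reported throughput gain comes from optimizing that dedicated datapath rather than from the software loop shown here.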


Author(s):  
Gaurav Mattey ◽  
Lava Ranganathan

Abstract Critical speed path analysis using the Dynamic Laser Stimulation (DLS) technique has been an indispensable technology in the semiconductor IC industry for identifying process defects and design and layout issues that limit product speed performance. By injecting heat or photocurrent into the active diffusion of transistors, the laser either slows down or speeds up their switching, thereby affecting the overall speed performance of the chip and revealing the speed-limiting or speed-enhancing circuits. Recently, however, on a Qualcomm Technologies 14nm FinFET technology SOC product, the 1340nm laser's heating characteristic revealed a Vt (threshold voltage) improvement behavior at low operating voltages, which helped identify process issues on multiple memory array blocks across multiple cores failing MBIST (Memory Built-In Self-Test). In this paper, we explore the innovative approach of using the laser to study Vt shifts in transistors caused by process issues. We also study laser-silicon interactions by scanning the 1340nm thermal laser over silicon and observing frequency shifts in a high-speed Ring Oscillator (RO) on 16nm FinFET technology. This revealed the normal and reverse temperature-dependency gate voltages for 16nm FinFET, illustrating the dual nature of stimulation from a thermal laser (reducing mobility and improving Vt). Frequency mapping through Laser Voltage Imaging (LVI) was performed on the RO using the 1340nm thermal laser while concurrently stimulating its transistors, and the spatial distribution of stimulation was studied by observing the frequency changes in LVI.
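
To make the ring-oscillator experiment concrete, the sketch below uses the textbook relation f = 1/(2·N·t_pd) between RO frequency, stage count N, and per-stage delay, with a first-order alpha-power-law delay model showing how a laser-induced Vt reduction can outweigh a mobility loss at low supply voltage. The stage count, alpha, delay constant, and shift magnitudes are illustrative assumptions, not measurements from the paper.

```python
# First-order sketch of thermal-laser effects on a ring oscillator (RO).
# Per the alpha-power law, the per-stage delay scales roughly as
#     t_pd ~ k * Vdd / (mu * (Vdd - Vt)**alpha),
# so heating can cut both ways: lower mobility mu (slower) or lower
# Vt (faster). All parameter values below are illustrative assumptions.

def ro_freq_mhz(n_stages, vdd, vt, mu_rel, alpha=1.3, k=2.0e-11):
    """RO frequency f = 1 / (2 * N * t_pd); k is an assumed fitted
    delay constant, mu_rel is mobility relative to its unheated value."""
    t_pd = k * vdd / (mu_rel * (vdd - vt) ** alpha)  # seconds per stage
    return 1.0 / (2 * n_stages * t_pd) / 1e6

f_cold = ro_freq_mhz(n_stages=31, vdd=0.6, vt=0.35, mu_rel=1.00)
f_hot  = ro_freq_mhz(n_stages=31, vdd=0.6, vt=0.32, mu_rel=0.95)  # heated
print(f"{f_cold:.0f} MHz -> {f_hot:.0f} MHz")  # ~222 MHz -> ~244 MHz
# At this low Vdd the (Vdd - Vt)**alpha term dominates, so the Vt
# improvement wins and the RO speeds up. Whether heating speeds up or
# slows down the RO depends on which term dominates; the crossover gate
# voltage is the normal/reverse temperature dependence noted above.
```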


2021 ◽  
Vol 31 (2) ◽  
pp. 1-28
Author(s):  
Gopinath Chennupati ◽  
Nandakishore Santhi ◽  
Phill Romero ◽  
Stephan Eidenbenz

Hardware architectures become increasingly complex as compute capabilities grow toward exascale. We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform defined by the input parameters. PPT-AMMP transforms the code into an (architecture-independent) intermediate representation, then (i) analyzes the basic-block structure of the code, (ii) processes architecture-independent virtual memory access patterns, which it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block-level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build prediction models from small instances of the input code, then integrates them into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis to identify bottleneck hardware resources for different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely the radiation transport mini-app SNAP. To this end, we analyze multivariate regression models that accurately predict the reuse profiles and basic-block counts, and we validate the predicted SNAP runtimes against actual measured times.
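
As a sketch of step (ii), the snippet below computes the classic LRU stack (reuse) distance histogram from a virtual-address trace, the quantity whose per-basic-block distribution PPT-AMMP models. The O(N·M) scan and the toy trace are for clarity only; production reuse-distance tools use tree-based O(N log M) algorithms on sampled traces, and this is not PPT-AMMP's actual code.

```python
from collections import Counter

def reuse_distances(trace):
    """LRU stack (reuse) distance of each access: the number of distinct
    addresses touched since the previous access to the same address,
    or None for a cold (compulsory) miss."""
    stack, dists = [], []          # stack holds addresses, MRU at the end
    for addr in trace:
        if addr in stack:
            i = stack.index(addr)
            dists.append(len(stack) - 1 - i)  # distinct addrs in between
            stack.pop(i)
        else:
            dists.append(None)                # never seen before
        stack.append(addr)                    # addr is now most recent
    return dists

trace = [0x10, 0x20, 0x10, 0x30, 0x20, 0x10]  # toy basic-block trace
print(reuse_distances(trace))                 # [None, None, 1, None, 2, 2]
print(Counter(reuse_distances(trace)))        # the reuse-distance histogram
```

A histogram like this, fitted per basic block on small problem instances, is the kind of model the regression step then extrapolates to larger inputs.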


2021 ◽  
Vol 1045 (1) ◽  
pp. 012040
Author(s):  
Fahim Faisal ◽  
Mirza Muntasir Nishat ◽  
Sayka Afreen Mim ◽  
Hafsa Akter ◽  
Md. Rafid Kaysar Shagor