Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators

Author(s):  
Daniel Christopher Powell ◽  
Björn Franke

2019 ◽
Vol 13 (2) ◽  
pp. 174-180
Author(s):  
Poonam Sharma ◽  
Ashwani Kumar Dubey ◽  
Ayush Goyal

Background: With the growing demand for image processing and the use of Digital Signal Processors (DSPs), the efficiency of Multiplier-Accumulator (MAC) units has become a performance bottleneck. We reviewed several patents on Application Specific Instruction Set Processors (ASIPs), in which design considerations are proposed for efficient application-specific computing to enhance throughput. Objective: The study aims to develop and analyze a computationally efficient method to optimize the speed performance of the MAC. Methods: The work presented here proposes the design of an Application Specific Instruction Set Processor that exploits a Multiplier-Accumulator integrated as dedicated hardware. This MAC is optimized for high-speed performance and forms the application-specific part of the processor; here it can serve as the DSP block of an image processor, while a 16-bit Reduced Instruction Set Computer (RISC) processor core gives the design the flexibility for general-purpose computing. The design was emulated on a Xilinx Field Programmable Gate Array (FPGA) and tested on various real-time computing tasks. Results: Synthesis of the hardware logic with FPGA tools yielded the operating frequencies of the legacy methods and the proposed method, and simulation of the logic verified its functionality. Conclusion: With the proposed method, a significant 16% increase in throughput was observed for 256 multiply-accumulate iterations on 8-bit sample data. Such an improvement can reduce the computation time in many digital signal processing applications where multiplication and addition are performed iteratively.
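
As a minimal software sketch of the iterative multiply-accumulate workload benchmarked above (256 iterations on 8-bit sample data), the loop below models a MAC unit's behavior. The 24-bit accumulator width and the function name are illustrative assumptions, not details of the authors' hardware design.

```python
# Software model of an iterative multiply-accumulate (MAC) loop.
# The 256-iteration, 8-bit operand case mirrors the benchmark in the
# abstract; the 24-bit accumulator is an assumption, chosen so that
# 256 products of 8-bit operands cannot overflow (256 * 255 * 255 < 2**24).

def mac_256(samples, coeffs):
    """Accumulate 256 products of 8-bit samples and coefficients."""
    assert len(samples) == len(coeffs) == 256
    acc = 0
    for s, c in zip(samples, coeffs):
        assert 0 <= s <= 255 and 0 <= c <= 255  # 8-bit unsigned operands
        acc = (acc + s * c) & 0xFFFFFF          # wrap to 24-bit accumulator
    return acc

# Example: a dot product, as used in FIR filtering on a DSP block.
print(mac_256(list(range(256)), [1] * 256))     # sum of 0..255 = 32640
```

In a hardware MAC this loop body is a single-cycle fused multiply-add, so the reported throughput gain comes from optimizing that dedicated datapath rather than from the software loop shown here.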


Author(s):  
Gaurav Mattey ◽  
Lava Ranganathan

Abstract Critical speed path analysis using the Dynamic Laser Stimulation (DLS) technique has been an indispensable technology in the semiconductor IC industry for identifying process defects and design and layout issues that limit product speed performance. By injecting heat or photocurrent into the active diffusion of transistors, the laser either slows down or speeds up their switching, thereby affecting the overall speed performance of the chip and revealing the speed-limiting or speed-enhancing circuits. Recently, however, on a Qualcomm Technologies 14nm FinFET technology SOC product, the 1340nm laser's heating characteristic revealed a Vt (threshold voltage) improvement behavior at low operating voltages, which helped identify process issues on multiple memory array blocks across multiple cores failing MBIST (Memory Built-In Self-Test). In this paper, we explore the innovative approach of using the laser to study Vt shifts in transistors caused by process issues. We also study laser-silicon interactions by scanning the 1340nm thermal laser over silicon and observing frequency shifts in a high-speed Ring Oscillator (RO) on 16nm FinFET technology. This revealed the normal and reverse temperature-dependency gate voltages for 16nm FinFET, illustrating the dual nature of stimulation from a thermal laser (reducing mobility and improving Vt). Frequency mapping through Laser Voltage Imaging (LVI) was performed on the RO using the 1340nm thermal laser while concurrently stimulating its transistors, and the spatial distribution of stimulation was studied by observing the frequency changes in LVI.
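
To make the ring-oscillator experiment concrete, the sketch below uses the textbook relation f = 1/(2·N·t_pd) between RO frequency, stage count N, and per-stage delay, with a first-order alpha-power-law delay model showing how a laser-induced Vt reduction can outweigh a mobility loss at low supply voltage. The stage count, alpha, delay constant, and shift magnitudes are illustrative assumptions, not measurements from the paper.

```python
# First-order sketch of thermal-laser effects on a ring oscillator (RO).
# Per the alpha-power law, the per-stage delay scales roughly as
#     t_pd ~ k * Vdd / (mu * (Vdd - Vt)**alpha),
# so heating can cut both ways: lower mobility mu (slower) or lower
# Vt (faster). All parameter values below are illustrative assumptions.

def ro_freq_mhz(n_stages, vdd, vt, mu_rel, alpha=1.3, k=2.0e-11):
    """RO frequency f = 1 / (2 * N * t_pd); k is an assumed fitted
    delay constant, mu_rel is mobility relative to its unheated value."""
    t_pd = k * vdd / (mu_rel * (vdd - vt) ** alpha)  # seconds per stage
    return 1.0 / (2 * n_stages * t_pd) / 1e6

f_cold = ro_freq_mhz(n_stages=31, vdd=0.6, vt=0.35, mu_rel=1.00)
f_hot  = ro_freq_mhz(n_stages=31, vdd=0.6, vt=0.32, mu_rel=0.95)  # heated
print(f"{f_cold:.0f} MHz -> {f_hot:.0f} MHz")  # ~222 MHz -> ~244 MHz
# At this low Vdd the (Vdd - Vt)**alpha term dominates, so the Vt
# improvement wins and the RO speeds up. Whether heating speeds up or
# slows down the RO depends on which term dominates; the crossover gate
# voltage is the normal/reverse temperature dependence noted above.
```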


2021 ◽  
Vol 31 (2) ◽  
pp. 1-28
Author(s):  
Gopinath Chennupati ◽  
Nandakishore Santhi ◽  
Phill Romero ◽  
Stephan Eidenbenz

Hardware architectures become increasingly complex as compute capabilities grow toward exascale. We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform defined by the input parameters. PPT-AMMP transforms the code into an (architecture-independent) intermediate representation, then (i) analyzes the basic-block structure of the code, (ii) processes architecture-independent virtual memory access patterns, which it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block-level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build prediction models from small instances of the input code, then integrates them into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis to identify bottleneck hardware resources for different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely the radiation transport mini-app SNAP. To this end, we analyze multivariate regression models that accurately predict the reuse profiles and basic-block counts, and we validate the predicted SNAP runtimes against actual measured times.
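
As a sketch of step (ii), the snippet below computes the classic LRU stack (reuse) distance histogram from a virtual-address trace, the quantity whose per-basic-block distribution PPT-AMMP models. The O(N·M) scan and the toy trace are for clarity only; production reuse-distance tools use tree-based O(N log M) algorithms on sampled traces, and this is not PPT-AMMP's actual code.

```python
from collections import Counter

def reuse_distances(trace):
    """LRU stack (reuse) distance of each access: the number of distinct
    addresses touched since the previous access to the same address,
    or None for a cold (compulsory) miss."""
    stack, dists = [], []          # stack holds addresses, MRU at the end
    for addr in trace:
        if addr in stack:
            i = stack.index(addr)
            dists.append(len(stack) - 1 - i)  # distinct addrs in between
            stack.pop(i)
        else:
            dists.append(None)                # never seen before
        stack.append(addr)                    # addr is now most recent
    return dists

trace = [0x10, 0x20, 0x10, 0x30, 0x20, 0x10]  # toy basic-block trace
print(reuse_distances(trace))                 # [None, None, 1, None, 2, 2]
print(Counter(reuse_distances(trace)))        # the reuse-distance histogram
```

A histogram like this, fitted per basic block on small problem instances, is the kind of model the regression step then extrapolates to larger inputs.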


2021 ◽  
Vol 1045 (1) ◽  
pp. 012040
Author(s):  
Fahim Faisal ◽  
Mirza Muntasir Nishat ◽  
Sayka Afreen Mim ◽  
Hafsa Akter ◽  
Md. Rafid Kaysar Shagor