IOb-Cache: A High-Performance Configurable Open-Source Cache

Open-source processors are increasingly being adopted by industry, which requires open-source implementations of peripherals and other system-on-chip modules. Despite the recent advent of open-source hardware, the available open-source caches have low configurability, lack support for single-cycle pipelined memory accesses, and use non-standard hardware interfaces. In this paper, the IObundle cache (IOb-Cache), a high-performance configurable open-source cache, is proposed, developed and deployed. The cache has front-end and back-end modules for fast integration with processors and memory controllers. The front-end module supports the native interface, and the back-end module supports the native interface and the standard Advanced eXtensible Interface (AXI). The cache is highly configurable in structure and access policies. The back-end can be configured to read bursts of multiple words per transfer to take advantage of the available memory bandwidth. To the best of our knowledge, IOb-Cache is currently the only configurable cache that supports pipelined Central Processing Unit (CPU) interfaces and an AXI memory bus interface. Additionally, it has a write-through buffer and an independent controller that allow writes to complete in one cycle most of the time, together with 1-cycle reads, while previous works only support 1-cycle reading. This allows the best clocks-per-instruction (CPI) to be close to one (1.055). IOb-Cache is integrated into the IOb System-on-Chip (IOb-SoC) GitHub repository, which has 29 stars and is already being used in 50 projects (forks).
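To see how a CPI close to one relates to cache behaviour, the textbook stall model (base CPI plus average memory stall cycles per instruction) can be sketched as follows. The function name and every number in the call are hypothetical; they are chosen only so that the result lands near the reported 1.055 and are not measurements from the paper.

```python
def effective_cpi(base_cpi, accesses_per_instr, miss_rate, miss_penalty_cycles):
    """Textbook stall model: base CPI plus memory stall cycles per instruction."""
    stall_cycles = accesses_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles

# Hypothetical inputs (assumptions for illustration, not taken from the paper):
# one memory access per instruction, 0.5% of accesses stall, 11 cycles per stall.
cpi = effective_cpi(base_cpi=1.0, accesses_per_instr=1.0,
                    miss_rate=0.005, miss_penalty_cycles=11)
print(f"effective CPI = {cpi:.3f}")
```

Under this model, single-cycle reads and writes keep the base term at one, so the remaining overhead comes only from the accesses that stall.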


An Efficient Cache Organization for On-Chip Multiprocessor Networks

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v5i3.pp503-517 ◽ 2015 ◽ Vol 5 (3) ◽ pp. 503

Author(s): Medhat Awadalla ◽ Ahmed M. Sadek

Keyword(s): High Performance ◽ Cache Coherence ◽ Low Cost ◽ System On Chip ◽ Cache Memory ◽ Processing Unit ◽ Single Chip ◽ Chip Area ◽ Shared Cache ◽ On Chip

To meet the demands of increasingly computation-intensive applications and the need for low-power, high-performance systems, the number of computing resources on a single chip has increased enormously. When many computing resources are combined into a System-on-Chip, the interconnection between them becomes another challenging issue. In most System-on-Chip applications, a shared bus interconnection, which needs arbitration logic to serialize several bus access requests, is adopted to communicate with each integrated processing unit because of its low cost and simple control characteristics. This paper focuses on the interconnection design issues of area, power and performance of chip multi-processors with shared cache memory. It shows that shared cache memory contributes to performance improvement. However, the typical crossbar interconnection between the cores and the shared cache occupies most of the chip area, consumes a lot of power and does not scale efficiently with an increasing number of cores. New interconnection mechanisms are needed to address these issues. This paper proposes an architectural paradigm that attempts to gain the advantages of a shared cache while avoiding the penalty imposed by the crossbar interconnect. The proposed architecture occupies a smaller area, leaving more space for additional cache memory. It also reduces power consumption compared to the existing crossbar architecture. Furthermore, the paper presents a modified cache coherence algorithm called Tuned-MESI. It is based on the typical MESI cache coherence algorithm, but it is tuned and tailored for the suggested architecture. The results of the simulated experiments show that the developed architecture produces fewer broadcast operations than the typical algorithm.
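For reference, the baseline that Tuned-MESI modifies can be sketched as a tiny model of textbook MESI for one memory line, counting the bus broadcasts it generates. This is a minimal sketch of the standard protocol only; the abstract does not describe the Tuned-MESI changes, and the class and method names below are hypothetical.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

class Line:
    """One memory line tracked across n private caches with textbook MESI."""

    def __init__(self, n):
        self.states = [State.I] * n
        self.broadcasts = 0  # bus transactions that every cache must snoop

    def read(self, cpu):
        if self.states[cpu] is not State.I:
            return  # read hit: no bus traffic
        self.broadcasts += 1  # read miss: BusRd
        holders = [i for i, s in enumerate(self.states)
                   if i != cpu and s is not State.I]
        for i in holders:
            self.states[i] = State.S  # M/E/S holders end up Shared
        self.states[cpu] = State.S if holders else State.E

    def write(self, cpu):
        state = self.states[cpu]
        if state is State.M:
            return  # write hit on a dirty line: no bus traffic
        if state is State.E:
            self.states[cpu] = State.M  # silent upgrade, no broadcast
            return
        self.broadcasts += 1  # BusRdX (from I) or upgrade (from S)
        for i in range(len(self.states)):
            if i != cpu:
                self.states[i] = State.I  # invalidate all other copies
        self.states[cpu] = State.M

line = Line(2)
line.read(0)   # cache 0: I -> E
line.write(0)  # cache 0: E -> M without a broadcast
line.read(1)   # both caches end up Shared
line.write(1)  # cache 1: S -> M, cache 0 invalidated
print(line.states, line.broadcasts)
```

Counting broadcasts in a model like this is one way to compare a tuned protocol against the typical algorithm on the same access trace.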


Low-Process–Voltage–Temperature-Sensitivity Multi-Stage Timing Monitor for System-on-Chip Applications

Electronics ◽ 10.3390/electronics10131587 ◽ 2021 ◽ Vol 10 (13) ◽ pp. 1587

Author(s): Duo Sheng ◽ Hsueh-Ru Lin ◽ Li Tai

Keyword(s): High Performance ◽ Power Reduction ◽ System On Chip ◽ Timing Information ◽ Multi Stage ◽ Dynamic Voltage ◽ And Performance ◽ On Chip ◽ Maximum Measurement ◽ Maximum Measurement Error

High-performance, complex system-on-chip (SoC) design requires a high-throughput and stable timing monitor to reduce the impact of uncertain timing and to implement the dynamic voltage and frequency scaling (DVFS) scheme for overall power reduction. This paper presents a multi-stage timing monitor that combines three timing-monitoring stages to achieve a high timing-monitoring resolution and a wide timing-monitoring range simultaneously. Additionally, because the proposed timing monitor has high immunity to process–voltage–temperature (PVT) variation, it provides more stable time-monitoring results. The time-monitoring resolution and range of the proposed timing monitor are 47 ps and 2.2 µs, respectively, and the maximum measurement error is 0.06%. Therefore, the proposed multi-stage timing monitor not only provides the timing information of the specified signals to maintain the functionality and performance of the SoC, but also makes the operation of the DVFS scheme more efficient and accurate in SoC design.
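As a rough illustration of what the quoted resolution and range imply together, the arithmetic below computes how many resolution steps fit in the full range and how many bits a single flat counter would need to index them. Only the 47 ps and 2.2 µs figures come from the abstract; the flat-counter comparison is an assumption made for illustration.

```python
import math

RESOLUTION_S = 47e-12  # 47 ps, from the abstract
RANGE_S = 2.2e-6       # 2.2 µs, from the abstract

# Number of distinguishable steps across the whole range
steps = RANGE_S / RESOLUTION_S

# Bits a single flat counter would need to index every step
bits = math.ceil(math.log2(steps))

print(f"{steps:.0f} steps, {bits} bits")
```

Covering tens of thousands of steps at picosecond granularity in a single stage would be costly, which is consistent with the abstract's motivation for splitting the measurement across stages.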


A high performance scalable fuzzy based modified Asymmetric Heterogene Multiprocessor System on Chip (AHt-MPSOC) reconfigurable architecture

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-189737 ◽ 2021 ◽ pp. 1-12

Author(s): Arun Prasath Raveendran ◽ Jafar A. Alzubi ◽ Ramesh Sekaran ◽ Manikandan Ramachandran

Keyword(s): High Performance ◽ Standard Technique ◽ System On Chip ◽ Mixed Integer ◽ Multiprocessor System ◽ Compound System ◽ Available Bandwidth ◽ Mip Model ◽ Fpga Chip ◽ On Chip

The coming generation of FPGA circuits allows many hard and soft cores, as well as dedicated accelerators, to be combined on a chip. The Heterogene Multi-Processor System-on-Chip (Ht-MPSoC) architecture meets the requirements of modern applications. A compound System on Chip (SoC) is designed for a single FPGA chip with the performance/power consumption ratio in mind. In the existing method, an FPGA-based Mixed Integer Programming (MIP) model is used to define the Ht-MPSoC configuration, taking into consideration the sharing of hardware accelerators between the cores. However, the sharing method differs from one processor to another depending on the FPGA architecture. Hence, the target is a high number of hardware resources on a single FPGA chip with low latency and power. For this reason, a fuzzy-based MIP and a Graph-theory-based Traffic Estimator (GTE) are proposed to define a new Asymmetric Heterogene Multiprocessor System on Chip (AHt-MPSoC) architecture. The bandwidth, energy consumption, wait and transmission range achieved by the suggested technique are better than those of the standard technique, and it is also implemented with a multi-task framework. The new fuzzy-control-based AHt-MPSoC analysis shows a significant improvement of 14.7 percent in available bandwidth and an 89.8 percent reduction in energy across various traffic scenarios compared to the conventional method.


Low Power System-on-Chip Platform Architecture for High Performance Applications

The Kluwer International Series in Engineering and Computer Science - System-on-Chip for Real-Time Applications ◽ 10.1007/978-1-4615-0351-4_32 ◽ 2003 ◽ pp. 349-356

Author(s): W.-C. Lo ◽ A. T. Erdogan ◽ T. Arslan

Keyword(s): Power System ◽ Low Power ◽ High Performance ◽ System On Chip ◽ Platform Architecture ◽ Low Power System ◽ On Chip


Design of Router Supporting Multiply Routing Algorithm for NoC

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.981.431 ◽ 2014 ◽ Vol 981 ◽ pp. 431-434

Author(s): Zhan Peng Jiang ◽ Rui Xu ◽ Chang Chun Dong ◽ Lin Hai Cui

Keyword(s): Complex System ◽ High Performance ◽ Routing Algorithm ◽ Network On Chip ◽ System On Chip ◽ Low Latency ◽ Deterministic Routing ◽ Key Features ◽ Design Challenge ◽ On Chip

Network on Chip (NoC), a recently proposed solution to the global communication problem in complex System on Chip (SoC) design, has attracted more and more researchers. Owing to some distinct characteristics, NoC differs from both traditional off-chip networks and traditional on-chip buses, and faces huge design challenges. NoC router design is one of the most important issues in an NoC system. The paper presents a high-performance, low-latency two-stage pipelined router architecture suitable for NoC designs and provides a solution for irregular 2D mesh topologies. The key features of the proposed Mix Router are its suitability for the 2D mesh NoC topology and its capability of supporting both fully adaptive and deterministic routing algorithms.
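As an example of deterministic routing on a 2D mesh, the sketch below implements dimension-ordered (XY) routing, a common deterministic scheme. The abstract does not say which deterministic algorithm the Mix Router uses, so XY routing and the function name `xy_route` are assumptions for illustration.

```python
def xy_route(src, dst):
    """Dimension-ordered routing on a 2D mesh: travel along X first, then Y.

    src and dst are (x, y) router coordinates; returns the list of routers
    visited, including both endpoints.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
```

Because every packet between the same pair of routers takes the same path, the route is fully determined by source and destination, in contrast to adaptive routing, where the choice can depend on network state.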


Efficient parallelization of SPH algorithm on modern multi-core CPUs and massively parallel GPUs

International Journal of Modeling Simulation and Scientific Computing ◽ 10.1142/s1793962321500549 ◽ 2021 ◽ pp. 2150054

Author(s): Pravin Jagtap ◽ Rupesh Nasre ◽ V. S. Sanapala ◽ B. S. V. Patnaik

Keyword(s): High Performance ◽ Performance Metrics ◽ Computational Simulation ◽ Massively Parallel ◽ Benchmark Problems ◽ Processing Unit ◽ Central Processing ◽ Neighbor Search ◽ Computational Performance ◽ Sph Algorithm

Smoothed Particle Hydrodynamics (SPH) is fast emerging as a practically useful computational simulation tool for a wide variety of engineering problems. SPH is also gaining popularity as the backbone for fast and realistic animations in graphics and video games. The Lagrangian and mesh-free nature of the method facilitates fast and accurate simulation of material deformation, interface capture, etc. Typically, particle-based methods require particle search-and-locate algorithms to be implemented efficiently, as the continuous creation of neighbor particle lists is a computationally expensive step. Hence, it is advantageous to implement SPH on modern multi-core platforms with the help of High-Performance Computing (HPC) tools. In this work, the computational performance of an SPH algorithm is assessed on a multi-core Central Processing Unit (CPU) as well as on massively parallel General Purpose Graphical Processing Units (GP-GPU). Parallelizing SPH faces several challenges, such as scalability of the neighbor search process, force calculations, minimizing thread divergence, achieving coalesced memory access patterns, balancing workload, and ensuring optimum use of computational resources. While addressing some of these challenges, a detailed analysis of performance metrics such as speedup, global load efficiency, global store efficiency, warp execution efficiency, and occupancy is presented. The OpenMP and Compute Unified Device Architecture[Formula: see text] parallel programming models have been used for parallel computing on Intel Xeon[Formula: see text] E5-[Formula: see text] multi-core CPU and NVIDIA Quadro M[Formula: see text] and NVIDIA Tesla p[Formula: see text] massively parallel GPU architectures. Standard benchmark problems from the Computational Fluid Dynamics (CFD) literature are chosen for the validation. The key concern of how to identify a suitable architecture for mesh-less methods, which essentially require a heavy workload of neighbor search and evaluation of local force fields from neighbor interactions, is addressed.
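The neighbor search step mentioned above is commonly accelerated with a uniform grid (cell list), so that each particle is compared only against particles in adjacent cells. The sketch below is a minimal serial 2D version with the cell size equal to the search radius `h`; it is an illustrative assumption, not the authors' OpenMP or GPU implementation, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import product

def neighbor_pairs(points, h):
    """Return sorted index pairs (i, j), i < j, whose distance is below h.

    Particles are binned into square cells of side h, so any neighbor of a
    particle must lie in its own cell or one of the 8 surrounding cells.
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // h), int(y // h))].append(idx)

    pairs = set()
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get avoids creating empty cells while iterating over the dict
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        (xi, yi), (xj, yj) = points[i], points[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 < h * h:
                            pairs.add((i, j))
    return sorted(pairs)

points = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0), (1.2, 0.0)]
print(neighbor_pairs(points, h=1.0))
```

The binning and the per-cell loops are independent across cells, which is what makes this step a natural target for multi-core and GPU parallelization.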


SeisNoise.jl: Ambient Seismic Noise Cross Correlation on the CPU and GPU in Julia

Seismological Research Letters ◽ 10.1785/0220200192 ◽ 2020 ◽ Vol 92 (1) ◽ pp. 517-527

Author(s): Timothy Clements ◽ Marine A. Denolle

Keyword(s): Seismic Noise ◽ High Performance ◽ Cross Correlation ◽ Graphic Processing Unit ◽ Ambient Seismic Noise ◽ Processing Unit ◽ Central Processing ◽ And Performance ◽ Noise Cross Correlation ◽ Performance Computing

We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross correlation, written entirely in the computing language Julia. Julia is a new language, with syntax and a learning curve similar to MATLAB (see Data and Resources), R, or Python and performance close to Fortran or C. SeisNoise.jl is compatible with high-performance computing resources, using both the central processing unit and the graphic processing unit. SeisNoise.jl is a modular toolbox, giving researchers common tools and data structures to design custom ambient seismic cross-correlation workflows in Julia.
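The core operation behind such workflows, cross-correlating two records to find the lag at which they align best, can be sketched in a few lines. This is a plain time-domain illustration in Python, not SeisNoise.jl's Julia API; the function name and the toy signals are hypothetical.

```python
def cross_correlate(a, b, max_lag):
    """Return {lag: sum over n of a[n] * b[n + lag]} for lags in [-max_lag, max_lag]."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, a_n in enumerate(a):
            m = n + lag
            if 0 <= m < len(b):
                total += a_n * b[m]
        out[lag] = total
    return out

# Toy records: b is a copy of a delayed by two samples
a = [0.0, 1.0, 0.0, 0.0, 0.0]
b = [0.0, 0.0, 0.0, 1.0, 0.0]

cc = cross_correlate(a, b, max_lag=3)
best_lag = max(cc, key=cc.get)
print(best_lag)
```

This direct form costs on the order of the record length times the number of lags, so production codes typically compute the same quantity through FFTs, which is also where CPU and GPU acceleration pays off.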
