DESIGN OF ON-CHIP BUS OCP PROTOCOL WITH BUS FUNCTIONALITIES

The need for on-chip bus protocols has increased drastically, driven by the demand for efficient and lossless communication among the large number of IP cores in an SoC design. This paper proposes a high-performance, highly scalable, bus-independent interface between IP cores based on the Open Core Protocol (OCP) of the OCP International Partnership (OCP-IP). OCP is a core-centric, point-to-point protocol that provides lossless communication and reduces design time, design risk, and manufacturing costs for SoC designs. Its main property is that it can be configured to suit the target application. OCP was chosen over other core protocols because of its advanced features, such as configurable sideband control signaling and test harness signals. OCP defines a point-to-point interface between two communicating entities, such as IP cores and bus interface modules. One entity acts as the master of the OCP instance and the other as the slave. In this paper, an efficient bus architecture was adopted to support advanced bus functionalities, including simple, burst, pipelined, and out-of-order transactions, according to their suitability for real-time products. The OCP interface was designed and modeled in VHDL, then simulated and synthesized. Experimental results show the efficiency of the proposed bus architecture and interface.
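
The master/slave relationship described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the paper's VHDL design. The signal names (MCmd, MAddr, MData, SCmdAccept, SResp, SData) follow the basic OCP signal set, while `MemorySlave`, `transfer`, and `wait_cycles` are hypothetical names introduced here. As a simplification, writes are modeled without a response and a read response is returned in the same cycle the command is accepted.

```python
from dataclasses import dataclass, field

IDLE, WR, RD = "IDLE", "WR", "RD"   # MCmd values (names only, no bit encodings)
NULL, DVA = "NULL", "DVA"           # SResp values: no response / data valid


@dataclass
class MemorySlave:
    """Toy OCP slave: a memory that stalls each command for a fixed
    number of cycles before accepting it (hypothetical behaviour)."""
    wait_cycles: int = 1
    mem: dict = field(default_factory=dict)
    _waited: int = 0

    def cycle(self, mcmd, maddr, mdata):
        """Evaluate one clock cycle; return (SCmdAccept, SResp, SData)."""
        if mcmd == IDLE:
            return False, NULL, None
        if self._waited < self.wait_cycles:
            self._waited += 1          # keep SCmdAccept low: master must hold
            return False, NULL, None
        self._waited = 0
        if mcmd == WR:
            self.mem[maddr] = mdata    # write: no response in this sketch
            return True, NULL, None
        return True, DVA, self.mem.get(maddr, 0)


def transfer(slave, mcmd, maddr, mdata=None, max_cycles=16):
    """Master side: hold the request until SCmdAccept is seen.
    Returns (cycles_used, read_data_or_None)."""
    for cycle in range(1, max_cycles + 1):
        accept, sresp, sdata = slave.cycle(mcmd, maddr, mdata)
        if accept:
            return cycle, (sdata if sresp == DVA else None)
    raise TimeoutError("slave never accepted the command")


slave = MemorySlave(wait_cycles=1)
write_result = transfer(slave, WR, 0x10, 0xAB)
read_result = transfer(slave, RD, 0x10)
```

The point of the sketch is the handshake: the master keeps its command stable until the slave accepts it, which is what makes the transfer lossless regardless of how long the slave stalls.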


Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture

The Journal of Supercomputing ◽

10.1007/s11227-021-03853-x ◽

2021 ◽

Author(s):

Xiaohan Tao ◽

Jianmin Pang ◽

Jinlong Xu ◽

Yu Zhu

Keyword(s):

Energy Consumption ◽

High Performance ◽

Scientific Computing ◽

Data Transfer ◽

Performance Model ◽

Experimental Result ◽

Transfer Model ◽

Scratchpad Memory ◽

On Chip ◽

Many Core

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption than a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences, and determine the allocation of data transfer operations. We further present a data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM in the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental results show that the proposed MSDTM improves application execution time by 5.49× and achieves an energy saving of 5.16× on average.


A High Performance Advanced Encryption Standard (AES) Encrypted On-Chip Bus Architecture for Internet-of-Things (IoT) System-on-Chips (SoC)

10.25148/etd.fidc000248 ◽

2016 ◽

Author(s):

Xiaokun Yang

Keyword(s):

Internet Of Things ◽

High Performance ◽

Advanced Encryption Standard ◽

On Chip ◽

Bus Architecture


Design and Calibration of MIMU Based on Chip Size Micro Inertial Sensors

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.849.302 ◽

2013 ◽

Vol 849 ◽

pp. 302-309

Author(s):

Yun Xu ◽

Xin Hua Zhu ◽

Yu Wang

Keyword(s):

High Performance ◽

Inertial Sensors ◽

Low Cost ◽

Rapid Development ◽

Experimental Result ◽

Integrated Navigation ◽

Bias Stability ◽

Chip Size ◽

Order Of Magnitude ◽

On Chip

With the rapid development of microfabrication technology, the performance of the MIMU has gradually improved. The MIMU introduced in this paper is based on the MSG7000D silicon micromachined gyroscope and the MSA6000 accelerometer. Its volume is 3×3×3 cm³, its mass is 68.5 g, and its power consumption is less than 1 W. The experimental results show that the bias stability of the gyroscope and accelerometer on each axis of the designed MIMU is less than 10°/h and 0.5 mg, respectively. Because the three axes of the structure are not perfectly orthogonal, the MIMU needs to be calibrated. After calibration, the measurement accuracy improves by an order of magnitude. The designed MIMU satisfies the high-performance, low-cost, light-weight, and small-size requirements of strap-down navigation systems, so it can be widely applied not only to vehicle integrated navigation and attitude measurement but also to personal goods such as mobile devices and game consoles.


3D Stacked Cache Data Management for Energy Minimization of 3D Chip Multiprocessor

International Journal of Students Research in Technology & Management ◽

10.18510/ijsrtm.2015.325 ◽

2015 ◽

Vol 3 (2) ◽

pp. 264-268

Author(s):

K. Suresh Kumar ◽

S. Anitha ◽

M. Gayathri

Keyword(s):

Temperature Distribution ◽

High Performance ◽

Chip Multiprocessors ◽

Electrical Power ◽

Chip Multiprocessor ◽

Energy Reduction ◽

Experimental Result ◽

Data Mapping ◽

Promising Solution ◽

On Chip

This work discusses a runtime cache data mapping for 3-D stacked L2 caches that minimizes the overall energy of 3-D chip multiprocessors (CMPs). The suggested method considers both the temperature distribution and the memory traffic of 3-D CMPs. Experimental results show an energy reduction of up to 22.88% compared to an existing solution that considers only the temperature distribution. Current trends envisage 3D Multi-Processor System-on-Chip (MPSoC) design as a promising way to keep increasing the performance of next-generation high-performance computing (HPC) systems. However, as the power density of HPC systems increases with the arrival of 3D MPSoCs, supplying electrical power to the computing equipment and constantly removing the generated heat is rapidly becoming the dominant cost in any HPC facility. An energy reduction of up to 19.55% is also reported.


Novel NoC Topology Construction for High-Performance Communications

Journal of Computer Networks and Communications ◽

10.1155/2011/405697 ◽

2011 ◽

Vol 2011 ◽

pp. 1-6 ◽

Cited By ~ 4

Author(s):

P. Ezhumalai ◽

A. Chilambuchelvan ◽

C. Arun

Keyword(s):

High Performance ◽

Data Communication ◽

Network On Chip ◽

Performance Study ◽

Hop Count ◽

Nanoscale Systems ◽

Area Reduction ◽

Ip Cores ◽

Global Interconnects ◽

On Chip

Different intellectual property (IP) cores, including processors and memories, are interconnected to build a typical system-on-chip (SoC) architecture. In larger SoC designs, data communication has to take place over the global interconnects. Network-on-chip (NoC) architectures have been proposed as a scalable solution to the global communication challenges in nanoscale SoC design. We propose an approach to synthesizing customized network-on-chip topologies with better flow partitioning, and we also consider power and area reduction relative to existing regular topologies. To improve SoC performance, we first carried out a performance study of the regular interconnect topologies MESH, TORUS, BFT, and EBFT. We observed that EBFT has the best overall latency and throughput, followed by BFT. Experimental results on a variety of NoC benchmarks showed that our synthesized topologies reduce power consumption and average hop count relative to a custom topology implementation.
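
Average hop count, one of the metrics mentioned above, can be computed directly for small regular topologies. The following is a minimal sketch assuming a k×k grid, minimal routing, and uniform traffic between all pairs of distinct nodes. It covers only mesh and torus, and does not reproduce the paper's BFT/EBFT study or its synthesis flow. The function name `average_hops` is an assumption made for illustration.

```python
from itertools import product


def average_hops(k, wraparound):
    """Average minimal hop count between distinct nodes of a k x k grid.

    wraparound=False models a mesh; wraparound=True models a torus,
    where each dimension also has a link from the last node to the first.
    """
    nodes = list(product(range(k), repeat=2))
    total = 0
    pairs = 0
    for (x1, y1), (x2, y2) in product(nodes, repeat=2):
        if (x1, y1) == (x2, y2):
            continue
        dx, dy = abs(x1 - x2), abs(y1 - y2)
        if wraparound:
            # On a ring of k nodes the shorter direction may wrap around.
            dx, dy = min(dx, k - dx), min(dy, k - dy)
        total += dx + dy
        pairs += 1
    return total / pairs


mesh_hops = average_hops(4, wraparound=False)
torus_hops = average_hops(4, wraparound=True)
```

For a 4×4 grid this gives 8/3 hops for the mesh and 32/15 hops for the torus, which shows how wraparound links shorten the average path under these assumptions.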


Desarrollo e implementación de la interface SBA para un núcleo PWM de 16 canales independientes programables

Revista ECIPeru ◽

10.33017/reveciperu2010.0017/ ◽

2019 ◽

pp. 28-32

Keyword(s):

Intellectual Property ◽

State Machine ◽

System On Chip ◽

Circuit Board ◽

Ip Core ◽

Complex State ◽

Central Processing ◽

Ip Cores ◽

On Chip ◽

Bus Architecture

Development and implementation of the SBA interface for a PWM IP core with 16 independent programmable channels. Renzo Bermúdez and Miguel Risco, Centro de Investigación y Desarrollo en Ingeniería (CIDI), Facultad de Ingeniería Electrónica y Mecatrónica, Universidad Tecnológica del Perú. DOI: https://doi.org/10.33017/RevECIPeru2010.0017/

IP cores (intellectual property cores) are to hardware design what libraries are to computer programming. They are typically used in the manner of a discrete integrated circuit, where the "circuit board" is a larger design in an ASIC or FPGA. An IP core often takes the form of a program written in an HDL such as Verilog, VHDL, or SystemC. Ideally, an IP core should be fully portable, meaning that it can be easily adapted to any technology from other vendors or to different design methods. Universal asynchronous receiver/transmitters (UARTs), central processing units (CPUs), Ethernet controllers, and PCI interfaces are examples of IP cores. This paper presents the adaptation of a 16-channel PWM IP core to a structure of independent blocks similar to an SoC (system on chip). No microprocessor was implemented as the system master; instead, a complex state machine manages a bus in order to save FPGA resources. This complex state machine, which acts as the system controller, sits within an arrangement called SBA (Simple Bus Architecture), which is simply a simplification of the signals and rules established by the Wishbone specification. The resulting system allows 16 independent PWM digital outputs to be configured in low-ripple mode. Although the example presented in this work shows a single instantiated PWM IP core, this is not a limit. The implemented PWM core does not use specific or special FPGA resources, so the number of instantiated blocks can grow as far as the available generic configurable blocks in the FPGA allow. That is, each instantiated core provides 16 independent PWM channels, each with a specific programming position within the SBA address map. Keywords: FPGA, PWM, system on chip.
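
The idea that each instantiated core contributes 16 channels at fixed positions in the address map can be sketched as a simple address calculation. The abstract does not give the actual SBA address layout, so the base address `PWM_BASE`, the contiguous one-address-per-channel layout, and the function names below are all hypothetical. Only the figure of 16 channels per core comes from the text.

```python
CHANNELS_PER_CORE = 16   # from the abstract: 16 PWM channels per core
PWM_BASE = 0x100         # hypothetical base address of the first PWM core


def channel_address(core, channel, base=PWM_BASE):
    """Address of one PWM channel register, assuming cores are laid out
    back to back and each channel occupies one address (an assumption)."""
    if core < 0:
        raise ValueError("core index must be non-negative")
    if not 0 <= channel < CHANNELS_PER_CORE:
        raise ValueError("channel must be in 0..15")
    return base + core * CHANNELS_PER_CORE + channel


def decode(address, base=PWM_BASE):
    """Inverse mapping: return (core, channel) for a bus address."""
    offset = address - base
    if offset < 0:
        raise ValueError("address is below the PWM region")
    return divmod(offset, CHANNELS_PER_CORE)


addr = channel_address(core=2, channel=5)
location = decode(addr)
```

Under this layout, adding another core extends the map by 16 addresses without touching the positions of existing channels.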


The Efficient On-Chip Bus Architecture for High-Performance SoC Design

International Journal of Control and Automation ◽

10.14257/ijca.2017.10.1.02 ◽

2017 ◽

Vol 10 (1) ◽

pp. 13-22

Author(s):

Fred Adu Kumi ◽

Seungyong Park ◽

Kwangki Ryoo

Keyword(s):

High Performance ◽

On Chip ◽

Bus Architecture


Networks on Chips: Structure and Design Methodologies

Journal of Electrical and Computer Engineering ◽

10.1155/2012/509465 ◽

2012 ◽

Vol 2012 ◽

pp. 1-15 ◽

Cited By ~ 20

Author(s):

Wen-Chung Tsai ◽

Ying-Cherng Lan ◽

Yu-Hen Hu ◽

Sao-Jie Chen

Keyword(s):

High Performance ◽

Chip Multiprocessors ◽

Multiprocessor System ◽

Communication Performance ◽

Core System ◽

Traditional System ◽

On Chip ◽

Many Core ◽

Bus Architecture

The next generation of multiprocessor system on chip (MPSoC) and chip multiprocessors (CMPs) will contain hundreds or thousands of cores. Such a many-core system requires high-performance interconnections to transfer data among the cores on the chip. Traditional system components interface with the interconnection backbone via a bus interface. This interconnection backbone can be an on-chip bus or multilayer bus architecture. With the advent of many-core architectures, the bus architecture becomes the performance bottleneck of the on-chip interconnection framework. In contrast, network on chip (NoC) becomes a promising on-chip communication infrastructure, which is commonly considered as an aggressive long-term approach for on-chip communications. Accordingly, this paper first discusses several common architectures and prevalent techniques that can deal well with the design issues of communication performance, power consumption, signal integrity, and system scalability in an NoC. Finally, a novel bidirectional NoC (BiNoC) architecture with a dynamically self-reconfigurable bidirectional channel is proposed to break the conventional performance bottleneck caused by bandwidth restriction in conventional NoCs.


A high performance on-chip segmented bus architecture using dynamic bridge-by-pass technique

2010 5th International Conference on Industrial and Information Systems ◽

10.1109/iciinfs.2010.5578700 ◽

2010 ◽

Cited By ~ 1

Author(s):

S. Hema Chitra ◽

A. Kandaswamy

Keyword(s):

High Performance ◽

On Chip ◽

Bus Architecture


Implementation of AMBA Based AHB2APB Bridge

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6908.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1033-1037

Keyword(s):

Embedded System ◽

High Speed ◽

High Performance ◽

Data Loss ◽

Verilog Hdl ◽

Main Target ◽

Functional Blocks ◽

Timing Simulation ◽

On Chip ◽

Bus Architecture

The Advanced Microcontroller Bus Architecture (AMBA) bus protocol is used to build high-performance system-on-chip (SoC) designs. It enables communication by connecting different functional blocks (IP cores). By using multiple controllers and peripherals, it makes it possible to develop multiprocessor units. It supports reuse of IP across the different AMBA buses, which can reduce the communication gap between high-performance buses and low-speed buses. Performing high-speed pipelined data transfers with the different bus signals supported by AMBA makes an AMBA-based embedded system demanding to analyze. The main target of this work is to synthesize and simulate the bridge that connects the Advanced High-performance Bus (AHB) to the Advanced Peripheral Bus (APB), known as the AHB2APB bridge, with no data loss during transfer. The bridge module is implemented in Verilog HDL, and its functional and timing simulations are run on a Xilinx platform.
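
The APB side of such a bridge is commonly described as a three-state machine (IDLE, SETUP, ACCESS), as in the AMBA APB protocol. The following behavioral sketch in Python illustrates those state transitions. It is not the paper's Verilog implementation, and it omits the AHB side, address decoding, and data paths. The names `next_state`, `outputs`, `run`, and `transfer_pending` are hypothetical.

```python
IDLE, SETUP, ACCESS = "IDLE", "SETUP", "ACCESS"


def next_state(state, transfer_pending, pready):
    """One clock step of the APB operating-state machine."""
    if state == IDLE:
        return SETUP if transfer_pending else IDLE
    if state == SETUP:
        return ACCESS                 # SETUP always lasts exactly one cycle
    # ACCESS: wait here until the peripheral signals PREADY
    if not pready:
        return ACCESS
    return SETUP if transfer_pending else IDLE


def outputs(state):
    """Control signals driven in each state."""
    return {"PSEL": state != IDLE, "PENABLE": state == ACCESS}


def run(pending_seq, pready_seq, state=IDLE):
    """Apply per-cycle inputs and return the sequence of states visited."""
    trace = []
    for pending, pready in zip(pending_seq, pready_seq):
        state = next_state(state, pending, pready)
        trace.append(state)
    return trace


# One transfer whose peripheral inserts a single wait state.
trace = run([True, False, False, False], [True, True, False, True])
```

Holding the ACCESS state until PREADY is high is what lets a slow peripheral stretch a transfer without losing data, which is the property the abstract highlights.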
