IMPROVING DLX PROCESSOR PERFORMANCE WITH THE PIPELINE METHOD

The DLX processor is a RISC (Reduced Instruction Set Computer) processor designed as a general purpose processor. It has a load-store architecture, and all of its instructions are 32 bits long. Each instruction is executed over several clock cycles. In general, execution is divided into five stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). These five stages are carried out sequentially [2]. As a multicycle processor, DLX has room to improve its performance, which is measured as processing speed and expressed as CPU time. The pipeline technique can be used to improve the performance of the DLX processor, and this paper analyzes the resulting improvement. Several application programs were executed both with and without pipelining. In general, execution speed improved for every instruction sequence analyzed. Testing was carried out with windlx, a simulator for the DLX processor.

Keywords: DLX processor, RISC, general purpose processor, CPU, Pipeline, windlx
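The expected gain from pipelining can be estimated with ideal cycle counts. The sketch below rests on several assumptions:

- every stage takes exactly one clock cycle;
- the multicycle and pipelined versions share the same clock period;
- there are no hazards or stalls.

Real DLX programs run on windlx will show smaller gains because of data, control and structural hazards. The function names are hypothetical.

```python
# Ideal cycle counts for a five-stage DLX-style pipeline.
# Assumption (hypothetical model): one cycle per stage, no hazards or stalls.
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def unpipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Multicycle execution: each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Pipelined execution: fill the pipeline once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def speedup(n_instructions: int, n_stages: int = len(STAGES)) -> float:
    """Ratio of unpipelined to pipelined cycle counts at the same clock period."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(n_instructions, n_stages)


def cpu_time(cycles: int, clock_period_ns: float) -> float:
    """CPU time = number of clock cycles x clock period."""
    return cycles * clock_period_ns


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, unpipelined_cycles(n), pipelined_cycles(n), round(speedup(n), 2))
```

For 100 instructions this model gives 500 versus 104 cycles, a speedup of about 4.8. As programs grow longer, the speedup approaches the stage count of 5.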

FuMicro: A Fused Microarchitecture Design Integrating In-Order Superscalar and VLIW

VLSI Design

10.1155/2016/8787919

2016

Vol 2016

pp. 1-12

Cited by ~3

Author(s):

Yumin Hou

Hu He

Xu Yang

Deyuan Guo

Xu Wang

...

Keyword(s):

Digital Signal

General Purpose

Instruction Level Parallelism

Instruction Set

Mode Switch

Development Environment

General Purpose Processor

Improve Instruction

Library Function

Level Parallelism

This paper proposes FuMicro, a fused microarchitecture that integrates both in-order superscalar and Very Long Instruction Word (VLIW) execution in a single core. A processor with the FuMicro microarchitecture can switch between in-order superscalar mode and VLIW mode while using the same pipeline and the same Instruction Set Architecture (ISA). A small modification to the compiler expands the register file in VLIW mode. The decision to switch modes is made by software, so no extra hardware is needed. VLIW code can be packaged as library functions while users work only in superscalar mode, which gives users a convenient development environment. FuMicro could serve as a universal microarchitecture because it can be applied to different ISAs. In this paper, we focus on implementing FuMicro with the ARM ISA. The architecture is evaluated on gem5, a cycle-accurate microarchitecture simulation platform. Compared with pure in-order superscalar mode, FuMicro improves performance by 10% on average, with a best-case improvement of 47.3%. The results show that FuMicro can improve Instruction Level Parallelism (ILP) significantly. This makes it a promising way to extend the digital signal processing capability of a General Purpose Processor.

INSTRUCTION-SET EXTENSION FOR CRYPTOGRAPHIC APPLICATIONS ON RECONFIGURABLE PLATFORM

Journal of Circuits, Systems and Computers

10.1142/s0218126607004076

2007

Vol 16 (06)

pp. 911-927

Author(s):

S. MAJZOUB

H. DIAB

Keyword(s):

Reconfigurable Computing

General Purpose

Coarse Grain

Instruction Set

General Purpose Processor

Instruction Set Extension

Custom Hardware

Reconfigurable Platform

And Performance

Bitwise Operations

Reconfigurable systems represent a middle ground between speed and flexibility in processor design. They provide performance close to that of custom hardware while preserving some of the flexibility of a general-purpose processor. Reconfigurable computing has recently received considerable interest in both of its forms: FPGAs and coarse-grain hardware. Because the field is still developing, key applications should be analyzed and evaluated on target reconfigurable architectures to identify potential limitations and improvements. This paper presents the mapping and performance analysis of two encryption algorithms, Rijndael and Twofish, on MorphoSys. MorphoSys is a coarse-grain reconfigurable platform targeted at multimedia applications. Since many cryptographic algorithms involve bitwise operations, a bitwise instruction set extension is proposed to enhance performance. We describe in detail how the bitwise operations in both algorithms are mapped, with a thorough analysis. The methodology can also be applied to other systems.
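Rijndael shows the kind of bitwise work involved. Its MixColumns step multiplies bytes in GF(2^8), which reduces to shifts, ANDs and XORs. The Python sketch below performs that operation in software; the `xtime` name follows the Rijndael/AES specification. It illustrates the kind of operation a bitwise instruction set extension targets and is not the paper's MorphoSys mapping.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (i.e. {02}) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1                  # shift left by one bit
    if b & 0x100:            # result overflowed 8 bits: reduce by the AES polynomial 0x11B
        b ^= 0x11B
    return b & 0xFF


def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) multiplication built only from shifts, ANDs and XORs."""
    result = 0
    while b:
        if b & 1:            # this bit of b is set: add (XOR) the current multiple of a
            result ^= a
        a = xtime(a)         # next power-of-two multiple of a
        b >>= 1
    return result


if __name__ == "__main__":
    print(hex(gf_mul(0x57, 0x83)))
```

The values {57}·{83} = {c1} and {57}·{13} = {fe} are the worked examples given in the AES specification (FIPS 197).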

Improvement of Standard and Non-Standard Floating-Point Operators

ECTI Transactions on Computer and Information Technology (ECTI-CIT)

10.37936/ecti-cit.201261.54317

2012

Vol 6 (1)

pp. 19-32

Author(s):

Pongyupinpanich Surapong

Francois Philipp

Faizal Arya Samman

Manfred Glesner

Keyword(s):

Clock Cycle

General Purpose

Floating Point

Instruction Set

General Purpose Processor

IEEE Standard

Xilinx FPGA

Instruction Pipeline

Point Arithmetic

Standard Operations

This paper presents the design and analysis of a floating-point arithmetic accelerator compliant with the IEEE standard single-precision floating-point format. The accelerator can extend a general-purpose processor such as the Motorola MC6820, which has no built-in floating-point execution units. It implements standard and non-standard mathematical functions (addition/subtraction, multiplication, Product-of-Sum and Sum-of-Product) through a micro-instruction set supported by both single-processor and multiprocessor systems. The unit's architecture is based on an instruction pipeline that can fetch and execute an instruction simultaneously within one clock cycle. The non-standard operations, Product-of-Sum and Sum-of-Product, are introduced to operate on three input operands. The algorithmic complexity and hardware critical delay are determined for each operator. The accelerator was synthesized on a Xilinx FPGA Virtex 5 xc5vlx110t-3ff-1136 and in Faraday 130-nm silicon technology, where the design achieves 200 MHz and 1 GHz, respectively.
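For reference, the single-precision format that the accelerator complies with packs a value into three fields:

- a 1-bit sign;
- an 8-bit biased exponent (bias 127);
- a 23-bit fraction.

The Python sketch below unpacks those fields with the standard `struct` module and rebuilds a normal number from them. The helper names are hypothetical. Subnormals, infinities and NaNs are deliberately not handled.

```python
import struct


def float32_fields(value: float) -> tuple[int, int, int]:
    """Round value to IEEE 754 single precision and split it into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))  # raw 32-bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF        # 23-bit fraction field (hidden leading 1 not stored)
    return sign, exponent, fraction


def from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normal number: (-1)^s * 1.f * 2^(e - 127)."""
    if not 0 < exponent < 255:
        raise ValueError("only normal numbers are handled in this sketch")
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


if __name__ == "__main__":
    print(float32_fields(-2.5))
```

For example, -2.5 = -1.25 × 2^1, so its fields are sign 1, biased exponent 128 and fraction 0x200000.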

Design and Implementation of Low Energy Wireless Network Nodes based on Hardware Compression Acceleration

Recent Patents on Computer Science

10.2174/2213275912666190715164024

2019

Vol 12

Author(s):

Hui Yang

Anand Nayyar

Keyword(s):

Energy Consumption

Data Compression

Energy Saving

Optimization Design

Hardware Acceleration

Transmission Efficiency

General Purpose

Storage Space

General Purpose Processor

Compression Time

With the rapid development of information technology, data volumes are growing geometrically, which places ever higher demands on transmission speed and storage space. To reduce storage use and further improve data transmission efficiency, data must be compressed. Compression must preserve the data exactly, which is why lossless compression algorithms are used. Gradually optimizing the algorithm design can often reduce the energy cost of data compression. Energy savings can also be obtained by improving the hardware structure of the node. In this paper, a new sensor-node structure is designed that adopts hardware acceleration and separates the data compression module from the node microprocessor. Building on an ASIC design of the algorithm, hardware acceleration successfully reduced the energy consumed by compression. Relative to the general-purpose processor, energy consumption and compression time were reduced by as much as 98.4% and 95.8%, respectively.
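The lossless requirement means decompression must return the input bit for bit. The sketch below checks this with Python's `zlib` (DEFLATE). `zlib` is only a stand-in, because the abstract does not name the compression algorithm. The sketch also shows how a percentage saving such as the reported 98.4% is computed: a 98.4% energy saving means the accelerated node spends 1.6% of the baseline energy. The sample data and helper names are hypothetical.

```python
import zlib


def compress_lossless(data: bytes, level: int = 9) -> bytes:
    """Compress with DEFLATE (a stand-in algorithm) and verify the round trip is bit-exact."""
    compressed = zlib.compress(data, level)
    if zlib.decompress(compressed) != data:
        raise ValueError("round trip is not lossless")
    return compressed


def percent_saved(baseline: float, accelerated: float) -> float:
    """Percentage of the baseline cost (energy or time) removed by acceleration."""
    return 100.0 * (baseline - accelerated) / baseline


if __name__ == "__main__":
    readings = b"23.5,23.6,23.5,23.4," * 50   # hypothetical, highly repetitive sensor samples
    packed = compress_lossless(readings)
    print(len(readings), "->", len(packed), "bytes")
    print(round(percent_saved(100.0, 1.6), 1))  # 98.4
```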

A 45-nm 37.3 GOPS/W Heterogeneous Multi-Core SOC with 16/32 Bit Instruction-Set General-Purpose Core

IEICE Transactions on Electronics

10.1587/transele.e94.c.663

2011

Vol E94-C (4)

pp. 663-669

Author(s):

Osamu NISHII

Yoichi YUYAMA

Masayuki ITO

Yoshikazu KIYOSHIGE

Yusuke NITTA

...

Keyword(s):

General Purpose

Instruction Set

SoC-FPGA systems for the acquisition and processing of electroencephalographic signals

International Journal of Reconfigurable and Embedded Systems (IJRES)

10.11591/ijres.v10.i3.pp237-248

2021

Vol 10 (3)

pp. 237-248

Author(s):

Matias Javier Oliva

Pablo Andrés García

Enrique Mario Spinelli

Alejandro Luis Veiga

Keyword(s):

Embedded System

Real Time

General Purpose

System Response

Single Chip

Real Time Processing

General Purpose Processor

Time Operation

Electroencephalographic Signals

High Level

Real-time acquisition and processing of electroencephalographic signals have promising applications in brain-computer interfaces. These devices let the user control a device without performing motor actions. They usually consist of a biopotential acquisition stage and a personal computer (PC). This structure is flexible and well suited to research, but for end users it must be migrated to an embedded system that eliminates the PC. The strict real-time processing requirements of such systems justify implementing them on a system-on-chip field-programmable gate array (SoC-FPGA). This article proposes a platform for acquiring and processing electroencephalographic signals with this type of device, which combines the parallelism and speed of an FPGA with the simplicity of a general-purpose processor on a single chip. In this scheme, the FPGA handles the real-time operation, acquiring and processing the signals, while the processor handles the high-level tasks. The processing elements are interconnected by buses integrated into the chip. The proposed scheme was used to implement a brain-computer interface based on steady-state visual evoked potentials (SSVEP), which was used to command a speller. Initial tests show that a selection time of 5 seconds per command can be achieved. The delay between the user's selection and the system response has been estimated at 343 µs.
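One common way to decode SSVEP commands is to measure signal power at each candidate stimulation frequency and pick the strongest. The sketch below does this in plain Python with the Goertzel algorithm, which computes a single DFT bin. The paper does not state its detection method, so this technique is an assumption. The 256 Hz sampling rate and the 8/10/12/15 Hz stimulus frequencies are also hypothetical values chosen for illustration.

```python
import math


def goertzel_power(samples, fs, freq):
    """Power of the DFT bin nearest to `freq` Hz, computed with the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq / fs)                  # index of the nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2      # second-order IIR recurrence
        s_prev2, s_prev = s_prev, s
    # |X[k]|^2 from the last two states of the recurrence
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_ssvep(samples, fs, candidates):
    """Return the candidate stimulation frequency with the highest spectral power."""
    return max(candidates, key=lambda f: goertzel_power(samples, fs, f))


if __name__ == "__main__":
    FS = 256  # hypothetical sampling rate in Hz
    # Synthetic 2 s "EEG": a 12 Hz SSVEP response plus a weaker 8 Hz component.
    eeg = [math.sin(2 * math.pi * 12 * i / FS) + 0.3 * math.sin(2 * math.pi * 8 * i / FS)
           for i in range(2 * FS)]
    print(detect_ssvep(eeg, FS, (8.0, 10.0, 12.0, 15.0)))
```

Goertzel needs only one multiply-accumulate loop per candidate frequency, so it suits a fixed set of stimulus frequencies better than a full FFT.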

An Approach to the Construction of a Network Processing Unit

Modeling and Analysis of Information Systems

10.18255/1818-1015-2019-1-39-62

2019

Vol 26 (1)

pp. 39-62

Author(s):

Stanislav O. Bezzubtsev

Vyacheslav V. Vasin

Dmitry Yu. Volkanov

Shynar R. Zhailauova

Vladislav A. Miroshnik

...

Keyword(s):

Simulation Model

General Purpose

Network Processor

Processing Unit

Use Case

General Purpose Processor

Software Products

Processor Architectures

Advantages And Disadvantages

Processor Unit

The paper proposes an architecture and basic requirements for a network processor for OpenFlow switches in software-defined networks. It analyzes the architectures of two well-known network processors: NP-5 from EZchip (now Mellanox) and Tofino from Barefoot Networks. It weighs the advantages and disadvantages of two variants of pipeline-based network processor architecture. In the first, the stages are general-purpose processor cores; in the second, the stages are cores specialized for specific packet processing operations. Based on a dedicated set of the most common use-case scenarios, a new network processor unit (NPU) architecture with functionally specialized pipeline stages is proposed. The article describes a simulation model of the proposed NPU, implemented in C++ with SystemC, an open-source C++ library. For functional testing of the NPU model, the described use-case scenarios were implemented in C. To evaluate the performance of the proposed NPU architecture, the authors used a set of software products developed by KM211 together with the KMX32 family of microcontrollers. NPU performance was evaluated with the simulation model, which yields estimates of the processing time of one packet and the average throughput of the NPU model for each scenario.
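Per-packet processing time and throughput estimates for a pipelined NPU follow standard pipeline arithmetic. A packet's latency is the sum of its stage times, while steady-state throughput is set by the slowest stage. The sketch below makes three assumptions:

- each stage works on one packet at a time;
- packets are buffered between stages;
- the clock rate is fixed.

The stage names and cycle counts are hypothetical and are not taken from the paper's NPU model.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    cycles: int  # cycles this stage spends on one packet (hypothetical values)


def packet_latency(stages: list[Stage]) -> int:
    """Cycles for a single packet to traverse the whole pipeline."""
    return sum(s.cycles for s in stages)


def throughput_pps(stages: list[Stage], clock_hz: float) -> float:
    """Steady-state packets per second, limited by the slowest (bottleneck) stage."""
    return clock_hz / max(s.cycles for s in stages)


def total_cycles(stages: list[Stage], n_packets: int) -> int:
    """Cycles to process n identical packets: first packet's latency, then one per bottleneck interval."""
    if n_packets == 0:
        return 0
    return packet_latency(stages) + (n_packets - 1) * max(s.cycles for s in stages)


if __name__ == "__main__":
    pipeline = [Stage("parse", 4), Stage("lookup", 10), Stage("action", 6), Stage("deparse", 3)]
    print(packet_latency(pipeline), throughput_pps(pipeline, 500e6), total_cycles(pipeline, 100))
```

Specializing stages shortens individual stage times, but this model shows that throughput improves only when the bottleneck stage gets faster.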