INSTRUCTION-SET EXTENSION FOR CRYPTOGRAPHIC APPLICATIONS ON RECONFIGURABLE PLATFORM

Reconfigurable systems represent a middle ground between speed and flexibility in processor design. They provide performance close to that of custom hardware while preserving some of the flexibility of a general-purpose processor. Recently, reconfigurable computing has received considerable interest in both its forms: FPGAs and coarse-grain hardware. Since the field is still developing, it is important to perform hardware analysis and evaluation of key applications on target reconfigurable architectures to identify potential limitations and improvements. This paper presents the mapping and performance analysis of two encryption algorithms, Rijndael and Twofish, on MorphoSys, a coarse-grain reconfigurable architecture targeted at multimedia applications. Since many cryptographic algorithms involve bitwise operations, a bitwise instruction-set extension is proposed to enhance performance. We present the details of mapping the bitwise operations involved in the algorithms, with thorough analysis. The methodology we used can be applied to other systems.
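To illustrate the kind of bitwise operations the abstract refers to, the following is a minimal software sketch, not the MorphoSys mapping described in the paper. It shows the GF(2^8) multiplication used in Rijndael's MixColumns step (built from shifts and XORs, with the standard reduction polynomial 0x11B) and a 32-bit left rotation of the kind Twofish relies on. The function names `xtime`, `gf_mul`, and `rotl32` are illustrative choices, not names taken from the paper.

```python
def xtime(a: int) -> int:
    """Multiply a byte by x (0x02) in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    if a & 0x100:          # overflow past 8 bits: reduce modulo the polynomial
        a ^= 0x11B
    return a & 0xFF


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using shift-and-XOR (no lookup tables)."""
    result = 0
    while b:
        if b & 1:          # add (XOR) the current multiple of a
            result ^= a
        a = xtime(a)       # next multiple: a * x
        b >>= 1
    return result


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bit positions."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


if __name__ == "__main__":
    # Worked example from the AES specification: {57} * {13} = {fe}
    print(hex(gf_mul(0x57, 0x13)))
    print(hex(rotl32(0x12345678, 8)))
```

On a general-purpose processor each of these helpers expands into several shift, mask, and XOR instructions, which is the sort of overhead a bitwise instruction-set extension aims to reduce.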


Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000 ◽

10.1109/micro.2000.898075 ◽

2002 ◽

Cited By ~ 25

Author(s):

R. Balasubramonian ◽

D. Albonesi ◽

A. Buyuktosunoglu ◽

S. Dwarkadas

Keyword(s):

Memory Hierarchy ◽

General Purpose ◽

General Purpose Processor ◽

Processor Architectures ◽

And Performance


FuMicro: A Fused Microarchitecture Design Integrating In-Order Superscalar and VLIW

VLSI Design ◽

10.1155/2016/8787919 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Yumin Hou ◽

Hu He ◽

Xu Yang ◽

Deyuan Guo ◽

Xu Wang ◽

...

Keyword(s):

Digital Signal ◽

General Purpose ◽

Instruction Level Parallelism ◽

Instruction Set ◽

Mode Switch ◽

Development Environment ◽

General Purpose Processor ◽

Improve Instruction ◽

Library Function ◽

Level Parallelism

This paper proposes FuMicro, a fused microarchitecture integrating both in-order superscalar and Very Long Instruction Word (VLIW) execution in a single core. A processor with the FuMicro microarchitecture can alternate between in-order superscalar and VLIW modes, using the same pipeline and the same Instruction Set Architecture (ISA). A small modification is made to the compiler to expand the register file in VLIW mode. The mode-switch decision is made by software and needs no extra hardware. VLIW code can be provided in the form of library functions, so that users are exposed only to superscalar mode; in this way, users get a convenient development environment. FuMicro could serve as a universal microarchitecture because it can be applied to different ISAs. In this paper, we focus on the implementation of FuMicro with the ARM ISA. The architecture is evaluated on gem5, a cycle-accurate microarchitecture simulation platform. With the FuMicro microarchitecture, performance improves by 10% on average, with a best-case improvement of 47.3%, compared with pure in-order superscalar mode. The results show that the FuMicro microarchitecture can improve Instruction Level Parallelism (ILP) significantly, making it a promising way to expand digital signal processing capability on a General Purpose Processor.


An Impulse-C Hardware Accelerator for Packet Classification Based on Fine/Coarse Grain Optimization

International Journal of Reconfigurable Computing ◽

10.1155/2013/130765 ◽

2013 ◽

Vol 2013 ◽

pp. 1-23 ◽

Cited By ~ 1

Author(s):

O. Ahmed ◽

S. Areibi ◽

R. Collier ◽

G. Grewal

Keyword(s):

Poor Performance ◽

Electronic System ◽

General Purpose ◽

Packet Classification ◽

Optimization Techniques ◽

System Level ◽

Coarse Grain ◽

Hardware Accelerator ◽

General Purpose Processor ◽

Incremental Update

Current software-based packet classification algorithms exhibit relatively poor performance, prompting many researchers to concentrate on novel frameworks and architectures that employ both hardware and software components. The Packet Classification with Incremental Update (PCIU) algorithm, Ahmed et al. (2010), is a novel and efficient packet classification algorithm with a unique incremental update capability that demonstrated excellent results and was shown to be scalable for many different tasks and clients. While a pure software implementation can generate powerful results on a server machine, an embedded solution may be more desirable for some applications and clients. Embedded, specialized hardware accelerator based solutions are typically much more efficient in speed, cost, and size than solutions that are implemented on general-purpose processor systems. This paper seeks to explore the design space of translating the PCIU algorithm into hardware by utilizing several optimization techniques, ranging from fine grain to coarse grain and parallel coarse grain approaches. The paper presents a detailed implementation of a hardware accelerator of the PCIU based on an Electronic System Level (ESL) approach. Results obtained indicate that the hardware accelerator achieves on average 27x speedup over a state-of-the-art Xeon processor.


PENINGKATAN PERFORMANSI PROSESOR DLX DENGAN METODE PIPELINE

TEKTRIKA - Jurnal Penelitian dan Pengembangan Telekomunikasi Kendali Komputer Elektrik dan Elektronika ◽

10.25124/tektrika.v8i2.230 ◽

2016 ◽

Vol 8 (2) ◽

Author(s):

Maman Abdurohman

Keyword(s):

General Purpose ◽

Memory Access ◽

Instruction Set ◽

Instruction Fetch ◽

General Purpose Processor ◽

Cpu Time ◽

Time Cycle ◽

Reduced Instruction Set Computer

The DLX processor is a RISC (Reduced Instruction Set Computer) processor designed as a general-purpose processor. It has a load-store architecture in which every instruction is 32 bits long. Each instruction is executed over several clock cycles. In general, five stages are used: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). These five stages are carried out sequentially [2]. As a multicycle processor, DLX has room to improve its performance, measured as processing speed expressed in CPU time. The performance of the DLX processor can be improved by applying the pipeline technique. This paper analyzes the performance improvement of the DLX processor obtained with pipelining. Tests were run on several application programs executed with and without the pipeline technique. In general, speed increased for every set of instructions analyzed. Testing was performed using windlx, a DLX processor simulator. Keywords: DLX processor, RISC, general purpose processor, CPU, pipeline, windlx
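The comparison between multicycle and pipelined execution can be sketched with idealized cycle counts. This is a textbook model, not the measurement procedure of the paper: it assumes every instruction passes through all five stages, each stage takes one clock cycle, and the clock period is the same in both designs. The function names and the `stall_cycles` parameter are hypothetical and introduced only for illustration.

```python
def multicycle_cycles(n_instructions: int, stages: int = 5) -> int:
    """Cycles when each instruction finishes all stages before the next one starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int = 5, stall_cycles: int = 0) -> int:
    """Cycles for an ideal pipeline: fill time, then one instruction per cycle.

    stall_cycles is an assumed lump sum for hazards (0 means a perfect pipeline).
    """
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1) + stall_cycles


def speedup(n_instructions: int, stages: int = 5, stall_cycles: int = 0) -> float:
    """Ratio of multicycle to pipelined cycle counts (n_instructions must be positive)."""
    return multicycle_cycles(n_instructions, stages) / pipelined_cycles(
        n_instructions, stages, stall_cycles
    )


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, multicycle_cycles(n), pipelined_cycles(n), round(speedup(n), 2))
```

Under these assumptions the speedup approaches the number of stages as the program gets longer; stalls from data and control hazards pull it below that bound in practice.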


Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture - MICRO 33 ◽

10.1145/360128.360153 ◽

2000 ◽

Cited By ~ 110

Author(s):

Rajeev Balasubramonian ◽

David Albonesi ◽

Alper Buyuktosunoglu ◽

Sandhya Dwarkadas

Keyword(s):

Memory Hierarchy ◽

General Purpose ◽

General Purpose Processor ◽

Processor Architectures ◽

And Performance


Improvement of Standard and Non-Standard Floating-Point Operators

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.201261.54317 ◽

1970 ◽

Vol 6 (1) ◽

pp. 19-32

Author(s):

Pongyupinpanich Surapong ◽

Francois Philipp ◽

Faizal Arya Samman ◽

Manfred Glesner

Keyword(s):

Clock Cycle ◽

General Purpose ◽

Floating Point ◽

Instruction Set ◽

General Purpose Processor ◽

Ieee Standard ◽

Xilinx Fpga ◽

Instruction Pipeline ◽

Point Arithmetic ◽

Standard Operations

This paper presents the design and analysis of a floating-point arithmetic accelerator compliant with the IEEE standard single-precision floating-point format. The accelerator can be used to extend a general-purpose processor such as the Motorola MC6820, in which floating-point execution units are not embedded by default. It implements standard and non-standard mathematical functions (addition/subtraction, multiplication, Product-of-Sum, and Sum-of-Product) through a micro-instruction set supported by both single- and multi-processor systems. The architecture of the unit is based on an instruction pipeline that can simultaneously fetch and execute an instruction within one clock cycle. The non-standard operations, Product-of-Sum and Sum-of-Product, are introduced to compute on three input operands. The algorithm complexity and hardware critical delay are determined for each operator. Synthesis results for the accelerator on a Xilinx Virtex-5 FPGA (xc5vlx110t-3ff-1136) and in Faraday 130-nm silicon technology show that the design achieves 200 MHz and 1 GHz, respectively.
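For readers unfamiliar with the IEEE single-precision format the accelerator targets, the sketch below splits a value into its three fields: 1 sign bit, 8 exponent bits (biased by 127), and 23 fraction bits. It uses only Python's standard `struct` module; the helper name `decode_float32` is an illustrative choice, and the code says nothing about how the paper's hardware is organized.

```python
import struct


def decode_float32(value: float) -> tuple:
    """Return (sign, biased_exponent, fraction) of value encoded as IEEE 754 binary32."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits, implicit leading 1 for normal numbers
    return sign, exponent, fraction


if __name__ == "__main__":
    for v in (1.0, -0.15625, 0.0):
        sign, exponent, fraction = decode_float32(v)
        print(v, sign, exponent, hex(fraction))
```

A hardware adder or multiplier operates on exactly these fields: it aligns or adds exponents, processes the fractions with the implicit leading bit restored, then normalizes and rounds the result.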


Instruction-Set Extension for Cryptographic Applications on Reconfigurable Platform

2006 6th International Workshop on System on Chip for Real Time Applications ◽

10.1109/iwsoc.2006.348231 ◽

2006 ◽

Cited By ~ 1

Author(s):

Sohaib Majzoub ◽

Hassan Diab

Keyword(s):

Instruction Set ◽

Instruction Set Extension ◽

Reconfigurable Platform


Methods of test and performance requirements for general-purpose flat pallets for through transit of goods

10.3403/00225803 ◽

1990 ◽

Keyword(s):

General Purpose ◽

Performance Requirements ◽

And Performance


Design and Implementation of Low Energy Wireless Network Nodes based on Hardware Compression Acceleration

Recent Patents on Computer Science ◽

10.2174/2213275912666190715164024 ◽

2019 ◽

Vol 12 ◽

Author(s):

Hui Yang ◽

Anand Nayyar

Keyword(s):

Energy Consumption ◽

Data Compression ◽

Energy Saving ◽

Optimization Design ◽

Hardware Acceleration ◽

Transmission Efficiency ◽

General Purpose ◽

Storage Space ◽

General Purpose Processor ◽

Compression Time

With the rapid development of information technology, the volume of data is growing geometrically, placing higher demands on transmission speed and storage space. To reduce storage use and further improve transmission efficiency, data need to be compressed. In data compression it is very important to preserve the data without loss, which has given rise to lossless compression algorithms. Gradual optimization of the algorithm design can often achieve energy-saving data compression. Similarly, energy savings can also be obtained by improving the hardware structure of the node. In this paper, a new structure is designed for the sensor node: it adopts hardware acceleration, and the data compression module is separated from the node microprocessor. On the basis of the ASIC design of the algorithm, introducing hardware acceleration successfully reduced the energy consumption of data compression, saving as much as 98.4% of the energy consumption and 95.8% of the compression time relative to the general-purpose processor. The design greatly reduces both compression time and energy consumption.


The Use of Microcomputer Simulations in Undergraduate Neurophysiology Experiments

Alternatives to Laboratory Animals ◽

10.1177/026119298701400303 ◽

1987 ◽

Vol 14 (3) ◽

pp. 134-140 ◽

Cited By ~ 2

Author(s):

K.A. Clarke

Keyword(s):

Low Cost ◽

Animal Experiments ◽

Theoretical Background ◽

General Purpose ◽

Microcomputer System ◽

Laboratory Equipment ◽

Teaching Objectives ◽

Compound Action Potentials ◽

Experimental Management ◽

And Performance

Practical classes in neurophysiology reinforce and complement the theoretical background in a number of ways, including demonstration of concepts, practice in planning and performing experiments, and the production and maintenance of viable neural preparations. The balance of teaching objectives will depend upon the particular group of students involved. A technique is described which allows real compound action potentials from one of the most basic introductory neurophysiology experiments, the frog sciatic nerve, to be embedded into interactive programs for student use. These retain all the elements of the “real experiment” in terms of appearance, presentation, experimental management and measurement by the student. Laboratory reports by the students show that the experiments are carefully and enthusiastically performed and the material is well absorbed. Three groups of students derive most benefit from their use. First, students whose future careers will not involve animal experiments do not spend time developing dissecting skills they will not use, and instead spend more time fulfilling the other teaching objectives. Second, relatively inexperienced students, struggling to produce viable neural material and master complicated laboratory equipment, are often left with little time or motivation to take accurate readings or ponder neurophysiological concepts. Third, students in institutions where neurophysiology is taught with difficulty because of the high cost of equipment and lack of specific expertise may well have access to a low-cost general-purpose microcomputer system.
