INSTRUCTION-SET EXTENSION FOR CRYPTOGRAPHIC APPLICATIONS ON RECONFIGURABLE PLATFORM

2007 ◽  
Vol 16 (06) ◽  
pp. 911-927
Author(s):  
S. MAJZOUB ◽  
H. DIAB

Reconfigurable Systems represent a middle trade-off between speed and flexibility in the processor design world. It provides performance close to the custom-hardware and yet preserves some of the general-purpose processor flexibility. Recently, the area of reconfigurable computing has received considerable interest in both its forms: the FPGA and coarse-grain hardware. Since the field is still in its developing stage, it is important to perform hardware analysis and evaluation of certain key applications on target reconfigurable architectures to identify potential limitations and improvements. This paper presents the mapping and performance analysis of two encryption algorithms, namely Rijndael and Twofish, on a coarse grain reconfigurable platform, namely MorphoSys. MorphoSys is a reconfigurable architecture targeted for multimedia applications. Since many cryptographic algorithms involve bitwise operations, bitwise instruction set extension was proposed to enhance the performance. We present the details of the mapping of the bitwise operations involved in the algorithms with thorough analysis. The methodology we used can be utilized in other systems.

VLSI Design ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Yumin Hou ◽  
Hu He ◽  
Xu Yang ◽  
Deyuan Guo ◽  
Xu Wang ◽  
...  

This paper proposes FuMicro, a fused microarchitecture integrating both in-order superscalar and Very Long Instruction Word (VLIW) in a single core. A processor with FuMicro microarchitecture can work under alternative in-order superscalar and VLIW mode, using the same pipeline and the same Instruction Set Architecture (ISA). Small modification to the compiler is made to expand the register file in VLIW mode. The decision of mode switch is made by software, and this does not need extra hardware. VLIW code can be exploited in the form of library function and the users will be exposed under only superscalar mode; by this means, we can provide the users with a convenient development environment. FuMicro could serve as a universal microarchitecture for it can be applied to different ISAs. In this paper, we focus on the implementation of FuMicro with ARM ISA. This architecture is evaluated on gem5, which is a cycle accurate microarchitecture simulation platform. By adopting FuMicro microarchitecture, the performance can be improved on an average of 10%, with the best performance improvement being 47.3%, compared with that under pure in-order superscalar mode. The result shows that FuMicro microarchitecture can improve Instruction Level Parallelism (ILP) significantly, making it promising to expand digital signal processing capability on a General Purpose Processor.


2013 ◽  
Vol 2013 ◽  
pp. 1-23 ◽  
Author(s):  
O. Ahmed ◽  
S. Areibi ◽  
R. Collier ◽  
G. Grewal

Current software-based packet classification algorithms exhibit relatively poor performance, prompting many researchers to concentrate on novel frameworks and architectures that employ both hardware and software components. The Packet Classification with Incremental Update (PCIU) algorithm, Ahmed et al. (2010), is a novel and efficient packet classification algorithm with a unique incremental update capability that demonstrated excellent results and was shown to be scalable for many different tasks and clients. While a pure software implementation can generate powerful results on a server machine, an embedded solution may be more desirable for some applications and clients. Embedded, specialized hardware accelerator based solutions are typically much more efficient in speed, cost, and size than solutions that are implemented on general-purpose processor systems. This paper seeks to explore the design space of translating the PCIU algorithm into hardware by utilizing several optimization techniques, ranging from fine grain to coarse grain and parallel coarse grain approaches. The paper presents a detailed implementation of a hardware accelerator of the PCIU based on an Electronic System Level (ESL) approach. Results obtained indicate that the hardware accelerator achieves on average 27x speedup over a state-of-the-art Xeon processor.


Author(s):  
Maman Abdurohman

Prosesor DLX adalah sebuah prosesor berbasis RISC (Reduced Instruction Set Computer) yang dirancang sebagai prosesor tujuan umum (general purpose processor). Prosesor ini mempunyai arsitektur load-store dengan panjang semua instruksinya 32 bit. Setiap instruksi dieksekusi dalam beberapa siklus waktu (cycletime). Secara umum time cycle yang digunakan sebanyak lima tahap yang terdiri dari tahap-tahap : Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), dan Write Back (WB). Kelima tahap ini dikerjakan secara berurutan [2]. Sebagai prosesor multicycle, DLX mempunyai peluang untuk meningkatkan kinerjanya yang diukur dengan kecepatan proses yang dinyatkan sebagai waktu CPU (CPU time). Peningkatan kinerja prosesor DLX dapat diterapkan dengan menggunakan teknik pipeline. Pada jurnal ini telah dianalisis peningkatan performansi prosesor DLX dengan menggunakan teknik pipeline. Uji coba dilakukan terhadap beberapa program aplikasi yang dieksekusi dengan menggunakan teknik pipeline dan tanpa menggunakan teknik pipeline. Secara umum terjadi peningkatan kecepatan pada setiap kumpulan instruksi yang dianalisis. Proses pengujian dilakukan dengan menggunakan simulator windlx yang merupakan simulator prosesor DLX.Kata kunci : Prosesor DLX, RISC, general purpose processor, CPU, Pipeline, windlx


Author(s):  
Pongyupinpanich Surapong ◽  
Francois Philipp ◽  
Faizal Arya Samman ◽  
Manfred Glesner

This paper presents the design and analysis of a floating-point arithmetic accelerator in compliance with the IEEE standard single precision floatingpoint format. The accelerator can be used to extend a general-purpose processor such as Motorola MC6820, where floating-point execution units are unembedded by default. It implements standard and non-standard mathematic functions, addition/subtraction, multiplication, Product-of-Sum and Sumof- Product through a micro-instruction set supported by both single and multi-processors systems. The architecture of the unit is based on an instruction pipeline which can simultaneously fetch and execute an instruction within one clock cycle. The non-standard operations such as Product-of-Sum and Sum-of-Product are introduced to compute threeinput operands. The algorithm complexity and hardware critical delay are determined for each operator. The synthesis results of the accelerator on a Xilinx FPGA Virtex 5 xc5vlx110t-3ff-1136 and on Faraday 130-nm Silicon technology report that the design respectively achieves 200 MHz and 1 GHz.


Author(s):  
Hui Yang ◽  
Anand Nayyar

: In the fast development of information, the information data is increasing in geometric multiples, and the speed of information transmission and storage space are required to be higher. In order to reduce the use of storage space and further improve the transmission efficiency of data, data need to be compressed. processing. In the process of data compression, it is very important to ensure the lossless nature of data, and lossless data compression algorithms appear. The gradual optimization design of the algorithm can often achieve the energy-saving optimization of data compression. Similarly, The effect of energy saving can also be obtained by improving the hardware structure of node. In this paper, a new structure is designed for sensor node, which adopts hardware acceleration, and the data compression module is separated from the node microprocessor.On the basis of the ASIC design of the algorithm, by introducing hardware acceleration, the energy consumption of the compressed data was successfully reduced, and the proportion of energy consumption and compression time saved by the general-purpose processor was as high as 98.4 % and 95.8 %, respectively. It greatly reduces the compression time and energy consumption.


1987 ◽  
Vol 14 (3) ◽  
pp. 134-140 ◽  
Author(s):  
K.A. Clarke

Practical classes in neurophysiology reinforce and complement the theoretical background in a number of ways, including demonstration of concepts, practice in planning and performance of experiments, and the production and maintenance of viable neural preparations. The balance of teaching objectives will depend upon the particular group of students involved. A technique is described which allows the embedding of real compound action potentials from one of the most basic introductory neurophysiology experiments—frog sciatic nerve, into interactive programs for student use. These retain all the elements of the “real experiment” in terms of appearance, presentation, experimental management and measurement by the student. Laboratory reports by the students show that the experiments are carefully and enthusiastically performed and the material is well absorbed. Three groups of student derive most benefit from their use. First, students whose future careers will not involve animal experiments do not spend time developing dissecting skills they will not use, but more time fulfilling the other teaching objectives. Second, relatively inexperienced students, struggling to produce viable neural material and master complicated laboratory equipment, who are often left with little time or motivation to take accurate readings or ponder upon neurophysiological concepts. Third, students in institutions where neurophysiology is taught with difficulty because of the high cost of equipment and lack of specific expertise, may well have access to a low cost general purpose microcomputer system.


Sign in / Sign up

Export Citation Format

Share Document