Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture - MICRO 33 ◽

10.1145/360128.360153 ◽

2000 ◽

Cited By ~ 110

Author(s):

Rajeev Balasubramonian ◽

David Albonesi ◽

Alper Buyuktosunoglu ◽

Sandhya Dwarkadas

Keyword(s):

Memory Hierarchy ◽

General Purpose ◽

General Purpose Processor ◽

Processor Architectures ◽

And Performance

An Approach to the Construction of a Network Processing Unit

Modeling and Analysis of Information Systems ◽

10.18255/1818-1015-2019-1-39-62 ◽

2019 ◽

Vol 26 (1) ◽

pp. 39-62

Author(s):

Stanislav O. Bezzubtsev ◽

Vyacheslav V. Vasin ◽

Dmitry Yu. Volkanov ◽

Shynar R. Zhailauova ◽

Vladislav A. Miroshnik ◽

...

Keyword(s):

Simulation Model ◽

General Purpose ◽

Network Processor ◽

Processing Unit ◽

Use Case ◽

General Purpose Processor ◽

Software Products ◽

Processor Architectures ◽

Advantages And Disadvantages ◽

Processor Unit

The paper proposes the architecture and basic requirements for a network processor for OpenFlow switches in software-defined networks. It analyzes the architectures of two well-known network processors: NP-5 from EZchip (now Mellanox) and Tofino from Barefoot Networks. The advantages and disadvantages of two versions of network processor architecture are considered: a pipeline-based architecture whose stages are represented by a set of general-purpose processor cores, and a pipeline-based architecture whose stages correspond to cores specialized for specific packet processing operations. Based on a dedicated set of the most common use case scenarios, a new network processor unit (NPU) architecture with functionally specialized pipeline stages is proposed. The article describes a simulation model of the NPU with the proposed architecture. The simulation model is implemented in C++ using SystemC, an open-source C++ library. For functional testing of the resulting NPU model, the described use case scenarios were implemented in C. To evaluate the performance of the proposed NPU architecture, a set of software products developed by the KM211 company and the KMX32 family of microcontrollers were used. NPU performance was evaluated on the basis of the simulation model. Estimates of the processing time of one packet and the average throughput of the NPU model are obtained for each scenario.

INSTRUCTION-SET EXTENSION FOR CRYPTOGRAPHIC APPLICATIONS ON RECONFIGURABLE PLATFORM

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126607004076 ◽

2007 ◽

Vol 16 (06) ◽

pp. 911-927

Author(s):

S. MAJZOUB ◽

H. DIAB

Keyword(s):

Reconfigurable Computing ◽

General Purpose ◽

Coarse Grain ◽

Instruction Set ◽

General Purpose Processor ◽

Instruction Set Extension ◽

Custom Hardware ◽

Reconfigurable Platform ◽

And Performance ◽

Bitwise Operations

Reconfigurable systems represent a trade-off between speed and flexibility in processor design. They provide performance close to that of custom hardware while preserving some of the flexibility of a general-purpose processor. Recently, reconfigurable computing has received considerable interest in both its forms: FPGAs and coarse-grain hardware. Since the field is still developing, it is important to perform hardware analysis and evaluation of key applications on target reconfigurable architectures to identify potential limitations and improvements. This paper presents the mapping and performance analysis of two encryption algorithms, Rijndael and Twofish, on a coarse-grain reconfigurable platform, MorphoSys. MorphoSys is a reconfigurable architecture targeted at multimedia applications. Since many cryptographic algorithms involve bitwise operations, a bitwise instruction set extension is proposed to enhance performance. We present the details of mapping the bitwise operations involved in the algorithms, with thorough analysis. The methodology we used can be applied to other systems.

Recent changes and future trends in general purpose processor architectures to support image and video applications

Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429) ◽

10.1109/icip.2003.1247187 ◽

2004 ◽

Cited By ~ 2

Author(s):

E. Debes

Keyword(s):

General Purpose ◽

Future Trends ◽

General Purpose Processor ◽

Processor Architectures

Methods of test and performance requirements for general-purpose flat pallets for through transit of goods

10.3403/00225803 ◽

1990 ◽

Keyword(s):

General Purpose ◽

Performance Requirements ◽

And Performance

Design and Implementation of Low Energy Wireless Network Nodes based on Hardware Compression Acceleration

Recent Patents on Computer Science ◽

10.2174/2213275912666190715164024 ◽

2019 ◽

Vol 12 ◽

Author(s):

Hui Yang ◽

Anand Nayyar

Keyword(s):

Energy Consumption ◽

Data Compression ◽

Energy Saving ◽

Optimization Design ◽

Hardware Acceleration ◽

Transmission Efficiency ◽

General Purpose ◽

Storage Space ◽

General Purpose Processor ◽

Compression Time

With the rapid development of information technology, the volume of data is growing geometrically, placing higher demands on transmission speed and storage space. To reduce the use of storage space and further improve data transmission efficiency, data need to be compressed. In the process of data compression, it is very important that no data be lost, which motivates lossless data compression algorithms. Gradual optimization of the algorithm design can often make data compression more energy-efficient. Similarly, energy savings can also be obtained by improving the hardware structure of the node. In this paper, a new structure is designed for the sensor node, which adopts hardware acceleration and separates the data compression module from the node microprocessor. On the basis of an ASIC design of the algorithm, introducing hardware acceleration successfully reduced the energy consumed in compressing data: the energy consumption and compression time saved relative to the general-purpose processor were as high as 98.4% and 95.8%, respectively. The design greatly reduces compression time and energy consumption.

The Use of Microcomputer Simulations in Undergraduate Neurophysiology Experiments

Alternatives to Laboratory Animals ◽

10.1177/026119298701400303 ◽

1987 ◽

Vol 14 (3) ◽

pp. 134-140 ◽

Cited By ~ 2

Author(s):

K.A. Clarke

Keyword(s):

Low Cost ◽

Animal Experiments ◽

Theoretical Background ◽

General Purpose ◽

Microcomputer System ◽

Laboratory Equipment ◽

Teaching Objectives ◽

Compound Action Potentials ◽

Experimental Management ◽

And Performance

Practical classes in neurophysiology reinforce and complement the theoretical background in a number of ways, including demonstration of concepts, practice in planning and performance of experiments, and the production and maintenance of viable neural preparations. The balance of teaching objectives will depend upon the particular group of students involved. A technique is described which allows real compound action potentials from one of the most basic introductory neurophysiology experiments (frog sciatic nerve) to be embedded into interactive programs for student use. These retain all the elements of the “real experiment” in terms of appearance, presentation, experimental management and measurement by the student. Laboratory reports by the students show that the experiments are carefully and enthusiastically performed and the material is well absorbed. Three groups of students derive the most benefit from their use. First, students whose future careers will not involve animal experiments do not spend time developing dissecting skills they will not use, and instead spend more time fulfilling the other teaching objectives. Second, relatively inexperienced students, struggling to produce viable neural material and master complicated laboratory equipment, are often left with little time or motivation to take accurate readings or ponder neurophysiological concepts. Third, students in institutions where neurophysiology is taught with difficulty, because of the high cost of equipment and lack of specific expertise, may well have access to a low-cost general-purpose microcomputer system.

Effect of Changing a Traffic Control Device Color on Driver Behavior and Perception across Different Age Groups

Transportation Research Record: Journal of the Transportation Research Board ◽

10.1177/03611981211011168 ◽

2021 ◽

pp. 036119812110111

Author(s):

Hatem Abou-Senna ◽

Mohamed El-Agroudy ◽

Mustapha Mouloua ◽

Essam Radwan

Keyword(s):

Traffic Control ◽

Traffic Management ◽

Driving Simulator ◽

Age Groups ◽

The United States ◽

Time Of Day ◽

General Purpose ◽

Traffic Density ◽

Driving Conditions ◽

And Performance

The use of express lanes (ELs) in freeway traffic management has seen increasing popularity throughout the United States, particularly in Florida. These lanes are intended as an efficient transportation system management and operations tool that provides a more reliable trip. An important component of ELs is the channelizing devices used to delineate the separation between the ELs and the general-purpose lane. With the upcoming changes to the FHWA Manual on Uniform Traffic Control Devices, this study provided an opportunity to recommend changes affecting safety and efficiency at a nationwide level. It was important to understand the impacts on driver perception and performance in response to the color of the EL delineators. It was also valuable to understand how demographic groups differ in responding to delineator colors under different driving conditions. A driving simulator was used to test the responses of several demographic groups to changes in marker color and driving conditions. Furthermore, participants were tested on several factors relevant to driving performance, including visual and subjective responses to the changes in colors and driving conditions. Impacts on driver perception were observed via eye-tracking technology with changes to time of day, visibility, traffic density, roadway surface type, and, crucially, color of the delineating devices. The analyses concluded that white was the optimal and most significant color for notice of delineators across the majority of subjective and performance measures, followed by yellow, with black being the least desirable.

SYSTEMC IMPLEMENTATION AND PERFORMANCE EVALUATION OF A DECOUPLED GENERAL-PURPOSE MATRIX PROCESSOR

Parallel Processing Letters ◽

10.1142/s0129626410000090 ◽

2010 ◽

Vol 20 (02) ◽

pp. 103-121 ◽

Cited By ~ 1

Author(s):

MOSTAFA I. SOLIMAN ◽

ABDULMAJID F. AL-JUNAID

Keyword(s):

Performance Evaluation ◽

Matrix Multiplication ◽

General Purpose ◽

System Level ◽

Memory Latency ◽

Single Chip ◽

Wide Range ◽

Matrix Unit ◽

And Performance ◽

Vector Matrix

Technological advances in IC manufacturing provide us with the capability to integrate more and more functionality into a single chip. Today's modern processors have nearly one billion transistors on a single chip. With the increasing complexity of today's systems, designs have to be modeled at a high level of abstraction before being partitioned into hardware and software components for final implementation. This paper explains in detail the implementation and performance evaluation of a matrix processor called Mat-Core in SystemC (a system-level modeling language). Mat-Core is a research processor that aims to exploit the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit. To hide memory latency, the extended matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. Like vector architectures, the data computation unit is organized in parallel lanes. However, on parallel lanes, Mat-Core can execute matrix-scalar, matrix-vector, and matrix-matrix instructions in addition to vector-scalar and vector-vector instructions. To control the execution of vector/matrix instructions on the matrix core, this paper extends the well-known scoreboard technique. Furthermore, the performance of Mat-Core is evaluated on vector and matrix kernels. Our results show that the performance of a four-lane Mat-Core with 4 × 4 matrix registers (16 elements each), a queue size of 10, a start-up time of 6 clock cycles, and a memory latency of 10 clock cycles is about 0.94, 1.3, 2.3, 1.6, 2.3, and 5.5 FLOPs per clock cycle, achieved on scalar-vector multiplication, SAXPY, Givens, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.

SoC-FPGA systems for the acquisition and processing of electroencephalographic signals

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v10.i3.pp237-248 ◽

2021 ◽

Vol 10 (3) ◽

pp. 237-248

Author(s):

Matias Javier Oliva ◽

Pablo Andrés García ◽

Enrique Mario Spinelli ◽

Alejandro Luis Veiga

Keyword(s):

Embedded System ◽

Real Time ◽

General Purpose ◽

System Response ◽

Single Chip ◽

Real Time Processing ◽

General Purpose Processor ◽

Time Operation ◽

Electroencephalographic Signals ◽

High Level

Real-time acquisition and processing of electroencephalographic signals have promising applications in the implementation of brain-computer interfaces. These devices allow the user to control a device without performing motor actions, and are usually made up of a biopotential acquisition stage and a personal computer (PC). This structure is very flexible and appropriate for research, but for end users it is necessary to migrate to an embedded system, eliminating the PC from the scheme. The strict real-time processing requirements of such systems justify the choice of a system-on-chip field-programmable gate array (SoC-FPGA) for the implementation. This article proposes a platform for the acquisition and processing of electroencephalographic signals using this type of device, which combines the parallelism and speed of an FPGA with the simplicity of a general-purpose processor on a single chip. In this scheme, the FPGA is in charge of real-time operation, acquiring and processing the signals, while the processor handles the high-level tasks, with the interconnection between processing elements provided by buses integrated into the chip. The proposed scheme was used to implement a brain-computer interface based on steady-state visual evoked potentials, which was used to command a speller. The first tests of the system show that a selection time of 5 seconds per command can be achieved. The time delay between the user’s selection and the system response has been estimated at 343 µs.