scholarly journals Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Author(s):  
R. Balasubramonian ◽  
D. Albones ◽  
A. Buyuktosunoglu ◽  
S. Dwarkadas

2019 ◽  
Vol 26 (1) ◽  
pp. 39-62
Author(s):  
Stanislav O. Bezzubtsev ◽  
Vyacheslav V. Vasin ◽  
Dmitry Yu. Volkanov ◽  
Shynar R. Zhailauova ◽  
Vladislav A. Miroshnik ◽  
...  

The paper proposes the architecture and basic requirements for a network processor for OpenFlow switches of software-defined networks. An analysis of the architectures of well-known network processors is presented − NP-5 from EZchip (now Mellanox) and Tofino from Barefoot Networks. The advantages and disadvantages of two different versions of network processor architectures are considered: pipeline-based architecture, the stages of which are represented by a set of general-purpose processor cores, and pipeline-based architecture whose stages correspond to cores specialized for specific packet processing operations. Based on a dedicated set of the most common use case scenarios, a new architecture of the network processor unit (NPU) with functionally specialized pipeline stages was proposed. The article presents a description of the simulation model of the NPU of the proposed architecture. The simulation model of the network processor is implemented in C ++ languages using SystemC, the open-source C++ library. For the functional testing of the obtained NPU model, the described use case scenarios were implemented in C. In order to evaluate the performance of the proposed NPU architecture a set of software products developed by KM211 company and the KMX32 family of microcontrollers were used. Evaluation of NPU performance was made on the basis of a simulation model. Estimates of the processing time of one packet and the average throughput of the NPU model for each scenario are obtained.



2007 ◽  
Vol 16 (06) ◽  
pp. 911-927
Author(s):  
S. MAJZOUB ◽  
H. DIAB

Reconfigurable Systems represent a middle trade-off between speed and flexibility in the processor design world. It provides performance close to the custom-hardware and yet preserves some of the general-purpose processor flexibility. Recently, the area of reconfigurable computing has received considerable interest in both its forms: the FPGA and coarse-grain hardware. Since the field is still in its developing stage, it is important to perform hardware analysis and evaluation of certain key applications on target reconfigurable architectures to identify potential limitations and improvements. This paper presents the mapping and performance analysis of two encryption algorithms, namely Rijndael and Twofish, on a coarse grain reconfigurable platform, namely MorphoSys. MorphoSys is a reconfigurable architecture targeted for multimedia applications. Since many cryptographic algorithms involve bitwise operations, bitwise instruction set extension was proposed to enhance the performance. We present the details of the mapping of the bitwise operations involved in the algorithms with thorough analysis. The methodology we used can be utilized in other systems.



Author(s):  
Hui Yang ◽  
Anand Nayyar

: In the fast development of information, the information data is increasing in geometric multiples, and the speed of information transmission and storage space are required to be higher. In order to reduce the use of storage space and further improve the transmission efficiency of data, data need to be compressed. processing. In the process of data compression, it is very important to ensure the lossless nature of data, and lossless data compression algorithms appear. The gradual optimization design of the algorithm can often achieve the energy-saving optimization of data compression. Similarly, The effect of energy saving can also be obtained by improving the hardware structure of node. In this paper, a new structure is designed for sensor node, which adopts hardware acceleration, and the data compression module is separated from the node microprocessor.On the basis of the ASIC design of the algorithm, by introducing hardware acceleration, the energy consumption of the compressed data was successfully reduced, and the proportion of energy consumption and compression time saved by the general-purpose processor was as high as 98.4 % and 95.8 %, respectively. It greatly reduces the compression time and energy consumption.



1987 ◽  
Vol 14 (3) ◽  
pp. 134-140 ◽  
Author(s):  
K.A. Clarke

Practical classes in neurophysiology reinforce and complement the theoretical background in a number of ways, including demonstration of concepts, practice in planning and performance of experiments, and the production and maintenance of viable neural preparations. The balance of teaching objectives will depend upon the particular group of students involved. A technique is described which allows the embedding of real compound action potentials from one of the most basic introductory neurophysiology experiments—frog sciatic nerve, into interactive programs for student use. These retain all the elements of the “real experiment” in terms of appearance, presentation, experimental management and measurement by the student. Laboratory reports by the students show that the experiments are carefully and enthusiastically performed and the material is well absorbed. Three groups of student derive most benefit from their use. First, students whose future careers will not involve animal experiments do not spend time developing dissecting skills they will not use, but more time fulfilling the other teaching objectives. Second, relatively inexperienced students, struggling to produce viable neural material and master complicated laboratory equipment, who are often left with little time or motivation to take accurate readings or ponder upon neurophysiological concepts. Third, students in institutions where neurophysiology is taught with difficulty because of the high cost of equipment and lack of specific expertise, may well have access to a low cost general purpose microcomputer system.



Author(s):  
Hatem Abou-Senna ◽  
Mohamed El-Agroudy ◽  
Mustapha Mouloua ◽  
Essam Radwan

The use of express lanes (ELs) in freeway traffic management has seen increasing popularity throughout the United States, particularly in Florida. These lanes aim at making the most efficient transportation system management and operations tool to provide a more reliable trip. An important component of ELs is the channelizing devices used to delineate the separation between the ELs and the general-purpose lane. With the upcoming changes to the FHWA Manual on Uniform Traffic Control Devices, this study provided an opportunity to recommend changes affecting safety and efficiency on a nationwide level. It was important to understand the impacts on driver perception and performance in response to the color of the EL delineators. It was also valuable to understand the differences between demographics in responding to delineator colors under different driving conditions. The driving simulator was used to test the responses of several demographic groups to changes in marker color and driving conditions. Furthermore, participants were tested for several factors relevant to driving performance including visual and subjective responses to the changes in colors and driving conditions. Impacts on driver perception were observed via eye-tracking technology with changes to time of day, visibility, traffic density, roadway surface type, and, crucially, color of the delineating devices. The analyses concluded that white was the optimal and most significant color for notice of delineators across the majority of subjective and performance measures, followed by yellow, with black being the least desirable.



2010 ◽  
Vol 20 (02) ◽  
pp. 103-121 ◽  
Author(s):  
MOSTAFA I. SOLIMAN ◽  
ABDULMAJID F. Al-JUNAID

Technological advances in IC manufacturing provide us with the capability to integrate more and more functionality into a single chip. Today's modern processors have nearly one billion transistors on a single chip. With the increasing complexity of today's system, the designs have to be modeled at a high-level of abstraction before partitioning into hardware and software components for final implementation. This paper explains in detail the implementation and performance evaluation of a matrix processor called Mat-Core with SystemC (system level modeling language). Mat-Core is a research processor aiming at exploiting the increasingly number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit. To hide memory latency, the extended matrix unit is decoupled into two components: address generation and data computation, which communicate through data queues. Like vector architectures, the data computation unit is organized in parallel lanes. However, on parallel lanes, Mat-Core can execute matrix-scalar, matrix-vector, and matrix-matrix instructions in addition to vector-scalar and vector-vector instructions. For controlling the execution of vector/matrix instructions on the matrix core, this paper extends the well known scoreboard technique. Furthermore, the performance of Mat-Core is evaluated on vector and matrix kernels. Our results show that the performance of four lanes Mat-Core with matrix registers of size 4 × 4 or 16 elements each, queues size of 10, start up time of 6 clock cycles, and memory latency of 10 clock cycles is about 0.94, 1.3, 2.3, 1.6, 2.3, and 5.5 FLOPs per clock cycle; achieved on scalar-vector multiplication, SAXPY, Givens, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.



Author(s):  
Matias Javier Oliva ◽  
Pablo Andrés García ◽  
Enrique Mario Spinelli ◽  
Alejandro Luis Veiga

<span lang="EN-US">Real-time acquisition and processing of electroencephalographic signals have promising applications in the implementation of brain-computer interfaces. These devices allow the user to control a device without performing motor actions, and are usually made up of a biopotential acquisition stage and a personal computer (PC). This structure is very flexible and appropriate for research, but for final users it is necessary to migrate to an embedded system, eliminating the PC from the scheme. The strict real-time processing requirements of such systems justify the choice of a system on a chip field-programmable gate arrays (SoC-FPGA) for its implementation. This article proposes a platform for the acquisition and processing of electroencephalographic signals using this type of device, which combines the parallelism and speed capabilities of an FPGA with the simplicity of a general-purpose processor on a single chip. In this scheme, the FPGA is in charge of the real-time operation, acquiring and processing the signals, while the processor solves the high-level tasks, with the interconnection between processing elements solved by buses integrated into the chip. The proposed scheme was used to implement a brain-computer interface based on steady-state visual evoked potentials, which was used to command a speller. The first tests of the system show that a selection time of 5 seconds per command can be achieved. The time delay between the user’s selection and the system response has been estimated at 343 µs.</span>



Sign in / Sign up

Export Citation Format

Share Document