scholarly journals Dynamic Task Distribution Model for On-Chip Reconfigurable High Speed Computing System

2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Mahendra Vucha ◽  
Arvind Rajawat

Modern embedded systems are being modeled as Reconfigurable High Speed Computing System (RHSCS) where Reconfigurable Hardware, that is, Field Programmable Gate Array (FPGA), and softcore processors configured on FPGA act as computing elements. As system complexity increases, efficient task distribution methodologies are essential to obtain high performance. A dynamic task distribution methodology based on Minimum Laxity First (MLF) policy (DTD-MLF) distributes the tasks of an application dynamically onto RHSCS and utilizes available RHSCS resources effectively. The DTD-MLF methodology takes the advantage of runtime design parameters of an application represented as DAG and considers the attributes of tasks in DAG and computing resources to distribute the tasks of an application onto RHSCS. In this paper, we have described the DTD-MLF model and verified its effectiveness by distributing some of real life benchmark applications onto RHSCS configured on Virtex-5 FPGA device. Some benchmark applications are represented as DAG and are distributed to the resources of RHSCS based on DTD-MLF model. The performance of the MLF based dynamic task distribution methodology is compared with static task distribution methodology. The comparison shows that the dynamic task distribution model with MLF criteria outperforms the static task distribution techniques in terms of schedule length and effective utilization of available RHSCS resources.

Author(s):  
Diandian Zhang ◽  
Han Zhang ◽  
Jeronimo Castrillon ◽  
Torsten Kempf ◽  
Bart Vanthournout ◽  
...  

Efficient runtime resource management in multi-processor systems-on-chip (MPSoCs) for achieving high performance and low energy consumption is one of the key challenges for system designers. OSIP, an operating system application-specific instruction-set processor, together with its well-defined programming model, provides a promising solution. It delivers high computational performance to deal with dynamic task scheduling and mapping. Being programmable, it can easily be adapted to different systems. However, the distributed computation among the different processing elements introduces complexity to the communication architecture, which tends to become the bottleneck of such systems. In this work, the authors highlight the vital importance of the communication architecture for OSIP-based systems and optimize the communication architecture. Furthermore, the effects of OSIP and the communication architecture are investigated jointly from the system point of view, based on a broad case study for a real life application (H.264) and a synthetic benchmark application.


Author(s):  
Diandian Zhang ◽  
Han Zhang ◽  
Jeronimo Castrillon ◽  
Torsten Kempf ◽  
Bart Vanthournout ◽  
...  

Efficient runtime resource management in multi-processor systems-on-chip (MPSoCs) for achieving high performance and low energy consumption is one of the key challenges for system designers. OSIP, an operating system application-specific instruction-set processor, together with its well-defined programming model, provides a promising solution. It delivers high computational performance to deal with dynamic task scheduling and mapping. Being programmable, it can easily be adapted to different systems. However, the distributed computation among the different processing elements introduces complexity to the communication architecture, which tends to become the bottleneck of such systems. In this work, the authors highlight the vital importance of the communication architecture for OSIP-based systems and optimize the communication architecture. Furthermore, the effects of OSIP and the communication architecture are investigated jointly from the system point of view, based on a broad case study for a real life application (H.264) and a synthetic benchmark application.


Nanophotonics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 937-945
Author(s):  
Ruihuan Zhang ◽  
Yu He ◽  
Yong Zhang ◽  
Shaohua An ◽  
Qingming Zhu ◽  
...  

AbstractUltracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data ever reported.


Author(s):  
David R. Selviah ◽  
Janti Shawash

This chapter celebrates 50 years of first and higher order neural network (HONN) implementations in terms of the physical layout and structure of electronic hardware, which offers high speed, low latency, compact, low cost, low power, mass produced systems. Low latency is essential for practical applications in real time control for which software implementations running on CPUs are too slow. The literature review chapter traces the chronological development of electronic neural networks (ENN) discussing selected papers in detail from analog electronic hardware, through probabilistic RAM, generalizing RAM, custom silicon Very Large Scale Integrated (VLSI) circuit, Neuromorphic chips, pulse stream interconnected neurons to Application Specific Integrated circuits (ASICs) and Zero Instruction Set Chips (ZISCs). Reconfigurable Field Programmable Gate Arrays (FPGAs) are given particular attention as the most recent generation incorporate Digital Signal Processing (DSP) units to provide full System on Chip (SoC) capability offering the possibility of real-time, on-line and on-chip learning.


Author(s):  
TAJ ALAM ◽  
PARITOSH DUBEY ◽  
ANKIT KUMAR

Distributed systems are efficient means of realizing high-performance computing (HPC). They are used in meeting the demand of executing large-scale high-performance computational jobs. Scheduling the tasks on such computational resources is one of the prime concerns in the heterogeneous distributed systems. Scheduling jobs on distributed systems are NP-complete in nature. Scheduling requires either heuristic or metaheuristic approach for sub-optimal but acceptable solutions. An adaptive threshold-based scheduler is one such heuristic approach. This work proposes adaptive threshold-based scheduler for batch of independent jobs (ATSBIJ) with the objective of optimizing the makespan of the jobs submitted for execution on cloud computing systems. ATSBIJ exploits the features of interval estimation for calculating the threshold values for generation of efficient schedule of the batch. Simulation studies on CloudSim ensures that the ATSBIJ approach works effectively for real life scenario.


2019 ◽  
Vol 28 (05) ◽  
pp. 1950079 ◽  
Author(s):  
Trailokya Nath Sasamal ◽  
Ashutosh Kumar Singh ◽  
Umesh Ghanekar

Quantum-dot cellular automata (QCA) is one of the promising technologies that enable nanoscale circuit design with high performance and low-power consumption features. As memory cell and flip-flops are rudimentary for most of the digital circuits, having a high speed, and a less complex memory cell is significantly important. This paper presents novel architecture of D flip-flops and memory cell using a recently proposed five-input majority gate in QCA technology and simulated by QCADesigner tool version 2.0.3. The simulation results show that the proposed D flip-flops and the memory cell are more superior to the existing designs by considering the common design parameters. The proposed RAM cell spreads over an area of 0.12[Formula: see text][Formula: see text]m2and delay of 1.5 clock cycles. The proposed level-triggered, positive/negative edge-triggered, and dual edge-triggered D flip-flop uses 14%, 33%, and 21% less area, whereas the latency is 40%, 27%, and 25% less when compared to the previous best design. In addition, all the proposed designs are implemented in a single layer QCA and do not require any single or multilayer wire crossing.


2004 ◽  
Vol 13 (04) ◽  
pp. 789-811
Author(s):  
EDUARD ALARCÓN ◽  
GERARD VILLAR ◽  
ALBERTO POVEDA

Two case examples of high-speed CMOS microelectronic implementations of high-performance controllers for switching power converters are presented. The design and implementation of a current-programmed controller and a general-purpose feedforward one-cycle controller are described. The integrated circuit controllers attain high-performance by means of using current-mode analog signal processing, hence allowing high switching frequencies that extend the operation margin compared to previous designs. Global layout-extracted transistor-level simulation results for 0.8 μm and 0.35 μm standard CMOS technologies confirm both the correct operation of the circuits in terms of bandwidth as well as their functionality for the control of switching power converters. The circuits may be used either as standalone IC controllers or as controller circuits that are technology-compatible with on-chip switching power converters and on-chip loads for future powered systems-on-chip.


2014 ◽  
Vol 550 ◽  
pp. 126-136
Author(s):  
N. Ramya Rani

:Floating point arithmetic plays a major role in scientific and embedded computing applications. But the performance of field programmable gate arrays (FPGAs) used for floating point applications is poor due to the complexity of floating point arithmetic. The implementation of floating point units on FPGAs consumes a large amount of resources and that leads to the development of embedded floating point units in FPGAs. Embedded applications like multimedia, communication and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGA. The work is focused with the aim of achieving high speed of computations and to reduce the power for evaluating expressions. An application that demands high performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally this paper describes a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed using VHDL simulation in Xilinx software and implemented on VIRTEX and SPARTAN FPGAs.


2016 ◽  
Vol 2 (1) ◽  
Author(s):  
Manish Sharma ◽  
Prof. Sonu Lal

Conventional distributed arithmetic (DA) is popular in field programmable gate array (FPGA) design, and it features on-chip ROM to achieve high speed and regularity. In this paper, we describe high speed area efficient 1-D discrete wavelet transform (DWT) using 9/7 filter based new efficient distributed arithmetic (NEDA) Technique. Being area efficient architecture free of ROM, multiplication, and subtraction, NEDA can also expose the redundancy existing in the adder array consisting of entries of 0 and 1. This architecture supports any size of image pixel value and any level of decomposition. The parallel structure has 100% hardware utilization efficiency.


Sign in / Sign up

Export Citation Format

Share Document