Dynamic Task Distribution Model for On-Chip Reconfigurable High Speed Computing System

Modern embedded systems are being modeled as Reconfigurable High Speed Computing Systems (RHSCS), in which reconfigurable hardware, that is, a Field Programmable Gate Array (FPGA), and softcore processors configured on the FPGA act as computing elements. As system complexity increases, efficient task distribution methodologies are essential to obtain high performance. A dynamic task distribution methodology based on the Minimum Laxity First (MLF) policy (DTD-MLF) distributes the tasks of an application dynamically onto the RHSCS and utilizes the available RHSCS resources effectively. The DTD-MLF methodology takes advantage of the runtime design parameters of an application represented as a directed acyclic graph (DAG) and considers the attributes of the tasks in the DAG and of the computing resources when distributing the tasks onto the RHSCS. In this paper, we describe the DTD-MLF model and verify its effectiveness by distributing several real-life benchmark applications, each represented as a DAG, onto an RHSCS configured on a Virtex-5 FPGA device. The performance of the MLF-based dynamic task distribution methodology is compared with that of a static task distribution methodology. The comparison shows that the dynamic task distribution model with the MLF criterion outperforms static task distribution techniques in terms of schedule length and effective utilization of the available RHSCS resources.
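The abstract does not spell out the DTD-MLF algorithm, so the following is only a minimal sketch of generic minimum-laxity-first list scheduling on a DAG, not the authors' model. It assumes identical computing resources (the RHSCS itself mixes FPGA logic and softcore processors), integer execution times and deadlines, and laxity defined as deadline minus earliest start time minus execution time. The names `Task`, `mlf_schedule`, and the example task set are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    exec_time: int          # assumed execution time on any resource
    deadline: int           # assumed absolute deadline
    preds: list = field(default_factory=list)  # names of predecessor tasks


def mlf_schedule(tasks, num_resources):
    """Greedy minimum-laxity-first list scheduling of a task DAG.

    Returns a list of (task name, resource index, start, finish) tuples.
    Assumption: all resources are identical; this is an illustration only.
    """
    by_name = {t.name: t for t in tasks}
    finish = {}                    # task name -> finish time
    free_at = [0] * num_resources  # time at which each resource becomes idle
    schedule = []
    pending = set(by_name)

    while pending:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [by_name[n] for n in pending
                 if all(p in finish for p in by_name[n].preds)]
        if not ready:
            raise ValueError("task graph contains a cycle")

        # Use the resource that becomes idle first.
        res = min(range(num_resources), key=lambda i: free_at[i])

        def earliest_start(t):
            return max([free_at[res]] + [finish[p] for p in t.preds])

        def laxity(t):
            return t.deadline - earliest_start(t) - t.exec_time

        # Pick the ready task with the least laxity; break ties by name.
        task = min(ready, key=lambda t: (laxity(t), t.name))
        start = earliest_start(task)
        end = start + task.exec_time
        finish[task.name] = end
        free_at[res] = end
        schedule.append((task.name, res, start, end))
        pending.remove(task.name)

    return schedule


# Hypothetical four-task DAG: A precedes B and C, which both precede D.
example_tasks = [
    Task("A", exec_time=2, deadline=4),
    Task("B", exec_time=3, deadline=10, preds=["A"]),
    Task("C", exec_time=1, deadline=5, preds=["A"]),
    Task("D", exec_time=2, deadline=12, preds=["B", "C"]),
]
example_schedule = mlf_schedule(example_tasks, num_resources=2)
```

In this example, C is dispatched before B once A finishes because its laxity is smaller. A dynamic scheme of the kind the abstract describes would recompute such priorities at run time as resources become available; the schedule length is the largest finish time in the returned list.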


Optimized Communication Architecture of MPSoCs with a Hardware Scheduler

International Journal of Embedded and Real-Time Communication Systems ◽

10.4018/jertcs.2011070101 ◽

2011 ◽

Vol 2 (3) ◽

pp. 1-20 ◽

Cited By ~ 4

Author(s):

Diandian Zhang ◽

Han Zhang ◽

Jeronimo Castrillon ◽

Torsten Kempf ◽

Bart Vanthournout ◽

...

Keyword(s):

High Performance ◽

Programming Model ◽

Real Life ◽

Point Of View ◽

Communication Architecture ◽

Dynamic Task ◽

Computational Performance ◽

Systems On Chip ◽

Promising Solution ◽

On Chip

Efficient runtime resource management in multi-processor systems-on-chip (MPSoCs) for achieving high performance and low energy consumption is one of the key challenges for system designers. OSIP, an operating system application-specific instruction-set processor, together with its well-defined programming model, provides a promising solution. It delivers high computational performance to deal with dynamic task scheduling and mapping. Being programmable, it can easily be adapted to different systems. However, the distributed computation among the different processing elements introduces complexity to the communication architecture, which tends to become the bottleneck of such systems. In this work, the authors highlight the vital importance of the communication architecture for OSIP-based systems and optimize the communication architecture. Furthermore, the effects of OSIP and the communication architecture are investigated jointly from the system point of view, based on a broad case study for a real life application (H.264) and a synthetic benchmark application.


Optimized Communication Architecture of MPSoCs with a Hardware Scheduler

Adoption and Optimization of Embedded and Real-Time Communication Systems ◽

10.4018/978-1-4666-2776-5.ch009 ◽

2013 ◽

pp. 163-180 ◽

Cited By ~ 1

Author(s):

Diandian Zhang ◽

Han Zhang ◽

Jeronimo Castrillon ◽

Torsten Kempf ◽

Bart Vanthournout ◽

...

Keyword(s):

High Performance ◽

Programming Model ◽

Real Life ◽

Point Of View ◽

Communication Architecture ◽

Dynamic Task ◽

Computational Performance ◽

Systems On Chip ◽

Promising Solution ◽

On Chip


Ultracompact and low-power-consumption silicon thermo-optic switch for high-speed data

Nanophotonics ◽

10.1515/nanoph-2020-0496 ◽

2020 ◽

Vol 10 (2) ◽

pp. 937-945

Author(s):

Ruihuan Zhang ◽

Yu He ◽

Yong Zhang ◽

Shaohua An ◽

Qingming Zhu ◽

...

Keyword(s):

Power Consumption ◽

Low Power ◽

High Speed ◽

High Performance ◽

Pulse Amplitude ◽

Telecommunication Networks ◽

Low Power Consumption ◽

Power Efficient ◽

High Speed Data ◽

On Chip

Ultracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit, with the highest tuning efficiency and data rate ever reported.


Fifty years of Electronic Hardware Implementations of First and Higher Order Neural Networks

Artificial Higher Order Neural Networks for Computer Science and Engineering ◽

10.4018/978-1-61520-711-4.ch012 ◽

2010 ◽

pp. 269-285 ◽

Cited By ~ 3

Author(s):

David R. Selviah ◽

Janti Shawash

Keyword(s):

Neural Networks ◽

Real Time ◽

High Speed ◽

Higher Order ◽

Low Latency ◽

Real Time Control ◽

Practical Applications ◽

Field Programmable ◽

On Chip ◽

Electronic Hardware

This chapter celebrates 50 years of first and higher order neural network (HONN) implementations in terms of the physical layout and structure of electronic hardware, which offers high-speed, low-latency, compact, low-cost, low-power, mass-produced systems. Low latency is essential for practical applications in real-time control, for which software implementations running on CPUs are too slow. This literature review chapter traces the chronological development of electronic neural networks (ENNs), discussing selected papers in detail, from analog electronic hardware, through probabilistic RAM, generalizing RAM, custom silicon Very Large Scale Integrated (VLSI) circuits, neuromorphic chips, and pulse stream interconnected neurons, to Application Specific Integrated Circuits (ASICs) and Zero Instruction Set Chips (ZISCs). Reconfigurable Field Programmable Gate Arrays (FPGAs) are given particular attention, as the most recent generation incorporates Digital Signal Processing (DSP) units to provide full System on Chip (SoC) capability, offering the possibility of real-time, on-line, and on-chip learning.


Adaptive Threshold Based Scheduler for Batch of Independent Jobs for Cloud Computing System

Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing ◽

10.4018/978-1-7998-5339-8.ch110 ◽

2021 ◽

pp. 2246-2266

Author(s):

Taj Alam ◽

Paritosh Dubey ◽

Ankit Kumar

Keyword(s):

Cloud Computing ◽

Distributed Systems ◽

High Performance ◽

Large Scale ◽

Real Life ◽

Interval Estimation ◽

Computing System ◽

Adaptive Threshold ◽

Batch Simulation ◽

Heterogeneous Distributed Systems

Distributed systems are an efficient means of realizing high-performance computing (HPC). They are used to meet the demand for executing large-scale, high-performance computational jobs. Scheduling tasks on such computational resources is one of the prime concerns in heterogeneous distributed systems. Scheduling jobs on distributed systems is NP-complete, so it requires a heuristic or metaheuristic approach that yields sub-optimal but acceptable solutions. An adaptive threshold-based scheduler is one such heuristic approach. This work proposes an adaptive threshold-based scheduler for a batch of independent jobs (ATSBIJ) with the objective of optimizing the makespan of the jobs submitted for execution on cloud computing systems. ATSBIJ uses interval estimation to calculate the threshold values from which an efficient schedule of the batch is generated. Simulation studies on CloudSim indicate that the ATSBIJ approach works effectively for real-life scenarios.


Design and Implementation of QCA D-Flip-Flops and RAM Cell Using Majority Gates

Journal of Circuits System and Computers ◽

10.1142/s0218126619500798 ◽

2019 ◽

Vol 28 (05) ◽

pp. 1950079 ◽

Cited By ~ 4

Author(s):

Trailokya Nath Sasamal ◽

Ashutosh Kumar Singh ◽

Umesh Ghanekar

Keyword(s):

Circuit Design ◽

High Speed ◽

High Performance ◽

Single Layer ◽

Memory Cell ◽

Design Parameters ◽

Flip Flop ◽

Negative Edge ◽

Wire Crossing ◽

The Common

Quantum-dot cellular automata (QCA) is a promising technology that enables nanoscale circuit design with high performance and low power consumption. Because memory cells and flip-flops are fundamental to most digital circuits, a high-speed, low-complexity memory cell is particularly important. This paper presents novel architectures for D flip-flops and a memory cell that use a recently proposed five-input majority gate in QCA technology, simulated with QCADesigner version 2.0.3. The simulation results show that the proposed D flip-flops and memory cell are superior to existing designs with respect to the common design parameters. The proposed RAM cell occupies an area of 0.12 μm² and has a delay of 1.5 clock cycles. The proposed level-triggered, positive/negative edge-triggered, and dual edge-triggered D flip-flops use 14%, 33%, and 21% less area, respectively, and have 40%, 27%, and 25% lower latency than the previous best designs. In addition, all the proposed designs are implemented in a single QCA layer and require no single-layer or multilayer wire crossings.
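To illustrate how majority gates can serve as the building block of a storage element, here is a minimal behavioral sketch in Python. It is not the paper's QCA layout or its flip-flop architecture: it only models the Boolean function of a majority gate, derives AND and OR by tying spare inputs to constants, and composes a textbook level-sensitive D latch, Q = (D AND CLK) OR (Q_prev AND NOT CLK). All function names are hypothetical.

```python
def maj(*inputs):
    """Majority vote over an odd number of 0/1 inputs."""
    if len(inputs) % 2 == 0:
        raise ValueError("a majority gate needs an odd number of inputs")
    return 1 if sum(inputs) > len(inputs) // 2 else 0


def and3(a, b, c):
    # Five-input majority gate with two inputs tied to 0 acts as a 3-input AND.
    return maj(a, b, c, 0, 0)


def or3(a, b, c):
    # Five-input majority gate with two inputs tied to 1 acts as a 3-input OR.
    return maj(a, b, c, 1, 1)


def and2(a, b):
    # Three-input majority gate with one input tied to 0.
    return maj(a, b, 0)


def or2(a, b):
    # Three-input majority gate with one input tied to 1.
    return maj(a, b, 1)


def inv(a):
    return 1 - a


def d_latch(d, clk, q_prev):
    """Level-sensitive D latch: transparent while clk is 1, holds while clk is 0."""
    return or2(and2(d, clk), and2(q_prev, inv(clk)))
```

When `clk` is 1 the latch output follows `d`; when `clk` is 0 it keeps `q_prev`. Edge-triggered behavior, as in the flip-flops the abstract mentions, would additionally require detecting clock transitions, which this sketch leaves out.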


CMOS INTEGRATED CIRCUIT CONTROLLERS FOR SWITCHING POWER CONVERTERS

Journal of Circuits System and Computers ◽

10.1142/s0218126604001714 ◽

2004 ◽

Vol 13 (04) ◽

pp. 789-811

Author(s):

Eduard Alarcón ◽

Gerard Villar ◽

Alberto Poveda

Keyword(s):

Integrated Circuit ◽

High Speed ◽

High Performance ◽

Power Converters ◽

Correct Operation ◽

Case Examples ◽

CMOS Integrated Circuit ◽

Switching Power ◽

Switching Power Converters ◽

On Chip

Two case examples of high-speed CMOS microelectronic implementations of high-performance controllers for switching power converters are presented. The design and implementation of a current-programmed controller and a general-purpose feedforward one-cycle controller are described. The integrated circuit controllers attain high performance by using current-mode analog signal processing, which allows high switching frequencies that extend the operation margin compared to previous designs. Global layout-extracted transistor-level simulation results for 0.8 μm and 0.35 μm standard CMOS technologies confirm both the correct operation of the circuits in terms of bandwidth and their functionality for the control of switching power converters. The circuits may be used either as standalone IC controllers or as controller circuits that are technology-compatible with on-chip switching power converters and on-chip loads for future powered systems-on-chip.


Design trade-offs for on-chip driving of high-speed high-performance ADCs in static BIST applications

2016 IEEE 21st International Mixed-Signal Testing Workshop (IMSTW) ◽

10.1109/ims3tw.2016.7524229 ◽

2016 ◽

Cited By ~ 4

Author(s):

A. J. Gines ◽

E. Peralias ◽

G. Leger ◽

A. Rueda ◽

G. Renaud ◽

...

Keyword(s):

High Speed ◽

High Performance ◽

Trade Offs ◽

On Chip


Implementation of Embedded Floating Point Arithmetic Units on FPGA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.550.126 ◽

2014 ◽

Vol 550 ◽

pp. 126-136

Author(s):

N. Ramya Rani

Keyword(s):

High Speed ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Embedded Computing ◽

Floating Point Arithmetic ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Arithmetic Units ◽

Point Arithmetic

Floating point arithmetic plays a major role in scientific and embedded computing applications. However, field programmable gate arrays (FPGAs) perform poorly in floating point applications because of the complexity of floating point arithmetic. Implementing floating point units in FPGA logic consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication, and DSP algorithms use floating point arithmetic in graphics processing, Fourier transformation, coding, and so on. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGAs. The work aims to achieve high computation speed and to reduce the power consumed in evaluating expressions. An application that demands high-performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper presents a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed in VHDL, simulated in Xilinx software, and implemented on VIRTEX and SPARTAN FPGAs.


ANALYSIS OF EFFECTS OF USING 9/7 WAVELET COEFFICIENTS IN MULTI-RESOLUTION ANALYSIS

SMART MOVES JOURNAL IJOSCIENCE ◽

10.24113/ijoscience.v2i1.68 ◽

2016 ◽

Vol 2 (1) ◽

Author(s):

Manish Sharma ◽

Prof. Sonu Lal

Keyword(s):

High Speed ◽

Utilization Efficiency ◽

Parallel Structure ◽

Discrete Wavelet ◽

Distributed Arithmetic ◽

FPGA Design ◽

Multi Resolution Analysis ◽

Field Programmable ◽

On Chip ◽

Area Efficient

Conventional distributed arithmetic (DA) is popular in field programmable gate array (FPGA) design; it relies on on-chip ROM to achieve high speed and regularity. In this paper, we describe a high-speed, area-efficient 1-D discrete wavelet transform (DWT) that uses 9/7 filters and is based on the new efficient distributed arithmetic (NEDA) technique. As an area-efficient architecture free of ROM, multiplication, and subtraction, NEDA can also expose the redundancy in the adder array, whose entries are 0 and 1. The architecture supports any image pixel value size and any level of decomposition. The parallel structure achieves 100% hardware utilization efficiency.
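For context, the conventional ROM-based DA that the abstract contrasts with NEDA can be sketched in a few lines of Python. This is a software model of the general technique only, not the NEDA architecture and not the paper's 9/7 filter design. It assumes integer coefficients (real 9/7 coefficients would first have to be quantized) and inputs given as signed integers that fit in `bits`-bit two's complement; the names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute the DA lookup table (the 'ROM').

    Entry `addr` holds the sum of the coefficients whose index bit is set
    in `addr`, so the table has 2**len(coeffs) entries.
    """
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(coeffs, xs, bits):
    """Compute sum(c * x) bit-serially using only table lookups, shifts, and adds.

    Each x must fit in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Form the ROM address from bit b of every input sample.
        addr = 0
        for i, x in enumerate(xs):
            addr |= (((x & mask) >> b) & 1) << i
        term = lut[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc


# Hypothetical 4-tap example with 8-bit signed inputs.
example_coeffs = [3, -1, 4, 2]
example_inputs = [5, -3, 7, -8]
example_result = da_inner_product(example_coeffs, example_inputs, bits=8)
```

The table grows exponentially with the number of taps, which is the ROM cost that ROM-free schemes such as the one described in the abstract set out to avoid.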
