High-Performance Reconfigurable Computing

Author(s):  
Mário Pereira Vestias

High-performance reconfigurable computing systems integrate reconfigurable technology in the computing architecture to improve performance. Besides performance, reconfigurable hardware devices also achieve lower power consumption compared to general-purpose processors. Better performance and lower power consumption could be achieved using application-specific integrated circuit (ASIC) technology. However, ASICs are not reconfigurable, turning them application specific. Reconfigurable logic becomes a major advantage when hardware flexibility permits to speed up whatever the application with the same hardware module. The first and most common devices utilized for reconfigurable computing are fine-grained FPGAs with a large hardware flexibility. To reduce the performance and area overhead associated with the reconfigurability, coarse-grained reconfigurable solutions has been proposed as a way to achieve better performance and lower power consumption. In this chapter, the authors provide a description of reconfigurable hardware for high-performance computing.

Author(s):  
Mário Pereira Vestias

High-Performance Reconfigurable Computing systems integrate reconfigurable technology in the computing architecture to improve performance. Besides performance, reconfigurable hardware devices also achieve lower power consumption compared to General-Purpose Processors. Better performance and lower power consumption could be achieved using Application Specific Integrated Circuit (ASIC) technology. However, ASICs are not reconfigurable, turning them application specific. Reconfigurable logic becomes a major advantage when hardware flexibility permits to speed up whatever the application with the same hardware module. The first and most common devices utilized for reconfigurable computing are fine-grained FPGAs with a large hardware flexibility. To reduce the performance and area overhead associated with the reconfigurability, coarse-grained reconfigurable solutions has been proposed as a way to achieve better performance and lower power consumption. In this chapter we will provide a description of reconfigurable hardware for high performance computing.


2012 ◽  
Vol 2012 ◽  
pp. 1-17 ◽  
Author(s):  
Alexander Thomas ◽  
Michael Rückauer ◽  
Jürgen Becker

Since the introduction of the first reconfigurable devices in 1985 the field of reconfigurable computing developed a broad variety of architectures from fine-grained to coarse-grained types. However, the main disadvantages of the reconfigurable approaches, the costs in area, and power consumption, are still present. This contribution presents a solution for application-driven adaptation of our reconfigurable architecture at register transfer level (RTL) to reduce the resource requirements and power consumption while keeping the flexibility and performance for a predefined set of applications. Furthermore, implemented runtime adaptive features like online routing and configuration sequencing will be presented and discussed. A presentation of the prototype chip of this architecture designed in 90 nm standard cell technology manufactured by TSMC will conclude this contribution.


2015 ◽  
Vol 57 (3) ◽  
Author(s):  
Lars Bauer ◽  
Jörg Henkel ◽  
Andreas Herkersdorf ◽  
Michael A. Kochte ◽  
Johannes M. Kühn ◽  
...  

AbstractAchieving system-level dependability is a demanding task. The manifold requirements and dependability threats can no longer be statically addressed at individual abstraction layers. Instead, all components of future multi-processor systems-on-chip (MPSoCs) have to contribute to this common goal in an adaptive manner.In this paper we target a generic heterogeneous MPSoC that combines general purpose processors along with dedicated application-specific hard-wired accelerators, fine-grained reconfigurable processors, and coarse-grained reconfigurable architectures. We present different


Author(s):  
Zhuliang Yao ◽  
Shijie Cao ◽  
Wencong Xiao ◽  
Chen Zhang ◽  
Lanshun Nie

In trained deep neural networks, unstructured pruning can reduce redundant weights to lower storage cost. However, it requires the customization of hardwares to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardwares by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation. But this method often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, to achieve high model accuracy with commercial hardwares efficiently. Our approach adapts to high parallelism property of GPU, showing incredible potential for sparsity in the widely deployment of deep learning services. Experiment results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPU, while retains the same high model accuracy as finegrained sparsity.


Author(s):  
Sheng Kang ◽  
Guofeng Chen ◽  
Chun Wang ◽  
Ruiquan Ding ◽  
Jiajun Zhang ◽  
...  

With the advent of big data and cloud computing solutions, enterprise demand for servers is increasing. There is especially high growth for Intel based x86 server platforms. Today’s datacenters are in constant pursuit of high performance/high availability computing solutions coupled with low power consumption and low heat generation and the ability to manage all of this through advanced telemetry data gathering. This paper showcases one such solution of an updated rack and server architecture that promises such improvements. The ability to manage server and data center power consumption and cooling more completely is critical in effectively managing datacenter costs and reducing the PUE in the data center. Traditional Intel based 1U and 2U form factor servers have existed in the data center for decades. These general purpose x86 server designs by the major OEM’s are, for all practical purposes, very similar in their power consumption and thermal output. Power supplies and thermal designs for server in the past have not been optimized for high efficiency. In addition, IT managers need to know more information about servers in order to optimize data center cooling and power use, an improved server/rack design needs to be built to take advantage of more efficient power supplies or PDU’s and more efficient means of cooling server compute resources than from traditional internal server fans. This is the constant pursuit of corporations looking at new ways to improving efficiency and gaining a competitive advantage. A new way to optimize power consumption and improve cooling is a complete redesign of the traditional server rack. Extracting internal server power supplies and server fans and centralizing these within the rack aims to achieve this goal. This type of design achieves an entirely new low power target by utilizing centralized, high efficiency PDU’s that power all servers within the rack. Cooling is improved by also utilizing large efficient rack based fans for airflow to all servers. Also, opening up the server design is to allow greater airflow across server components for improved cooling. This centralized power supply breaks through the traditional server power limits. Rack based PDU’s can adjust the power efficiency to a more optimum point. Combine this with the use of online + offline modes within one single power supply. Cold backup makes data center power to achieve optimal power efficiency. In addition, unifying the mechanical structure and thermal definitions within the rack solution for server cooling and PSU information allows IT to collect all server power and thermal information centrally for improved ease in analyzing and processing.


Author(s):  
Saranya R ◽  
Pradeep C ◽  
Neena Baby ◽  
Radhakrishnan R

Reconfigurable computing for DSP remains an active area to explore as the need for incorporation with more conventional DSP technologies turn out to be obvious. Conventionally, the majority of the work in the area of reconfigurable computing is aimed on fine grained FPGA devices. Over the years, the focus is shifted from bit level granularity to a coarse grained composition. FIR filter remains and persist to be an important building block in various DSP systems. It computes the output by multiplying input samples with a set of coefficients followed by addition. Here multipliers and adders are modeled using the concept of divide and conquer. For developing a reconfiguarble FIR filter, different tap filters are designed as separate reconfigurable modules. Furthermore, there is an additional concern for making the system fault tolerant. A fault detection mechanism is introduced to detect the faults based on the nature of operands. The reconfigurable modules are structurally modeled in Verilog HDL and simulated and synthesized using Xilinx ISE 14.2. A comparison of the device utilization of reconfigurable modules is also presented in this paper by implementing the design on various Virtex FPGA devices.


Computers ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 70
Author(s):  
Carolina Fernández ◽  
Sergio Giménez ◽  
Eduard Grasa ◽  
Steve Bunch

The lack of high-performance RINA (Recursive InterNetwork Architecture) implementations to date makes it hard to experiment with RINA as an underlay networking fabric solution for different types of networks, and to assess RINA’s benefits in practice on scenarios with high traffic loads. High-performance router implementations typically require dedicated hardware support, such as FPGAs (Field Programmable Gate Arrays) or specialized ASICs (Application Specific Integrated Circuit). With the advance of hardware programmability in recent years, new possibilities unfold to prototype novel networking technologies. In particular, the use of the P4 programming language for programmable ASICs holds great promise for developing a RINA router. This paper details the design and part of the implementation of the first P4-based RINA interior router, which reuses the layer management components of the IRATI Linux-based RINA implementation and implements the data-transfer components using a P4 program. We also describe the configuration and testing of our initial deployment scenarios, using ancillary open-source tools such as the P4 reference test software switch (BMv2) or the P4Runtime API.


2013 ◽  
Vol 22 (10) ◽  
pp. 1340033 ◽  
Author(s):  
HONGLIANG ZHAO ◽  
YIQIANG ZHAO ◽  
YIWEI SONG ◽  
JUN LIAO ◽  
JUNFENG GENG

A low power readout integrated circuit (ROIC) for 512 × 512 cooled infrared focal plane array (IRFPA) is presented. A capacitive trans-impedance amplifier (CTIA) with high gain cascode amplifier and inherent correlated double sampling (CDS) configuration is employed to achieve a high performance readout interface for the IRFPA with a pixel size of 30 × 30 μm2. By optimizing column readout timing and using two operating modes in column amplifiers, the power consumption is significantly reduced. The readout chip is implemented in a standard 0.35 μm 2P4M CMOS technology. The measurement results show the proposed ROIC achieves a readout rate of 10 MHz with 70 mW power consumption under 3.3 V supply voltage from 77 K to 150 K operating temperature. And it occupies a chip area of 18.4 × 17.5 mm2.


2011 ◽  
Vol 20 (04) ◽  
pp. 641-655 ◽  
Author(s):  
REZA FAGHIH MIRZAEE ◽  
MOHAMMAD HOSSEIN MOAIYERI ◽  
HAMID KHORSAND ◽  
KEIVAN NAVI

A new 1-bit hybrid Full Adder cell is presented in this paper with the aim of reaching a robust and high-performance adder structure. While most of recent Full Adders are proposed with the purpose of using fewer transistors, they suffer from some disadvantages such as output or internal non-full-swing nodes and poor driving capability. Considering these drawbacks, they might not be a good choice to operate in a practical environment. Lowering the number of transistors can inherently lead to smaller occupied area, higher speed and lower power consumption. However, other parameters, such as robustness to PVT variations and rail-to-rail operation, should also be considered. While the robustness is taken into account, HSPICE simulation demonstrates a great improvement in terms of speed and power-delay product (PDP).


Sign in / Sign up

Export Citation Format

Share Document