Design Space Exploration of High-Performance Parallel Architectures

Journal of Integrated Circuits and Systems ◽

10.29292/jics.v3i1.279 ◽

2008 ◽

Vol 3 (1) ◽

pp. 32-38

Author(s):

Enric Musoll ◽

Mario Nemirovsky

Keyword(s):

Power Efficiency ◽

High Performance ◽

Design Space Exploration ◽

Parallel Architecture ◽

Parallel Architectures ◽

Power Performance ◽

Power Budget ◽

Performance Goal ◽

Power Efficient ◽

On Chip

High-performance single-threaded processors achieve their performance goal partly by relying, among other architectural techniques, on speculation and large on-chip caches. The hardware to support these techniques usually occupies a large portion of the overall processor real estate, and therefore it consumes a significant amount of power that is sometimes not optimally used toward doing useful work. In this work, we study the intuitive fact that architectures with hardware support for threads are more power efficient than a more traditional single-threaded superscalar architecture. Toward this goal, we have created a model of the power, performance and area of several parallel architectures. This model shows that a parallel architecture can be designed so that (a) it requires less area and power (to reach the same performance), or (b) it achieves better power efficiency and less area (for the same power budget), or (c) it has higher performance and better power efficiency (for the same area constraint), when compared to a single-threaded superscalar architecture.

Novel CNN-Based AP2D-Net Accelerator: An Area and Power Efficient Solution for Real-Time Applications on Mobile FPGA

Electronics ◽

10.3390/electronics9050832 ◽

2020 ◽

Vol 9 (5) ◽

pp. 832 ◽

Cited By ~ 2

Author(s):

Shuai Li ◽

Kuangyuan Sun ◽

Yukui Luo ◽

Nandakishor Yadav ◽

Ken Choi

Keyword(s):

Real Time ◽

Power Efficiency ◽

High Performance ◽

Memory Storage ◽

Ultra Low Power ◽

Power Efficient ◽

Battery Capacity ◽

Point Representation ◽

On Chip ◽

Better Than

Standard convolutional neural networks (CNNs) have large amounts of data redundancy, and the same accuracy can be obtained even with lower-bit weights instead of a floating-point representation. Most CNNs have to be developed and executed on high-end GPU-based workstations, and it is hard to port the existing implementations onto portable edge FPGAs because of the limited on-chip block memory storage size and battery capacity. In this paper, we present the adaptive pointwise convolution and 2D convolution joint network (AP2D-Net), an ultra-low power and relatively high throughput system combined with dynamic precision weights and activation. Our system has high performance, and we make a trade-off between accuracy and power efficiency by adopting unmanned aerial vehicle (UAV) object detection scenarios. We evaluate our system on the Zynq UltraScale+ MPSoC Ultra96 mobile FPGA platform. The target board reaches a real-time speed of 30 fps under 5.6 W, and the FPGA on-chip power is only 0.6 W. The power efficiency of our system is 2.8× better than the best system design on a Jetson TX2 GPU and 1.9× better than the design on a PYNQ-Z1 SoC FPGA.

Ultracompact and low-power-consumption silicon thermo-optic switch for high-speed data

Nanophotonics ◽

10.1515/nanoph-2020-0496 ◽

2020 ◽

Vol 10 (2) ◽

pp. 937-945

Author(s):

Ruihuan Zhang ◽

Yu He ◽

Yong Zhang ◽

Shaohua An ◽

Qingming Zhu ◽

...

Keyword(s):

Power Consumption ◽

Low Power ◽

High Speed ◽

High Performance ◽

Pulse Amplitude ◽

Telecommunication Networks ◽

Low Power Consumption ◽

Power Efficient ◽

High Speed Data ◽

On Chip

Ultracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data ever reported.

FPGA-Based On-Board Hyperspectral Imaging Compression: Benchmarking Performance and Energy Efficiency against GPU Implementations

Remote Sensing ◽

10.3390/rs12223741 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3741 ◽

Cited By ~ 1

Author(s):

Julián Caba ◽

María Díaz ◽

Jesús Barba ◽

Raúl Guerra ◽

Jose A. de la Torre ◽

Sebastián López

Keyword(s):

Reconfigurable Computing ◽

Power Efficiency ◽

Real Life ◽

Hyperspectral Data ◽

Actual Behavior ◽

Power Budget ◽

Image Block ◽

Power Efficient ◽

Time Requirements ◽

Computing Platforms

Remote-sensing platforms, such as Unmanned Aerial Vehicles, are characterized by limited power budgets and low-bandwidth downlinks. Therefore, handling hyperspectral data in this context can jeopardize the operational time of the system. FPGAs have been traditionally regarded as the most power-efficient computing platforms. However, there is little experimental evidence to support this claim, which is especially critical since the actual behavior of solutions based on reconfigurable technology is highly dependent on the type of application. In this work, a highly optimized implementation of an FPGA accelerator of the novel HyperLCA algorithm has been developed and thoughtfully analyzed in terms of performance and power efficiency. In this regard, a modification of the aforementioned lossy compression solution has also been proposed so that it can be executed efficiently on FPGA devices using fixed-point arithmetic. Single- and multi-core versions of the reconfigurable computing platforms are compared with three GPU-based implementations of the algorithm on as many NVIDIA computing boards: Jetson Nano, Jetson TX2 and Jetson Xavier NX. Results show that the single-core version of our FPGA-based solution fulfils the real-time requirements of a real-life hyperspectral application using a mid-range Xilinx Zynq-7000 SoC chip (XC7Z020-CLG484). Performance levels of the custom hardware accelerator are above the figures obtained by the Jetson Nano and TX2 boards, and power efficiency is higher for smaller sizes of the image block to be processed. To close the performance gap between our proposal and the Jetson Xavier NX, a multi-core version is proposed. The results demonstrate that a solution based on the use of various instances of the FPGA hardware compressor core achieves levels of performance similar to those of the state-of-the-art GPU, with better efficiency in terms of processed frames per watt.

Framework for Design Exploration and Performance Analysis of RF-NoC Manycore Architecture

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea10040037 ◽

2020 ◽

Vol 10 (4) ◽

pp. 37

Author(s):

Habiba Lahdhiri ◽

Jordane Lorandel ◽

Salvatore Monteleone ◽

Emmanuelle Bourdel ◽

Maurizio Palesi

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Routing Algorithm ◽

Long Distance ◽

Promising Solution ◽

And Performance ◽

On Chip ◽

Many Core ◽

High Degree ◽

Real Traffic

The Network-on-chip (NoC) paradigm has been proposed as a promising solution to enable the handling of a high degree of integration in multi-/many-core architectures. Despite their advantages, wired NoC infrastructures are facing several performance issues regarding multi-hop long-distance communications. RF-NoC is an attractive solution offering high performance and multicast/broadcast capabilities. However, managing RF links is a critical aspect that relies on both application-dependent and architectural parameters. This paper proposes a design space exploration framework for OFDMA-based RF-NoC architecture, which takes advantage of both real application benchmarks simulated using Sniper and RF-NoC architecture modeled using Noxim. We adopted the proposed framework to finely configure a routing algorithm, working with real traffic, achieving a delay reduction of up to 45% compared to a wired NoC setup in similar conditions.

WATT MATTERS MOST? DESIGN SPACE EXPLORATION OF HIGH-PERFORMANCE MICROPROCESSORS FOR POWER-PERFORMANCE EFFICIENCY

Journal of Circuits System and Computers ◽

10.1142/s0218126607003721 ◽

2007 ◽

Vol 16 (03) ◽

pp. 357-378

Author(s):

PEDRO TRANCOSO

Keyword(s):

High Performance ◽

Design Space Exploration ◽

High Sensitivity ◽

Clock Frequency ◽

Power Performance ◽

Large Power ◽

Multiple Parameters ◽

And Performance ◽

The One ◽

Power Awareness

Computer systems have evolved significantly in recent years, leading to high-performance systems. This, however, has come at the cost of large power dissipation. As such, power-awareness has become a major factor in processor design. Therefore, it is important to have a complete understanding of the power and performance behavior of all processor components. In order to achieve this, the current work presents a comprehensive analysis of power-performance efficiency for different high-end microarchitecture configurations using three different workloads: multimedia, scientific, and database. The objectives of this work are: (1) to analyze and compare the power-performance efficiency for different workloads; (2) to present a sensitivity analysis for the microarchitecture parameters in order to identify which ones are more sensitive to changes in terms of power-performance efficiency; and (3) to propose power-performance efficient configurations for each workload. The simulation results show that the multimedia workload is the one achieving the highest efficiency but the database workload is the most sensitive to parameter changes. In addition, the results also show that the parameter sensitivity depends significantly on the workload. While the issue width and clock frequency present very high sensitivity across all workloads (approximately 100%), for the database workload, the first-level instruction cache size shows an even higher sensitivity (149%). The correct configuration of these microarchitecture parameters is essential. A careless configuration of a single parameter from a baseline setup may result in a loss of up to 99% of the power-performance efficiency. Finally, carefully tuning multiple parameters simultaneously may result in gains of up to 154% over the power-performance efficiency of the baseline configuration.

ILP Based Power-Aware Test Time Reduction Using On-Chip Clocking in NoC Based SoC

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea9020019 ◽

2019 ◽

Vol 9 (2) ◽

pp. 19 ◽

Cited By ~ 1

Author(s):

Harikrishna Parmar ◽

Usha Mehta

Keyword(s):

Communication Network ◽

Network On Chip ◽

Test Time ◽

Test Scheduling ◽

Power Budget ◽

Power Efficient ◽

Efficient Test ◽

Ip Cores ◽

On Chip ◽

The Individual

Network-on-chip (NoC) based systems-on-chip (SoCs) have been a promising paradigm for core-based systems. It is difficult and challenging to test the individual intellectual property (IP) cores of an SoC under the constraints of test time and test power. By reusing the on-chip communication network of the NoC for the testing of different cores in the SoC, the test time and test cost can be reduced effectively. In this paper, we propose power-aware test scheduling that reuses the existing on-chip communication network. On-chip test clock frequencies are used for power-efficient test scheduling, and an integer linear programming (ILP) model is proposed. This model assigns different frequencies to the NoC cores in such a way that it reduces the test time without crossing the power budget. Experimental results on the ITC’02 benchmark SoCs show that the proposed ILP method gives up to a 50% reduction in test time compared to the existing method.
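The idea of assigning a test clock frequency to each core so that test time shrinks while total power stays within a budget can be illustrated with a toy sketch. This is not the ILP formulation from the paper: it uses exhaustive search instead of an ILP solver, and it assumes that all cores are tested concurrently, that test time is cycles divided by frequency, and that power grows linearly with frequency. The core names, cycle counts, power coefficients, frequency set, and budget are all hypothetical.

```python
from itertools import product

# Hypothetical cores: name -> (test length in cycles, power in mW per MHz).
CORES = {"c1": (1000, 2.0), "c2": (400, 1.0), "c3": (600, 1.5)}
FREQS_MHZ = (50, 100, 200)   # assumed selectable on-chip test clock frequencies
POWER_BUDGET_MW = 500.0      # assumed total test power budget


def schedule(cores, freqs, budget_mw):
    """Try every frequency assignment and keep the fastest one within budget.

    Returns (test_time_us, {core: freq_mhz}), or None if nothing fits.
    """
    names = list(cores)
    best = None
    for choice in product(freqs, repeat=len(names)):
        # Simplified power model: summed power of all cores running at once.
        power = sum(cores[n][1] * f for n, f in zip(names, choice))
        if power > budget_mw:
            continue
        # Concurrent testing: overall time is the slowest core (cycles / MHz = us).
        time_us = max(cores[n][0] / f for n, f in zip(names, choice))
        if best is None or time_us < best[0]:
            best = (time_us, dict(zip(names, choice)))
    return best


if __name__ == "__main__":
    print(schedule(CORES, FREQS_MHZ, POWER_BUDGET_MW))
```

Exhaustive search is only practical for a handful of cores; an ILP solver handles the same kind of objective and constraint at realistic benchmark sizes.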

Large-scale, power-efficient Au/VO2 active metasurfaces for ultrafast optical modulation

Nanophotonics ◽

10.1515/nanoph-2020-0354 ◽

2020 ◽

Vol 10 (2) ◽

pp. 909-918

Author(s):

Tongtong Kang ◽

Zongwei Ma ◽

Jun Qin ◽

Zheng Peng ◽

Weihao Yang ◽

...

Keyword(s):

Self Assembly ◽

Power Efficiency ◽

High Performance ◽

Optical Switching ◽

Large Scale ◽

Colloidal Crystal ◽

Resonance Wavelength ◽

Optical Modulation ◽

Power Efficient ◽

Ultrafast Optical

Active metasurfaces, in which the optical property of a metasurface device can be controlled by external stimuli, have attracted great research interest recently. For optical switching and modulation applications, high-performance active metasurfaces need to show high transparency, high power efficiency, as well as ultrafast switching and large-scale fabrication capability. This paper reports Au/VO2-based active metasurfaces meeting the requirements above. Centimeter-scale Au/VO2 metasurfaces are fabricated by polystyrene sphere colloidal crystal self-assembly. The devices show an optical modulation on-off ratio up to 12.7 dB and insertion loss down to 3.3 dB at 2200 nm wavelength in the static heating experiment, and ΔT/T of 10% in ultrafast pump-probe experiments. In particular, by judiciously aligning the surface plasmon resonance wavelength to the pump wavelength of the femtosecond laser, the enhanced electric field at 800 nm is capable of switching off the extraordinary optical transmission effect at 2200 nm on a 100 fs time scale. Compared to VO2 thin-film samples, the devices also show 50% power reduction for all-optical modulation. Our work provides a practical way to fabricate large-scale and power-efficient active metasurfaces for ultrafast optical modulation.

ENERGY AND POWER EFFICIENT SYSTEM ON CHIP WITH NANOSHEET FET

Journal of Electronics and Informatics - September 2019 ◽

10.36548/jei.2019.1.006 ◽

2019 ◽

Vol 01 (01) ◽

pp. 51-59 ◽

Cited By ~ 5

Author(s):

Mohan Kumar N.

Keyword(s):

Power Dissipation ◽

Cost Efficiency ◽

Power Efficiency ◽

System On Chip ◽

Noise Tolerance ◽

Single Chip ◽

Efficient System ◽

Power Efficient ◽

On Chip ◽

The Cost

As the level of IC integration increases, System on Chip (SoC) design has evolved. This technology comprises several intellectual property blocks on a single chip. With the downsizing of transistors, the traditional elements used impose several challenges such as power dissipation, leakage and so on. These factors put at risk both the cost efficiency of microsystems and the semiconductor industry’s capability to prolong Moore’s law in the nanometer range. This is overcome by the introduction of carbon materials such as nanosheet FET. They are advantageous over the traditional elements in terms of area and power efficiency. We design an energy and power efficient SoC with nanosheet FET that provides noise tolerance and memory optimization.

High Performance and Power Efficient On-Chip Network Designs through Multiple Injection Ports

10.4995/thesis/10251/18235 ◽

2012 ◽

Author(s):

Jesús Camacho Villanueva

Keyword(s):

High Performance ◽

Multiple Injection ◽

Power Efficient ◽

On Chip

Parallel Architectures for MEDLINE Search

Encyclopedia of Healthcare Information Systems ◽

10.4018/978-1-59904-889-5.ch130 ◽

2008 ◽

pp. 1048-1055

Author(s):

Rajendra V. Boppana ◽

Suresh Chalasani ◽

Bob Badgett ◽

Jacqueline A. Pugh

Keyword(s):

High Performance Computing ◽

High Performance ◽

Response Times ◽

Low Cost ◽

Parallel Architecture ◽

Fast Response ◽

Parallel Architectures ◽

Medline Search ◽

Medline Database ◽

Performance Computing

In this article, we describe a parallel architecture for the MEDLINE database integrated with search refinement tools to facilitate accurate and fast responses to search requests by users. The proposed architecture, to be developed by the authors, will use low-cost, high-performance computing clusters consisting of Linux-based personal computers and workstations (i) to provide subsecond response times for individual searches and (ii) to support several concurrent queries from search refinement programs such as SUMSearch.
