Design Space Exploration on High-Order QAM Demodulation Circuits: Algorithms, Arithmetic and Approximation Techniques

Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 39
Author(s):  
Ioannis Stratakos ◽  
Vasileios Leon ◽  
Giorgos Armeniakos ◽  
George Lentaris ◽  
Dimitrios Soudris

Every new generation of wireless communication standard aims to improve the overall performance and quality of service (QoS) compared to previous generations. Increased data rates, numbers and capabilities of connected devices, new applications, and higher data volume transfers are some of the key parameters of interest. To satisfy these increased requirements, the synergy between wireless technologies and optical transport will dominate 5G network topologies. This work focuses on a fundamental digital function in an orthogonal frequency-division multiplexing (OFDM) baseband transceiver architecture and aims at improving the throughput and circuit complexity of this function. Specifically, we consider high-order QAM demodulation and apply approximation techniques to achieve our goals. We adopt approximate computing as a design strategy to exploit the error resiliency of the QAM function and deliver significant gains in critical performance metrics. In particular, we explore four demodulation algorithms, develop accurate floating- and fixed-point circuits in VHDL, and further explore the effects of introducing approximate arithmetic components. For our test case, we consider 64-QAM demodulators; in terms of accuracy, the results suggest that the most promising design provides bit error rates (BER) ranging from 10⁻¹ to 10⁻⁴ for SNR 0–14 dB. Targeting a Xilinx Zynq UltraScale+ ZCU106 (XCZU7EV) FPGA device, the approximate circuits achieve up to a 98% reduction in LUT utilization compared to the accurate floating-point model of the same algorithm, and up to a 122% increase in operating frequency. In terms of power consumption, our most efficient circuit configurations consume 0.6–1.1 W when operating at their maximum clock frequency.
Our results show that if the objective is high BER accuracy, the prevailing solution is the approximate LLR algorithm configured with fixed-point arithmetic and 8-bit truncation, which provides an 81% decrease in LUTs, a 13% increase in frequency, and sustains a throughput of 323 Msamples/s.
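To make the winning configuration concrete, the sketch below shows a max-log ("approximate LLR") soft demapper for one axis of a Gray-coded square 64-QAM constellation, followed by fixed-point truncation of the LLRs. The Gray mapping and the 8-bit/4-fraction-bit format are illustrative assumptions, not the paper's exact circuit parameters.

```python
import numpy as np

# Gray-coded 8-PAM levels for one axis of square 64-QAM (3 bits/axis).
# This particular Gray mapping is an illustrative assumption.
LEVELS = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)
GRAY = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]

def maxlog_llr(y, noise_var=1.0):
    """Approximate (max-log) LLRs for the 3 bits carried by one axis."""
    d2 = (y - LEVELS) ** 2                     # squared distances to levels
    llrs = []
    for b in range(3):
        bit = [(g >> (2 - b)) & 1 for g in GRAY]
        d0 = min(d for d, v in zip(d2, bit) if v == 0)
        d1 = min(d for d, v in zip(d2, bit) if v == 1)
        llrs.append((d0 - d1) / noise_var)     # >0 favours bit value 1
    return llrs

def truncate_fixed(llr, frac_bits=4, total_bits=8):
    """Quantize an LLR to a signed fixed-point value by truncation."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, int(np.floor(llr * scale)))) / scale
```

Replacing exact Euclidean metrics with the max-log difference, then truncating the result to a narrow fixed-point word, is what trades BER accuracy for the LUT and frequency gains quoted above.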

2022 ◽  
Vol 27 (2) ◽  
pp. 1-33
Author(s):  
Zahra Ebrahimi ◽  
Dennis Klar ◽  
Mohammad Aasim Ekhtiyar ◽  
Akash Kumar

The rapid evolution of error-resilient programs, intertwined with their quest for high throughput, has motivated the use of Single Instruction, Multiple Data (SIMD) components in Field-Programmable Gate Arrays (FPGAs). In particular, to exploit the error resiliency of such applications, the cross-layer approximation paradigm has recently gained traction; its ultimate goal is to efficiently exploit approximation potential across layers of abstraction. From circuit to application level, valuable studies have proposed various approximation techniques, albeit with four drawbacks. First, most approximate multipliers and dividers operate only in SISD mode. Second, imprecise units are often substituted in merely a single kernel of a multi-kernel application, with an end-to-end analysis of Quality of Results (QoR) but not of the gained performance. Third, state-of-the-art (SoA) strategies neglect the fact that each kernel contributes differently to the end-to-end QoR and performance metrics; therefore, they lack a generic methodology for adjusting the approximation knobs to maximize performance gains under a user-defined quality constraint. Finally, multi-level techniques are not efficiently supported, from application to architecture to circuit level, in a cohesive cross-layer hierarchy. In this article, we propose Plasticine, a cross-layer methodology for multi-kernel applications, which addresses the aforementioned challenges by efficiently utilizing the synergistic effects of a chain of techniques across layers of abstraction. To this end, we propose an application sensitivity analysis and a heuristic that tailor the precision of the constituent kernels of the application by finding the most tolerable degree of approximation for each of the consecutive kernels, while also satisfying the ultimate user-defined QoR.
The chain of approximations is also effectively enabled in a cross-layer hierarchy, from application to architecture to circuit level, through the plasticity of SIMD multiplier-dividers, each supporting dynamic precision variability along with hybrid functionality. End-to-end evaluations of Plasticine on three multi-kernel applications employed in bio-signal processing, image processing, and moving-object tracking for Unmanned Air Vehicles (UAVs) demonstrate 41%–64%, 39%–62%, and 70%–86% improvements in area, latency, and Area-Delay Product (ADP), respectively, over 32-bit fixed precision, with negligible loss in QoR. To springboard future research in the reconfigurable and approximate computing communities, our implementations will be available and open-sourced at https://cfaed.tu-dresden.de/pd-downloads.
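The two ideas the abstract combines, SIMD lane splitting and precision as an approximation knob, can be emulated in a few lines. The sketch below is a behavioral toy, not Plasticine's actual multiplier-divider: it packs two independent 16-bit multiplies into one 32-bit datapath and exposes operand truncation as a tunable knob; the widths and truncation scheme are assumptions.

```python
def pack(hi, lo):
    """Pack two 16-bit unsigned values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def simd_mult16(a_packed, b_packed, trunc_bits=0):
    """Two independent 16-bit multiplies sharing one packed 32-bit word,
    with optional operand truncation as a simple approximation knob."""
    results = []
    for shift in (0, 16):                      # low lane, then high lane
        a = (a_packed >> shift) & 0xFFFF
        b = (b_packed >> shift) & 0xFFFF
        a_t = (a >> trunc_bits) << trunc_bits  # zero the truncated LSBs
        b_t = (b >> trunc_bits) << trunc_bits
        results.append(a_t * b_t)
    return tuple(results)                      # (lo_product, hi_product)
```

Raising `trunc_bits` per kernel is the kind of per-kernel precision adjustment the sensitivity analysis would steer.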


2015 ◽  
Vol 2015 ◽  
pp. 1-20
Author(s):  
Gongyu Wang ◽  
Greg Stitt ◽  
Herman Lam ◽  
Alan George

Field-programmable gate arrays (FPGAs) provide a promising technology that can improve performance of many high-performance computing and embedded applications. However, unlike software design tools, the relatively immature state of FPGA tools significantly limits productivity and consequently prevents widespread adoption of the technology. For example, the lengthy design-translate-execute (DTE) process often must be iterated to meet the application requirements. Previous works have enabled model-based, design-space exploration to reduce DTE iterations but are limited by a lack of accurate model-based prediction of key design parameters, the most important of which is clock frequency. In this paper, we present a core-level modeling and design (CMD) methodology that enables modeling of FPGA applications at an abstract level and yet produces accurate predictions of parameters such as clock frequency, resource utilization (i.e., area), and latency. We evaluate CMD’s prediction methods using several high-performance DSP applications on various families of FPGAs and show an average clock-frequency prediction error of 3.6%, with a worst-case error of 20.4%, compared to the best of existing high-level prediction methods, 13.9% average error with 48.2% worst-case error. We also demonstrate how such prediction enables accurate design-space exploration without coding in a hardware-description language (HDL), significantly reducing the total design time.
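The core idea of composing a design prediction from pre-characterized cores can be sketched as below. This is a drastic simplification of the CMD methodology (which also accounts for interconnect and device effects); the field names and the min/sum composition rules are illustrative assumptions.

```python
# A design's predicted clock frequency is bounded by its slowest core,
# while resource counts and pipeline latencies accumulate across cores.
def predict_design(cores):
    """Compose per-core characterizations into design-level predictions."""
    return {
        "fmax_mhz": min(c["fmax_mhz"] for c in cores),
        "luts": sum(c["luts"] for c in cores),
        "latency_cycles": sum(c["latency_cycles"] for c in cores),
    }
```

Even this crude composition shows why core-level characterization enables design-space exploration without HDL coding: swapping a core swaps its characterized numbers, and the design-level estimate updates immediately.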


2018 ◽  
Vol 2018 ◽  
pp. 1-11
Author(s):  
Shingo Yoshizawa ◽  
Takashi Saito ◽  
Yusaku Mabuchi ◽  
Tomoya Tsukui ◽  
Shinichi Sawada

Reliable underwater acoustic communication is in demand for autonomous underwater vehicles (AUVs) and remotely operated underwater vehicles (ROVs). Orthogonal frequency-division multiplexing (OFDM) is robust against multipath interference; however, it is sensitive to Doppler. Doppler compensation is performed in two steps: resampling and residual carrier frequency offset (CFO) compensation. This paper describes an improved resampling technique. The conventional method assumes a constant Doppler shift during a communication frame and cannot cope with Doppler fluctuation, where the relative speed between the transmitter and receiver units fluctuates. We propose a parallel resampling technique in which the resampling range is extended according to the measured Doppler standard deviation. The effectiveness of parallel resampling has been confirmed in a communication experiment: the proposed method shows better bit error rates (BERs) and frame error rates (FERs) than the conventional method.
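A minimal sketch of the parallel-resampling idea, assuming the branch rates span the measured Doppler mean ± 2σ and the best branch is selected by preamble correlation (the branch count, span, and selection metric are illustrative, not the paper's exact scheme):

```python
import numpy as np

def parallel_resample(rx, preamble, dopp_mean, dopp_std, n_branches=5, c=1500.0):
    """Resample rx in parallel branches whose rates span the measured
    Doppler mean +/- 2 sigma; keep the branch that correlates best
    with the known preamble."""
    best, best_metric = None, -np.inf
    for v in np.linspace(dopp_mean - 2 * dopp_std,
                         dopp_mean + 2 * dopp_std, n_branches):
        factor = 1.0 + v / c                      # relative-speed time scaling
        t = np.arange(int(len(rx) / factor)) * factor
        branch = np.interp(t, np.arange(len(rx)), rx)
        metric = np.abs(np.correlate(branch[:len(preamble)], preamble)).max()
        if metric > best_metric:
            best, best_metric = branch, metric
    return best
```

Widening the branch span with the measured standard deviation is what lets the receiver follow a fluctuating, rather than constant, Doppler shift.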


2019 ◽  
Vol 2019 ◽  
pp. 1-13
Author(s):  
Tushar Mathur ◽  
Gokhan Sahin ◽  
Donald R. Ucci

Elastic optical networks (EONs) have emerged to provide higher spectrum efficiency than traditional Dense Wavelength-Division Multiplexing (DWDM) by utilizing enabling technologies such as a flexible spectrum grid, Orthogonal Frequency-Division Multiplexing (OFDM), and distance-adaptive rate and modulation. The choice of control plane is an important consideration when deploying any new technology, especially in optical networks. This paper considers generic distributed and centralized spectrum assignment policies in conjunction with the accompanying connection set-up signaling protocols in EONs. A network simulator for Generalized Multiprotocol Label Switching (GMPLS) was developed with Forward Reservation Protocol and Backward Reservation Protocol signaling methods. These signaling techniques are used with the First Fit (FF) and Random Fit (RF) Routing and Spectrum Allocation (RSA) algorithms. The paper discusses control-plane decisions (centralized and distributed architectures) under busy-hour and normal network conditions and presents a comprehensive analysis of key performance metrics such as connection success rate, connection establishment time, and capacity requirement.
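Of the two RSA policies named above, First Fit is the simpler to state: place each demand in the lowest-indexed contiguous run of free slots that is wide enough. A minimal single-link sketch (the boolean-list spectrum representation is an illustrative simplification; real RSA must also satisfy continuity across a route):

```python
def first_fit(link_spectrum, demand_slots):
    """First-Fit spectrum allocation on one link: return the start index
    of the first contiguous run of free slots that fits the demand,
    or None if no such run exists. True marks an occupied slot."""
    run = 0
    for i, busy in enumerate(link_spectrum):
        run = 0 if busy else run + 1
        if run == demand_slots:
            return i - demand_slots + 1
    return None
```

Random Fit differs only in choosing uniformly among all runs that fit, which spreads fragmentation differently and interacts with the forward/backward reservation signaling the paper evaluates.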


2007 ◽  
Vol 16 (03) ◽  
pp. 357-378
Author(s):  
PEDRO TRANCOSO

Computer systems have evolved significantly in recent years, leading to high-performance systems. This, however, has come at the cost of large power dissipation. As such, power-awareness has become a major factor in processor design. It is therefore important to have a complete understanding of the power and performance behavior of all processor components. To achieve this, the current work presents a comprehensive analysis of power-performance efficiency for different high-end microarchitecture configurations using three different workloads: multimedia, scientific, and database. The objectives of this work are: (1) to analyze and compare the power-performance efficiency of the different workloads; (2) to present a sensitivity analysis for the microarchitecture parameters, identifying which are most sensitive to changes in terms of power-performance efficiency; and (3) to propose power-performance-efficient configurations for each workload. The simulation results show that the multimedia workload achieves the highest efficiency, but the database workload is the most sensitive to parameter changes. In addition, the results show that parameter sensitivity depends significantly on the workload. While issue width and clock frequency present very high sensitivity across all workloads (approximately 100%), for the database workload the first-level instruction cache size shows an even higher sensitivity (149%). The correct configuration of these microarchitecture parameters is essential: a careless configuration of a single parameter from a baseline setup may result in a loss of power-performance efficiency of up to 99%. Finally, carefully tuning multiple parameters simultaneously may yield gains of up to 154% over the power-performance efficiency of the baseline configuration.
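One way to read the percentage figures above is as the maximum relative deviation of the efficiency metric when a single parameter is swept off its baseline. A one-liner capturing that reading (the exact definition used in the study may differ; this is an illustrative interpretation):

```python
def sensitivity(eff_baseline, eff_variants):
    """Max relative deviation in a power-performance efficiency metric
    (e.g., performance per watt) across one parameter's settings."""
    return max(abs(e - eff_baseline) / eff_baseline for e in eff_variants)
```

Under this reading, a sensitivity of "approximately 100%" means some setting of the parameter roughly doubles or zeroes out the baseline efficiency.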


Author(s):  
Madan Mohan Dabbeeru ◽  
Amitabha Mukerjee

Designers who are experts in a given design domain are well known to be able to immediately focus on "good designs," suggesting that they may have learned additional constraints while exploring the design space based on some functional aspects. These constraints, which are often implicit, result in a redefinition of the design space, and may be crucial for discovering chunks or interrelations among the design variables. Here we propose a machine-learning approach for discovering such constraints in supervised design tasks. We develop models for specifying design function in situations where the design has a given structure or embodiment, in terms of a set of performance metrics that evaluate a given design. The functionally feasible regions, which are those parts of the design space that demonstrate high levels of performance, can then be learned using any general-purpose function approximator. We demonstrate this process using examples from the design of simple locking mechanisms, and, as in human experience, we show that the quality of the constraints learned improves with greater exposure to the design space. Next, we consider changing the embodiment and suggest that similar embodiments may have similar abstractions. To explore convergence, we also investigate the variability in time and error rates where the experiential patterns are significantly different. In the process, we also consider the situation where certain functionally feasible regions may encode lower-dimensional manifolds and how this may relate to cognitive chunking.
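The "any general-purpose function approximator" claim can be illustrated with the simplest possible one. The sketch below labels sampled 2-D designs as feasible when a performance metric clears a threshold, then learns the feasible region with a 1-nearest-neighbour approximator; the toy metric and threshold are stand-ins for a real locking-mechanism evaluation, not the paper's actual setup.

```python
import numpy as np

def performance(x):
    """Toy performance metric on the unit square, peaked at (0.5, 0.5)."""
    return np.exp(-8 * ((x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2))

def learn_feasible(n_samples=500, threshold=0.5, seed=0):
    """Sample the design space, label high-performance points feasible,
    and return a 1-NN predictor for the feasible region."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, 2))
    y = np.array([performance(x) >= threshold for x in X])
    def predict(q):
        # Copy the label of the closest labeled design sample.
        return y[np.argmin(np.linalg.norm(X - np.asarray(q), axis=1))]
    return predict
```

Raising `n_samples` mirrors the paper's observation that learned constraints improve with greater exposure to the design space.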


Author(s):  
Shensheng Tang ◽  
Yi Xie

Power line communication (PLC) is a promising technique for information transmission over existing power lines. We analytically model a finite-source PLC network subject to channel noise (disturbance) and evaluate its call-level performance through a queueing-theoretic framework. The proposed PLC network model consists of a base station (BS), which is located at a transformer station and connected to the backbone communication networks, and a number of subscriber stations that are interconnected with each other and with the BS via the power line transmission medium. An orthogonal frequency-division multiplexing (OFDM)-based transmission technique is assumed to provide the transmission channels in a frequency spectrum. The channels are subject to failure during service due to disturbance. We determine the steady-state solution of the proposed model and derive a set of performance metrics of interest. Numerical and simulation results are presented to show the derived metrics with respect to different system parameters. The proposed modeling method can be used for the evaluation and design of future PLC networks.
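The flavor of such a finite-source steady-state solution can be shown with a stripped-down birth-death model: N subscriber stations contending for c OFDM channels, each idle station generating calls at rate λ and holding a channel for mean time 1/μ. The channel-failure dimension of the paper's model is omitted here for brevity, so this is a simplified illustrative sketch, not the paper's actual model.

```python
from math import comb

def busy_probability(N, c, lam, mu):
    """Steady-state probability that all c channels are busy in a
    finite-source loss system with N subscribers (Engset-style
    birth-death chain; channel failures omitted for brevity)."""
    rho = lam / mu                                   # offered load per idle source
    p = [comb(N, k) * rho ** k for k in range(c + 1)]  # unnormalized state probs
    return p[c] / sum(p)
```

Metrics like blocking probability then follow from this distribution, and adding disturbance-driven channel failures enlarges the state space without changing the overall approach.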


2019 ◽  
Vol 2019 ◽  
pp. 1-18
Author(s):  
Roberto Giorgi ◽  
Farnam Khalili ◽  
Marco Procaccini

Translating a system requirement into a low-level representation (e.g., register transfer level or RTL) is the typical goal of the design of FPGA-based systems. However, the Design Space Exploration (DSE) needed to identify the final architecture may be time consuming, even when using high-level synthesis (HLS) tools. In this article, we illustrate our hybrid methodology, which uses a frontend for HLS so that the DSE is performed more rapidly at a higher level of abstraction, but without losing accuracy, thanks to the HP-Labs COTSon simulation infrastructure in combination with our DSE tools (MYDSE tools). In particular, this methodology proved useful for achieving an appropriate design of a whole system in a shorter time than designing everything directly in HLS. Our motivating problem was to deploy a novel execution model called data-flow threads (DF-Threads) running on yet-to-be-designed hardware. For that goal, directly using HLS was too premature in the design cycle. Therefore, a key point of our methodology consists of defining the first prototype in our simulation framework and gradually migrating the design into the Xilinx HLS flow after validating the key performance metrics of our novel system in the simulator. To explain this workflow, we first use a simple driving example consisting of the modelling of a two-way associative cache. We then explain how we generalized this methodology and describe the types of results that we were able to analyze in the AXIOM project, which helped us reduce the development time from months/weeks to days/hours.
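The driving example mentioned above, a two-way set-associative cache, is exactly the kind of component that is first validated behaviorally in a simulator before being migrated to HLS. A minimal behavioral model with LRU replacement (the geometry is an illustrative choice, not the article's exact configuration):

```python
class TwoWayCache:
    """Behavioral model of a two-way set-associative cache with LRU
    replacement; sizes are illustrative."""
    def __init__(self, n_sets=64, line_bytes=32):
        self.n_sets, self.line = n_sets, line_bytes
        self.ways = [[None, None] for _ in range(n_sets)]  # [LRU, MRU] tags

    def access(self, addr):
        """Return True on hit, False on miss (filling the line either way)."""
        tag, idx = divmod(addr // self.line, self.n_sets)
        s = self.ways[idx]
        if tag in s:                 # hit: promote to MRU position
            s.remove(tag)
            s.append(tag)
            return True
        s.pop(0)                     # miss: evict the LRU way
        s.append(tag)
        return False
```

Hit/miss traces from such a model give the performance metrics to validate before committing the design to the HLS flow.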


2021 ◽  
Author(s):  
Taha Mehrabi Shahsavari

A new differential time-based architecture for use in serial communication data links is presented in this thesis, the main idea of which involves transmitting the difference between the input clock signal and the data signal to the receiver. A time-to-digital converter (TDC) is then used to demodulate the data from the differential pulse-position-modulated signal. The proposed design demonstrates an improvement in bandwidth and simplifies the circuit complexity of currently used serializer/deserializers (SerDes). Additionally, a testability feature covering different stuck-at faults was proposed for the transmitter side of the architecture. The complete design was tested in TSMC 65 nm CMOS technology; it achieved a data rate of 10 Gbps running at an input clock frequency of 1.25 GHz. Moreover, a complete study of the different components of a time-mode transceiver architecture was performed, during which different design implementations of TDCs and phase-locked loops (PLLs) were thoroughly investigated. Finally, various channel-imposed factors that affect signal integrity were studied, and mitigation methods from both signal and circuit points of view were investigated.
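The receive-side idea can be sketched in a few lines: a TDC quantizes the time offset between a clock edge and the data pulse, and the resulting code is the demodulated symbol. The LSB size and code width below are illustrative assumptions, not the thesis's actual parameters.

```python
def tdc_demodulate(t_clock_edge_ps, t_data_edge_ps, lsb_ps=25.0, bits=3):
    """Quantize the clock-to-data time offset (picoseconds) into a TDC
    code, saturating at the code range limits."""
    code = int((t_data_edge_ps - t_clock_edge_ps) // lsb_ps)
    return max(0, min((1 << bits) - 1, code))
```

The achievable data rate then scales with how many bits each pulse position encodes per clock period, which is why the 10 Gbps link can run from a 1.25 GHz input clock.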

