BLOCK FLOATING POINT FFT IMPLEMENTATION FOR DMT xDSL SYSTEMS

The development of multiple Discrete Multitone (DMT) Digital Subscriber Line (DSL) flavors on a single platform can benefit considerably from a programmable architecture featuring Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs), especially when fast prototyping is targeted. However, the flexibility assumed to be offered by algorithmic partitioning does not automatically and proportionally simplify the digital signal processing algorithms unless the effects of overflow and saturation in intermediate processing stages are carefully studied. These effects are critical throughout the design process, since the operations involved are nonlinear and affect the most significant bits of the computation. This paper presents an efficient soft-core implementation of a Block Floating Point (BLFP) FFT algorithm, designed for Very high-speed DSL (VDSL) DMT systems and for the full variety of other xDSL DMT flavors, as the latter demand an extended dynamic range to achieve performance that might otherwise be attainable only with costly floating-point chip implementations.
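
As a rough illustration of the block floating point idea named in this abstract (not the paper's soft-core design), the Python sketch below stores a block of samples as integer mantissas that share a single exponent. The function names and the 16-bit mantissa width are assumptions chosen for illustration only.

```python
import math


def to_block_floating_point(samples, mantissa_bits=16):
    """Quantize a block of floats to integer mantissas sharing one exponent.

    Hypothetical helper: the name and default width are illustrative.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return [0] * len(samples), 0
    limit = 2 ** (mantissa_bits - 1) - 1  # largest positive mantissa
    # Smallest shared exponent for which the peak sample still fits.
    exponent = math.ceil(math.log2(peak / limit))
    scale = 2.0 ** exponent
    # Clamp to guard against rounding at the very edge of the range.
    mantissas = [max(-limit, min(limit, round(s / scale))) for s in samples]
    return mantissas, exponent


def from_block_floating_point(mantissas, exponent):
    """Rebuild approximate sample values from mantissas and the shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]


block = [0.5, -0.25, 0.125, 1.0]
mantissas, exponent = to_block_floating_point(block)
restored = from_block_floating_point(mantissas, exponent)
```

In a block floating point FFT, a renormalization of this kind is typically applied between butterfly stages, so that growth in intermediate values is absorbed by the shared exponent instead of overflowing the fixed-width datapath.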

Embedded GPU Implementation for High-Performance Ultrasound Imaging

Electronics ◽

10.3390/electronics10080884 ◽

2021 ◽

Vol 10 (8) ◽

pp. 884

Author(s):

Stefano Rossi ◽

Enrico Boni

Keyword(s):

High Performance ◽

Graphics Processing Unit ◽

Digital Signal ◽

Processing Unit ◽

Embedded Computing ◽

Field Programmable ◽

Peripheral Component Interconnect ◽

Programmable Gate Arrays ◽

Graphics Processing ◽

Signal Processors

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources allowing massive exploitation of parallel computing are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the implementation of the embedded NVIDIA Jetson Xavier AGX module on board ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need to use an external controlling PC.

Implementation of Embedded Floating Point Arithmetic Units on FPGA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.550.126 ◽

2014 ◽

Vol 550 ◽

pp. 126-136

Author(s):

N. Ramya Rani

Keyword(s):

High Speed ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Embedded Computing ◽

Floating Point Arithmetic ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Arithmetic Units ◽

Point Arithmetic

Floating point arithmetic plays a major role in scientific and embedded computing applications. However, the performance of field programmable gate arrays (FPGAs) in floating point applications is poor because of the complexity of floating point arithmetic. Implementing floating point units on FPGAs consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGA. The work aims to achieve high computation speed and to reduce the power needed for evaluating expressions. An application that demands high performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper describes a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed using VHDL simulation in Xilinx software and implemented on VIRTEX and SPARTAN FPGAs.

BIST Architecture using Area Efficient Low Current LFSR for Embedded Memory Testing Applications

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v7.i1.pp1-11 ◽

2018 ◽

Vol 7 (1) ◽

pp. 1

Author(s):

M. Parvathi ◽

N. Vasantha ◽

K. Satya Prasad

Keyword(s):

Systems Design ◽

Digital Signal ◽

Low Area ◽

Gate Arrays ◽

Current Limit ◽

Maximum Current ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Layout Area ◽

Signal Processors

One of the important blocks of a built-in self-test (BIST) controller is the linear-feedback shift register (LFSR), and the speed at which BIST operates depends on the LFSR design. LFSRs can be implemented using field programmable gate arrays (FPGAs) or digital signal processors (DSPs). The speed of the BIST controller is then limited by the FPGAs and DSPs, which may influence other parameters such as overall area, maximum current limit and power dissipation. This paper proposes a technique to achieve an efficient BIST controller by redesigning the LFSR using GDI-based D flip-flops, resulting in low area and low current. Three different techniques for implementing the flip-flops of an efficient LFSR are presented, with the aim of minimizing layout area and lowering the maximum current drawn.
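
For readers unfamiliar with the LFSR block mentioned in this abstract, the following is a minimal behavioral sketch in Python of a generic 4-bit Fibonacci LFSR. It is an assumption-laden illustration: the tap positions, register width and function names are chosen here, and the sketch says nothing about the GDI-based flip-flop circuits that are the paper's actual contribution.

```python
def lfsr_step(state, taps, width):
    """Advance a right-shifting Fibonacci LFSR by one clock.

    taps lists the bit positions XORed together to form the feedback bit,
    which is shifted in at the most significant position.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return (state >> 1) | (feedback << (width - 1))


def lfsr_cycle(seed, taps, width):
    """Collect the states visited from seed until the register returns to it."""
    states = []
    state = seed
    for _ in range(2 ** width):  # a width-bit register has at most 2**width states
        states.append(state)
        state = lfsr_step(state, taps, width)
        if state == seed:
            break
    return states


# Taps (0, 1) give the recurrence b[t+4] = b[t] XOR b[t+1], i.e. x^4 + x + 1.
cycle = lfsr_cycle(seed=0b0001, taps=(0, 1), width=4)
```

With these taps the register visits all 15 nonzero 4-bit states before repeating, which is the property that makes an LFSR useful as a compact test-pattern generator.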

Implementation of a high speed four transmitter space-time encoder using field programmable gate array and parallel digital signal processors

Third IEEE International Workshop on Electronic Design, Test and Applications (DELTA'06) ◽

10.1109/delta.2006.55 ◽

2006 ◽

Cited By ~ 6

Author(s):

P.J. Green ◽

D.P. Taylor

Keyword(s):

Field Programmable Gate Array ◽

High Speed ◽

Digital Signal ◽

Space Time ◽

Digital Signal Processors ◽

Field Programmable ◽

Gate Array ◽

Signal Processors

Challenges in Clock Synchronization for On-Site Coding Digital Beamformer

International Journal of Reconfigurable Computing ◽

10.1155/2017/7802735 ◽

2017 ◽

Vol 2017 ◽

pp. 1-8

Author(s):

Satheesh Bojja Venkatakrishnan ◽

Elias A. Alwan ◽

John L. Volakis

Keyword(s):

Clock Synchronization ◽

Digital Signal ◽

Data Converter ◽

Analog To Digital ◽

Synchronization Errors ◽

Digital To Analog Converters ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Signal Synchronization ◽

Signal Processors

Typical radio frequency (RF) digital beamformers can be highly complex. In addition to a suitable antenna array, they require numerous receiver chains, demodulators, data converter arrays, and digital signal processors. To recover and reconstruct the received signal, synchronization is required since the analog-to-digital converters (ADCs), digital-to-analog converters (DACs), field programmable gate arrays (FPGAs), and local oscillators are all clocked at different frequencies. In this article, we present a clock synchronization topology for a multichannel on-site coding receiver (OSCR) using the FPGA as a master clock to drive all RF blocks. This approach reduces synchronization errors by a factor of 8 compared with a conventional digital beamformer.

A Reconfigurable System Approach to the Direct Kinematics of a 5 D.o.f Robotic Manipulator

International Journal of Reconfigurable Computing ◽

10.1155/2010/727909 ◽

2010 ◽

Vol 2010 ◽

pp. 1-10 ◽

Cited By ~ 6

Author(s):

Diego F. Sánchez ◽

Daniel M. Muñoz ◽

Carlos H. Llanos ◽

José M. Motta

Keyword(s):

High Performance ◽

Degrees Of Freedom ◽

Dynamic Range ◽

Hardware Acceleration ◽

Hardware Architecture ◽

Floating Point ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Transcendental Functions ◽

Direct Kinematics

Hardware acceleration in high performance computer systems is of particular interest for many engineering and scientific applications in which a large number of arithmetic operations and transcendental functions must be computed. In this paper, a hardware architecture for computing the direct kinematics of robot manipulators with 5 degrees of freedom (5 D.o.f) using floating-point arithmetic is presented for 32, 43, and 64 bit-width representations and is implemented in Field Programmable Gate Arrays (FPGAs). The proposed architecture has been developed using several floating-point libraries for arithmetic and transcendental function operators, allowing the designer to select (pre-synthesis) a suitable bit-width representation according to the accuracy and dynamic range, as well as the area, elapsed time and power consumption requirements of the application. Synthesis results demonstrate the effectiveness and high performance of the implemented cores on commercial FPGAs. Simulation results were used to compute the Mean Square Error (MSE), using Matlab as the statistical estimator, validating the correct behavior of the implemented cores. Additionally, the processing time of the hardware architecture was compared with the same formulation implemented in software on the PowerPC (FPGA embedded processor), demonstrating that the hardware architecture is faster than the software implementation by a factor of 1298.

A Generic Reconfigurable Mixed Time and Frequency Domain QAM Transmitter with Forward Error Correction

International Journal of Advances in Telecommunications Electrotechnics Signals and Systems ◽

10.11601/ijates.v6i2.219 ◽

2017 ◽

Vol 6 (2) ◽

pp. 80 ◽

Cited By ~ 2

Author(s):

Shalina Percy Delicia Figuli ◽

Peter Figuli ◽

Alberto Sonnino ◽

Juergen Becker

Keyword(s):

Error Correction ◽

Frequency Domain ◽

Communication Systems ◽

High Speed ◽

Forward Error Correction ◽

Digital Signal ◽

Field Programmable ◽

Hard Limit ◽

Programmable Gate Arrays ◽

Forward Error

In the past three decades, Field Programmable Gate Arrays (FPGAs) have emerged as the backbone of digital signal processing, especially in high-speed communication systems. However, today these devices are clocked below 1 GHz, and improving performance remains a major challenge on all abstraction layers, from system architecture down to physical technology. Considerable research is devoted to methodologies and techniques that can deliver higher throughput at lower operating frequencies. Toward this objective, this paper combines Quadrature Amplitude Modulation (QAM), a mixed time and frequency domain approach, and Forward Error Correction (FEC) to build a generic, scalable FPGA-based QAM transmitter with filter parallelization executed in the mixed domain. The system developed in this paper achieves an effective throughput of 12.8 Gb/s for 256-QAM with 16 parallel inputs at an operating frequency of 201.25 MHz, while an effective throughput of 18.7 Gb/s is realized with 32 parallel inputs at 146 MHz. It thereby offers a promising methodology for applications in which clock frequency is a hard limit.

The Characters of Dual Harmonic Frames of Subspaces and Applications in Signal Processing Theory

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.457-458.731 ◽

2013 ◽

Vol 457-458 ◽

pp. 731-735

Author(s):

Yong Yi Huang ◽

Jian Feng Zhou

Keyword(s):

Signal Processing ◽

Information Science ◽

Digital Signal ◽

General Purpose ◽

Decomposition Scheme ◽

Processing Theory ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Definition Of ◽

Signal Processors

Digital signal processing is the processing of digitized discrete-time sampled signals. Processing is done by general-purpose computers or by digital circuits such as ASICs, field-programmable gate arrays or specialized digital signal processors. Information science focuses on understanding problems from the perspective of the stakeholders involved and then applying information and other technologies as needed. The definition of multiple pseudoframes for subspaces with integer translation is proposed. The notion of a generalized multiresolution structure (GMS) is also introduced. The construction of a generalized multiresolution structure of Paley-Wiener subspaces is investigated. The pyramid decomposition scheme is derived based on a generalized multiresolution structure.

Designing Discrete and Digital Communication Systems

10.1093/oso/9780198860792.003.0010 ◽

2021 ◽

pp. 542-561

Author(s):

Stevan Berber

Keyword(s):

Communication Systems ◽

Digital Signal ◽

Clear Understanding ◽

Gate Arrays ◽

Advantages And Disadvantages ◽

Shift Keying ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Signal Processors ◽

The Relationship

In this chapter, the practical aspects of the design of digital discrete communication systems, primarily digital signal processors and field-programmable gate arrays, are analysed. The systems are presented at the level of block schematics, to address the main issues in their design and discuss the advantages and disadvantages of various designs in digital technology. Designs using quadriphase-shift keying and quadrature amplitude modulation are presented separately. The operation of each system is explained in terms of the theoretical structure of the system, which allows a clear understanding of the relationship between the theoretical model of the system and its practical design. The structures of the first, second, and third generation of discrete transceiver designs are presented and discussed.

LARGE DYNAMIC RANGE RNS SYSTEMS AND THEIR RESIDUE TO BINARY CONVERTERS

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126607003666 ◽

2007 ◽

Vol 16 (02) ◽

pp. 267-286 ◽

Cited By ~ 4

Author(s):

ALEXANDER SKAVANTZOS ◽

MOHAMMAD ABDALLAH ◽

THANOS STOURAITIS

Keyword(s):

High Speed ◽

Dynamic Range ◽

Chinese Remainder Theorem ◽

Digital Signal ◽

Number System ◽

Residue Number System ◽

New Class ◽

Mixed Radix Conversion ◽

Residue Number ◽

Signal Processors

The Residue Number System (RNS) is an integer system appropriate for implementing fast digital signal processors. It can be used for supporting high-speed arithmetic by operating in parallel channels without need for exchanging information among the channels. In this paper, two novel RNS are proposed. First, a new RNS system based on the modulus set $\{2^{n+1}, 2^n - 1, 2^n + 1, 2^n + 2^{(n+1)/2} + 1, 2^n - 2^{(n+1)/2} + 1\}$, $n$ odd, is developed, along with an efficient implementation of its residue-to-weighted converter. The new RNS is a balanced five-modulus system, appropriate for large dynamic ranges. The proposed residue-to-binary converter is fast and hardware efficient and is based on a one's complement multi-operand adder that adds operands of size only 80% of the size dictated by the system's dynamic range. Second, a new class of multi-modulus RNS systems is proposed. These systems are based on sets consisting of two groups of moduli with the modulus product within one group being of the form $2^a(2^b - 1)$, while the modulus product within the other group is of the form $2^c - 1$. Their RNS-to-weighted converters are based on efficient combinations of the Chinese Remainder Theorem and Mixed Radix Conversion decoding techniques. Systems based on four, five, and seven moduli are constructed and analyzed. The new systems allow efficient implementations for their RNS-to-weighted decoders, imply fast and balanced RNS arithmetic, and may achieve large dynamic ranges. The presented residue-to-weighted converters for these systems rely on simple mod $(2^x - 1)$ hardware, which can be easily implemented as one's complement hardware.
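
To make the channel-parallel arithmetic described in this abstract concrete, here is a small Python sketch. It assumes the five-modulus set has the form $\{2^{n+1}, 2^n - 1, 2^n + 1, 2^n + 2^{(n+1)/2} + 1, 2^n - 2^{(n+1)/2} + 1\}$ for odd $n$, and it decodes with the textbook Chinese Remainder Theorem, not with the paper's adder-based converter. All function names are hypothetical.

```python
from math import gcd, prod


def moduli_set(n):
    """Five-modulus set for odd n (assumed form, see the lead-in)."""
    if n % 2 != 1:
        raise ValueError("n must be odd")
    h = (n + 1) // 2
    return [2 ** (n + 1), 2 ** n - 1, 2 ** n + 1,
            2 ** n + 2 ** h + 1, 2 ** n - 2 ** h + 1]


def to_residues(x, moduli):
    """Encode an integer as one residue per channel."""
    return [x % m for m in moduli]


def from_residues(residues, moduli):
    """Decode with the textbook Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % big_m


moduli = moduli_set(3)  # [16, 7, 9, 13, 5], dynamic range 65520
a, b = 1234, 53
# Each channel multiplies independently; no carries cross between channels.
product_residues = [(ra * rb) % m for ra, rb, m in
                    zip(to_residues(a, moduli), to_residues(b, moduli), moduli)]
decoded = from_residues(product_residues, moduli)
```

The decoded value equals `a * b` only because the true product (65402) is below the dynamic range of 65520; larger results would wrap around modulo the product of the moduli.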
