digital signal
Recently Published Documents





2022 ◽  
Vol 15 (1) ◽  
pp. 1-30
Seyedramin Rasoulinezhad ◽  
Esther Roorda ◽  
Steve Wilton ◽  
Philip H. W. Leong ◽  
David Boland

The underlying goal of FPGA architecture research is to devise flexible substrates that implement a wide variety of circuits efficiently. Contemporary FPGA architectures have been optimized to support networking, signal processing, and image processing applications through high-precision digital signal processing (DSP) blocks. The recent emergence of machine learning has created a new set of demands characterized by (1) higher computational density and (2) low-precision arithmetic requirements. With the goal of exploring this new design space in a methodical manner, we first propose a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations, which covers many basic linear algebra primitives and standard deep neural network (DNN) kernels. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then proposed together with a family of new embedded blocks, called MLBlocks. An MLBlock instance includes several multiply-accumulate units connected via flexible routing, where each configuration performs a few parallel dot-products in a systolic array fashion. This architecture is parameterized with support for different data movements, reuse, and precisions, utilizing a columnar arrangement that is compatible with existing FPGA architectures. On synthetic benchmarks, we demonstrate that for 8-bit arithmetic, MLBlocks offer 6× improved performance over the commercial Xilinx DSP48E2 architecture with smaller area and delay, and that for time-multiplexed 16-bit arithmetic, they achieve 2× higher performance per area with the same area and frequency. All source code and data, along with documents to reproduce all the results in this article, are available at .
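
The "nested loops over MAC operations" formulation mentioned in the abstract can be illustrated with a plain matrix product. The sketch below is only a software illustration under that reading; the function names `mac` and `matmul_mac` are hypothetical and do not come from the paper, and nothing here models the MLBlock hardware itself.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def matmul_mac(A, B):
    """Matrix product written as three nested loops over MAC operations.

    A is a rows x inner list of lists, B is inner x cols.
    The innermost loop is a dot product, the kind of kernel the
    abstract says a compute block would evaluate in parallel.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):          # loop over output rows
        for j in range(cols):      # loop over output columns
            acc = 0
            for k in range(inner): # dot product as repeated MACs
                acc = mac(acc, A[i][k], B[k][j])
            C[i][j] = acc
    return C
```

For example, `matmul_mac([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. Convolutions and fully connected DNN layers can be written in the same loop-nest form with more loop levels.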

2022 ◽  
Vol 15 (3) ◽  
pp. 1-25
S. Rasoul Faraji ◽  
Pierre Abillama ◽  
Kia Bazargan

Multipliers are used in virtually all Digital Signal Processing (DSP) applications such as image and video processing. Multiplier efficiency has a direct impact on the overall performance of such applications, especially when real-time processing is needed, as in 4K video processing, or where hardware resources are limited, as in mobile and IoT devices. We propose a novel, low-cost, low-energy, and high-speed approximate constant coefficient multiplier (CCM) using a hybrid binary-unary encoding method. The proposed method implements a CCM using simple routing networks with no logic gates in the unary domain, which results in more efficient multipliers, on average, than Xilinx LogiCORE IP CCMs and table-based KCM CCMs (FloPoCo). We evaluate the proposed multipliers on a 2-D discrete cosine transform (DCT), a common DSP module. Post-routing FPGA results show that the proposed multipliers improve the area, area × delay, power consumption, and energy-delay product of a 2-D DCT by 30%, 33%, 30%, and 31% on average, respectively. Moreover, the throughput of the proposed 2-D DCT is on average 5% higher than that of the binary architecture implemented using table-based KCM CCMs. We also show that our method has fewer routability issues than binary implementations when implementing a DCT core.
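
For readers unfamiliar with constant coefficient multipliers, the sketch below shows the conventional binary shift-and-add idea: because the coefficient is fixed, the multiplication reduces to a fixed set of shifts and additions. This is not the hybrid binary-unary method proposed in the paper, and the name `make_ccm` is a hypothetical one chosen for illustration.

```python
def make_ccm(coefficient):
    """Build a multiplier for one fixed, non-negative integer coefficient.

    The set bits of the coefficient are found once, up front; each
    multiplication then uses only shifts and additions.
    """
    if coefficient < 0:
        raise ValueError("this sketch only handles non-negative coefficients")
    # Positions of the 1 bits in the coefficient, e.g. 10 = 0b1010 -> [1, 3]
    shifts = [bit for bit in range(coefficient.bit_length())
              if (coefficient >> bit) & 1]

    def multiply(x):
        # x * coefficient as a sum of shifted copies of x
        return sum(x << s for s in shifts)

    return multiply
```

For example, `make_ccm(10)(7)` evaluates `(7 << 1) + (7 << 3)` and returns `70`.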

2022 ◽  
Vol 15 (2) ◽  
pp. 1-29
Paolo D'Alberto ◽  
Victor Wu ◽  
Aaron Ng ◽  
Rahul Nimaiyar ◽  
Elliott Delaye ◽  

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, on-chip memory hierarchy, and numerical precision. The design can produce a scaled-down processor for embedded devices, be replicated to produce more cores for larger devices, or be resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, which is close to the Digital Signal Processing maximum frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224 × 224 to 2048 × 1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment and optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide work between CPU cores and FPGAs. Note that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we can achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, we provide solutions faster than any previous FPGA-based solutions and comparable to any other off-the-shelf solutions.

Ibtissem Wali ◽  
Amina Kessentini ◽  
Mohamed Ali Ben Ayed ◽  
Nouri Masmoudi ◽  

The newest programmable processor technologies, such as multicore Digital Signal Processors (DSPs), offer a promising solution for overcoming the complexity of real-time video encoding. In this paper, the SHVC video encoder is implemented on a single core of the eight-core TMS320C6678 DSP for input video sequences at Common Intermediate Format (CIF) resolution (352×288). Performance optimization of the SHVC encoder reached up to 41% compared to its reference software, enabling a real-time implementation of the SHVC encoder for CIF input video sequences. The proposed SHVC implementation was evaluated with different quantization parameters (QP). Several experimental tests confirmed real-time encoding performance on the TMS320C6678.

2022 ◽  
Vol 3 ◽  
Rhoda Au ◽  
Vijaya B. Kolachalama ◽  
Ioannis C. H. Paschalidis

“Digital biomarker” is a term broadly and indiscriminately applied and often limited in its conceptualization to mimic well-established biomarkers as defined and approved by regulatory agencies such as the United States Food and Drug Administration (FDA). There is a practical urgency to revisit the definition of a digital biomarker and expand it beyond current methods of identification and validation. Restricting the promise of digital technologies within the realm of currently defined biomarkers creates a missed opportunity. A whole new field of prognostic and early diagnostic digital biomarkers driven by data science and artificial intelligence can break the current cycle of high healthcare costs and low health quality that is being driven by today's chronic disease detection and treatment approaches. This new class of digital biomarkers will be dynamic and require developing new FDA approval pathways and next-generation gold standards.

Inventions ◽  
2022 ◽  
Vol 7 (1) ◽  
pp. 12
Qi Zhang ◽  
Wenhui Pei

The digital signal processing (DSP) processor-in-the-loop tests based on automatic code generation technology are studied. Firstly, the idea of model-based design is introduced, and the principle and method of embedded code automatic generation technology are analyzed by taking the automatic code generation of the DSP control algorithm for pulse width modulation (PWM) output as an example. Then, the control system model is established on MATLAB/Simulink. After verifying the model through simulation, the target board platform is established with DSP as the core processor, and the automatically generated code is tested by the processor-in-the-loop (PIL). The results show that the technology greatly shortens the development cycle of the project, improves the robustness and consistency of the control code, and can be widely used in the complex algorithm development process of the controller, from intelligent design and modeling to implementation.

Computers ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 11
Padmanabhan Balasubramanian ◽  
Raunaq Nayar ◽  
Okkar Min ◽  
Douglas L. Maskell

Approximate arithmetic circuits are an attractive alternative to accurate arithmetic circuits because they have significantly reduced delay, area, and power, albeit at the cost of some loss in accuracy. By keeping errors due to approximate computation within acceptable limits, approximate arithmetic circuits can be used for various practical applications such as digital signal processing, digital filtering, low-power graphics processing, neuromorphic computing, and hardware realization of neural networks for artificial intelligence and machine learning. The degree of approximation that can be incorporated into an approximate arithmetic circuit tends to vary depending on the error resiliency of the target application. Given this, the manual coding of approximate arithmetic circuits corresponding to different degrees of approximation in a hardware description language (HDL) may be a cumbersome and time-consuming process, more so when the circuit is big. Therefore, a software tool that can automatically generate approximate arithmetic circuits of any size corresponding to a desired accuracy would not only aid the design flow but also help to improve a designer’s productivity by speeding up the circuit/system development. In this context, this paper presents ‘Approximator’, which is a software tool developed to automatically generate approximate arithmetic circuits based on a user’s specification. Approximator can automatically generate Verilog HDL code for approximate adders and multipliers of any size based on the novel approximate arithmetic circuit architectures proposed by us. The Verilog HDL code output by Approximator can be used for synthesis in an FPGA or ASIC (standard cell based) design environment. Additionally, the tool can perform error and accuracy analyses of approximate arithmetic circuits. The salient features of the tool are illustrated through some example screenshots captured during different stages of the tool use. Approximator has been made open-access on GitHub for the benefit of the research community, and the tool documentation is provided for the user’s reference.
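
To make the accuracy trade-off concrete, the sketch below models one simple, generic approximation: the low-order bits of the sum are formed with a bitwise OR (no carry chain), while the high-order bits are added exactly. This is an assumption chosen for illustration and is not one of the architectures generated by Approximator; the names `approx_add` and `mean_error_distance` are hypothetical.

```python
def approx_add(a, b, width=8, approx_bits=3):
    """Approximate sum of two unsigned `width`-bit integers.

    The lowest `approx_bits` bits are approximated by a | b, and no
    carry is passed from the low part to the high part. The remaining
    high bits are added exactly.
    """
    mask = (1 << approx_bits) - 1
    low = (a | b) & mask                                    # approximate part
    high = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits  # exact part
    return (high | low) & ((1 << (width + 1)) - 1)          # keep width+1 result bits


def mean_error_distance(width, approx_bits):
    """Average |exact - approximate| over all pairs of `width`-bit inputs."""
    n = 1 << width
    total = 0
    for a in range(n):
        for b in range(n):
            total += abs((a + b) - approx_add(a, b, width, approx_bits))
    return total / (n * n)
```

With `approx_bits=0` the adder is exact. With one approximated bit, the only error occurs when both least significant bits are 1, so `mean_error_distance(4, 1)` is `0.25`. Raising `approx_bits` shortens the carry chain at the cost of a larger mean error, which is the kind of trade-off an error analysis has to quantify.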

2022 ◽  
Vol 2022 ◽  
pp. 1-11
Hongyan Mao

Traditional electronic countermeasure incident intelligence processing suffers from low accuracy, low stability, and long processing time. A method of electronic countermeasure incident intelligence processing based on communication technology is proposed. First, an integrated digital signal receiver is used to identify the various modulation methods in a complex signal environment, which facilitates the processing and transmission of communication signals. Then, an electronic countermeasure intelligence processing framework is established with Esper as the core, and situation data flows to the processing conclusion through a Redis cache using the PROTOBUF interchange format, which realizes the intelligent processing of electronic countermeasure incidents. The experimental results show that the proposed method increases the recall rate by 5% to 20% compared with other methods. The method achieves high accuracy and stability for electronic countermeasure incident intelligence processing and can effectively shorten the processing time.

Robert B. Randall ◽  
Jerome Antoni ◽  
Pietro Borghesani
