A Modified KNN Algorithm for High-Performance Computing on FPGA of Real-Time m-QAM Demodulators

David Marquez-Viloria; Luis Castano-Londono; Neil Guerrero-Gonzalez

doi:10.3390/electronics10050627

Electronics ◽

2021 ◽

Vol 10 (5) ◽

pp. 627

Keyword(s):

Real Time ◽

High Performance ◽

Interference Mitigation ◽

Parallel Implementation ◽

Computational Time ◽

Successful Implementation ◽

Interchannel Interference ◽

The Difference ◽

High Level ◽

Performance Computing

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of the 8-connected clusters used in image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work, which used the same data from the same experimental setup but offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN reduces the number of operations by 43% to 75%, depending on the symbol’s position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
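The abstract does not spell out the modified algorithm, so the following is only a minimal Python sketch of one way the 8-connected idea could work for 16-QAM. It assumes an unnormalised 4 × 4 constellation with levels -3, -1, 1, 3, and it assumes the starting cluster is chosen by a plain hard decision; the names `hard_decision_cell`, `neighbour_cells`, `knn_classify` and `comparison_saving` are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product

# Assumed ideal 16-QAM levels (unnormalised); the paper's scaling may differ.
LEVELS = (-3, -1, 1, 3)

def hard_decision_cell(symbol):
    """Return the (row, col) grid index of the ideal point nearest to `symbol`."""
    def nearest_level(v):
        return min(range(len(LEVELS)), key=lambda i: abs(v - LEVELS[i]))
    return nearest_level(symbol.imag), nearest_level(symbol.real)

def neighbour_cells(cell, size=len(LEVELS)):
    """The cell itself plus those 8-connected neighbours that lie inside the grid."""
    r, c = cell
    return [(r + dr, c + dc)
            for dr, dc in product((-1, 0, 1), repeat=2)
            if 0 <= r + dr < size and 0 <= c + dc < size]

def knn_classify(symbol, training, k=3):
    """k-nearest-neighbour vote restricted to the 8-connected neighbourhood.

    `training` maps a (row, col) cluster label to a list of complex
    training symbols that belong to that cluster.
    """
    candidates = neighbour_cells(hard_decision_cell(symbol))
    distances = [(abs(symbol - t), cell)
                 for cell in candidates for t in training[cell]]
    nearest = sorted(distances)[:k]
    votes = Counter(cell for _, cell in nearest)
    return votes.most_common(1)[0][0]

def comparison_saving(cell):
    """Fraction of the 16 clusters that are skipped for a symbol in `cell`."""
    return 1 - len(neighbour_cells(cell)) / len(LEVELS) ** 2
```

Under these assumptions a corner symbol is compared against 4 of the 16 clusters, an edge symbol against 6, and an interior symbol against 9, so `comparison_saving` gives 0.75, 0.625 and 0.4375 respectively. That range is consistent with the 43% to 75% quoted in the abstract, although the paper may count operations differently.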

Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Journal of Signal Processing Systems ◽

10.1007/s11265-020-01633-z ◽

2021 ◽

Author(s):

Umar Ibrahim Minhas ◽

Roger Woods ◽

Georgios Karakonstantis

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Design Space ◽

System Throughput ◽

Design Parameters ◽

Temporal Constraints ◽

Shared Resources ◽

Task Processing ◽

High Level ◽

Performance Computing

Abstract: Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.

High-Level Parallel Ant Colony Optimization with Algorithmic Skeletons

International Journal of Parallel Programming ◽

10.1007/s10766-021-00714-1 ◽

2021 ◽

Author(s):

Breno A. de Melo Menezes ◽

Nina Herrmann ◽

Herbert Kuchen ◽

Fernando Buarque de Lima Neto

Keyword(s):

Ant Colony Optimization ◽

High Performance ◽

Optimization Problems ◽

Programming Model ◽

Parallel Implementation ◽

Ant Colony ◽

Algorithmic Skeletons ◽

Low Level ◽

Programming Patterns ◽

High Level

Abstract: Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When aiming for a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of Algorithmic Skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold and zip) that are later converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket can cope with the development of a parallel implementation of ACO and how that compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high-performance code with execution times similar to those of low-level implementations.
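To make the map, fold and zip patterns concrete, here is a small sketch in plain Python of how one ACO step (scoring ant tours and picking the best) could be phrased with them. This is not Musket syntax and it runs sequentially; the function names `tour_length` and `best_ant` and the coordinate layout are assumptions made for illustration only.

```python
import math
from functools import reduce

def tour_length(tour, coords):
    """Length of a closed tour, phrased as zip, then map, then fold."""
    # zip: pair each city with its successor, wrapping around to the start.
    edges = zip(tour, tour[1:] + tour[:1])
    # map: turn every edge into its Euclidean length.
    lengths = map(lambda e: math.dist(coords[e[0]], coords[e[1]]), edges)
    # fold: sum the edge lengths.
    return reduce(lambda acc, d: acc + d, lengths, 0.0)

def best_ant(tours, coords):
    """Return (tour, length) for the shortest tour among all ants."""
    # map: score every ant's tour independently (the part a skeleton
    # library could run in parallel).
    lengths = list(map(lambda t: tour_length(t, coords), tours))
    # zip + fold: pair tours with scores and keep the minimum.
    return reduce(lambda best, cur: cur if cur[1] < best[1] else best,
                  zip(tours, lengths))
```

Because each pattern has no hidden shared state, a skeleton framework is free to replace the sequential `map` and `reduce` here with parallel equivalents without changing the program's structure, which is the kind of abstraction the abstract describes.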

Accessible high performance computing solutions for near real-time image processing for time critical applications

10.1117/12.830356 ◽

2009 ◽

Cited By ~ 1

Author(s):

Conrad Bielski ◽

Guido Lemoine ◽

Jacek Syryczynski

Keyword(s):

Image Processing ◽

High Performance Computing ◽

Real Time ◽

High Performance ◽

Real Time Image Processing ◽

Real Time Image ◽

Time Critical ◽

Time Image ◽

Performance Computing

2016 Sixth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC)

10.1109/wolfhpc40351.2016 ◽

2016 ◽

Keyword(s):

High Performance Computing ◽

High Performance ◽

International Workshop ◽

Domain Specific Languages ◽

Domain Specific ◽

Sixth International Workshop ◽

High Level ◽

Performance Computing ◽

Sixth International

A synchronized real-time linux based myrinet cluster for deterministic high performance computing and MPI/RT

Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001 ◽

10.1109/ipdps.2001.925050 ◽

2005 ◽

Cited By ~ 10

Author(s):

M. Apte ◽

S. Chakravarthi ◽

J. Padmanabhan ◽

A. Skjellum

Keyword(s):

High Performance Computing ◽

Real Time ◽

High Performance ◽

Performance Computing

Viability of Cloud Computing for Real-Time Numerical Weather Prediction

Weather and Forecasting ◽

10.1175/waf-d-16-0075.1 ◽

2016 ◽

Vol 31 (6) ◽

pp. 1985-1996 ◽

Cited By ~ 9

Author(s):

David Siuta ◽

Gregory West ◽

Henryk Modzelewski ◽

Roland Schigas ◽

Roland Stull

Keyword(s):

Cloud Computing ◽

High Performance Computing ◽

Real Time ◽

Numerical Weather Prediction ◽

High Performance ◽

Virtual Machines ◽

Weather Prediction ◽

Cloud Platform ◽

Numerical Weather ◽

Performance Computing

Abstract: As cloud-service providers like Google, Amazon, and Microsoft decrease costs and increase performance, numerical weather prediction (NWP) in the cloud will become a reality not only for research use but for real-time use as well. The performance of the Weather Research and Forecasting (WRF) Model on the Google Cloud Platform is tested and configurations and optimizations of virtual machines that meet two main requirements of real-time NWP are found: 1) fast forecast completion (timeliness) and 2) economic cost effectiveness when compared with traditional on-premise high-performance computing hardware. Optimum performance was found by using the Intel compiler collection with no more than eight virtual CPUs per virtual machine. Using these configurations, real-time NWP on the Google Cloud Platform is found to be economically competitive when compared with the purchase of local high-performance computing hardware for NWP needs. Cloud-computing services are becoming viable alternatives to on-premise compute clusters for some applications.

Timing Predictability in High-Performance Computing With Probabilistic Real-Time

IEEE Access ◽

10.1109/access.2020.3038559 ◽

2020 ◽

Vol 8 ◽

pp. 208566-208582

Author(s):

Federico Reghenzani ◽

Giuseppe Massari ◽

William Fornaciari

Keyword(s):

High Performance Computing ◽

Real Time ◽

High Performance ◽

Performance Computing

Computing Solution for the Recognition of Basic Actions of Violence in Real Time, from the use of Convolutional Neural Networks, Video Sequences and High Performance Computing

2019 XLV Latin American Computing Conference (CLEI) ◽

10.1109/clei47609.2019.235100 ◽

2019 ◽

Author(s):

Almendra Prisila Laureano Lumba ◽

Roy Roger Rios Nunez ◽

Isaac Ocampo Yahuarcani ◽

Rodolfo Cardenas Vigo ◽

Carlos Alberto Garcia Cortegano ◽

...

Keyword(s):

Neural Networks ◽

High Performance Computing ◽

Real Time ◽

Convolutional Neural Networks ◽

High Performance ◽

Video Sequences ◽

Performance Computing

Editor’s Note: Special Issue on High-Level Languages and Frameworks for High-Performance Computing

International Journal of Parallel Programming ◽

10.1007/s10766-019-00644-z ◽

2019 ◽

Vol 47 (5-6) ◽

pp. 1045-1045

Keyword(s):

High Performance Computing ◽

High Performance ◽

Special Issue ◽

High Level ◽

Performance Computing

PAEAN: Portable and scalable runtime support for parallel Haskell dialects

Journal of Functional Programming ◽

10.1017/s0956796816000010 ◽

2016 ◽

Vol 26 ◽

Cited By ~ 1

Author(s):

JOST BERTHOLD ◽

HANS-WOLFGANG LOIDL ◽

KEVIN HAMMOND

Keyword(s):

Shared Memory ◽

High Performance ◽

Parallel Machines ◽

State Of The Art ◽

Computing Systems ◽

Programming Abstraction ◽

Work Distribution ◽

High Level ◽

Parallelism Model ◽

Performance Computing

Abstract: Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection, synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies and extends existing well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
