The Status and Challenges of Multi-Processor System-on-Chip’s Formal Verification

With the continuous advancement of processor manufacturing processes and the exposed limitations of single-core processors, the Multi-Processor System-on-Chip (MPSoC) has become the inevitable outcome of technological development and practical application needs. It is used to meet the requirements of multitasking, multifunctional and high-performance computing. As chip complexity grows, the verification workload also increases exponentially, and verification of MPSoC is becoming the bottleneck in chip design. This paper first introduces the origin of MPSoC and analyzes the development trend of its verification. It then discusses the theory of, and the main challenges to, the formal verification of MPSoC. The paper aims to support building verification theory, methods and technology that can meet the demands of MPSoC design, and developing verification technology for MPSoC high-level architecture design.

Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Journal of Signal Processing Systems ◽

10.1007/s11265-020-01633-z ◽

2021 ◽

Author(s):

Umar Ibrahim Minhas ◽

Roger Woods ◽

Georgios Karakonstantis

Keyword(s):

High Performance ◽

Design Space Exploration ◽

Design Space ◽

System Throughput ◽

Design Parameters ◽

Temporal Constraints ◽

Shared Resources ◽

Task Processing ◽

High Level ◽

Performance Computing

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.

A Modified KNN Algorithm for High-Performance Computing on FPGA of Real-Time m-QAM Demodulators

Electronics ◽

10.3390/electronics10050627 ◽

2021 ◽

Vol 10 (5) ◽

pp. 627

Author(s):

David Marquez-Viloria ◽

Luis Castano-Londono ◽

Neil Guerrero-Gonzalez

Keyword(s):

Real Time ◽

High Performance ◽

Interference Mitigation ◽

Parallel Implementation ◽

Computational Time ◽

Successful Implementation ◽

Interchannel Interference ◽

The Difference ◽

High Level ◽

Performance Computing

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters used in image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction in operations of between 43 percent and 75 percent, depending on the symbol’s position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
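The 8-connected restriction described in this abstract can be illustrated with a small sketch. This is one possible reading of the idea, not the authors' implementation: it assumes a 4 × 4 16-QAM grid with ideal levels at -3, -1, 1, 3, labels each training symbol with its grid cell, and compares a received symbol only against training symbols in the nearest cell and its 8-connected neighbours. The names `LEVELS`, `nearest_cell`, `neighbourhood` and `knn_classify` are hypothetical.

```python
from collections import Counter

# Assumed normalisation: ideal 16-QAM amplitude levels on each axis.
LEVELS = (-3, -1, 1, 3)

def nearest_cell(sym):
    """Return the (row, col) grid index of the ideal point closest to sym."""
    def idx(v):
        return min(range(4), key=lambda i: abs(v - LEVELS[i]))
    return idx(sym.imag), idx(sym.real)

def neighbourhood(cell):
    """Return the cell plus its 8-connected neighbours inside the 4x4 grid."""
    r, c = cell
    return {(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if 0 <= r + dr < 4 and 0 <= c + dc < 4}

def knn_classify(sym, training, k=3, restrict=True):
    """Classify sym by majority vote among its k nearest training symbols.

    training is a list of (complex symbol, (row, col) label) pairs.
    Returns (predicted label, number of distance computations performed).
    """
    if restrict:
        allowed = neighbourhood(nearest_cell(sym))
        candidates = [(s, lab) for s, lab in training if lab in allowed]
    else:
        candidates = training
    nearest = sorted(candidates, key=lambda t: abs(sym - t[0]))[:k]
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0], len(candidates)
```

Under this reading, and assuming equally sized clusters, a corner symbol is compared against 4 of the 16 clusters (75% fewer comparisons) and an interior symbol against 9 of 16 (43.75% fewer), which is in line with the range quoted in the abstract. That agreement is suggestive only; the paper itself should be consulted for the exact rule.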

Merging Plasmonics and Silicon Photonics Towards Greener and Faster “Network-on-Chip” Solutions for Data Centers and High-Performance Computing Systems

Plasmonics - Principles and Applications ◽

10.5772/51853 ◽

2012 ◽

Cited By ~ 3

Author(s):

Sotirios Papaioannou ◽

Konstantinos Vyrsokinos ◽

Dimitrios Kalavrouziotis ◽

Giannis Giannoulis ◽

Dimitrios Apostolopoulos ◽

...

Keyword(s):

High Performance Computing ◽

Silicon Photonics ◽

High Performance ◽

Data Centers ◽

Network On Chip ◽

Computing Systems ◽

On Chip ◽

Performance Computing

Location of Processor Allocator and Job Scheduler and Its Impact on CMP Performance

International Journal of Electronics and Telecommunications ◽

10.2478/v10177-012-0001-y ◽

2012 ◽

Vol 58 (1) ◽

pp. 9-14 ◽

Cited By ~ 1

Author(s):

Dawid Zydek ◽

Grzegorz Chmaj ◽

Alaa Shawky ◽

Henry Selvaraj

Keyword(s):

High Performance ◽

Chip Multiprocessors ◽

Energy Savings ◽

Processor Allocation ◽

Processing Elements ◽

Energy Models ◽

Allocation Process ◽

On Chip ◽

Job Scheduler ◽

Performance Computing

High Performance Computing (HPC) architectures are being developed continually with the aim of achieving exascale capability by 2020. Processors that are being developed and used as nodes in HPC systems are Chip Multiprocessors (CMPs) with a number of cores. In this paper, we continue our effort towards a better processor allocation process. The Processor Allocator (PA) and Job Scheduler (JS) proposed and implemented in our previous works are explored in the context of their best location on the chip. We propose a system in which all locations on a chip can be analyzed, considering the energy used by the Network-on-Chip (NoC), the PA and JS, and the processing elements. We present energy models for the researched CMP components, a mathematical model of the system, and an experimentation system. Based on experimental results, proper placement of the PA and JS on a chip can provide up to 45% NoC energy savings.
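The idea of analyzing every candidate location can be sketched with a toy model. This is not the energy model from the paper: it assumes a 2D mesh NoC, counts hops as the Manhattan distance between the PA/JS node and each processing element, and charges a fixed, hypothetical energy per hop per message. The names `noc_energy`, `best_location` and `energy_per_hop` are illustrative only.

```python
def noc_energy(location, traffic, energy_per_hop=1.0):
    """Energy of all messages between the PA/JS node and every processing element.

    location is the (x, y) position of the PA/JS node.
    traffic maps each processing element position (x, y) to a message count.
    Hop count is the Manhattan distance (an assumed dimension-ordered route).
    """
    px, py = location
    return sum(energy_per_hop * msgs * (abs(px - x) + abs(py - y))
               for (x, y), msgs in traffic.items())

def best_location(width, height, traffic):
    """Evaluate every node of a width x height mesh and return the cheapest one."""
    locations = [(x, y) for x in range(width) for y in range(height)]
    return min(locations, key=lambda loc: noc_energy(loc, traffic))
```

For a 3 × 3 mesh with one message per node, this toy model gives 12 energy units for the centre node and 18 for a corner node, so the exhaustive search picks the centre. The figures are properties of the sketch and say nothing about the 45% saving reported in the abstract.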

2016 Sixth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC)

10.1109/wolfhpc40351.2016 ◽

2016 ◽

Keyword(s):

High Performance Computing ◽

High Performance ◽

International Workshop ◽

Domain Specific Languages ◽

Domain Specific ◽

Sixth International Workshop ◽

High Level ◽

Performance Computing ◽

Sixth International

A Debugging Standard for High-Performance Computing

Scientific Programming ◽

10.1155/2000/971291 ◽

2000 ◽

Vol 8 (2) ◽

pp. 95-108 ◽

Cited By ~ 4

Author(s):

Joan M. Francioni ◽

Cherri M. Pancake

Keyword(s):

High Performance Computing ◽

High Performance ◽

Base Level ◽

The Status ◽

Performance Debugging ◽

One Year ◽

Performance Computing

Throughout 1998, the High Performance Debugging Forum worked on defining a base level standard for high performance debuggers. The standard had to meet the sometimes conflicting constraints of being useful to users, realistically implementable by developers, and architecturally independent across multiple platforms. To meet criteria for timeliness, the standard had to be defined in one year and in such a way that it could be implemented within an additional year. The Forum was successful, and in November 1998 released Version 1 of the HPD Standard. Implementations of the standard are currently underway. This paper presents an overview of Version 1 of the standard and an analysis of the process by which the standard was developed. The status of implementation efforts and plans for follow-on efforts are discussed as well.

Practical Application of High-Resolution Reservoir Simulation and High-Performance Computing for Accurate Modeling of Low Permeability Gas Condensate Reservoirs Production (Russian)

10.2118/196916-ru ◽

2019 ◽

Author(s):

Ruslan Sharafutdinov ◽

Victor Tyurin ◽

Dmitry Fateev ◽

Sergey Skvortsov ◽

Yuriy Dolgikh ◽

...

Keyword(s):

High Resolution ◽

High Performance Computing ◽

Reservoir Simulation ◽

High Performance ◽

Gas Condensate ◽

Low Permeability ◽

Practical Application ◽

Gas Condensate Reservoirs ◽

Performance Computing

Practical Application of High-Resolution Reservoir Simulation and High-Performance Computing for Accurate Modeling of Low Permeability Gas Condensate Reservoirs Production

10.2118/196916-ms ◽

2019 ◽

Author(s):

Ruslan Sharafutdinov ◽

Victor Tyurin ◽

Dmitry Fateev ◽

Sergey Skvortsov ◽

Yuriy Dolgikh ◽

...

Keyword(s):

High Resolution ◽

High Performance Computing ◽

Reservoir Simulation ◽

High Performance ◽

Gas Condensate ◽

Low Permeability ◽

Practical Application ◽

Gas Condensate Reservoirs ◽

Performance Computing

Editor’s Note: Special Issue on High-Level Languages and Frameworks for High-Performance Computing

International Journal of Parallel Programming ◽

10.1007/s10766-019-00644-z ◽

2019 ◽

Vol 47 (5-6) ◽

pp. 1045-1045

Keyword(s):

High Performance Computing ◽

High Performance ◽

Special Issue ◽

High Level ◽

Performance Computing

PAEAN: Portable and scalable runtime support for parallel Haskell dialects

Journal of Functional Programming ◽

10.1017/s0956796816000010 ◽

2016 ◽

Vol 26 ◽

Cited By ~ 1

Author(s):

JOST BERTHOLD ◽

HANS-WOLFGANG LOIDL ◽

KEVIN HAMMOND

Keyword(s):

Shared Memory ◽

High Performance ◽

Parallel Machines ◽

State Of The Art ◽

Computing Systems ◽

Programming Abstraction ◽

Work Distribution ◽

High Level ◽

Parallelism Model ◽

Performance Computing

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies and extends existing well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
