High-performance packet routing acceleration for cloud systems using high bandwidth memory

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computing. With the help of high-level synthesis (HLS) compilers, FPGAs can be programmed in common languages such as C, C++, or OpenCL, which improves design efficiency and portability. Stencil computations are important kernels in many scientific applications. In this paper, we introduce an architecture for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory such as DDR3 or DDR4, whose limited bandwidth restricts design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies relied mainly on exploiting parallelism in the temporal domain to alleviate the bandwidth limitation. In our approach, we scale up design performance by considering the spatial and temporal parallelism of the stencil kernel equally. We also discuss design portability across different HLS compilers. We evaluate our design with typical stencil kernels on a Xilinx U280 FPGA board and compare the results with existing studies. With our method, developers can adopt a broad range of parallelization strategies, matched to the resources of a specific FPGA, to improve performance.
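To make the spatial/temporal distinction concrete, here is a minimal pure-Python functional model, assuming a 2-D 5-point Jacobi stencil with fixed boundary cells; it is not the authors' HLS design. The parameters `p` (cells updated per cycle, i.e. spatial parallelism) and `stages` (chained time-step units, i.e. temporal parallelism) are hypothetical names chosen for illustration. Both only change how the work would be scheduled in hardware, so the numerical result is identical for any choice of `p` and `stages`.

```python
from typing import List

Grid = List[List[float]]


def stencil_update(grid: Grid, i: int, j: int) -> float:
    """5-point Jacobi average of a cell and its four neighbours."""
    return 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                  + grid[i][j - 1] + grid[i][j + 1])


def step_spatial(grid: Grid, p: int) -> Grid:
    """One time step. Interior columns of each row are handled in groups of `p`,
    modelling p processing elements that update p cells in the same cycle."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary cells are copied and stay fixed
    for i in range(1, rows - 1):
        for j0 in range(1, cols - 1, p):
            # In hardware these p updates would run concurrently; here they run in sequence.
            for j in range(j0, min(j0 + p, cols - 1)):
                out[i][j] = stencil_update(grid, i, j)
    return out


def run(grid: Grid, steps: int, p: int, stages: int) -> Grid:
    """Apply `steps` time steps using a cascade of `stages` step units.
    Each pass through the cascade advances the grid by up to `stages` steps,
    modelling temporal parallelism (pipelined iterations)."""
    done = 0
    while done < steps:
        chunk = min(stages, steps - done)
        for _ in range(chunk):
            grid = step_spatial(grid, p)
        done += chunk
    return grid
```

In an HLS version, the inner `p`-wide loop would typically be unrolled and the cascade of stages implemented as pipelined dataflow units; the more memory bandwidth is available (as with HBM), the wider `p` can be made instead of relying on deeper `stages` alone.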

Reconfigurable platform for 3D-panoramic telepresence system for mobile applications

10.32920/ryerson.14652129.v1 ◽

2021 ◽

Author(s):

Artur Saakov

Keyword(s):

High Performance ◽

Outer Space ◽

Nuclear Industry ◽

Human Beings ◽

Reconfigurable Logic ◽

Logic Devices ◽

Hazardous Environments ◽

High Bandwidth ◽

Reconfigurable Platform ◽

Bomb Disposal

The concept of telepresence allows human beings to interact with hazardous environments and situations without facing actual risk. Examples include the nuclear industry, outer space and underwater operations, mining, bomb disposal, and firefighting. Recent progress in digital system technology, especially in reconfigurable logic devices (e.g., FPGAs), enables the effective implementation of advanced embedded systems characterized by high-performance data processing and high-bandwidth communication. However, most existing telepresence systems do not benefit from these advances. Therefore, the goal of this work was to develop the concept and architecture of a platform, based on reconfigurable logic devices, for a 3D-panoramic telepresence system for mobile robotic applications. Two versions of the system were implemented during development. The first focused on testing the feasibility of the major components of the proposed architecture. From the experimental results obtained with this first prototype and their analysis, a set of recommendations was derived for an updated version of the system. These recommendations were incorporated into the implementation of the second and final version.

A HIGH-PERFORMANCE MIXED-TECHNOLOGY LAN FOR EDUCATION AND RESEARCH

International Journal of Modern Physics C ◽

10.1142/s0129183101002231 ◽

2001 ◽

Vol 12 (04) ◽

pp. 459-467

Author(s):

CLAUDIO D. ARLANDINI ◽

MATTEO J. BOSCHINI ◽

ANDREA MATTASOGLIO

Keyword(s):

High Performance ◽

Local Area Networks ◽

Local Area ◽

Full Duplex ◽

Effective Bandwidth ◽

Performance Tests ◽

Mass Storage ◽

Network Systems ◽

High Bandwidth ◽

University Consortium

In this work we describe a series of performance tests on different high-bandwidth local area network architectures currently in use at C.I.L.E.A. (Inter-University Consortium for Automatic Elaboration of Lombardy) to connect multiprocessor machines devoted to educational and research purposes, such as fluid-dynamic and mechanical simulations. This LAN consists essentially of a standard FDDI ring and a HyperFabric backbone. HyperFabric is a Hewlett-Packard high-performance network system bus with a declared maximum bandwidth of 2.5 Gbit/s full duplex per link. We compare these network systems in terms of effective bandwidth, average throughput, and CPU consumption. We also describe the effects on transfer efficiency of such a mixed environment, in which different systems coexist and must often be traversed by applications such as backups and mass storage access. Measurements and comparisons are made with open software tools such as netperf and HetPIPE.
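The bandwidth figures compared above reduce to simple arithmetic. The sketch below uses hypothetical function names and a made-up sample transfer (not a measurement from this study) to compute effective bandwidth from bytes moved and elapsed time, and to express it as a fraction of HyperFabric's declared 2.5 Gbit/s. It assumes that "full duplex" means each link carries that rate in each direction simultaneously, so a link's aggregate capacity is twice the one-way figure.

```python
# Declared HyperFabric link rate, assumed to apply to each direction.
NOMINAL_GBIT_PER_DIRECTION = 2.5


def effective_bandwidth_gbit(bytes_transferred: int, seconds: float) -> float:
    """Payload bits delivered per second, in Gbit/s (1 Gbit = 1e9 bits)."""
    return bytes_transferred * 8 / seconds / 1e9


def link_efficiency(measured_gbit: float,
                    nominal_gbit: float = NOMINAL_GBIT_PER_DIRECTION) -> float:
    """Fraction of the declared one-way link rate actually achieved."""
    return measured_gbit / nominal_gbit


def full_duplex_capacity(nominal_gbit: float = NOMINAL_GBIT_PER_DIRECTION) -> float:
    """Aggregate capacity of one link when both directions are busy."""
    return 2 * nominal_gbit


if __name__ == "__main__":
    # Hypothetical run: 250 MB moved in one direction in 1 s.
    bw = effective_bandwidth_gbit(250_000_000, 1.0)
    print(f"{bw:.2f} Gbit/s, efficiency {link_efficiency(bw):.0%}")
    # prints: 2.00 Gbit/s, efficiency 80%
```

Tools such as netperf report the measured throughput directly; the point of the sketch is only how such a number relates to the nominal link rate.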

Hipernetch: High-Performance FPGA Network Switch

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3477054 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-31

Author(s):

Philippos Papaphilippou ◽

Jiuxi Meng ◽

Nadeen Gebara ◽

Wayne Luk

Keyword(s):

High Frequency ◽

High Performance ◽

Data Centers ◽

Round Robin ◽

Switching Performance ◽

Wide Range ◽

Crossbar Switches ◽

High Bandwidth ◽

Network Switch ◽

Network Switches

We present Hipernetch, a novel FPGA-based design for high-bandwidth network switching. FPGAs have recently become more popular in data centers due to their promising capabilities for a wide range of applications. With the recent surge in transceiver bandwidth, they could further benefit the implementation and refinement of network switches used in data centers. Hipernetch replaces the crossbar with a “combined parallel round-robin arbiter”. Unlike a crossbar, the combined parallel round-robin arbiter is easy to pipeline and does not require centralised iterative scheduling algorithms that try to fit too many steps into one or a few FPGA cycles. The result is a network switch implementation on FPGAs that operates at a high frequency with low port-to-port latency. The Hipernetch architecture additionally provides competitive switching performance, approaching that of output-queued crossbar switches. Our implemented Hipernetch designs exceed 100 Gbps of throughput per port for switches of up to 16 ports, reaching an aggregate throughput of around 1.7 Tbps.
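For readers unfamiliar with round-robin arbitration, the following minimal sketch shows a textbook round-robin arbiter and a per-output arbitration step of the kind a switch scheduler performs. It is not Hipernetch's combined parallel round-robin arbiter, whose internals the abstract does not describe; the function names, and the assumption that each input presents at most one head-of-line request, are hypothetical.

```python
from typing import List, Optional, Tuple


def round_robin_grant(requests: List[bool], pointer: int) -> Tuple[Optional[int], int]:
    """Grant the first active request at or after `pointer`, wrapping around.
    Returns (granted index or None, updated pointer)."""
    n = len(requests)
    for offset in range(n):
        idx = (pointer + offset) % n
        if requests[idx]:
            # Move priority just past the winner so every requester gets a turn.
            return idx, (idx + 1) % n
    return None, pointer  # no requests: keep the pointer where it was


def arbitrate_outputs(dest: List[Optional[int]], pointers: List[int]) -> List[Optional[int]]:
    """One scheduling round. dest[i] is the output port requested by input i
    (None if idle); pointers[o] is output o's round-robin pointer and is updated
    in place. Returns the granted input for each output (or None)."""
    grants: List[Optional[int]] = []
    for out in range(len(pointers)):
        reqs = [d == out for d in dest]
        granted, pointers[out] = round_robin_grant(reqs, pointers[out])
        grants.append(granted)
    return grants
```

Each output port's arbiter depends only on its own pointer and request vector, so in hardware all of them can evaluate in the same cycle; advancing the pointer past the winner bounds how long any requester can be passed over to n rounds.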

Performance of Cryptographic Protocols for High-Performance, High-Bandwidth and High-Latency Grid Systems

Third IEEE International Conference on e-Science and Grid Computing (e-Science 2007) ◽

10.1109/e-science.2007.58 ◽

2007 ◽

Cited By ~ 1

Author(s):

Himanshu Khurana ◽

Radostina Koleva ◽

Jim Basney

Keyword(s):

High Performance ◽

Cryptographic Protocols ◽

Grid Systems ◽

High Bandwidth

A Framework for HI Spectral Source Finding Using Distributed-Memory Supercomputing

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2014.18 ◽

2014 ◽

Vol 31 ◽

Cited By ~ 2

Author(s):

Stefan Westerlund ◽

Christopher Harris

Keyword(s):

High Performance ◽

Distributed Memory ◽

Computing Systems ◽

Sky Surveys ◽

Local Statistics ◽

Wide Range ◽

Gaussian Source ◽

High Bandwidth ◽

Traditional Approaches ◽

Performance Computing

The latest generation of radio astronomy interferometers will conduct all-sky surveys with data products consisting of petabytes of spectral line data. Traditional approaches to identifying and parameterising the astrophysical sources within these data will not scale to datasets of this magnitude, since workstation performance will not keep up with the real-time generation of data. It is therefore necessary to employ high-performance computing systems consisting of a large number of processors connected by a high-bandwidth network. To make use of such supercomputers, substantial modifications must be made to serial source-finding code. To ease this transition, this work presents the Scalable Source Finder Framework (SSoFF), a framework providing storage access, network communication, and data composition functionality, which can support a wide range of source-finding algorithms provided they can be applied to subsets of the entire image. Additionally, the Parallel Gaussian Source Finder (PGSF) was implemented using SSoFF, utilising Gaussian filters, thresholding, and local statistics. PGSF searched a 256 GB simulated dataset in under 24 minutes, significantly less than the 8 to 12 hour observation that would generate such a dataset.
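As an illustration of the Gaussian-filter, thresholding, and local-statistics steps, here is a minimal 1-D sketch for a single spectrum. It assumes a sliding-window mean and standard deviation as the local statistics and a k-sigma threshold; it is not the PGSF implementation, and the parameters `sigma`, `radius`, `window`, and `nsigma` are hypothetical. Because each output sample depends only on nearby input samples, the same routine could in principle run independently on each subset of a larger image, given enough overlap at subset edges.

```python
import math
from typing import List


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalised Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]


def smooth(data: List[float], kernel: List[float]) -> List[float]:
    """Convolve with the kernel, clamping indices at the edges."""
    r = len(kernel) // 2
    n = len(data)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wk in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)
            acc += wk * data[j]
        out.append(acc)
    return out


def local_threshold(data: List[float], window: int, nsigma: float) -> List[bool]:
    """Flag samples above local mean + nsigma * local std, where the statistics
    are taken over samples within +-window of each position."""
    n = len(data)
    flags = []
    for i in range(n):
        seg = data[max(0, i - window):min(n, i + window + 1)]
        mean = sum(seg) / len(seg)
        std = math.sqrt(sum((x - mean) ** 2 for x in seg) / len(seg))
        flags.append(data[i] > mean + nsigma * std)
    return flags
```

Smoothing first suppresses channel-to-channel noise, and comparing against local rather than global statistics lets the threshold follow slow variations in the background across the spectrum.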

Angara interconnect makes GPU-based Desmos supercomputer an efficient tool for molecular dynamics calculations

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019826667 ◽

2019 ◽

Vol 33 (3) ◽

pp. 507-521 ◽

Cited By ~ 10

Author(s):

Vladimir Stegailov ◽

Ekaterina Dlinnova ◽

Timur Ismagilov ◽

Mikhail Khalilov ◽

Nikolay Kondratyuk ◽

...

Keyword(s):

Molecular Dynamics ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Job Scheduling ◽

Cost Effective ◽

Test Bed ◽

Molecular Dynamics Calculations ◽

High Bandwidth ◽

Network Topologies

In this article, we describe the Desmos supercomputer, which consists of 32 hybrid nodes connected by a low-latency, high-bandwidth Angara interconnect with torus topology. This supercomputer is aimed at cost-effective classical molecular dynamics calculations. Desmos serves as a test bed for the Angara interconnect, which supports 3-D and 4-D torus network topologies, and verifies its ability to unite massively parallel programming systems and effectively speed up message-passing interface (MPI)-based applications. We describe the Angara interconnect and present typical MPI benchmarks. Desmos benchmark results for GROMACS, LAMMPS, VASP and CP2K are compared with data for other high-performance computing (HPC) systems. We also consider job scheduling statistics for several months of Desmos deployment.
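The torus topology mentioned above can be summarised in a few lines. This sketch assumes a hypothetical 4 × 4 × 2 three-dimensional torus (32 nodes, matching the Desmos node count but not necessarily its actual layout). It lists a node's direct neighbours and computes minimal hop counts, where each axis wraps around so a message can take the shorter way round the ring. The function names are illustrative, not part of any Angara software.

```python
from itertools import product
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Nearest neighbours along each axis, using wraparound links.
    For an axis of size 2 the -1 and +1 neighbours coincide."""
    result: List[Coord] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            result.append(tuple(n))
    return result


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count: per axis, the shorter direction around the ring."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y, size in zip(a, b, dims))


def torus_diameter(dims: Coord) -> int:
    """Largest minimal hop count. A torus looks the same from every node,
    so measuring from the origin is sufficient."""
    origin = tuple(0 for _ in dims)
    return max(torus_hops(origin, c, dims) for c in product(*(range(s) for s in dims)))


if __name__ == "__main__":
    dims = (4, 4, 2)  # hypothetical 32-node layout
    print(sorted(set(torus_neighbors((0, 0, 0), dims))))
    print(torus_diameter(dims))
```

A hypothetical 4-D arrangement such as 2 × 2 × 2 × 4 also has 32 nodes; with this function both layouts give a diameter of 5 hops, which illustrates how wraparound links keep the worst-case distance short compared with a mesh of the same size.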

High bandwidth low latency chip to chip interconnects using high performance MLC glass ceramic POWER4® MCM

IEEE 10th Topical Meeting on Electrical Performance of Electronic Packaging (Cat No 01TH8565) EPEP-01 ◽

10.1109/epep.2001.967668 ◽

2002 ◽

Author(s):

P. Walling ◽

A. Tai ◽

H. Hamel ◽

R. Weekly ◽

A. Haridass

Keyword(s):

Glass Ceramic ◽

High Performance ◽

Low Latency ◽

High Bandwidth
