scholarly journals High-performance packet routing acceleration for cloud systems using high bandwidth memory

Author(s):  
Takashi Shimizu ◽  
Kazuki Hyoudou ◽  
Hiroshi Murakawa ◽  
Tomohiro Ishihara
Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1275
Author(s):  
Changdao Du ◽  
Yoshiki Yamaguchi

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGA can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies mainly relied on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale-up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can take broad parallelization strategies based on specific FPGA resources to improve performance.


2021 ◽  
Author(s):  
Artur Saakov

The concept of telepresence allows human beings to interact with hazardous environments and situations without facing any actual risks. Examples include the nuclear industry, outer space and underwater operations, mining, bomb disposal and firefighting. Recent progress in digital system technology, especially in technology of reconfigurable logic devices (e.g. FPGA), allows the effective implementation of advanced embedded systems characterized by high-performance data processing and high-bandwidth communication. However, most of the existing telepresence systems do not benefit from these advancements. Therefore, the goal of this work was to develop a concept and architecture of the platform for the 3D-Panoramic Telepresence System for mobile robotic applications based on reconfigurable logic devices. During the development process, two versions of the system were implemented. The first system focused on feasibility testing of major components of the proposed architecture. Based on the experimental results obtained on the first prototype of the system and their analyses, a set of recommendations were derived for an updated version of the system. These recommendations were incorporated into the implementation of the second and final version of the system.


2001 ◽  
Vol 12 (04) ◽  
pp. 459-467
Author(s):  
CLAUDIO D. ARLANDINI ◽  
MATTEO J. BOSCHINI ◽  
ANDREA MATTASOGLIO

In this work we describe a series of performance tests on different architectures of high bandwidth local area networks, contemporarily in use at C.I.L.E.A. (Inter-University Consortium for Automatic Elaboration of Lombardy) to connect multi-processor machines devoted to educational and research purposes, such as fluido-dynamic and mechanical simulations. This LAN is essentially made out of a standard FDDI ring, and an HyperFabric backbone. HyperFabric is a Hewlett-Packard high performance network system bus, with a declared maximum bandwidth of 2.5 Gbit/s full duplex per link. We present a comparison, in terms of effective bandwidth, average throughput and CPU consumption of the above mentioned network systems. Furthermore we also describe the effects, in terms of transfer efficiency, of such a mixed environment, in which different systems co-exist and must often be cross-walked by various applications, as backups and mass storage access. Measurements and comparisons are made using Open Software tools like netperf and HetPIPE.


2022 ◽  
Vol 15 (1) ◽  
pp. 1-31
Author(s):  
Philippos Papaphilippou ◽  
Jiuxi Meng ◽  
Nadeen Gebara ◽  
Wayne Luk

We present Hipernetch, a novel FPGA-based design for performing high-bandwidth network switching. FPGAs have recently become more popular in data centers due to their promising capabilities for a wide range of applications. With the recent surge in transceiver bandwidth, they could further benefit the implementation and refinement of network switches used in data centers. Hipernetch replaces the crossbar with a “combined parallel round-robin arbiter”. Unlike a crossbar, the combined parallel round-robin arbiter is easy to pipeline, and does not require centralised iterative scheduling algorithms that try to fit too many steps in a single or a few FPGA cycles. The result is a network switch implementation on FPGAs operating at a high frequency and with a low port-to-port latency. Our proposed Hipernetch architecture additionally provides a competitive switching performance approaching output-queued crossbar switches. Our implemented Hipernetch designs exhibit a throughput that exceeds 100 Gbps per port for switches of up to 16 ports, reaching an aggregate throughput of around 1.7 Tbps.


Author(s):  
Stefan Westerlund ◽  
Christopher Harris

AbstractThe latest generation of radio astronomy interferometers will conduct all sky surveys with data products consisting of petabytes of spectral line data. Traditional approaches to identifying and parameterising the astrophysical sources within this data will not scale to datasets of this magnitude, since the performance of workstations will not keep up with the real-time generation of data. For this reason, it is necessary to employ high performance computing systems consisting of a large number of processors connected by a high-bandwidth network. In order to make use of such supercomputers substantial modifications must be made to serial source finding code. To ease the transition, this work presents the Scalable Source Finder Framework, a framework providing storage access, networking communication and data composition functionality, which can support a wide range of source finding algorithms provided they can be applied to subsets of the entire image. Additionally, the Parallel Gaussian Source Finder was implemented using SSoFF, utilising Gaussian filters, thresholding, and local statistics. PGSF was able to search on a 256GB simulated dataset in under 24 minutes, significantly less than the 8 to 12 hour observation that would generate such a dataset.


Author(s):  
Vladimir Stegailov ◽  
Ekaterina Dlinnova ◽  
Timur Ismagilov ◽  
Mikhail Khalilov ◽  
Nikolay Kondratyuk ◽  
...  

In this article, we describe the Desmos supercomputer that consists of 32 hybrid nodes connected by a low-latency high-bandwidth Angara interconnect with torus topology. This supercomputer is aimed at cost-effective classical molecular dynamics calculations. Desmos serves as a test bed for the Angara interconnect that supports 3-D and 4-D torus network topologies and verifies its ability to unite massively parallel programming systems speeding-up effectively message-passing interface (MPI)-based applications. We describe the Angara interconnect presenting typical MPI benchmarks. Desmos benchmarks results for GROMACS, LAMMPS, VASP and CP2K are compared with the data for other high-performance computing (HPC) systems. Also, we consider the job scheduling statistics for several months of Desmos deployment.


Sign in / Sign up

Export Citation Format

Share Document