A High Granularity Approach to Network Packet Processing for Latency-Tolerant Applications with CUDA (Corvyd)

2021 ◽  
Vol 13 (2) ◽  
pp. 7
Author(s):  
Maria Pantoja

Currently, practical network packet processing for Intrusion Detection Systems/Intrusion Prevention Systems (IDS/IPS) tends to fall into one of two disjoint categories: software-only implementations running on general-purpose CPUs, or highly specialized network hardware implementations that use ASICs or FPGAs for the most common functions and general-purpose CPUs for the rest. Both approaches try to maximize performance and minimize cost, but neither system, when implemented effectively, is affordable to clients below the well-funded enterprise level. In this paper, we aim to improve the performance of affordable network packet processing on heterogeneous systems with consumer Graphics Processing Units (GPUs) by optimizing latency-tolerant packet processing operations, notably IDS, to obtain the maximum throughput required in networks sophisticated enough to demand a dedicated IDS/IPS system but not sophisticated enough to justify the high cost of cutting-edge specialized hardware. In particular, this project investigated increasing the granularity of OSI layer-based packet batching over that of previous batching approaches. We demonstrate that highly granular GPU-enabled packet processing is generally impractical compared with existing methods by implementing our own solution, Corvyd, a heterogeneous real-time packet processing engine.
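The OSI layer-based batching the abstract refers to can be sketched on the CPU side as grouping packets by protocol before dispatch, so that each GPU kernel launch sees branch-coherent work. This is a minimal, hypothetical sketch; the field names and batching key are illustrative and not taken from Corvyd.

```python
from collections import defaultdict

def batch_by_layer(packets, key=lambda p: p["l4_proto"]):
    """Group packets into per-protocol batches before GPU dispatch.

    A simplified, CPU-side sketch of layer-based batching: packets
    sharing a protocol can be processed by a kernel specialized for
    that protocol, so threads in a block follow the same code path.
    """
    batches = defaultdict(list)
    for pkt in packets:
        batches[key(pkt)].append(pkt)
    return dict(batches)

packets = [
    {"l4_proto": "TCP", "payload": b"GET /"},
    {"l4_proto": "UDP", "payload": b"\x00\x01"},
    {"l4_proto": "TCP", "payload": b"POST /"},
]
batches = batch_by_layer(packets)
```

Finer granularity (batching on more header fields) yields more uniform batches but more, smaller kernel launches, which is the trade-off the paper found to be generally unfavorable.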

2014 ◽  
Vol 4 (4) ◽  
Author(s):  
Liberios Vokorokos ◽  
Michal Ennert ◽  
Marek Čajkovský ◽  
Ján Radušovský

Abstract Intrusion detection is a rapidly developing field of informatics. This paper provides a survey of current trends in academic intrusion detection research. It reviews the evolution of intrusion detection systems that use general-purpose computing on graphics processing units (GPGPU). There are many detection techniques, but only some of them carry the advantages of parallel implementation over to graphics processors (GPUs). The technique most commonly ported to the GPU is pattern matching. A number of intrusion detection tools using GPUs have been tested on real network traffic.
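The pattern matching the survey identifies as the most commonly GPU-ported technique can be illustrated with a naive multi-pattern scan, where the outer loop over starting offsets is what a GPU kernel parallelizes (one thread per offset). This is a CPU sketch of the general idea, not any surveyed tool's implementation; production IDS engines use automaton-based algorithms such as Aho-Corasick.

```python
def multi_pattern_match(data, patterns):
    """Report (offset, pattern) pairs where a signature occurs in data.

    The outer loop over i is embarrassingly parallel: on a GPU,
    each thread would scan from its own starting offset.
    """
    hits = []
    for i in range(len(data)):          # on a GPU: one thread per i
        for p in patterns:
            if data[i:i + len(p)] == p:
                hits.append((i, p))
    return hits

hits = multi_pattern_match(b"abcattack", [b"attack", b"cat"])
```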


Author(s):  
Hua He ◽  
Jimmy Lin ◽  
Adam Lopez

Grammars for machine translation can be materialized on demand by finding source phrases in an indexed parallel corpus and extracting their translations. This approach is limited in practical applications by the computational expense of online lookup and extraction. For phrase-based models, recent work has shown that on-demand grammar extraction can be greatly accelerated by parallelization on general purpose graphics processing units (GPUs), but these algorithms do not work for hierarchical models, which require matching patterns that contain gaps. We address this limitation by presenting a novel GPU algorithm for on-demand hierarchical grammar extraction that is at least an order of magnitude faster than a comparable CPU algorithm when processing large batches of sentences. In terms of end-to-end translation, with decoding on the CPU, we increase throughput by roughly two thirds on a standard MT evaluation dataset. The GPU necessary to achieve these improvements increases the cost of a server by about a third. We believe that GPU-based extraction of hierarchical grammars is an attractive proposition, particularly for MT applications that demand high throughput.
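The "patterns that contain gaps" required by hierarchical models can be made concrete with a small sketch: finding spans where a source phrase like "the ... mat" occurs with a bounded nonterminal gap. This is a toy illustration under stated assumptions (token-level matching, one gap, a small gap bound); the actual GPU algorithm works over indexed suffix structures of the parallel corpus.

```python
def match_gappy_pattern(tokens, prefix, suffix, max_gap=4):
    """Find spans where `prefix <gap> suffix` occurs in a sentence.

    Returns (start, end) token spans with a nonempty gap of at most
    max_gap tokens between the two contiguous sub-patterns.
    """
    spans = []
    n = len(tokens)
    for i in range(n - len(prefix) + 1):
        if tokens[i:i + len(prefix)] != prefix:
            continue
        gap_start = i + len(prefix)
        for g in range(1, max_gap + 1):
            j = gap_start + g
            if j + len(suffix) > n:
                break
            if tokens[j:j + len(suffix)] == suffix:
                spans.append((i, j + len(suffix)))
    return spans

spans = match_gappy_pattern("the cat sat on the mat".split(),
                            ["the"], ["mat"])
```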


2020 ◽  
pp. 353-385
Author(s):  
James Reinders ◽  
Ben Ashbaugh ◽  
James Brodman ◽  
Michael Kinsner ◽  
John Pennycook ◽  
...  

Abstract Over the last few decades, Graphics Processing Units (GPUs) have evolved from specialized hardware devices capable of drawing images on a screen to general-purpose devices capable of executing complex parallel kernels. Nowadays, nearly every computer includes a GPU alongside a traditional CPU, and many programs may be accelerated by offloading part of a parallel algorithm from the CPU to the GPU.


2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner, which selects on-the-fly the precision format used to store the preconditioner data, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
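The core idea, storing each inverted Jacobi block in the cheapest precision its conditioning tolerates and upcasting only when the preconditioner is applied, can be sketched in a few lines. The condition-number threshold and the two-format menu below are illustrative assumptions, not Ginkgo's actual selection rule, which also supports custom non-IEEE formats.

```python
import numpy as np

def store_adaptive(blocks, cond_threshold=1e2):
    """Invert each diagonal block and store it in reduced precision
    when it is well conditioned (threshold is illustrative)."""
    stored = []
    for b in blocks:
        fmt = np.float16 if np.linalg.cond(b) < cond_threshold else np.float64
        stored.append(np.linalg.inv(b).astype(fmt))
    return stored

def apply_preconditioner(stored, r_blocks):
    # Upcast to working precision only while applying z = M^{-1} r,
    # so memory traffic (the bottleneck) uses the compact format.
    return [s.astype(np.float64) @ r for s, r in zip(stored, r_blocks)]

blocks = [np.eye(2), np.diag([1.0, 1e-6])]   # well- vs. ill-conditioned
stored = store_adaptive(blocks)
```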


2011 ◽  
Vol 28 (1) ◽  
pp. 1-14 ◽  
Author(s):  
W. van Straten ◽  
M. Bailes

Abstract dspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.
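Phase-coherent dispersion removal is, at its core, a Fourier-domain deconvolution: multiply the spectrum of the baseband signal by the conjugate of the interstellar medium's (unit-magnitude) transfer function. The sketch below uses an arbitrary quadratic phase as a stand-in chirp; the true phase depends on the dispersion measure and observing frequency, and dspsr's GPU path performs exactly this FFT-multiply-IFFT pipeline at scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
signal = rng.normal(size=n) + 1j * rng.normal(size=n)

# Unit-magnitude stand-in for the dispersion transfer function H(f);
# a real chirp's phase is set by the DM and the observing band.
freqs = np.fft.fftfreq(n)
H = np.exp(2j * np.pi * 40.0 * freqs**2)

# Simulate propagation through the dispersive medium.
dispersed = np.fft.ifft(np.fft.fft(signal) * H)

def coherent_dedisperse(x, chirp):
    """Remove dispersion by multiplying the spectrum with the
    conjugate (inverse) of the transfer function."""
    return np.fft.ifft(np.fft.fft(x) * np.conj(chirp))

recovered = coherent_dedisperse(dispersed, H)
```

Because |H| = 1, the deconvolution is exact up to floating-point rounding, which is why the method is called phase-coherent.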


Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1199
Author(s):  
Ravie Chandren Muniyandi ◽  
Ali Maroosi

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as traveling salesman, knapsack, Hamiltonian path, and satisfiability using membrane systems, can take hours or days without appropriate parallelization. Graphics processing units (GPUs) deliver an immensely parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because when the number of objects per membrane is small, the number of active threads is also small, decreasing performance. Moreover, when each membrane is assigned to one thread block, communication between membranes must be implemented as communication between thread blocks, which is time-consuming. Previous approaches have also not addressed the issue of GPU occupancy. This study presents a classification algorithm that manages dependent objects and membranes based on the communication rate in the defined weighted network and assigns them to sub-matrices. Dependent objects and membranes are thus allocated to the same threads and thread blocks, decreasing communication between threads and thread blocks and allowing the GPU to maintain the highest occupancy possible. The experimental results indicate that for 48 objects per membrane, the algorithm facilitates a 93-fold increase in processing speed, compared to a 1.6-fold increase with previous algorithms.
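The classification step, placing membranes that communicate heavily into the same thread block so their exchanges stay within fast on-chip memory, can be sketched as a greedy pairing over the weighted communication network. This is a simplified stand-in for the paper's algorithm; the greedy pairing and the pair-sized groups are assumptions for illustration.

```python
def group_by_communication(weights):
    """Greedily pair membranes with the highest communication rates,
    so heavy communicators share a thread block.

    `weights[(a, b)]` is the communication rate between membranes
    a and b; unpaired membranes fall back to singleton groups.
    """
    groups, placed = [], set()
    for (a, b), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        if a in placed or b in placed:
            continue
        groups.append([a, b])
        placed.update((a, b))
    members = {m for pair in weights for m in pair}
    groups += [[m] for m in sorted(members - placed)]
    return groups

groups = group_by_communication({(0, 1): 5.0, (1, 2): 3.0, (2, 3): 4.0})
```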


2019 ◽  
Vol 97 ◽  
pp. 836-848
Author(s):  
Lan Gao ◽  
Yunlong Xu ◽  
Rui Wang ◽  
Hailong Yang ◽  
Zhongzhi Luan ◽  
...  
