OPTIMIZED METHOD OF NETWORK PACKET ROUTING

Author(s):  
A. S. Kotlyarov

In this paper, we review the possibility of using a block search method for the network packet routing task in high-speed computer networks. The method minimizes hardware costs for large-scale routing tables. To sustain the maximum data transfer rate, packets must be routed in real time. The existing solution presumes concurrent use of several routing devices, each performing an independent search of records by the bisection method. However, when the channel rate exceeds 10 Gb/s and the number of routes exceeds 2^20, this leads to high hardware costs. To reduce them, we suggest a modified block search method, which differs from the classic one by its parallel-pipeline form of search. We present an evaluation of the minimum field-programmable gate array hardware costs for the network packet routing task. Analysis of the results confirms the efficiency of the suggested method in comparison with existing solutions: the hardware costs were reduced by a factor of five.
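For orientation only (this sketch is not from the paper), the following Python snippet shows the classic sequential block search over a sorted routing table that the proposed parallel-pipeline FPGA variant builds on; the table layout, block size and field names are illustrative assumptions.

    import math

    def block_search(table, key):
        # Classic block (jump) search over a sorted list of (prefix, next_hop) records.
        # The paper's contribution is a parallel-pipeline hardware form of this search;
        # only the sequential software baseline is sketched here.
        n = len(table)
        if n == 0:
            return None
        step = math.isqrt(n) or 1                         # block size ~ sqrt(n)
        block_end = step
        while block_end < n and table[block_end - 1][0] < key:
            block_end += step                             # 1) locate the candidate block
        for prefix, next_hop in table[block_end - step:min(block_end, n)]:
            if prefix == key:                             # 2) scan inside that block
                return next_hop
        return None

    # toy routing table sorted by destination prefix (illustrative values)
    routes = sorted([(0x0A000000, "if0"), (0x0A000100, "if1"), (0xC0A80000, "if2")])
    print(block_search(routes, 0xC0A80000))               # -> "if2"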

Author(s):  
Esma Yildirim ◽  
Tevfik Kosar

The emerging petascale increase in the data produced by large-scale scientific applications necessitates innovative solutions for efficient transfer of data through the advanced infrastructure provided by today's high-speed networks and complex computer architectures (e.g. supercomputers, parallel storage systems). Although current optical networking technology has reached transport speeds of 100 Gbps, applications still suffer from inadequate transport protocols and end-system bottlenecks such as processor speed, disk I/O speed and network interface card limits, which cause underutilization of the existing network infrastructure and let applications achieve only a small portion of the theoretical performance. Fortunately, with the parallelism provided by the multiple CPUs/nodes and multiple disks present in today's systems, these bottlenecks can be eliminated. However, it is necessary to understand the characteristics of the end systems and the transport protocol used. In this book chapter, we analyze methodologies that improve the data transfer speed of applications and provide the maximal speeds that can be obtained from the available end-system resources and high-speed networks through the use of end-to-end dataflow parallelism.
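As a rough, hypothetical illustration of the end-system dataflow parallelism discussed in this chapter (not code from it), the sketch below copies a file in fixed-size chunks using several worker threads so that disk and CPU activity can overlap; the paths, chunk size and worker count are placeholders.

    import os
    from concurrent.futures import ThreadPoolExecutor

    CHUNK = 8 * 1024 * 1024                     # 8 MiB chunks (illustrative)

    def copy_chunk(src, dst, offset, length):
        # each worker handles one chunk, so several disk reads/writes are in flight
        with open(src, "rb") as fin, open(dst, "r+b") as fout:
            fin.seek(offset)
            fout.seek(offset)
            fout.write(fin.read(length))

    def parallel_copy(src, dst, workers=4):
        size = os.path.getsize(src)
        with open(dst, "wb") as f:              # pre-allocate the destination file
            f.truncate(size)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for off in range(0, size, CHUNK):
                pool.submit(copy_chunk, src, dst, off, min(CHUNK, size - off))

    # parallel_copy("/data/in.bin", "/scratch/out.bin", workers=8)   # hypothetical paths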


2019 ◽  
Vol 36 (1) ◽  
pp. 1-9 ◽  
Author(s):  
Vahid Jalili ◽  
Enis Afgan ◽  
James Taylor ◽  
Jeremy Goecks

Abstract Motivation Large biomedical datasets, such as those from genomics and imaging, are increasingly being stored on commercial and institutional cloud computing platforms. This is because cloud-scale computing resources, from robust backup to high-speed data transfer to scalable compute and storage, are needed to make these large datasets usable. However, one challenge for large-scale biomedical data on the cloud is providing secure access, especially when datasets are distributed across platforms. While there are open Web protocols for secure authentication and authorization, these protocols are not in wide use in bioinformatics and are difficult to use even for technologically sophisticated users. Results We have developed a generic and extensible approach for securely accessing biomedical datasets distributed across cloud computing platforms. Our approach combines OpenID Connect and OAuth2, best-practice Web protocols for authentication and authorization, together with Galaxy (https://galaxyproject.org), a web-based computational workbench used by thousands of scientists across the world. With our enhanced version of Galaxy, users can access and analyze data distributed across multiple cloud computing providers without any special knowledge of access/authorization protocols. Our approach does not require users to share permanent credentials (e.g. username, password, API key), instead relying on automatically generated temporary tokens that refresh as needed. Our approach is generalizable to most identity providers and cloud computing platforms. To the best of our knowledge, Galaxy is the only computational workbench where users can access biomedical datasets across multiple cloud computing platforms using best-practice Web security approaches and thereby minimize risks of unauthorized data access and credential use. Availability and implementation Freely available for academic and commercial use under the open-source Academic Free License (https://opensource.org/licenses/AFL-3.0) from the following GitHub repositories: https://github.com/galaxyproject/galaxy and https://github.com/galaxyproject/cloudauthz.
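As a generic illustration of the token-based flow described above (a sketch, not the Galaxy/CloudAuthz implementation), the snippet below exchanges a long-lived refresh token for a short-lived access token at an OAuth2 token endpoint; the endpoint URL and client identifiers are placeholders.

    import requests

    TOKEN_ENDPOINT = "https://identity.example.org/oauth2/token"   # placeholder IdP

    def get_temporary_token(refresh_token, client_id, client_secret):
        # Standard OAuth2 refresh-token grant: no permanent user credentials
        # (username, password, API key) are transmitted; the returned access
        # token expires and is refreshed again as needed.
        resp = requests.post(
            TOKEN_ENDPOINT,
            data={
                "grant_type": "refresh_token",
                "refresh_token": refresh_token,
                "client_id": client_id,
                "client_secret": client_secret,
            },
            timeout=10,
        )
        resp.raise_for_status()
        payload = resp.json()
        return payload["access_token"], payload.get("expires_in")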


2021 ◽  
Vol 2089 (1) ◽  
pp. 012069
Author(s):  
A. Pradeep kumar ◽  
Y. Devendar Reddy ◽  
T. Srinivas Reddy ◽  
K. Jamal

Abstract Large-scale neural network (NN) accelerators typically have multiple processing nodes that can be implemented as a multi-core chip and organized on a network-on-chip (NoC), with heavy traffic between the nodes corresponding to neurons. Several NoC-based NN chips are linked through chip-to-chip interconnection networks to further enhance overall neural acceleration capacity. Large volumes of on-chip or cross-chip multicast traffic further complicate the construction of such interconnection networks and create a barrier of device capacity and resources for the NN. In this paper, we refer to the intra-chip and inter-chip communication strategies for NN accelerators as neuron connection. A fault-tolerant routing system for the neural NoC interconnect is implemented in this paper. For intra-chip communication, we recommend crossbar arbitration placement, virtual channels, and path-based parallelization strategies for virtual channel routing, resulting in higher NoC throughput at lower hardware cost. For inter-chip communication, a lightweight NoC-compatible chip-to-chip interconnection scheme is proposed for multicast-based data traffic to enable efficient interconnection of NoC-based NN chips. The proposed methods are tested with four Field Programmable Gate Arrays (FPGAs) hosting four hard-wired deep neural network (DNN) chips. The experimental results illustrate that the proposed interconnection network effectively achieves high throughput in handling the DNN data traffic over the chip-to-chip links.
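The routing hardware itself is not given in the abstract; purely as a conceptual sketch of the path-based multicast idea mentioned above, the snippet below routes a single packet along one ordered path that visits every destination router of a 2D-mesh NoC, instead of replicating the packet per destination. Mesh coordinates and the visiting order are illustrative assumptions.

    def xy_route(src, dst):
        # dimension-ordered (XY) hop sequence between two mesh routers
        (x, y), (dx, dy) = src, dst
        hops = []
        while x != dx:
            x += 1 if dx > x else -1
            hops.append((x, y))
        while y != dy:
            y += 1 if dy > y else -1
            hops.append((x, y))
        return hops

    def path_based_multicast(src, destinations):
        # one worm visits all destinations along a single ordered path,
        # so the multicast needs only one packet instead of one per destination
        path, current = [src], src
        for dst in sorted(destinations):        # simple ordering heuristic
            path += xy_route(current, dst)
            current = dst
        return path

    # one multicast path from router (0, 0) to three destination routers (illustrative)
    print(path_based_multicast((0, 0), [(2, 1), (1, 3), (3, 3)]))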


2013 ◽  
Vol 22 (04) ◽  
pp. 1350025
Author(s):  
STAVROS P. DOKOUZYANNIS ◽  
ARGIRIS P. MOKIOS

This paper analyzes the design automation of embedded Systolic Array Processors (SAPs) on large-scale Field Programmable Gate Array (FPGA) devices. SAPs are hardware implementations of a class of iterative, high-level language algorithms for applications where processing speed is the principal design goal. Embedding SAPs onto FPGAs is a complex process. The optimization phase of this process reduces the SAP significantly, so that less FPGA area is occupied by the embedded design without any loss in final performance. The present paper examines the effect of Projection Vectors (PVs) and Task Scheduling Vectors (TSVs) on the optimization process. Two optimization approaches are examined: technology mapping using the FlowMap and FlowPack algorithms, and optimization via logic synthesis using the Xilinx Synthesis Tool. The multiplication of matrices, with entries of up to 32-bit integers, has been taken as the sample space for the experiments conducted. The results confirm that the selection of PV and TSV greatly affects the number of input/output signal connections of the FPGA, while the selection of an optimization approach affects the final number of logic resources occupied on the targeted device.
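For readers unfamiliar with SAPs, the snippet below (not from the paper) emulates an output-stationary systolic array for matrix multiplication, the benchmark used here: each cell keeps one accumulator and performs one multiply-accumulate per step as operands stream through. Choosing a PV and TSV corresponds to choosing how these loop indices are mapped onto physical cells and clock cycles.

    def systolic_matmul(A, B):
        # software emulation of an output-stationary systolic array:
        # cell (i, j) receives a[i][k] from the left and b[k][j] from above
        # at step k and performs one multiply-accumulate
        n, m, p = len(A), len(B), len(B[0])
        acc = [[0] * p for _ in range(n)]       # one accumulator per cell
        for k in range(m):                      # systolic time steps
            for i in range(n):
                for j in range(p):
                    acc[i][j] += A[i][k] * B[k][j]
        return acc

    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]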


Author(s):  
Nguyen Trinh ◽  
Anh Le Thi Kim ◽  
Hung Nguyen ◽  
Linh Tran

Content addressable memory (CAM) and ternary content addressable memory (TCAM) are specialized high-speed memories for data searching. CAM and TCAM have many applications in network routing, packet forwarding and Internet data centers. These types of memories have drawbacks in power dissipation and area. As field-programmable gate arrays (FPGAs) are increasingly used for network acceleration applications, the demand to integrate TCAM and CAM on FPGA is growing. Because most FPGAs do not support native TCAM and CAM hardware, methods of implementing algorithmic TCAM using FPGA resources have been proposed in recent years. Algorithmic TCAM on FPGA has the advantages of the FPGA's low power consumption and high integration scalability. This paper proposes a scalable algorithmic TCAM design on FPGA. The design uses memory blocks to negate the power dissipation issue and data collision to save area. The paper also presents the design of a 256 x 104-bit algorithmic TCAM on an Intel Cyclone V FPGA and evaluates the performance and applicability of the design at large scale and in future developments.
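The register-transfer-level design is in the paper itself; only to illustrate the ternary-match semantics that an algorithmic TCAM has to reproduce with RAM blocks, the sketch below stores each entry as a (value, mask) pair in which zero mask bits are "don't care" and returns the highest-priority match. Widths and rules are illustrative.

    def tcam_lookup(entries, key):
        # entries: list of (value, mask) pairs; bits where mask == 0 are don't-care.
        # A hardware TCAM evaluates all entries in parallel; an algorithmic TCAM
        # on FPGA reproduces the same first-match result using RAM-block lookups.
        for priority, (value, mask) in enumerate(entries):
            if (key & mask) == (value & mask):
                return priority
        return None

    rules = [
        (0b10100000, 0b11110000),   # matches 1010xxxx (highest priority)
        (0b10000000, 0b11000000),   # matches 10xxxxxx
    ]
    print(tcam_lookup(rules, 0b10101011))   # -> 0
    print(tcam_lookup(rules, 0b10010000))   # -> 1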


In DSP, the most common function is the Finite Impulse Response (FIR) filter, which is realized in Field Programmable Gate Arrays (FPGAs). The systolic FIR filter architecture offers attractive models for efficient Very Large Scale Integration (VLSI) computation. High speed is the major concern for fast computation in real-time Digital Signal Processing (DSP) applications. The conventional systolic FIR filter method uses a general array multiplier structure, which takes more time to compute and has high design complexity at low power. To overcome this problem, a systolic FIR filter utilizing the Bypass Feed Direct Multiplier (BFDM) is proposed. The proposed 16-tap systolic FIR filter with parallel processing offers lower delay and lower design complexity, and is suitable for image and signal processing applications. The proposed method is simulated using the Xilinx ISE 12.4 tool and the functions are evaluated with ModelSim 6.3c.
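The BFDM hardware cannot be reconstructed from this summary; as a behavioural reference only, the sketch below computes the response a 16-tap systolic FIR filter must produce, with one multiply-accumulate per tap so that each loop iteration corresponds to one processing element of the systolic chain. The coefficients and test input are placeholders.

    def fir_filter(samples, coeffs):
        # behavioural model of an N-tap FIR filter: y[n] = sum_k h[k] * x[n - k];
        # in the systolic realization each tap k is one processing element that
        # performs a single multiply-accumulate per clock and passes data on
        taps = len(coeffs)
        delay_line = [0] * taps
        out = []
        for x in samples:
            delay_line = [x] + delay_line[:-1]                     # shift register
            out.append(sum(h * d for h, d in zip(coeffs, delay_line)))
        return out

    coeffs16 = [1] * 16                          # placeholder 16-tap impulse response
    print(fir_filter([1, 0, 0, 0, 2], coeffs16)) # -> [1, 1, 1, 1, 3]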


Author(s):  
Masaya Yoshikawa ◽  
Hidekazu Terai

The floorplanning problem, a basic step in the layout design of very large-scale integrated circuits (VLSI), deals with placing rectangular modules at maximum density. Many studies have addressed this problem using sequence pairs with genetic algorithms (GAs), but this generally requires much calculation time. We propose an architecture for high-speed floorplanning using a sequence pair based on a GA. The proposed architecture, implemented on a field-programmable gate array (FPGA), achieves high-speed processing. Measurements evaluating the proposed architecture demonstrated speeds 37.1 times greater than software processing.
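To make the representation concrete (this is the textbook sequence-pair semantics, not the authors' FPGA architecture): module a lies left of b exactly when a precedes b in both sequences, and above/below b when the orders differ; coordinates then follow from longest paths over module sizes, as in the sketch below with illustrative widths.

    def sequence_pair_x(gamma_plus, gamma_minus, widths):
        # a must lie left of b  <=>  a precedes b in both gamma_plus and gamma_minus;
        # x[b] is the longest path over the widths of all modules forced left of b.
        # (y-coordinates are obtained symmetrically from the vertical constraints.)
        pos_minus = {m: i for i, m in enumerate(gamma_minus)}
        x = {}
        for i, b in enumerate(gamma_plus):      # gamma_plus order is topological
            left = [a for a in gamma_plus[:i] if pos_minus[a] < pos_minus[b]]
            x[b] = max((x[a] + widths[a] for a in left), default=0)
        return x

    # three modules with illustrative widths
    print(sequence_pair_x(("a", "b", "c"), ("b", "a", "c"), {"a": 4, "b": 2, "c": 3}))
    # -> {'a': 0, 'b': 0, 'c': 4}   (a and b stacked vertically, c to their right)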


Author(s):  
Carlos Lago-Peñas ◽  
Anton Kalén ◽  
Miguel Lorenzo-Martinez ◽  
Roberto López-Del Campo ◽  
Ricardo Resta ◽  
...  

This study aimed to evaluate the effects of playing position, match location (home or away), quality of opposition (strong or weak), effective playing time (total time minus stoppages), and score-line on physical match performance in professional soccer players using a large-scale analysis. A total of 10,739 individual match observations of outfield players competing in the Spanish La Liga during the 2018–2019 season were recorded using a computerized tracking system (TRACAB, ChyronHego, New York, USA). The players were classified into five positions (central defenders, players = 94; external defenders, players = 82; central midfielders, players = 101; external midfielders, players = 72; and forwards, players = 67) and the following match running performance categories were considered: total distance covered, low-speed running (LSR) distance (0–14 km/h), medium-speed running (MSR) distance (14–21 km/h), high-speed running (HSR) distance (>21 km/h), very HSR (VHSR) distance (21–24 km/h), and sprint distance (>24 km/h). Overall, match running performance was highly dependent on situational variables, especially the score-line condition (winning, drawing, losing). Moreover, the score-line affected players' running performance differently depending on their playing position. Losing status increased the total distance and the distance covered at MSR, HSR, VHSR and sprint by defenders, while attacking players showed the opposite trend. These findings may help coaches and managers to better understand the effects of situational variables on physical performance in La Liga and could be used to develop a model for predicting the physical activity profile in competition.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Florian Roessler ◽  
André Streek

Abstract In laser processing, the achievable throughput scales directly with the available average laser power. To avoid unwanted thermal damage due to high pulse energy or heat accumulation at MHz repetition rates, the energy must be distributed over the workpiece. Polygon mirror scanners enable high deflection speeds and thus a proper energy distribution within a short processing time. The requirements of laser micro processing with average laser powers of up to 10 kW and scan speeds of up to 1000 m/s result in a 30 mm aperture, two-dimensional polygon mirror scanner with a patented low-distortion mirror configuration. In combination with a field programmable gate array-based real-time logic, position-true high-accuracy laser switching is enabled for 2D, 2.5D, or 3D laser processing, capable of drilling holes in multi-pass ablation or engraving. A specially developed real-time shifter module within the high-speed logic allows, in combination with external axes, material processing on the fly and hence the processing of workpieces much larger than the scan field.
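The FPGA logic itself is not described in code here; the snippet below only illustrates, under assumed names and a constant feed rate, the coordinate bookkeeping such a real-time shifter module performs for on-the-fly processing: the commanded scan-field position is shifted by the distance the external axis has already travelled, so that switching points remain position-true on the moving workpiece.

    def shifted_scan_position(target_xy, axis_speed_mm_s, t_s):
        # target_xy       : commanded position within the static scan field, in mm
        # axis_speed_mm_s : assumed constant feed of the external axis along x, in mm/s
        # t_s             : time since the job segment started, in s
        x, y = target_xy
        return (x - axis_speed_mm_s * t_s, y)   # compensate workpiece motion along x

    # switch-on point for a feature nominally at (12.0 mm, 3.5 mm), 40 ms into a pass
    # with the workpiece moving at 500 mm/s (illustrative values)
    print(shifted_scan_position((12.0, 3.5), 500.0, 0.040))    # -> (-8.0, 3.5)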

