A novel architecture for a high-performance network processing unit: Flexibility at multiple levels of abstraction

2009 ◽  
Author(s):  
Simon Hauger

2007 ◽  
Author(s):  
Amy Perfors ◽  
Charles Kemp ◽  
Elizabeth Wonnacott ◽  
Joshua B. Tenenbaum


Author(s):  
Hiroshi Yamamoto ◽  
Yasufumi Nagai ◽  
Shinichi Kimura ◽  
Hiroshi Takahashi ◽  
Satoko Mizumoto ◽  
...  


2014 ◽  
Vol 22 (1) ◽  
pp. 159-188 ◽  
Author(s):  
Mikdam Turkey ◽  
Riccardo Poli

Several previous studies have focused on modelling and analysing the collective dynamic behaviour of population-based algorithms. However, an empirical approach for identifying and characterising such a behaviour is surprisingly lacking. In this paper, we present a new model to capture this collective behaviour, and to extract and quantify features associated with it. The proposed model studies the topological distribution of an algorithm's activity from both a genotypic and a phenotypic perspective, and represents population dynamics using multiple levels of abstraction. The model can have different instantiations. Here it has been implemented using a modified version of self-organising maps. These are used to represent and track the population motion in the fitness landscape as the algorithm operates on solving a problem. Based on this model, we developed a set of features that characterise the population's collective dynamic behaviour. By analysing them and revealing their dependency on fitness distributions, we were then able to define an indicator of the exploitation behaviour of an algorithm. This is an entropy-based measure that assesses the dependency on fitness distributions of different features of population dynamics. To test the proposed measures, evolutionary algorithms with different crossover operators, selection pressure levels and population handling techniques have been examined, which lead populations to exhibit a wide range of exploitation-exploration behaviours.





Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.



Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 596
Author(s):  
Marco Buzzelli ◽  
Luca Segantin

We address the task of classifying car images at multiple levels of detail, ranging from the top-level car type, down to the specific car make, model, and year. We analyze existing datasets for car classification, and identify the CompCars as an excellent starting point for our task. We show that convolutional neural networks achieve an accuracy above 90% on the finest-level classification task. This high performance, however, is scarcely representative of real-world situations, as it is evaluated on a biased training/test split. In this work, we revisit the CompCars dataset by first defining a new training/test split, which better represents real-world scenarios by setting a more realistic baseline at 61% accuracy on the new test set. We also propagate the existing (but limited) type-level annotation to the entire dataset, and we finally provide a car-tight bounding box for each image, automatically defined through an ad hoc car detector. To evaluate this revisited dataset, we design and implement three different approaches to car classification, two of which exploit the hierarchical nature of car annotations. Our experiments show that higher-level classification in terms of car type positively impacts classification at a finer grain, now reaching 70% accuracy. The achieved performance constitutes a baseline benchmark for future research, and our enriched set of annotations is made available for public download.



Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 884
Author(s):  
Stefano Rossi ◽  
Enrico Boni

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources allowing massive exploitation of parallel computing are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the implementation of the embedded NVIDIA Jetson Xavier AGX module on board ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need to use an external controlling PC.



2021 ◽  
Author(s):  
Fei Yu ◽  
Zinan Zhang ◽  
Hui Shen ◽  
Yuanyuan Huang ◽  
Shuo Cai ◽  
...  

Abstract In this paper, a memristive Hopfield neural network with a special activation gradient (MHNN) is proposed by adding a suitable memristor to the Hopfield neural network (HNN) with a special activation gradient. The MHNN is simulated and dynamic analyzed, and implemented on FPGA. Then, a new pseudo-random number generator (PRNG) based on MHNN is proposed. The post-processing unit of the PRNG is composed of nonlinear post-processor and XOR calculator, which effectively ensures the randomness of PRNG. The experiments in this paper comply with the IEEE 754-1985 high precision 32-bit floating point standard and are done on the Vivado design tool using a Xilinx XC7Z020CLG400-2 FPGA chip and the Verilog-HDL hardware programming language. The random sequence generated by the PRNG proposed in this paper has passed the NIST SP800-22 test suite and security analysis, proving its randomness and high performance. Finally, an image encryption system based on PRNG is proposed and implemented on FPGA, which proves the value of the image encryption system in the field of data encryption connected to the Internet of Things (IoT).



Sign in / Sign up

Export Citation Format

Share Document