High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 449
Author(s):  
Mohammad Amir Mansoori ◽  
Mario R. Casu

Principal Component Analysis (PCA) is a technique for dimensionality reduction that is useful in removing redundant information in data for various applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator in Field-Programmable Gate Arrays (FPGAs) that we designed entirely in HLS. To make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create efficient hardware. The flexibility of our design allows us to use it for different FPGA targets, with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher-speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time, and power consumption.
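
As a rough illustration of why PCA lends itself to streaming hardware, the covariance matrix can be accumulated one block of samples at a time, so the full data set never has to be held at once. The sketch below is not the block-streaming method proposed in the paper, whose details the abstract does not give. The function names, the block size, and the use of power iteration to find the leading component are all assumptions made for illustration.

```python
import math


def covariance_by_blocks(rows, block_size):
    """Accumulate the (population) covariance matrix block by block.

    `rows` is a list of samples, each a list of feature values.
    Only running sums are kept between blocks (hypothetical helper).
    """
    n_features = len(rows[0])
    total = [0.0] * n_features  # running sum of each feature
    cross = [[0.0] * n_features for _ in range(n_features)]  # running sum of x_i * x_j
    count = 0
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]  # one block of samples
        for x in block:
            count += 1
            for i in range(n_features):
                total[i] += x[i]
                for j in range(n_features):
                    cross[i][j] += x[i] * x[j]
    mean = [s / count for s in total]
    # cov[i][j] = E[x_i * x_j] - E[x_i] * E[x_j]
    return [
        [cross[i][j] / count - mean[i] * mean[j] for j in range(n_features)]
        for i in range(n_features)
    ]


def first_principal_component(cov, iterations=200):
    """Estimate the leading eigenvector of `cov` by power iteration.

    Assumes `cov` is non-zero and the start vector is not orthogonal
    to the leading eigenvector.
    """
    n = len(cov)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Usage: four samples lying on the line y = 2x, processed two at a time.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component = first_principal_component(covariance_by_blocks(samples, block_size=2))
print(component)
```

Because only the running sums cross block boundaries, the result does not depend on the block size, which is what makes this kind of accumulation easy to pipeline.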

Author(s):  
José Capmany ◽  
Daniel Pérez

The field programmable photonic gate array (FPPGA) is an integrated photonic device/subsystem that operates similarly to a field programmable gate array in electronics. It is a set of programmable photonics analogue blocks (PPABs) and reconfigurable photonic interconnects (RPIs) implemented over a photonic chip. The PPABs provide the building blocks for implementing basic optical analogue operations (reconfigurable/independent power splitting and phase shifting). Broadly, they enable reconfigurable processing, just as configurable logic elements (CLEs) or programmable logic blocks (PLBs) carry out digital operations in electronic FPGAs, or configurable analogue blocks (CABs) carry out analogue operations in electronic field programmable analogue arrays (FPAAs). Reconfigurable interconnections between PPABs are provided by the RPIs. This chapter presents basic principles of integrated FPPGAs. It describes their main building blocks and discusses alternatives for their high-level layouts, design flow, technology mapping and physical implementation. Finally, it shows that waveguide meshes lead naturally to a compact solution.


Biomolecules ◽  
2019 ◽  
Vol 9 (9) ◽  
pp. 391 ◽  
Author(s):  
Fotirić Akšić ◽  
Gašić ◽  
Dabić Zagorac ◽  
Sredojević ◽  
Tosti ◽  
...  

The aim of this research was to analyze the sugars and phenolics of pollen obtained from 15 different ‘Oblačinska’ sour cherry clones and to assess the chemical fingerprint of this cultivar. Carbohydrate analysis was done using high-performance anion-exchange chromatography (HPAEC) with pulsed amperometric detection (PAD), while polyphenols were analyzed by an ultra-high-performance liquid chromatography–diode array detector–tandem mass spectrometry (UHPLC-DAD MS/MS) system. Glucose was the most abundant sugar, followed by fructose and sucrose. Some samples had high levels of stress sugars, especially trehalose. Rutin was the predominant polyphenol, in a quantity up to 181.12 mg/kg (clone III/9), followed by chlorogenic acid (up to 59.93 mg/kg in clone III/9) and p-coumaric acid (up to 53.99 mg/kg in clone VIII/1). According to the principal component analysis (PCA), fructose, maltose, maltotriose, sorbitol, and trehalose were the most important sugars in separating pollen samples. PCA separated clones VIII/1, IV/8, III/9, and V/P according to their quantity of phenolics and dissimilar profiles. Large differences in the chemical composition of the pollen of the studied ‘Oblačinska’ sour cherry clones were shown, proving that it is not a cultivar but a population. Finally, due to their highest levels of phenolics, clones IV/8, XV/3, and VIII/1 could be singled out as promising for producing functional food and/or for medicinal treatments.


2022 ◽  
Vol 15 (3) ◽  
pp. 1-20
Author(s):  
Christian Lienen ◽  
Marco Platzner

Robotics applications process large amounts of data in real time and require compute platforms that provide high performance and energy efficiency. FPGAs are well suited for many of these applications, but there is a reluctance in the robotics community to use hardware acceleration due to increased design complexity and a lack of consistent programming models across the software/hardware boundary. In this article, we present ReconROS, a framework that integrates the widely used robot operating system (ROS) with ReconOS, which features multithreaded programming of hardware and software threads for reconfigurable computers. This unique combination gives ROS 2 developers the flexibility to transparently accelerate parts of their robotics applications in hardware. We elaborate on the architecture and the design flow for ReconROS and report on a set of experiments that underline the feasibility and flexibility of our approach.


2015 ◽  
Vol 140 (5) ◽  
pp. 466-471 ◽  
Author(s):  
Jian-rong Feng ◽  
Wan-peng Xi ◽  
Wen-hui Li ◽  
Hai-nan Liu ◽  
Xiao-fang Liu ◽  
...  

The aroma of the 14 main apricot (Prunus armeniaca L.) cultivars in Xinjiang was characterized using high-performance solid-phase microextraction (HP-SPME) with gas chromatography-mass spectroscopy (GC-MS). A total of 208 volatiles, comprising 80 esters, 25 aldehydes, 15 terpenes, 21 ketones, 39 alcohols, 27 olefins, and 1 acid, were identified from these cultivars. The compounds propyl acetate, 3-methyl-1-butanol acetate, (Z)-3-hexen-1-ol acetate, d-limonene, β-linalool, hexanal, hexyl acetate, butyl acetate, β-myrcene, ethyl butanoate, and β-cis-ocimene were the major compounds responsible for aroma in these cultivars. GC-MS results showed that Kuchexiaobaixing, Guoxiyuluke, and seven other cultivars were characterized by a high level of esters and were considered to have a fruity apricot aroma. ‘Luotuohuang’ and ‘Heiyexing’ accumulated high levels of terpenes and exhibited an outstanding floral aroma. Higher levels of alcohols and aldehydes were observed in ‘Danxing’, ‘Sumaiti’, and ‘Kumaiti’, which are considered green aroma cultivars. These three types of cultivars with different aroma characteristics can be significantly differentiated by using the principal component analysis (PCA) method. The contributions of volatiles to the apricot aroma were assessed by using the partial least squares regression (PLSR) model. Esters, terpenes, and C6 components were shown to be responsible for the fruity, floral, and green character of fresh apricots, respectively.


Author(s):  
Shanjie Xiao ◽  
Tatjana Jevremovic

A high-performance hardware acceleration coprocessor built on field programmable gate arrays (FPGAs) is designed to accelerate neutron transport computation for three-dimensional whole reactor cores. The acceleration coprocessor is based on reconfigurable computation techniques and adopts a dataflow-driven, non-von Neumann architecture for highly efficient parallel computation. The coprocessor offers much more computational power than CPUs of the same era and is compatible with existing software acceleration methods. It achieves a speedup of about 20 times in simulation validations. This is the first time that reconfigurable hardware acceleration techniques have been used to improve the computational efficiency of reactor physics and neutron transport simulations.


2010 ◽  
Vol 2010 ◽  
pp. 1-10 ◽  
Author(s):  
Diego F. Sánchez ◽  
Daniel M. Muñoz ◽  
Carlos H. Llanos ◽  
José M. Motta

Hardware acceleration in high-performance computer systems is of particular interest for many engineering and scientific applications in which a large number of arithmetic operations and transcendental functions must be computed. In this paper, a hardware architecture for computing the direct kinematics of robot manipulators with 5 degrees of freedom (5 D.o.f.) using floating-point arithmetic is presented for 32, 43, and 64 bit-width representations and implemented in Field Programmable Gate Arrays (FPGAs). The proposed architecture has been developed using several floating-point libraries for arithmetic and transcendental function operators, allowing the designer to select (pre-synthesis) a suitable bit-width representation according to the accuracy and dynamic range, as well as the area, elapsed time, and power consumption requirements of the application. Synthesis results demonstrate the effectiveness and high performance of the implemented cores on commercial FPGAs. Simulation results were used to compute the Mean Square Error (MSE), using Matlab as a statistical estimator, validating the correct behavior of the implemented cores. Additionally, the processing time of the hardware architecture was compared with the same formulation implemented in software on the PowerPC (FPGA embedded processor), demonstrating that the hardware architecture speeds up the software implementation by a factor of 1298.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-35
Author(s):  
Tom Hogervorst ◽  
Răzvan Nane ◽  
Giacomo Marchiori ◽  
Tong Dong Qiu ◽  
Markus Blatt ◽  
...  

Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because of the utmost importance to simulate increasingly larger computational models, hardware acceleration is receiving increased attention due to its potential to maximize the performance of scientific computing. Field-Programmable Gate Arrays could accelerate scientific computing because of the possibility to fully customize the memory hierarchy important in irregular applications such as iterative linear solvers. In this article, we study the potential of using Field-Programmable Gate Arrays in High-Performance Computing because of the rapid advances in reconfigurable hardware, such as the increase in on-chip memory size, increasing number of logic cells, and the integration of High-Bandwidth Memories on board. To perform this study, we propose a novel Sparse Matrix-Vector multiplication unit and an ILU0 preconditioner tightly integrated with a BiCGStab solver kernel. We integrate the developed preconditioned iterative solver in Flow from the Open Porous Media project, a state-of-the-art open source reservoir simulator. Finally, we perform a thorough evaluation of the FPGA solver kernel in both stand-alone mode and integrated in the reservoir simulator, using the NORNE field, a real-world case reservoir model using a grid with more than 10⁵ cells and using three unknowns per cell.
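
To make concrete what a sparse matrix-vector multiplication kernel computes, and why its memory accesses are irregular, here is a minimal software sketch. It assumes the common compressed sparse row (CSR) layout; the abstract does not say which storage format the proposed unit uses, and the function and argument names are hypothetical.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : non-zero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[r]..row_ptr[r + 1] delimits row r inside `values`
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read at data-dependent positions: this is the irregular access
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Usage: A = [[4, 0, 1],
#             [0, 3, 0],
#             [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```

An iterative solver such as BiCGStab calls a kernel of this shape in every iteration, so its throughput tends to dominate the overall solve time.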


2020 ◽  
Vol 15 (9) ◽  
pp. 1934578X2095327
Author(s):  
Songtao Liu ◽  
Lin Yang ◽  
Song Wang ◽  
Junying Pan

The purpose of this article is to establish a method that uses ultra-high performance liquid chromatography (UPLC)-mass spectrometry (MS)/MS to simultaneously determine 9 main components of Lonicera japonica Thunb. in negative-ion scanning mode, with the main components then analyzed by chemometrics. The chromatographic separation uses the Thermo Hypersil GOLD column (100 mm × 2.1 mm, 1.9 µm) at a constant temperature of 45 °C. The mobile phase consists of methanol and water containing 0.2% formic acid. The results show that the 9 compounds had a good linear relationship (R² > 0.9991), and the intraday precision, interday precision, and stability were within acceptable ranges of relative SDs (RSDs; 0.96%-2.26%, 0.52%-3.04%, and 0.85%-2.15%, respectively). The recovery rates were between 75.90% and 110.58%. The results of chemometrics, including hierarchical cluster analysis and principal component analysis, showed that there were obvious differences in the content of active components in L. japonica from different regions, and the compounds with the highest contribution to the drug were identified. Through the combined UPLC-MS/MS and chemometrics analysis of L. japonica, this experiment can provide a reference for further research on the modernization and innovation of L. japonica and for high-level, multidirectional application research.


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2859
Author(s):  
Mannhee Cho ◽  
Youngmin Kim

Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered to be suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which performs classification of handwritten digits using the MNIST handwritten digit dataset. The proposed accelerator was implemented using a high-level synthesis tool on a Xilinx FPGA. The proposed accelerator applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% reduced network latency compared to a floating-point design, and its resource utilization is optimized to use 78% fewer DSP blocks compared to general fixed-point designs.
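
The fixed-point data type mentioned above can be illustrated with a plain multiply-accumulate in integer arithmetic. This is only a sketch of exact fixed-point accumulation, not the approximate accumulation units of the paper. The 8 fractional bits and all function names are assumptions chosen for the example.

```python
FRAC_BITS = 8  # assumed format: 8 fractional bits (not taken from the paper)


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a fixed-point integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights, inputs, frac_bits=FRAC_BITS):
    """Dot product computed entirely on integers, as a neuron's accumulator would."""
    acc = 0
    for w, a in zip(weights, inputs):
        # each product carries 2 * frac_bits fractional bits
        acc += to_fixed(w, frac_bits) * to_fixed(a, frac_bits)
    # rescale to frac_bits; the shift rounds toward negative infinity
    return acc >> frac_bits


# Usage: compare the fixed-point result with the floating-point reference.
weights = [0.5, -0.25, 1.0]
inputs = [2.0, 4.0, 0.5]
reference = sum(w * a for w, a in zip(weights, inputs))
print(to_float(fixed_dot(weights, inputs)), reference)
```

Narrower fractional widths shrink the multipliers and the storage needed, at the cost of larger quantization error, which is the trade-off such accelerators tune.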


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1631
Author(s):  
Lin Zhang ◽  
Bing Li ◽  
Yong Liu ◽  
Xia Zhao ◽  
Yazhou Wang ◽  
...  

With the rapid development of the Internet, malware traffic is seriously endangering the security of cyberspace. Convolutional neural network (CNN)-based malware traffic classification can automatically learn features from raw traffic, avoiding the inaccuracy of hand-designed traffic features. Through experiments and comparisons of LeNet, AlexNet, VGGNet, and ResNet, it is found that LeNet has good and stable classification ability for malware traffic and normal traffic. Then, a field programmable gate array (FPGA) accelerator for CNN-based malware traffic classification is designed, which consists of a parameterized hardware accelerator and a fully automatic software framework. By fully exploring the parallelism between CNN layers, parallel computation and pipeline optimization are used in the hardware design to achieve high performance. Simultaneously, runtime reconfigurability is implemented by using a global register list. By encapsulating the underlying driver, a three-layer software framework is provided for users to deploy their pre-trained models. Finally, a typical CNN-based malware traffic classification model was selected to test and verify the hardware accelerator. The typical application system can classify each traffic image from the test dataset in 18.97 μs with an accuracy of 99.77%, and the throughput of the system is 411.83 Mbps.

