High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Mohammad Amir Mansoori; Mario R. Casu

doi:10.3390/electronics9030449

Electronics ◽

10.3390/electronics9030449 ◽

2020 ◽

Vol 9 (3) ◽

pp. 449

Author(s):

Mohammad Amir Mansoori ◽

Mario R. Casu

Keyword(s):

High Performance ◽

Principal Component ◽

Hardware Acceleration ◽

Design Flow ◽

Hardware Accelerator ◽

Field Programmable ◽

Point Solution ◽

Active Research ◽

High Level ◽

Many Core

Principal Component Analysis (PCA) is a technique for dimensionality reduction that is useful in removing redundant information in data for various applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper, we propose a flexible PCA hardware accelerator in Field-Programmable Gate Arrays (FPGA) that we designed entirely in HLS. To make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create efficient hardware. The flexibility of our design allows us to use it for different FPGA targets, with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher-speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time, and power consumption.
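The abstract does not spell out how the block-streaming method works, so the following is only a generic sketch of the underlying idea: a covariance matrix can be accumulated while the data arrives block by block, after which a principal component is extracted. The function names (`streamed_covariance`, `leading_component`), the running-sum formulation, and the use of power iteration are assumptions made for illustration, not the authors' design.

```python
def streamed_covariance(blocks, dim):
    """Accumulate a sample covariance matrix from blocks of row vectors."""
    n = 0
    total = [0.0] * dim                        # running sum of each feature
    outer = [[0.0] * dim for _ in range(dim)]  # running sum of x * x^T
    for block in blocks:                       # one block is consumed at a time
        for x in block:
            n += 1
            for i in range(dim):
                total[i] += x[i]
                for j in range(dim):
                    outer[i][j] += x[i] * x[j]
    mean = [t / n for t in total]
    # cov = (sum(x x^T) - n * mean * mean^T) / (n - 1)
    return [[(outer[i][j] - n * mean[i] * mean[j]) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def leading_component(cov, iters=200):
    """Power iteration for the dominant eigenvector of a covariance matrix.

    Assumes the all-ones start vector is not orthogonal to that eigenvector.
    """
    dim = len(cov)
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


# Hypothetical toy data: four 2-D samples lying on the line y = x / 2.
data = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
blocks = [data[:2], data[2:]]                  # streamed as two blocks
cov = streamed_covariance(blocks, dim=2)
pc1 = leading_component(cov)
```

Because only the running sums are kept between blocks, memory use depends on the feature dimension rather than on the number of samples, which is the usual motivation for processing data in a streaming fashion.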

Field Programmable Photonic Gate Arrays

Programmable Integrated Photonics ◽

10.1093/oso/9780198844402.003.0009 ◽

2020 ◽

pp. 301-330

Author(s):

José Capmany ◽

Daniel Pérez

Keyword(s):

Building Blocks ◽

Design Flow ◽

Power Splitting ◽

Gate Arrays ◽

Field Programmable ◽

Reconfigurable Processing ◽

Gate Array ◽

Photonic Interconnects ◽

High Level ◽

Electronic Field

The field programmable photonic gate array (FPPGA) is an integrated photonic device/subsystem that operates similarly to a field programmable gate array in electronics. It is a set of programmable photonics analogue blocks (PPABs) and of reconfigurable photonic interconnects (RPIs) implemented over a photonic chip. The PPABs provide the building blocks for implementing basic optical analogue operations (reconfigurable/independent power splitting and phase shifting). Broadly they enable reconfigurable processing just like configurable logic elements (CLE) or programmable logic blocks (PLBs) carry digital operations in electronic FPGAs or configurable analogue blocks (CABs) carry analogue operations in electronic field programmable analogue arrays (FPAAs). Reconfigurable interconnections between PPABs are provided by the RPIs. This chapter presents basic principles of integrated FPPGAs. It describes their main building blocks and discusses alternatives for their high-level layouts, design flow, technology mapping and physical implementation. Finally, it shows that waveguide meshes lead naturally to a compact solution.

Chemical Fingerprint of ‘Oblačinska’ Sour Cherry (Prunus cerasus L.) Pollen

Biomolecules ◽

10.3390/biom9090391 ◽

2019 ◽

Vol 9 (9) ◽

pp. 391 ◽

Cited By ~ 1

Author(s):

Fotirić Akšić ◽

Gašić ◽

Dabić Zagorac ◽

Sredojević ◽

Tosti ◽

...

Keyword(s):

High Performance ◽

Amperometric Detection ◽

Principal Component ◽

Sour Cherry ◽

Anion Exchange Chromatography ◽

Exchange Chromatography ◽

Array Detector ◽

Coumaric Acid ◽

Chemical Fingerprint ◽

High Level

The aim of this research was to analyze sugars and phenolics of pollen obtained from 15 different ‘Oblačinska’ sour cherry clones and to assess the chemical fingerprint of this cultivar. Carbohydrate analysis was done using high-performance anion-exchange chromatography (HPAEC) with pulsed amperometric detection (PAD), while polyphenols were analyzed by an ultra-high-performance liquid chromatography–diode array detector–tandem mass spectrometry (UHPLC-DAD MS/MS) system. Glucose was the most abundant sugar, followed by fructose and sucrose. Some samples had high levels of stress sugars, especially trehalose. Rutin was the predominant polyphenol, in a quantity up to 181.12 mg/kg (clone III/9), followed by chlorogenic acid (up to 59.93 mg/kg in clone III/9) and p-coumaric acid (up to 53.99 mg/kg in clone VIII/1). According to the principal component analysis (PCA), fructose, maltose, maltotriose, sorbitol, and trehalose were the most important sugars in separating pollen samples. PCA separated clones VIII/1, IV/8, III/9, and V/P from the rest according to the quantity of phenolics and their dissimilar profiles. Large differences in the chemical composition of the studied ‘Oblačinska’ sour cherry clone pollen were shown, proving that it is not a cultivar but a population. Finally, due to their highest levels of phenolics, clones IV/8, XV/3, and VIII/1 could be singled out as promising for producing functional food and/or for medicinal treatments.

Design of Distributed Reconfigurable Robotics Systems with ReconROS

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3494571 ◽

2022 ◽

Vol 15 (3) ◽

pp. 1-20

Author(s):

Christian Lienen ◽

Marco Platzner

Keyword(s):

Operating System ◽

Energy Efficiency ◽

High Performance ◽

Hardware Acceleration ◽

Design Flow ◽

Programming Models ◽

Unique Combination ◽

Reconfigurable Computers ◽

Multithreaded Programming ◽

Robot Operating System

Robotics applications process large amounts of data in real time and require compute platforms that provide high performance and energy efficiency. FPGAs are well suited for many of these applications, but there is a reluctance in the robotics community to use hardware acceleration due to increased design complexity and a lack of consistent programming models across the software/hardware boundary. In this article, we present ReconROS, a framework that integrates the widely used robot operating system (ROS) with ReconOS, which features multithreaded programming of hardware and software threads for reconfigurable computers. This unique combination gives ROS 2 developers the flexibility to transparently accelerate parts of their robotics applications in hardware. We elaborate on the architecture and the design flow for ReconROS and report on a set of experiments that underline the feasibility and flexibility of our approach.

Volatile Characterization of Major Apricot Cultivars of Southern Xinjiang Region of China

Journal of the American Society for Horticultural Science ◽

10.21273/jashs.140.5.466 ◽

2015 ◽

Vol 140 (5) ◽

pp. 466-471 ◽

Cited By ~ 3

Author(s):

Jian-rong Feng ◽

Wan-peng Xi ◽

Wen-hui Li ◽

Hai-nan Liu ◽

Xiao-fang Liu ◽

...

Keyword(s):

High Performance ◽

Solid Phase ◽

Principal Component ◽

Prunus Armeniaca ◽

Least Squares Regression ◽

Aroma Characteristics ◽

Pca Method ◽

Gas Chromatography Mass Spectroscopy ◽

High Level

The characterization of aroma of the 14 main apricot (Prunus armeniaca L.) cultivars in Xinjiang was evaluated using high-performance solid-phase microextraction (HP-SPME) with gas chromatography-mass spectroscopy (GC-MS). A total of 208 volatiles that include 80 esters, 25 aldehydes, 15 terpenes, 21 ketones, 39 alcohols, 27 olefins, and 1 acid were identified from these cultivars. The compounds propyl acetate, 3-methyl-1-butanol acetate, (Z)-3-hexen-1-ol acetate, d-limonene, β-linalool, hexanal, hexyl acetate, butyl acetate, β-myrcene, ethyl butanoate, and β-cis-ocimene were the major compounds responsible for aroma in these cultivars. GC-MS results showed that Kuchexiaobaixing, Guoxiyuluke, and seven other cultivars were characterized by a high level of esters and were considered to be fruity apricot aroma. ‘Luotuohuang’ and ‘Heiyexing’ accumulate high levels of terpenes and exhibited an outstanding floral aroma. Higher levels of alcohols and aldehydes were observed in ‘Danxing’, ‘Sumaiti’, and ‘Kumaiti’. The latter are considered green aroma cultivars. These three types of cultivars with different aroma characteristics can be significantly differentiated by using the principal component analysis (PCA) method. The contributions of volatiles to the apricot aroma were assessed by using the partial least squares regression (PLSR) model. Esters, terpenes, and C6 components were shown to be responsible for the fruity, floral, and green character of fresh apricots, respectively.

High Performance Reconfigurable Hardware Acceleration on Neutron Transport Computation Based on AGENT Methodology

18th International Conference on Nuclear Engineering: Volume 2 ◽

10.1115/icone18-29614 ◽

2010 ◽

Cited By ~ 3

Author(s):

Shanjie Xiao ◽

Tatjana Jevremovic

Keyword(s):

High Performance ◽

Neutron Transport ◽

Three Dimensional ◽

Hardware Acceleration ◽

Reconfigurable Hardware ◽

Reactor Physics ◽

Von Neumann ◽

Acceleration Techniques ◽

High Efficient ◽

Field Programmable

A high-performance hardware acceleration coprocessor built on field programmable gate arrays (FPGAs) is designed to accelerate neutron transport computation for three-dimensional whole reactor cores. The coprocessor is based on reconfigurable computation techniques and adopts a dataflow-driven, non-von Neumann architecture for highly efficient parallel computation. It offers much more computation power than CPUs of the same era and is compatible with existing software acceleration methods. It reaches a speedup of about 20 times in simulation validations. This is the first time that reconfigurable hardware acceleration techniques have been used to improve the computational efficiency of reactor physics and neutron transport simulations.

A Reconfigurable System Approach to the Direct Kinematics of a 5 D.o.f Robotic Manipulator

International Journal of Reconfigurable Computing ◽

10.1155/2010/727909 ◽

2010 ◽

Vol 2010 ◽

pp. 1-10 ◽

Cited By ~ 6

Author(s):

Diego F. Sánchez ◽

Daniel M. Muñoz ◽

Carlos H. Llanos ◽

José M. Motta

Keyword(s):

High Performance ◽

Degrees Of Freedom ◽

Dynamic Range ◽

Hardware Acceleration ◽

Hardware Architecture ◽

Floating Point ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Transcendental Functions ◽

Direct Kinematics

Hardware acceleration in high-performance computer systems is of particular interest for many engineering and scientific applications in which a large number of arithmetic operations and transcendental functions must be computed. In this paper, a hardware architecture for computing the direct kinematics of robot manipulators with 5 degrees of freedom (5 D.o.f) using floating-point arithmetic is presented for 32, 43, and 64 bit-width representations, and it is implemented in Field Programmable Gate Arrays (FPGAs). The proposed architecture has been developed using several floating-point libraries for arithmetic and transcendental function operators, allowing the designer to select (pre-synthesis) a suitable bit-width representation according to the accuracy and dynamic range, as well as the area, elapsed time, and power consumption requirements of the application. Synthesis results demonstrate the effectiveness and high performance of the implemented cores on commercial FPGAs. Simulation results were used to compute the Mean Square Error (MSE), using Matlab as the statistical estimator, validating the correct behavior of the implemented cores. Additionally, the processing time of the hardware architecture was compared with the same formulation implemented in software on the PowerPC (FPGA embedded processor), demonstrating that the hardware architecture speeds up the software implementation by a factor of 1298.

Hardware Acceleration of High-Performance Computational Flow Dynamics Using High-Bandwidth Memory-Enabled Field-Programmable Gate Arrays

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3476229 ◽

2022 ◽

Vol 15 (2) ◽

pp. 1-35

Author(s):

Tom Hogervorst ◽

Răzvan Nane ◽

Giacomo Marchiori ◽

Tong Dong Qiu ◽

Markus Blatt ◽

...

Keyword(s):

High Performance ◽

Scientific Computing ◽

Hardware Acceleration ◽

Field Programmable Gate Arrays ◽

Gate Arrays ◽

Computational Flow Dynamics ◽

Field Programmable ◽

Programmable Gate Arrays ◽

High Bandwidth ◽

Reservoir Simulator

Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because of the utmost importance to simulate increasingly larger computational models, hardware acceleration is receiving increased attention due to its potential to maximize the performance of scientific computing. Field-Programmable Gate Arrays could accelerate scientific computing because of the possibility to fully customize the memory hierarchy important in irregular applications such as iterative linear solvers. In this article, we study the potential of using Field-Programmable Gate Arrays in High-Performance Computing because of the rapid advances in reconfigurable hardware, such as the increase in on-chip memory size, increasing number of logic cells, and the integration of High-Bandwidth Memories on board. To perform this study, we propose a novel Sparse Matrix-Vector multiplication unit and an ILU0 preconditioner tightly integrated with a BiCGStab solver kernel. We integrate the developed preconditioned iterative solver in Flow from the Open Porous Media project, a state-of-the-art open source reservoir simulator. Finally, we perform a thorough evaluation of the FPGA solver kernel in both stand-alone mode and integrated in the reservoir simulator, using the NORNE field, a real-world case reservoir model using a grid with more than 10^5 cells and using three unknowns per cell.
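Sparse matrix-vector multiplication is the kernel that iterative solvers such as BiCGStab call repeatedly. As a rough software illustration of that operation, the sketch below multiplies a matrix stored in compressed sparse row (CSR) form by a dense vector. The CSR layout, the function name `csr_spmv`, and the toy matrix are assumptions for illustration; the abstract does not state which storage format the proposed hardware unit uses.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Irregular access: x is indexed through col_idx.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3x3 example: A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The indirect read of `x` through `col_idx` is the irregular memory access pattern that makes a customizable memory hierarchy attractive for this class of applications.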

Simultaneous Determination of 9 Main Components of Lonicera japonica Thunb. by UPLC-MS/MS and Analysed Combine With Chemometrics

Natural Product Communications ◽

10.1177/1934578x20953272 ◽

2020 ◽

Vol 15 (9) ◽

pp. 1934578X2095327

Author(s):

Songtao Liu ◽

Lin Yang ◽

Song Wang ◽

Junying Pan

Keyword(s):

High Performance ◽

Principal Component ◽

Hierarchical Cluster ◽

Negative Ion ◽

Lonicera Japonica ◽

Application Research ◽

Active Components ◽

Main Components ◽

High Level

The purpose of this article is to establish a method that uses ultra-high performance liquid chromatography (UPLC)-mass spectrometry (MS)/MS to simultaneously determine 9 main components of Lonicera japonica Thunb. in negative-ion scanning mode, and the main components were analyzed by chemometrics. The chromatographic separation uses the Thermo Hypersil GOLD column (100 mm × 2.1 mm, 1.9 µm) with a constant temperature of 45 °C. The mobile phase consists of methanol and water containing 0.2% formic acid. The results show that the 9 compounds had a good linear relationship (R² > 0.9991), and the intraday precision, interday precision, and stability had acceptable relative SDs (RSDs; 0.96%-2.26%, 0.52%-3.04%, and 0.85%-2.15%, respectively). The recovery rates were between 75.90% and 110.58%. The results of chemometrics, including hierarchical cluster analysis and principal component analysis, showed that there were obvious differences in the content of active components in L. japonica from different regions, and the compounds with the highest contribution to the drug were identified. Through the UPLC-MS/MS analysis of L. japonica combined with chemometrics, this experiment can provide a reference for further research on the modernization and innovation of L. japonica and for high-level, multidirectional application research.

FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit

Electronics ◽

10.3390/electronics10222859 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2859

Author(s):

Mannhee Cho ◽

Youngmin Kim

Keyword(s):

Fixed Point ◽

High Performance ◽

Rapid Development ◽

Digital Signal ◽

Data Type ◽

Data Types ◽

Precision Data ◽

Field Programmable ◽

Point Data ◽

High Level

Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered to be suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which performs classification of handwritten digits using the MNIST handwritten digit dataset. The proposed accelerator was implemented using a high-level synthesis tool on a Xilinx FPGA. It applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% lower network latency compared to a floating-point design, and its resource utilization is optimized to use 78% fewer DSP blocks compared to general fixed-point designs.
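To make the fixed-point idea concrete, the sketch below performs a multiply-accumulate on integers that represent scaled real values. The 8 fractional bits, the helper names (`to_fixed`, `to_float`, `fixed_mac`), and the sample operands are hypothetical choices for illustration. The arithmetic here is exact fixed-point, so it does not model the approximate accumulation units described in the abstract.

```python
FRAC_BITS = 8  # hypothetical format: 8 fractional bits


def to_fixed(x):
    """Quantize a real value to an integer with FRAC_BITS fractional bits."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << FRAC_BITS)


def fixed_mac(weights, inputs):
    """Multiply-accumulate using integer arithmetic only."""
    acc = 0  # each product carries 2 * FRAC_BITS fractional bits
    for w, a in zip(weights, inputs):
        acc += to_fixed(w) * to_fixed(a)
    # Rescale once at the end; >> floors, also for negative sums.
    return acc >> FRAC_BITS


# Hypothetical operands that are exactly representable with 8 fractional bits.
weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, 4.0]
result = to_float(fixed_mac(weights, inputs))
```

Keeping the accumulator at full product precision and rescaling once at the end avoids compounding a rounding error on every term.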

FPGA Acceleration of CNNs-Based Malware Traffic Classification

Electronics ◽

10.3390/electronics9101631 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1631

Author(s):

Lin Zhang ◽

Bing Li ◽

Yong Liu ◽

Xia Zhao ◽

Yazhou Wang ◽

...

Keyword(s):

High Performance ◽

Rapid Development ◽

Classification Model ◽

Software Framework ◽

Traffic Classification ◽

Hardware Accelerator ◽

Typical Application ◽

Test Dataset ◽

Field Programmable ◽

Automatic Software

With the rapid development of the Internet, malware traffic is seriously endangering the security of cyberspace. Convolutional neural networks (CNNs)-based malware traffic classification can automatically learn features from raw traffic, avoiding the inaccuracy of hand-designed traffic features. Through experiments and comparisons of LeNet, AlexNet, VGGNet, and ResNet, it is found that LeNet has good and stable classification ability for malware traffic and normal traffic. Then, a field programmable gate array (FPGA) accelerator for CNNs-based malware traffic classification is designed, which consists of a parameterized hardware accelerator and a fully automatic software framework. By fully exploring the parallelism between CNN layers, parallel computation and pipeline optimization are used in the hardware design to achieve high performance. Simultaneously, runtime reconfigurability is implemented by using a global register list. By encapsulating the underlying driver, a three-layer software framework is provided for users to deploy their pre-trained models. Finally, a typical CNNs-based malware traffic classification model was selected to test and verify the hardware accelerator. The typical application system can classify each traffic image from the test dataset in 18.97 μs with an accuracy of 99.77%, and the throughput of the system is 411.83 Mbps.
