FPGA Acceleration of CNNs-Based Malware Traffic Classification

Lin Zhang; Bing Li; Yong Liu; Xia Zhao; Yazhou Wang; Jiaxin Wu

doi:10.3390/electronics9101631

Electronics, doi:10.3390/electronics9101631, 2020, Vol 9 (10), pp. 1631

Author(s): Lin Zhang, Bing Li, Yong Liu, Xia Zhao, Yazhou Wang, ...

Keyword(s): High Performance, Rapid Development, Classification Model, Software Framework, Traffic Classification, Hardware Accelerator, Typical Application, Test Dataset, Field Programmable, Automatic Software

With the rapid development of the Internet, malware traffic is seriously endangering the security of cyberspace. Convolutional neural network (CNN)-based malware traffic classification can automatically learn features from raw traffic, avoiding the inaccuracy of hand-designed traffic features. Experiments comparing LeNet, AlexNet, VGGNet, and ResNet show that LeNet has good and stable classification ability for malware traffic and normal traffic. A field programmable gate array (FPGA) accelerator for CNN-based malware traffic classification is then designed, consisting of a parameterized hardware accelerator and a fully automatic software framework. By fully exploring the parallelism between CNN layers, the hardware design uses parallel computation and pipeline optimization to achieve high performance. Runtime reconfigurability is implemented through a global register list. By encapsulating the underlying driver, a three-layer software framework lets users deploy their pre-trained models. Finally, a typical CNN-based malware traffic classification model is selected to test and verify the hardware accelerator. The resulting application system classifies each traffic image from the test dataset in 18.97 μs with an accuracy of 99.77%, and the system throughput is 411.83 Mbps.

High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Electronics, doi:10.3390/electronics9030449, 2020, Vol 9 (3), pp. 449

Author(s): Mohammad Amir Mansoori, Mario R. Casu

Keyword(s): High Performance, Principal Component, Hardware Acceleration, Design Flow, Hardware Accelerator, Field Programmable, Point Solution, Active Research, High Level, Many Core

Principal Component Analysis (PCA) is a dimensionality reduction technique that removes redundant information from data in applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator for Field-Programmable Gate Arrays (FPGAs) that we designed entirely in HLS. To make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create efficient hardware. The flexibility of our design allows us to use it for different FPGA targets and with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher-speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time, and power consumption.
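The abstract does not describe how the block-streaming method works, so the following is only a generic sketch of the underlying idea: the covariance matrix that PCA needs can be accumulated block by block, without holding the whole dataset in memory. The function names `streamed_covariance` and `leading_component` are hypothetical, and power iteration is used here as a simple stand-in for a full eigendecomposition; none of this is taken from the paper.

```python
import math

def streamed_covariance(blocks, dim):
    """Accumulate the mean and sample covariance from an iterable of row blocks."""
    n = 0
    s = [0.0] * dim                          # running sum of each feature
    ss = [[0.0] * dim for _ in range(dim)]   # running sum of outer products
    for block in blocks:
        for row in block:
            n += 1
            for i in range(dim):
                s[i] += row[i]
                for j in range(dim):
                    ss[i][j] += row[i] * row[j]
    mean = [v / n for v in s]
    cov = [[(ss[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    return mean, cov

def leading_component(cov, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric matrix.

    Assumes cov is non-zero and the start vector is not orthogonal
    to the dominant eigenvector.
    """
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Two blocks of 2-D points that all lie on the line y = 2x.
blocks = [[(0.0, 0.0), (1.0, 2.0)], [(2.0, 4.0), (3.0, 6.0)]]
mean, cov = streamed_covariance(blocks, dim=2)
pc1 = leading_component(cov)
```

Because each block only updates running sums, the blocks can arrive in any order and be discarded after use, which is the property that makes a streaming hardware implementation plausible.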

FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit

Electronics, doi:10.3390/electronics10222859, 2021, Vol 10 (22), pp. 2859

Author(s): Mannhee Cho, Youngmin Kim

Keyword(s): Fixed Point, High Performance, Rapid Development, Digital Signal, Data Type, Data Types, Precision Data, Field Programmable, Point Data, High Level

Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered suitable platforms for CNNs because of their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which classifies handwritten digits using the MNIST handwritten digit dataset. The proposed accelerator was implemented using a high-level synthesis tool on a Xilinx FPGA. It applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% lower network latency compared to a floating-point design, and its resource utilization is optimized to use 78% fewer DSP blocks compared to general fixed-point designs.
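As a rough illustration of what a fixed-point data type means for a multiply-accumulate (MAC) operation, the sketch below quantizes weights and inputs to signed integers and accumulates their products exactly. The 16-bit word with 8 fractional bits is an assumed format chosen for the example, not the one used in the paper, and the functions `to_fixed` and `fixed_mac` are hypothetical. The paper's approximate accumulation units are not modeled here.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Quantize a real number to a saturating signed fixed-point integer."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def fixed_mac(weights, inputs, frac_bits=8, total_bits=16):
    """Multiply-accumulate on quantized operands, returned as a float."""
    acc = 0  # Python integers do not overflow, so the accumulator is exact
    for w, x in zip(weights, inputs):
        acc += to_fixed(w, frac_bits, total_bits) * to_fixed(x, frac_bits, total_bits)
    # Each product carries 2 * frac_bits fractional bits.
    return acc / float(1 << (2 * frac_bits))

exact = 0.1 * 0.3
approx = fixed_mac([0.1], [0.3])
quantization_error = abs(approx - exact)
```

Fewer fractional bits shrink the multipliers and memory footprint at the cost of a larger `quantization_error`, which is the trade-off a fixed-point accelerator design has to balance.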

Surface/interface nanoengineering for rechargeable Zn–air batteries

Energy & Environmental Science, doi:10.1039/c9ee03634b, 2020, Vol 13 (4), pp. 1132-1153, Cited By ~ 29

Author(s): Tianpei Zhou, Nan Zhang, Changzheng Wu, Yi Xie

Keyword(s): High Performance, Rapid Development

Surface/interface nanoengineering of electrocatalysts and air electrodes will promote the rapid development of high-performance rechargeable Zn–air batteries.

High Performance Thin Layer Chromatography (HPTLC) for the Investigation of Medicinal Plants

Current Analytical Chemistry, doi:10.2174/1573411016999200602124813, 2020, Vol 16

Author(s): Alper Gökbulut

Keyword(s): Medicinal Plants, High Performance, Separation Efficiency, Rapid Development, Herbal Remedies, Qualitative And Quantitative, Powerful Analytical Tool, Quantitative Examination, Data Acquisition And Processing, Layer Chromatography

Background: Chromatographic techniques, basically TLC as well as HPLC, GC, and HPTLC equipped with various detectors, are the most frequently used methods for the qualitative and quantitative examination of herbals. Method: The recent literature on the use of HPTLC for the analysis of medicinal plants was reviewed. Results: Over the last decades, with the rapid development of chromatographic technology, HPTLC, a modern, sophisticated, and automated TLC technique with improved separation efficiency, detection limits, and data acquisition and processing, has been used for the analysis of herbal materials and preparations. HPTLC with various detectors is a powerful analytical tool, especially for phytochemical applications such as herbal drug quantification and fingerprint analysis. Conclusion: This review establishes an up-to-date perspective and summarizes previous studies on the use of HPTLC in the analysis of herbal remedies, dietary supplements, and nutraceuticals.

Overview of typical application energy efficiency optimization in high-performance data centers

2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA), doi:10.1109/icpeca51329.2021.9362524, 2021

Author(s): Weidong Wu, Haiyang Chen, Kuanhong Li, Jun Yu

Keyword(s): Energy Efficiency, High Performance, Data Centers, Performance Data, Typical Application, Efficiency Optimization

A RF Redundant TSV Interconnection for High Resistance Si Interposer

Micromachines, doi:10.3390/mi12020169, 2021, Vol 12 (2), pp. 169

Author(s): Mengcheng Wang, Shenglin Ma, Yufeng Jin, Wei Wang, Jing Chen, ...

Keyword(s): Millimeter Wave, High Frequency, High Performance, Rapid Development, Coplanar Waveguide, Equivalent Circuit Model, High Resistivity, Performance Requirements, Wave Radar, Interconnect Design

Through Silicon Via (TSV) technology is capable of meeting effective, compact, high-density, high-integration, and high-performance requirements. In high-frequency applications, with the rapid development of 5G and millimeter-wave radar, the TSV interposer will become a competitive choice for radio frequency system-in-package (RF SIP) substrates. This paper presents a redundant TSV interconnect design for high-resistivity Si interposers for millimeter-wave applications. To verify its feasibility, a set of test structures capable of working at millimeter-wave frequencies is designed, composed of three pieces of coplanar waveguide (CPW) lines connected by single TSV, dual redundant TSV, and quad redundant TSV interconnects. First, HFSS software is used for modeling and simulation; then, a modified equivalent circuit model is established to analyze the effect of the redundant TSVs on high-frequency transmission performance and to corroborate the HFSS-based simulation. A failure simulation was also carried out, and the results show that the redundant TSV can still work normally at 44 GHz when a failure occurs. Using the developed TSV process, the sample is then fabricated and tested. The L-2L de-embedding method is used to extract the S-parameters of the TSV interconnection. The insertion losses of the dual and quad redundant TSVs are 0.19 dB and 0.46 dB at 40 GHz, respectively.

Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods

IEEE Transactions on Parallel and Distributed Systems, doi:10.1109/tpds.2021.3056045, 2021, Vol 32 (8), pp. 2035-2048

Author(s): Mochamad Asri, Dhairya Malhotra, Jiajun Wang, George Biros, Lizy K. John, ...

Keyword(s): High Performance Computing, High Performance, Hardware Accelerator, Performance Computing

Investigating Lignin-Derived Monomers and Oligomers in Low-Molecular-Weight Fractions Separated from Depolymerized Black Liquor Retentate by Membrane Filtration

Molecules, doi:10.3390/molecules26102887, 2021, Vol 26 (10), pp. 2887

Author(s): Kena Li, Jens Prothmann, Margareta Sandahl, Sara Blomberg, Charlotta Turner, ...

Keyword(s): Molecular Weight, High Performance, Membrane Filtration, Black Liquor, Complex Mixture, Value Added, Kraft Pulping, Classification Model, Low Molecular Weight, Chemical Complexity

Base-catalyzed depolymerization of black liquor retentate (BLR) from the kraft pulping process, followed by ultrafiltration, has been suggested as a means of obtaining low-molecular-weight (LMW) compounds. The chemical complexity of BLR, which consists of a mixture of softwood and hardwood lignin that has undergone several kinds of treatment, leads to a complex mixture of LMW compounds, making the separation of components for the formation of value-added chemicals more difficult. Identifying the phenolic compounds in the LMW fractions obtained under different depolymerization conditions is essential for the upgrading process. In this study, a state-of-the-art nontargeted analysis method using ultra-high-performance supercritical fluid chromatography coupled to high-resolution multiple-stage tandem mass spectrometry (UHPSFC/HRMSn) combined with a Kendrick mass defect-based classification model was applied to analyze the monomers and oligomers in the LMW fractions separated from BLR samples depolymerized at 170–210 °C. The most common phenolic compound types were dimers, followed by monomers. A second round of depolymerization yielded low amounts of monomers and dimers, while a high number of trimers were formed, thought to be the result of repolymerization.
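For readers unfamiliar with the Kendrick mass defect (KMD), the sketch below shows the basic calculation under common assumptions: the base unit is CH2, and the defect is taken as the rounded Kendrick mass minus the Kendrick mass. The abstract does not say which base unit or rounding convention the authors' classification model uses, so both are assumptions, and the helper `same_series` with its tolerance is hypothetical.

```python
CH2_EXACT = 14.01565   # monoisotopic mass of CH2: 12.00000 + 2 * 1.007825
CH2_NOMINAL = 14.0

def kendrick_mass(mass):
    """Rescale a measured mass so that one CH2 unit weighs exactly 14."""
    return mass * CH2_NOMINAL / CH2_EXACT

def kendrick_mass_defect(mass):
    """Difference between the rounded (nominal) and exact Kendrick mass."""
    km = kendrick_mass(mass)
    return round(km) - km

def same_series(mass_a, mass_b, tol=1e-3):
    """True if two masses share a KMD within tol, i.e. may differ only by CH2 units."""
    return abs(kendrick_mass_defect(mass_a) - kendrick_mass_defect(mass_b)) < tol

phenol = 94.041865            # C6H6O, monoisotopic mass
cresol = phenol + CH2_EXACT   # C7H8O, one CH2 unit heavier
catechol = 110.036780         # C6H6O2, differs by an oxygen rather than CH2
```

Compounds that differ only by repeats of the base unit share a KMD, so `phenol` and `cresol` fall into one group while `catechol` does not; grouping by KMD is what lets a nontargeted analysis sort many peaks into compound families.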

A DECISION TREE-BASED CLASSIFICATION APPROACH TO RULE EXTRACTION FOR SECURITY ANALYSIS

International Journal of Information Technology & Decision Making, doi:10.1142/s0219622006001824, 2006, Vol 05 (01), pp. 227-240, Cited By ~ 15

Author(s): N. REN, M. ZARGHAM, S. RAHIMI

Keyword(s): Decision Tree, High Performance, Security Analysis, Predictive Performance, Classification Model, Stock Selection, Selection Rules, Stock Portfolios, Decision Tree Classification, C4.5 Decision Tree

Stock selection rules are extensively used as guidelines for constructing high-performance stock portfolios. However, the predictive performance of rules developed by economic experts in the past has decreased dramatically in the current stock market. In this paper, the C4.5 decision tree classification method was adopted to construct a model for stock prediction based on fundamental stock data, from which a set of stock selection rules was derived. The experimental results showed that the generated rules have exceptional predictive performance. They also demonstrated that the C4.5 decision tree classification model can work efficiently in the high-noise stock data domain.
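C4.5 chooses the attribute to split on by its gain ratio: the information gain of the split divided by the entropy of the split itself. The sketch below computes that criterion for a single categorical attribute. The attribute values and the "buy"/"sell" labels are made-up illustrations, not data from the paper, and the sketch omits the rest of C4.5 (continuous attributes, pruning, rule extraction).

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """C4.5 gain ratio for splitting labels on one categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # penalizes attributes with many distinct values
    return gain / split_info if split_info > 0 else 0.0

labels = ["buy", "buy", "sell", "sell"]
informative = gain_ratio(["low", "low", "high", "high"], labels)
uninformative = gain_ratio(["a", "b", "a", "b"], labels)
```

An attribute that separates the classes perfectly scores 1.0 here, and one that carries no information scores 0.0; each path from the root of the finished tree to a leaf can then be read off as one selection rule.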

High Performance Low Cost Implementation of FPGA-Based Fractional-Order Operators

Volume 6: 5th International Conference on Multibody Systems, Nonlinear Dynamics, and Control, Parts A, B, and C, doi:10.1115/detc2005-84796, 2005, Cited By ~ 3

Author(s): Cindy X. Jiang, Tom T. Hartley, Joan E. Carletta

Keyword(s): Fractional Order, Word Length, High Performance, Low Cost, Careful Consideration, Order System, System Quality, Gate Arrays, Field Programmable, Programmable Gate Arrays

Hardware implementation of fractional-order differentiators and integrators requires careful consideration of system quality, hardware cost, and speed. This paper proposes using field programmable gate arrays (FPGAs) to implement fractional-order systems and demonstrates the advantages that FPGAs provide. As an illustration, the fundamental operator raised to a real power is approximated via the binomial expansion of the backward difference. The resulting high-order FIR filter is implemented in a pipelined multiplierless architecture on a low-cost Spartan-3 FPGA. Unlike common digital implementations in which all filter coefficients have the same word length, this approach exploits a variable word length for each coefficient. Our system requires twenty percent less hardware than a system of comparable quality generated by Xilinx’s System Generator on its most area-efficient multiplierless setting. The work shows an effective way to implement a high-quality, high-throughput approximation to a fractional-order system at lower cost than traditional FPGA-based designs.
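The binomial expansion of the backward difference mentioned above can be written out directly: the operator (1 - z^-1)^alpha expands into FIR coefficients c_k = (-1)^k * C(alpha, k), which follow the recurrence c_0 = 1, c_k = c_(k-1) * (1 - (alpha + 1) / k). The sketch below is a floating-point illustration of that expansion with a truncated tap count. The function names and parameters are hypothetical, and it does not model the paper's pipelined multiplierless architecture or its variable coefficient word lengths.

```python
def gl_coefficients(alpha, n_taps):
    """Coefficients of (1 - z^-1)^alpha, truncated to n_taps terms."""
    coeffs = [1.0]
    for k in range(1, n_taps):
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / k))
    return coeffs

def fractional_derivative(samples, alpha, dt, n_taps):
    """Apply the truncated FIR approximation of the order-alpha derivative."""
    c = gl_coefficients(alpha, n_taps)
    out = []
    for n in range(len(samples)):
        # Convolve the coefficients with the most recent samples.
        acc = sum(c[k] * samples[n - k] for k in range(min(n + 1, n_taps)))
        out.append(acc / dt ** alpha)
    return out

half_order = gl_coefficients(0.5, 4)
ramp_slope = fractional_derivative([0.0, 1.0, 2.0, 3.0], alpha=1.0, dt=1.0, n_taps=4)
```

For alpha = 1 the expansion collapses to the ordinary first difference, which is a quick sanity check; for non-integer alpha the coefficients decay slowly, which is why the resulting FIR filter is high-order and why per-coefficient word lengths can save hardware.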
