Survey of the counterflow pipeline processor architectures

Author(s):  
P. Balaji ◽  
W. Mahmoud ◽  
E. Ososanya ◽  
K. Thangarajan
2021 ◽  
Author(s):  
Bashar Romanous ◽  
Skyler Windh ◽  
Ildar Absalyamov ◽  
Prerna Budhkar ◽  
Robert Halstead ◽  
...  

The join and group-by aggregation are two memory-intensive operators that affect the performance of relational databases. Hashing is a common approach used to implement both operators. Recent paradigm shifts in multi-core processor architectures have reinvigorated research into how the join and group-by aggregation operators can leverage these advances. However, the poor spatial locality of the hashing approach has hindered performance on multi-core processor architectures, which rely on large cache hierarchies for latency mitigation. Multithreaded architectures can better cope with poor spatial locality by masking memory latency with many outstanding requests. Nevertheless, the number of parallel threads, even in the most advanced multithreaded processors such as UltraSPARC, is not enough to fully cover the main memory access latency. In this paper, we explore the hardware re-configurability of FPGAs to enable deeper execution pipelines that maintain hundreds (instead of tens) of outstanding memory requests across four FPGAs, drastically increasing concurrency and throughput. We present two end-to-end in-memory accelerators for the join and group-by aggregation operators using FPGAs. Both accelerators use massive multithreading to mask the long memory delays of traversing linked-list data structures, while concurrently managing hundreds of thread states across four FPGAs locally. We explore how content addressable memories (CAMs) can be intermixed within our multithreaded designs to act as a synchronizing cache, which enforces locks and merges jobs together before they are written to memory. Throughput results for our hash-join operator accelerator show a speedup between 2× and 3.4× over the best multi-core approaches with comparable memory bandwidths on uniform and skewed datasets. The accelerator for the hash-based group-by aggregation operator demonstrates that leveraging CAMs achieves an average speedup of 3.3×, with a best case of 9.4×, in terms of throughput over CPU implementations across five types of data distributions.
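
For reference, the data structure both accelerators traverse is a chained (linked-list) hash table: every hop down a bucket chain is a dependent memory access, which is exactly the latency the FPGA designs hide by keeping hundreds of probes in flight. The single-threaded C++ sketch below only illustrates that structure; all names (Node, HashTable, insert, probe) are hypothetical and not taken from the paper. A typical usage would build the table from the smaller relation and then stream the larger relation through probe.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node of a chained (linked-list) hash bucket.
struct Node {
    uint64_t key;      // join key
    uint64_t payload;  // row id or aggregate slot
    int32_t  next;     // index of next node in the chain, -1 if none
};

struct HashTable {
    std::vector<int32_t> buckets;  // head index per bucket, -1 if empty
    std::vector<Node>    nodes;    // node pool

    explicit HashTable(std::size_t nBuckets) : buckets(nBuckets, -1) {}

    // Build side: prepend a tuple to its bucket chain.
    void insert(uint64_t key, uint64_t payload) {
        std::size_t b = key % buckets.size();
        nodes.push_back({key, payload, buckets[b]});
        buckets[b] = static_cast<int32_t>(nodes.size() - 1);
    }

    // Probe side: each chain hop is a dependent memory access; the FPGA
    // design masks this latency with hundreds of concurrent probe threads.
    template <typename Emit>
    void probe(uint64_t key, Emit emit) const {
        for (int32_t i = buckets[key % buckets.size()]; i != -1; i = nodes[i].next)
            if (nodes[i].key == key) emit(nodes[i].payload);
    }
};
```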


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 516
Author(s):  
Tram Thi Bao Nguyen ◽  
Tuy Nguyen Tan ◽  
Hanho Lee

This paper presents a pipelined layered quasi-cyclic low-density parity-check (QC-LDPC) decoder architecture targeting low complexity, high throughput, and efficient use of hardware resources, compliant with the specifications of the 5G new radio (NR) wireless communication standard. First, a combined min-sum (CMS) decoding algorithm, which combines the offset min-sum and the original min-sum algorithms, is proposed. Then, a low-complexity, high-throughput pipelined layered QC-LDPC decoder architecture for the enhanced mobile broadband specifications in the 5G NR wireless standard, based on the CMS algorithm with pipelined layered scheduling, is presented. Enhanced versions of check node-based processor architectures are proposed to reduce the complexity of the LDPC decoders. An efficient minimum-finder for the check node unit architecture that reduces the hardware required to compute the first two minima is introduced. Moreover, a low-complexity a posteriori information update unit architecture, which requires only one adder array for its operations, is presented. The proposed architecture shows significant improvements in area and throughput compared to other QC-LDPC decoder architectures available in the literature.
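
For context, the check node unit in min-sum (and offset min-sum) decoding must locate the smallest and second-smallest input magnitudes; the paper's contribution is a reduced-hardware circuit for that search. The hypothetical C++ reference below shows only the computation itself, not the proposed minimum-finder architecture.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

// Result of the "first two minima" search performed by a check node unit.
struct TwoMin {
    int         min1;  // smallest |LLR| among the inputs
    int         min2;  // second-smallest |LLR|
    std::size_t idx1;  // position of min1 (that edge is sent min2 instead)
};

// Software reference for the values the check node unit must produce.
TwoMin findTwoMinima(const std::vector<int>& llr) {
    TwoMin r{std::numeric_limits<int>::max(),
             std::numeric_limits<int>::max(), 0};
    for (std::size_t i = 0; i < llr.size(); ++i) {
        int mag = std::abs(llr[i]);
        if (mag < r.min1) {         // new smallest: old smallest becomes min2
            r.min2 = r.min1;
            r.min1 = mag;
            r.idx1 = i;
        } else if (mag < r.min2) {  // new second-smallest
            r.min2 = mag;
        }
    }
    return r;
}
```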


Author(s):  
Nikolaos Zombakis ◽  
Yahya H. Yassin ◽  
Michail Noltsis ◽  
Dimitrios Soudris ◽  
Per Gunnar Kjeldsberg ◽  
...  

2019 ◽  
Vol 26 (1) ◽  
pp. 39-62
Author(s):  
Stanislav O. Bezzubtsev ◽  
Vyacheslav V. Vasin ◽  
Dmitry Yu. Volkanov ◽  
Shynar R. Zhailauova ◽  
Vladislav A. Miroshnik ◽  
...  

The paper proposes the architecture and basic requirements for a network processor for OpenFlow switches in software-defined networks. An analysis of two well-known network processor architectures is presented: NP-5 from EZchip (now Mellanox) and Tofino from Barefoot Networks. The advantages and disadvantages of two different versions of network processor architectures are considered: a pipeline-based architecture whose stages are represented by a set of general-purpose processor cores, and a pipeline-based architecture whose stages correspond to cores specialized for specific packet-processing operations. Based on a dedicated set of the most common use-case scenarios, a new architecture of the network processor unit (NPU) with functionally specialized pipeline stages is proposed. The article presents a description of the simulation model of the NPU of the proposed architecture. The simulation model of the network processor is implemented in C++ using SystemC, an open-source C++ library. For the functional testing of the obtained NPU model, the described use-case scenarios were implemented in C. To evaluate the performance of the proposed NPU architecture, a set of software products developed by the KM211 company and the KMX32 family of microcontrollers was used. NPU performance was evaluated on the basis of the simulation model. Estimates of the per-packet processing time and the average throughput of the NPU model are obtained for each scenario.
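
The proposed NPU assigns each pipeline stage to a core specialized for one packet-processing operation (for example, header parsing, table lookup, action execution). As a rough illustration only, the plain C++ sketch below models a packet flowing through such functionally specialized stages; the paper's actual simulation model is written in C++ with SystemC, and all names here (Packet, Stage, the stage bodies) are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

// Hypothetical, much-simplified packet descriptor.
struct Packet {
    uint32_t dstAddr = 0;
    uint32_t outPort = 0;
    bool     dropped = false;
};

// A functionally specialized stage transforms the packet and passes it on,
// analogous to the specialized cores forming the proposed NPU pipeline.
using Stage = std::function<void(Packet&)>;

int main() {
    std::vector<Stage> pipeline = {
        [](Packet& p) { /* parse stage: extract header fields */ p.dstAddr = 0x0A000001; },
        [](Packet& p) { /* match stage: flow-table lookup (stubbed) */ p.outPort = (p.dstAddr & 0xFF) % 4; },
        [](Packet& p) { /* action stage: apply OpenFlow action */ if (p.outPort == 0) p.dropped = true; },
    };

    Packet pkt;
    for (auto& stage : pipeline) stage(pkt);  // packet flows stage by stage

    std::cout << "out port " << pkt.outPort
              << (pkt.dropped ? " (dropped)" : "") << "\n";
    return 0;
}
```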


Author(s):  
Khushi Gupta ◽  
Tushar Sharma

In the modern world, we use microprocessors based on either the ARM or the x86 architecture, the two most common processor architectures. ARM originally stood for 'Acorn RISC Machines' but over the years changed to 'Advanced RISC Machines'. It started as just an experiment but showed promising results, and it is now omnipresent in modern devices. Unlike x86, which is designed for high performance, ARM focuses on low power consumption with considerable performance. Because of advances in ARM technology, ARM processors are becoming more powerful than their x86 counterparts. In this analysis we briefly compare the two architectures and conclude which is likely to dominate the microprocessor industry. The processor that performs better across different tests will be more suitable for the reader to use in their application. The industry's shift towards ARM processors may change how we write software, which in turn will affect the whole software development environment.

