Parallel Skyline Computation Exploiting the Lattice Structure

The problem of Skyline computation has attracted considerable research attention in the last decade. A Skyline query selects those tuples from a dataset that are optimal with respect to a set of designated preference attributes. Now that multicore processors are mainstream, it has become imperative to develop parallel algorithms that fully exploit the advantages of such modern hardware architectures. In this paper, the authors present high-performance parallel Skyline algorithms based on the lattice structure generated by a Skyline query. To this end, they propose different evaluation strategies and compare several data structures for the parallel evaluation of Skyline queries. The authors also present novel optimization techniques for lattice-based Skyline algorithms, based on pruning and on removing one unrestricted attribute domain. They demonstrate through comprehensive experiments on synthetic and real datasets that their new algorithms outperform state-of-the-art multicore Skyline techniques for low-cardinality domains. The authors' algorithms have linear runtime complexity and take full advantage of modern hardware architectures.
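
To make the lattice idea concrete, the following is a minimal sequential sketch, not the authors' parallel algorithm. It assumes that smaller values are preferred, that every attribute takes integer values in `0..size-1`, and that the domains are small enough to enumerate the whole lattice. The names `lattice_skyline` and `domain_sizes` are hypothetical. Each tuple marks its lattice node as occupied, one pass over the lattice propagates dominance from direct predecessors, and the tuples in non-dominated nodes form the Skyline.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, ...]


def lattice_skyline(tuples: List[Node], domain_sizes: Sequence[int]) -> List[Node]:
    """Return the Skyline of `tuples` (smaller is better in every attribute).

    Assumption: every value of attribute i lies in range(domain_sizes[i]).
    """
    # Phase 1: mark the lattice nodes that hold at least one tuple.
    occupied = set(tuples)

    # Phase 2: visit nodes in lexicographic order, so that every direct
    # predecessor (one coordinate decreased by 1) is visited before the node.
    covered: Dict[Node, bool] = {}    # node is occupied or dominated
    dominated: Dict[Node, bool] = {}  # some occupied node is strictly better
    for node in product(*(range(size) for size in domain_sizes)):
        preds = (
            node[:i] + (value - 1,) + node[i + 1:]
            for i, value in enumerate(node)
            if value > 0
        )
        is_dominated = any(covered[p] for p in preds)
        dominated[node] = is_dominated
        covered[node] = is_dominated or node in occupied

    # Phase 3: keep the tuples whose node is not dominated.
    return [t for t in tuples if not dominated[t]]


if __name__ == "__main__":
    data = [(0, 2), (1, 1), (2, 0), (2, 2), (1, 2)]
    print(lattice_skyline(data, domain_sizes=(3, 3)))
```

The work is proportional to the number of tuples plus the number of lattice nodes, which is why this style of algorithm is attractive only when the attribute domains have low cardinality.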


An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks

Future Internet ◽

10.3390/fi12070113 ◽

2020 ◽

Vol 12 (7) ◽

pp. 113 ◽

Cited By ~ 7

Author(s):

Maurizio Capra ◽

Beatrice Bussolino ◽

Alberto Marchisio ◽

Muhammad Shafique ◽

Guido Masera ◽

...

Keyword(s):

Neural Networks ◽

High Performance ◽

Optimization Techniques ◽

Deep Convolutional Neural Networks ◽

Computing Power ◽

The Past ◽

History Of ◽

Hardware Architectures ◽

The One ◽

Main Components

Deep Neural Networks (DNNs) are now common practice in most Artificial Intelligence (AI) applications. Their ability to go beyond human precision has made these networks a milestone in the history of AI. However, while they deliver cutting-edge performance, they also require enormous computing power. For this reason, numerous optimization techniques at the hardware and software level, as well as specialized architectures, have been developed to process these models with high performance and power/energy efficiency without affecting their accuracy. In the past, multiple surveys have provided an overview of different architectures and optimization techniques for the efficient execution of Deep Learning (DL) algorithms. This work aims to provide an up-to-date survey, especially covering the prominent works from the last 3 years of research on hardware architectures for DNNs. In this paper, the reader will first learn what a hardware accelerator is and what its main components are, followed by the latest techniques in the fields of dataflow, reconfigurability, variable bit-width, and sparsity.


High Performance Parallelization of COMPSYN on a Cluster of Multicore Processors with GPUs

Procedia Computer Science ◽

10.1016/j.procs.2012.04.103 ◽

2012 ◽

Vol 9 ◽

pp. 966-975

Author(s):

Ferdinando Alessi ◽

Annalisa Massini ◽

Roberto Basili

Keyword(s):

High Performance ◽

Multicore Processors


An Adaptive Throughput-First Packet Scheduling Algorithm for DPDK-Based Packet Processing Systems

Future Internet ◽

10.3390/fi13030078 ◽

2021 ◽

Vol 13 (3) ◽

pp. 78

Author(s):

Chuanhong Li ◽

Lei Song ◽

Xuewen Zeng

Keyword(s):

Packet Loss ◽

High Performance ◽

Packet Scheduling ◽

Scheduling Algorithm ◽

Processing System ◽

System Throughput ◽

Packet Processing ◽

Research Attention ◽

Continuous Increase ◽

Packet Scheduling Algorithm

The continuous increase in network traffic has sharply increased the demand for high-performance packet processing systems. For a high-performance packet processing system based on multi-core processors, the packet scheduling algorithm is critical because of the significant role it plays in load distribution, which in turn affects system throughput, and it has therefore attracted intensive research attention. However, designing such an algorithm is not easy: the canonical flow-level packet scheduling algorithm is vulnerable to traffic locality, while the packet-level packet scheduling algorithm fails to maintain cache affinity. In this paper, we propose an adaptive throughput-first packet scheduling algorithm for DPDK-based packet processing systems. Building on DPDK's burst-oriented packet receiving and transmitting, we propose using the Subflow as both the scheduling unit and the adjustment unit. As a result, the proposed algorithm not only retains the advantages of flow-level packet scheduling algorithms when no adjustment happens but also avoids packet loss as much as possible when the target core may be overloaded. Experimental results show that the proposed method outperforms Round-Robin, HRW (Highest Random Weight), and CRC32 on system throughput and packet loss rate.
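
For orientation, the three baseline schedulers named in the abstract can be sketched as follows. This is an illustrative sketch only and not the paper's Subflow algorithm; the function names, the byte-string flow key, and the use of SHA-256 as the HRW weight function are assumptions made for the example. CRC32 and HRW are flow-level (every packet of a flow goes to the same core, which preserves cache affinity), whereas Round-Robin is packet-level (load is even, but a flow is spread over all cores).

```python
import hashlib
import zlib


def crc32_core(flow_key: bytes, n_cores: int) -> int:
    """Flow-level: hash the flow key with CRC32 and reduce modulo the core count."""
    return zlib.crc32(flow_key) % n_cores


def hrw_core(flow_key: bytes, n_cores: int) -> int:
    """Flow-level: Highest Random Weight (rendezvous) hashing.

    Each core gets a pseudo-random weight for this flow; the core with the
    highest weight wins. SHA-256 is an arbitrary choice for this sketch.
    """
    def weight(core: int) -> int:
        digest = hashlib.sha256(flow_key + core.to_bytes(4, "big")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(range(n_cores), key=weight)


class RoundRobin:
    """Packet-level: hand consecutive packets to consecutive cores."""

    def __init__(self, n_cores: int) -> None:
        self.n_cores = n_cores
        self._next = 0

    def next_core(self) -> int:
        core = self._next
        self._next = (self._next + 1) % self.n_cores
        return core


if __name__ == "__main__":
    # Hypothetical flow key: source, destination, and protocol as bytes.
    key = b"10.0.0.1:1234->10.0.0.2:80/tcp"
    rr = RoundRobin(4)
    print(crc32_core(key, 4), hrw_core(key, 4), [rr.next_core() for _ in range(5)])
```

A single heavy flow always lands on one core under the two flow-level schemes, which is the traffic-locality weakness the abstract describes.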


Composite Event Processing for Data Streams and Domain Knowledge

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.219-220.927 ◽

2011 ◽

Vol 219-220 ◽

pp. 927-931

Author(s):

Jun Qiang Liu ◽

Xiao Ling Guan

Keyword(s):

Query Optimization ◽

Data Streams ◽

Domain Knowledge ◽

Semantic Information ◽

Query Language ◽

Processing System ◽

Optimization Techniques ◽

Research Attention ◽

Composite Event ◽

Solid Foundation

In recent years, the processing of composite event queries over data streams has attracted a lot of research attention. Traditional database techniques were not designed for stream processing systems. Furthermore, continuous queries are often formulated in a declarative query language without specifying the semantics. To overcome these deficiencies, this article presents the design, implementation, and evaluation of a system that processes data streams with semantic information. A set of optimization techniques is then proposed for query handling. Our approach thus not only makes it possible to express queries with sound semantics but also provides a solid foundation for query optimization. Experimental results show that our approach is effective and efficient for data streams and domain knowledge.


Implementation of Scientific Computing Applications on the Cell Broadband Engine

Scientific Programming ◽

10.1155/2009/589561 ◽

2009 ◽

Vol 17 (1-2) ◽

pp. 135-151 ◽

Cited By ~ 6

Author(s):

Guochun Shi ◽

Volodymyr V. Kindratenko ◽

Ivan S. Ufimtsev ◽

Todd J. Martinez ◽

James C. Phillips ◽

...

Keyword(s):

High Performance ◽

Scientific Computing ◽

Lessons Learned ◽

Optimization Techniques ◽

Cell Processor ◽

Intrinsic Properties ◽

Cell Broadband Engine ◽

Performance Improvements ◽

Cell Architecture ◽

Practical Recommendations

The Cell Broadband Engine architecture is a revolutionary processor architecture well suited for many scientific codes. This paper reports on an effort to implement several traditional high-performance scientific computing applications on the Cell Broadband Engine processor, including molecular dynamics, quantum chromodynamics and quantum chemistry codes. The paper discusses data and code restructuring strategies necessary to adapt the applications to the intrinsic properties of the Cell processor and demonstrates performance improvements achieved on the Cell architecture. It concludes with the lessons learned and provides practical recommendations on optimization techniques that are believed to be most appropriate.


High Performance Topology-Aware Communication in Multicore Processors

Chapman & Hall/CRC Computational Science - Scientific Computing with Multicore and Accelerators ◽

10.1201/b10376-30 ◽

2010 ◽

pp. 443-460

Author(s):

Hari Subramoni ◽

Fabrizio Petrini ◽

Virat Agarwal ◽

Davide Pasetto

Keyword(s):

High Performance ◽

Multicore Processors


Systematic Literature Review on Metaheuristic Optimization Techniques in WSNs

International Journal of Mathematics and Computers in Simulation ◽

10.46300/9102.2020.14.23 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Systematic Review ◽

Wireless Sensor Networks ◽

Exact Solution ◽

Energy Consumption ◽

Optimization Techniques ◽

Metaheuristic Algorithms ◽

Wireless Sensor ◽

Time Constraints ◽

Metaheuristic Optimization ◽

New Algorithms

Metaheuristic algorithms are widely used for developing new algorithms and optimizing various aspects of Wireless Sensor Networks (WSNs). In most complicated problems, obtaining an exact solution requires evaluating a multitude of possible modes, whereas metaheuristic algorithms can obtain solutions within acceptable time constraints. These algorithms play an operational role in solving such problems by optimizing metrics such as the coverage rate and energy consumption of the networks. These metrics also have a considerable impact on network lifetime. This systematic review focuses on work published from 2010 to 2020 on metaheuristic optimization in WSNs. Furthermore, the systematic review will answer multiple questions that are discussed in the methodology section.


NUMA-Aware DGEMM Based on 64-Bit ARMv8 Multicore Processors Architecture

Electronics ◽

10.3390/electronics10161984 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1984

Author(s):

Wei Zhang ◽

Zihao Jiang ◽

Zhiguang Chen ◽

Nong Xiao ◽

Yang Ou

Keyword(s):

Energy Efficiency ◽

High Performance ◽

Multicore Processors ◽

Matrix Multiplication ◽

Memory Access ◽

Double Precision ◽

Competitive Performance ◽

General Matrix ◽

Remarkable Improvement ◽

Task Independence

Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based system-on-chips (SoCs) have become candidates for next-generation HPC systems thanks to their highly competitive performance and energy efficiency. Therefore, it is meaningful to design high-performance DGEMM for ARMv8-based SoCs. However, as ARMv8-based SoCs integrate an increasing number of cores, modern CPUs use non-uniform memory access (NUMA). NUMA restricts the performance and scalability of DGEMM when many threads access remote NUMA domains, which makes it challenging to develop high-performance DGEMM on multi-NUMA architectures. We present a NUMA-aware method to reduce the number of cross-die and cross-chip memory access events. The critical enabler for NUMA-aware DGEMM is to leverage two levels of parallelism, between and within nodes, in a purely threaded implementation, which allows for task independence and data localization on NUMA nodes. We have implemented NUMA-aware DGEMM in OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng920 architecture. The results show that NUMA-aware DGEMM effectively reduces the number of cross-die and cross-chip memory accesses, significantly enhancing the scalability of DGEMM and increasing its performance by 17.1% on average, with the most remarkable improvement being 21.9%.
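
The two-level decomposition can be illustrated with a small sketch. It is not the OpenBLAS implementation: it runs sequentially in pure Python, does not control memory placement, and assumes a simple row-wise partition, which may differ from the partitioning the paper uses. The names `split`, `dgemm_rows`, and `two_level_dgemm` are hypothetical. The sketch computes the standard DGEMM update C = alpha * A * B + beta * C, first dividing the rows of C among NUMA nodes and then dividing each node's rows among its threads, so that every task writes a disjoint block of C.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split(start: int, stop: int, parts: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into `parts` contiguous, nearly equal ranges."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def dgemm_rows(alpha: float, a: Matrix, b: Matrix, beta: float,
               c: Matrix, rows: Tuple[int, int]) -> None:
    """Update rows [lo, hi) of C in place: C = alpha * A * B + beta * C."""
    lo, hi = rows
    for i in range(lo, hi):
        for j in range(len(b[0])):
            acc = 0.0
            for k in range(len(b)):
                acc += a[i][k] * b[k][j]
            c[i][j] = alpha * acc + beta * c[i][j]


def two_level_dgemm(alpha: float, a: Matrix, b: Matrix, beta: float,
                    c: Matrix, nodes: int, threads_per_node: int) -> None:
    """Level 1 assigns row panels to nodes; level 2 assigns sub-panels to threads.

    The tasks are independent, so a real implementation could run them
    concurrently; here they run one after another for clarity.
    """
    for node_rows in split(0, len(a), nodes):
        for thread_rows in split(node_rows[0], node_rows[1], threads_per_node):
            dgemm_rows(alpha, a, b, beta, c, thread_rows)


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    b = [[1.0, 0.0], [0.0, 1.0]]
    c = [[1.0, 1.0] for _ in range(3)]
    two_level_dgemm(2.0, a, b, 1.0, c, nodes=2, threads_per_node=2)
    print(c)
```

Because each task touches only its own rows of A and C, a node-local allocation of those rows would keep most memory traffic inside one NUMA domain, which is the data localization the abstract refers to.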


High Performance Computing on New Accelerated Hardware Architectures

Computational Methods in Science and Technology ◽

10.12921/cmst.2010.si.01.71-79 ◽

2010 ◽

Vol Special Issue (1) ◽

pp. 71-79 ◽

Cited By ~ 2

Author(s):

Marek Błażewicz ◽

Krzysztof Kurowski ◽

Bogdan Ludwiczak ◽

Krystyna Napierała

Keyword(s):

High Performance Computing ◽

High Performance ◽

Hardware Architectures ◽

Performance Computing


Preliminary Sailplane Design Using MDO And Multi-Fidelity Analysis

10.32920/ryerson.14653626.v1 ◽

2021 ◽

Author(s):

Chris V. Pilcher

Keyword(s):

Multidisciplinary Design Optimization ◽

High Performance ◽

Vortex Lattice ◽

Optimization Techniques ◽

Multidisciplinary Design ◽

Preliminary Design ◽

Adaptive Meshing ◽

Analysis Methods ◽

And Performance ◽

Modern Optimization

A multidisciplinary design optimization (MDO) strategy for the preliminary design of a sailplane has been developed. The proposed approach applies MDO techniques and multi-fidelity analysis methods which have seen successful use in many aerospace design applications. A customized genetic algorithm (GA) was developed to control the sailplane optimization, which included aerodynamics/stability, structures/weights and balance, and performance/airworthiness disciplinary analysis modules. An adaptive meshing routine was developed to allow for accurate modeling of the aero-structural coupling involved in wing design, which included a finite element method (FEM) structural solver along with a vortex lattice aerodynamics solver. Empirical equations were used to evaluate basic sailplane performance and airworthiness requirements. This research yielded an optimum design that correlated well with an existing high-performance sailplane. The results of this thesis suggest that preliminary sailplane design is a well-suited application for modern optimization techniques when coupled with multi-fidelity analysis methods.
