Parallel Skyline Computation Exploiting the Lattice Structure

2015 ◽  
Vol 26 (4) ◽  
pp. 18-43 ◽  
Author(s):  
Markus Endres ◽  
Werner Kießling

The problem of Skyline computation has attracted considerable research attention in the last decade. A Skyline query selects those tuples from a dataset that are optimal with respect to a set of designated preference attributes. Since multicore processors are going mainstream, it has become imperative to develop parallel algorithms that fully exploit the advantages of such modern hardware architectures. In this paper, the authors present high-performance parallel Skyline algorithms based on the lattice structure generated by a Skyline query. For this, they propose different evaluation strategies and compare several data structures for the parallel evaluation of Skyline queries. The authors present novel optimization techniques for lattice-based Skyline algorithms based on pruning and on removing one unrestricted attribute domain. They demonstrate through comprehensive experiments on synthetic and real datasets that their new algorithms outperform state-of-the-art multicore Skyline techniques for low-cardinality domains. The authors' algorithms have linear runtime complexity and take full advantage of modern hardware architectures.
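
To make the notion of a Skyline query concrete, the following is a minimal sketch of the dominance test and a naive quadratic Skyline filter. It is not the lattice-based parallel algorithm described in the abstract; the function names, the "smaller is better" convention, and the sample hotel data (price, distance) are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b in every attribute and
    strictly better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) Skyline: keep every tuple that no other tuple dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to beach); lower is better for both.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 3)]
print(skyline(hotels))  # [(50, 8), (60, 2), (80, 1)]
```

Here (70, 5) and (90, 3) are dropped because (60, 2) is better in both attributes; the remaining tuples are mutually incomparable.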

2020 ◽  
Vol 12 (7) ◽  
pp. 113 ◽  
Author(s):  
Maurizio Capra ◽  
Beatrice Bussolino ◽  
Alberto Marchisio ◽  
Muhammad Shafique ◽  
Guido Masera ◽  
...  

Deep Neural Networks (DNNs) are now common in most Artificial Intelligence (AI) applications. Their ability to go beyond human precision has made these networks a milestone in the history of AI. However, while they deliver cutting-edge performance, they also require enormous computing power. For this reason, numerous optimization techniques at the hardware and software level, as well as specialized architectures, have been developed to process these models with high performance and power/energy efficiency without affecting their accuracy. In the past, multiple surveys have been published to provide an overview of different architectures and optimization techniques for efficient execution of Deep Learning (DL) algorithms. This work aims to provide an up-to-date survey, especially covering the prominent works from the last 3 years of research on hardware architectures for DNNs. In this paper, the reader will first learn what a hardware accelerator is and what its main components are, followed by the latest techniques in the fields of dataflow, reconfigurability, variable bit-width, and sparsity.


2012 ◽  
Vol 9 ◽  
pp. 966-975
Author(s):  
Ferdinando Alessi ◽  
Annalisa Massini ◽  
Roberto Basili

2021 ◽  
Vol 13 (3) ◽  
pp. 78
Author(s):  
Chuanhong Li ◽  
Lei Song ◽  
Xuewen Zeng

The continuous increase in network traffic has sharply increased the demand for high-performance packet processing systems. For a high-performance packet processing system based on multi-core processors, the packet scheduling algorithm is critical because it governs load distribution and therefore system throughput, and it has attracted intensive research attention. However, designing one is not easy: the canonical flow-level packet scheduling algorithm is vulnerable to traffic locality, while the packet-level packet scheduling algorithm fails to maintain cache affinity. In this paper, we propose an adaptive throughput-first packet scheduling algorithm for DPDK-based packet processing systems. Building on DPDK's burst-oriented packet receiving and transmitting, we propose using Subflow as both the scheduling unit and the adjustment unit, so that the proposed algorithm not only retains the advantages of flow-level packet scheduling algorithms when no adjustment happens but also avoids packet loss as much as possible when the target core may be overloaded. Experimental results show that the proposed method outperforms Round-Robin, HRW (Highest Random Weight), and CRC32 on system throughput and packet loss rate.
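
For context on the HRW baseline named in the abstract, the following is a minimal sketch of flow-level scheduling with rendezvous (highest random weight) hashing. It is not the proposed Subflow algorithm; the flow-key format, the core identifiers, and the use of SHA-256 as the weight function are assumptions made for illustration.

```python
import hashlib
from typing import Sequence


def hrw_core(flow_key: str, cores: Sequence[int]) -> int:
    """Pick the core whose hash weight for this flow is highest.
    All packets of one flow map to the same core, preserving cache affinity."""
    def weight(core: int) -> int:
        digest = hashlib.sha256(f"{flow_key}|{core}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(cores, key=weight)


# Hypothetical flow identifiers (5-tuples flattened to strings).
flows = [f"10.0.0.{i}:1234->10.0.1.1:80/tcp" for i in range(8)]
cores = [0, 1, 2, 3]
assignment = {flow: hrw_core(flow, cores) for flow in flows}
for flow, core in assignment.items():
    print(flow, "-> core", core)
```

Because the mapping depends only on the flow key, a few heavy flows can overload one core, which is the traffic-locality weakness of flow-level scheduling that the abstract mentions.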


2011 ◽  
Vol 219-220 ◽  
pp. 927-931
Author(s):  
Jun Qiang Liu ◽  
Xiao Ling Guan

In recent years, the processing of composite event queries over data streams has attracted a lot of research attention. Traditional database techniques were not designed for stream processing systems. Furthermore, continuous queries are often formulated in a declarative query language without specifying their semantics. To overcome these deficiencies, this article presents the design, implementation, and evaluation of a system that processes data streams with semantic information. A set of optimization techniques is then proposed for query handling. Our approach thus not only makes it possible to express queries with sound semantics, but also provides a solid foundation for query optimization. Experimental results show that our approach is effective and efficient for data streams and domain knowledge.


2009 ◽  
Vol 17 (1-2) ◽  
pp. 135-151 ◽  
Author(s):  
Guochun Shi ◽  
Volodymyr V. Kindratenko ◽  
Ivan S. Ufimtsev ◽  
Todd J. Martinez ◽  
James C. Phillips ◽  
...  

The Cell Broadband Engine architecture is a revolutionary processor architecture well suited for many scientific codes. This paper reports on an effort to implement several traditional high-performance scientific computing applications on the Cell Broadband Engine processor, including molecular dynamics, quantum chromodynamics and quantum chemistry codes. The paper discusses data and code restructuring strategies necessary to adapt the applications to the intrinsic properties of the Cell processor and demonstrates performance improvements achieved on the Cell architecture. It concludes with the lessons learned and provides practical recommendations on optimization techniques that are believed to be most appropriate.


Metaheuristic algorithms are recognized for developing new algorithms and optimizing various aspects of Wireless Sensor Networks (WSNs). In most complicated problems, obtaining an exact solution requires evaluating a multitude of possible modes. Metaheuristic algorithms can obtain solutions within acceptable time constraints. These algorithms play an operational role in solving such problems by optimizing metrics such as the coverage rate and energy consumption of the networks. These metrics also have a significant impact on network lifetime. This systematic review focuses on work published from 2010 to 2020 on metaheuristic optimization in WSNs. Furthermore, the systematic review will answer multiple questions that will be discussed in the methodology section.


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1984
Author(s):  
Wei Zhang ◽  
Zihao Jiang ◽  
Zhiguang Chen ◽  
Nong Xiao ◽  
Yang Ou

Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based system-on-chips (SoCs) have become candidates for next-generation HPC systems thanks to their highly competitive performance and energy efficiency. Therefore, it is meaningful to design high-performance DGEMM for ARMv8-based SoCs. However, as ARMv8-based SoCs integrate an increasing number of cores, modern CPUs use non-uniform memory access (NUMA). NUMA restricts the performance and scalability of DGEMM when many threads access remote NUMA domains. This makes it challenging to develop high-performance DGEMM on multi-NUMA architectures. We present a NUMA-aware method to reduce the number of cross-die and cross-chip memory access events. The critical enabler for NUMA-aware DGEMM is to leverage two levels of parallelism, between and within nodes, in a purely threaded implementation, which allows task independence and data localization across NUMA nodes. We have implemented NUMA-aware DGEMM in OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng920 architecture. The results show that NUMA-aware DGEMM effectively reduces the number of cross-die and cross-chip memory accesses, significantly enhancing the scalability of DGEMM and increasing its performance by 17.1% on average, with the largest improvement being 21.9%.
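
The two-level decomposition mentioned in the abstract can be illustrated with a small sketch. This is not the OpenBLAS implementation: it only shows how the rows of the result matrix might be split first across NUMA nodes and then across the threads of each node, so that each worker touches its own row panel. The function names, the row-wise split, and the sequential loops standing in for threads are all assumptions.

```python
from typing import List


def split(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def two_level_matmul(a, b, nodes=2, threads_per_node=2):
    """Compute C = A * B with a two-level row partition (loops emulate threads)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for node_rows in split(m, nodes):  # level 1: one row panel per NUMA node
        # level 2: the node's panel is divided among its local threads
        for thread_rows in split(len(node_rows), threads_per_node):
            for r in thread_rows:
                i = node_rows[r]  # global row index owned by this worker
                for j in range(n):
                    c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return c


a = [[1, 2], [3, 4], [5, 6]]
identity = [[1, 0], [0, 1]]
print(two_level_matmul(a, identity))  # [[1, 2], [3, 4], [5, 6]]
```

In a real NUMA-aware kernel, each node's panel of A and C would also be allocated in that node's local memory, which is what keeps most accesses off the cross-die and cross-chip links.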


2010 ◽  
Vol Special Issue (1) ◽  
pp. 71-79 ◽  
Author(s):  
Marek Błażewicz ◽  
Krzysztof Kurowski ◽  
Bogdan Ludwiczak ◽  
Krystyna Napierała

2021 ◽  
Author(s):  
Chris V. Pilcher

A multidisciplinary design optimization (MDO) strategy for the preliminary design of a sailplane has been developed. The proposed approach applies MDO techniques and multi-fidelity analysis methods that have seen successful use in many aerospace design applications. A customized genetic algorithm (GA) was developed to control the sailplane optimization, which included aerodynamics/stability, structures/weights and balance, and performance/airworthiness disciplinary analysis modules. An adaptive meshing routine was developed to allow for accurate modeling of the aero-structural coupling involved in wing design, which included a finite element method (FEM) structural solver along with a vortex lattice aerodynamics solver. Empirical equations were used to evaluate basic sailplane performance and airworthiness requirements. This research yielded an optimum design that correlated well with an existing high-performance sailplane. The results of this thesis suggest that preliminary sailplane design is a well-suited application for modern optimization techniques when coupled with multi-fidelity analysis methods.

