Efficient join algorithms for large database tables in a multi-GPU environment

2020 ◽  
Vol 14 (4) ◽  
pp. 708-720
Author(s):  
Ran Rui ◽  
Hao Li ◽  
Yi-Cheng Tu

Relational join processing is one of the core functionalities in database management systems. GPUs, as a general-purpose parallel computing platform, have been shown to be very promising for processing relational joins. However, join algorithms often need to handle very large input data, an issue that was not sufficiently addressed in existing work. Moreover, as more and more desktop and workstation platforms support multi-GPU environments, the combined computing capability of multiple GPUs can easily match that of a computing cluster, so it is worth exploring how join processing would benefit from the adoption of multiple GPUs. We identify the low rate and complex patterns of data transfer between the CPU and GPUs as the main challenges in designing efficient algorithms for large table joins. To overcome these challenges, we propose three distinctive multi-GPU join algorithm designs, namely nested loop, global sort-merge, and hybrid joins, for large table joins with different join conditions. Extensive experiments on multiple databases and two different hardware configurations demonstrate that our algorithms scale well with data size and gain a significant performance boost from the use of multiple GPUs. Furthermore, our algorithms achieve much better performance than existing join algorithms, with speedups of up to 25X and 2.8X over the best-known code developed for multi-core CPUs and GPUs, respectively.
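
The following is a minimal CPU-side sketch of the global sort-merge strategy the abstract names: each partition is sorted locally (on a GPU in the paper's setting), the sorted runs are merged, and a single merge pass performs the equi-join. Function names, the partitioning scheme, and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of a global sort-merge equi-join.
from heapq import merge

def local_sort(partitions):
    # In the paper's setting each partition would be sorted on its own GPU;
    # here we sort on the CPU to keep the sketch self-contained.
    return [sorted(p, key=lambda row: row[0]) for p in partitions]

def sort_merge_join(r_parts, s_parts):
    r = list(merge(*local_sort(r_parts), key=lambda row: row[0]))
    s = list(merge(*local_sort(s_parts), key=lambda row: row[0]))
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            # Emit all pairs sharing the current join key.
            key, i0, j0 = r[i][0], i, j
            while i < len(r) and r[i][0] == key:
                i += 1
            while j < len(s) and s[j][0] == key:
                j += 1
            out.extend((a, b) for a in r[i0:i] for b in s[j0:j])
    return out

# Example: two tables, each pre-partitioned across two (simulated) GPUs.
R = [[(1, "a"), (3, "b")], [(2, "c"), (3, "d")]]
S = [[(3, "x")], [(1, "y"), (4, "z")]]
print(sort_merge_join(R, S))
```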

2021 ◽  
Vol 14 (11) ◽  
pp. 2230-2243
Author(s):  
Jelle Hellings ◽  
Mohammad Sadoghi

The emergence of blockchains has fueled the development of resilient systems that can deal with Byzantine failures due to crashes, bugs, or even malicious behavior. Recently, we have also seen the exploration of sharding in these resilient systems to provide the scalability required by very large data-based applications. Unfortunately, current sharded resilient systems all use system-specific, specialized approaches to sharding that do not provide the flexibility of traditional sharded data management systems. To improve on this situation, we take a fundamental look at the design of sharded resilient systems. We do so by introducing BYSHARD, a unifying framework for the study of sharded resilient systems. Within this framework, we show how two-phase commit and two-phase locking---two techniques central to providing atomicity and isolation in traditional sharded databases---can be implemented efficiently in a Byzantine environment with minimal usage of costly Byzantine-resilient primitives. Based on these techniques, we propose eighteen multi-shard transaction processing protocols. Finally, we evaluate these protocols in practice and show that each protocol supports high transaction throughput and provides scalability while striking its own trade-off between throughput, isolation level, latency, and abort rate. As such, our work provides a strong foundation for the development of ACID-compliant, general-purpose, and flexible sharded resilient data management systems.
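
For orientation, the sketch below shows the classic two-phase commit pattern that the paper adapts; it is not BYSHARD itself, which replaces the trusted coordinator and its messages with Byzantine-resilient primitives. The shard interface and example transaction are hypothetical.

```python
# Minimal two-phase commit sketch over a set of shards (illustrative only).
class Shard:
    def __init__(self, name):
        self.name, self.staged = name, None

    def prepare(self, txn_part):
        # Vote YES only if the local writes can be staged (always, in this toy).
        self.staged = txn_part
        return True

    def commit(self):
        print(f"{self.name}: committed {self.staged}")

    def abort(self):
        print(f"{self.name}: aborted")
        self.staged = None

def two_phase_commit(shards, txn):
    # Phase 1: collect votes from every shard touched by the transaction.
    votes = [s.prepare(txn.get(s.name)) for s in shards]
    # Phase 2: commit only if all shards voted YES, otherwise abort everywhere.
    for s in shards:
        s.commit() if all(votes) else s.abort()
    return all(votes)

shards = [Shard("accounts"), Shard("orders")]
two_phase_commit(shards, {"accounts": "debit(u1, 5)", "orders": "insert(o9)"})
```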


2021 ◽  
Author(s):  
Jonathan B. Chan

System on Programmable Chip (SoPC) based embedded system development has been increasing, aiming for improved system design, testing, and cost savings in the workflow for Application-Specific ICs (ASICs). We examine the development of Smart Home embedded systems, which have traditionally been based on a fixed processor and memory, with an inflexible configuration. We investigate how more capability can be added by updating firmware, without the burden of updating hardware or using a full (but dedicated) general-purpose computer system. Our development and implementation of the smart home controller is based on the SoPC development environment from Altera. The development board includes all the necessary parts, such as the processor, memory, and various communication interfaces. The initial implementation includes a simple protocol for communication between home appliances or devices and the controller. This protocol allows data transfer between home appliances or devices and the controller, in turn allowing both to support more features. We have investigated and developed a home resource management application. The main resources being managed in this project are hot and cold water, electricity, and gas. We have introduced a number of expert rules to manage these resources. Additionally, we have developed a home simulator, with virtual appliances and devices, that communicates with the home controller. The simulator interacts with the SoPC based smart home embedded system developed in this project by generating messages representing a number of smart appliances in the home. It provides a useful testing environment for the smart home embedded system to verify its design goals.
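
As a rough illustration of the kind of appliance-to-controller message exchange and expert rule the abstract describes, here is a hypothetical sketch; the message fields, threshold, and rule are assumptions for illustration, not the thesis' actual protocol.

```python
# Hypothetical appliance-to-controller message handling (illustrative only).
import json

RULES = {
    # Illustrative expert rule: turn the heater off above a temperature threshold.
    "hot_water": lambda reading: "heater_off" if reading > 60.0 else "ok",
}

def handle_message(raw):
    msg = json.loads(raw)                     # e.g. received over a serial link
    rule = RULES.get(msg["resource"])
    action = rule(msg["value"]) if rule else "ok"
    return json.dumps({"device": msg["device"], "action": action})

print(handle_message('{"device": "boiler-1", "resource": "hot_water", "value": 72.5}'))
```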


Author(s):  
Ivan Gjorgjievski ◽ 
Daniela Karadakov
Ever since the onset of the Internet and the rapid development of communications, a paradigm shift has been occurring in the relationship between human resources and the management systems in place. That shift has already rendered plenty of legacy management systems obsolete and ineffective. Evidently, the acceleration of data transfer speeds has had the side effect of decreasing the location dependency of the average worker in certain industries, which in turn has created a new challenge for the contemporary manager, especially when dealing with remote teams and their time management. This work-location decoupling meant that new systems had to be created, new studies introduced, and plenty of modernization applied to legacy control systems. And fast! This paper contains a systematic review of available software solutions for time management, location independence, virtual work, and work teams, and provides analytic insight.


Author(s):  
John S. Edwards

This chapter explains the role of knowledge management systems, whether technology-based or people-based, in service supply chain management. A systematic literature review was carried out to identify relevant examples of both successful and unsuccessful knowledge management systems. These are analyzed in terms of process, people and technology aspects, and the activities in the knowledge life-cycle (create, acquire, store, use, refine, transfer) that they support. These include systems used within a single organization, systems shared with supply chain partners, and systems shared with customers, the latter being the least common. Notable features are that more systems support knowledge exploitation than knowledge exploration, and that general-purpose software (e.g., internet search, database) is used more than software specific to knowledge management (e.g., data mining, “people finder”). The widespread use of mobile devices and social media offers both an opportunity and a challenge for future knowledge management systems development.


Author(s):  
Afonso Araújo Neto ◽  
Marco Vieira

Benchmarking security is hard and, although there are many proposals for security metrics in the literature, no consensual quantitative security metric has been proposed so far. A key difficulty is that security is usually influenced more by what is unknown about a system than by what is known. In this paper, the authors propose the use of an untrustworthiness metric for benchmarking security. This metric, based on the idea of quantifying and exposing the trustworthiness relationship between a system and its owner, represents a powerful alternative to traditional security metrics. As an example, the authors propose a benchmark for Database Management Systems (DBMS) that can be easily used to assess and compare alternative database configurations based on minimum untrustworthiness, which is a low-cost and high-reward trust-based metric. The practical application of the benchmark in four real large database installations shows that untrustworthiness is a powerful metric for administrators to make informed security decisions by taking into account the specific needs and characteristics of the environment being managed.
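
To make the idea concrete, the sketch below shows one possible way to score configurations against a checklist of security recommendations and pick the one with minimum "untrustworthiness". The checks, weights, and aggregation are assumptions for illustration only and are not the authors' published metric.

```python
# Illustrative configuration scoring (not the authors' benchmark definition).
CHECKS = {
    # Hypothetical recommendations with hypothetical weights.
    "password_complexity_enforced": 3,
    "remote_root_login_disabled": 5,
    "audit_logging_enabled": 2,
}

def untrustworthiness(config):
    # Sum the weights of every recommendation the configuration fails to satisfy.
    return sum(w for check, w in CHECKS.items() if not config.get(check, False))

configs = {
    "default_install": {"audit_logging_enabled": True},
    "hardened": {"password_complexity_enforced": True,
                 "remote_root_login_disabled": True,
                 "audit_logging_enabled": True},
}
scores = {name: untrustworthiness(cfg) for name, cfg in configs.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```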


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 169
Author(s):  
Ulrich Schmitt

The envisioned embracing of thriving knowledge societies is increasingly compromised by threatening perceptions of information overload, attention poverty, opportunity divides, and career fears. This paper traces the roots of these symptoms back to causes of information entropy and structural holes, as well as invisible private and undiscoverable public knowledge, which characterize the sad state of our current knowledge management and creation practices. As part of an ongoing design science research and prototyping project, the article's (neg)entropic perspectives complement a succession of prior multi-disciplinary publications. Looking forward, it proposes a novel decentralized generative knowledge management approach that prioritizes the capacity development of autonomous individual knowledge workers, not at the expense of traditional organizational knowledge management systems but as a viable means to foster their fruitful co-evolution. The article thus informs relevant stakeholders about the current unsustainable status quo inhibiting knowledge workers; it presents viable remedial options (as a prerequisite for creating the respective future generative Knowledge Management (KM) reality) to afford a sustainable solution with the generative potential to evolve into a prospective general-purpose technology.


Electronics ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 1342
Author(s):  
Gianvito Urgese ◽  
Francesco Barchi ◽  
Emanuele Parisi ◽  
Evelina Forno ◽  
Andrea Acquaviva ◽  
...  

SpiNNaker is a neuromorphic globally asynchronous locally synchronous (GALS) multi-core architecture designed for simulating spiking neural networks (SNNs) in real time. Several studies have shown that neuromorphic platforms allow flexible and efficient simulation of SNNs by exploiting a communication infrastructure optimised for transmitting small packets across the many cores of the platform. However, the effectiveness of neuromorphic platforms in executing massively parallel general-purpose algorithms, while promising, is still to be explored. In this paper, we present a parallel DNA sequence matching algorithm implemented using the MPI programming paradigm and ported to the SpiNNaker platform. In our implementation, all cores available on the board are configured to execute an optimised version of the Boyer-Moore (BM) algorithm in parallel. Using this application, we benchmarked the SpiNNaker platform in terms of scalability and synchronisation latency. Experimental results indicate that the SpiNNaker parallel architecture allows a linear performance increase with the number of cores used and shows better scalability compared to a general-purpose multi-core computing platform.
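
As background for the matching step, the sketch below implements the simpler Boyer-Moore-Horspool variant (bad-character skips only) and naively splits the text into overlapping chunks, mimicking how each core could scan its own slice. It is an illustration only, not the optimised MPI port described in the paper.

```python
# Boyer-Moore-Horspool search plus a naive chunked "parallel" scan (illustrative).
def horspool_search(text, pattern):
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []
    # Bad-character table: how far we may shift when the window's last char mismatches.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        i += shift.get(text[i + m - 1], m)
    return hits

def chunked_search(text, pattern, workers=4):
    m, chunk = len(pattern), len(text) // workers + 1
    hits = []
    for w in range(workers):
        start = w * chunk
        # Overlap chunks by m-1 characters so no match is split across workers.
        piece = text[start:start + chunk + m - 1]
        hits += [start + h for h in horspool_search(piece, pattern)]
    return sorted(set(hits))

print(chunked_search("ACGTACGTGACGA", "ACG"))   # -> [0, 4, 9]
```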


Author(s):  
Yao Yuan ◽  
Dalin Zhang ◽  
Lin Tian ◽  
Jinglin Shi

As a promising candidate for a general-purpose transport layer protocol, the Stream Control Transmission Protocol (SCTP) offers new features such as multi-homing and multi-streaming. By using the multi-homing feature, an SCTP association can make concurrent multi-path transfer an appealing way to satisfy ever-increasing user demands for bandwidth. Multiple streams, in turn, provide an aggregation mechanism to accommodate heterogeneous objects that belong to the same application but may require different QoS from the network. In this paper, the authors introduce WM2-SCTP (Wireless Multi-path Multi-flow - Stream Control Transmission Protocol), a transport layer solution for concurrent multi-path transfer with parallel sub-flows. WM2-SCTP aims at exploiting SCTP's multi-homing and multi-streaming capabilities by grouping SCTP streams into sub-flows based on their required QoS and selecting the best paths for each sub-flow to improve data transfer rates. The results show that, under different scenarios, WM2-SCTP is able to support QoS among the SCTP streams and achieves better throughput.
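
The sketch below illustrates the core idea stated in the abstract: group streams into sub-flows by their required QoS class, then map each sub-flow to the path that currently best serves that class. The QoS classes, path metrics, and selection rule are assumptions for illustration, not the WM2-SCTP specification.

```python
# Illustrative grouping of streams into QoS-based sub-flows and path selection.
from collections import defaultdict

streams = [  # (stream_id, required QoS class) -- hypothetical
    (1, "low_latency"), (2, "bulk"), (3, "low_latency"), (4, "bulk"),
]
paths = {  # path_id -> measured characteristics (hypothetical numbers)
    "wlan0": {"rtt_ms": 12, "bandwidth_mbps": 40},
    "lte0":  {"rtt_ms": 45, "bandwidth_mbps": 90},
}

def best_path(qos_class):
    # Latency-sensitive sub-flows prefer the lowest RTT; bulk prefers bandwidth.
    if qos_class == "low_latency":
        return min(paths, key=lambda p: paths[p]["rtt_ms"])
    return max(paths, key=lambda p: paths[p]["bandwidth_mbps"])

subflows = defaultdict(list)
for stream_id, qos in streams:
    subflows[qos].append(stream_id)

schedule = {qos: {"streams": ids, "path": best_path(qos)} for qos, ids in subflows.items()}
print(schedule)
```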


Author(s):  
Driss En-Nejjary ◽  
Francois Pinet ◽  
Myoung-Ah Kang

Recently, in the field of information systems, the acquisition of geo-referenced data has made a huge leap forward in terms of technology. Optimizing the processing of these data is a real issue, and different research works have proposed multi-core approaches to analyze large geo-referenced datasets. In this article, different methods based on general-purpose computing on graphics processing units (GPGPU) are modelled and compared to parallelize overlapping aggregations of raster sequences. Our methods are tested on a sequence of rasters representing the evolution of temperature over time for the same region. Each raster corresponds to a different data acquisition time period, and each geo-referenced raster cell is associated with a temperature value. This article proposes optimized methods to calculate the average temperature for the region for all the possible raster subsequences of a determined length, i.e., to calculate overlapping aggregated data summaries. In these aggregations, the same subsets of values are aggregated several times. For example, this type of aggregation can be useful in different environmental data analyses, e.g., to pre-calculate all the average temperatures in a database. The present article highlights a significant increase in performance and shows that the use of GPGPU parallel processing enabled us to run the aggregations more than 50 times faster than the sequential method when data transfer cost is included, and more than 200 times faster without data transfer cost.
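
The sketch below shows the overlapping-aggregation shortcut with a prefix sum, so every window of consecutive rasters is averaged without re-summing the values it shares with neighbouring windows. The actual methods in the article run on the GPU; NumPy is used here only to illustrate the algorithmic idea, and the shapes and values are made up for the example.

```python
# Overlapping window averages over a raster time series via a prefix sum (illustrative).
import numpy as np

rng = np.random.default_rng(0)
rasters = rng.uniform(-5, 35, size=(10, 4, 4))   # 10 time steps, 4x4 cells each
window = 3                                       # subsequence length

# Per-raster mean temperature over the region, then a prefix sum over time.
per_raster_mean = rasters.mean(axis=(1, 2))
prefix = np.concatenate(([0.0], np.cumsum(per_raster_mean)))

# Mean for every subsequence of `window` consecutive rasters, computed in O(1) each.
window_means = (prefix[window:] - prefix[:-window]) / window
print(window_means)          # one value per overlapping subsequence (here, 8 of them)
```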

