General-purpose join algorithms for large graph triangle listing on heterogeneous systems

Author(s):  
Daniel Zinn ◽  
Haicheng Wu ◽  
Jin Wang ◽  
Molham Aref ◽  
Sudhakar Yalamanchili
1998 ◽  
Vol 1 (06) ◽  
pp. 567-574 ◽  
Author(s):  
S.H. Lee ◽  
L.J. Durlofsky ◽  
M.F. Lough ◽  
W.H. Chen

This paper (SPE 52637) was revised for publication from paper SPE 38002, first presented at the 1997 SPE Reservoir Simulation Symposium, Dallas, 8-11 June. Original manuscript received for review 1 July 1997. Revised manuscript received 5 August 1998. Paper peer approved 3 September 1998.

Summary. The gridblock permeabilities used in reservoir simulation are commonly determined through the upscaling of a fine scale geostatistical reservoir description. Though it is well established that permeabilities computed in this manner are, in general, full tensor quantities, most finite difference reservoir simulators still treat permeability as a diagonal tensor. In this paper, we implement a capability to handle full tensor permeabilities in a general purpose finite difference simulator and apply this capability to the modeling of several complex geological systems. We formulate a flux continuous approach for the pressure equation by use of a method analogous to that of previous researchers (Edwards and Rogers; Aavatsmark et al.), consider methods for upwinding in multiphase flow problems, and additionally discuss some relevant implementation and reservoir characterization issues. The accuracy of the finite difference formulation, assessed through comparisons to an accurate finite element approach, is shown to be generally good, particularly for immiscible displacements in heterogeneous systems. The formulation is then applied to the simulation of upscaled descriptions of several geologically complex reservoirs involving crossbedding and extensive fracturing. The method performs quite well for these systems and is shown to capture the effects of the underlying geology accurately. Finally, the significant errors that can be incurred through inaccurate representation of the full permeability tensor are demonstrated for several cases.
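To make the diagonal-versus-full-tensor distinction concrete, a minimal statement of the underlying Darcy flux in standard reservoir simulation notation (K for the permeability tensor, p for pressure, μ for viscosity; the symbols are conventional, not taken from the paper) is:

```latex
% Darcy flux with a full permeability tensor (2D, single phase):
\mathbf{u} \;=\; -\frac{1}{\mu}\,\mathbf{K}\,\nabla p,
\qquad
\mathbf{K} \;=\;
\begin{pmatrix}
k_{xx} & k_{xy}\\
k_{yx} & k_{yy}
\end{pmatrix}.
% A diagonal-tensor simulator sets k_{xy} = k_{yx} = 0, so the x-flux
% reduces to u_x = -(k_{xx}/\mu)\,\partial p/\partial x, and any
% cross-term coupling produced by upscaling is silently discarded.
```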


2018 ◽  
Vol 4 ◽  
pp. e160 ◽  
Author(s):  
Dragan D. Nikolić

Numerical solutions of equation-based simulations require computationally intensive tasks such as evaluation of model equations, linear algebra operations and solution of systems of linear equations. The focus in this work is on parallel evaluation of model equations on shared memory systems such as general purpose processors (multi-core CPUs and manycore devices), streaming processors (Graphics Processing Units and Field Programmable Gate Arrays) and heterogeneous systems. The current approaches for evaluation of model equations are reviewed and their capabilities and shortcomings analysed. Since stream computing differs from traditional computing in that the system processes a sequential stream of elements, equations must be transformed into a data structure suitable for both types of processing. Postfix notation expression stacks are recognised as a platform- and programming-language-independent method to describe, store in computer memory and evaluate general systems of differential and algebraic equations of any size. Each mathematical operation and its operands are described by a specially designed data structure, and every equation is transformed into an array of these structures (a Compute Stack). Compute Stacks are evaluated by a stack machine using a Last In, First Out (LIFO) stack. The stack machine is implemented in the DAE Tools modelling software in the C99 language using two Application Programming Interfaces (APIs)/frameworks for parallelism. The Open Multi-Processing (OpenMP) API is used for parallelisation on general purpose processors, and the Open Computing Language (OpenCL) framework is used for parallelisation on streaming processors and heterogeneous systems. The performance of the sequential Compute Stack approach is compared to a direct C++ implementation and to the previous approach that uses evaluation trees. The new approach is 45% slower than the C++ implementation and more than five times faster than the previous one. The OpenMP and OpenCL implementations are tested on three medium-scale models using a multi-core CPU, a discrete GPU, an integrated GPU and heterogeneous computing setups. Execution times are compared and analysed, and the advantages of the OpenCL implementation running on a discrete GPU and on heterogeneous systems are discussed. It is found that the evaluation of model equations using the parallel OpenCL implementation running on a discrete GPU is up to twelve times faster than the sequential version, while the overall simulation speed-up gained is more than three times.
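As an illustration of the idea (not of the actual DAE Tools data layout, which the abstract does not specify), a minimal postfix "compute stack" evaluator might look like the following C++ sketch: each equation is stored as a flat array of operation records and evaluated left to right with a LIFO value stack.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One element of a "Compute Stack": either pushes a value or applies an
// operation to the top of the value stack. Field names are illustrative.
enum class Op { Constant, Variable, Add, Mul, Sin };

struct StackItem {
    Op     op;
    double value;  // used by Constant
    int    index;  // used by Variable
};

// Evaluate one equation, stored in postfix order, against a vector of
// variable values using a Last In, First Out value stack.
double evaluate(const std::vector<StackItem>& eq,
                const std::vector<double>& vars) {
    std::vector<double> s;  // LIFO value stack
    for (const StackItem& it : eq) {
        switch (it.op) {
            case Op::Constant: s.push_back(it.value); break;
            case Op::Variable: s.push_back(vars[it.index]); break;
            case Op::Sin:      s.back() = std::sin(s.back()); break;
            case Op::Add: { double b = s.back(); s.pop_back(); s.back() += b; break; }
            case Op::Mul: { double b = s.back(); s.pop_back(); s.back() *= b; break; }
        }
    }
    assert(s.size() == 1);
    return s.back();
}

int main() {
    // x0 * 2.0 + sin(x1) in postfix:  x0  2.0  *  x1  sin  +
    std::vector<StackItem> eq = {
        {Op::Variable, 0.0, 0}, {Op::Constant, 2.0, 0}, {Op::Mul, 0.0, 0},
        {Op::Variable, 0.0, 1}, {Op::Sin, 0.0, 0},      {Op::Add, 0.0, 0},
    };
    std::vector<double> vars = {3.0, 0.0};
    return evaluate(eq, vars) == 6.0 ? 0 : 1;  // 3*2 + sin(0) = 6
}
```

Because each equation is a flat array of identical records, the same representation can be walked by an OpenMP loop over equations or copied unchanged into an OpenCL device buffer, which is the portability argument the abstract makes.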


2020 ◽  
Vol 14 (4) ◽  
pp. 708-720
Author(s):  
Ran Rui ◽  
Hao Li ◽  
Yi-Cheng Tu

Relational join processing is one of the core functionalities in database management systems. It has been demonstrated that GPUs, as a general-purpose parallel computing platform, are very promising for processing relational joins. However, join algorithms often need to handle very large input data, an issue that was not sufficiently addressed in existing work. Besides, as more and more desktop and workstation platforms support multi-GPU environments, the combined computing capability of multiple GPUs can easily match that of a computing cluster. It is therefore worth exploring how join processing can benefit from the adoption of multiple GPUs. We identify the low rate and complex patterns of data transfer between the CPU and GPUs as the main challenges in designing efficient algorithms for large table joins. To overcome these challenges, we propose three distinctive designs of multi-GPU join algorithms, namely nested loop, global sort-merge and hybrid joins, for large table joins with different join conditions. Extensive experiments running on multiple databases and two different hardware configurations demonstrate the high scalability of our algorithms over data size and the significant performance boost brought by the use of multiple GPUs. Furthermore, our algorithms achieve much better performance than existing join algorithms, with speedups of up to 25X and 2.8X over the best known code developed for multi-core CPUs and GPUs, respectively.
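The abstract gives no code, but the chunking idea behind a nested-loop variant for tables larger than device memory can be sketched in plain C++: split both tables into chunks that fit on one device, then deal chunk pairs round-robin to the available GPUs. The sketch below is CPU-only and illustrative; it is not the paper's algorithm, and the comments mark where real code would stage transfers and launch kernels per device.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Row { int key; int payload; };

// Join one pair of chunks; on a GPU this would be the kernel body, with
// both chunks staged in device memory beforehand.
static void joinChunks(const std::vector<Row>& r, const std::vector<Row>& s,
                       std::vector<std::pair<Row, Row>>& out) {
    for (const Row& a : r)
        for (const Row& b : s)
            if (a.key == b.key) out.push_back({a, b});
}

// Chunked nested-loop join: chunkRows is chosen so two chunks plus the
// result buffer fit in one GPU's memory; chunk pairs are assigned
// round-robin across numGpus devices.
std::vector<std::pair<Row, Row>>
nestedLoopJoin(const std::vector<Row>& R, const std::vector<Row>& S,
               size_t chunkRows, int numGpus) {
    std::vector<std::pair<Row, Row>> out;
    size_t pairIdx = 0;
    for (size_t i = 0; i < R.size(); i += chunkRows) {
        std::vector<Row> rc(R.begin() + i,
                            R.begin() + std::min(i + chunkRows, R.size()));
        for (size_t j = 0; j < S.size(); j += chunkRows, ++pairIdx) {
            std::vector<Row> sc(S.begin() + j,
                                S.begin() + std::min(j + chunkRows, S.size()));
            int device = pairIdx % numGpus;  // round-robin device choice
            (void)device;  // real code: select device, copy chunks, launch kernel
            joinChunks(rc, sc, out);
        }
    }
    return out;
}
```

The chunk pairs are mutually independent, which is what makes the scheme scale across devices: each GPU works on its own pair while the CPU's job reduces to scheduling transfers, exactly the bottleneck the paper identifies.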


2003 ◽  
Vol 1 ◽  
pp. 171-175
Author(s):  
T. von Sydow ◽  
H. Blume ◽  
T. G. Noll

Abstract. Technology progress, flexibility demands, and shortened product cycle times and time to market have made it both possible and necessary to integrate different architecture blocks on one heterogeneous System-on-Chip (SoC). Architecture blocks such as programmable processor cores (DSP and GPP kernels), embedded FPGAs, and dedicated macros will be integral parts of such a SoC. This contribution discusses programmable architecture blocks in particular, together with their associated optimization techniques. Design space exploration, and thus the choice of which architecture blocks should be integrated in a SoC, is a challenging task. Crucial to this exploration is the evaluation of the application domain characteristics and of the costs incurred by the individual architecture blocks integrated on a SoC. An ATE-cost function has been applied to examine the performance of the aforementioned programmable architecture blocks; to this end, representative discrete devices have been analyzed. Furthermore, several architecture-dependent optimization steps and their effects on the cost ratios are presented.
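The abstract does not define the ATE-cost function. Assuming the common reading of ATE as an area-time-energy product (a stated assumption here, not a claim about the paper), a cost comparison between candidate architecture blocks could be sketched as follows, with purely illustrative numbers:

```cpp
#include <cstdio>

// Hypothetical figures of merit for one architecture block running one
// benchmark kernel. Treating ATE as area x time x energy is an
// assumption; the paper's exact cost function is not given in the abstract.
struct BlockCost {
    const char* name;
    double area_mm2;   // silicon area
    double time_s;     // execution time of the kernel
    double energy_j;   // energy consumed by the kernel
};

double ateCost(const BlockCost& b) {
    return b.area_mm2 * b.time_s * b.energy_j;  // lower is better
}

int main() {
    // Illustrative numbers only, not measurements from the paper.
    BlockCost blocks[] = {
        {"GPP core",        10.0, 1.0e-3, 2.0e-3},
        {"DSP core",         6.0, 4.0e-4, 8.0e-4},
        {"embedded FPGA",   20.0, 1.0e-4, 5.0e-4},
        {"dedicated macro",  1.5, 2.0e-5, 4.0e-5},
    };
    for (const BlockCost& b : blocks)
        std::printf("%-16s ATE = %.3e\n", b.name, ateCost(b));
    return 0;
}
```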


Author(s):  
Mayank Bhura ◽  
Pranav H. Deshpande ◽  
K. Chandrasekaran

Usage of General Purpose Graphics Processing Units (GPGPUs) in high-performance computing is increasing as heterogeneous systems continue to become dominant. CUDA has been the programming environment for nearly all such NVIDIA GPU-based GPGPU applications, but the framework runs only on NVIDIA GPUs; utilizing other available computing devices requires reimplementation in a different framework. OpenCL provides a vendor-neutral and open programming environment, with implementations available for CPUs, GPUs, and other types of accelerators; OpenCL can thus be regarded as a write-once, run-anywhere framework. Nevertheless, both frameworks have their own pros and cons. This chapter presents a comparison of the performance of the CUDA and OpenCL frameworks, using an algorithm that finds the sum of all possible triple products over a list of integers, implemented on GPUs.
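The benchmark itself is easy to state precisely. One natural reading of "all possible triple products" is the sum over unordered triples i < j < k (the chapter may define it differently); under that assumption, a sequential C++ reference of the workload looks like this, and the CUDA and OpenCL versions would distribute the outer loop over threads:

```cpp
#include <cstdint>
#include <vector>

// Sum of all triple products a[i]*a[j]*a[k] with i < j < k.
// O(n^3) reference; each unordered triple is counted exactly once.
int64_t tripleProductSum(const std::vector<int>& a) {
    int64_t sum = 0;
    for (size_t i = 0; i + 2 < a.size(); ++i)
        for (size_t j = i + 1; j + 1 < a.size(); ++j)
            for (size_t k = j + 1; k < a.size(); ++k)
                sum += int64_t(a[i]) * a[j] * a[k];
    return sum;
}

int main() {
    // {1,2,3,4}: 1*2*3 + 1*2*4 + 1*3*4 + 2*3*4 = 6 + 8 + 12 + 24 = 50
    std::vector<int> a = {1, 2, 3, 4};
    return tripleProductSum(a) == 50 ? 0 : 1;
}
```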


2017 ◽  
Vol 2017 ◽  
pp. 1-19 ◽  
Author(s):  
Gabriel A. León-Paredes ◽  
Liliana I. Barbosa-Santillán ◽  
Juan J. Sánchez-Escobar

Latent Semantic Analysis (LSA) is a method that allows us to automatically index and retrieve information from a set of objects by reducing the term-by-document matrix using the Singular Value Decomposition (SVD) technique. However, LSA has a high computational cost when analyzing large amounts of information. The goals of this work are (i) to improve the execution time of the semantic space construction, dimensionality reduction, and information retrieval stages of LSA by means of heterogeneous systems and (ii) to evaluate the accuracy and recall of the information retrieval stage. We present a heterogeneous Latent Semantic Analysis (hLSA) system that combines a General-Purpose computing on Graphics Processing Units (GPGPU) architecture, which can solve large numeric problems faster through thousands of concurrent threads on the multiple CUDA cores of GPUs, with a multi-CPU architecture, which can solve large text problems faster through a multiprocessing environment. We execute the hLSA system with documents from the PubMed Central (PMC) database. The results of the experiments show that, for large matrices with one hundred and fifty thousand million (150 billion) values, the hLSA system is around eight times faster than the standard LSA version, with an accuracy of 88% and a recall of 100%.
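The LSA pipeline the abstract accelerates can be written compactly. With A the m×n term-by-document matrix, the rank-k truncated SVD and the usual fold-in of a query vector q are (standard LSA formulas, not notation taken from the paper):

```latex
A \;\approx\; A_k \;=\; U_k \,\Sigma_k\, V_k^{\mathsf T},
\qquad
U_k \in \mathbb{R}^{m\times k},\;
\Sigma_k \in \mathbb{R}^{k\times k},\;
V_k \in \mathbb{R}^{n\times k}.
% A query q (a term vector) is projected into the k-dimensional
% semantic space and compared to the document vectors (rows of V_k)
% by cosine similarity:
\hat{q} \;=\; \Sigma_k^{-1} U_k^{\mathsf T} q,
\qquad
\operatorname{sim}(\hat{q}, d_i) \;=\;
\frac{\hat{q} \cdot d_i}{\lVert \hat{q}\rVert\,\lVert d_i\rVert}.
```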


1998 ◽  
Vol 37 (04/05) ◽  
pp. 518-526 ◽  
Author(s):  
D. Sauquet ◽  
M.-C. Jaulent ◽  
E. Zapletal ◽  
M. Lavril ◽  
P. Degoulet

Abstract. The rapid development of community health information networks raises the issue of semantic interoperability between distributed and heterogeneous systems. Indeed, operational health information systems originate from heterogeneous teams of independent developers and have to cooperate in order to exchange data and services. Good cooperation is based on a good understanding of the messages exchanged between the systems. The main issue of semantic interoperability is to ensure that the exchange is not only possible but also meaningful. The main objective of this paper is to analyze semantic interoperability from a software engineering point of view. It describes the principles for the design of a semantic mediator (SM) in the framework of a distributed object manager (DOM). The mediator is itself a component that should allow the exchange of messages independently of languages and platforms. The functional architecture of such an SM is detailed. These principles have been partly applied in the context of the HELIOS object-oriented software engineering environment. The resulting service components are presented with their current state of achievement.
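None of the HELIOS interfaces are given in the abstract, so the following is a purely hypothetical C++ illustration of what a language-neutral mediator boundary might look like: messages are opaque string-valued records, and the mediator's only job is to map one system's codes into another's, making "possible but not meaningful" exchanges detectable.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical message: field name -> coded value, both as strings, so
// the exchange stays independent of sender/receiver languages and platforms.
using Message = std::map<std::string, std::string>;

// A minimal semantic mediator: rewrites codes from the sender's
// vocabulary into the receiver's before the message is delivered.
class SemanticMediator {
public:
    void addMapping(const std::string& from, const std::string& to) {
        table_[from] = to;
    }
    // Returns the translated message, or nothing if a code is unknown,
    // signalling an exchange that would be possible but not meaningful.
    std::optional<Message> translate(const Message& in) const {
        Message out;
        for (const auto& [field, code] : in) {
            auto it = table_.find(code);
            if (it == table_.end()) return std::nullopt;
            out[field] = it->second;
        }
        return out;
    }
private:
    std::map<std::string, std::string> table_;
};
```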

