A SCALABLE DISTRIBUTED MULTIMEDIA KNOWLEDGE RETRIEVAL SYSTEM ON A CLUSTER OF HETEROGENEOUS HIGH PERFORMANCE ARCHITECTURES

This paper describes a system for distributing and retrieving multimedia knowledge on a cluster of heterogeneous high-performance architectures distributed over the Internet. The knowledge is represented as facts and rules in an associative logic-programming model. Associative computation facilitates the distribution of facts and rules and exploits coarse-grain data-parallel computation. Associative logic programming uses a flat data model that can be easily mapped onto heterogeneous architectures. The paper describes an abstract instruction set for the distributed version of associative logic programming and its implementation. The implementation uses a message-passing library for architecture independence within a cluster, object-oriented programming for modularity and portability, and Java as a front end that provides a graphical user interface, multimedia capability, and remote access via the Internet. Performance results on a cluster of IBM RS/6000 workstations are presented. The results show that distributing the data improves performance almost linearly for a small number of processors in a cluster.
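A small sketch can illustrate the idea of distributing facts across processors. It assumes facts are plain `(predicate, arg1, arg2)` tuples and spreads them round-robin over simulated processors. A query is answered by searching every partition by content and then merging the answers. The helper names `distribute`, `match` and `query` are hypothetical; they are not part of the paper's abstract instruction set.

```python
def distribute(facts, n_procs):
    """Round-robin the facts into n_procs partitions, one per (simulated) processor."""
    parts = [[] for _ in range(n_procs)]
    for i, fact in enumerate(facts):
        parts[i % n_procs].append(fact)
    return parts


def match(partition, pattern):
    """Search-by-content on one partition; None in the pattern acts as an unbound variable."""
    return [fact for fact in partition
            if all(p is None or p == v for p, v in zip(pattern, fact))]


def query(parts, pattern):
    """Search each partition independently, then merge the answers."""
    answers = []
    for part in parts:  # on a real cluster each partition would be searched in parallel
        answers.extend(match(part, pattern))
    return sorted(answers)


# Hypothetical example facts.
facts = [("parent", "ann", "bob"), ("parent", "bob", "cal"),
         ("likes", "ann", "jazz"), ("parent", "ann", "dee")]
print(query(distribute(facts, 2), ("parent", "ann", None)))
# [('parent', 'ann', 'bob'), ('parent', 'ann', 'dee')]
```

Each partition is searched without touching the others, so per-processor work shrinks as partitions are added. That is consistent with near-linear speedup for small processor counts. The cost of merging answers and of communication is not modeled here.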

A FRAMEWORK FOR HETEROGENEOUS ASSOCIATIVE LOGIC PROGRAMMING

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213095000036 ◽

1995 ◽

Vol 04 (01n02) ◽

pp. 33-53 ◽

Cited By ~ 2

Author(s):

ARVIND K. BANSAL

Keyword(s):

Logic Programming ◽

High Performance ◽

Heterogeneous Computing ◽

Data Transfer ◽

Instruction Set ◽

Data Parallel ◽

Data Alignment ◽

Resolution Scheme ◽

Data Elements ◽

Performance Results

Associative computation is characterized by a seamless intertwining of search-by-content and data parallel computation. The search-by-content paradigm is natural for scalable high-performance heterogeneous computing, since the use of tagged data avoids the need for explicit addressing mechanisms. In this paper, the author presents an algebra for associative logic programming, an associative resolution scheme, and a generic framework for an associative abstract instruction set. The model is based on the integration of data alignment and the use of two types of bags: data element bags, and filter bags of Boolean values that select and restrict computation on data elements. The use of filter bags integrated with data alignment reduces computation and data transfer overhead, and the use of tagged data reduces the overhead of preparing data for transmission. The abstract instruction set is illustrated with an example. Performance results are presented for a simulation in a homogeneous address space.
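The roles of data element bags, filter bags and data alignment can be sketched with aligned Python lists. This is an illustration under assumptions, not the paper's instruction set. Position `i` of every bag refers to the same record, searches return Boolean filter bags, and later operations are restricted to the selected positions. The function names and the example data are hypothetical.

```python
# Hypothetical example data: position i of every bag describes the same record
# (data alignment), so no explicit addresses or pointers are needed.
names = ["ann", "bob", "cal", "dee"]   # data element bag
ages = [34, 19, 52, 27]                # aligned data element bag


def search(bag, predicate, filt=None):
    """Search-by-content over a bag; returns a filter bag of Booleans.
    An incoming filter bag restricts the search to already-selected elements."""
    if filt is None:
        filt = [True] * len(bag)
    return [f and predicate(x) for x, f in zip(bag, filt)]


def apply_where(bag, filt, fn):
    """Apply fn only where the filter bit is set; other elements are left unchanged."""
    return [fn(x) if f else x for x, f in zip(bag, filt)]


def collect(bag, filt):
    """Gather the selected elements, e.g. before transmitting them."""
    return [x for x, f in zip(bag, filt) if f]


adults = search(ages, lambda a: a >= 21)
young_adults = search(ages, lambda a: a < 30, adults)   # filter bag narrows the search
print(collect(names, young_adults))                      # ['dee']
print(apply_where(ages, young_adults, lambda a: a + 1))  # [34, 19, 52, 28]
```

Only `collect` produces a compacted result. The filter bags themselves stay aligned with the data, so no data has to be rearranged between steps.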

HPF LIBRARY AND COMPILER SUPPORT FOR HALOS IN DATA PARALLEL IRREGULAR COMPUTATIONS

Parallel Processing Letters ◽

10.1142/s0129626400000196 ◽

2000 ◽

Vol 10 (02n03) ◽

pp. 189-200 ◽

Cited By ~ 1

Author(s):

THOMAS BRANDES

Keyword(s):

Message Passing ◽

High Performance ◽

Parallel Programs ◽

Address Space ◽

Compiler Support ◽

Data Parallel ◽

High Performance Fortran ◽

Non Local ◽

Performance Results ◽

Memory Architectures

On distributed memory architectures, data parallel compilers emulate the global address space by distributing the data onto the processors according to the user's mapping directives and by automatically generating explicit inter-processor communication. A shadow is additional local memory allocated on a processor to hold non-local values of the data that this processor accesses or defines. While shadow edges are already well studied for structured grids, this paper focuses on their use in applications with unstructured grids, where updates of the shadow edges involve unstructured communication with complex communication schedules. The use of shadow edges is considered for High Performance Fortran (HPF), the de facto standard language for writing data parallel programs in Fortran. A library with an HPF binding provides explicit control of unstructured shadows and their communication schedules, which together are called halos. This halo library allows writing HPF programs whose performance is close to that of hand-coded message-passing versions. It also frees the user from calculating shadow sizes and communication schedules and from exchanging data with explicit message-passing commands. In certain situations, the HPF compiler can create and use halos automatically. This paper shows the advantages and also the limits of this approach. The halo library and automatic support for halos have been implemented within the ADAPTOR HPF compilation system. The performance results verify the effectiveness of the chosen approach.
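The core of a halo is its communication schedule. The sketch below is a simplified, inspector-style computation under assumed inputs: it derives the schedule from an ownership map and the set of global indices each processor touches. `build_halo_schedule` and `exchange` are hypothetical names and do not reflect the ADAPTOR library's HPF interface.

```python
from collections import defaultdict


def build_halo_schedule(owner, accessed):
    """
    owner[g]    -> processor that owns global index g (the data distribution)
    accessed[p] -> global indices processor p reads, e.g. through an indirection array
    Returns recv[p][q]: sorted indices p must receive from q (p's shadow region),
    and send[q][p]: the same indices, seen from the sending processor q.
    """
    recv, send = defaultdict(dict), defaultdict(dict)
    for p, indices in accessed.items():
        by_owner = defaultdict(list)
        for g in indices:
            q = owner[g]
            if q != p:                  # non-local value: needs a shadow copy on p
                by_owner[q].append(g)
        for q, gs in by_owner.items():
            gs.sort()
            recv[p][q] = gs
            send[q][p] = gs
    return dict(recv), dict(send)


def exchange(values, recv):
    """Simulate one halo update: copy each owner's current values into the shadows."""
    return {p: {g: values[g] for gs in sources.values() for g in gs}
            for p, sources in recv.items()}


# Six mesh nodes, block-distributed over two processors; the access sets come
# from an (illustrative) unstructured edge list that crosses the block boundary.
owner = [0, 0, 0, 1, 1, 1]
accessed = {0: {0, 1, 2, 4}, 1: {1, 2, 3, 4, 5}}
recv, send = build_halo_schedule(owner, accessed)
print(recv)                                      # {0: {1: [4]}, 1: {0: [1, 2]}}
print(exchange([10, 11, 12, 13, 14, 15], recv))  # {0: {4: 14}, 1: {1: 11, 2: 12}}
```

In a time-stepping loop, the schedule would typically be built once and reused for every shadow update. Only `exchange` would then run in each step.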

Parallel Object-Oriented Computation Applied to a Finite Element Problem

Scientific Programming ◽

10.1155/1993/859092 ◽

1993 ◽

Vol 2 (4) ◽

pp. 133-144 ◽

Cited By ~ 2

Author(s):

Jon B. Weissman ◽

Andrew S. Grimshaw ◽

R.D. Ferraro

Keyword(s):

Finite Element ◽

Message Passing ◽

Large Scale ◽

Processing System ◽

Object Oriented ◽

Data Parallel ◽

Programming Tools ◽

Comparable Performance ◽

Oriented Parallel ◽

Performance Results

The conventional wisdom in the scientific computing community is that the best way to solve large-scale, numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style with low-level message-passing primitives. This approach inevitably leads to nonportable code and long development times, and it restricts parallel programming to expert programmers. We believe that these problems are not inherent to parallel computing but result from the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher-level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code that we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat version and a hand-coded parallel Fortran version.

Efficient Graph Component Labeling on Hybrid CPU and GPU Platforms

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.596.276 ◽

2014 ◽

Vol 596 ◽

pp. 276-279

Author(s):

Xiao Hui Pan

Keyword(s):

High Performance ◽

General Purpose ◽

Gpu Programming ◽

Data Parallel ◽

Graphical Processing Units ◽

Architectural Features ◽

Graph Coloring Problem ◽

Graphical Processing ◽

And Performance ◽

Performance Results

Graph component labeling, a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations of the component labeling problem are possible, and we explore their use with general-purpose graphics processing units (GPGPUs) and the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA. We evaluated our system on real-world graphs. We show how to take the different architectural features of the GPU and the host CPUs into account to achieve high performance.
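The abstract does not name the specific variants studied. One common data-parallel approach to component labeling is min-label propagation, sketched below in plain Python as an illustrative assumption rather than the paper's algorithm. Each sweep is a map over edges; in a CUDA version, each edge would typically be handled by one thread, and the lowering of a label would use an atomic minimum.

```python
def label_components(n, edges):
    """
    Connected-component labeling by min-label propagation.
    Every vertex starts with its own id; in each sweep, every edge pulls both of its
    endpoints down to the smaller of their labels. Sweeps repeat until nothing changes,
    at which point every vertex carries the smallest id in its component.
    """
    labels = list(range(n))
    changed = True
    while changed:
        changed = False
        new = labels[:]                 # read old labels, write into a fresh copy
        for u, v in edges:              # one independent work item per edge
            m = min(labels[u], labels[v])
            if m < new[u]:              # analogous to atomicMin on new[u] in CUDA
                new[u] = m
                changed = True
            if m < new[v]:
                new[v] = m
                changed = True
        labels = new
    return labels


edges = [(0, 1), (1, 2), (3, 4), (5, 5)]
print(label_components(6, edges))       # [0, 0, 0, 3, 3, 5]
```

The number of sweeps grows with the graph's diameter. This is one reason variants that shortcut label chains, such as pointer-jumping approaches, are often considered on GPUs.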

Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

Scientific Programming ◽

10.1155/1997/926796 ◽

1997 ◽

Vol 6 (2) ◽

pp. 215-227 ◽

Cited By ~ 11

Author(s):

Guy Edjlali ◽

Gagan Agrawal ◽

Alan Sussman ◽

Jim Humphries ◽

Joel Saltz

Keyword(s):

Parallel Programming ◽

High Performance ◽

Navier Stokes ◽

Programming Environments ◽

Data Parallel ◽

Adaptive Environment ◽

Run Time ◽

Time Required ◽

The Cost ◽

Performance Results

For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of High Performance Fortran (HPF)-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
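Redistribution when the processor count changes can be illustrated for a one-dimensional block distribution, an assumption made here for simplicity. The sketch computes each processor's new loop bounds and the index ranges that must move between processors. The function names are hypothetical and unrelated to the authors' run-time library.

```python
def block_ranges(n, p):
    """Block distribution of n elements over p processors: half-open [lo, hi) per rank.
    The first n % p ranks get one extra element. These are also the new loop bounds."""
    base, extra = divmod(n, p)
    ranges, lo = [], 0
    for r in range(p):
        hi = lo + base + (1 if r < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def redistribution_plan(n, p_old, p_new):
    """List the (src, dst, lo, hi) transfers needed when moving from p_old to p_new ranks."""
    plan = []
    old, new = block_ranges(n, p_old), block_ranges(n, p_new)
    for src, (a_lo, a_hi) in enumerate(old):
        for dst, (b_lo, b_hi) in enumerate(new):
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo < hi and src != dst:   # overlap owned by a different rank must move
                plan.append((src, dst, lo, hi))
    return plan


print(block_ranges(10, 4))            # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(redistribution_plan(10, 4, 2))  # [(1, 0, 3, 5), (2, 1, 6, 8), (3, 1, 8, 10)]
```

The plan is computed only when the processor count changes. This is consistent with the observation that infrequent changes keep the redistribution cost small relative to the computation.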

Implementation and Performance of DSMPI

Scientific Programming ◽

10.1155/1997/452521 ◽

1997 ◽

Vol 6 (2) ◽

pp. 201-214 ◽

Cited By ~ 2

Author(s):

Luis M. Silva ◽

João Gabriel Silva ◽

Simon Chapple

Keyword(s):

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Programming Model ◽

Distributed Shared Memory ◽

Memory Systems ◽

Distributed Memory Machines ◽

Coherence Protocols ◽

And Performance ◽

Performance Results

Distributed shared memory (DSM) has been recognized as an alternative programming model for exploiting parallelism in distributed memory systems, because it provides a higher level of abstraction than simple message passing. DSM combines the simple programming model of shared memory with the scalability of distributed memory machines. This article presents DSMPI, a parallel library that runs atop MPI and provides a DSM abstraction. It provides an easy-to-use programming interface, is fully portable, and supports heterogeneity. For flexibility, it supports different coherence protocols and consistency models. We present performance results from a network of workstations and a Cray T3D, which show that DSMPI can be competitive with MPI for some applications.

PERFORMANCE EVALUATION OF BLAS ON THE TRIDENT PROCESSOR

Parallel Processing Letters ◽

10.1142/s0129626405002325 ◽

2005 ◽

Vol 15 (04) ◽

pp. 407-414

Author(s):

MOSTAFA I. SOLIMAN ◽

STANISLAV G. SEDUKHIN

Keyword(s):

High Performance ◽

Programming Model ◽

Parallel Applications ◽

Instruction Set ◽

Code Size ◽

Data Parallel ◽

Fine Grain ◽

Multi Level ◽

High Level ◽

Programming Interface

Different subtasks of an application usually have different computational, memory, and I/O requirements, which place different demands on the computer's capabilities. Thus, a more appropriate approach for achieving both high performance and a simple programming model is to design a processor with a multi-level instruction set architecture (ISA). This leads to high performance and minimal executable code size. Since the fundamental data structures of a wide variety of existing applications are scalars, vectors, and matrices, our research Trident processor has a three-level ISA executed on zero-, one-, and two-dimensional arrays of data. These levels express a large amount of fine-grain data parallelism directly to the processor, instead of having it extracted dynamically by complicated logic or statically by compilers. This reduces design complexity and provides a high-level programming interface to the hardware. In this paper, the performance of the Trident processor is evaluated on BLAS, which represent the kernel operations of many data parallel applications. We show that the Trident processor reduces the number of clock cycles per floating-point operation in proportion to the number of execution datapaths.
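The scalar, vector and matrix levels map naturally onto the three BLAS levels. The sketch below writes one representative kernel from each level in plain Python. It then computes the ratio of floating-point operations to operand elements, using the usual textbook counts. The code models neither Trident's ISA nor its datapaths; it only shows why higher levels expose more work per datum.

```python
def axpy(a, x, y):
    """Level 1 (vector-vector): y <- a*x + y, about 2n flops on 2n operands."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def gemv(A, x):
    """Level 2 (matrix-vector): y <- A x, about 2n^2 flops on n^2 + n operands."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def gemm(A, B):
    """Level 3 (matrix-matrix): C <- A B, about 2n^3 flops on 2n^2 operands."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def flops_per_element(level, n):
    """Approximate floating-point operations per input element for n-sized operands."""
    flops = {1: 2 * n, 2: 2 * n * n, 3: 2 * n ** 3}[level]
    data = {1: 2 * n, 2: n * n + n, 3: 2 * n * n}[level]
    return flops / data


for level in (1, 2, 3):
    print(level, round(flops_per_element(level, 100), 2))  # ratios 1.0, 1.98, 100.0
```

The Level 3 ratio grows with `n` while Levels 1 and 2 stay near constant. Matrix-level operations therefore offer the most fine-grain parallelism per instruction and per operand fetched.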

The Design and Implementation of Parallel Algorithm Accelerator Based on CPU-GPU Collaborative Computing Environment

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.529.408 ◽

2012 ◽

Vol 529 ◽

pp. 408-412 ◽

Cited By ~ 1

Author(s):

Fan Yang ◽

Tong Nian Shi ◽

Han Chu ◽

Kun Wang

Keyword(s):

High Speed ◽

High Performance ◽

Programming Model ◽

Rapid Development ◽

Collaborative Computing ◽

Computing Environment ◽

Internet Applications ◽

Rich Internet Applications ◽

Mixed Programming ◽

Performance Results

With the rapid development of GPUs in recent years, CPU-GPU collaborative computing has become an important technique in scientific research. In this paper, we introduce the design of a cluster system based on a CPU-GPU collaborative computing environment. The system is based on the Intel Embedded Star Platform, which we extend with a computing node connected over a high-speed network. Using mixed OpenMP and MPI programming, we integrate different algorithms for scientific and application computing under a Master/Worker model, together with a software system based on RIA (Rich Internet Applications). To achieve high performance, we use a combination of software and hardware techniques. The performance results show that programs built with the hybrid programming model have good performance and scalability.
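The Master/Worker model mentioned above can be sketched with Python threads standing in for MPI processes or OpenMP threads. This is an assumption for illustration, not the authors' RIA-based system. The master enqueues numbered tasks, each worker pulls tasks until it receives a sentinel, and the master reassembles the results in task order. The function name `master_worker` is hypothetical.

```python
import queue
import threading


def master_worker(tasks, work, n_workers=4):
    """Master/Worker sketch: a shared task queue, a result queue, and sentinel shutdown."""
    tasks = list(tasks)
    todo, done = queue.Queue(), queue.Queue()

    def worker():
        while True:
            item = todo.get()
            if item is None:            # sentinel from the master: no more work
                break
            i, arg = item
            done.put((i, work(arg)))    # tag each result with its task id

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i, arg in enumerate(tasks):     # master distributes the work
        todo.put((i, arg))
    for _ in threads:                   # one sentinel per worker
        todo.put(None)
    for t in threads:
        t.join()

    results = [None] * len(tasks)       # master gathers results in task order
    while not done.empty():
        i, r = done.get()
        results[i] = r
    return results


print(master_worker(range(8), lambda x: x * x))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Dynamic pulling from a shared queue balances load automatically when tasks have uneven cost. This is the usual motivation for a Master/Worker design on heterogeneous CPU-GPU nodes.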

PDDP, A Data Parallel Programming Model

Scientific Programming ◽

10.1155/1996/857815 ◽

1996 ◽

Vol 5 (4) ◽

pp. 319-327

Author(s):

Karen H. Warren

Keyword(s):

Parallel Programming ◽

High Performance ◽

Parallel Machines ◽

Programming Model ◽

Data Distribution ◽

Interprocessor Communication ◽

Data Parallel ◽

Parallel Programming Model ◽

Data Objects ◽

Data Parallel Programming

PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed through Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates code that is portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives available on each platform.
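The effect of FORALL and WHERE on block-distributed data can be emulated in Python. This is a hypothetical sketch, not PDDP's generated code. Each simulated processor assigns only the elements it owns. The FORALL below also counts the reads that cross a block boundary, which are the values a preprocessor would have to fetch through interprocessor communication.

```python
def blocks(n, p):
    """Block distribution: processor r owns the half-open index range [lo, hi)."""
    size = -(-n // p)  # ceiling division
    return [(r * size, min(n, (r + 1) * size)) for r in range(p)]


def forall_average(a, p):
    """
    FORALL (i = 2:n-1) b(i) = (a(i-1) + a(i+1)) / 2, i.e. 0-based indices 1..n-2 here.
    Each processor assigns only the elements it owns (owner computes). Every right-hand
    side reads the old array a, matching FORALL's evaluate-then-assign semantics.
    Returns the new array and the number of reads outside the owner's block.
    """
    n = len(a)
    b = list(a)
    remote_reads = 0
    for lo, hi in blocks(n, p):
        for i in range(max(lo, 1), min(hi, n - 1)):
            remote_reads += sum(1 for j in (i - 1, i + 1) if not lo <= j < hi)
            b[i] = (a[i - 1] + a[i + 1]) / 2
    return b, remote_reads


def where_assign(a, mask, f):
    """WHERE (mask) a = f(a): masked elementwise update; each element is local to its owner."""
    return [f(x) if m else x for x, m in zip(a, mask)]


a = [i * i for i in range(8)]
print(forall_average(a, 2))  # ([0, 2.0, 5.0, 10.0, 17.0, 26.0, 37.0, 49], 2)
print(where_assign([3, -1, 4, -5], [True, False, True, False], lambda x: 10 * x))
# [30, -1, 40, -5]
```

The WHERE update needs no communication because mask and data are aligned on the same owner. The FORALL stencil needs one remote value at each internal block boundary.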

Study on Mechanical Equipment Fault Diagnosis System Based on Cloud Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.220-223.2520 ◽

2012 ◽

Vol 220-223 ◽

pp. 2520-2523

Author(s):

Wang Shen Hao ◽

Xin Min Dong ◽

Jie Han ◽

Wen Ping Lei

Keyword(s):

Cloud Computing ◽

Parallel Computing ◽

Fault Diagnosis ◽

High Performance ◽

Programming Model ◽

Distributed Storage ◽

Mechanical Equipment ◽

Data Parallel ◽

On Line ◽

Data Parallel Computing

Because it generally operates in severe conditions, mechanical equipment is subject to progressive deterioration of its state. Mechanical failures account for more than 60% of system breakdowns. Therefore, identifying impending mechanical faults is very important to prevent the system from continuing to run in a faulty state. Traditional parallel computing generally requires high-performance computers, whereas a parallel FFT algorithm based on the Hadoop MapReduce programming model can be realized on low-end machines. Combining cloud computing with equipment fault diagnosis technology enables massive data-parallel computing and distributed storage. Experimental results show that the approach provides a good solution and technical support for on-line monitoring and real-time fault diagnosis of mechanical equipment.
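The abstract does not say how the parallel FFT is decomposed. The sketch below takes the simplest data-parallel route, which is an assumption. Each map task runs an independent radix-2 FFT on one window of vibration samples and emits the dominant frequency bin, and the reduce step groups the peaks by sensor. Plain Python functions stand in for Hadoop mappers and reducers; they are not the Hadoop API. The sensor names and test signals are made up.

```python
import cmath
import math
from collections import defaultdict


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def map_phase(record):
    """Map: (sensor_id, window of samples) -> (sensor_id, dominant frequency bin)."""
    sensor, window = record
    spectrum = fft(window)
    half = [abs(c) for c in spectrum[1:len(window) // 2]]  # skip DC, keep positive bins
    return sensor, 1 + half.index(max(half))


def reduce_phase(pairs):
    """Reduce: group the per-window peaks by sensor."""
    grouped = defaultdict(list)
    for sensor, peak in pairs:
        grouped[sensor].append(peak)
    return dict(grouped)


def tone(k, n=16):
    """Hypothetical test signal: a pure sine completing k cycles per n-sample window."""
    return [math.sin(2 * math.pi * k * i / n) for i in range(n)]


records = [("pump-1", tone(3)), ("pump-1", tone(5)), ("fan-2", tone(2))]
print(reduce_phase(map(map_phase, records)))  # {'pump-1': [3, 5], 'fan-2': [2]}
```

Because each window is transformed independently, the map tasks can run on separate low-end machines. Only the small per-window peaks are sent to the reducer. A shift in a sensor's dominant frequency over time could then serve as one input to fault diagnosis.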
