Performance Analysis of Homogeneous On-Chip Large-Scale Parallel Computing Architectures for Data-Parallel Applications

On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in the paper. Homogeneous OLPCs feature strong regularity and scalability due to their identical cores and routers. Data-parallel applications partition their data into parallel subsets, each handled individually by the same program running on a different core. Therefore, data-parallel applications are able to obtain good speedup on homogeneous OLPCs. The paper addresses modeling the speedup performance of homogeneous OLPCs for data-parallel applications. When establishing the speedup performance model, the network communication latency and the ways data-parallel applications store their data are modeled and analyzed in detail. Two abstract concepts (equivalent serial packet and equivalent serial communication) are proposed to construct the network communication latency model. The uniform and hotspot traffic models are adopted to reflect the ways of storing data. Some useful suggestions are presented during the analysis of the performance model. Finally, three data-parallel applications are run on our cycle-accurate homogeneous OLPC experimental platform to validate the analytic results and demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications on homogeneous OLPCs.

Hiding communication latency in data parallel applications

Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing ◽

10.1109/ipps.1998.669883 ◽

2002 ◽

Cited By ~ 1

Author(s):

V. Garg ◽

D.E. Schimmel

Keyword(s):

Parallel Applications ◽

Communication Latency ◽

Data Parallel

Domain Specific Language for Deployment of Parallel Applications on Parallel Computing Platforms

Proceedings of the 2014 European Conference on Software Architecture Workshops - ECSAW '14 ◽

10.1145/2642803.2642819 ◽

2007 ◽

Author(s):

Ethem Arkin ◽

Bedir Tekinerdogan

Keyword(s):

Parallel Computing ◽

Parallel Applications ◽

Domain Specific Language ◽

Specific Language ◽

Domain Specific ◽

Computing Platforms

Contextual Contracts for Component-Based Resource Abstraction in a Cloud of HPC Services

10.5753/wscad.2019.8670 ◽

2019 ◽

Cited By ~ 1

Author(s):

Wagner Al Alam ◽

Francisco Carvalho Junior

Keyword(s):

Cloud Computing ◽

Parallel Computing ◽

Large Scale ◽

Matrix Multiplication ◽

Small Scale ◽

Computing Systems ◽

Computing Platform ◽

Computing Platforms

The efforts to make cloud computing suitable for the requirements of HPC applications have motivated us to design HPC Shelf, a cloud computing platform of services for building and deploying parallel computing systems for large-scale parallel processing. We introduce Alite, the system of contextual contracts of HPC Shelf, aimed at selecting component implementations according to the requirements of applications, the features of the target parallel computing platforms (e.g. clusters), QoS (Quality-of-Service) properties, and cost restrictions. It is evaluated through a small-scale case study employing a component-based framework for matrix multiplication based on the BLAS library.

A Functional Language for Departmental Metacomputing

Parallel Processing Letters ◽

10.1142/s0129626405002222 ◽

2005 ◽

Vol 15 (03) ◽

pp. 289-304 ◽

Cited By ~ 2

Author(s):

Frédéric Gava ◽

Frédéric Loulergue

Keyword(s):

Parallel Computing ◽

Functional Data ◽

Execution Time ◽

Large Scale ◽

Parallel Machine ◽

Functional Language ◽

Hierarchical Network ◽

Bulk Synchronous Parallel ◽

Parallel Language ◽

Data Parallel

We have designed a functional data-parallel language called BSML for programming bulk synchronous parallel (BSP) algorithms. Deadlocks and indeterminism are avoided, and the execution time can then be estimated. Very large-scale applications may need more than one parallel machine; this is known as metacomputing. A major problem in programming applications for such architectures is their hierarchical network structure: the latency and bandwidth of the network between parallel nodes can be orders of magnitude worse than those inside a parallel node. Here we consider how to extend both the BSP model and BSML, which are well suited to parallel computing, in order to obtain a model and a functional language suitable for metacomputing.

Adaptive parallel computing for large-scale distributed and parallel applications

Proceedings of the First International Workshop on Data Dissemination for Large Scale Complex Critical Infrastructures - DD4LCCI '10 ◽

10.1145/1862821.1862826 ◽

2010 ◽

Author(s):

Jaiganesh Balasubramanian ◽

Alexander Mintz ◽

Andrew Kaplan ◽

Grigory Vilkov ◽

Artem Gleyzer ◽

...

Keyword(s):

Parallel Computing ◽

Large Scale ◽

Parallel Applications

Architecture Framework for Modeling the Deployment of Parallel Applications on Parallel Computing Platforms

Proceedings of the 3rd International Conference on Model-Driven Engineering and Software Development ◽

10.5220/0005229301850192 ◽

2015 ◽

Keyword(s):

Parallel Computing ◽

Parallel Applications ◽

Architecture Framework ◽

Computing Platforms

Performance impact of SMP-cluster on the On-chip Large-scale Parallel Computing architecture

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) ◽

10.1109/ipdpsw.2010.5470778 ◽

2010 ◽

Cited By ~ 1

Author(s):

Shenggang Chen ◽

Shuming Chen ◽

Yaming Yin

Keyword(s):

Parallel Computing ◽

Large Scale ◽

Performance Impact ◽

Computing Architecture ◽

On Chip

An Evaluation Model and Benchmark for Parallel Computing Frameworks

Mobile Information Systems ◽

10.1155/2018/3890341 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14 ◽

Cited By ~ 2

Author(s):

Weibei Fan ◽

Zhijie Han ◽

Ruchuan Wang

Keyword(s):

Support Vector Machine ◽

Parallel Computing ◽

Comparative Evaluation ◽

Large Scale ◽

Evaluation Model ◽

Performance Model ◽

Support Vector ◽

Large Scale Data ◽

Performance Evaluation Model ◽

Scale Data

MARS and Spark are two popular parallel computing frameworks that are widely used for large-scale data analysis. In this paper, we first propose a performance evaluation model based on a support vector machine (SVM), which is used to analyze the performance of parallel computing frameworks. Furthermore, we give representative results of a set of analyses with the proposed analytical performance model and then perform a comparative evaluation of MARS and Spark using representative workloads and considering factors such as performance and scalability. The experiments show that our evaluation model has higher accuracy than multifactor linear regression (MLR) in predicting execution time, and it also provides a resource consumption requirement. Finally, we study benchmark experiments between MARS and Spark. MARS has better performance than Spark in both throughput and speedup in the executions of logistic regression and Bayesian classification because MARS has a large number of GPU threads that can handle higher parallelism. The experiments also show that Spark has lower latency than MARS in the execution of the four benchmarks.

Implicitly threaded parallelism in Manticore

Journal of Functional Programming ◽

10.1017/s0956796810000201 ◽

2010 ◽

Vol 20 (5-6) ◽

pp. 537-576 ◽

Cited By ~ 29

Author(s):

Matthew Fluet ◽

Mike Rainey ◽

John Reppy ◽

Adam Shaw

Keyword(s):

Large Scale ◽

Multicore Processors ◽

Regular Structure ◽

Parallel Applications ◽

Parallel Language ◽

Fine Grained ◽

Data Parallel ◽

Parallel Languages ◽

Parallel Case ◽

High Level

The increasing availability of commodity multicore processors is making parallel computing ever more widespread. In order to exploit its potential, programmers need languages that make the benefits of parallelism accessible and understandable. Previous parallel languages have traditionally been intended for large-scale scientific computing, and they tend not to be well suited to programming the applications one typically finds on a desktop system. Thus, we need new parallel-language designs that address a broader spectrum of applications. The Manticore project is our effort to address this need. At its core is Parallel ML, a high-level functional language for programming parallel applications on commodity multicore hardware. Parallel ML provides a diverse collection of parallel constructs for different granularities of work. In this paper, we focus on the implicitly threaded parallel constructs of the language, which support fine-grained parallelism. We concentrate on those elements that distinguish our design from related ones, namely, a novel parallel binding form, a nondeterministic parallel case form, and the treatment of exceptions in the presence of data parallelism. These features differentiate the present work from related work on functional data-parallel language designs, which have focused largely on parallel problems with regular structure and the compiler transformations (most notably, flattening) that make such designs feasible. We present detailed examples utilizing various mechanisms of the language and give a formal description of our implementation.

An Adaptive Parallel Computer Vision System

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800149800021x ◽

1998 ◽

Vol 12 (03) ◽

pp. 311-334 ◽

Cited By ~ 2

Author(s):

Jeong-Min Kim ◽

Youngsik Kim ◽

Shin-Dug Kim ◽

Tack-Don Han ◽

Sung-Bong Yang

Keyword(s):

Computer Vision ◽

Vision System ◽

Performance Model ◽

Parallel Computer ◽

Parallel Applications ◽

Processor Array ◽

Computer Vision System ◽

Data Parallel ◽

Medium Level ◽

Significant Performance

An approach for designing a hybrid parallel system that can adaptively exploit different levels of parallelism is presented. An adaptive parallel computer vision system (APVIS) is proposed to attain this goal. The APVIS is constructed by tightly integrating two different types of parallel architectures, i.e. a multiprocessor based system (MBS) and a memory based processor array (MPA), into a single machine. One important feature of the APVIS is that the programming interface for executing data parallel code on the MPA is the same as the usual subroutine calling mechanism. Thus the existence of the MPA is transparent to programmers. This research aims to design an underlying base architecture that can optimally execute a broad range of vision tasks. A performance model is provided to show the effectiveness of the APVIS. It turns out that the proposed APVIS can provide significant performance improvement and cost effectiveness for highly parallel applications having a mixed set of parallelisms. In addition, an example application composed of a series of vision algorithms, from low-level and medium-level processing steps, is mapped onto the MPA. Consequently, the APVIS with a few or tens of MPA modules can perform the chosen example application in real time when multiple images arrive successively with an inter-arrival time of a few seconds.
