A FUNCTIONAL LANGUAGE FOR DEPARTMENTAL METACOMPUTING

2005 ◽  
Vol 15 (03) ◽  
pp. 289-304 ◽  
Author(s):  
FRÉDÉRIC GAVA ◽  
FRÉDÉRIC LOULERGUE

We have designed a functional data-parallel language called BSML for programming bulk synchronous parallel (BSP) algorithms. Deadlocks and indeterminism are avoided, and execution time can then be estimated. Very large-scale applications may need more than one parallel machine; this is known as metacomputing. A major problem in programming applications for such architectures is their hierarchical network structure: the latency and bandwidth of the network between parallel nodes can be orders of magnitude worse than those inside a parallel node. Here we consider how to extend both the BSP model and BSML, which are well suited to parallel computing, in order to obtain a model and a functional language suitable for metacomputing.
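The predictable execution times mentioned above come from the BSP cost model, in which a superstep costs the maximum local work plus an h-relation and a barrier. A minimal sketch (the machine parameters g and l and the per-processor figures below are illustrative, not from the paper):

```python
# Sketch of the classic BSP cost model underlying BSML's predictability:
# one superstep costs max local work + h*g (communication) + l (barrier).

def superstep_cost(local_work, h, g, l):
    """Cost of one BSP superstep.

    local_work -- list of computation costs, one per processor
    h          -- max number of words any processor sends or receives
    g          -- network gap (cost per word of communication)
    l          -- barrier synchronisation latency
    """
    return max(local_work) + h * g + l

def program_cost(supersteps, g, l):
    """The total cost of a BSP program is the sum over its supersteps."""
    return sum(superstep_cost(w, h, g, l) for (w, h) in supersteps)

# Example: two supersteps on 4 processors with made-up machine parameters.
steps = [([100, 120, 90, 110], 16), ([80, 80, 85, 75], 8)]
print(program_cost(steps, g=2.0, l=50.0))  # 353.0
```

The metacomputing extension discussed in the paper essentially layers a second, slower (g, l) pair on top of this model for inter-machine communication.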

2008 ◽  
Vol 18 (01) ◽  
pp. 39-53 ◽  
Author(s):  
FRÉDÉRIC GAVA

A functional data-parallel language called BSML has been designed for programming Bulk-Synchronous Parallel algorithms. Many sequential algorithms do not have parallel counterparts, and many researchers outside computer science do not want to deal with parallel programming. In sequential programming environments, common data structures are often provided through reusable libraries to simplify the development of applications. A parallel representation of such data structures is thus a way of writing parallel programs without being exposed to all the features of a parallel language. In this paper we describe a modular implementation in BSML of some of these data structures and show how they can address the needs of many potential users of parallel machines who have so far been deterred by the complexity of parallelizing their code.


2015 ◽  
Vol 2015 ◽  
pp. 1-20 ◽  
Author(s):  
Xiaowen Chen ◽  
Zhonghai Lu ◽  
Axel Jantsch ◽  
Shuming Chen ◽  
Yang Guo ◽  
...  

On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which we refer to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in this paper. Homogeneous OLPCs feature strong regularity and scalability due to their identical cores and routers. Data-parallel applications partition their data into subsets that are handled individually by the same program running on different cores, and are therefore able to obtain good speedup on homogeneous OLPCs. This paper addresses modeling the speedup of homogeneous OLPCs for data-parallel applications. In establishing the speedup model, the network communication latency and the ways data-parallel applications store their data are modeled and analyzed in detail. Two abstract concepts, the equivalent serial packet and the equivalent serial communication, are proposed to construct the network communication latency model, and uniform and hotspot traffic models are adopted to reflect the ways of storing data. The analysis of the performance model yields several practical suggestions. Finally, three data-parallel applications are run on our cycle-accurate homogeneous OLPC experimental platform to validate the analytic results and demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications on homogeneous OLPCs.
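The shape of such a speedup model can be sketched in a few lines: each of N cores processes 1/N of the data, but pays a network latency that grows with core count. The logarithmic latency function below is a placeholder for illustration, not the paper's equivalent-serial-packet model:

```python
# Simplified speedup model: parallel time = serial work divided across
# cores, plus a communication latency term that grows with core count.
import math

def speedup(n_cores, t_serial, comm_latency):
    t_parallel = t_serial / n_cores + comm_latency(n_cores)
    return t_serial / t_parallel

# Placeholder latency: grows logarithmically with core count, as it
# might on a mesh NoC with hierarchical routing (illustrative only).
def lat(n):
    return 5.0 * math.log2(n) if n > 1 else 0.0

for n in (1, 4, 16, 64):
    print(n, round(speedup(n, t_serial=1000.0, comm_latency=lat), 2))
```

Even this toy version reproduces the qualitative behaviour the paper analyzes: speedup is sublinear and eventually saturates as the communication term dominates the shrinking per-core workload.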


1997 ◽  
Vol 07 (02) ◽  
pp. 203-215 ◽  
Author(s):  
D. Wilde ◽  
S. Rajopadhye

In the context of developing a compiler for ALPHA, a functional data-parallel language based on systems of affine recurrence equations (SAREs), we address the problem of transforming scheduled single-assignment code into multiple-assignment code. We show how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus to derive necessary and sufficient conditions for reusing memory.
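The core condition can be illustrated on a one-dimensional schedule: once each value's lifetime (from the time it is written to the time it is last read) is known statically, two variables may share storage exactly when their lifetimes do not overlap. This toy version uses explicit intervals rather than the paper's polyhedral computation:

```python
# Lifetime-based storage reuse: a value live over [write, last_read]
# can share memory with another value iff the intervals are disjoint.

def lifetimes_overlap(a, b):
    """a, b are (write_time, last_read_time) intervals on the schedule."""
    return a[0] <= b[1] and b[0] <= a[1]

def can_share_storage(a, b):
    return not lifetimes_overlap(a, b)

# A value live over [0, 3] conflicts with one live over [2, 5],
# but can reuse the storage of one live over [4, 6].
print(can_share_storage((0, 3), (2, 5)))  # False
print(can_share_storage((0, 3), (4, 6)))  # True
```

The polyhedral model generalizes this check from single intervals to parametric sets of schedule points, which is what makes the necessary-and-sufficient conditions computable at compile time.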


2010 ◽  
Vol 20 (5-6) ◽  
pp. 537-576 ◽  
Author(s):  
MATTHEW FLUET ◽  
MIKE RAINEY ◽  
JOHN REPPY ◽  
ADAM SHAW

The increasing availability of commodity multicore processors is making parallel computing ever more widespread. In order to exploit its potential, programmers need languages that make the benefits of parallelism accessible and understandable. Previous parallel languages have traditionally been intended for large-scale scientific computing, and they tend not to be well suited to programming the applications one typically finds on a desktop system. Thus, we need new parallel-language designs that address a broader spectrum of applications. The Manticore project is our effort to address this need. At its core is Parallel ML, a high-level functional language for programming parallel applications on commodity multicore hardware. Parallel ML provides a diverse collection of parallel constructs for different granularities of work. In this paper, we focus on the implicitly threaded parallel constructs of the language, which support fine-grained parallelism. We concentrate on those elements that distinguish our design from related ones, namely, a novel parallel binding form, a nondeterministic parallel case form, and the treatment of exceptions in the presence of data parallelism. These features differentiate the present work from related work on functional data-parallel language designs, which have focused largely on parallel problems with regular structure and the compiler transformations—most notably, flattening—that make such designs feasible. We present detailed examples utilizing various mechanisms of the language and give a formal description of our implementation.
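The flavour of a parallel binding (start a computation eagerly, demand its value later) can be approximated in any language with futures. The sketch below is an analogy using Python's standard library, not Parallel ML syntax, and it omits the speculative cancellation and exception semantics that make Manticore's constructs interesting:

```python
# Rough analogy to a parallel binding: each submit() starts evaluating
# immediately on another thread; result() is where the value is demanded.
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor() as pool:
    x = pool.submit(fib, 20)   # like "pval x = fib 20" -- runs eagerly
    y = pool.submit(fib, 21)
    # The two tasks are joined only here, where their values are needed.
    print(x.result() + y.result())  # 17711
```

In Parallel ML the analogous binding is implicit and the runtime may cancel a computation whose result turns out not to be demanded; futures have no such semantics, which is precisely the design space the paper explores.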


2011 ◽  
Vol 34 (4) ◽  
pp. 717-728
Author(s):  
Zu-Ying LUO ◽  
Yin-He HAN ◽  
Guo-Xing ZHAO ◽  
Xian-Chuan YU ◽  
Ming-Quan ZHOU

Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is an alternative to top-down inducers: it searches for the tree structure and the tests simultaneously, and thus in many situations improves the prediction and size of the resulting classifiers. However, this population-based, iterative approach can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach that combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to the GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets, and in both cases the obtained acceleration is very satisfactory: the solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
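The data-parallel decomposition described above can be sketched without a GPU: the dataset is split into one chunk per device, each chunk computes a partial fitness (here, a count of correct classifications), and the partial results are reduced on the host. The one-node "tree" and the data below are made up for illustration:

```python
# Data-parallel fitness evaluation: split the data across devices,
# compute partial fitness per chunk, reduce on the host.

def evaluate_chunk(tree, chunk):
    """Partial fitness: correct predictions within one chunk."""
    return sum(1 for x, label in chunk if tree(x) == label)

def fitness(tree, data, n_devices):
    size = (len(data) + n_devices - 1) // n_devices
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # On the real platform each chunk lives on its own GPU; here the
    # "devices" run sequentially and we simply sum their partial counts.
    return sum(evaluate_chunk(tree, c) for c in chunks) / len(data)

stump = lambda x: int(x > 0.5)          # a one-node "decision tree"
data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.3, 1)]
print(fitness(stump, data, n_devices=2))  # 0.8
```

Because the reduction is a plain sum, the result is independent of how many devices the data is split across, which is what makes the near-linear multi-GPU scaling reported in the paper possible.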


1995 ◽  
Vol 117 (1) ◽  
pp. 155-157 ◽  
Author(s):  
F. C. Anderson ◽  
J. M. Ziegler ◽  
M. G. Pandy ◽  
R. T. Whalen

We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
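The reported speedups can be put in perspective with a little arithmetic (assuming "3 months" means roughly 90 days of CPU time on the Iris):

```python
# Back-of-the-envelope speedups implied by the reported CPU times.
serial_hours = 90 * 24   # ~3 months on the SGI Iris 4D25 (assumed 90 days)
cray_hours = 77          # Cray Y-MP 8/864
intel_hours = 88         # Intel iPSC/860

print(round(serial_hours / cray_hours, 1))   # ~28x on the Cray
print(round(serial_hours / intel_hours, 1))  # ~24.5x on the Intel
```

Both machines thus turn an impractical multi-month computation into a few days of wall-clock time, which is the "practical levels" claim in the abstract.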

