A FUNCTIONAL LANGUAGE FOR DEPARTMENTAL METACOMPUTING

2005 ◽  
Vol 15 (03) ◽  
pp. 289-304 ◽  
Author(s):  
FRÉDÉRIC GAVA ◽  
FRÉDÉRIC LOULERGUE

We have designed a functional data-parallel language called BSML for programming bulk synchronous parallel (BSP) algorithms. Deadlocks and indeterminism are avoided, and execution time can then be estimated. Very large-scale applications may need more than one parallel machine; this is known as metacomputing. A major problem in programming applications for such architectures is their hierarchical network structure: the latency and bandwidth of the network between parallel nodes can be orders of magnitude worse than those inside a parallel node. Here we consider how to extend both the BSP model and BSML, which are well suited to parallel computing, in order to obtain a model and a functional language suitable for metacomputing.
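The predictable execution times mentioned above come from the BSP cost model, in which a superstep costs the maximum local work plus an h-relation and a barrier. A minimal sketch (the machine parameters g and l and the per-processor figures below are illustrative, not from the paper):

```python
# Sketch of the classic BSP cost model underlying BSML's predictability:
# one superstep costs max local work + h*g (communication) + l (barrier).

def superstep_cost(local_work, h, g, l):
    """Cost of one BSP superstep.

    local_work -- list of computation costs, one per processor
    h          -- max number of words any processor sends or receives
    g          -- network gap (cost per word of communication)
    l          -- barrier synchronisation latency
    """
    return max(local_work) + h * g + l

def program_cost(supersteps, g, l):
    """The total cost of a BSP program is the sum over its supersteps."""
    return sum(superstep_cost(w, h, g, l) for (w, h) in supersteps)

# Example: two supersteps on 4 processors with made-up machine parameters.
steps = [([100, 120, 90, 110], 16), ([80, 80, 85, 75], 8)]
print(program_cost(steps, g=2.0, l=50.0))  # 353.0
```

The metacomputing extension discussed in the paper essentially layers a second, slower (g, l) pair on top of this model for inter-machine communication.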

2008 ◽  
Vol 18 (01) ◽  
pp. 39-53 ◽  
Author(s):  
FRÉDÉRIC GAVA

A functional data-parallel language called BSML has been designed for programming Bulk-Synchronous Parallel algorithms. Many sequential algorithms do not have parallel counterparts, and many researchers outside computer science do not want to deal with parallel programming. In sequential programming environments, common data structures are often provided through reusable libraries to simplify the development of applications. A parallel representation of such data structures is thus a way of writing parallel programs without being exposed to all the features of a parallel language. In this paper we describe a modular implementation in BSML of some of these data structures and show how they can address the needs of many potential users of parallel machines who have so far been deterred by the complexity of parallelizing their code.


2015 ◽  
Vol 2015 ◽  
pp. 1-20 ◽  
Author(s):  
Xiaowen Chen ◽  
Zhonghai Lu ◽  
Axel Jantsch ◽  
Shuming Chen ◽  
Yang Guo ◽  
...  

On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which we refer to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in this paper. Homogeneous OLPCs feature strong regularity and scalability due to their identical cores and routers. Data-parallel applications partition their data into subsets that are handled individually by the same program running on different cores, and are therefore able to obtain good speedup on homogeneous OLPCs. This paper addresses modeling the speedup of homogeneous OLPCs for data-parallel applications. In establishing the speedup model, the network communication latency and the ways data-parallel applications store their data are modeled and analyzed in detail. Two abstract concepts, the equivalent serial packet and the equivalent serial communication, are proposed to construct the network communication latency model, and uniform and hotspot traffic models are adopted to reflect the ways of storing data. The analysis of the performance model yields several practical suggestions. Finally, three data-parallel applications are run on our cycle-accurate homogeneous OLPC experimental platform to validate the analytic results and demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications on homogeneous OLPCs.
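The shape of such a speedup model can be sketched in a few lines: each of N cores processes 1/N of the data, but pays a network latency that grows with core count. The logarithmic latency function below is a placeholder for illustration, not the paper's equivalent-serial-packet model:

```python
# Simplified speedup model: parallel time = serial work divided across
# cores, plus a communication latency term that grows with core count.
import math

def speedup(n_cores, t_serial, comm_latency):
    t_parallel = t_serial / n_cores + comm_latency(n_cores)
    return t_serial / t_parallel

# Placeholder latency: grows logarithmically with core count, as it
# might on a mesh NoC with hierarchical routing (illustrative only).
def lat(n):
    return 5.0 * math.log2(n) if n > 1 else 0.0

for n in (1, 4, 16, 64):
    print(n, round(speedup(n, t_serial=1000.0, comm_latency=lat), 2))
```

Even this toy version reproduces the qualitative behaviour the paper analyzes: speedup is sublinear and eventually saturates as the communication term dominates the shrinking per-core workload.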


1997 ◽  
Vol 07 (02) ◽  
pp. 203-215 ◽  
Author(s):  
D. Wilde ◽  
S. Rajopadhye

In the context of developing a compiler for ALPHA, a functional data-parallel language based on systems of affine recurrence equations (SAREs), we address the problem of transforming scheduled single-assignment code into multiple-assignment code. We show how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus to derive necessary and sufficient conditions for reusing memory.
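The core condition can be illustrated on a one-dimensional schedule: once each value's lifetime (from the time it is written to the time it is last read) is known statically, two variables may share storage exactly when their lifetimes do not overlap. This toy version uses explicit intervals rather than the paper's polyhedral computation:

```python
# Lifetime-based storage reuse: a value live over [write, last_read]
# can share memory with another value iff the intervals are disjoint.

def lifetimes_overlap(a, b):
    """a, b are (write_time, last_read_time) intervals on the schedule."""
    return a[0] <= b[1] and b[0] <= a[1]

def can_share_storage(a, b):
    return not lifetimes_overlap(a, b)

# A value live over [0, 3] conflicts with one live over [2, 5],
# but can reuse the storage of one live over [4, 6].
print(can_share_storage((0, 3), (2, 5)))  # False
print(can_share_storage((0, 3), (4, 6)))  # True
```

The polyhedral model generalizes this check from single intervals to parametric sets of schedule points, which is what makes the necessary-and-sufficient conditions computable at compile time.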


2010 ◽  
Vol 20 (5-6) ◽  
pp. 537-576 ◽  
Author(s):  
MATTHEW FLUET ◽  
MIKE RAINEY ◽  
JOHN REPPY ◽  
ADAM SHAW

The increasing availability of commodity multicore processors is making parallel computing ever more widespread. In order to exploit its potential, programmers need languages that make the benefits of parallelism accessible and understandable. Previous parallel languages have traditionally been intended for large-scale scientific computing, and they tend not to be well suited to programming the applications one typically finds on a desktop system. Thus, we need new parallel-language designs that address a broader spectrum of applications. The Manticore project is our effort to address this need. At its core is Parallel ML, a high-level functional language for programming parallel applications on commodity multicore hardware. Parallel ML provides a diverse collection of parallel constructs for different granularities of work. In this paper, we focus on the implicitly threaded parallel constructs of the language, which support fine-grained parallelism. We concentrate on those elements that distinguish our design from related ones, namely, a novel parallel binding form, a nondeterministic parallel case form, and the treatment of exceptions in the presence of data parallelism. These features differentiate the present work from related work on functional data-parallel language designs, which have focused largely on parallel problems with regular structure and the compiler transformations—most notably, flattening—that make such designs feasible. We present detailed examples utilizing various mechanisms of the language and give a formal description of our implementation.
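The flavour of a parallel binding (start a computation eagerly, demand its value later) can be approximated in any language with futures. The sketch below is an analogy using Python's standard library, not Parallel ML syntax, and it omits the speculative cancellation and exception semantics that make Manticore's constructs interesting:

```python
# Rough analogy to a parallel binding: each submit() starts evaluating
# immediately on another thread; result() is where the value is demanded.
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor() as pool:
    x = pool.submit(fib, 20)   # like "pval x = fib 20" -- runs eagerly
    y = pool.submit(fib, 21)
    # The two tasks are joined only here, where their values are needed.
    print(x.result() + y.result())  # 17711
```

In Parallel ML the analogous binding is implicit and the runtime may cancel a computation whose result turns out not to be demanded; futures have no such semantics, which is precisely the design space the paper explores.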


2011 ◽  
Vol 34 (4) ◽  
pp. 717-728
Author(s):  
Zu-Ying LUO ◽  
Yin-He HAN ◽  
Guo-Xing ZHAO ◽  
Xian-Chuan YU ◽  
Ming-Quan ZHOU

Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is an alternative to top-down inducers: it searches for the tree structure and the tests simultaneously, and thus in many situations improves the prediction and size of the resulting classifiers. However, this population-based, iterative approach can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach that combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to the GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets, and in both cases the obtained acceleration is very satisfactory: the solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
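The data-parallel decomposition described above can be sketched without a GPU: the dataset is split into one chunk per device, each chunk computes a partial fitness (here, a count of correct classifications), and the partial results are reduced on the host. The one-node "tree" and the data below are made up for illustration:

```python
# Data-parallel fitness evaluation: split the data across devices,
# compute partial fitness per chunk, reduce on the host.

def evaluate_chunk(tree, chunk):
    """Partial fitness: correct predictions within one chunk."""
    return sum(1 for x, label in chunk if tree(x) == label)

def fitness(tree, data, n_devices):
    size = (len(data) + n_devices - 1) // n_devices
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # On the real platform each chunk lives on its own GPU; here the
    # "devices" run sequentially and we simply sum their partial counts.
    return sum(evaluate_chunk(tree, c) for c in chunks) / len(data)

stump = lambda x: int(x > 0.5)          # a one-node "decision tree"
data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.3, 1)]
print(fitness(stump, data, n_devices=2))  # 0.8
```

Because the reduction is a plain sum, the result is independent of how many devices the data is split across, which is what makes the near-linear multi-GPU scaling reported in the paper possible.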


1995 ◽  
Vol 117 (1) ◽  
pp. 155-157 ◽  
Author(s):  
F. C. Anderson ◽  
J. M. Ziegler ◽  
M. G. Pandy ◽  
R. T. Whalen

We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
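The reported speedups can be put in perspective with a little arithmetic (assuming "3 months" means roughly 90 days of CPU time on the Iris):

```python
# Back-of-the-envelope speedups implied by the reported CPU times.
serial_hours = 90 * 24   # ~3 months on the SGI Iris 4D25 (assumed 90 days)
cray_hours = 77          # Cray Y-MP 8/864
intel_hours = 88         # Intel iPSC/860

print(round(serial_hours / cray_hours, 1))   # ~28x on the Cray
print(round(serial_hours / intel_hours, 1))  # ~24.5x on the Intel
```

Both machines thus turn an impractical multi-month computation into a few days of wall-clock time, which is the "practical levels" claim in the abstract.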

