Distributed data-parallel computing using a high-level programming language

Author(s):  
Michael Isard ◽  
Yuan Yu
2013 ◽  
Vol 753-755 ◽  
pp. 3018-3024 ◽  
Author(s):  
Fen Gyu Yang ◽  
Ying Chen ◽  
Ye Zhang

As more and more data are collected in many applications, record linkage must cope with millions of records. Traditional methods face a major performance challenge when dealing with data at this scale. Parallel computing frameworks such as MapReduce have become an efficient and practical way to address this problem. In this paper, we propose a practical 3-phase MapReduce approach that performs blocking, filtering, and linking in three consecutive processes on a Hadoop cluster. Experiments show that our approach runs efficiently and effectively while maintaining high recall compared with the traditional method.
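
The three phases named in the abstract can be illustrated with a minimal in-process sketch that runs in plain Python rather than on Hadoop. The blocking key (first letter of the name), the length-difference filter, the 0.8 similarity threshold, and the sample records are all illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (record_id, name).
RECORDS = [
    (1, "john smith"),
    (2, "jon smith"),
    (3, "jane doe"),
    (4, "mary jones"),
    (5, "mary jone"),
]


def blocking_key(name):
    # Assumed blocking key: the first letter of the name.
    return name[0]


def phase1_blocking(records):
    """Map each record to a blocking key and group by key (the shuffle step)."""
    blocks = defaultdict(list)
    for rec_id, name in records:
        blocks[blocking_key(name)].append((rec_id, name))
    return blocks


def phase2_filtering(blocks, max_len_diff=3):
    """Form candidate pairs inside each block and drop cheap-to-reject pairs."""
    candidates = []
    for block in blocks.values():
        for (id_a, a), (id_b, b) in combinations(block, 2):
            # Assumed filter: names of very different length are unlikely matches.
            if abs(len(a) - len(b)) <= max_len_diff:
                candidates.append(((id_a, a), (id_b, b)))
    return candidates


def phase3_linking(candidates, threshold=0.8):
    """Run the expensive similarity comparison only on the surviving pairs."""
    links = []
    for (id_a, a), (id_b, b) in candidates:
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            links.append((id_a, id_b))
    return links


def link_records(records):
    return phase3_linking(phase2_filtering(phase1_blocking(records)))


if __name__ == "__main__":
    print(link_records(RECORDS))
```

In a real MapReduce job each phase would be a separate map and reduce pass over distributed data. The point of the sketch is only that blocking and filtering shrink the set of pairs before the costly comparison in the linking phase.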


2003 ◽  
Vol 13 (03) ◽  
pp. 473-484 ◽  
Author(s):  
KONRAD HINSEN

One of the main obstacles to a more widespread use of parallel computing in computational science is the difficulty of implementing, testing, and maintaining parallel programs. The combination of a simple parallel computation model, BSP, and a high-level programming language, Python, simplifies these tasks significantly. It allows the rapid development facilities of Python to be applied to parallel programs, providing interactive development as well as interactive debugging of parallel programs.
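
As a rough illustration of the BSP model mentioned above, the sketch below simulates two supersteps sequentially in plain Python: local computation, message exchange, and a barrier before the messages become visible. The function name `bsp_sum` and the round-robin data distribution are assumptions made for this example. It does not use or reproduce the API of the Python BSP package the abstract refers to.

```python
def bsp_sum(data, num_procs=4):
    """Sum `data` in the BSP style using simulated processors."""
    # Distribute the data round-robin across the simulated processors.
    local = [data[pid::num_procs] for pid in range(num_procs)]

    # Superstep 1: each processor computes a partial sum locally and
    # posts it as a message addressed to processor 0.
    outbox = [[] for _ in range(num_procs)]
    for pid in range(num_procs):
        partial = sum(local[pid])
        outbox[0].append((pid, partial))

    # Barrier: messages are delivered only after every processor has
    # finished the superstep, so the outboxes become the next inboxes.
    inbox = outbox

    # Superstep 2: processor 0 combines the partial results it received.
    return sum(partial for _, partial in inbox[0])


if __name__ == "__main__":
    print(bsp_sum(list(range(10))))
```

Because communication only takes effect at the barrier, each superstep can be reasoned about and tested in isolation, which is the property that makes BSP programs comparatively easy to debug.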

