Snow: A Parallel Computing Framework for the R System

2008 ◽  
Vol 37 (1) ◽  
pp. 78-90 ◽  
Author(s):  
Luke Tierney ◽  
A. J. Rossini ◽  
Na Li

2013 ◽  
Vol 753-755 ◽  
pp. 3018-3024 ◽  
Author(s):  
Fen Gyu Yang ◽  
Ying Chen ◽  
Ye Zhang

As ever more data are collected in many applications, record linkage must cope with millions of records. Traditional methods face a major performance challenge when dealing with such massive data. Parallel computing frameworks such as MapReduce have become an efficient and practical way to address this problem. In this paper, we propose a practical three-phase MapReduce approach that performs blocking, filtering, and linking in three consecutive processes on a Hadoop cluster. Experiments show that our approach runs efficiently and effectively while keeping recall high in comparison with the traditional method.
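To make the blocking, filtering, and linking sequence concrete, the following is a minimal single-process sketch in Python. It is not the authors' Hadoop implementation. The blocking key (surname prefix), the filter (birth years within one year), the Jaccard similarity measure, the 0.5 threshold, and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical blocking key: first three letters of the last name token.
    return record["name"].split()[-1][:3].lower()


def jaccard(a, b):
    # Token-set Jaccard similarity between two name strings.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def phase1_block(records):
    # Phase 1 (blocking): group records by key, as a map/shuffle step would.
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    return blocks


def phase2_filter(blocks):
    # Phase 2 (filtering): within each block, drop pairs that fail a cheap test.
    # Assumed test: birth years differ by at most one.
    for recs in blocks.values():
        for a, b in combinations(recs, 2):
            if abs(a["year"] - b["year"]) <= 1:
                yield a, b


def phase3_link(pairs, threshold=0.5):
    # Phase 3 (linking): run the more expensive comparison on surviving pairs.
    return [(a["id"], b["id"]) for a, b in pairs
            if jaccard(a["name"], b["name"]) >= threshold]


# Illustrative records only; not data from the paper.
records = [
    {"id": 1, "name": "John A Smith", "year": 1980},
    {"id": 2, "name": "John Smith", "year": 1980},
    {"id": 3, "name": "Jane Smith", "year": 1955},
    {"id": 4, "name": "Ying Chen", "year": 1990},
]

links = phase3_link(phase2_filter(phase1_block(records)))
print(links)
```

Blocking restricts comparisons to records that share a key, and the cheap filter removes most remaining pairs before the costlier similarity test runs. On a cluster, each phase would presumably be a separate MapReduce job with the blocking key as the shuffle key.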


2019 ◽  
Vol 8 (3) ◽  
pp. 103
Author(s):  
Zhigang Han ◽  
Fen Qin ◽  
Caihui Cui ◽  
Yannan Liu ◽  
Lingling Wang ◽  
...  

A soil erosion model is used to evaluate the conditions of soil erosion and guide agricultural production. Recently, high spatial resolution data have been collected in new ways, such as three-dimensional laser scanning, providing the foundation for refined soil erosion modelling. However, serial computing cannot fully meet the computational requirements of massive data sets. Therefore, it is necessary to perform soil erosion modelling under a parallel computing framework. This paper focuses on a parallel computing framework for soil erosion modelling based on the Hadoop platform. The framework includes three layers: the methodology, algorithm, and application layers. In the methodology layer, two types of parallel strategies for data splitting are defined as row-oriented and sub-basin-oriented methods. The algorithms for six parallel calculation operators for local, focal and zonal computing tasks are designed in detail. These operators can be called to calculate the model factors and perform model calculations. We defined the key-value data structure of GeoCSV format for vector, row-based and cell-based rasters as the inputs for the algorithms. A geoprocessing toolbox is developed and integrated with the geographic information system (GIS) platform in the application layer. The performance of the framework is examined by taking the Gushanchuan basin as an example. The results show that the framework can perform calculations involving large data sets with high computational efficiency and GIS integration. This approach is easy to extend and use and provides essential support for applying high-precision data to refine soil erosion modelling.
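The row-oriented splitting strategy and the local and focal operators can be illustrated with a small single-process Python sketch. It is not the paper's Hadoop implementation or its GeoCSV format. The function names, the 3x3 mean used as the focal operator, and the sample grid are assumptions, and zonal operators and sub-basin-oriented splitting are not covered.

```python
def split_rows(grid, n_parts):
    # Row-oriented split: return (start, end) row ranges covering the grid.
    n = len(grid)
    size = -(-n // n_parts)  # ceiling division
    return [(s, min(s + size, n)) for s in range(0, n, size)]


def local_double(grid, rows):
    # Local operator: each output cell depends only on the same input cell.
    start, end = rows
    return [[2 * v for v in grid[r]] for r in range(start, end)]


def focal_mean(grid, rows):
    # Focal operator: 3x3 neighbourhood mean, clipped at the raster edges.
    # It reads one "halo" row above and below the partition; here those rows
    # come from the shared grid, whereas a distributed job would ship them.
    start, end = rows
    n, m = len(grid), len(grid[0])
    out = []
    for r in range(start, end):
        row = []
        for c in range(m):
            vals = [grid[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, n))
                    for cc in range(max(c - 1, 0), min(c + 2, m))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def run(grid, n_parts, op):
    # Apply op to each row partition and stitch the results back in order.
    out = []
    for rows in split_rows(grid, n_parts):
        out.extend(op(grid, rows))
    return out


# Illustrative raster only; not data from the paper.
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
partitioned = run(grid, 2, focal_mean)
whole = run(grid, 1, focal_mean)
print(partitioned == whole)
```

The point of the halo rows is that a focal result at a partition boundary must match what a serial run would produce; local operators need no such overlap.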

