Efficient data distribution and results merging for parallel data clustering in MapReduce environment

2017 ◽  
Vol 48 (8) ◽  
pp. 2408-2428
Author(s):  
Abdelhak Bousbaci ◽  
Nadjet Kamel
2014 ◽  
Vol 102 (1) ◽  
pp. 17-26 ◽  
Author(s):  
Baltescu Paul ◽  
Blunsom Phil

Abstract Hierarchical phrase-based machine translation systems rely on the synchronous context-free grammar formalism to learn and use translation rules containing gaps. The grammars learned by such systems become unmanageably large even for medium-sized parallel corpora. The traditional approach of preprocessing the training data and loading all possible translation rules into memory does not scale well for hierarchical phrase-based systems. Online grammar extractors address this problem by constructing memory-efficient data structures on top of the source side of the parallel data (often based on suffix arrays), which are used to efficiently match phrases in the corpus and to extract translation rules on the fly during decoding. This paper describes an open source implementation of an online synchronous context-free grammar extractor. Our approach builds on the work of Lopez (2008a) and introduces a new technique for extending the lists of phrase matches for phrases containing gaps that reduces the extraction time by a factor of 4. Our extractor is available as part of the cdec toolkit (Dyer et al., 2010).
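
To illustrate the suffix-array lookup mentioned in the abstract, here is a minimal Python sketch of matching a contiguous phrase against a tokenized source corpus. It is not the cdec extractor or the gap-extension technique of the paper; the function names `build_suffix_array` and `find_phrase` and the toy corpus are assumptions made for illustration, and the naive construction is only suitable for very small inputs.

```python
from typing import List, Sequence


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Return the start positions of all suffixes, sorted lexicographically."""
    # Naive construction (hypothetical helper); real extractors use
    # faster algorithms for corpora of realistic size.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens: Sequence[str], sa: List[int],
                phrase: Sequence[str]) -> List[int]:
    """Return sorted corpus positions where `phrase` occurs contiguously."""
    phrase = list(phrase)
    m = len(phrase)

    # Lower bound: first suffix whose m-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if list(tokens[sa[mid]:sa[mid] + m]) < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if list(tokens[sa[mid]:sa[mid] + m]) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the phrase.
    return sorted(sa[start:lo])


# Toy source-side corpus (an assumption for illustration).
corpus = "the cat sat on the mat and the cat ran".split()
suffix_array = build_suffix_array(corpus)
matches = find_phrase(corpus, suffix_array, ["the", "cat"])
print(matches)  # [0, 7]
```

Because all suffixes that start with the phrase are adjacent in the sorted array, two binary searches locate every occurrence without scanning the corpus, which is what makes on-the-fly matching during decoding feasible.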


Author(s):  
Raquel Almeida ◽  
Jorge Vieira ◽  
Marco Vieira ◽  
Henrique Madeira ◽  
Jorge Bernardino

2019 ◽  
Author(s):  
Lars Nerger ◽  
Qi Tang ◽  
Longjiang Mu

Abstract. Data assimilation integrates information from observational measurements with numerical models. When used with coupled models of Earth system compartments, e.g. the atmosphere and the ocean, consistent joint states can be estimated. A common approach to data assimilation is the use of ensemble-based methods, which use an ensemble of state realizations to estimate the state and its uncertainty. These methods are far more costly to compute than a single coupled model because of the required integration of the ensemble. However, with uncoupled models, the methods have also been shown to exhibit particularly good scaling behavior. This study discusses an approach to augment a coupled model with data assimilation functionality provided by the Parallel Data Assimilation Framework (PDAF). Using only minimal changes in the codes of the different compartment models, a particularly efficient data assimilation system is generated that utilizes parallelization and in-memory data transfers between the models and the data assimilation functions and hence avoids most of the file reading and writing and also model restarts during the data assimilation process. The study explains the required modifications of the programs using the example of the coupled atmosphere-sea ice-ocean model AWI-CM. The case of the assimilation of oceanic observations shows that the data assimilation leads to only small overheads in computing time of about 15 % compared to the model without data assimilation and has very good parallel scalability. The model-agnostic structure of the assimilation software ensures a separation of concerns in that the development of data assimilation methods can be separated from the model application.
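
As a small illustration of how an ensemble of state realizations yields a state estimate and its uncertainty, the following Python sketch computes the per-variable ensemble mean and sample variance. It does not use PDAF or AWI-CM and is not the filter from the study; the function name `ensemble_estimate` and the three-member toy ensemble are assumptions made for illustration.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def ensemble_estimate(
    ensemble: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return per-variable ensemble mean and sample variance.

    Each ensemble member is one realization of the model state vector.
    """
    n_vars = len(ensemble[0])
    # Collect the values of each state variable across all members.
    columns = [[member[j] for member in ensemble] for j in range(n_vars)]
    means = [mean(col) for col in columns]
    # Sample variance (divides by N - 1), a common uncertainty estimate.
    variances = [variance(col) for col in columns]
    return means, variances


# Toy ensemble: three members, two state variables (illustrative values).
toy_ensemble = [
    [1.0, 10.0],
    [2.0, 12.0],
    [3.0, 14.0],
]
state, uncertainty = ensemble_estimate(toy_ensemble)
print(state)        # [2.0, 12.0]
print(uncertainty)  # [1.0, 4.0]
```

Each member must be integrated forward by the model before such statistics can be formed, which is why the ensemble integration dominates the computing cost.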

