A Taxonomy and Survey of Data Partitioning Algorithms for Big Data Distributed Systems

Author(s):  
Quadri Waseem ◽  
Mohd Aizaini Maarof ◽  
Mohd Yazid Idris ◽  
Amril Nazir

Big data is traditionally associated with distributed systems, which is understandable: the volume dimension of big data appears to be best accommodated by continuously adding resources across a distributed network rather than by continuously upgrading a central storage resource. In this implementation context, non-distributed relational database models are considered volume-inefficient, and the database community has contemplated a departure from their use. Distributed systems depend on data partitioning to determine chunks of related data and where in storage they can be accommodated. In existing Database Management Systems (DBMS), data partitioning is automated, which in the opinion of this paper does not give the best results, since partitioning is an NP-hard problem in terms of algorithmic time complexity. The difficulty posed by this NP-hardness is shown to be reduced by a partitioning strategy that relies on the discretion of the programmer, which is more effective and flexible, though it requires extra coding effort. The paper holds that NP-hard problems are solved more effectively by incorporating human discretion than by full automation. In this paper, the partitioning process is reviewed and a programmer-based partitioning strategy is implemented for an application with a relational DBMS backend. By doing this, the relational DBMS is made adaptive in the volume dimension of big data. The ACID properties (atomicity, consistency, isolation, and durability) of the relational database model, which constitute a major attraction especially for applications that process transactions, are thus harnessed. On a more general note, the results of this research suggest that databases can be made adaptive in the areas of their weaknesses, as a one-size-fits-all database management system may no longer be feasible.
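To make the idea of programmer-directed partitioning over a relational backend concrete, here is a minimal sketch in Python. It is not the implementation from the paper: the `orders` table, the modulo rule in `shard_for`, the shard count, and the use of in-memory SQLite databases as stand-ins for separate relational servers are all assumptions chosen for illustration.

```python
import sqlite3

NUM_SHARDS = 3  # hypothetical number of backend databases


def shard_for(customer_id, num_shards=NUM_SHARDS):
    """Programmer-chosen rule: all rows of one customer live on one shard."""
    return customer_id % num_shards


def open_shards(num_shards=NUM_SHARDS):
    """Create one in-memory SQLite database per shard (stand-ins for servers)."""
    shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
    for db in shards:
        db.execute(
            "CREATE TABLE orders ("
            "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
        )
    return shards


def insert_order(shards, order_id, customer_id, amount):
    """Route the row to its shard and write it inside a local transaction."""
    db = shards[shard_for(customer_id, len(shards))]
    with db:  # commits on success, rolls back if an exception is raised
        db.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (order_id, customer_id, amount),
        )


def orders_for_customer(shards, customer_id):
    """Read from the single shard that holds this customer's rows."""
    db = shards[shard_for(customer_id, len(shards))]
    return db.execute(
        "SELECT order_id, amount FROM orders "
        "WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    ).fetchall()


if __name__ == "__main__":
    shards = open_shards()
    insert_order(shards, 1, 7, 10.0)
    insert_order(shards, 2, 7, 5.5)
    print(orders_for_customer(shards, 7))
```

Because each customer's rows sit on a single shard in this sketch, a transaction touching one customer stays local to one relational database, so that database's ACID guarantees apply without cross-shard coordination. Queries spanning several customers would need to fan out across shards, which is the extra coding effort the abstract mentions.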


2015 ◽  
Vol 28 (8) ◽  
pp. 2440-2456 ◽  
Author(s):  
Maeva Antoine ◽  
Laurent Pellegrino ◽  
Fabrice Huet ◽  
Françoise Baude

Author(s):  
Marcelo Paiva Ramos ◽  
Paulo Marcelo Tasinaffo ◽  
Eugenio Sper de Almeida ◽  
Luis Marcelo Achite ◽  
Adilson Marques da Cunha ◽  
...  
Keyword(s):  
Big Data ◽  

Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3438 ◽  
Author(s):  
Xia ◽  
Huang ◽  
Li ◽  
Zhou ◽  
Zhang

Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively removing a notable performance bottleneck of existing parallel clustering algorithms: they must perform numerous repeated calculations to reach a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability as computing nodes were added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD.
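The abstract does not give the details of SubDP or CFA, so the following Python sketch shows only a generic subsample, cluster, and merge pattern, not the authors' PARSUC method. Everything in it is an assumption made for illustration: the function names, the 2-D point representation, the farthest-first initialization, and the merge step that clusters the pooled local centroids. The per-partition clustering runs sequentially here; on a platform such as Hadoop each partition could be handled by a separate map task. No centroid filtering is attempted.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


def mean(group):
    """Component-wise mean of a non-empty list of (x, y) points."""
    n = len(group)
    return (sum(x for x, _ in group) / n, sum(y for _, y in group) / n)


def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def subsample_partitions(points, num_parts, seed=0):
    """Shuffle once, then deal the points into disjoint random subsamples."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_parts] for i in range(num_parts)]


def cluster_by_subsampling(points, k, num_parts=4, seed=0):
    """Cluster each subsample, merge the local centroids, label all points."""
    parts = subsample_partitions(points, num_parts, seed)
    local = [c for part in parts for c in kmeans(part, k)]  # parallelizable
    final = kmeans(local, k)  # merge step: cluster the pooled centroids
    labels = [nearest(p, final) for p in points]
    return final, labels
```

Each subsample is far smaller than the full dataset, so the expensive iterative step runs on small inputs, and only the short list of local centroids has to be gathered for the merge. This sketch assumes every partition holds at least one point and that `k` does not exceed the number of pooled centroids.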


2013 ◽  
Vol 41 (3) ◽  
pp. 249-260 ◽  
Author(s):  
Lisa Wu ◽  
Raymond J. Barker ◽  
Martha A. Kim ◽  
Kenneth A. Ross
