Using parallel Batcher sort in Active Storage System

10.12737/2420 ◽  
2013 ◽  
Vol 4 (4) ◽  
pp. 127-142
Author(s):  
Ekaterina Tyutlyaeva

This paper describes a modified parallel Batcher sort algorithm for big data processing. The main novelty of the implemented algorithm is the integration of an efficient parallel Batcher sort with the Active Storage concept. We use Active Storage based on the Lustre file system and the TSim C++ template library for parallelization. The paper presents experimental results for the scientific processing of real seismic data. The presented results indicate that the described algorithm can achieve linear speedup when sorting big data sets (more than 100 GB).
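
For orientation, below is a minimal sequential sketch of Batcher's odd-even merge sort, the comparison network underlying the method. In the paper the compare-exchange operations within each stage are independent and run in parallel over Active Storage via the TSim C++ library; this Python sketch (not the authors' code) executes them as plain loops.

```python
def batcher_oddeven_merge_sort(a):
    """Sort list a in place using Batcher's odd-even merge network."""
    n = len(a)
    p = 1
    while p < n:                                  # merge sorted runs of length p
        k = p
        while k >= 1:                             # sub-stages of each merge
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # only compare within the same block of 2p elements
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        if a[i + j] > a[i + j + k]:       # compare-exchange
                            a[i + j], a[i + j + k] = a[i + j + k], a[i + j]
            k //= 2
        p *= 2

data = [7, 3, 9, 1, 6, 2, 8, 5]
batcher_oddeven_merge_sort(data)
print(data)   # [1, 2, 3, 5, 6, 7, 8, 9]
```

Because every compare-exchange inside a stage touches a disjoint pair of elements, the stages map naturally onto parallel workers, which is what makes the network attractive for distributed sorting on Active Storage nodes.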

2017 ◽  
Vol 1 (21) ◽  
pp. 19-35 ◽  
Author(s):  
Zbigniew Marszałek

The merge sort algorithm is widely used in databases to organize and search for information. In this work the author describes a newly proposed non-recursive version of the merge sort algorithm for large data sets. Tests of the algorithm confirm the effectiveness and stability of the proposed version.
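
The paper's exact variant is not reproduced here, but a generic non-recursive (bottom-up) merge sort replaces recursion with passes that merge runs of width 1, 2, 4, and so on, as in this minimal sketch:

```python
def bottom_up_merge_sort(a):
    n = len(a)
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):        # merge a[lo:mid] with a[mid:hi]
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            left, right = a[lo:mid], a[mid:hi]
            i = j = 0
            for k in range(lo, hi):
                if j >= len(right) or (i < len(left) and left[i] <= right[j]):
                    a[k] = left[i]; i += 1       # <= keeps the sort stable
                else:
                    a[k] = right[j]; j += 1
        width *= 2
    return a

print(bottom_up_merge_sort([5, 1, 4, 2, 8, 0, 2]))  # [0, 1, 2, 2, 4, 5, 8]
```

Taking the left run on ties preserves the relative order of equal keys, the stability property the abstract highlights, which matters in database workloads where a sort on one key must preserve an earlier ordering.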


Author(s):  
Abou_el_ela Abdou Hussein

Day by day, advanced web technologies have led to tremendous growth in the volume of data generated daily. This mountain of huge, dispersed data sets leads to the phenomenon called big data: a collection of massive, heterogeneous, unstructured, and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating, and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Among the many technologies for manipulating big data, one is Hadoop. Hadoop can be understood as an open-source distributed data processing framework and is one of the prominent and well-known solutions to the problem of handling big data. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the first three V's, which research over time has extended to more than fifty-six V's, and compare the work of different researchers to arrive at the best representation and a precise clarification of all the big data V characteristics. We highlight the challenges that face big data processing, how to overcome them using Hadoop, and Hadoop's use in processing big data sets as a solution for various problems in a distributed cloud-based environment. The paper focuses mainly on the different components of Hadoop, such as Hive, Pig, and HBase, and gives a thorough description of Hadoop's pros and cons, together with improvements that address Hadoop's problems through a proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.
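
As a concrete illustration of the MapReduce paradigm mentioned above (not code from the paper), here is a minimal word-count job written for Hadoop Streaming, which pipes text records through arbitrary executables; the script name and all paths are hypothetical:

```python
# wordcount.py -- run as "wordcount.py map" or "wordcount.py reduce"
import sys

def mapper():
    # emit (word, 1) for every word read from stdin
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop delivers mapper output sorted by key, so equal words are adjacent
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

A hypothetical submission might look like `hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper "wordcount.py map" -reducer "wordcount.py reduce"`; the framework handles splitting, shuffling, and sorting between the two phases.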


Geophysics ◽  
2021 ◽  
pp. 1-44
Author(s):  
Eduardo Silva ◽  
Jessé Costa ◽  
Jörg Schleicher

Eikonal solvers have found important applications in seismic data processing and inversion, the so-called image-guided methods. To this day in image-guided applications, the solution of the eikonal equation is implemented using partial-differential-equation solvers, such as fast-marching or fast-sweeping methods. We show that alternatively, one can numerically integrate the dynamic Hamiltonian system defined by the image-guided eikonal equation and reconstruct the solution with image-guided rays. We present interesting applications of image-guided raytracing to seismic data processing, demonstrating the use of the resulting rays in image-guided interpolation and smoothing, well-log interpolation, image flattening, and residual-moveout picking. Some of these applications make use of properties of the raytracing system that are not directly obtained by eikonal solvers, such as ray position, ray density, wavefront curvature, and ray curvature. These ray properties open space for a different set of applications of the image-guided eikonal equation, beyond the original motivation of accelerating the construction of minimum distance tables. We stress that image-guided raytracing is an embarrassingly parallel problem, which makes its implementation highly efficient on massively parallel platforms. Image-guided raytracing is advantageous for most applications involving the tracking of seismic events and imaging-guided interpolation. Our numerical experiments using synthetic and real data sets show the efficiency and robustness of image-guided rays for the selected applications.
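
To make the raytracing idea concrete, the sketch below (not the authors' code) integrates the Hamiltonian system of the plain isotropic eikonal equation, H(x, p) = (v(x)^2 |p|^2 - 1)/2, whose characteristic equations are dx/dt = v^2 p and dp/dt = -|p|^2 v grad(v) with traveltime t as the parameter; the image-guided system replaces the scalar velocity with image-derived coefficients. The constant-gradient model, the forward-Euler integrator, and all parameters are illustrative assumptions.

```python
import numpy as np

def velocity(x):                 # v = 2 + 0.5 z  (hypothetical model, km/s)
    return 2.0 + 0.5 * x[0]

def grad_velocity(x):            # analytic gradient of the model above
    return np.array([0.5, 0.0])

def trace_ray(x0, angle, t_max, dt=1e-3):
    x = np.array(x0, dtype=float)
    # initial slowness vector, |p| = 1/v on the ray
    p = np.array([np.cos(angle), np.sin(angle)]) / velocity(x)
    path = [x.copy()]
    for _ in range(int(t_max / dt)):     # forward Euler for brevity; a
        v = velocity(x)                  # higher-order integrator is preferable
        x = x + dt * v * v * p                         # dx/dt =  v^2 p
        p = p - dt * (p @ p) * v * grad_velocity(x)    # dp/dt = -|p|^2 v grad v
        path.append(x.copy())
    return np.array(path)

ray = trace_ray(x0=(0.0, 0.0), angle=np.radians(30.0), t_max=1.0)
print(ray[-1])                   # ray endpoint after 1 s of traveltime
```

Each ray is traced independently of every other, which is the embarrassingly parallel structure the abstract points to: rays can be distributed across threads, GPUs, or cluster nodes with no communication.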


2018 ◽  
Vol 7 (2.12) ◽  
pp. 184
Author(s):  
Konda Sreenu ◽  
Dr Boddu Raja Srinivasa Reddy

Computers play a key role everywhere in the world, and data grows along with their usage. In everyday life we use computers for various purposes and store bulk information, and one way or another we want to retrieve data from the storage system. Retrieving bulk information is not a simple thing, nor is it a magic show. Every user wants data in different forms, such as reports or output information, and all of these exercises require a process, which works like marching ant colonies: data-related databases and tables are collected, the required data is selected from huge tables and databases, aggregate functions are applied to the data, and output information or reports related to the data are produced. The paper focuses on how efficiently we can, to some extent, use software to solve business-related problems. The paper may not handle a century's worth of data, but something can still be achieved; for century-scale data it is better to adopt a data mining approach, because solving such big problems otherwise consumes a great deal of time.
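
As a small, hypothetical illustration of that process (collect tables, select the required rows, apply aggregate functions, emit a report), using Python's built-in sqlite3 module with an invented sales table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("north", 80.0), ("south", 200.0)])

# aggregate functions turn bulk rows into report-ready figures
for region, total, avg in con.execute(
        "SELECT region, SUM(amount), AVG(amount) FROM sales GROUP BY region"):
    print(f"{region}: total={total:.2f}, average={avg:.2f}")
```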


Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets. It can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. Spark's in-memory processing cannot share data between applications, and RAM is insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data in different storage systems. Alluxio helps to speed up data-intensive Spark applications across various storage systems. In this work, the performance of applications on Spark, as well as Spark running over Alluxio, has been studied with respect to several storage formats, such as Parquet, ORC, CSV, and JSON, and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to suggest the suitability of the Spark-Alluxio combination for big data applications. It is found that Alluxio is suitable for applications that use databases larger than 2.6 GB storing data in JSON and CSV formats, while Spark alone is suitable for applications that use storage formats such as Parquet and ORC with database sizes less than 2.6 GB.
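
A minimal sketch of this kind of experiment appears below; switching between direct storage and Alluxio changes only the URI scheme in the read path (the Alluxio client jar must be on Spark's classpath). The host, port, paths, and the simplified SSB-style query are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ssb-alluxio-benchmark").getOrCreate()

# reading through Alluxio only changes the URI scheme; the DataFrame API is identical
lineorder_alluxio = spark.read.parquet("alluxio://alluxio-master:19998/ssb/lineorder")
lineorder_json    = spark.read.json("hdfs:///ssb/lineorder.json")

# simplified SSB Q1.1-style aggregation over the Alluxio-backed table
lineorder_alluxio.createOrReplaceTempView("lineorder")
result = spark.sql("""
    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder
    WHERE lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25
""")
result.show()
```

Timing the same query against each source and format pair (Parquet, ORC, CSV, JSON; with and without Alluxio) is the essence of the comparison the study reports.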


2018 ◽  
Vol 7 (4.5) ◽  
pp. 689
Author(s):  
Sarada. B ◽  
Vinayaka Murthy. M ◽  
Udaya Rani. V

Nowadays data is increasing exponentially every day in terms of velocity, variety, and volume, which is known as big data. When a dataset has a small number of dimensions, a limited number of clusters, and few data points, the existing traditional clustering algorithms give the expected results. In this age of big data, with large volumes of data, the traditional clustering algorithms cannot be expected to give such results, so there is a need for a new approach that offers better accuracy and computational time for processing large volumes of data. The proposed new system architecture is a combination of canopy, K-means, and RK sorting algorithms on the Hadoop MapReduce framework. The analysis shows that processing large volumes of data takes less computational time with higher accuracy, and that RK sorting requires neither swapping of elements nor stack space.
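
For orientation, here is a minimal single-machine sketch of the canopy-seeded K-means stage; the RK sorting step and the Hadoop MapReduce distribution of the paper are not reproduced, and the threshold and toy data are hypothetical:

```python
import numpy as np

def canopy_centers(points, t2=1.5):
    # simplified canopy pass: take a point as a center, discard everything
    # within the tight threshold T2, repeat. (The full algorithm also keeps a
    # loose threshold T1 > T2 to form overlapping canopies.)
    remaining = [np.asarray(p, float) for p in points]
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [p for p in remaining if np.linalg.norm(p - c) > t2]
    return np.array(centers)

def kmeans(points, centers, iters=20):
    points = np.asarray(points, float)
    for _ in range(iters):
        # assign each point to its nearest center, then move centers to the means
        d = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([points[labels == k].mean(axis=0)
                            if np.any(labels == k) else centers[k]
                            for k in range(len(centers))])
    return centers, labels

rng = np.random.default_rng(0)
pts = rng.random((200, 2)) * 10
seeds = canopy_centers(pts)          # the canopy count becomes the K of K-means
centers, labels = kmeans(pts, seeds)
print(f"{len(seeds)} canopies -> {len(centers)} clusters")
```

The cheap canopy pass removes the need to guess K in advance and gives K-means good starting centers, which is what makes the combination attractive at big data scale.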


Author(s):  
Olanrewaju E. Abikoye ◽  
Y. O. Olaboye ◽  
Abdullateef O. Alabi

Most industries around the globe make use of image processing to improve their production. Big data processing, on the other hand, deals with very large datasets and requires fast processing methods irrespective of the generic nature of the data; classification of heterogeneous images can therefore improve the integrity of any system design. To avoid wasting time and energy, it is necessary to classify images, and big data processing for generic classification of heterogeneous images provides fast, accurate, and objective results. In this study, the researchers classified images into three categories using the ResNet50 technique on a training dataset. The outcome of the research is an analysis of these techniques and a comparative analysis of different existing image data sets, used as pre-trained data, against test data used as sample images, for decision making based on their limitations and strengths.
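
A minimal sketch of such a three-category classifier built on a pre-trained ResNet50 backbone, using TensorFlow/Keras, might look as follows; the dataset directory, image size, and training settings are hypothetical placeholders, not the study's configuration:

```python
import tensorflow as tf

# pre-trained ImageNet backbone, frozen; only the new classification head trains
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)  # ResNet50's expected scaling
x = base(x, training=False)
outputs = tf.keras.layers.Dense(3, activation="softmax")(x)  # three image categories
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# hypothetical dataset layout: images/train/<category>/<file>.jpg
train = tf.keras.utils.image_dataset_from_directory(
    "images/train", image_size=(224, 224), batch_size=32)
model.fit(train, epochs=5)
```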

