Using parallel Batcher sort in Active Storage System

10.12737/2420 ◽  
2013 ◽  
Vol 4 (4) ◽  
pp. 127-142
Author(s):  
Ekaterina Tyutlyaeva

This paper describes a modified parallel Batcher sort algorithm for big data processing. The main novelty of the implemented algorithm is the integration of an efficient parallel Batcher sort with the Active Storage concept. We use Active Storage based on the Lustre file system and the TSim C++ template library for parallelization. The paper presents experimental results for the scientific processing of real seismic data. The presented results indicate that the described algorithm can achieve linear speedup when sorting big data sets (more than 100 GB).
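
For orientation, below is a minimal sequential sketch of Batcher's odd-even merge sort, the comparison network underlying the method. In the paper the compare-exchange operations within each stage are independent and run in parallel over Active Storage via the TSim C++ library; this Python sketch (not the authors' code) executes them as plain loops.

```python
def batcher_oddeven_merge_sort(a):
    """Sort list a in place using Batcher's odd-even merge network."""
    n = len(a)
    p = 1
    while p < n:                                  # merge sorted runs of length p
        k = p
        while k >= 1:                             # sub-stages of each merge
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # only compare within the same block of 2p elements
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        if a[i + j] > a[i + j + k]:       # compare-exchange
                            a[i + j], a[i + j + k] = a[i + j + k], a[i + j]
            k //= 2
        p *= 2

data = [7, 3, 9, 1, 6, 2, 8, 5]
batcher_oddeven_merge_sort(data)
print(data)   # [1, 2, 3, 5, 6, 7, 8, 9]
```

Because every compare-exchange inside a stage touches a disjoint pair of elements, the stages map naturally onto parallel workers, which is what makes the network attractive for distributed sorting on Active Storage nodes.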

2017 ◽  
Vol 1 (21) ◽  
pp. 19-35 ◽  
Author(s):  
Zbigniew Marszałek

The merge sort algorithm is widely used in databases to organize and search for information. In this work the author describes a newly proposed non-recursive version of the merge sort algorithm for large data sets. Tests of the algorithm confirm the effectiveness and stability of the proposed version.
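
The paper's exact variant is not reproduced here, but a generic non-recursive (bottom-up) merge sort replaces recursion with passes that merge runs of width 1, 2, 4, and so on, as in this minimal sketch:

```python
def bottom_up_merge_sort(a):
    n = len(a)
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):        # merge a[lo:mid] with a[mid:hi]
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            left, right = a[lo:mid], a[mid:hi]
            i = j = 0
            for k in range(lo, hi):
                if j >= len(right) or (i < len(left) and left[i] <= right[j]):
                    a[k] = left[i]; i += 1       # <= keeps the sort stable
                else:
                    a[k] = right[j]; j += 1
        width *= 2
    return a

print(bottom_up_merge_sort([5, 1, 4, 2, 8, 0, 2]))  # [0, 1, 2, 2, 4, 5, 8]
```

Taking the left run on ties preserves the relative order of equal keys, the stability property the abstract highlights, which matters in database workloads where a sort on one key must preserve an earlier ordering.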


Author(s):  
Abou_el_ela Abdou Hussein

Day by day, advanced web technologies have led to tremendous growth in the volume of data generated daily. This mountain of huge, dispersed data sets leads to the phenomenon called big data: a collection of massive, heterogeneous, unstructured, and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating, and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Among the many technologies for manipulating big data, one is Hadoop. Hadoop can be understood as an open-source distributed data processing framework and is one of the prominent and well-known solutions to the problem of handling big data. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the first three V's, which research over time has extended to more than fifty-six V's, and compare the work of different researchers to arrive at the best representation and a precise clarification of all the big data V characteristics. We highlight the challenges that face big data processing, how to overcome them using Hadoop, and Hadoop's use in processing big data sets as a solution for various problems in a distributed cloud-based environment. The paper focuses mainly on the different components of Hadoop, such as Hive, Pig, and HBase, and gives a thorough description of Hadoop's pros and cons, together with improvements that address Hadoop's problems through a proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.
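
As a concrete illustration of the MapReduce paradigm mentioned above (not code from the paper), here is a minimal word-count job written for Hadoop Streaming, which pipes text records through arbitrary executables; the script name and all paths are hypothetical:

```python
# wordcount.py -- run as "wordcount.py map" or "wordcount.py reduce"
import sys

def mapper():
    # emit (word, 1) for every word read from stdin
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop delivers mapper output sorted by key, so equal words are adjacent
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

A hypothetical submission might look like `hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper "wordcount.py map" -reducer "wordcount.py reduce"`; the framework handles splitting, shuffling, and sorting between the two phases.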


Geophysics ◽  
2021 ◽  
pp. 1-44
Author(s):  
Eduardo Silva ◽  
Jessé Costa ◽  
Jörg Schleicher

Eikonal solvers have found important applications in seismic data processing and inversion, the so-called image-guided methods. To this day in image-guided applications, the solution of the eikonal equation is implemented using partial-differential-equation solvers, such as fast-marching or fast-sweeping methods. We show that alternatively, one can numerically integrate the dynamic Hamiltonian system defined by the image-guided eikonal equation and reconstruct the solution with image-guided rays. We present interesting applications of image-guided raytracing to seismic data processing, demonstrating the use of the resulting rays in image-guided interpolation and smoothing, well-log interpolation, image flattening, and residual-moveout picking. Some of these applications make use of properties of the raytracing system that are not directly obtained by eikonal solvers, such as ray position, ray density, wavefront curvature, and ray curvature. These ray properties open space for a different set of applications of the image-guided eikonal equation, beyond the original motivation of accelerating the construction of minimum distance tables. We stress that image-guided raytracing is an embarrassingly parallel problem, which makes its implementation highly efficient on massively parallel platforms. Image-guided raytracing is advantageous for most applications involving the tracking of seismic events and imaging-guided interpolation. Our numerical experiments using synthetic and real data sets show the efficiency and robustness of image-guided rays for the selected applications.
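
To make the raytracing idea concrete, the sketch below (not the authors' code) integrates the Hamiltonian system of the plain isotropic eikonal equation, H(x, p) = (v(x)^2 |p|^2 - 1)/2, whose characteristic equations are dx/dt = v^2 p and dp/dt = -|p|^2 v grad(v) with traveltime t as the parameter; the image-guided system replaces the scalar velocity with image-derived coefficients. The constant-gradient model, the forward-Euler integrator, and all parameters are illustrative assumptions.

```python
import numpy as np

def velocity(x):                 # v = 2 + 0.5 z  (hypothetical model, km/s)
    return 2.0 + 0.5 * x[0]

def grad_velocity(x):            # analytic gradient of the model above
    return np.array([0.5, 0.0])

def trace_ray(x0, angle, t_max, dt=1e-3):
    x = np.array(x0, dtype=float)
    # initial slowness vector, |p| = 1/v on the ray
    p = np.array([np.cos(angle), np.sin(angle)]) / velocity(x)
    path = [x.copy()]
    for _ in range(int(t_max / dt)):     # forward Euler for brevity; a
        v = velocity(x)                  # higher-order integrator is preferable
        x = x + dt * v * v * p                         # dx/dt =  v^2 p
        p = p - dt * (p @ p) * v * grad_velocity(x)    # dp/dt = -|p|^2 v grad v
        path.append(x.copy())
    return np.array(path)

ray = trace_ray(x0=(0.0, 0.0), angle=np.radians(30.0), t_max=1.0)
print(ray[-1])                   # ray endpoint after 1 s of traveltime
```

Each ray is traced independently of every other, which is the embarrassingly parallel structure the abstract points to: rays can be distributed across threads, GPUs, or cluster nodes with no communication.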


2018 ◽  
Vol 7 (2.12) ◽  
pp. 184
Author(s):  
Konda Sreenu ◽  
Dr Boddu Raja Srinivasa Reddy

Computers play a key role everywhere in the world, and data grows along with their usage. In everyday life we use computers for various purposes and store bulk information, and one way or another we want to retrieve data from the storage system. Retrieving bulk information is not a simple thing, nor is it a magic show. Every user wants data in different forms, such as reports or output information, and all of these exercises require a process, which works like marching ant colonies: data-related databases and tables are collected, the required data is selected from huge tables and databases, aggregate functions are applied to the data, and output information or reports related to the data are produced. The paper focuses on how efficiently we can, to some extent, use software to solve business-related problems. The paper may not handle a century's worth of data, but something can still be achieved; for century-scale data it is better to adopt a data mining approach, because solving such big problems otherwise consumes a great deal of time.
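
As a small, hypothetical illustration of that process (collect tables, select the required rows, apply aggregate functions, emit a report), using Python's built-in sqlite3 module with an invented sales table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("north", 80.0), ("south", 200.0)])

# aggregate functions turn bulk rows into report-ready figures
for region, total, avg in con.execute(
        "SELECT region, SUM(amount), AVG(amount) FROM sales GROUP BY region"):
    print(f"{region}: total={total:.2f}, average={avg:.2f}")
```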


Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets. It can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. Spark's in-memory processing cannot share data between applications, and RAM is insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data in different storage systems. Alluxio helps to speed up data-intensive Spark applications across various storage systems. In this work, the performance of applications on Spark, as well as Spark running over Alluxio, has been studied with respect to several storage formats, such as Parquet, ORC, CSV, and JSON, and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to suggest the suitability of the Spark-Alluxio combination for big data applications. It is found that Alluxio is suitable for applications that use databases larger than 2.6 GB storing data in JSON and CSV formats, while Spark alone is suitable for applications that use storage formats such as Parquet and ORC with database sizes less than 2.6 GB.
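
A minimal sketch of this kind of experiment appears below; switching between direct storage and Alluxio changes only the URI scheme in the read path (the Alluxio client jar must be on Spark's classpath). The host, port, paths, and the simplified SSB-style query are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ssb-alluxio-benchmark").getOrCreate()

# reading through Alluxio only changes the URI scheme; the DataFrame API is identical
lineorder_alluxio = spark.read.parquet("alluxio://alluxio-master:19998/ssb/lineorder")
lineorder_json    = spark.read.json("hdfs:///ssb/lineorder.json")

# simplified SSB Q1.1-style aggregation over the Alluxio-backed table
lineorder_alluxio.createOrReplaceTempView("lineorder")
result = spark.sql("""
    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder
    WHERE lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25
""")
result.show()
```

Timing the same query against each source and format pair (Parquet, ORC, CSV, JSON; with and without Alluxio) is the essence of the comparison the study reports.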


2018 ◽  
Vol 7 (4.5) ◽  
pp. 689
Author(s):  
Sarada. B ◽  
Vinayaka Murthy. M ◽  
Udaya Rani. V

Nowadays data is increasing exponentially every day in terms of velocity, variety, and volume, which is known as big data. When a dataset has a small number of dimensions, a limited number of clusters, and few data points, the existing traditional clustering algorithms give the expected results. In this age of big data, with large volumes of data, the traditional clustering algorithms cannot be expected to give such results, so there is a need for a new approach that offers better accuracy and computational time for processing large volumes of data. The proposed new system architecture is a combination of canopy, K-means, and RK sorting algorithms on the Hadoop MapReduce framework. The analysis shows that processing large volumes of data takes less computational time with higher accuracy, and that RK sorting requires neither swapping of elements nor stack space.
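
For orientation, here is a minimal single-machine sketch of the canopy-seeded K-means stage; the RK sorting step and the Hadoop MapReduce distribution of the paper are not reproduced, and the threshold and toy data are hypothetical:

```python
import numpy as np

def canopy_centers(points, t2=1.5):
    # simplified canopy pass: take a point as a center, discard everything
    # within the tight threshold T2, repeat. (The full algorithm also keeps a
    # loose threshold T1 > T2 to form overlapping canopies.)
    remaining = [np.asarray(p, float) for p in points]
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [p for p in remaining if np.linalg.norm(p - c) > t2]
    return np.array(centers)

def kmeans(points, centers, iters=20):
    points = np.asarray(points, float)
    for _ in range(iters):
        # assign each point to its nearest center, then move centers to the means
        d = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([points[labels == k].mean(axis=0)
                            if np.any(labels == k) else centers[k]
                            for k in range(len(centers))])
    return centers, labels

rng = np.random.default_rng(0)
pts = rng.random((200, 2)) * 10
seeds = canopy_centers(pts)          # the canopy count becomes the K of K-means
centers, labels = kmeans(pts, seeds)
print(f"{len(seeds)} canopies -> {len(centers)} clusters")
```

The cheap canopy pass removes the need to guess K in advance and gives K-means good starting centers, which is what makes the combination attractive at big data scale.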


Author(s):  
Olanrewaju E. Abikoye ◽  
Y. O. Olaboye ◽  
Abdullateef O. Alabi

Most industries around the globe make use of image processing to improve their production. Big data processing, on the other hand, deals with very large datasets and requires fast processing methods irrespective of the generic nature of the data; classification of heterogeneous images can therefore improve the integrity of any system design. To avoid wasting time and energy, it is necessary to classify images, and big data processing for generic classification of heterogeneous images provides fast, accurate, and objective results. In this study, the researchers classified images into three categories using the ResNet50 technique on a training dataset. The outcome of the research is an analysis of these techniques and a comparative analysis of different existing image data sets, used as pre-trained data, against test data used as sample images, for decision making based on their limitations and strengths.
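
A minimal sketch of such a three-category classifier built on a pre-trained ResNet50 backbone, using TensorFlow/Keras, might look as follows; the dataset directory, image size, and training settings are hypothetical placeholders, not the study's configuration:

```python
import tensorflow as tf

# pre-trained ImageNet backbone, frozen; only the new classification head trains
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)  # ResNet50's expected scaling
x = base(x, training=False)
outputs = tf.keras.layers.Dense(3, activation="softmax")(x)  # three image categories
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# hypothetical dataset layout: images/train/<category>/<file>.jpg
train = tf.keras.utils.image_dataset_from_directory(
    "images/train", image_size=(224, 224), batch_size=32)
model.fit(train, epochs=5)
```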

