Evolution Mining: A Novel Accelerated Framework for Large Data Processing in the Big Data Paradigm

The growth rate of enterprises has increased significantly in the last decade. Research has shown that over the past two decades the amount of data has increased approximately tenfold every two years, outpacing Moore's Law, which describes a doubling of processor power. About thirty thousand gigabytes of data are accumulated every second, and handling this volume requires more efficient data processing. Videos, photos, and messages uploaded by users of social networks accumulate into large amounts of data, much of it unstructured. Enterprises therefore need to work with big data in different formats, which must be prepared in a specific way before it can be used for modeling and calculations. For this reason, the research presented in the article on processing and storing large enterprise data, developing a model and algorithms, and using new technologies is relevant. Enterprise information flows will undoubtedly increase every year, so it is important to solve the problems of storing and processing large amounts of data. The relevance of the article also stems from growing digitalization and the increasing shift of professional activities online in many areas of modern society. The article provides a detailed analysis of these new technologies.

Attributes Reduction in Big Data

Applied Sciences ◽

10.3390/app10144901 ◽

2020 ◽

Vol 10 (14) ◽

pp. 4901

Author(s):

Waleed Albattah ◽

Rehan Ullah Khan ◽

Khalil Khan

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Processing ◽

Data Analytics ◽

Detection Efficiency ◽

Large Data ◽

Model Learning ◽

Big Data Processing ◽

Performance Loss ◽

Points Of View

Processing big data requires serious computing resources, so big data processing is a challenge not only for algorithms but also for computing infrastructure. This article analyzes a large amount of data from different points of view. One perspective is the processing of reduced collections of big data with fewer computing resources. The study analyzed 40 GB of data to test various strategies for reducing the amount of data processed. The goal is to reduce the data without compromising detection and model learning in machine learning. Several alternatives were analyzed, and it was found that in many cases and types of settings, data can be reduced to some extent without compromising detection efficiency. Tests on 200 attributes showed that, with a performance loss of only 4%, more than 80% of the data could be ignored. The results of the study thus provide useful insights into large data analytics.
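
The abstract does not say which reduction strategies were tested, so the following is only a minimal sketch of one common approach, assuming attributes are ranked by absolute Pearson correlation with a binary label and the top 20% are kept. The function names, the synthetic data, and the 20% fraction are illustrative assumptions, not the authors' method.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_attributes(rows, labels, keep_fraction=0.2):
    """Return sorted indices of the attributes most correlated with the label."""
    n_attrs = len(rows[0])
    n_keep = max(1, round(n_attrs * keep_fraction))
    scores = []
    for j in range(n_attrs):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by lower index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:n_keep])


def project(rows, indices):
    """Keep only the selected attribute columns."""
    return [[row[j] for j in indices] for row in rows]


def make_synthetic(n_rows=300, n_attrs=200, n_informative=10, seed=0):
    """Hypothetical data: only the first n_informative attributes carry label signal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        row = [
            rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
            for j in range(n_attrs)
        ]
        rows.append(row)
        labels.append(label)
    return rows, labels


rows, labels = make_synthetic()
kept = select_top_attributes(rows, labels, keep_fraction=0.2)
reduced = project(rows, kept)
print(f"kept {len(kept)} of {len(rows[0])} attributes")
```

A model trained on `reduced` can then be compared with one trained on `rows` to measure how much accuracy the reduction costs, which is the kind of trade-off the abstract reports.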

First of All, Understand Data Analytics Context and Changes

Big Data Analytics for Entrepreneurial Success - Advances in Business Information Systems and Analytics ◽

10.4018/978-1-5225-7609-9.ch004 ◽

2019 ◽

pp. 92-124

Keyword(s):

Big Data ◽

Data Analysis ◽

Data Processing ◽

Large Data ◽

Big Data Analysis ◽

Raw Material ◽

Use Of Data ◽

Broad Term ◽

Major Turning Point ◽

Data Collections

Big data marks a major turning point in the use of data and is a powerful vehicle for growth and profitability. A comprehensive understanding of a company's data and its potential can be a new vector for performance. It must be recognized that without adequate analysis, data is just unusable raw material. In this context, traditional data processing tools cannot support such an explosion in volume. They cannot respond to new needs in a timely manner and at a reasonable cost. Big data is a broad term generally referring to very large data collections that complicate the work of the analytics tools used to harness and manage them. This chapter details what big data analysis is and presents the development of its applications. It also examines the important changes that have affected the analytics context.

BREEDING AND GENETICS SYMPOSIUM: Really big data: Processing and analysis of very large data sets

Journal of Animal Science ◽

10.2527/jas.2011-4584 ◽

2012 ◽

Vol 90 (3) ◽

pp. 723-733 ◽

Cited By ~ 19

Author(s):

J. B. Cole ◽

S. Newman ◽

F. Foertter ◽

I. Aguilar ◽

M. Coffey

Keyword(s):

Big Data ◽

Data Processing ◽

Large Data ◽

Big Data Processing ◽

Breeding And Genetics

Cloud Computing and Its Application in Big Data Processing of Distance Higher Education

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v10i8.5280 ◽

2015 ◽

Vol 10 (8) ◽

pp. 55 ◽

Cited By ~ 4

Author(s):

Guolei Zhang ◽

Jia Li ◽

Li Hao

Keyword(s):

Higher Education ◽

Cloud Computing ◽

Distance Education ◽

Big Data ◽

Data Processing ◽

Clustering Algorithm ◽

Evaluation Method ◽

Low Cost ◽

Science And Technology ◽

Large Data

In the development of information technology, advances in scientific theory have driven progress in science and technology. This progress has affected the field of education and changed the way education is delivered. The arrival of the big data era has played an important role in promoting and disseminating educational resources, so that more and more people benefit. Modern distance education builds on big data and cloud computing, which provide a series of tools that support a variety of teaching modes. Clustering algorithms can provide an effective method for evaluating students' personality characteristics and learning status in distance education. However, the traditional K-means clustering algorithm suffers from randomness, uncertainty, and high time complexity, and it does not meet the requirements of large-scale data processing. In this paper, we study a parallel K-means clustering algorithm based on the cloud computing platform Hadoop and give the design and strategy of the algorithm. We then carry out experiments on data sets of several different sizes and compare the performance of the proposed method with the general clustering method. Experimental results show that the proposed accelerated algorithm achieves good speedup at low cost. It is suitable for the analysis and mining of large data in distance higher education.
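
The abstract gives no details of the authors' Hadoop design, so the following is only a sketch of how K-means is commonly split into map and reduce steps. It assumes the data is already divided into partitions, uses fixed rather than random initial centroids, and runs the "map" calls sequentially in plain Python instead of on a cluster. All function names are illustrative.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def map_partition(points, centroids):
    """Map step: per-cluster partial sums and counts for one data partition."""
    partial = defaultdict(lambda: [[0.0] * len(centroids[0]), 0])
    for point in points:
        acc = partial[nearest(point, centroids)]
        acc[0] = [s + p for s, p in zip(acc[0], point)]
        acc[1] += 1
    return dict(partial)


def reduce_partials(partials, centroids):
    """Reduce step: merge partial sums and recompute each centroid."""
    dim = len(centroids[0])
    totals = {c: [[0.0] * dim, 0] for c in range(len(centroids))}
    for partial in partials:
        for c, (vec_sum, count) in partial.items():
            totals[c][0] = [a + b for a, b in zip(totals[c][0], vec_sum)]
            totals[c][1] += count
    new_centroids = []
    for c in range(len(centroids)):
        vec_sum, count = totals[c]
        # An empty cluster keeps its previous centroid.
        new_centroids.append(
            [s / count for s in vec_sum] if count else list(centroids[c])
        )
    return new_centroids


def kmeans(partitions, centroids, max_iter=20):
    """Alternate map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        partials = [map_partition(part, centroids) for part in partitions]
        updated = reduce_partials(partials, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Two partitions stand in for data blocks held on different nodes.
partitions = [
    [[1.0, 1.0], [1.0, 2.0]],
    [[9.0, 9.0], [10.0, 9.0]],
]
final_centroids = kmeans(partitions, [[0.0, 0.0], [10.0, 10.0]])
print(final_centroids)
```

Because each partition only emits per-cluster sums and counts, the amount of data passed to the reduce step depends on the number of clusters rather than the number of points, which is what makes this decomposition attractive for large data sets.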

A Benchmark for Suitability of Alluxio over Spark

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a8190.1110120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 245-250

Keyword(s):

Big Data ◽

Data Processing ◽

Data Storage ◽

Storage Systems ◽

Distributed Storage ◽

Storage System ◽

Large Data ◽

Time Data ◽

Big Data Applications ◽

Access To Data

Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets. It can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. However, Spark's in-memory processing cannot share data between applications, and RAM is insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data held in different storage systems. Alluxio helps to speed up data-intensive Spark applications across various storage systems. In this work, the performance of applications on Spark alone and on Spark running over Alluxio is studied with respect to several storage formats (Parquet, ORC, CSV, and JSON) and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to assess the suitability of the Spark and Alluxio combination for big data applications. Alluxio is found to be suitable for applications that use databases larger than 2.6 GB storing data in JSON and CSV formats. Spark alone is found to be suitable for applications that use storage formats such as Parquet and ORC with database sizes below 2.6 GB.
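
The two reported findings can be read as a simple decision rule. The sketch below encodes only what the abstract states; the function name and return labels are illustrative, and combinations the abstract does not cover (for example, Parquet above 2.6 GB) deliberately return `None` rather than a guess.

```python
THRESHOLD_GB = 2.6  # database size reported in the abstract


def suggest_engine(storage_format, size_gb):
    """Map a storage format and database size to the setup the abstract reports as suitable."""
    fmt = storage_format.strip().lower()
    if fmt in {"json", "csv"} and size_gb > THRESHOLD_GB:
        return "Spark over Alluxio"
    if fmt in {"parquet", "orc"} and size_gb < THRESHOLD_GB:
        return "Spark"
    # The abstract makes no statement about the remaining combinations.
    return None


for fmt, size in [("JSON", 5.0), ("Parquet", 1.0), ("ORC", 4.0)]:
    print(fmt, size, suggest_engine(fmt, size))
```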

Performance tests on merge sort and recursive merge sort for big data processing

Technical Sciences ◽

10.31648/ts.2714 ◽

2017 ◽

Vol 1 (21) ◽

pp. 19-35 ◽

Cited By ~ 2

Author(s):

Zbigniew Marszałek

Keyword(s):

Big Data ◽

Data Processing ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Performance Tests ◽

Merge Sort ◽

The Stability ◽

Search For Information ◽

Sort Algorithm

The merge sort algorithm is widely used in databases to organize and search for information. In this work, the author describes a newly proposed non-recursive version of the merge sort algorithm for large data sets. Tests of the algorithm confirm the effectiveness of the method and the stability of the proposed version.
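
The abstract does not describe the proposed version in detail. For orientation, here is a minimal sketch of the standard bottom-up (iterative) merge sort, which avoids recursion by merging runs of width 1, 2, 4, and so on. It is the textbook technique, not necessarily the author's variant; the `key` parameter is an illustrative addition.

```python
def merge(left, right, key):
    """Merge two sorted lists; taking from left on ties preserves input order."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort_iterative(items, key=lambda x: x):
    """Bottom-up merge sort: repeatedly merge adjacent runs of doubling width."""
    result = list(items)
    width = 1
    while width < len(result):
        merged = []
        for start in range(0, len(result), 2 * width):
            left = result[start:start + width]
            right = result[start + width:start + 2 * width]
            merged.extend(merge(left, right, key))
        result = merged
        width *= 2
    return result


print(merge_sort_iterative([5, 2, 9, 1, 5, 6]))
```

Since no call stack is involved, the depth of the computation does not grow with the input size, which is one reason iterative variants are attractive for large data sets.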

Scalable and Flexible Big Data Analytic Framework (SFBAF) For Big Data Processing and Knowledge Extraction

International Conference on Engineering Technologies and Big Data Analytics (ETBDA’2016) Jan. 21-22, 2016 Bangkok (Thailand) ◽

10.15242/iie.e0116024 ◽

2016 ◽

Cited By ~ 1

Keyword(s):

Big Data ◽

Data Processing ◽

Knowledge Extraction ◽

Analytic Framework ◽

Big Data Processing ◽

Data Analytic

Load Balancing in Big Data Processing Systems

International Review of Automatic Control (IREACO) ◽

10.15866/ireaco.v12i1.16808 ◽

2019 ◽

Vol 12 (1) ◽

pp. 42 ◽

Cited By ~ 1

Author(s):

Andrey I. Vlasov ◽

Konstantin A. Muraviev ◽

Alexandra A. Prudius ◽

Demid A. Uzenkov

Keyword(s):

Big Data ◽

Load Balancing ◽

Data Processing ◽

Big Data Processing

Analytical Study on Big Data

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i5.668 ◽

2018 ◽

Vol 8 (5) ◽

pp. 75

Author(s):

Vivek Raich ◽

Pankaj Maurya

Keyword(s):

Information Technology ◽

Big Data ◽

Decision Maker ◽

Analytical Study ◽

Large Data ◽

Decision Makers ◽

Continuous Increase ◽

Analytic Methods ◽

Data Store ◽

Business Engineering

In the age of information technology, ever more big data is being stored. As a result, huge amounts of data are available to decision makers, which has contributed to the progress of information technology and its wide growth in many areas of business, engineering, medicine, and scientific study. Big data refers not only to data that is large in size but also to data of several types that are not easy to handle, so dedicated technology is required to manage it. Because data continues to grow in this way, it is important to study and manage these datasets according to the requirements at hand so that the necessary information can be obtained. The aim of this paper is to analyze some of the analytic methods and tools that can be applied to large data. In addition, applications of big data are analyzed, in which decision makers work with big data and use the resulting insights for different applications.
