Evolution Mining: A Novel Accelerated Framework for Large Data processing in the big Data Paradigm

Author(s):  
Mrs.M. Rajalakshmi
Keyword(s):  
Big Data ◽  
2021 ◽  
Vol 75 (3) ◽  
pp. 76-82
Author(s):  
G.T. Balakayeva ◽  
◽  
D.K. Darkenbayev ◽  
M. Turdaliyev ◽  
◽  
...  

The growth rate of these enterprises has increased significantly over the last decade. Research has shown that over the past two decades the amount of data has increased approximately tenfold every two years, outpacing Moore's Law, under which processor power doubles. About thirty thousand gigabytes of data are accumulated every second, and handling them requires more efficient data processing. Videos, photos and messages uploaded by users of social networks accumulate into large amounts of data, including unstructured data. Enterprises therefore need to work with big data in different formats, which must be prepared in a specific way before it can be used to obtain modeling and calculation results. For these reasons, the research carried out in the article on processing and storing large enterprise data, developing a model and algorithms, and using new technologies is relevant. The information flows of enterprises will undoubtedly increase every year, so it is important to solve the problems of storing and processing large amounts of data. The relevance of the article also stems from growing digitalization and the increasing shift of professional activity online in many areas of modern society. The article provides a detailed analysis of these new technologies.


2020 ◽  
Vol 10 (14) ◽  
pp. 4901
Author(s):  
Waleed Albattah ◽  
Rehan Ullah Khan ◽  
Khalil Khan

Processing big data requires serious computing resources, so big data processing is a challenge not only for algorithms but also for the hardware that runs them. This article analyzes a large amount of data from different points of view. One perspective is processing reduced collections of big data with fewer computing resources. To that end, the study analyzed 40 GB of data to test various strategies for reducing the amount of data processed. The goal is to reduce the data without compromising detection and model learning in machine learning. Several alternatives were analyzed, and the study found that in many cases and types of settings the data can be reduced to some extent without compromising detection efficiency. Tests of 200 attributes showed that, with a performance loss of only 4%, more than 80% of the data could be ignored. The results of the study thus provide useful insights into large data analytics.


Big data marks a major turning point in the use of data and is a powerful vehicle for growth and profitability. A comprehensive understanding of a company's data and its potential can be a new vector for performance. It must be recognized that, without adequate analysis, data is just unusable raw material. Traditional data processing tools cannot support such an explosion in volume: they cannot respond to new needs in a timely manner and at a reasonable cost. Big data is a broad term that generally refers to very large data collections that are difficult for analytics tools to harness and manage. This chapter details what big data analysis is, presents the development of its applications, and examines the important changes that have affected the analytics context.


2012 ◽  
Vol 90 (3) ◽  
pp. 723-733 ◽  
Author(s):  
J. B. Cole ◽  
S. Newman ◽  
F. Foertter ◽  
I. Aguilar ◽  
M. Coffey

Author(s):  
Guolei Zhang ◽  
Jia Li ◽  
Li Hao

In information technology, advances in scientific theory have driven the progress of science and technology. That progress has affected the educational field and changed the way education is delivered. The arrival of the big data era has played an important role in promoting and disseminating educational resources, so that more and more people benefit from them. Modern distance education relies on big data and cloud computing and is composed of a series of tools that support a variety of teaching modes. Clustering algorithms can provide an effective method for evaluating students' personality characteristics and learning status in distance education. However, the traditional K-means clustering algorithm suffers from randomness, uncertainty, and high time complexity, and it does not meet the requirements of large-scale data processing. In this paper, we study a parallel K-means clustering algorithm based on the cloud computing platform Hadoop and give the design and strategy of the algorithm. We then carry out experiments on data sets of several different sizes and compare the performance of the proposed method with the general clustering method. Experimental results show that the accelerated algorithm achieves good speedup at low cost. It is suitable for the analysis and mining of large data in distance higher education.
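The idea of a parallel K-means on a MapReduce platform can be illustrated with a minimal sketch. This is not the authors' design, which the abstract does not spell out; it assumes the common formulation in which each mapper assigns the points of one data split to the nearest centroid, a combiner sums them locally, and a reducer averages the sums into new centroids. The names `map_phase`, `combine`, and `kmeans_mapreduce` are hypothetical, and the splits are processed sequentially here rather than on a Hadoop cluster.

```python
from math import dist


def map_phase(points, centroids):
    """Mapper: for each point in one split, emit (nearest centroid index, (point, 1))."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (tuple(p), 1)


def combine(pairs):
    """Combiner/reducer: sum coordinate vectors and counts per centroid index."""
    acc = {}
    for idx, (vec, n) in pairs:
        if idx in acc:
            total, count = acc[idx]
            acc[idx] = (tuple(a + b for a, b in zip(total, vec)), count + n)
        else:
            acc[idx] = (tuple(vec), n)
    return acc


def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-9):
    """Run K-means where every iteration is one map/combine/reduce pass.

    `splits` stands in for the input blocks a cluster would hand to mappers
    (an assumption of this sketch); `centroids` are the initial centers.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        partials = []
        for split in splits:  # on a cluster, each split would run in parallel
            partials.extend(combine(map_phase(split, centroids)).items())
        totals = combine(partials)  # global reduce over the partial sums
        new_centroids = [
            tuple(x / totals[i][1] for x in totals[i][0]) if i in totals else centroids[i]
            for i in range(len(centroids))
        ]
        shift = max(dist(a, b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:  # stop once the centers no longer move
            break
    return centroids
```

Because only per-cluster sums and counts cross split boundaries, the data exchanged per iteration depends on the number of clusters rather than the number of points, which is what makes this formulation attractive for large data sets.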


Big data applications play an important role in real-time data processing. Apache Spark is a data processing framework with an in-memory data engine that quickly processes large data sets. It can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. Spark's in-memory processing cannot share data between applications, and hence RAM will be insufficient for storing petabytes of data. Alluxio is a virtual distributed storage system that leverages memory for data storage and provides faster access to data held in different storage systems. Alluxio helps to speed up data-intensive Spark applications across various storage systems. In this work, the performance of applications on Spark alone and on Spark running over Alluxio is studied with respect to several storage formats (Parquet, ORC, CSV, and JSON) and four types of queries from the Star Schema Benchmark (SSB). A benchmark is developed to assess the suitability of the Spark and Alluxio combination for big data applications. The study finds that Alluxio is suitable for applications that use databases larger than 2.6 GB stored in JSON and CSV formats, while Spark alone is suitable for applications that use storage formats such as Parquet and ORC with database sizes below 2.6 GB.
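The reported findings can be read as a simple rule of thumb. The sketch below only encodes the two cases stated in the abstract; the function name `suggest_engine` and the returned labels are hypothetical, and combinations the abstract does not mention deliberately return `None` rather than a guess.

```python
THRESHOLD_GB = 2.6  # database size boundary reported in the abstract


def suggest_engine(storage_format, size_gb):
    """Return a suggestion for the two cases reported in the study, else None."""
    fmt = storage_format.lower()
    if fmt in {"json", "csv"} and size_gb > THRESHOLD_GB:
        return "spark+alluxio"
    if fmt in {"parquet", "orc"} and size_gb < THRESHOLD_GB:
        return "spark"
    return None  # combination not covered by the reported findings
```

For example, `suggest_engine("CSV", 5.0)` yields `"spark+alluxio"`, while a 5 GB Parquet database falls outside the stated findings and yields `None`.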


2017 ◽  
Vol 1 (21) ◽  
pp. 19-35 ◽  
Author(s):  
Zbigniew Marszałek

The merge sort algorithm is widely used in databases to organize and search for information. In this work, the author describes a newly proposed non-recursive version of the merge sort algorithm for large data sets. Tests of the algorithm confirm the effectiveness of the method and the stability of the proposed version.
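For readers unfamiliar with non-recursive merge sort, the following is a minimal sketch of the standard bottom-up formulation. It is not the version proposed in the paper, whose details the abstract does not give; the names `merge` and `merge_sort_iterative` and the optional `key` parameter are assumptions made for illustration.

```python
def merge(left, right, key=lambda x: x):
    """Merge two sorted lists; on ties the left element goes first, keeping order stable."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if key(right[j]) < key(left[i]):
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort_iterative(items, key=lambda x: x):
    """Bottom-up merge sort: merge runs of width 1, 2, 4, ... without recursion."""
    data = list(items)  # work on a copy so the input is left untouched
    n = len(data)
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            data[lo:hi] = merge(data[lo:mid], data[mid:hi], key)
        width *= 2
    return data
```

The loop replaces the call stack of the recursive version with an explicit run width, which avoids deep recursion on very large inputs.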


2019 ◽  
Vol 12 (1) ◽  
pp. 42 ◽  
Author(s):  
Andrey I. Vlasov ◽  
Konstantin A. Muraviev ◽  
Alexandra A. Prudius ◽  
Demid A. Uzenkov

Author(s):  
Vivek Raich ◽  
Pankaj Maurya

In the age of information technology, data is being stored on an ever larger scale. As a result, huge amounts of data are available to decision makers, which has driven the progress of information technology and its wide growth in many areas of business, engineering, medicine, and scientific study. Big data is not only large in size but also varied in type, which makes it difficult to handle without dedicated technology. Because the data keeps growing in this way, it is important to study and manage these datasets according to requirements so that the necessary information can be obtained. The aim of this paper is to analyze some of the analytic methods and tools that can be applied to large data. In addition, applications of big data are analyzed, with attention to how decision makers working with big data can use the resulting insights in different applications.

