A Survey on Efficient Data Deduplication in Data Analytics

Author(s):  
Ch. Prathima ◽  
L. S. S. Reddy
Author(s):  
Bosco Nirmala Priya et al.

Today, given the enormous demand for big data storage, there is a high chance of data duplication. Redundancy therefore increases the storage space required and, with it, the storage cost. Recent studies have shown that moderate to high data redundancy clearly exists in primary storage systems on big data servers. Our experimental studies reveal that data redundancy is much more intense on the I/O path than on disks, owing to the relatively high temporal access locality associated with small I/O requests to redundant data. Furthermore, directly applying data deduplication to primary storage systems on big data servers will likely cause space contention in memory and data fragmentation on disks. We propose a performance-oriented I/O deduplication scheme with cryptography, called CDEP (crowd deduplication with effective data placement), rather than a capacity-oriented I/O deduplication scheme. This technique produces data fragments as the deduplication system evolves, so it is important to analyze the data fragments in the deduplication system and to understand their characteristics. Our experimental evaluation using real-world traces shows that, compared with sequence-based deduplication algorithms, the duplicate elimination ratio and the read performance (latency) can both be improved at the same time.
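As a rough illustration of the general idea behind block-level deduplication and the duplicate elimination ratio mentioned above, the following is a minimal sketch only. It is not the CDEP scheme from the abstract; the function names, the fixed block size, and the choice of SHA-256 fingerprints are all assumptions made for this example.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each unique block once.

    Hypothetical helper for illustration; not the CDEP algorithm.
    """
    store = {}   # fingerprint -> block contents (unique blocks only)
    recipe = []  # ordered fingerprints needed to rebuild the input
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # store only the first copy
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(store[fp] for fp in recipe)


def elimination_ratio(store, recipe) -> float:
    """Fraction of blocks that were duplicates and did not need storing."""
    return 1 - len(store) / len(recipe) if recipe else 0.0
```

Calling `deduplicate` on a byte string returns the unique blocks plus the recipe; `restore` reverses the process, and `elimination_ratio` reports how many blocks were eliminated. Real systems typically add content-defined chunking, an on-disk index, and placement policies, none of which are modeled here.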


2018 ◽  
Vol 7 (3.12) ◽  
pp. 239
Author(s):  
Chitransh Rajesh ◽  
Yash Jain ◽  
J Jayapradha

Data analytics is the process of analyzing unprocessed data to draw conclusions by studying and inspecting various patterns in the data. Several algorithms and conceptual methods are often followed to derive valid and accurate results. Efficient data handling is important for interactive visualization of data sets. Considering recent research and analytical theories on column-oriented database management systems, we are developing a new data engine using R and Tableau to predict airport trends. The engine uses univariate datasets (for example, the Perth Airport Passenger Movement Dataset and the Newark Airport Cargo Stats Dataset) to analyze and predict trends. Data analysis and prediction are performed using time series analysis, with a separate ARIMA model for each module. The modules are developed in RStudio, whereas Tableau is used for interactive visualization and end-user report generation. The Airport Trends Analytics Engine is integrated with R and Tableau 10.4 and is optimized for use in desktop and server environments.


2019 ◽  
Vol 30 (12) ◽  
pp. 2677-2691 ◽  
Author(s):  
Qiufen Xia ◽  
Zichuan Xu ◽  
Weifa Liang ◽  
Shui Yu ◽  
Song Guo ◽  
...  

2020 ◽  
Vol 29 (6) ◽  
pp. 1287-1310
Author(s):  
Sebastian Kruse ◽  
Zoi Kaoudi ◽  
Bertty Contreras-Rojas ◽  
Sanjay Chawla ◽  
Felix Naumann ◽  
...  

Abstract: Data analytics are moving beyond the limits of a single platform. In this paper, we present the cost-based optimizer of Rheem, an open-source cross-platform system that copes with these new requirements. The optimizer allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore alternative execution strategies; (ii) a novel graph-based approach to determine efficient data movement plans among subtasks and platforms; and (iii) an efficient plan enumeration algorithm, based on a novel enumeration algebra. We extensively evaluate our optimizer under diverse real tasks. We show that our optimizer can perform tasks more than one order of magnitude faster when using multiple platforms than when using a single platform.
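To make the idea of cost-based allocation of subtasks to platforms more concrete, here is a toy sketch. It is not Rheem's optimizer or its enumeration algebra: it assumes a simple linear chain of subtasks, and the function name, cost tables, and platform labels are hypothetical inputs chosen for illustration.

```python
def cheapest_assignment(exec_cost, move_cost):
    """Pick one platform per subtask in a linear chain at minimum total cost.

    exec_cost[i][p]: assumed cost of running subtask i on platform p.
    move_cost[p][q]: assumed cost of moving data from platform p to q.
    Returns (total_cost, [platform chosen for each subtask]).
    """
    platforms = list(exec_cost[0])
    # best[p] = (cost, plan) of the cheapest plan so far that ends on p
    best = {p: (exec_cost[0][p], [p]) for p in platforms}
    for costs in exec_cost[1:]:
        nxt = {}
        for q in platforms:
            # cheapest way to arrive at platform q, including data movement
            prev_cost, prev_plan = min(
                ((best[p][0] + move_cost[p][q], best[p][1]) for p in platforms),
                key=lambda t: t[0],
            )
            nxt[q] = (prev_cost + costs[q], prev_plan + [q])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Illustrative, made-up costs for two subtasks and two platform labels.
exec_cost = [{"spark": 10, "java": 2}, {"spark": 3, "java": 20}]
move_cost = {"spark": {"spark": 0, "java": 4}, "java": {"spark": 4, "java": 0}}
total, plan = cheapest_assignment(exec_cost, move_cost)
```

With these made-up numbers, mixing platforms is cheaper than staying on either one, which mirrors the abstract's point that data movement cost has to be weighed against per-platform execution cost. A real optimizer works on general task graphs rather than a chain.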


2014 ◽  
Vol 41 (9) ◽  
pp. 611-616
Author(s):  
Jeonghyeon Ma ◽  
Sejin Park ◽  
Chanik Park

2018 ◽  
Vol 11 (12) ◽  
pp. 2070-2073 ◽  
Author(s):  
Walter dos Santos ◽  
Gustavo P. Avelar ◽  
Manoel Horta Ribeiro ◽  
Dorgival Guedes ◽  
Wagner Meira

Author(s):  
Jagdish Patel ◽  
Komal Murtadak ◽  
Sayali Deore ◽  
Vaishnavi Thorat

It is often said that companies that do not understand the importance of data analysis are less likely to survive in the modern economy. Your data is your most valuable asset. Data management is important because the data your organization creates is a very valuable resource. The last thing you want to do is spend time and resources collecting data and business intelligence, only to lose or misplace that information. In that case, you would have to spend time and resources again to obtain the same business intelligence you already had. However, only well-prepared and well-analyzed data leads to process knowledge and, finally, to process control and continuous improvement. Thus, a robust and efficient data analytics strategy is one of the most valuable concepts for the process industry.

