Dimension Reduction and Storage Optimization Techniques for Distributed and Big Data Cluster Environment

Author(s):  
S. Kalyan Chakravarthy ◽  
N. Sudhakar ◽  
E. Srinivasa Reddy ◽  
D. Venkata Subramanian ◽  
P. Shankar
2018 ◽  
Vol 7 (3.27) ◽  
pp. 252
Author(s):  
Ranjeet V. Powar ◽  
B Arunkumar

Nowadays the volume of digital data generated and used by enterprises is increasing at an enormous rate. Surveys indicate that more than 80% of the data generated in the last two years is unstructured in nature, so the storage space required to hold this large volume of unstructured data is very high. This has drawn attention to large-scale storage systems. Deduplication is a space-efficient method widely used to address the storage optimization problem. This paper focuses on the effects of massive volumes of unstructured data, reviews various storage optimization techniques, and surveys various storage types. In addition, it elaborates on specific challenges regarding storage optimization using deduplication and on technologies that handle huge amounts of unstructured data.
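The abstract above does not specify an implementation, but the core idea of deduplication can be sketched in a few lines: fingerprint each data chunk with a cryptographic hash, store each unique chunk once, and keep an ordered list of fingerprints (a "recipe") from which the original stream can be rebuilt. The chunk data below is purely illustrative.

```python
import hashlib

def deduplicate_chunks(chunks):
    """Store each unique chunk once, keyed by its SHA-256 fingerprint."""
    store = {}   # fingerprint -> chunk bytes (each unique chunk stored once)
    recipe = []  # ordered fingerprints needed to reconstruct the stream
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:
            store[fp] = chunk
        recipe.append(fp)
    return store, recipe

def reconstruct(store, recipe):
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[fp] for fp in recipe)

# Four logical blocks, but only two distinct contents:
data = [b"blockA", b"blockB", b"blockA", b"blockA"]
store, recipe = deduplicate_chunks(data)
# len(store) == 2: only two unique chunks are physically stored
```

Real systems add content-defined chunking, compression of unique chunks, and an on-disk index, but the space saving comes from exactly this store/recipe split.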


2019 ◽  
Vol 9 (2) ◽  
pp. 43-59 ◽  
Author(s):  
Kaium Hossain ◽  
Mizanur Rahman ◽  
Shanto Roy

This article presents a detailed survey of different data compression and storage optimization techniques in the cloud, their implications, and a discussion of future directions. The development of smart city and smart home systems rests on the development of the Internet of Things (IoT). With the increasing number of IoT devices, a tremendous volume of data is being generated every single day. Therefore, it is necessary to optimize system performance by managing, compressing, and mining IoT data for smart decision support systems. In this article, the authors surveyed recent approaches with up-to-date outcomes and findings related to the management, mining, compression, and optimization of IoT data. The authors then discuss the scope and limitations of present works, and finally, this article presents the future perspectives of IoT data management on the basis of cloud, fog, and mobile edge computing.
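The survey above discusses compressing IoT data before storage; a minimal sketch of the idea, using the standard-library `zlib` codec on a batch of hypothetical sensor readings, shows why repetitive telemetry compresses well:

```python
import json
import zlib

# Hypothetical, highly repetitive IoT telemetry: sensor id + temperature.
readings = [{"sensor": i % 4, "t": 21.5 + (i % 3) * 0.1} for i in range(1000)]

raw = json.dumps(readings).encode("utf-8")
packed = zlib.compress(raw, level=9)   # lossless DEFLATE compression

ratio = len(packed) / len(raw)         # fraction of original size after packing
restored = json.loads(zlib.decompress(packed))  # lossless round trip
```

In practice, batching many readings before compressing (as here) is what makes the redundancy visible to the codec; compressing each reading individually would save far less.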


2014 ◽  
Vol 1 (2) ◽  
pp. 293-314 ◽  
Author(s):  
Jianqing Fan ◽  
Fang Han ◽  
Han Liu

Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This paper gives an overview of the salient features of Big Data and of how these features drive a change of paradigm in statistical and computational methods as well as in computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity; they can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.
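The spurious-correlation phenomenon the abstract names is easy to demonstrate: with a small sample and many predictors, some predictor will correlate strongly with the response purely by chance. The simulation below (synthetic data, not from the paper) draws 2,000 predictors that are all independent of the response, yet the best-correlated one still looks impressive:

```python
import random

random.seed(0)
n, p = 50, 2000  # small sample size, many candidate predictors

y = [random.gauss(0, 1) for _ in range(n)]  # response, independent of everything

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((z - mb) ** 2 for z in b) ** 0.5
    return cov / (sa * sb)

# Best absolute correlation over p predictors that are all pure noise:
best = max(
    abs(corr([random.gauss(0, 1) for _ in range(n)], y)) for _ in range(p)
)
# best is sizeable even though every predictor is independent of y
```

This is exactly why naive variable screening over high-dimensional data produces false discoveries unless the multiplicity of comparisons is accounted for.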


2013 ◽  
Vol 63 (3) ◽  
Author(s):  
Jelena Fiosina ◽  
Maxims Fiosins ◽  
Jörg P. Müller

The deployment of future Internet and communication technologies (ICT) provides intelligent transportation systems (ITS) with huge volumes of real-time data (Big Data) that need to be managed, communicated, interpreted, aggregated, and analysed. These technologies considerably enhance the effectiveness and user-friendliness of ITS, yielding considerable economic and social impact. Real-world application scenarios are needed to derive requirements for the software architecture and novel features of ITS in the context of the Internet of Things (IoT) and cloud technologies. In this study, we contend that future service- and cloud-based ITS can benefit greatly from sophisticated data processing capabilities. Therefore, new Big Data processing and mining (BDPM) as well as optimization techniques need to be developed and applied to support decision-making capabilities. This study presents real-world scenarios of ITS applications and demonstrates the need for next-generation Big Data analysis and optimization strategies. Decentralised cooperative BDPM methods are reviewed and their effectiveness is evaluated using real-world data models of the city of Hannover, Germany. We point out and discuss future directions and opportunities in the development of BDPM methods for ITS.


2020 ◽  
Author(s):  
Mario A. R. Dantas

This work presents an introduction to the Data Intensive Scalable Computing (DISC) approach. This paradigm represents a valuable effort to tackle the large amount of data produced by many ordinary applications. Accordingly, subjects such as the characterization of big data and storage approaches, in addition to a brief comparison between HPC and DISC, are highlighted and differentiated.


Author(s):  
Ewa Niewiadomska-Szynkiewicz ◽  
Michał P. Karpowicz

Progress in the life and physical sciences and in technology depends on efficient data mining and modern computing technologies. The rapid growth of data-intensive domains requires continuous development of new solutions for network infrastructure, servers, and storage in order to address Big Data-related problems. The development of software frameworks, including smart calculation, communication management, and data decomposition and allocation algorithms, is clearly one of the major technological challenges we are faced with. Reduction in energy consumption is another challenge arising in connection with the development of efficient HPC infrastructures. This paper addresses the vital problem of energy-efficient high-performance distributed and parallel computing. An overview of recent technologies for Big Data processing is presented, with attention focused on the most popular middleware and software platforms. Various energy-saving approaches are presented and discussed as well.


Author(s):  
Pethuru Raj

The implications of the digitization process, among a bevy of trends, are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space: how the tremendous amount of data being produced and processed all over the world impacts the IT and business domains; how next-generation IT infrastructures are accordingly being refactored, remedied, and readied for the impending big data-induced challenges; how the big data analytics discipline is likely to move towards fulfilling the digital universe's requirement of extracting and extrapolating actionable insights for the knowledge-parched; and, finally, the establishment and sustenance of the dreamt-of smarter planet.


Author(s):  
Nada M. Alhakkak

BigGIS is a new product that resulted from developing GIS in the "Big Data" area; it is used for storing and processing big geographical data and helps in solving its issues. This chapter describes an optimized Big GIS framework in a MapReduce environment, M2BG. The suggested framework has been integrated into the MapReduce environment in order to solve the storage issues and benefit from the Hadoop environment. M2BG includes two steps: the Big GIS warehouse and Big GIS MapReduce. The first step contains three main layers: the Data Source and Storage Layer (DSSL), the Data Processing Layer (DPL), and the Data Analysis Layer (DAL). The second step is responsible for clustering, using swarms as inputs for the Hadoop phase. Jobs are then scheduled in the map part using a preemptive priority-scheduling algorithm, with some data types classified as critical and others as ordinary, while the reduce part uses a merge-sort algorithm. M2BG should address security and be implemented with real data, first in a simulated environment and later in the real world.
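The M2BG framework above builds on the MapReduce model; the chapter does not give code, but the map/shuffle/reduce pipeline it relies on can be sketched in miniature (a word-count stand-in for the framework's GIS-specific map and reduce functions, which are assumptions here):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, 1) for every word in every input record."""
    for rec in records:
        for word in rec.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values per key."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["gis big data", "big data cluster", "gis cluster"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"gis": 2, "big": 2, "data": 2, "cluster": 2}
```

In a real Hadoop deployment the shuffle is performed by the framework across nodes, and a scheduler (in M2BG, a preemptive priority scheduler) decides which map tasks run first.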


Author(s):  
Sreenu G. ◽  
M.A. Saleem Durai

Advances in recent hardware technology have made it possible to record transactions and other pieces of information from everyday life at a rapid pace. In addition to growing speed and storage capacity, real-life observations tend to change over time. There is, moreover, much potential and highly functional value hidden in the vast volume of data. Conventional data mining is not suitable for this kind of application, so existing algorithms must be tuned and changed, or new ones designed. Big data computing is entering the category of the most promising technologies, showing the way to new modes of thinking and decision making. This epoch of big data helps users take advantage of all available data to obtain more precise analytical results or discover latent information, and then make the best possible decisions. Drawing from a broad set of workloads, the authors establish a set of classifying measures based on storage architecture, processing types, processing techniques, and the tools and technologies used.

