Data Reduction for Big Data



Text-Data Reduction Method to Grasp the Sequence of a Disaster Situation: Case Study of Web News Analysis of the 2015 Typhoons 17 and 18

Journal of Disaster Research ◽

10.20965/jdr.2017.p0329 ◽

2017 ◽

Vol 12 (2) ◽

pp. 329-334

Author(s):

Shosuke Sato ◽

◽

Toru Okamoto ◽

Shunichi Koshimura ◽

Keyword(s):

Big Data ◽

Data Reduction ◽

Reduction Method ◽

Text Data ◽

News Analysis ◽

Disaster Situation ◽

Data Source ◽

Data Reduction Method ◽

Web News

This study aims to compress web news, which serves as a big-data source after disasters. The authors design and adopt an article-clustering method that combines conventional means with an algorithm that selects the representative articles of each cluster. In experiments with human evaluators, the proposed algorithm reached agreement rates in the 50% range for clustering and in the 30% to 40% range for representative-article selection.
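The abstract does not specify the clustering method beyond "conventional means", nor the exact representative-selection rule, so the following is only an illustrative sketch under stated assumptions. Articles are treated as bag-of-words vectors compared by cosine similarity. Clusters are formed by simple single-pass (leader) clustering with a hypothetical similarity `threshold`. Each cluster's representative is its medoid, meaning the article with the highest total similarity to the other members. All function names and the sample headlines are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(vectors, threshold=0.3):
    """Single-pass clustering: each article joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representative(members, vectors):
    """Medoid: the member with the highest total similarity to the other members."""
    return max(
        members,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members if j != i),
    )


def reduce_articles(articles, threshold=0.3):
    """Return one representative article index per cluster."""
    vectors = [vectorize(a) for a in articles]
    return [representative(m, vectors) for m in leader_cluster(vectors, threshold)]


if __name__ == "__main__":
    # Hypothetical headlines, not data from the study.
    articles = [
        "river flooding in town a",
        "flooding of the river in town a",
        "landslide blocks road in village b",
        "road blocked by landslide near village b",
    ]
    print(reduce_articles(articles))  # prints [0, 2]
```

The output lists one representative index per cluster, so four articles reduce to two. A single-pass method depends on input order. If order independence matters, a batch clustering method over TF-IDF vectors would be a natural alternative.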


Parallel Data Reduction Techniques for Big Datasets

Big Data ◽

10.4018/978-1-4666-9840-6.ch034 ◽

2016 ◽

pp. 734-756 ◽

Cited By ~ 1

Author(s):

Ahmet Artu Yıldırım ◽

Cem Özdoğan ◽

Dan Watson

Keyword(s):

Big Data ◽

Data Reduction ◽

Memory Systems ◽

Discrete Wavelet ◽

Server Side ◽

Advantages And Disadvantages ◽

Reduction Techniques ◽

Parallel Data ◽

Processing Techniques

Data reduction is perhaps the most critical component of retrieving information from big data (i.e., petascale-sized data) in many data-mining processes. The central aim of data reduction techniques is to save time and bandwidth, enabling users to work with larger datasets even in resource-constrained environments such as desktops or small clusters. In this chapter, the authors examine why these reduction techniques are important for analyzing big datasets. They then present several basic reduction techniques in detail, stressing the advantages and disadvantages of each. The authors also consider signal-processing techniques for mining big data, in particular the discrete wavelet transform, as well as server-side data reduction techniques. Lastly, they discuss parallel algorithms for data reduction, with special emphasis on parallel wavelet-based multi-resolution data reduction on distributed-memory systems using MPI and on shared-memory GPU architectures, and they demonstrate the resulting performance and scalability improvements in a case study.
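Wavelet-based multi-resolution reduction can be sketched on a single process as follows. The sketch assumes the orthonormal Haar wavelet and one-dimensional data whose length is divisible by 2**levels. It does not reproduce the chapter's own wavelet choice or its MPI and GPU implementations, and the function names are hypothetical. Keeping only the level-`levels` approximation coefficients shrinks the data by a factor of 2**levels. Reconstructing with the discarded detail coefficients set to zero then yields a coarser version of the original.

```python
import math

SCALE = 1 / math.sqrt(2)  # orthonormal Haar filter coefficient


def haar_step(signal):
    """One level of the Haar DWT: pairwise approximation and detail coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Exact inverse of haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out


def wavelet_reduce(signal, levels):
    """Keep only the level-`levels` approximation (len(signal) / 2**levels values)."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)  # detail coefficients are discarded
    return approx


def wavelet_reconstruct(approx, levels):
    """Upsample back to full length, treating all discarded details as zero."""
    out = list(approx)
    for _ in range(levels):
        out = haar_inverse_step(out, [0.0] * len(out))
    return out


if __name__ == "__main__":
    signal = [4, 6, 10, 12, 8, 8, 2, 0]
    coarse = wavelet_reduce(signal, 2)
    print([round(c, 6) for c in coarse])  # prints [16.0, 9.0]
    print([round(x, 6) for x in wavelet_reconstruct(coarse, 2)])
    # prints [8.0, 8.0, 8.0, 8.0, 4.5, 4.5, 4.5, 4.5] (the block means)
```

Each Haar approximation coefficient at level L depends only on one aligned block of 2**L consecutive samples. A distributed-memory version could therefore split the input into chunks whose lengths are multiples of 2**L and reduce each chunk independently, with no boundary exchange between processes. Longer wavelet filters would instead require exchanging halo samples between neighboring processes.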


Big Data Reduction Methods: A Survey

Data Science and Engineering ◽

10.1007/s41019-016-0022-0 ◽

2016 ◽

Vol 1 (4) ◽

pp. 265-284 ◽

Cited By ~ 50

Author(s):

Muhammad Habib ur Rehman ◽

Chee Sun Liew ◽

Assad Abbas ◽

Prem Prakash Jayaraman ◽

Teh Ying Wah ◽

...

Keyword(s):

Big Data ◽

Data Reduction ◽

Reduction Methods


An Improved Secure High-Order-Lanczos Based Orthogonal Tensor SVD for Outsourced Cyber-Physical-Social Big Data Reduction

IEEE Transactions on Big Data ◽

10.1109/tbdata.2018.2881441 ◽

2018 ◽

pp. 1-1 ◽

Cited By ~ 5

Author(s):

Jun Feng ◽

Laurence T. Yang ◽

Guohui Dai ◽

Jinjun Chen ◽

Zheng Yan

Keyword(s):

Big Data ◽

Data Reduction ◽

High Order ◽

Social Big Data


Red-RF: Reduced Random Forest for Big Data Using Priority Voting & Dynamic Data Reduction

2015 IEEE International Congress on Big Data ◽

10.1109/bigdatacongress.2015.26 ◽

2015 ◽

Cited By ~ 1

Author(s):

Hussein Mohsen ◽

Hasan Kurban ◽

Kurt Zimmer ◽

Mark Jenne ◽

Mehmet M. Dalkilic

Keyword(s):

Big Data ◽

Random Forest ◽

Data Reduction ◽

Dynamic Data


Summary of Data Reduction Based on Cloud Environment

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1079-1080.779 ◽

2014 ◽

Vol 1079-1080 ◽

pp. 779-781

Author(s):

Shu Li Huang

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Big Data ◽

Data Reduction ◽

Attribute Reduction ◽

Cloud Environment ◽

Mining Technology ◽

Data Mining Techniques ◽

Using Data

In today's era of big data, quickly finding the data one needs within a mass of information is difficult. Cloud computing offers data mining technology a new direction for achieving this goal. This article describes how attribute reduction can be performed in a cloud environment using data mining techniques.
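The abstract does not say which attribute-reduction technique is used. One widely used family comes from rough set theory, and the sketch below assumes that approach. It implements the greedy QuickReduct heuristic, which adds condition attributes one at a time until the rough-set dependency degree of the decision attribute matches that of the full attribute set. The sketch runs on a single machine rather than in a cloud environment. The data is a toy example, and the function names are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes: group row indices by their values on `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: share of rows in classes with a single decision value."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)  # block lies in the positive region
    return consistent / len(rows)


def quick_reduct(rows, conditions, decision):
    """Greedy QuickReduct: add the attribute that raises dependency most
    until the subset is as informative as the full condition set."""
    target = dependency(rows, conditions, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        remaining = [a for a in conditions if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


if __name__ == "__main__":
    # Toy decision table (hypothetical data).
    rows = [
        {"outlook": "sunny", "windy": "no", "humid": "high", "play": "no"},
        {"outlook": "sunny", "windy": "yes", "humid": "high", "play": "no"},
        {"outlook": "rain", "windy": "no", "humid": "high", "play": "yes"},
        {"outlook": "rain", "windy": "yes", "humid": "normal", "play": "no"},
        {"outlook": "overcast", "windy": "no", "humid": "normal", "play": "yes"},
    ]
    print(quick_reduct(rows, ["outlook", "windy", "humid"], "play"))
    # prints ['outlook', 'windy']
```

Here `outlook` together with `windy` already determines `play` for every row, so `humid` can be dropped. The pair `outlook` and `humid` would work equally well; ties are broken by the order of `conditions`. The greedy result is not guaranteed to be a minimal reduct.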
