Application-Oriented Succinct Data Structures for Big Data

A data structure is called succinct if its asymptotic space requirement matches the size of the original data. Developing succinct data structures is an important part of coping with explosively growing big data. Moreover, increasingly varied kinds of big data have recently been produced in many fields, and there is a substantial need for more application-specific succinct data structures. In this study, we review recently proposed application-oriented succinct data structures motivated by big data applications in three fields: privacy-preserving computation in cryptography, genome assembly in bioinformatics, and work space reduction for compressed communications.

SetSketch

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476276 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2244-2257

Author(s):

Otmar Ertl

Keyword(s):

Big Data ◽

Data Structure ◽

Data Structures ◽

Similarity Search ◽

State Of The Art ◽

Use Cases ◽

Distributed Environments ◽

Jaccard Similarity ◽

Big Data Applications ◽

Better Than

MinHash and HyperLogLog are sketching algorithms that have become indispensable for set summaries in big data applications. While HyperLogLog allows counting different elements with very little space, MinHash is suitable for the fast comparison of sets as it allows estimating the Jaccard similarity and other joint quantities. This work presents a new data structure called SetSketch that is able to continuously fill the gap between both use cases. Its commutative and idempotent insert operation and its mergeable state make it suitable for distributed environments. Fast, robust, and easy-to-implement estimators for cardinality and joint quantities, as well as the ability to use SetSketch for similarity search, enable versatile applications. The presented joint estimator can also be applied to other data structures such as MinHash, HyperLogLog, or Hyper-MinHash, where it even performs better than the corresponding state-of-the-art estimators in many cases.
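
The MinHash idea mentioned in this abstract can be illustrated with a minimal sketch. This is plain MinHash, not SetSketch and not the joint estimator from the paper. The function names, the choice of BLAKE2b as the hash family, and the signature length are assumptions made here for illustration only. The element-wise-minimum merge shows the kind of commutative, idempotent, mergeable state the abstract refers to.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """64-bit hash of `item`; `seed` selects one member of the hash family.

    Using BLAKE2b with the seed as salt is an illustrative assumption.
    """
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    """Signature of a non-empty set: the minimum hash value per hash function."""
    items = list(items)
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of |A ∩ B| / |A ∪ B|."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def merge(sig_a, sig_b):
    """Signature of the union: element-wise minimum (commutative and idempotent)."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


set_a = {str(i) for i in range(0, 1000)}
set_b = {str(i) for i in range(500, 1500)}
sig_a = minhash_signature(set_a)
sig_b = minhash_signature(set_b)

# The true Jaccard similarity is 500 / 1500; the estimate should be close to it.
print(estimate_jaccard(sig_a, sig_b))
```

Because the signature of a union is the element-wise minimum of the two signatures, sketches built on different machines can be combined without revisiting the original elements.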

Data Representation in Big data via Succinct Data Structures

International Journal of Engineering Science and Technology ◽

10.21817/ijest/2018/v10i1/181001013 ◽

2018 ◽

Vol 10 (1) ◽

pp. 21-28 ◽

Cited By ~ 1

Author(s):

Vinesh Kumar ◽

Dr. Amit Asthana ◽

Sunil Kumar ◽

Dr. Jayant Shekhar

Keyword(s):

Big Data ◽

Data Structures ◽

Data Representation ◽

Succinct Data Structures

Compendious and Optimized Succinct Data Structures for Big Data Store

SSRN Electronic Journal ◽

10.2139/ssrn.3170513 ◽

2018 ◽

Cited By ~ 1

Author(s):

Vinesh Kumar ◽

Amit Asthana ◽

Sunil Kumar ◽

Sunil Kumar

Keyword(s):

Big Data ◽

Data Structures ◽

Succinct Data Structures ◽

Data Store

R3D3: A Doubly Opportunistic Data Structure for Compressing and Indexing Massive Data

Infocommunications journal ◽

10.36244/icj.2019.2.7 ◽

2019 ◽

pp. 58-66

Author(s):

Máté Nagy ◽

János Tapolcai ◽

Gábor Rétvári

Keyword(s):

Data Structure ◽

Data Structures ◽

Real Data ◽

Small Error ◽

Data Sets ◽

Space Reduction ◽

Wide Range ◽

Arbitrary Position ◽

Efficient Data ◽

Space Requirements

Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures, which attain the best possible compression not only on the input data but also on the index. We present R3D3, which encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries at arbitrary positions in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times the space reduction of known compression techniques on a wide range of synthetic and real data sets, while supporting operations on the compressed data at comparable speed.
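
To make the quantities in this abstract concrete, the following minimal sketch computes the empirical zeroth-order entropy H0 of a bitvector (so that n * H0 is the space target the abstract mentions) and answers rank queries using a sampled index. It is not R3D3: the bits are stored uncompressed, and the class name, block size, and sampling scheme are assumptions chosen only to show what "data plus a separate index" means.

```python
import math


def h0(bits):
    """Empirical zeroth-order (Shannon) entropy of a 0/1 sequence, in bits per symbol."""
    n = len(bits)
    ones = sum(bits)
    if ones == 0 or ones == n:
        return 0.0
    p = ones / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


class RankIndex:
    """Uncompressed bitvector plus a sampled index of cumulative 1-counts.

    Hypothetical illustration only; a real succinct structure would also
    compress the bits and the samples.
    """

    def __init__(self, bits, block=64):
        self.bits = list(bits)
        self.block = block
        self.samples = [0]  # samples[q] = number of 1s in bits[0 : q * block]
        count = 0
        for i, b in enumerate(self.bits, start=1):
            count += b
            if i % block == 0:
                self.samples.append(count)

    def rank1(self, i):
        """Number of 1 bits in bits[0:i], scanning at most one block."""
        q = i // self.block
        return self.samples[q] + sum(self.bits[q * self.block : i])


bits = [1, 0, 0, 0] * 100          # n = 400, one quarter of the bits are 1
index = RankIndex(bits)
print(len(bits) * h0(bits))        # entropy bound n * H0 in bits, below n = 400
print(index.rank1(65))             # number of 1s among the first 65 bits
```

In this sketch the query cost is bounded by the block size, which plays a role loosely analogous to the time parameter C in the abstract: larger blocks mean a smaller index but slower queries.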

Compendious and Succinct Data Structures for Big Data

Advances in Intelligent Systems and Computing - Advances in Computational Intelligence and Communication Technology ◽

10.1007/978-981-15-1275-9_37 ◽

2020 ◽

pp. 457-467

Author(s):

Vinesh Kumar ◽

Akhilesh Kumar Singh ◽

Sharad Pratap Singh

Keyword(s):

Big Data ◽

Data Structures ◽

Succinct Data Structures

Introducing Data Structures for Big Data

Effective Big Data Management and Opportunities for Implementation - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-0182-4.ch002 ◽

2016 ◽

pp. 25-52 ◽

Cited By ~ 1

Author(s):

Ranjit Biswas

Keyword(s):

Big Data ◽

Data Structure ◽

Data Structures ◽

Binary Tree ◽

Data Science ◽

Heterogeneous Data ◽

Binary Trees ◽

Homogeneous Trees ◽

The Subject ◽

Homogeneous Data

The homogeneous data structure ‘train’ and the heterogeneous data structure ‘atrain’ are fundamental, very powerful, dynamic, and flexible data structures, and are the first data structures introduced exclusively for big data. Thus ‘Data Structures for Big Data’ is to be regarded as a new subject in Big Data Science, not just as a new topic, considering the explosive momentum of big data. Based upon the notion of the big data structures train and atrain, the author introduces the following data structures for programmers working with big data: homogeneous stacks ‘train stack’ and ‘rT-coach stack’, heterogeneous stacks ‘atrain stack’ and ‘rA-coach stack’, homogeneous queues ‘train queue’ and ‘rT-coach queue’, heterogeneous queues ‘atrain queue’ and ‘rA-coach queue’, homogeneous binary trees ‘train binary tree’ and ‘rT-coach binary tree’, heterogeneous binary trees ‘atrain binary tree’ and ‘rA-coach binary tree’, homogeneous trees ‘train tree’ and ‘rT-coach tree’, and heterogeneous trees ‘atrain tree’ and ‘rA-coach tree’, to enrich the subject ‘Data Structures for Big Data’ for big data science.

Data Representation in Big data via succinct data structures

GBAMS- Vidushi ◽

10.26829/vidushi.v9i02.12288 ◽

2017 ◽

Vol 9 (02) ◽

Author(s):

Vinesh Kumar ◽

Jayant Shekhar ◽

Sunil Kumar

Keyword(s):

Big Data ◽

Data Structures ◽

Time Complexity ◽

Data Representation ◽

Complex Problem ◽

Main Memory ◽

Succinct Data Structures ◽

Efficient Data ◽

Tree Data ◽

Day By Day

Representing data in memory is one of the central tasks in big data. Data representation involves several types of tree data structures through which the system can access data accurately and efficiently. Succinct data structures can play an important role in data representation when big data is processed in main memory. Data representation is a very complex problem in big data, and we propose solutions to some of its problems. Data processing in big data can be used to support decisions in data mining. Given known functions and rules for query processing, we must either change the processing method or change the way the data is represented. In this paper, different kinds of tree data structures are presented for representing big data in the main memory of a computer system using succinct data structures. We first compare all the data structures in a table; each method has different space and time complexity. Because big data information services are growing day by day, the space efficiency of succinct data structures is making them very popular in practice.
