Application-Oriented Succinct Data Structures for Big Data

2019 ◽  
Vol 13 (2) ◽  
pp. 227-236
Author(s):  
Tetsuo Shibuya

A data structure is called succinct if its asymptotic space requirement matches the original data size. Developing succinct data structures is an important factor in coping with explosively growing big data. Moreover, a wider variety of big data has recently been produced in various fields, and there is a substantial need for more application-specific succinct data structures. In this study, we review recently proposed application-oriented succinct data structures motivated by big data applications in three different fields: privacy-preserving computation in cryptography, genome assembly in bioinformatics, and work space reduction for compressed communications.

2021 ◽  
Vol 14 (11) ◽  
pp. 2244-2257
Author(s):  
Otmar Ertl

MinHash and HyperLogLog are sketching algorithms that have become indispensable for set summaries in big data applications. While HyperLogLog allows counting different elements with very little space, MinHash is suitable for the fast comparison of sets as it allows estimating the Jaccard similarity and other joint quantities. This work presents a new data structure called SetSketch that is able to continuously fill the gap between both use cases. Its commutative and idempotent insert operation and its mergeable state make it suitable for distributed environments. Fast, robust, and easy-to-implement estimators for cardinality and joint quantities, as well as the ability to use SetSketch for similarity search, enable versatile applications. The presented joint estimator can also be applied to other data structures such as MinHash, HyperLogLog, or Hyper-MinHash, where it even performs better than the corresponding state-of-the-art estimators in many cases.
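The MinHash idea mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not SetSketch and not the estimator proposed in the paper; it is plain MinHash with one seeded hash per signature slot. The function names (`minhash_signature`, `merge_signatures`, `estimate_jaccard`) and the choice of BLAKE2b as the hash are assumptions made for illustration only.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of an item under a given seed (illustrative choice)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """Return the MinHash signature: the minimum hash value per seeded hash function."""
    items = list(items)
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def merge_signatures(a, b):
    """Signature of the union of two sets: the slot-wise minimum."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return [min(x, y) for x, y in zip(a, b)]


def estimate_jaccard(a, b):
    """Estimate Jaccard similarity as the fraction of slots where signatures agree."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


if __name__ == "__main__":
    set_a = {str(i) for i in range(0, 300)}
    set_b = {str(i) for i in range(100, 400)}  # true Jaccard similarity is 0.5
    sig_a = minhash_signature(set_a, 256)
    sig_b = minhash_signature(set_b, 256)
    print(estimate_jaccard(sig_a, sig_b))
```

Because the union signature is a slot-wise minimum, merging is commutative and idempotent, which is the property the abstract highlights as useful in distributed environments.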


Author(s):  
Vinesh Kumar ◽  
Dr. Amit Asthana ◽  
Sunil Kumar ◽  
Dr. Jayant Shekhar

Author(s):  
Vinesh Kumar ◽  
Amit Asthana ◽  
Sunil Kumar ◽  
Sunil Kumar

2019 ◽  
pp. 58-66
Author(s):  
Máté Nagy ◽  
János Tapolcai ◽  
Gábor Rétvári

Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to attain the best possible compression not only on the input data but also on the index. We present R3D3, which encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries at arbitrary positions in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains a severalfold space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while supporting operations on the compressed data at comparable speed.
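To make the quantities in the abstract above concrete, the following Python sketch computes the zero-order entropy bound nH0 for a bitvector and answers rank queries from a sampled index. It is not R3D3 and does no compression at all: the bits are stored as a plain list, and the class name `SampledRank` and the `block_size` parameter are hypothetical. The block size only illustrates the general trade-off between index size and per-query scanning work.

```python
import math


def zero_order_entropy_bits(bits):
    """Return n * H0, the zero-order entropy bound in bits for a 0/1 sequence."""
    n = len(bits)
    ones = sum(bits)
    if n == 0 or ones == 0 or ones == n:
        return 0.0
    p = ones / n
    h0 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return n * h0


class SampledRank:
    """Uncompressed bitvector with a rank index sampled every block_size bits."""

    def __init__(self, bits, block_size=64):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.bits = list(bits)
        self.block_size = block_size
        # samples[k] holds the number of 1 bits in bits[0 : k * block_size].
        self.samples = [0]
        running = 0
        for i, bit in enumerate(self.bits, start=1):
            running += bit
            if i % block_size == 0:
                self.samples.append(running)

    def rank1(self, pos):
        """Number of 1 bits in bits[0:pos]; scans at most block_size - 1 bits."""
        if not 0 <= pos <= len(self.bits):
            raise IndexError("position out of range")
        block = pos // self.block_size
        start = block * self.block_size
        return self.samples[block] + sum(self.bits[start:pos])


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0] * 10
    index = SampledRank(bits, block_size=8)
    print(index.rank1(20), zero_order_entropy_bits(bits))
```

A larger `block_size` shrinks the sample array but lengthens the scan inside a block, which is the kind of tension between index size and query time that the abstract addresses with far more sophisticated encoding.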


Author(s):  
Ranjit Biswas

The homogeneous data structure 'train' and the heterogeneous data structure 'atrain' are fundamental, powerful, dynamic, and flexible data structures, and the first to be introduced exclusively for big data. Given the explosive momentum of big data, 'Data Structures for Big Data' should therefore be regarded as a new subject in Big Data Science, not just as a new topic. Based on the notion of the big data structures train and atrain, the author introduces the following data structures for programmers working with big data: the homogeneous stacks 'train stack' and 'rT-coach stack', the heterogeneous stacks 'atrain stack' and 'rA-coach stack', the homogeneous queues 'train queue' and 'rT-coach queue', the heterogeneous queues 'atrain queue' and 'rA-coach queue', the homogeneous binary trees 'train binary tree' and 'rT-coach binary tree', the heterogeneous binary trees 'atrain binary tree' and 'rA-coach binary tree', the homogeneous trees 'train tree' and 'rT-coach tree', and the heterogeneous trees 'atrain tree' and 'rA-coach tree'. These enrich the subject 'Data Structures for Big Data' for big data science.


2017 ◽  
Vol 9 (02) ◽  
Author(s):  
Vinesh Kumar ◽  
Jayant Shekhar ◽  
Sunil Kumar

Data representation in memory is one of the central tasks in big data. It includes several types of tree data structures through which a system can access data accurately and efficiently. Succinct data structures can play an important role in data representation when big data is processed in main memory. Data representation is a very complex problem in big data, and we propose some solutions to it. Data processing in big data can be used to support decisions in data mining. The functions and rules for query processing are known, so we must either change the processing method or change the way the data is represented. In this paper, different kinds of tree data structures are presented for representing data in the main memory of a computer system for big data by using succinct data structures. We first compare all the data structures in a table; each method has different space and time complexity. Big data information services are growing day by day, so succinct data structures, with their low space complexity, are becoming very popular in practice.
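As a rough illustration of what a succinct tree representation can look like, the sketch below encodes an ordinal tree as balanced parentheses, using two bits per node. The abstract above does not say which representations the paper compares, so this choice, the `(label, children)` input format, and the function names are all assumptions. Labels are dropped, and the query is a linear scan rather than the constant-time navigation that real succinct structures provide.

```python
def encode_bp(tree):
    """Encode a tree given as (label, [children]) into balanced-parentheses bits.

    1 marks entering a node and 0 marks leaving it, in depth-first order,
    so a tree with n nodes yields exactly 2n bits. Labels are not stored.
    """
    bits = []
    stack = [(tree, False)]
    while stack:
        node, visited = stack.pop()
        if visited:
            bits.append(0)
            continue
        bits.append(1)
        stack.append((node, True))
        # Push children in reverse so the leftmost child is processed first.
        for child in reversed(node[1]):
            stack.append((child, False))
    return bits


def subtree_size(bits, i):
    """Number of nodes in the subtree whose opening bit is at index i (linear scan)."""
    if bits[i] != 1:
        raise ValueError("index does not point at an opening parenthesis")
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return (j - i + 1) // 2
    raise ValueError("unbalanced parentheses sequence")


if __name__ == "__main__":
    tree = ("a", [("b", [("d", []), ("e", [])]), ("c", [])])
    bits = encode_bp(tree)
    print(bits, subtree_size(bits, 1))
```

A pointer-based tree typically spends at least one machine word per edge, whereas this encoding spends two bits per node for the shape; practical succinct trees add a small auxiliary index so that navigation does not require scanning.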


2019 ◽  
Vol 7 (4) ◽  
pp. 278-292 ◽  
Author(s):  
Raffaella Rizzi ◽  
Stefano Beretta ◽  
Murray Patterson ◽  
Yuri Pirola ◽  
Marco Previtali ◽  
...  
