A Hybrid Compressed Data Structure Supporting Rank and Select on Bit Sequences

Author(s): Diego Arroyuelo, Manuel Weitzman
2022, Vol 16 (2), pp. 1-21
Author(s): Michael Nelson, Sridhar Radhakrishnan, Chandra Sekharan, Amlan Chatterjee, Sudhindra Gopal Krishna

Time-evolving web and social network graphs are modeled as a set of pages/individuals (nodes) and their arcs (links/relationships), both of which change over time. As these networks have grown in popularity, they have become increasingly massive in their numbers of nodes and arcs and in their lifetimes. However, these graphs remain extremely sparse throughout their lifetimes. For example, Facebook is estimated to have over a billion vertices, yet at any point in time it has far less than 0.001% of all possible relationships. Storing such large sparse graphs with underlying representations such as a series of adjacency matrices or adjacency lists may require more space than most main memories provide. We propose a compressed data structure that uses a compressed binary tree for each row of each adjacency matrix of the time-evolving graph. We never explicitly construct the adjacency matrices; instead, our construction algorithms take the time-evolving arc-list representation as input. Our compressed structure supports both directed and undirected graphs, offers faster arc and neighborhood queries, and allows arcs and frames to be added and removed directly in the compressed structure (streaming operations). In our experiments on publicly available network datasets such as Flickr, Yahoo!, and Wikipedia, our new technique performs as well as or better than our benchmarks on all datasets in terms of compressed size and other key metrics.
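The abstract does not specify how each row's binary tree is laid out, so the following is only a hypothetical sketch, assuming a 1-D analogue of the k²-tree with k = 2: each node covers a range of columns, its bit is 1 if that range contains at least one arc, and only the children of 1-nodes are stored, in level order. An arc query then walks from the root, using a rank over the bits to find each node's children. The class name `BinaryTreeRow` and its methods `has_arc` and `neighbors` are illustrative, not the authors' API, and rank is a plain prefix-sum list rather than a succinct rank directory.

```python
from typing import List


class BinaryTreeRow:
    """Hypothetical sketch: one adjacency-matrix row stored as a level-order
    binary tree (a 1-D analogue of a k^2-tree with k = 2). A node's bit is 1
    iff its column range holds at least one arc; only children of 1-nodes
    are stored, so empty regions of a sparse row cost a single 0 bit."""

    def __init__(self, n: int, cols: List[int]):
        if any(c < 0 or c >= n for c in cols):
            raise ValueError("column index out of range")
        # Pad the row width to a power of two (at least 2) so ranges halve evenly.
        size = 2
        while size < n:
            size *= 2
        self.size = size
        self.bits: List[int] = []
        cs = sorted(set(cols))
        self.empty = not cs
        # Breadth-first construction; each frontier item is (lo, hi, arcs in [lo, hi)).
        frontier = [(0, size, cs)] if cs else []
        while frontier and frontier[0][1] - frontier[0][0] > 1:
            nxt = []
            for lo, hi, part in frontier:
                mid = (lo + hi) // 2
                halves = ((lo, mid, [c for c in part if c < mid]),
                          (mid, hi, [c for c in part if c >= mid]))
                for a, b, sub in halves:
                    self.bits.append(1 if sub else 0)
                    if sub:
                        nxt.append((a, b, sub))
            frontier = nxt
        # rank1[i] = number of 1 bits in bits[0:i]; the children of the 1-node
        # at position p start at position 2 * rank1[p + 1] (root is implicit).
        self.rank1 = [0]
        for b in self.bits:
            self.rank1.append(self.rank1[-1] + b)

    def has_arc(self, col: int) -> bool:
        """Arc query: descend toward `col`, stopping at the first 0 bit."""
        if self.empty or not 0 <= col < self.size:
            return False
        lo, hi, first = 0, self.size, 0  # `first` = position of the left child
        while True:
            mid = (lo + hi) // 2
            pos = first if col < mid else first + 1
            if self.bits[pos] == 0:
                return False
            lo, hi = (lo, mid) if col < mid else (mid, hi)
            if hi - lo == 1:
                return True
            first = 2 * self.rank1[pos + 1]

    def neighbors(self) -> List[int]:
        """Neighborhood query: visit every 1-node and collect its leaves."""
        out: List[int] = []
        if self.empty:
            return out
        stack = [(0, self.size, 0)]
        while stack:
            lo, hi, first = stack.pop()
            mid = (lo + hi) // 2
            for pos, a, b in ((first, lo, mid), (first + 1, mid, hi)):
                if self.bits[pos]:
                    if b - a == 1:
                        out.append(a)
                    else:
                        stack.append((a, b, 2 * self.rank1[pos + 1]))
        return sorted(out)
```

For example, `BinaryTreeRow(10, [1, 7, 8])` answers `has_arc(7)` with `True`, `has_arc(2)` with `False`, and `neighbors()` with `[1, 7, 8]`. The saving only appears for wide, sparse rows, where whole empty halves collapse to one bit; supporting the streaming insertions and deletions mentioned in the abstract would need a dynamic bit sequence, which this static sketch does not attempt.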


2017
Author(s): Agnieszka Danek, Sebastian Deorowicz

Results: We present GTC, a novel compressed data structure for representing huge collections of genetic variation data. GTC significantly outperforms existing solutions in terms of compression ratio and the time needed to answer various types of queries. We show that the largest publicly available database, about 60 thousand haplotypes at about 40 million SNPs, can be stored in less than 4 GB, while queries related to variants are answered in a fraction of a second.
Availability: GTC can be downloaded from https://github.com/refresh-bio/GTC or http://sun.aei.polsl.pl/REFRESH/


This article describes proposed approaches to creating distributed models that can, with a given accuracy and under given constraints, replace classical physical models of construction objects. These approaches can be implemented as a consequence of the cyber-physical integration of building systems. The article presents principles for forming the data structure of designed objects and of distributed models, which make it possible to identify elements uniquely and to increase the level of detail of such a model. The data structure diagram for distributed modeling includes, among other things, a level at which signals about physical processes inside cyber-physical building systems are formed and transmitted. A high-level algorithm for creating the structure of the distributed model is presented; it describes developing the data structure, formalizing requirements for the parameters of a design object and its operating modes (including normal operating conditions and extreme conditions such as natural disasters), and selecting objects to form a complete group that provides distributed modeling. The article formulates the main approaches to an important practical application of the cyber-physical integration of building systems, namely the ability to form distributed physical models of designed construction objects, and outlines directions for further research.

