A Compressed Data Structure for Surface Representation

Time-evolving web and social network graphs are modeled as a set of pages/individuals (nodes) and their arcs (links/relationships) that change over time. Due to their popularity, they have become increasingly massive in terms of their number of nodes, arcs, and lifetimes. However, these graphs are extremely sparse throughout their lifetimes. For example, it is estimated that Facebook has over a billion vertices, yet at any point in time it has far fewer than 0.001% of all possible relationships. These large sparse graphs may not fit in most main memories when stored with underlying representations such as a series of adjacency matrices or adjacency lists. We propose a compressed data structure that keeps a compressed binary tree for each row of each adjacency matrix of the time-evolving graph. We do not explicitly construct the adjacency matrices; our algorithms take the time-evolving arc list representation as input for construction. Our compressed structure supports directed and undirected graphs, faster arc and neighborhood queries, and streaming operations: arcs and frames can be added to and removed from the compressed structure directly. We use publicly available network data sets such as Flickr, Yahoo!, and Wikipedia in our experiments and show that our new technique performs as well as or better than our benchmarks on all data sets in terms of compression size and other vital metrics.
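The idea of one binary tree per adjacency-matrix row, built directly from a time-evolving arc list, can be illustrated with a minimal sketch. This is not the authors' structure: the class names `RowTree` and `TemporalGraph`, the `(u, v, t)` arc format, and the pruning rule (an empty column range is stored as `None`) are assumptions made for illustration, and the sketch omits any bit-level encoding of the trees.

```python
class RowTree:
    """Sparse binary tree over columns [0, n) of one adjacency-matrix row.

    Assumption for this sketch: a column range with no arcs is stored as
    None, so empty regions of the row take no space.
    """

    def __init__(self, n):
        self.n = n
        self.root = None

    def add(self, v):
        self.root = self._add(self.root, 0, self.n, v)

    def _add(self, node, lo, hi, v):
        if hi - lo == 1:
            return True  # leaf: the arc to column lo is present
        if node is None:
            node = [None, None]
        mid = (lo + hi) // 2
        if v < mid:
            node[0] = self._add(node[0], lo, mid, v)
        else:
            node[1] = self._add(node[1], mid, hi, v)
        return node

    def remove(self, v):
        self.root = self._remove(self.root, 0, self.n, v)

    def _remove(self, node, lo, hi, v):
        if node is None or hi - lo == 1:
            return None
        mid = (lo + hi) // 2
        if v < mid:
            node[0] = self._remove(node[0], lo, mid, v)
        else:
            node[1] = self._remove(node[1], mid, hi, v)
        # Prune this node once both halves are empty.
        return node if (node[0] is not None or node[1] is not None) else None

    def has(self, v):
        node, lo, hi = self.root, 0, self.n
        while node is not None and hi - lo > 1:
            mid = (lo + hi) // 2
            if v < mid:
                node, hi = node[0], mid
            else:
                node, lo = node[1], mid
        return node is not None

    def neighbors(self):
        out, stack = [], [(self.root, 0, self.n)]
        while stack:
            node, lo, hi = stack.pop()
            if node is None:
                continue
            if hi - lo == 1:
                out.append(lo)
                continue
            mid = (lo + hi) // 2
            stack.append((node[1], mid, hi))  # right half is visited second
            stack.append((node[0], lo, mid))  # left half is visited first
        return out


class TemporalGraph:
    """One RowTree per (frame t, source node u), built from an arc list."""

    def __init__(self, n, arcs=()):
        self.n = n
        self.rows = {}
        for u, v, t in arcs:
            self.add_arc(u, v, t)

    def add_arc(self, u, v, t):
        tree = self.rows.get((t, u))
        if tree is None:
            tree = self.rows[(t, u)] = RowTree(self.n)
        tree.add(v)

    def remove_arc(self, u, v, t):
        tree = self.rows.get((t, u))
        if tree is not None:
            tree.remove(v)
            if tree.root is None:
                del self.rows[(t, u)]  # drop rows that became empty

    def has_arc(self, u, v, t):
        tree = self.rows.get((t, u))
        return tree is not None and tree.has(v)

    def neighbors(self, u, t):
        tree = self.rows.get((t, u))
        return tree.neighbors() if tree is not None else []
```

For example, `TemporalGraph(8, [(0, 3, 0), (0, 5, 0)])` answers `has_arc(0, 3, 0)` by walking at most three levels of the row tree, and `remove_arc` prunes any subtree that becomes empty, which mirrors the streaming updates described in the abstract. Undirected graphs could be handled in this sketch by inserting each arc in both directions.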


A Hybrid Compressed Data Structure Supporting Rank and Select on Bit Sequences

2020 39th International Conference of the Chilean Computer Science Society (SCCC)

DOI: 10.1109/sccc51225.2020.9281244

Year: 2020

Author(s): Diego Arroyuelo, Manuel Weitzman

Keyword(s): Data Structure, Rank And Select, Compressed Data


A locally encodable and decodable compressed data structure

2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

DOI: 10.1109/allerton.2009.5394919

Year: 2009

Cited By ~ 9

Author(s): Venkat Chandar, Devavrat Shah, Gregory W. Wornell

Keyword(s): Data Structure, Compressed Data


GTC: a novel attempt to maintenance of huge genome collections compressed

DOI: 10.1101/131649

Year: 2017

Author(s): Agnieszka Danek, Sebastian Deorowicz

Keyword(s): Genetic Variation, Data Structure, Compression Ratio, Link Type, Variation Data, Compressed Data

Results: We present GTC, a novel compressed data structure for representing huge collections of genetic variation data. GTC significantly outperforms existing solutions in terms of compression ratio and the time needed to answer various types of queries. We show that the largest publicly available database, with about 60 thousand haplotypes at about 40 million SNPs, can be stored in less than 4 Gbytes, while queries related to variants are answered in a fraction of a second.

Availability: GTC can be downloaded from https://github.com/refresh-bio/GTC or http://sun.aei.polsl.pl/REFRESH/[email protected]
