compressed data
Recently Published Documents

TOTAL DOCUMENTS: 343 (FIVE YEARS: 98)
H-INDEX: 20 (FIVE YEARS: 4)
2022 ◽ Vol 16 (2) ◽ pp. 1-21
Author(s): Michael Nelson, Sridhar Radhakrishnan, Chandra Sekharan, Amlan Chatterjee, Sudhindra Gopal Krishna

Time-evolving web and social network graphs are modeled as a set of pages/individuals (nodes) and their arcs (links/relationships) that change over time. Because of their popularity, these graphs have become increasingly massive in terms of their numbers of nodes, arcs, and lifetimes. However, they are extremely sparse throughout their lifetimes: Facebook, for example, is estimated to have over a billion vertices, yet at any point in time it has far less than 0.001% of all possible relationships. The space required to store these large sparse graphs may not fit in most main memories when underlying representations such as a series of adjacency matrices or adjacency lists are used. We propose a compressed data structure with a compressed binary tree corresponding to each row of each adjacency matrix of the time-evolving graph. We do not explicitly construct the adjacency matrices; our algorithms take the time-evolving arc-list representation as input for the construction. Our compressed structure supports directed and undirected graphs, fast arc and neighborhood queries, and the addition and removal of arcs and frames directly on the compressed structure (streaming operations). We use publicly available network data sets such as Flickr, Yahoo!, and Wikipedia in our experiments and show that our new technique performs as well as or better than our benchmarks on all datasets in terms of compression size and other vital metrics.
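The authors' compressed binary trees are not spelled out in the abstract, so the following is only a minimal sketch of the general idea under stated assumptions: one small binary trie over column-index bits (a hypothetical stand-in for the per-row compressed tree) for each adjacency-matrix row, built directly from an arc list and supporting arc queries and streaming updates without ever materializing the matrices.

```python
# Illustrative sketch only: a per-row binary trie keyed on the bits of the
# column index. This is NOT the authors' compressed structure; it merely shows
# how arc queries and streaming updates can work without materializing any
# adjacency matrix. All names and the 32-bit column bound are assumptions.

class RowTrie:
    """One binary tree per adjacency-matrix row; sparse rows stay small."""

    def __init__(self, bits=32):
        self.bits = bits                  # bits used to encode a column index
        self.root = {}                    # nested dicts act as trie nodes

    def _path(self, col):
        # Most-significant bit first, so neighboring columns share prefixes.
        return [(col >> i) & 1 for i in reversed(range(self.bits))]

    def add(self, col):
        node = self.root
        for b in self._path(col):
            node = node.setdefault(b, {})
        node["leaf"] = True

    def remove(self, col):
        node = self.root
        for b in self._path(col):
            if b not in node:
                return                    # arc was never present
            node = node[b]
        node.pop("leaf", None)            # no path pruning in this sketch

    def has_arc(self, col):
        node = self.root
        for b in self._path(col):
            if b not in node:
                return False
            node = node[b]
        return "leaf" in node


class TimeEvolvingGraph:
    """One dict of RowTries per time frame, built straight from an arc list."""

    def __init__(self):
        self.frames = {}                  # frame id -> {source node -> RowTrie}

    def add_arc(self, frame, u, v):
        row = self.frames.setdefault(frame, {}).setdefault(u, RowTrie())
        row.add(v)

    def has_arc(self, frame, u, v):
        row = self.frames.get(frame, {}).get(u)
        return row is not None and row.has_arc(v)


# Usage: build one frame from (frame, source, target) arcs and query it.
g = TimeEvolvingGraph()
for f, u, v in [(0, 1, 5), (0, 1, 9), (0, 2, 5)]:
    g.add_arc(f, u, v)
print(g.has_arc(0, 1, 5), g.has_arc(0, 2, 9))    # True False
```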


2022 ◽ Vol 118 ◽ pp. 102999
Author(s): Yaomei Wang, Worakanok Thanyamanta, Neil Bose

2021 ◽ Vol 2021 ◽ pp. 1-14
Author(s): Li Guo, Kunlin Zhu, Ruijun Duan

To explore economic development trends in the post-epidemic era, this paper improves a traditional clustering algorithm and constructs an intelligent-algorithm-based model for analyzing post-epidemic economic development trends. To solve the clustering problem for large-scale data sets of non-uniform density, the paper proposes an adaptive non-uniform density clustering algorithm based on balanced iterative reduction and uses it to further cluster the compressed data sets. For large-scale data sets, the clustering results accurately reflect the class characteristics of the data set as a whole, and the algorithm greatly improves the time efficiency of clustering. The results show that the improved clustering algorithm is effective for analyzing economic development trends in the post-epidemic era and can continue to play a role in subsequent economic analysis.
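The paper's adaptive algorithm is not detailed in the abstract; the sketch below only illustrates, with assumed parameters and synthetic data, the generic two-stage idea it builds on: a balanced-iterative-reduction (BIRCH-style) pass that compresses the data set into subcluster summaries, followed by density clustering of those summaries.

```python
# Hedged sketch, not the paper's algorithm: BIRCH compresses a large,
# non-uniform-density data set into subcluster centers, then a density-based
# clusterer (DBSCAN here) runs on the compressed summaries. All parameters
# (threshold, eps, min_samples) are arbitrary choices for the demo.
import numpy as np
from sklearn.cluster import Birch, DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for a large data set with non-uniform density.
X, _ = make_blobs(n_samples=20_000, centers=5,
                  cluster_std=[0.4, 0.6, 1.0, 1.5, 2.0], random_state=0)

# Stage 1: balanced iterative reduction compresses the points.
birch = Birch(threshold=0.5, n_clusters=None)
birch.fit(X)
summaries = birch.subcluster_centers_
print(f"{len(X)} points compressed to {len(summaries)} subcluster centers")

# Stage 2: density clustering on the compressed summaries only.
labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(summaries)
print("clusters found on the compressed data:", len(set(labels) - {-1}))
```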


2021 ◽ Vol 26 (1) ◽ pp. 1-47
Author(s): Diego Arroyuelo, Rodrigo Cánovas, Johannes Fischer, Dominik Köppl, Marvin Löbel, ...

The Lempel-Ziv 78 (LZ78) and Lempel-Ziv-Welch (LZW) text factorizations are popular not only for bare compression but also for building compressed data structures on top of them. Their regular factor structure makes them computable within space bounded by the size of the compressed output. In this article, we carry out the first thorough study of low-memory LZ78 and LZW text factorization algorithms, introducing more efficient alternatives to the classical methods as well as new techniques that can run in less memory than is needed to hold the compressed file. Our results build on hash-based representations of tries that may be of independent interest.
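As a point of reference for the factorization being discussed (not the article's low-memory algorithms), here is a minimal LZ78 factorizer; a plain Python dict stands in for the hash-based tries mentioned above.

```python
# Minimal reference LZ78 factorization: each factor is (index of the longest
# previously seen factor, next character), with factor 0 being the empty
# string. The dict maps (parent factor id, char) -> factor id, i.e. it plays
# the role of a hash-based trie.
def lz78_factorize(text):
    trie = {}
    factors = []
    node = 0                                     # current factor, start empty
    for ch in text:
        if (node, ch) in trie:
            node = trie[(node, ch)]              # extend the current factor
        else:
            trie[(node, ch)] = len(factors) + 1
            factors.append((node, ch))           # emit factor: parent + new char
            node = 0                             # restart from the empty factor
    if node:                                     # flush a pending partial factor
        factors.append((node, ""))
    return factors

def lz78_decode(factors):
    phrases = [""]
    out = []
    for parent, ch in factors:
        phrase = phrases[parent] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

encoded = lz78_factorize("abababababa")
print(encoded)                                   # [(0,'a'), (0,'b'), (1,'b'), ...]
print(lz78_decode(encoded) == "abababababa")     # True
```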


2021 ◽ Vol 17 (10) ◽ pp. e1009524
Author(s): Shian Su, Quentin Gouil, Marnie E. Blewitt, Dianne Cook, Peter F. Hickey, ...

A key benefit of long-read nanopore sequencing technology is the ability to detect modified DNA bases, such as 5-methylcytosine. The lack of R/Bioconductor tools for effective visualization of nanopore methylation profiles between samples from different experimental groups led us to develop the NanoMethViz R package. Our software can handle methylation output generated by a range of methylation callers and manages large datasets using a compressed data format. To fully explore the methylation patterns in a dataset, NanoMethViz allows data to be plotted at various resolutions. At the sample level, we use dimensionality reduction to examine the relationships between methylation profiles in an unsupervised way. We visualize methylation profiles of classes of features, such as genes or CpG islands, by scaling them to relative positions and aggregating their profiles. At the finest resolution, we visualize methylation patterns across individual reads along the genome using spaghetti plots and heatmaps, allowing users to explore particular genes or genomic regions of interest. In summary, our software makes the handling of methylation signal more convenient, expands the visualization options for nanopore data, and works seamlessly with existing methylation analysis tools available in the Bioconductor project. Our software is available at https://bioconductor.org/packages/NanoMethViz.


2021 ◽ pp. 89-104
Author(s): Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

We survey our recent work related to information processing on compressed strings. Note that a "string" here means any fixed-length sequence of symbols and therefore includes not only ordinary text but also a wide range of data, such as pixel sequences and time-series data. Over the past two decades, a variety of algorithms and applications have been proposed for compressed information processing. In this survey, we focus mainly on two problems: recompression and privacy-preserving computation over compressed strings. Recompression is a framework in which algorithms transform given compressed data into another compressed format without decompression. Recent studies have shown that a higher compression ratio can be achieved at lower cost by using an appropriate recompression algorithm as a preprocessing step. Furthermore, various privacy-preserving computation models have been proposed for information retrieval, similarity computation, and pattern mining.


2021 ◽
Author(s): Enrico Pomarico, Cédric Schmidt, Florian Chays, David Nguyen, Arielle Planchette, ...

The growth of data throughput in optical microscopy has triggered the extensive use of supervised learning (SL) models on compressed datasets for automated analysis. Investigating the effects of image compression on SL predictions is therefore pivotal to assessing their reliability, especially for clinical use. We quantify the statistical distortions induced by compression by comparing predictions on compressed data with the raw predictive uncertainty, numerically estimated from the raw noise statistics measured via sensor calibration. Predictions on cell segmentation parameters are altered by up to 15% and by more than 10 standard deviations after 16-to-8 bit pixel depth reduction and 10:1 JPEG compression. JPEG formats with higher compression ratios show significantly larger distortions. Interestingly, a recent metrologically accurate algorithm, offering up to a 10:1 compression ratio, provides a prediction spread equivalent to that stemming from raw noise. The method described here allows a lower bound to be set on the predictive uncertainty of an SL task and can be generalized to determine the statistical distortions originating from a variety of processing pipelines in AI-assisted fields.
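The paper's calibrated pipeline cannot be reproduced from the abstract alone; the toy sketch below, with an entirely synthetic image and an arbitrary thresholding "prediction", only shows the kind of raw-versus-compressed comparison being described (16-to-8 bit depth reduction followed by lossy JPEG).

```python
# Illustrative sketch only (not the paper's calibrated pipeline): compare a
# crude segmentation-style "prediction" on a raw 16-bit image against the same
# prediction after 16-to-8 bit depth reduction plus lossy JPEG compression.
# The synthetic image, the threshold and the quality setting are assumptions.
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
raw16 = (rng.normal(2000, 200, size=(256, 256))
         + 6000 * (np.indices((256, 256)).sum(axis=0) > 300)).astype(np.uint16)

def segmented_area(img, threshold):
    return int((img > threshold).sum())          # pixels above threshold

area_raw = segmented_area(raw16, 4000)

# 16-to-8 bit depth reduction followed by a JPEG round trip.
img8 = (raw16 // 256).astype(np.uint8)
buf = io.BytesIO()
Image.fromarray(img8, mode="L").save(buf, format="JPEG", quality=30)
buf.seek(0)
jpeg8 = np.asarray(Image.open(buf))

area_jpeg = segmented_area(jpeg8, 4000 // 256)   # threshold rescaled to 8 bits
print(f"relative change in segmented area: "
      f"{100 * (area_jpeg - area_raw) / area_raw:+.2f}%")
```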


2021 ◽
Author(s): Alex Marchioni, Andriy Enttsel, Mauro Mangia, Riccardo Rovatti, Gianluca Setti

We analyze the effect of lossy compression in the processing of sensor signals that must be used to detect anomalous events in the system under observation. The intuitive relationship between the quality loss at higher compression and the ability to tell anomalous behaviours from normal ones is formalized in terms of information-theoretic quantities. Some analytic derivations are made within the Gaussian framework, and in some cases in the asymptotic regime with respect to the length of the signals considered. The analytical conclusions are matched with the performance of practical detectors in a toy case, allowing the assessment of different compression/detector configurations.
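A toy rendition of that trade-off, under assumptions of our own (Gaussian signals, a mean-shift anomaly, compression that keeps only the largest DCT coefficients, and a plain energy detector), might look as follows; it is merely an illustration of comparing compression/detector configurations, not the authors' setup.

```python
# Toy sketch: how much detection power a simple energy detector loses when the
# sensor signals are lossily compressed before detection. The signal model,
# the DCT-based compressor and the detector are all assumptions for the demo.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(1)
n, trials = 128, 2000

def compress(x, k):
    c = dct(x, norm="ortho")
    small = np.zeros_like(c)
    keep = np.argsort(np.abs(c))[-k:]            # k largest coefficients survive
    small[keep] = c[keep]
    return idct(small, norm="ortho")

def detection_rate(k=None, shift=0.8):
    normal_scores, anomalous_scores = [], []
    for _ in range(trials):
        normal = rng.normal(0.0, 1.0, n)
        anomalous = normal + shift               # anomaly: shifted mean
        if k is not None:
            normal, anomalous = compress(normal, k), compress(anomalous, k)
        normal_scores.append(np.mean(normal ** 2))
        anomalous_scores.append(np.mean(anomalous ** 2))
    threshold = np.quantile(normal_scores, 0.95) # fix a 5% false-alarm rate
    return float(np.mean(np.array(anomalous_scores) > threshold))

print("detection rate, raw signals:        ", detection_rate())
print("detection rate, 8:1 DCT compression:", detection_rate(k=16))
```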


2021 ◽ Vol 75 (3) ◽ pp. 100-107
Author(s): B.-B.S. Yesmagambetov

When huge data streams are processed in information systems, individual measurements or whole groups of measurements can be distorted or lost for various reasons. The recovery of compressed data transmitted over communication channels is accompanied by errors related to distortion of the information and service parts of messages, caused by interference in the transmission channel. Added to these are errors caused by level quantization and time sampling of the transmitted realizations. Research on methods of increasing noise immunity, both during transmission and during recovery of the measured data, is therefore an important task in the design of information and measurement systems. The article considers non-parametric methods of estimating the probabilistic characteristics of random processes. A distinctive feature of non-parametric methods is the ranking of the data measured over the observation interval. It is shown that ranking the data on the transmitting side of the information-measuring system enables the correction of errors and failures based on the strict monotonicity of the ranked sequence of codes. The error in recovering continuous realizations, taking into account distortions of the compressed data in the communication channel, is also investigated. The results indicate that the use of complex compression algorithms is impractical, since the difference in the non-stationary message reconstruction error between the simplest algorithm and a rather complex one becomes negligible. The article presents the results of estimating recovery errors for various data compression methods.
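The article's method is only outlined in the abstract; the sketch below illustrates, with an assumed channel model and repair rule, how ranked (non-decreasing) codes let a receiver flag values that violate monotonicity and repair them by interpolation.

```python
# Hedged illustration of the rank-based idea above: the transmitter sends the
# ranked (sorted) codes of one observation interval, so the receiver knows the
# sequence must be non-decreasing; codes that break that order are treated as
# corrupted and interpolated. Channel model and repair rule are assumptions.
import numpy as np

rng = np.random.default_rng(42)

sent = np.sort(rng.integers(0, 256, size=32))          # ranked 8-bit codes, as sent
received = sent.copy()
received[[5, 17, 25]] = rng.integers(0, 256, size=3)   # channel corrupts three codes

def repair_monotone(codes):
    """Keep the longest non-decreasing subsequence; interpolate the rest."""
    n = len(codes)
    best = [1] * n                               # best subsequence length ending at i
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            if codes[j] <= codes[i] and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    keep = np.zeros(n, dtype=bool)
    i = int(np.argmax(best))
    while i != -1:                               # backtrack the kept positions
        keep[i] = True
        i = prev[i]
    idx = np.arange(n)
    out = codes.astype(float)
    out[~keep] = np.interp(idx[~keep], idx[keep], out[keep])
    return np.rint(out).astype(int), ~keep

repaired, flagged = repair_monotone(received)
print("flagged positions:          ", np.flatnonzero(flagged))
print("max abs error before repair:", int(np.max(np.abs(received - sent))))
print("max abs error after repair: ", int(np.max(np.abs(repaired - sent))))
```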

