Constructing smaller genome graphs via string compression

Mapping Intimacies ◽

10.1101/2021.02.08.430279 ◽

2021 ◽

Author(s):

Yutong Qiu ◽

Carl Kingsford

Keyword(s):

Linear Time ◽

Human Chromosomes ◽

Proof Of Concept ◽

Compression Algorithms ◽

De Bruijn Graphs ◽

A Genome ◽

Speed Up ◽

Node Labels ◽

Linear Time Algorithms ◽

Genome Graph

Abstract: The size of a genome graph (the space required to store the nodes, their labels and edges) affects the efficiency of operations performed on it. For example, the time complexity to align a sequence to a graph without a graph index depends on the total number of characters in the node labels and the number of edges in the graph. The size of the graph also affects the size of the graph index that is used to speed up the alignment. This raises the need for approaches to construct space-efficient genome graphs. We point out similarities in the string encoding approaches of genome graphs and the external pointer macro (EPM) compression model. Supported by these similarities, we present a pair of linear-time algorithms that transform between genome graphs and EPM-compressed forms. We show that the algorithms result in an upper bound on the size of the genome graph constructed based on an optimal EPM compression. In addition to the transformation, we show that equivalent choices made by EPM compression algorithms may result in different sizes of genome graphs. To further optimize the size of the genome graph, we propose the source assignment problem that optimizes over the equivalent choices during compression and introduce an ILP formulation that solves that problem optimally. As a proof-of-concept, we introduce RLZ-Graph, a genome graph constructed based on the relative Lempel-Ziv EPM compression algorithm. We show that using RLZ-Graph, across all human chromosomes, we are able to reduce the disk space to store a genome graph on average by 40.7% compared to colored de Bruijn graphs constructed by Bifrost under the default settings. The RLZ-Graph software is available at https://github.com/Kingsford-Group/rlzgraph
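To make the relative Lempel-Ziv idea concrete, here is a minimal sketch of a greedy parse of a target string against a reference. It is not taken from the RLZ-Graph software: the function names `rlz_factorize` and `rlz_decode` are hypothetical, the match search is a naive quadratic scan rather than the indexed search a real tool would use, and ties between equally long matches are broken by taking the leftmost one, which is presumably one of the "equivalent choices" the abstract refers to.

```python
from typing import List, Tuple, Union

Factor = Union[Tuple[int, int], str]  # (reference position, length) or a literal character


def rlz_factorize(reference: str, target: str) -> List[Factor]:
    """Greedily parse `target` into longest matches against `reference`."""
    factors: List[Factor] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Naive search: try every start position in the reference.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:  # strict '>' keeps the leftmost longest match
                best_pos, best_len = start, length
        if best_len == 0:
            # Character does not occur in the reference: emit it as a literal.
            factors.append(target[i])
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(reference: str, factors: List[Factor]) -> str:
    """Rebuild the target string from its factors."""
    parts = []
    for f in factors:
        parts.append(f if isinstance(f, str) else reference[f[0]:f[0] + f[1]])
    return "".join(parts)


factors = rlz_factorize("ACGTACGT", "CGTAAGT")
print(factors)  # [(1, 4), (0, 1), (2, 2)]
```

In this toy example the single `A` could equally be copied from reference position 0 or 4. Both choices give the same compressed size, but they would point to different places in the reference, which is why the choice of source can matter once the factors are turned into graph nodes and edges.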


What do Eulerian and Hamiltonian cycles have to do with genome assembly?

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008928 ◽

2021 ◽

Vol 17 (5) ◽

pp. e1008928

Author(s):

Paul Medvedev ◽

Mihai Pop

Keyword(s):

Genome Assembly ◽

Linear Time ◽

Hamiltonian Cycles ◽

De Bruijn Graphs ◽

Genome Reconstruction ◽

Assembly Algorithm ◽

A Genome ◽

De Bruijn ◽

Do So

Many students are taught about genome assembly using the dichotomy between the complexity of finding Eulerian and Hamiltonian cycles (easy versus hard, respectively). This dichotomy is sometimes used to motivate the use of de Bruijn graphs in practice. In this paper, we explain that while de Bruijn graphs have indeed been very useful, the reason has nothing to do with the complexity of the Hamiltonian and Eulerian cycle problems. We give 2 arguments. The first is that a genome reconstruction is never unique and hence an algorithm for finding Eulerian or Hamiltonian cycles is not part of any assembly algorithm used in practice. The second is that even if an arbitrary genome reconstruction was desired, one could do so in linear time in both the Eulerian and Hamiltonian paradigms.
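For readers unfamiliar with the Eulerian side of the dichotomy, the following is a minimal sketch of building a de Bruijn graph from k-mers and walking one Eulerian path through it with Hierholzer's algorithm. The function names are hypothetical, and the sketch assumes that an Eulerian path exists and that its start node is known. It returns one arbitrary traversal, which fits the abstract's point that such a reconstruction is not unique.

```python
from collections import defaultdict
from typing import Dict, List


def de_bruijn_graph(kmers: List[str]) -> Dict[str, List[str]]:
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Hierholzer's algorithm; each edge is pushed and popped exactly once."""
    adj = {u: list(vs) for u, vs in graph.items()}  # copy so the input is not consumed
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())    # dead end: node is final in the path
    return path[::-1]


def spell(path: List[str]) -> str:
    """Concatenate the first node with the last character of every later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


kmers = ["ACG", "CGT", "GTC"]
path = eulerian_path(de_bruijn_graph(kmers), "AC")
print(spell(path))  # ACGTC
```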


Using Genome Graph Topology to Guide Annotation Matrix Sparsification

10.1101/2020.11.17.386649 ◽

2020 ◽

Author(s):

Daniel Danciu ◽

Mikhail Karasikov ◽

Harun Mustafa ◽

André Kahles ◽

Gunnar Rätsch

Keyword(s):

Linear Time ◽

Sequencing Data ◽

Construction Time ◽

Binary Matrix ◽

De Bruijn Graphs ◽

Large Sets ◽

Average Factor ◽

A New Technique ◽

Matrix Sparsification ◽

Genome Graph

Abstract: Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are more needed than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently-used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving. In this paper, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in annotations of nodes adjacent in the graph. RowDiff can be constructed in linear time relative to the number of nodes and labels in the graph, and the construction can be efficiently parallelized and distributed, significantly reducing construction time. RowDiff can be viewed as an intermediary sparsification step of the initial annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrix representation. Our experiments on the Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When combining RowDiff with a Multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST, the previously known smallest annotation representation. In addition, experiments on 10,000 RNA-seq datasets show that RowDiff combined with Multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST.
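The idea of exploiting similar annotations on adjacent nodes can be illustrated with a small sketch. This is a simplified reading of the abstract, not the published RowDiff implementation: each node keeps only the symmetric difference between its label set and that of one designated successor, and a few "anchor" nodes keep their full label set. The data layout (`labels`, `successor`, `anchors`) and the function names are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set

Node = Hashable


def rowdiff_encode(labels: Dict[Node, Set[str]],
                   successor: Dict[Node, Node],
                   anchors: Set[Node]) -> Dict[Node, Set[str]]:
    """Store full rows at anchors and row differences everywhere else."""
    encoded = {}
    for node, row in labels.items():
        if node in anchors:
            encoded[node] = set(row)
        else:
            # Symmetric difference with the successor's row; empty if identical.
            encoded[node] = set(row) ^ set(labels[successor[node]])
    return encoded


def rowdiff_decode(node: Node,
                   encoded: Dict[Node, Set[str]],
                   successor: Dict[Node, Node],
                   anchors: Set[Node]) -> Set[str]:
    """Recover a row by XOR-ing differences along the path to an anchor."""
    row: Set[str] = set()
    while True:
        row ^= encoded[node]
        if node in anchors:
            return row
        node = successor[node]


labels = {0: {"a", "b"}, 1: {"a", "b"}, 2: {"a", "b", "c"}}
successor = {0: 1, 1: 2}
anchors = {2}
encoded = rowdiff_encode(labels, successor, anchors)
print(sum(len(r) for r in labels.values()))   # 7 stored labels before
print(sum(len(r) for r in encoded.values()))  # 4 stored labels after
```

The sketch assumes every non-anchor node reaches an anchor by following successors. The encoded rows are sparser than the originals, which is what lets a generic compressed binary matrix representation do better on them.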


Direct measurement of electron-diffraction-pattern intensities using an energy loss spectrometer

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100155311 ◽

1989 ◽

Vol 47 ◽

pp. 668-669

Author(s):

A. G. Jackson ◽

M. Rowe

Keyword(s):

Thermal Effects ◽

Dynamic Range ◽

Scattering Amplitude ◽

Dynamical Theory ◽

Site Symmetry ◽

Proof Of Concept ◽

Parallel Acquisition ◽

Kinematic Approximation ◽

Speed Up ◽

Structure Calculations

Diffraction intensities from intermetallic compounds are, in the kinematic approximation, proportional to the scattering amplitude from the element doing the scattering. More detailed calculations have shown that site symmetry and occupation by various atom species also affect the intensity in a diffracted beam. [1] Hence, by measuring the intensities of beams, or their ratios, the occupancy can be estimated. Measurement of the intensity values also allows structure calculations to be made to determine the spatial distribution of the potentials doing the scattering. Thermal effects are also present as a background contribution. Inelastic effects such as loss or absorption/excitation complicate the intensity behavior, and dynamical theory is required to estimate the intensity value. The dynamic range of currents in diffracted beams can be 10^4 or 10^5:1. Hence, detection of such information requires a means for collecting the intensity over a signal-to-noise range beyond that obtainable with a single film plate, which has a S/N of about 10^3:1. Although such a collection system is not available currently, a simple system consisting of instrumentation on an existing STEM can be used as a proof of concept which has a S/N of about 255:1, limited by the 8 bit pixel attributes used in the electronics. Use of 24 bit pixel attributes would easily allow the desired noise range to be attained in the processing instrumentation. The S/N of the scintillator used by the photoelectron sensor is about 10^6 to 1, well beyond the S/N goal. The trade-off that must be made is the time for acquiring the signal, since the pattern can be obtained in seconds using film plates, compared to 10 to 20 minutes for a pattern to be acquired using the digital scan. Parallel acquisition would, of course, speed up this process immensely.
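The bit-depth figures above follow from simple arithmetic, sketched below. The sketch takes the abstract's own framing, in which the attainable ratio is the largest nonzero pixel value an n-bit integer can hold, and reads the larger dynamic range quoted above as 10^5:1. The helper names are hypothetical.

```python
def max_ratio(bits: int) -> int:
    """Largest value representable in an unsigned pixel of the given width."""
    return 2 ** bits - 1


def min_bits(ratio: int) -> int:
    """Smallest pixel width whose maximum value is at least `ratio`."""
    return ratio.bit_length()


print(max_ratio(8))        # 255
print(max_ratio(24))       # 16777215
print(min_bits(10 ** 5))   # 17
```

Under this reading, 8 bit pixels cap the ratio at 255:1, 17 bits would already cover 10^5:1, and 24 bits leave a wide margin.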


Sub-triangle opacity masks for faster ray tracing of transparent objects

Proceedings of the ACM on Computer Graphics and Interactive Techniques ◽

10.1145/3406180 ◽

2020 ◽

Vol 3 (2) ◽

pp. 1-12

Author(s):

Holger Gruen ◽

Carsten Benthin ◽

Sven Woop

Keyword(s):

Ray Tracing ◽

Integrate Approach ◽

Proof Of Concept ◽

Test Operation ◽

Transparent Objects ◽

Speed Up

We propose an easy and simple-to-integrate approach to accelerate ray tracing of alpha-tested transparent geometry with a focus on Microsoft® DirectX® or Vulkan® ray tracing extensions. Pre-computed bit masks are used to quickly determine fully transparent and fully opaque regions of triangles thereby skipping the more expensive alpha-test operation. These bit masks allow us to skip up to 86% of all transparency tests, yielding up to 40% speed up in a proof-of-concept DirectX® software only implementation.
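The skip logic can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: it classifies cells of a regular grid over barycentric (u, v) space instead of true sub-triangles, stores states in a plain list rather than packed bits, and classifies cells by point sampling, which is not conservative. All names (`build_mask`, `is_hit`, the `alpha` callback) are hypothetical.

```python
from typing import Callable, List

TRANSPARENT, OPAQUE, MIXED = 0, 1, 2
Alpha = Callable[[float, float], float]


def build_mask(alpha: Alpha, n: int, threshold: float = 0.5,
               samples: int = 4) -> List[List[int]]:
    """Precompute a state for each cell of an n x n grid over (u, v)."""
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            seen = set()
            for a in range(samples):
                for b in range(samples):
                    u = (i + (a + 0.5) / samples) / n
                    v = (j + (b + 0.5) / samples) / n
                    if u + v <= 1.0:  # sample lies inside the triangle
                        seen.add(alpha(u, v) >= threshold)
            if seen == {True}:
                row.append(OPAQUE)
            elif seen == {False}:
                row.append(TRANSPARENT)
            else:
                row.append(MIXED)  # both outcomes seen, or cell outside the triangle
        mask.append(row)
    return mask


def is_hit(mask: List[List[int]], alpha: Alpha, u: float, v: float,
           threshold: float = 0.5) -> bool:
    """Answer from the mask when possible; fall back to the alpha test."""
    n = len(mask)
    state = mask[min(int(u * n), n - 1)][min(int(v * n), n - 1)]
    if state == OPAQUE:
        return True
    if state == TRANSPARENT:
        return False
    return alpha(u, v) >= threshold  # the expensive path


calls = 0

def counting_alpha(u: float, v: float) -> float:
    """Toy alpha texture: opaque for u < 0.5, transparent otherwise."""
    global calls
    calls += 1
    return 1.0 if u < 0.5 else 0.0


mask = build_mask(counting_alpha, 4)
calls = 0  # ignore the precomputation cost
print(is_hit(mask, counting_alpha, 0.1, 0.1), is_hit(mask, counting_alpha, 0.8, 0.1), calls)
```

Both queries are answered from the mask, so the alpha callback is not invoked at query time. A production version would need a conservative classification so that a cell is never marked fully opaque or fully transparent when it is not.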


Almost Linear Time Algorithms for Minsum k-Sink Problems on Dynamic Flow Path Networks

Theoretical Computer Science ◽

10.1016/j.tcs.2021.05.003 ◽

2021 ◽

Author(s):

Yuya Higashikawa ◽

Naoki Katoh ◽

Junichi Teruyama ◽

Koji Watase

Keyword(s):

Linear Time ◽

Flow Path ◽

Dynamic Flow ◽

Linear Time Algorithms


Displacement-based design of precast hinged portal frames with additional dissipating devices at beam-to-column joints

Bulletin of Earthquake Engineering ◽

10.1007/s10518-021-01169-y ◽

2021 ◽

Author(s):

Andrea Belleri ◽

Simone Labò

Keyword(s):

Design Methodology ◽

Linear Time ◽

Design Procedure ◽

Time History ◽

Structural System ◽

Additional Degree ◽

Speed Up ◽

Column Joint ◽

Displacement Based ◽

Mechanical Devices

Abstract: The seismic performance of precast portal frames typical of the industrial and commercial sector could be generally improved by providing additional mechanical devices at the beam-to-column joint. Such devices could provide an additional degree of fixity and energy dissipation in a joint generally characterized by a dry hinged connection, adopted to speed up the construction phase. Another advantage of placing additional devices at the beam-to-column joint is that they can act as a fuse, concentrating the seismic damage on a few sacrificial and replaceable elements. A procedure to design precast portal frames adopting additional devices is provided herein. The procedure starts from the Displacement-Based Design methodology proposed by M.J.N. Priestley, and it is applicable to both the design of new structures and the retrofit of existing ones. After the derivation of the required analytical formulations, the procedure is applied to select the additional devices for a new and an existing structural system. Validation through non-linear time history analyses highlights the advantages and drawbacks of the considered devices and demonstrates the effectiveness of the proposed design procedure.


Linear time algorithms for linear programming

Computers & Mathematics with Applications ◽

10.1016/s0898-1221(99)00069-3 ◽

1999 ◽

Vol 37 (4-5) ◽

pp. 199-208

Author(s):

E.A. Galperin

Keyword(s):

Linear Programming ◽

Linear Time ◽

Linear Time Algorithms


Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing - STOC 2017 ◽

10.1145/3055399.3055463 ◽

2017 ◽

Cited By ~ 14

Author(s):

Michael B. Cohen ◽

Jonathan Kelner ◽

John Peebles ◽

Richard Peng ◽

Anup B. Rao ◽

...

Keyword(s):

Markov Chains ◽

Linear Time ◽

Directed Graphs ◽

Linear Time Algorithms


Linear-Time Algorithms for Tree Root Problems

Algorithmica ◽

10.1007/s00453-013-9815-y ◽

2013 ◽

Vol 71 (2) ◽

pp. 471-495 ◽

Cited By ~ 1

Author(s):

Maw-Shang Chang ◽

Ming-Tat Ko ◽

Hsueh-I Lu

Keyword(s):

Linear Time ◽

Linear Time Algorithms


FAULT TOLERANT ROUTING IN HYPERCUBES AND STAR GRAPHS

Parallel Processing Letters ◽

10.1142/s0129626496000133 ◽

1996 ◽

Vol 06 (01) ◽

pp. 127-136 ◽

Cited By ~ 5

Author(s):

QIAN-PING GU ◽

SHIETUNG PENG

Keyword(s):

Free Path ◽

Fault Tolerant ◽

Linear Time ◽

Time Efficiency ◽

Routing Problem ◽

Star Graphs ◽

Linear Time Algorithms ◽

Better Than

In this paper, we give two linear time algorithms for the node-to-node fault tolerant routing problem in n-dimensional hypercubes Hn and star graphs Gn. The first algorithm, given at most n−1 arbitrary fault nodes and two non-fault nodes s and t in Hn, finds a fault-free path s→t of length at most [Formula: see text] in O(n) time, where d(s, t) is the distance between s and t. Our second algorithm, given at most n−2 fault nodes and two non-fault nodes s and t in Gn, finds a fault-free path s→t of length at most d(Gn)+3 in O(n) time, where [Formula: see text] is the diameter of Gn. When the time efficiency of finding the routing path is more important than the length of the path, the algorithms in this paper are better than the previous ones.
