Representation of k-mer sets using spectrum-preserving string sets

Author(s):  
Amatur Rahman ◽  
Paul Medvedev

Abstract: Given the popularity and elegance of k-mer based tools, finding a space-efficient way to represent a set of k-mers is important for improving the scalability of bioinformatics analyses. One popular approach is to convert the set of k-mers into the more compact set of unitigs. We generalize this approach and formulate it as the problem of finding a smallest spectrum-preserving string set (SPSS) representation. We show that this problem is equivalent to finding a smallest path cover in a compacted de Bruijn graph. Using this reduction, we prove a lower bound on the size of the optimal SPSS and propose a greedy method called UST that results in a smaller representation than unitigs and is nearly optimal with respect to our lower bound. We demonstrate the usefulness of the SPSS formulation with two applications of UST. The first one is a compression algorithm, UST-Compress, which we show can store a set of k-mers using an order-of-magnitude less disk space than other lossless compression tools. The second one is an exact static k-mer membership index, UST-FM, which we show improves index size by 10-44% compared to other state-of-the-art low memory indices. Our tool is publicly available at: https://github.com/medvedevgroup/UST/.
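
To make the size comparison between unitigs and a smaller spectrum-preserving string set concrete, the following minimal sketch checks that two string sets cover the same k-mer set and compares their total length. It is a toy illustration and not the UST algorithm: it considers the forward strand only (no reverse complements), and the helper names `spectrum` and `total_length` as well as the four example k-mers are assumptions made for illustration.

```python
def spectrum(strings, k):
    """Return the set of all k-mers occurring in a collection of strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def total_length(strings):
    """Total number of characters needed to write the strings down."""
    return sum(len(s) for s in strings)


K = 3
# Hypothetical input k-mer set with one branch: ACG is followed by CGT and CGA.
kmers = {"ACG", "CGT", "CGA", "GTA"}

# Unitigs of the (forward-only) de Bruijn graph: ACG cannot be merged with a
# successor because it has two of them; CGT -> GTA is a non-branching path.
unitigs = ["ACG", "CGTA", "CGA"]

# A path cover may walk through the branch, which saves characters.
path_cover = ["ACGTA", "CGA"]

for name, strings in (("unitigs", unitigs), ("path cover", path_cover)):
    same = spectrum(strings, K) == kmers
    print(name, "preserves spectrum:", same, "| characters:", total_length(strings))
```

Both string sets spell exactly the input k-mers, but the path cover needs fewer strings and fewer characters because each merge of two overlapping strings saves k-1 characters.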

2017 ◽  
Author(s):  
Jèrèmy Gauthier ◽  
Charlotte Mouden ◽  
Tomasz Suchan ◽  
Nadir Alvarez ◽  
Nils Arrigo ◽  
...  

Abstract: We present an original method for de novo variant calling from restriction site associated DNA sequencing (RAD-Seq) data. RAD-Seq is a technique characterized by the sequencing of specific loci along the genome. It is widely employed in evolutionary biology because it makes it possible to exploit variant information (mainly SNPs) from entire populations at a reduced cost. Common RAD-dedicated tools, such as STACKS or IPyRAD, are based on all-versus-all read comparisons, which require considerable time and computing resources. Based on the variant caller DiscoSnp, initially designed for shotgun sequencing, DiscoSnp-RAD avoids this pitfall because variants are detected by exploring the de Bruijn graph built from all the read datasets. We tested the implementation on RAD data from 259 specimens of Chiastocheta flies, morphologically assigned to 7 species. All individuals were successfully assigned to their species using both STRUCTURE and maximum likelihood phylogenetic reconstruction. Moreover, the identified variants revealed within-species structure and the existence of two populations linked to their geographic distributions. Furthermore, our results show that DiscoSnp-RAD is at least one order of magnitude faster than state-of-the-art tools. The overall results show that DiscoSnp-RAD is suitable for identifying variants from RAD data, and stands out from other tools due to its completely different principle, which makes it significantly faster, in particular on large datasets. License: GNU Affero General Public License. Availability: https://github.com/GATB/[email protected]


2017 ◽  
Author(s):  
Prashant Pandey ◽  
Michael A. Bender ◽  
Rob Johnson ◽  
Rob Patro

Abstract: Motivation: k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing (HTS) data. These algorithms span the gamut of the analysis pipeline from k-mer counting (e.g., for estimating assembly parameters), to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations and data structures. In this paper, we set forth the fundamental operations for maintaining multisets of k-mers and classify existing systems from a data-structural perspective. We then show how to build a k-mer-counting and multiset-representation system using the counting quotient filter (CQF), a feature-rich approximate membership query (AMQ) data structure. We introduce the k-mer-counting/querying system Squeakr (Simple Quotient filter-based Exact and Approximate Kmer Representation), which is based on the CQF. This off-the-shelf data structure turns out to be an efficient (approximate or exact) representation for sets or multisets of k-mers. Results: Squeakr takes 2×–4.3× less time than the state-of-the-art to count and perform a random-point-query workload. Squeakr is memory-efficient, consuming 1.5×–4.3× less memory than the state-of-the-art. It offers competitive counting performance, and answers point queries (i.e. queries for the abundance of a particular k-mer) over an order-of-magnitude faster than other systems. The Squeakr representation of the k-mer multiset turns out to be immediately useful for downstream processing (e.g., de Bruijn graph traversal) because it supports fast queries and dynamic k-mer insertion, deletion, and modification. Availability: https://github.com/splatlab/[email protected]
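
The multiset operations mentioned above (counting, point queries for the abundance of a k-mer) can be sketched with an ordinary hash table. This is only an exact stand-in to show the interface: it does not implement the counting quotient filter or Squeakr, and the function name `count_kmers` and the example reads are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Build an exact k-mer multiset (k-mer -> abundance) from a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Hypothetical toy reads.
counts = count_kmers(["ACGACG", "CGACGT"], k=3)

# Point query: abundance of a particular k-mer (0 if it was never seen).
print(counts["ACG"], counts["TTT"])
```

A `Counter` also supports insertion, deletion, and modification of individual counts, which mirrors the dynamic operations the abstract lists, but without the space savings an AMQ structure is designed to provide.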


2021 ◽  
Vol 15 (5) ◽  
pp. 1-32
Author(s):  
Quang-huy Duong ◽  
Heri Ramampiaro ◽  
Kjetil Nørvåg ◽  
Thu-lan Dam

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can, on the other hand, estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we guarantee and prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Vittorino Lanzio ◽  
Gregory Telian ◽  
Alexander Koshelev ◽  
Paolo Micheletti ◽  
Gianni Presti ◽  
...  

Abstract: The combination of electrophysiology and optogenetics enables the exploration of how the brain operates down to a single neuron and its network activity. Neural probes are in vivo invasive devices that integrate sensors and stimulation sites to record and manipulate neuronal activity with high spatiotemporal resolution. State-of-the-art probes are limited by tradeoffs involving their lateral dimension, number of sensors, and ability to access independent stimulation sites. Here, we realize a highly scalable probe that features three-dimensional integration of small-footprint arrays of sensors and nanophotonic circuits to scale the density of sensors per cross-section by one order of magnitude with respect to state-of-the-art devices. For the first time, we overcome the spatial limit of the nanophotonic circuit by coupling only one waveguide to numerous optical ring resonators as passive nanophotonic switches. With this strategy, we achieve accurate on-demand light localization while avoiding spatially demanding bundles of waveguides and demonstrate the feasibility with a proof-of-concept device and its scalability towards high-resolution and low-damage neural optoelectrodes.


2021 ◽  
Vol 7 (6) ◽  
pp. 96
Author(s):  
Alessandro Rossi ◽  
Marco Barbiero ◽  
Paolo Scremin ◽  
Ruggero Carli

Industrial 3D models are usually characterized by a large number of hidden faces, and it is very important to simplify them. Visible-surface determination methods provide one of the most common solutions to the visibility problem. This study presents a robust technique to address the global visibility problem in object space that guarantees theoretical convergence to the optimal result. More specifically, we propose a strategy that, in a finite number of steps, determines whether each face of the mesh is globally visible or not. The proposed method is based on Plücker coordinates, which provide an efficient way to determine the intersection between a ray and a triangle. This algorithm does not require pre-calculations such as estimating the normal at each face, which makes it resilient to the orientation of the normals. We compared the performance of the proposed algorithm against a state-of-the-art technique. Results showed that our approach is more robust in terms of convergence to the maximum lossless compression.
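
The Plücker-coordinate intersection test mentioned above can be sketched as follows. This is a generic textbook formulation under stated assumptions, not the authors' implementation: it tests the infinite line through the ray (a full ray test would additionally check that the hit lies ahead of the origin), it treats lines that graze an edge as misses, and all function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plucker(p, q):
    """Plücker coordinates (U, V) of the directed line from p to q."""
    u = tuple(qi - pi for pi, qi in zip(p, q))
    return u, cross(p, q)


def side(l1, l2):
    """Permuted inner product: its sign tells how l1 passes around l2."""
    return dot(l1[0], l2[1]) + dot(l2[0], l1[1])


def line_hits_triangle(origin, direction, a, b, c):
    """True if the line through origin along direction crosses triangle abc.

    The line passes through the interior when it has the same orientation
    relative to all three directed edges; no face normal is needed.
    """
    target = tuple(o + d for o, d in zip(origin, direction))
    ray = plucker(origin, target)
    sides = [side(ray, plucker(p, q)) for p, q in ((a, b), (b, c), (c, a))]
    return all(s > 0 for s in sides) or all(s < 0 for s in sides)


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(line_hits_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))
print(line_hits_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri))
```

Because the test accepts either consistent sign, it does not depend on the winding order of the triangle, which is in line with the abstract's remark about resilience to normal orientation.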


2021 ◽  
Vol 6 (1) ◽  
pp. 47
Author(s):  
Julian Schütt ◽  
Rico Illing ◽  
Oleksii Volkov ◽  
Tobias Kosub ◽  
Pablo Nicolás Granell ◽  
...  

The detection, manipulation, and tracking of magnetic nanoparticles is of major importance in the fields of biology, biotechnology, and biomedical applications as labels as well as in drug delivery, (bio-)detection, and tissue engineering. In this regard, the trend goes towards improvements of existing state-of-the-art methodologies in the spirit of timesaving, high-throughput analysis at ultra-low volumes. Here, microfluidics offers vast advantages to address these requirements, as it deals with the control and manipulation of liquids in confined microchannels. This conjunction of microfluidics and magnetism, namely micro-magnetofluidics, is a dynamic research field, which requires novel sensor solutions to boost the detection limit of tiny quantities of magnetized objects. We present a sensing strategy relying on planar Hall effect (PHE) sensors in droplet-based micro-magnetofluidics for the detection of a multiphase liquid flow, i.e., superparamagnetic aqueous droplets in an oil carrier phase. The high resolution of the sensor allows the detection of nanoliter-sized superparamagnetic droplets with a concentration of 0.58 mg cm⁻³, even when they are only biased in a geomagnetic field. The limit of detection can be boosted another order of magnitude, reaching 0.04 mg cm⁻³ (1.4 million particles in a single 100 nL droplet) when a magnetic field of 5 mT is applied to bias the droplets. With this performance, our sensing platform outperforms the state-of-the-art solutions in droplet-based micro-magnetofluidics by a factor of 100. This allows us to detect ferrofluid droplets in clinically and biologically relevant concentrations, and even in lower concentrations, without the need of externally applied magnetic fields.


1995 ◽  
Vol 396 ◽  
Author(s):  
Charles W. Allen ◽  
Loren L. Funk ◽  
Edward A. Ryan

Abstract: During 1995, a state-of-the-art intermediate voltage electron microscope (IVEM) has been installed in the HVEM-Tandem Facility with in situ ion irradiation capabilities similar to those of the HVEM. A 300 kV Hitachi H-9000NAR has been interfaced to the two ion accelerators of the Facility, with a spatial resolution for imaging which is nearly an order of magnitude better than that for the 1.2 MV HVEM which dates from the early 1970s. The HVEM remains heavily utilized for electron- and ion irradiation-related materials studies, nevertheless, especially those for which less demanding microscopy is adequate. The capabilities and limitations of this IVEM and HVEM are compared. Both the HVEM and IVEM are part of the DOE funded User Facility and therefore are available to the scientific community for materials studies, free of charge for non-proprietary research.


Author(s):  
Alexander Rügamer ◽  
Cécile Mongrédien ◽  
Santiago Urquijo ◽  
Günter Rohmer

Having given a short overview of GNSS signals and state-of-the-art multi-band front-end architectures, this paper presents a novel contribution to efficient multi-band GNSS reception. A general overlay based front-end architecture is introduced that enables the joint reception of two signals broadcast in separate frequency bands, sharing just one common baseband stage. The consequences of this overlay are analyzed for both signal and noise components. Signal overlay is shown to have a negligible impact on signal quality. It is shown that the noise floor superposition results in non-negligible degradations. However, it is also demonstrated that these degradations can be minimized by judiciously setting the relative gain between the two signal paths. As an illustration, the analytical optimal path-control expression to combine overlaid signals in an ionospheric-free pseudorange is derived for both Cramér-Rao Lower Bound and practical code tracking parameters. Finally, some practical overlay receiver and path control aspects are discussed.
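
For readers unfamiliar with the ionospheric-free pseudorange referred to above, the following sketch shows the standard first-order dual-frequency combination. It is background only and not the paper's path-control expression; the choice of the GPS L1 and L5 carrier frequencies, the function name, and the numeric example values are assumptions made for illustration.

```python
F_L1 = 1575.42e6  # GPS L1 carrier frequency in Hz (assumed band pair)
F_L5 = 1176.45e6  # GPS L5 carrier frequency in Hz


def iono_free_pseudorange(pr1, pr2, f1=F_L1, f2=F_L5):
    """First-order ionosphere-free combination of two pseudoranges (metres).

    Computes (f1^2 * pr1 - f2^2 * pr2) / (f1^2 - f2^2), which cancels a
    delay that scales with 1 / f^2.
    """
    g = f1 ** 2 / (f1 ** 2 - f2 ** 2)
    return g * pr1 - (g - 1.0) * pr2


# Hypothetical example: true range plus a first-order ionospheric delay.
true_range = 22_000_000.0           # metres
delay_l1 = 5.0                      # metres of delay on L1
delay_l5 = delay_l1 * (F_L1 / F_L5) ** 2

combined = iono_free_pseudorange(true_range + delay_l1, true_range + delay_l5)
print(abs(combined - true_range) < 1e-6)
```

The two weights in the combination are both larger than one in magnitude for this band pair, so uncorrelated noise on the two inputs is amplified, which is one reason the relative treatment of the two signal paths matters.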


2020 ◽  
Vol 34 (03) ◽  
pp. 2327-2334
Author(s):  
Vidal Alcázar ◽  
Pat Riddle ◽  
Mike Barley

In the past few years, new very successful bidirectional heuristic search algorithms have been proposed. Their key novelty is a lower bound on the cost of a solution that includes information from the g values in both directions. Kaindl and Kainz (1997) proposed measuring how inaccurate a heuristic is while expanding nodes in the opposite direction, and using this information to raise the f value of the evaluated nodes. However, this comes with a set of disadvantages and remains yet to be exploited to its full potential. Additionally, Sadhukhan (2013) presented BAE∗, a bidirectional best-first search algorithm based on the accumulated heuristic inaccuracy along a path. However, no complete comparison in regards to other bidirectional algorithms has yet been done, neither theoretical nor empirical. In this paper we define individual bounds within the lower-bound framework and show how both Kaindl and Kainz's and Sadhukhan's methods can be generalized thus creating new bounds. This overcomes previous shortcomings and allows newer algorithms to benefit from these techniques as well. Experimental results show a substantial improvement, up to an order of magnitude in the number of necessarily-expanded nodes compared to state-of-the-art near-optimal algorithms in common benchmarks.


2009 ◽  
Vol 145 (6) ◽  
pp. 1401-1441 ◽  
Author(s):  
V. Blomer ◽  
J. Brüdern ◽  
R. Dietmann

Abstract: Let R(n,θ) denote the number of representations of the natural number n as the sum of four squares, each composed only with primes not exceeding n^(θ/2). When θ > e^(−1/3) a lower bound for R(n,θ) of the expected order of magnitude is established, and when θ > 365/592, it is shown that R(n,θ) > 0 holds for large n. A similar result is obtained for sums of three squares. An asymptotic formula is obtained for the related problem of representing an integer as the sum of two squares and two squares composed of small primes, as above, for any fixed θ > 0. This last result is the key to bound R(n,θ) from below.

