Syotti: Scalable Bait Design for DNA Enrichment

2021 ◽  
Author(s):  
Jarno Alanko ◽  
Ilya Slizovskiy ◽  
Daniel Lokshtanov ◽  
Travis Gagie ◽  
Noelle Noyes ◽  
...  

Bait-enriched sequencing is a relatively new sequencing protocol that is becoming increasingly common, as it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes ("baits") is designed, manufactured, and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA, and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. This effectively enriches the DNA for which the probes were designed. Most recently, Metsky et al. (Nature Biotech 2019) demonstrated that bait-enrichment is capable of detecting a large number of human viral pathogens within metagenomic samples. In this work, we formalize the problem of designing baits by defining the Minimum Bait Cover problem, which asks for the smallest possible set of bait sequences that covers every position of a set of reference sequences under an approximate matching model. We show that the problem is NP-hard, and that it remains NP-hard even under very restrictive assumptions. This indicates that, unless P = NP, no polynomial-time exact algorithm exists for the problem, and that the problem is intractable even for small and deceptively simple inputs. In light of this, we design an efficient heuristic, called Syotti, that takes advantage of succinct data structures. The running time of Syotti scales linearly in practice, running at least an order of magnitude faster than state-of-the-art methods, including the recent method of Metsky et al. At the same time, our method produces bait sets that are smaller than those produced by competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 minutes to design baits for a dataset comprising 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even an 8% subset of the data within 24 hours. Our implementation is publicly available at https://github.com/jnalanko/syotti.
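To make the flavor of the problem concrete, here is a minimal greedy sketch of bait design under Hamming-distance matching. The parameters (bait length, allowed mismatches) and the quadratic reuse scan are illustrative assumptions only; Syotti's actual heuristic achieves its speed with succinct data structures rather than this naive scan.

```python
# Illustrative greedy sketch of the Minimum Bait Cover problem: a bait of
# length L covers a reference position if it aligns over that position with
# at most `max_mismatches` Hamming mismatches. This naive quadratic reuse
# scan is NOT Syotti's algorithm, which uses succinct data structures.

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def greedy_bait_cover(references, bait_len=120, max_mismatches=40):
    """Scan each reference left to right; whenever a position is not yet
    covered, emit (or reuse) the bait-length window starting there."""
    baits = []
    for ref in references:
        pos = 0  # first position not yet covered
        while pos < len(ref):
            start = min(pos, max(0, len(ref) - bait_len))  # keep full length
            window = ref[start:start + bait_len]
            if not any(hamming(b, window) <= max_mismatches for b in baits):
                baits.append(window)
            pos = start + bait_len
    return baits

# Two near-identical substrains: the second mostly reuses the first's baits.
refs = ["ACGTACGTGGA" * 50, "ACGTACGAGGA" * 50]
print(len(greedy_bait_cover(refs, bait_len=60, max_mismatches=20)))
```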

Author(s):  
Masaaki Nishino ◽  
Norihito Yasuda ◽  
Kengo Nakamura

Exact cover is the problem of finding a subfamily F of a given family of sets S, with universe D, such that F forms a partition of D. Knuth's Algorithm DLX is a state-of-the-art method for solving exact cover problems. Since DLX's running time depends on the cardinality of the input S, it can be slow when S is large. Our proposal improves DLX by exploiting a novel data structure, DanceDD, which extends the zero-suppressed binary decision diagram (ZDD) with links that enable efficient modification of the structure. With DanceDD, we can represent S in compressed form and perform the search in time linear in the size of the structure by using link operations. The experimental results show that our method is an order of magnitude faster when the problem instance is highly compressible.
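For reference, the search that DLX performs can be written compactly using dictionaries of sets in place of dancing links. This is a well-known formulation of Knuth's Algorithm X; it is neither DLX's linked-list implementation nor the paper's DanceDD, and serves only to show what the exact cover search does.

```python
# Knuth's Algorithm X, the recursive search that DLX implements with dancing
# links. X maps each element of the universe D to the ids of the sets in S
# that contain it; Y maps each set id to its elements. Illustrative only.

def solve_exact_cover(X, Y, solution=None):
    if solution is None:
        solution = []
    if not X:
        yield list(solution)  # every element covered exactly once
        return
    e = min(X, key=lambda k: len(X[k]))  # element with fewest candidate sets
    for r in list(X[e]):
        solution.append(r)
        removed = _cover(X, Y, r)
        yield from solve_exact_cover(X, Y, solution)
        _uncover(X, Y, r, removed)
        solution.pop()

def _cover(X, Y, r):
    """Remove set r's elements from the universe and all conflicting sets."""
    removed = []
    for e in Y[r]:
        for s in X[e]:
            for f in Y[s]:
                if f != e:
                    X[f].remove(s)
        removed.append(X.pop(e))
    return removed

def _uncover(X, Y, r, removed):
    """Undo _cover, restoring X exactly (the 'dancing' step)."""
    for e in reversed(Y[r]):
        X[e] = removed.pop()
        for s in X[e]:
            for f in Y[s]:
                if f != e:
                    X[f].add(s)

# Knuth's running example: the unique cover is {'B', 'D', 'F'}.
Y = {'A': [1, 4, 7], 'B': [1, 4], 'C': [4, 5, 7],
     'D': [3, 5, 6], 'E': [2, 3, 6, 7], 'F': [2, 7]}
X = {e: {s for s in Y if e in Y[s]} for e in range(1, 8)}
print(sorted(next(solve_exact_cover(X, Y))))  # ['B', 'D', 'F']
```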


2021 ◽  
Vol 15 (6) ◽  
pp. 1-27
Author(s):  
Marco Bressan ◽  
Stefano Leucci ◽  
Alessandro Panconesi

We address the problem of computing the distribution of induced connected subgraphs, a.k.a. graphlets or motifs, in large graphs. The current state-of-the-art algorithms estimate the motif counts via uniform sampling by leveraging the color coding technique of Alon, Yuster, and Zwick. In this work, we extend the applicability of this approach by introducing a set of algorithmic optimizations and techniques that reduce the running time and space usage of color coding and improve the accuracy of the counts. To this end, we first show how to optimize color coding to efficiently build a compact table of a representative subsample of all graphlets in the input graph. For 8-node motifs, we can build such a table in one hour for a graph with 65M nodes and 1.8B edges, substantially larger than what the state of the art can handle. We then introduce a novel adaptive sampling scheme that breaks the "additive error barrier" of uniform sampling, guaranteeing multiplicative approximations instead of just additive ones. This allows us to count not only the most frequent motifs, but also extremely rare ones. For instance, on one graph we accurately count nearly 10,000 distinct 8-node motifs whose relative frequency is so small that uniform sampling would take centuries to find them. Our results show that color coding is still the most promising approach to scalable motif counting.
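The color coding idea underlying this line of work fits in a few lines. The sketch below counts simple paths on k vertices, the simplest motif (the paper's table-building generalizes this to trees and graphlets): color vertices uniformly at random, count "colorful" paths (all colors distinct) exactly with a DP over color subsets, then rescale by the probability k!/k^k that a fixed path becomes colorful. The parameter choices are illustrative.

```python
# Color coding (Alon-Yuster-Zwick) for estimating the number of simple paths
# on k vertices. dp[(v, S)] = number of colorful paths ending at vertex v
# whose vertices use exactly the color set S (a bitmask).
import math
import random
from collections import defaultdict

def colorful_path_estimate(adj, k, trials=30):
    n = len(adj)
    p_colorful = math.factorial(k) / k ** k  # chance a fixed path is colorful
    estimates = []
    for _ in range(trials):
        color = [random.randrange(k) for _ in range(n)]
        dp = defaultdict(int)
        for v in range(n):
            dp[(v, 1 << color[v])] = 1  # single-vertex paths
        for _ in range(k - 1):          # extend paths one vertex at a time
            new = defaultdict(int)
            for (v, S), cnt in dp.items():
                for u in adj[v]:
                    if not S & (1 << color[u]):  # keep colors distinct
                        new[(u, S | (1 << color[u]))] += cnt
            dp = new
        full = (1 << k) - 1
        # Each undirected path is counted once per endpoint, i.e. twice.
        colorful = sum(c for (v, S), c in dp.items() if S == full) // 2
        estimates.append(colorful / p_colorful)
    return sum(estimates) / trials

# Path graph on 6 vertices: exactly 3 simple paths on 4 vertices.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(colorful_path_estimate(adj, k=4, trials=200))  # close to 3
```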


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Vittorino Lanzio ◽  
Gregory Telian ◽  
Alexander Koshelev ◽  
Paolo Micheletti ◽  
Gianni Presti ◽  
...  

The combination of electrophysiology and optogenetics enables the exploration of how the brain operates, down to a single neuron and its network activity. Neural probes are invasive in vivo devices that integrate sensors and stimulation sites to record and manipulate neuronal activity with high spatiotemporal resolution. State-of-the-art probes are limited by tradeoffs involving their lateral dimension, number of sensors, and ability to access independent stimulation sites. Here, we realize a highly scalable probe that features three-dimensional integration of small-footprint arrays of sensors and nanophotonic circuits, scaling the density of sensors per cross-section by one order of magnitude with respect to state-of-the-art devices. For the first time, we overcome the spatial limit of the nanophotonic circuit by coupling a single waveguide to numerous optical ring resonators acting as passive nanophotonic switches. With this strategy, we achieve accurate on-demand light localization while avoiding spatially demanding bundles of waveguides. A proof-of-concept device demonstrates the feasibility of the approach and its scalability towards high-resolution, low-damage neural optoelectrodes.


2018 ◽  
Vol 27 (07) ◽  
pp. 1860013 ◽  
Author(s):  
Swair Shah ◽  
Baokun He ◽  
Crystal Maung ◽  
Haim Schweitzer

Principal Component Analysis (PCA) is a classical dimensionality reduction technique that computes a low-rank representation of the data. Recent studies have shown how to compute this low-rank representation from most of the data, excluding a small amount of outlier data. We show how to convert this problem into graph search, and describe an algorithm that solves it optimally by applying a variant of the A* algorithm to search for the outliers. The results obtained by our algorithm are optimal in terms of accuracy, and are shown to be more accurate than those obtained by the current state-of-the-art algorithms, which are shown not to be optimal. This comes at the cost of running time, which is typically slower than the current state of the art. We also describe a related variant of the A* algorithm that runs much faster than the optimal variant and produces a solution that is guaranteed to be near-optimal. This variant is shown experimentally to be more accurate than the current state of the art, with a comparable running time.
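A minimal sketch of the search space involved: states are sets of removed rows, scored by the rank-r reconstruction error of what remains. The paper's A* variants add an admissible bound that makes the search optimal (plus a faster near-optimal relaxation); the sketch below uses the raw error as the priority, so it is only a greedy best-first illustration, and since it can enumerate up to C(n, k) states it is viable only for small inputs.

```python
# Best-first search for k outliers whose removal most improves a rank-r PCA
# fit. Illustrative sketch: the priority is the raw reconstruction error of
# the remaining rows, not the admissible bound used by the paper's A*.
import heapq
import itertools
import numpy as np

def rank_r_error(X, r):
    """Sum of squared residuals of X outside its best rank-r subspace."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return float((s[r:] ** 2).sum())

def best_first_outliers(X, r, k):
    n = len(X)
    tie = itertools.count()  # tiebreaker so the heap never compares sets
    start = frozenset()
    heap = [(rank_r_error(X, r), next(tie), start)]
    seen = {start}
    while heap:
        err, _, removed = heapq.heappop(heap)
        if len(removed) == k:
            return sorted(removed), err  # first full-size state popped
        for i in range(n):
            if i in removed:
                continue
            nxt = removed | {i}
            if nxt not in seen:
                seen.add(nxt)
                rows = [j for j in range(n) if j not in nxt]
                heapq.heappush(heap, (rank_r_error(X[rows], r), next(tie), nxt))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3)) @ np.diag([5.0, 1.0, 0.1])  # near rank-2 data
X[:2] += 10  # plant two outliers in rows 0 and 1
print(best_first_outliers(X, r=2, k=2)[0])  # likely [0, 1]
```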


2021 ◽  
Vol 6 (1) ◽  
pp. 47
Author(s):  
Julian Schütt ◽  
Rico Illing ◽  
Oleksii Volkov ◽  
Tobias Kosub ◽  
Pablo Nicolás Granell ◽  
...  

The detection, manipulation, and tracking of magnetic nanoparticles are of major importance in biology, biotechnology, and biomedicine, where the particles are used as labels as well as in drug delivery, (bio-)detection, and tissue engineering. In this regard, the trend is towards improving existing state-of-the-art methodologies in the spirit of time-saving, high-throughput analysis at ultra-low volumes. Here, microfluidics offers vast advantages to address these requirements, as it deals with the control and manipulation of liquids in confined microchannels. This conjunction of microfluidics and magnetism, namely micro-magnetofluidics, is a dynamic research field, which requires novel sensor solutions to boost the detection limit for tiny quantities of magnetized objects. We present a sensing strategy relying on planar Hall effect (PHE) sensors in droplet-based micro-magnetofluidics for the detection of a multiphase liquid flow, i.e., superparamagnetic aqueous droplets in an oil carrier phase. The high resolution of the sensor allows the detection of nanoliter-sized superparamagnetic droplets with a concentration of 0.58 mg cm−3, even when they are biased only in the geomagnetic field. The limit of detection can be lowered by another order of magnitude, reaching 0.04 mg cm−3 (1.4 million particles in a single 100 nL droplet), when a magnetic field of 5 mT is applied to bias the droplets. With this performance, our sensing platform outperforms state-of-the-art solutions in droplet-based micro-magnetofluidics by a factor of 100. This allows us to detect ferrofluid droplets at clinically and biologically relevant concentrations, and even lower, without the need for externally applied magnetic fields.


1995 ◽  
Vol 396 ◽  
Author(s):  
Charles W. Allen ◽  
Loren L. Funk ◽  
Edward A. Ryan

During 1995, a state-of-the-art intermediate voltage electron microscope (IVEM) was installed in the HVEM-Tandem Facility, with in situ ion irradiation capabilities similar to those of the HVEM. A 300 kV Hitachi H-9000NAR has been interfaced to the two ion accelerators of the Facility, with a spatial resolution for imaging that is nearly an order of magnitude better than that of the 1.2 MV HVEM, which dates from the early 1970s. Nevertheless, the HVEM remains heavily utilized for electron- and ion-irradiation-related materials studies, especially those for which less demanding microscopy is adequate. The capabilities and limitations of the IVEM and HVEM are compared. Both instruments are part of the DOE-funded User Facility and are therefore available to the scientific community for materials studies, free of charge for non-proprietary research.


2022 ◽  
Vol 40 (2) ◽  
pp. 1-24
Author(s):  
Franco Maria Nardini ◽  
Roberto Trani ◽  
Rossano Venturini

Modern search services often provide multiple options to rank the search results, e.g., sort "by relevance", "by price", or "by discount" in e-commerce. While the traditional rank by relevance effectively places the relevant results in the top positions of the results list, ranking by an attribute can place many marginally relevant results at the head of the list, leading to a poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks for the subset of results that maximizes the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of the algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove performance bounds for the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ϵ-Filtering, which, given an allowed approximation error ϵ, finds a (1-ϵ)-optimal filtering, i.e., the relevance of its solution is at least (1-ϵ) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ϵ-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ϵ-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
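An executable statement of the problem helps fix intuitions. The position-discounted (DCG-style) list relevance below is an assumption for illustration, as the abstract does not specify the objective; the brute force over all subsets is exactly what OPT-Filtering and ϵ-Filtering avoid, and is feasible only for toy inputs.

```python
# Relevance-aware filtering, stated as brute force (illustrative only): pick
# the subset of results that, once sorted by the attribute (e.g. price),
# maximizes a position-discounted relevance of the resulting list.
import math
from itertools import combinations

def list_relevance(results):
    """results: list of (attribute, relevance), already attribute-sorted."""
    return sum(rel / math.log2(pos + 2)
               for pos, (_, rel) in enumerate(results))

def brute_force_filtering(results, size):
    best = max(combinations(results, size),
               key=lambda sub: list_relevance(sorted(sub)))
    return sorted(best)

# (price, relevance): a cheap but irrelevant item hurts the price-sorted list.
items = [(5, 0.1), (8, 0.9), (9, 0.8), (12, 0.7), (20, 0.2)]
print(brute_force_filtering(items, size=3))  # keeps the relevant mid-priced items
```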


2020 ◽  
Vol 6 (16) ◽  
pp. eaay2631 ◽  
Author(s):  
Silviu-Marian Udrescu ◽  
Max Tegmark

A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality, and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult physics-based test set, we improve the state-of-the-art success rate from 15% to 90%.
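One of the physics-inspired simplification tests can be sketched directly: after fitting a black-box model f(x, y) to the data, probe it for translational symmetry, i.e. whether f depends only on x - y, in which case the regression reduces to a one-variable problem g(z) with z = x - y. The sketch below replaces the neural-network fitting step with exact callables purely for illustration, and the probe ranges and tolerance are assumptions.

```python
# Probe a fitted model f(x, y) for translational symmetry: shifting both
# arguments by the same amount leaves x - y unchanged, so a function that
# depends only on x - y must be invariant under this probe.
import numpy as np

def has_translation_symmetry(model, n_probes=1000, shift=0.5, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 2.0, n_probes)    # keep x - y away from zero so
    y = rng.uniform(-2.0, -1.0, n_probes)  # singular models stay well-behaved
    deviation = np.max(np.abs(model(x + shift, y + shift) - model(x, y)))
    return deviation < tol

gravity_like = lambda x, y: 1.0 / (x - y) ** 2   # depends only on x - y
coupled      = lambda x, y: x * y                # does not
print(has_translation_symmetry(gravity_like))  # True
print(has_translation_symmetry(coupled))       # False
```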


Author(s):  
Mohammed S. Mayeed ◽  
Golam Newaz

The objective of this research is to design and optimize a bypass mini/micro-channel-based surface accumulator of E. coli that could easily be integrated with an acoustic wave biosensor. A computational study has been carried out using the state-of-the-art computational software CFD-ACE, with water as the bacteria-bearing fluid. E. coli bacteria have been modeled as random discrete particles tracked by solving the Lagrangian equations. The design challenges are to achieve a high particle-to-water ratio in the bypass channel, accumulation of particles on a surface of the channel, and a Reynolds number high enough to overcome bacterial swimming, under various particle-boundary conditions. The optimized designs achieve an accumulation concentration more than an order of magnitude higher than the inlet concentration, at a flow velocity much higher than the bacterial swimming speed, under various particle-boundary interactions. A bypass channel is used in this design to separate the concentrated water-particle mixture and accumulate particles on a surface of the channel, where the biosensor can be installed safely and precisely.
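For readers unfamiliar with Lagrangian particle tracking, the sketch below shows the idea in its simplest form: integrate each particle's position through a prescribed flow field. The study itself used CFD-ACE with a fully resolved flow; the parabolic Poiseuille profile, channel dimensions, drag-free advection, and diffusion magnitude here are all illustrative assumptions.

```python
# Minimal Lagrangian particle tracking in a prescribed 2D channel flow.
import random

H = 100e-6     # channel height (m), illustrative
U_MAX = 1e-3   # centerline velocity (m/s), well above E. coli swimming speed
DT = 1e-3      # time step (s)

def velocity(y):
    """Parabolic Poiseuille profile across the channel height."""
    return 4 * U_MAX * (y / H) * (1 - y / H)

def track(n_particles=1000, steps=500):
    random.seed(1)
    # Each particle is [x, y]: streamwise and cross-channel position.
    particles = [[0.0, random.uniform(0, H)] for _ in range(n_particles)]
    for _ in range(steps):
        for p in particles:
            p[0] += velocity(p[1]) * DT     # advect with the local flow
            p[1] += random.gauss(0, 1e-7)   # small lateral diffusion
            p[1] = min(max(p[1], 0.0), H)   # clamp at the walls
    return particles

positions = track()
print(max(p[0] for p in positions))  # furthest particle downstream (m)
```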


2019 ◽  
Vol 36 (1) ◽  
pp. 16-37
Author(s):  
Micaela Vannini ◽  
Paola Marchese ◽  
Annamaria Celli ◽  
Cesare Lorenzetti

Semiaromatic polyamides belong to a large family of polymers commonly utilized in demanding engineering applications due to a unique set of outstanding mechanical properties as well as thermal and chemical resistance. Somewhat less understood is the use of certain members of this family in packaging applications, such as the manufacturing of films, sheets, and thin-walled containers, as well as the motivations for and limitations of designing film structures containing them. This article reviews how m-xylylene diamine (MXD)-based polyamides are used in packaging applications. Attention is also given to film manufacturing and the criticalities of its processing. Recent developments in MXD-based polyamides for gas barrier applications are also reported, while examining future perspectives in speciality film manufacturing. The newly described copolymers based on MXD6 were synthesized by introducing different co-units, such as isophthalic acid, 2,6-naphthalenedicarboxylic acid, glutaric acid, oxalic acid, and 1,6-hexamethylene diamine. The characterization analyses performed (DMTA, DSC, density, and OTR) allowed the polymer structures and properties to be correlated. Introducing 2,6-naphthalenedicarboxylic acid along the MXD6 chain led to the highest Tg (101°C) and density (1.233 g/cm3) and the lowest OTR (0.0035 cm3·cm/m2·day·atm, up to one order of magnitude lower).

