computational bottleneck
Recently Published Documents

TOTAL DOCUMENTS: 51 (five years: 28)
H-INDEX: 5 (five years: 1)

Electronics, 2022, Vol. 11 (2), pp. 190
Author(s): Mario Osta, Ali Ibrahim, Maurizio Valle

In this paper, we demonstrate the feasibility and efficiency of approximate computing techniques (ACTs) in the circuit implementation of the embedded Support Vector Machine (SVM) tensorial kernel for tactile sensing systems. Singular Value Decomposition (SVD) is the main computational bottleneck of the tensorial kernel approach, and the performance of the embedded SVM in terms of power, area, and delay can be improved by using approximate multipliers in the SVD; since digital multipliers are used extensively in the SVD implementation, we aim to optimize the multiplier circuit. We present an approximate SVD circuit based on the Approximate Baugh-Wooley (Approx-BW) multiplier. The approximate SVD reduces energy consumption by up to 16% at the cost of a Mean Relative Error (MRE) of less than 5%. We assess the impact of the approximate SVD on classification accuracy, showing that it increases the error rate (Err) by one to eight percent. In addition, we propose a hybrid evaluation test approach that consists of implementing three different approximate SVD circuits with different numbers of approximated Least Significant Bits (LSBs). The results show that energy consumption is reduced by more than five percent for the same accuracy loss.
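
As a rough illustration of the LSB-approximation idea described above (a software sketch only, not the authors' Approx-BW multiplier or their SVD circuit), the Python snippet below zeroes the k least significant bits of each operand before an exact multiplication and reports the resulting relative error:

```python
def approx_multiply(a: int, b: int, k: int = 4) -> int:
    """Approximate unsigned multiply: zero the k least significant bits of
    each operand before an exact multiplication. This loosely mimics how
    dropping LSBs in a hardware multiplier saves area/energy at the cost of
    a bounded relative error."""
    mask = ~((1 << k) - 1)
    return (a & mask) * (b & mask)

# Quick look at the relative error introduced for a few operand pairs.
if __name__ == "__main__":
    for a, b in [(1000, 2000), (12345, 6789), (255, 511)]:
        exact, approx = a * b, approx_multiply(a, b, k=4)
        rel_err = abs(exact - approx) / exact
        print(f"{a} * {b}: exact={exact}, approx={approx}, rel_err={rel_err:.3%}")
```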


2021
Author(s): Jamshed Khan, Marek Kokot, Sebastian Deorowicz, Rob Patro

The de Bruijn graph has become a key data structure in modern computational genomics, and of keen interest is its compacted variant. The compacted de Bruijn graph provides a lossless representation of the graph, and it is often considerably more efficient to store and process than its non-compacted counterpart. Construction of the compacted de Bruijn graph resides upstream of many genomic analyses. As the quantity of sequencing data and the number of reference genomes on which to perform these analyses grow rapidly, efficient construction of the compacted graph becomes a computational bottleneck for these tasks. We present Cuttlefish 2, significantly advancing the existing state-of-the-art methods for construction of this graph. On a typical shared-memory machine, it reduces the construction of the compacted de Bruijn graph for 661K bacterial genomes (2.58 Tbp of input reference genomes) from about 4.5 days to 17-23 hours. Similarly on sequencing data, it constructs the graph for a 1.52 Tbp white spruce read set in about 10 hours, while the closest competitor, which also uses considerably more memory, requires 54-58 hours. Cuttlefish 2 is implemented in C++14, and is available as open-source software under a BSD-3-Clause license at https://github.com/COMBINE-lab/cuttlefish.
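
For readers unfamiliar with the data structure, the toy Python sketch below builds the unitigs of a compacted de Bruijn graph from a handful of reads. It ignores reverse complements and cycles and shares nothing with Cuttlefish 2's actual algorithm or scale; it only shows what "compacting" maximal non-branching paths means:

```python
from collections import defaultdict

def kmers(seq: str, k: int):
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def compacted_unitigs(reads, k: int = 5):
    """Toy construction of the nodes of a compacted de Bruijn graph:
    k-mers are nodes, (k-1)-mer overlaps are edges, and maximal
    non-branching paths are merged into unitigs."""
    kmer_set = {km for r in reads for km in kmers(r, k)}
    succ, pred = defaultdict(set), defaultdict(set)
    for km in kmer_set:
        for c in "ACGT":
            nxt = km[1:] + c
            if nxt in kmer_set:
                succ[km].add(nxt)
                pred[nxt].add(km)

    visited, unitigs = set(), []
    for km in kmer_set:
        if km in visited:
            continue
        # Skip k-mers that sit in the middle of a non-branching path;
        # they are absorbed when we walk from the path's start node.
        if len(pred[km]) == 1 and len(succ[next(iter(pred[km]))]) == 1:
            continue
        path, cur = [km], km
        visited.add(km)
        while len(succ[cur]) == 1:
            nxt = next(iter(succ[cur]))
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        unitigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return unitigs

print(compacted_unitigs(["ACGTACGGA", "CGTACGGAT"], k=5))
```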


2021
Author(s): Henry Lam, Huajie Qian

Quantifying the impact of input estimation errors in data-driven stochastic simulation often encounters substantial computational challenges due to the entanglement of Monte Carlo and input data noises. In this paper, we propose a subsampling framework to bypass this computational bottleneck, by leveraging the form of the output variance and its estimation error in terms of data size and sampling effort. In contrast to standard uses of subsampling in the literature, our goal is specifically to reduce the sampling complexity of the two-layer bootstrap required in simulation uncertainty quantification. Compared with standard bootstraps, our subsampling approach provably and experimentally leads to more accurate variance and confidence interval estimates for the same simulation budget.
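
A minimal sketch of the two-layer bootstrap the abstract refers to, with a `subsample_frac` knob standing in for the subsampling idea; the variance and confidence-interval corrections of the proposed framework are not reproduced, and all names and the toy simulation model are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n_reps, rng):
    """Toy stochastic simulation: a mean output driven by input parameter
    theta, contaminated by Monte Carlo noise."""
    return rng.exponential(theta, size=n_reps).mean()

def two_layer_bootstrap(data, outer_B=50, inner_reps=100, subsample_frac=1.0, rng=rng):
    """Two-layer bootstrap sketch: the outer layer resamples the input data
    (input noise), the inner layer runs Monte Carlo replications (simulation
    noise). subsample_frac < 1 mimics drawing smaller outer resamples to cut
    sampling effort; the rescaling a real subsampling estimator would apply
    is omitted here."""
    m = max(2, int(subsample_frac * len(data)))
    outputs = []
    for _ in range(outer_B):
        resample = rng.choice(data, size=m, replace=True)
        theta_hat = resample.mean()          # plug-in input model
        outputs.append(simulate(theta_hat, inner_reps, rng))
    return np.var(outputs, ddof=1)           # crude variance estimate

data = rng.exponential(2.0, size=500)        # observed input data
print("bootstrap variance:  ", two_layer_bootstrap(data))
print("subsampled variance: ", two_layer_bootstrap(data, subsample_frac=0.2))
```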


2021, Vol. 22 (1)
Author(s): Hugo Talibart, François Coste

Abstract
Background: To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use.
Methods: We introduce an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed on a non-redundant set of reference pairwise sequence alignments from the SISYPHUS benchmark with low sequence identity (between 3% and 20%), for which reliable Potts models can be built for each sequence to be aligned. This experimentation confirms that Potts models can be aligned in reasonable time (1'37'' on average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and independent-site PPalign. Although the Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean F1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases.
Conclusions: These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experimentation also suggests that new research on the inference of Potts models is needed to make them more comparable and better suited to homology search. We think that PPalign's guaranteed optimality will be a powerful asset for unbiased investigations in this direction.
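
To make the difficulty concrete, the toy scoring function below (not PPalign's actual objective) combines per-position field similarity with pairwise coupling similarity for a candidate set of matched positions; the quadratic coupling term is what makes the alignment problem non-local and motivates the ILP formulation. All shapes and data here are invented:

```python
import numpy as np

def alignment_score(hA, JA, hB, JB, matched_pairs):
    """Toy score for aligning two Potts models A and B. hX[i] is the field
    vector at position i (length q over residue states); JX[i, j] is the
    q-by-q coupling matrix between positions i and j. matched_pairs is a
    list of (i, k) position pairs. Field similarity is summed per matched
    pair; coupling similarity is summed per pair of matched pairs."""
    score = 0.0
    for i, k in matched_pairs:
        score += float(hA[i] @ hB[k])                     # field term
    for a in range(len(matched_pairs)):
        for b in range(a + 1, len(matched_pairs)):
            i, k = matched_pairs[a]
            j, l = matched_pairs[b]
            score += float(np.sum(JA[i, j] * JB[k, l]))   # coupling term
    return score

# Random toy models: length-6 and length-5 "proteins", q = 21 states.
rng = np.random.default_rng(1)
q, LA, LB = 21, 6, 5
hA, hB = rng.normal(size=(LA, q)), rng.normal(size=(LB, q))
JA, JB = rng.normal(size=(LA, LA, q, q)), rng.normal(size=(LB, LB, q, q))
print(alignment_score(hA, JA, hB, JB, [(0, 0), (2, 1), (4, 3)]))
```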


2021, Vol. 10 (4)
Author(s): Jason Kaye, Denis Golez

We propose a method to improve the computational and memory efficiency of numerical solvers for the nonequilibrium Dyson equation in the Keldysh formalism. It is based on the empirical observation that the nonequilibrium Green's functions and self-energies arising in many problems of physical interest, discretized as matrices, have low-rank off-diagonal blocks, and can therefore be compressed using a hierarchical low-rank data structure. We describe an efficient algorithm to build this compressed representation on the fly during the course of time stepping, and use the representation to reduce the cost of computing history integrals, which is the main computational bottleneck. For systems with the hierarchical low-rank property, our method reduces the computational complexity of solving the nonequilibrium Dyson equation from cubic to near quadratic, and the memory complexity from quadratic to near linear. We demonstrate the full solver for the Falicov-Kimball model exposed to a rapid ramp and Floquet driving of system parameters, and are able to increase feasible propagation times substantially. We present examples with 262 144 time steps, which would require approximately five months of computing time and 2.2 TB of memory using the direct time stepping method, but can be completed in just over a day on a laptop with less than 4 GB of memory using our method. We also confirm the hierarchical low-rank property for the driven Hubbard model in the weak coupling regime within the GW approximation, and in the strong coupling regime within dynamical mean-field theory.
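
The core compression idea can be illustrated in a few lines of NumPy: a smooth off-diagonal block is replaced by truncated-SVD factors, so storing and applying it costs roughly O((m+n)r) instead of O(mn). This is only a sketch of the low-rank building block, with a made-up kernel standing in for a discretized Green's function; it is not the hierarchical solver or the on-the-fly construction described above:

```python
import numpy as np

def compress_block(block, tol=1e-8):
    """Truncated SVD of an off-diagonal block: keep only singular values
    above tol * largest. Returns low-rank factors (U, V) with block ~= U @ V,
    so storage is O((m + n) * r) instead of O(m * n)."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    r = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :r] * s[:r], Vt[:r, :]

# Toy "history" matrix whose off-diagonal block is numerically low rank.
rng = np.random.default_rng(0)
n = 200
t = np.linspace(0.0, 1.0, n)
G = np.exp(-np.abs(t[:, None] - t[None, :]))   # smooth kernel -> low-rank blocks
block = G[:n // 2, n // 2:]                    # off-diagonal block
U, V = compress_block(block, tol=1e-10)
print("rank kept:", U.shape[1], "of", min(block.shape))

# Applying the block to a vector via the factors is now cheap:
x = rng.normal(size=block.shape[1])
print("max error:", np.max(np.abs(block @ x - U @ (V @ x))))
```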


2021, Vol. 24 (1)
Author(s): Ernesto Dufrechou

Many problems in diverse areas of science and engineering involve the solution of large-scale sparse systems of linear equations. In most of these scenarios, this solution is also a computational bottleneck, and its efficient execution on parallel architectures has therefore motivated a tremendous volume of research. This dissertation targets the use of GPUs to enhance the performance of the solution of sparse linear systems using iterative methods complemented with state-of-the-art preconditioning techniques. In particular, we study ILUPACK, a package for the solution of sparse linear systems via Krylov subspace methods that relies on a modern inverse-based multilevel ILU (incomplete LU) preconditioning technique. We present new data-parallel versions of the preconditioner and of the most important solvers contained in the package that significantly improve its performance without affecting its accuracy. Additionally, we enhance existing task-parallel versions of ILUPACK for shared- and distributed-memory systems with the inclusion of GPU acceleration. The results obtained show a sensible reduction in the runtime of the methods, as well as the possibility of addressing large-scale problems efficiently.
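
As a generic point of reference (not ILUPACK's multilevel inverse-based ILU, and with no GPU acceleration), the SciPy snippet below solves a sparse system with GMRES preconditioned by an incomplete LU factorization, the same family of preconditioned Krylov iteration discussed above; the test matrix is a made-up diagonally dominant example:

```python
import numpy as np
from scipy.sparse import random as sparse_random, identity
from scipy.sparse.linalg import spilu, gmres, LinearOperator

# Random sparse, diagonally dominant test system (a stand-in only).
rng = np.random.default_rng(0)
n = 1000
A = (sparse_random(n, n, density=0.002, random_state=0) + 10.0 * identity(n)).tocsc()
b = rng.normal(size=n)

# Incomplete LU factorization used as a preconditioner for a Krylov method.
ilu = spilu(A, drop_tol=1e-4)
M = LinearOperator((n, n), matvec=ilu.solve)

x, info = gmres(A, b, M=M)
print("gmres info:", info, " residual:", np.linalg.norm(A @ x - b))
```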


2021, Vol. 4
Author(s): Q. Feltgen, J. Daunizeau

Drift-diffusion models, or DDMs, are becoming a standard in the field of computational neuroscience. They extend models from signal detection theory by proposing a simple mechanistic explanation for the observed relationship between decision outcomes and reaction times (RTs). In brief, they assume that decisions are triggered once the accumulated evidence in favor of a particular alternative option has reached a predefined threshold. Fitting a DDM to empirical data then allows one to interpret observed group or condition differences in terms of a change in the underlying model parameters. However, current approaches only yield reliable parameter estimates in specific situations (cf. fixed drift rates vs. drift rates varying over trials). In addition, they become computationally infeasible when more general DDM variants are considered (e.g., with collapsing bounds). In this note, we propose a fast and efficient approach to parameter estimation that relies on fitting a “self-consistency” equation that RTs fulfill under the DDM. This effectively bypasses the computational bottleneck of standard DDM parameter estimation approaches, at the cost of estimating the trial-specific neural noise variables that perturb the underlying evidence accumulation process. For the purpose of behavioral data analysis, these act as nuisance variables and render the model “overcomplete,” which is finessed using a variational Bayesian system identification scheme. However, for the purpose of neural data analysis, estimates of the neural noise perturbation terms are a desirable (and unique) feature of the approach. Using numerical simulations, we show that this “overcomplete” approach matches the performance of current parameter estimation approaches for simple DDM variants, and outperforms them for more complex DDM variants. Finally, we demonstrate the added value of the approach when applied to a recent value-based decision-making experiment.
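
For concreteness, here is a plain Euler simulation of the generative DDM the abstract builds on: evidence accumulates with a fixed drift plus Gaussian noise until it hits one of two bounds, yielding a choice and a reaction time. It illustrates the model only, not the proposed self-consistency fitting scheme; all parameter values are arbitrary:

```python
import numpy as np

def simulate_ddm(drift, bound, noise_sd=1.0, dt=1e-3, max_t=5.0, n_trials=500, seed=0):
    """Euler simulation of a basic DDM: evidence starts at 0 and accumulates
    with slope `drift` plus Gaussian noise until it crosses +bound (choice 1)
    or -bound (choice 0). Returns choices and reaction times; non-terminating
    trials keep choice -1 and RT NaN."""
    rng = np.random.default_rng(seed)
    n_steps = int(max_t / dt)
    choices = np.full(n_trials, -1)
    rts = np.full(n_trials, np.nan)
    for trial in range(n_trials):
        x = 0.0
        for step in range(n_steps):
            x += drift * dt + noise_sd * np.sqrt(dt) * rng.normal()
            if abs(x) >= bound:
                choices[trial] = int(x > 0)
                rts[trial] = (step + 1) * dt
                break
    return choices, rts

choices, rts = simulate_ddm(drift=1.0, bound=1.0)
print("P(choice=1):", np.mean(choices == 1), " mean RT:", np.nanmean(rts))
```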


2021, Vol. 22 (1)
Author(s): James M. Kunert-Graf, Nikita A. Sakhanenko, David J. Galas

Abstract
Background: Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large.
Results: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP-SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on Information Theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, the approach is insensitive to the number of samples, making it suitable for datasets with large numbers of samples.
Conclusions: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies, and is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
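
The snippet below shows the naive baseline that the paper speeds up: a permutation test on an information-theoretic statistic in which the contingency table is rebuilt from scratch at every permutation (the per-iteration bottleneck). The count-table transformation trick itself is not reproduced here, and the genotype/phenotype data are synthetic:

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (in nats) from a 2-D contingency table of counts."""
    p = counts / counts.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

def naive_permutation_test(genotype, phenotype, n_perm=1000, seed=0):
    """Naive permutation test for a genotype-phenotype association: the
    count table is rebuilt for every permuted label vector -- exactly the
    step the paper eliminates by transforming count tables directly."""
    rng = np.random.default_rng(seed)
    def table(g, p):
        t = np.zeros((3, 2))                 # 3 genotypes x 2 phenotype classes
        np.add.at(t, (g, p), 1)
        return t
    observed = mutual_information(table(genotype, phenotype))
    null = np.empty(n_perm)
    for i in range(n_perm):
        null[i] = mutual_information(table(genotype, rng.permutation(phenotype)))
    return observed, float(np.mean(null >= observed))   # statistic, p-value

rng = np.random.default_rng(1)
g = rng.integers(0, 3, size=2000)                        # SNP genotypes coded 0/1/2
p = (rng.random(2000) < 0.3 + 0.1 * (g == 2)).astype(int)  # weakly associated phenotype
print(naive_permutation_test(g, p))
```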


2021
Author(s): Lu Shen, Daniel J. Jacob, Mauricio Santillana, Kelvin Bates, Jiawei Zhuang, ...

Abstract. Atmospheric composition plays a crucial role in determining the evolution of the atmosphere, but high computational cost has been the major barrier to including atmospheric chemistry in Earth system models. Here we present an adaptive and efficient algorithm that can remove this barrier. Our approach is inspired by unsupervised machine-learning clustering techniques and traditional asymptotic analysis ideas. We first partition the species into 13 blocks, using a novel machine learning approach that analyzes the species' network structures and their production and loss rates. Building on these blocks, we pre-select 20 submechanisms, as defined by unique assemblages of the species blocks, and then pick locally on the fly which submechanism to use based on local chemical conditions. In each submechanism, we isolate slow species and unimportant reactions from the coupled system. Application to a global 3-D model shows that we can cut the computational cost of the chemical integration by 50% with accuracy losses smaller than 1% that do not propagate in time. Tests show that the algorithm is highly chemically coherent, making it easily portable to new models without compromising its performance. Our algorithm will significantly ease this computational bottleneck and facilitate the development of the next generation of Earth system models.
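
A heavily simplified sketch of the on-the-fly submechanism selection described above: species blocks whose local production/loss rates fall below a threshold are dropped from the coupled integration. The block names, rate values, and threshold here are invented for illustration and are not the paper's 13 blocks or 20 submechanisms:

```python
import numpy as np

def pick_submechanism(rates, blocks, threshold=1e-3):
    """Keep a block of species in the coupled integration only if any of its
    production/loss rates exceeds `threshold` under local conditions;
    otherwise treat its species as slow and decouple them."""
    return [name for name, members in blocks.items()
            if np.max(np.abs(rates[members])) > threshold]

# Hypothetical species blocks (indices into a rate vector) and local rates.
blocks = {"HOx": [0, 1, 2], "NOx": [3, 4], "halogens": [5, 6, 7], "SOA": [8, 9]}
rates_day   = np.array([1e-1, 5e-2, 2e-2, 3e-3, 1e-3, 1e-7, 2e-8, 5e-8, 4e-4, 2e-4])
rates_night = np.array([1e-6, 2e-6, 1e-6, 4e-3, 2e-3, 1e-7, 2e-8, 5e-8, 4e-4, 2e-4])
print("daytime blocks:  ", pick_submechanism(rates_day, blocks))
print("nighttime blocks:", pick_submechanism(rates_night, blocks))
```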


2021
Author(s): Matan Mazor

To represent something as absent, one must know that one would have known had it been present. This form of counterfactual reasoning critically relies on a mental self-model: a simplified schema of one's own cognition, which specifies expected perceptual and cognitive states under different world states and affords better monitoring and control over cognitive resources. Here I propose to use inference about absence as a unique window into the structure and function of the mental self-model. In contrast to commonly used paradigms, using inference about absence bypasses the need for explicit metacognitive reports. I draw on findings from low-level perception, spatial attention, and episodic memory in support of the idea that self-knowledge is a computational bottleneck for efficient inference about absence, making inference about absence a cross-cutting framework for probing key features of the mental self-model that are not accessible to introspection.

