Sapling: accelerating suffix array queries with learned data models

2020 ◽  
Author(s):  
Melanie Kirsche ◽  
Arun Das ◽  
Michael C Schatz

Abstract Motivation As genomic data becomes more abundant, efficient algorithms and data structures for sequence alignment become increasingly important. The suffix array is a widely used data structure to accelerate alignment, but the binary search algorithm used to query it requires widespread memory accesses, causing a large number of cache misses on large datasets. Results Here, we present Sapling, an algorithm for sequence alignment that uses a learned data model to augment the suffix array and enable faster queries. We investigate different types of data models, providing an analysis of different neural network models as well as providing an open-source aligner with a compact, practical piecewise linear model. We show that Sapling outperforms both an optimized binary search approach and multiple widely used read aligners on a diverse collection of genomes, including human, bacteria and plants, speeding up the algorithm by more than a factor of two while adding <1% to the suffix array’s memory footprint. Availability and implementation The source code and tutorial are available open-source at https://github.com/mkirsche/sapling. Supplementary information Supplementary data are available at Bioinformatics online.
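The idea behind the learned augmentation can be sketched in a few lines: a model fitted to (k-mer code, suffix-array rank) pairs predicts where a query should fall, and binary search is confined to a window bounded by the model's worst-case error. The sketch below uses a single global least-squares line rather than Sapling's piecewise linear model, and all function names are illustrative, not the tool's API:

```python
from bisect import bisect_left

def build_suffix_array(text):
    # Naive O(n^2 log n) construction; production tools use linear-time builders.
    return sorted(range(len(text)), key=lambda i: text[i:])

def kmer_code(s, k, alphabet="ACGT"):
    # Map the first k characters to an integer whose numeric order
    # matches the lexicographic order of the k-mers.
    code = 0
    for ch in s[:k]:
        code = code * len(alphabet) + alphabet.index(ch)
    return code

def fit_linear(text, sa, k):
    # Least-squares fit of suffix-array rank against k-mer code, plus
    # the maximum residual, which bounds the search window.
    xs = [kmer_code(text[pos:], k) for pos in sa]
    n = len(xs)
    mx, my = sum(xs) / n, (n - 1) / 2
    var = sum((x - mx) ** 2 for x in xs) or 1.0
    slope = sum((x - mx) * (r - my) for r, x in enumerate(xs)) / var
    intercept = my - slope * mx
    max_err = max(abs(r - (slope * x + intercept)) for r, x in enumerate(xs))
    return slope, intercept, int(max_err) + 1

def query(text, sa, pattern, k, slope, intercept, max_err):
    # Predict the rank, then binary-search only the error window
    # instead of the whole suffix array.
    pred = int(slope * kmer_code(pattern, k) + intercept)
    lo, hi = max(0, pred - max_err), min(len(sa), pred + max_err + 1)
    window = [text[sa[i]:sa[i] + len(pattern)] for i in range(lo, hi)]
    j = bisect_left(window, pattern)
    if j < len(window) and window[j] == pattern:
        return sa[lo + j]
    return -1  # pattern absent from the text
```

On a real genome the error window of a well-fitted piecewise model is orders of magnitude smaller than the full array, which is where the cache-miss savings come from.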


Author(s):  
Sacha J. van Albada ◽  
Jari Pronold ◽  
Alexander van Meegen ◽  
Markus Diesmann

AbstractWe are entering an age of ‘big’ computational neuroscience, in which neural network models are increasing in size and in numbers of underlying data sets. Consolidating the zoo of models into large-scale models simultaneously consistent with a wide range of data is only possible through the effort of large teams, which can be spread across multiple research institutions. To ensure that computational neuroscientists can build on each other’s work, it is important to make models publicly available as well-documented code. This chapter describes such an open-source model, which relates the connectivity structure of all vision-related cortical areas of the macaque monkey with their resting-state dynamics. We give a brief overview of how to use the executable model specification, which employs NEST as simulation engine, and show its runtime scaling. The solutions found serve as an example for organizing the workflow of future models from the raw experimental data to the visualization of the results, expose the challenges, and give guidance for the construction of an ICT infrastructure for neuroscience.


2020 ◽  
Vol 36 (12) ◽  
pp. 3693-3702 ◽  
Author(s):  
Dandan Zheng ◽  
Guansong Pang ◽  
Bo Liu ◽  
Lihong Chen ◽  
Jian Yang

Abstract Motivation Identification of virulence factors (VFs) is critical to the elucidation of bacterial pathogenesis and prevention of related infectious diseases. Current computational methods for VF prediction focus on binary classification or involve only a few VF classes with sufficient samples. However, thousands of VF classes are present in real-world scenarios, and many of them have only a very limited number of samples available. Results We first construct a large VF dataset, covering 3446 VF classes with 160 495 sequences, and then propose deep convolutional neural network models for VF classification. We show that (i) for common VF classes with sufficient samples, our models can achieve state-of-the-art performance with an overall accuracy of 0.9831 and an F1-score of 0.9803; (ii) for uncommon VF classes with limited samples, our models can learn transferable features from auxiliary data and achieve good performance with accuracy ranging from 0.9277 to 0.9512 and F1-score ranging from 0.9168 to 0.9446 when combined with different predefined features, outperforming traditional classifiers by 1–13% in accuracy and by 1–16% in F1-score. Availability and implementation All of our datasets are made publicly available at http://www.mgc.ac.cn/VFNet/, and the source code of our models is publicly available at https://github.com/zhengdd0422/VFNet. Supplementary information Supplementary data are available at Bioinformatics online.
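The building block of such sequence CNNs is easy to illustrate. The sketch below is a plain-NumPy stand-in for one convolution-plus-global-max-pooling layer (not the authors' actual architecture or code): each filter acts as a learned motif detector sliding over a one-hot-encoded protein sequence, and pooling records the strongest match anywhere in the sequence.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    # Encode a protein sequence as a (length, 20) one-hot matrix.
    idx = {a: i for i, a in enumerate(AMINO_ACIDS)}
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        mat[pos, idx[aa]] = 1.0
    return mat

def conv1d_maxpool(x, filters):
    # One convolutional layer with global max pooling: slide every
    # filter over the sequence and keep each filter's best score.
    k = filters.shape[1]                 # filters: (n_filters, k, 20)
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    scores = np.einsum('wka,fka->wf', windows, filters)
    return scores.max(axis=0)            # (n_filters,) feature vector
```

In a trained network the pooled feature vector would feed fully connected layers that output class probabilities over the VF classes.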


2018 ◽  
Author(s):  
Simen Tennøe ◽  
Geir Halnes ◽  
Gaute T. Einevoll

Abstract Computational models in neuroscience typically contain many parameters that are poorly constrained by experimental data. Uncertainty quantification and sensitivity analysis provide rigorous procedures to quantify how the model output depends on this parameter uncertainty. Unfortunately, the application of such methods is not yet standard within the field of neuroscience. Here we present Uncertainpy, an open-source Python toolbox tailored to perform uncertainty quantification and sensitivity analysis of neuroscience models. Uncertainpy aims to make it easy and quick to get started with uncertainty analysis, without any need for detailed prior knowledge. The toolbox allows uncertainty quantification and sensitivity analysis to be performed on already existing models without needing to modify the model equations or model implementation. Uncertainpy bases its analysis on polynomial chaos expansions, which are more efficient than the more standard Monte Carlo-based approaches. Uncertainpy is tailored for neuroscience applications by its built-in capability for calculating characteristic features in the model output. The toolbox does not merely perform a point-to-point comparison of the “raw” model output (e.g. membrane voltage traces), but can also calculate the uncertainty and sensitivity of salient model response features such as spike timing, action potential width, mean interspike interval, and other features relevant for various neural and neural network models. Uncertainpy comes with several common models and features built in, and including custom models and new features is easy. The aim of the current paper is to present Uncertainpy for the neuroscience community in a user-oriented manner. 
To demonstrate its broad applicability, we perform an uncertainty quantification and sensitivity analysis on three case studies relevant for neuroscience: the original Hodgkin-Huxley point-neuron model for action potential generation, a multi-compartmental model of a thalamic interneuron implemented in the NEURON simulator, and a sparsely connected recurrent network model implemented in the NEST simulator. Significance Statement A major challenge in computational neuroscience is to specify the often large number of parameters that define neuron and neural network models. Many of these parameters have an inherent variability, and some may even be actively regulated and change with time. It is important to know how the uncertainty in model parameters affects the model predictions. To address this need, we here present Uncertainpy, an open-source Python toolbox tailored to perform uncertainty quantification and sensitivity analysis of neuroscience models.
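The underlying workflow, sample the uncertain parameters, run the model, and summarize a feature of the output, can be sketched with plain Monte Carlo (Uncertainpy itself uses the more efficient polynomial chaos expansions; the toy model, parameter ranges, and feature below are all assumptions chosen for illustration):

```python
import math
import random
import statistics

def membrane_peak(gbar, tau, t_end=50.0, dt=0.1):
    # Toy point-neuron surrogate: peak of g * t * exp(-t / tau), an
    # alpha-function-like response standing in for a real simulator run.
    peak, t = 0.0, 0.0
    while t < t_end:
        peak = max(peak, gbar * t * math.exp(-t / tau))
        t += dt
    return peak

def monte_carlo_uq(n=2000, seed=1):
    # Draw parameters from their assumed uncertainty ranges, evaluate
    # the feature (peak response) per sample, and summarize mean and sd.
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        gbar = rng.uniform(0.8, 1.2)   # assumed +/-20% conductance uncertainty
        tau = rng.uniform(5.0, 15.0)   # assumed time-constant range (ms)
        samples.append(membrane_peak(gbar, tau))
    return statistics.mean(samples), statistics.stdev(samples)
```

Polynomial chaos replaces the thousands of random model evaluations above with a small, carefully chosen set of evaluations plus a polynomial surrogate, which is the efficiency gain the abstract refers to.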


2012 ◽  
Vol 466-467 ◽  
pp. 789-793
Author(s):  
Hui Qin Sun ◽  
Zhi Hong Xue ◽  
Ke Jun Sun ◽  
Su Zhi Wang ◽  
Yun Du

The backpropagation (BP) neural network is currently the most widely used neural network model in practical transformer fault diagnosis. However, BP is a local search algorithm that easily drives the network into local minima, so training results can be poor. This paper discusses the PSO-BP algorithm, which combines particle swarm optimization (PSO) with the BP algorithm: the PSO algorithm is used to optimize the BP network’s weights and thresholds. The method is applied to power transformer fault diagnosis. Experimental results show that the PSO-BP network achieves higher fault diagnosis accuracy than the BP algorithm alone.
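The hybrid's core step is replacing gradient descent with PSO's population-based search over the weight space. The sketch below applies textbook PSO to the weights and threshold of a single sigmoid neuron learning an AND gate, a deliberately tiny stand-in for the paper's BP diagnosis network; all constants are illustrative choices, not the authors' settings:

```python
import math
import random

def neuron(w, x):
    # Single sigmoid neuron: the smallest possible "BP network".
    s = w[0] * x[0] + w[1] * x[1] + w[2]   # two weights + threshold
    return 1.0 / (1.0 + math.exp(-s))

DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND gate

def mse(w):
    return sum((neuron(w, x) - y) ** 2 for x, y in DATA) / len(DATA)

def pso(n_particles=20, iters=150, seed=3):
    # PSO searches the weight vector globally: each particle tracks its
    # personal best, and the swarm shares a global best, which makes the
    # search less prone to the local minima that trap plain BP.
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(n_particles)]
    vel = [[0.0] * 3 for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=mse)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(3):
                r1, r2 = rng.random(), rng.random()
                v = (0.7 * vel[i][d]                       # inertia
                     + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                     + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
                vel[i][d] = max(-4.0, min(4.0, v))
                pos[i][d] += vel[i][d]
            if mse(pos[i]) < mse(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=mse)
    return gbest
```

In the paper's setting the fitness function would be the diagnosis error of the full BP network on dissolved-gas data rather than this toy MSE.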


Author(s):  
Asha

The optimization of problems significantly improves the solution of complex problems. Reducing feature dimensionality is enormously salient for removing redundant features and improving system accuracy. In this paper, an amalgamation of different concepts is proposed to optimize the features and improve classification. The experiment is performed on facial expression detection by combining deep neural network models with variants of the gravitational search algorithm. Facial expressions are movements of facial components such as the lips, nose, and eyes, which are considered as features for classifying human emotions into different classes. Initial feature extraction is performed with the local binary pattern. The extracted feature set is optimized with variants of the gravitational search algorithm (GSA): the standard GSA (SGSA), the binary GSA (BGSA), and the fast discrete GSA (FDGSA). The deep neural network models of the deep convolutional neural network (DCNN) and the extended deep convolutional neural network (EDCNN) are employed for classifying emotions from the JAFFE and KDEF imagery datasets. Fixed-pose images from both datasets are acquired, and a comparison based on average recognition accuracy is performed. The comparative analysis of the mentioned techniques and state-of-the-art techniques illustrates the superior recognition accuracy of FDGSA with the EDCNN technique.
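The first stage of that pipeline, the local binary pattern, is compact enough to sketch directly. Each pixel is described by comparing it with its eight neighbours, and the histogram of the resulting 8-bit codes is the texture feature vector (the bit ordering and the >= convention below are one common choice among several):

```python
def lbp_code(img, r, c):
    # 8-neighbour LBP: each neighbour contributes one bit, set when its
    # intensity is >= the centre pixel's, read clockwise from top-left.
    center = img[r][c]
    neighbours = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                  img[r][c+1], img[r+1][c+1], img[r+1][c],
                  img[r+1][c-1], img[r][c-1]]
    return sum((1 << i) for i, n in enumerate(neighbours) if n >= center)

def lbp_histogram(img):
    # Feature vector: histogram of LBP codes over all interior pixels
    # (border pixels are skipped because they lack a full neighbourhood).
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

It is this 256-bin histogram that the GSA variants would then prune to a smaller, less redundant feature subset before classification.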


2021 ◽  
Vol 11 (19) ◽  
pp. 8861
Author(s):  
Philipp Ruf ◽  
Manav Madan ◽  
Christoph Reich ◽  
Djaffar Ould-Abdeslam

Nowadays, machine learning projects have become increasingly relevant to real-world use cases. The success of complex neural network models depends on many factors, creating a need for structured, machine learning-centric project development and management. With the multitude of tools available for the different operational phases, responsibilities and requirements become increasingly unclear. In this work, Machine Learning Operations (MLOps) technologies and tools for every part of the overall project pipeline, as well as the roles involved, are examined and clearly defined. Focusing on the interconnectivity of specific tools and comparing them against well-selected MLOps requirements, model performance, input data, and system quality metrics are briefly discussed. By identifying aspects of machine learning that can be reused from project to project, open-source tools that help with specific parts of the pipeline, and possible combinations, an overview of MLOps support is given. Deep learning has revolutionized the field of image processing, and building an automated machine learning workflow for object detection is of great interest to many organizations. To this end, a simple MLOps workflow for object detection with images is portrayed.


2020 ◽  
Vol 5 ◽  
pp. 140-147 ◽  
Author(s):  
T.N. Aleksandrova ◽  
E.K. Ushakov ◽  
A.V. Orlova ◽  
...  
