Sapling: accelerating suffix array queries with learned data models

2020 ◽  
Author(s):  
Melanie Kirsche ◽  
Arun Das ◽  
Michael C Schatz

Abstract Motivation As genomic data becomes more abundant, efficient algorithms and data structures for sequence alignment become increasingly important. The suffix array is a widely used data structure to accelerate alignment, but the binary search algorithm used to query it requires widespread memory accesses, causing a large number of cache misses on large datasets. Results Here, we present Sapling, an algorithm for sequence alignment that uses a learned data model to augment the suffix array and enable faster queries. We investigate different types of data models, providing an analysis of different neural network models as well as providing an open-source aligner with a compact, practical piecewise linear model. We show that Sapling outperforms both an optimized binary search approach and multiple widely used read aligners on a diverse collection of genomes, including human, bacteria and plants, speeding up the algorithm by more than a factor of two while adding <1% to the suffix array’s memory footprint. Availability and implementation The source code and tutorial are available open-source at https://github.com/mkirsche/sapling. Supplementary information Supplementary data are available at Bioinformatics online.
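The idea behind the learned augmentation can be sketched in a few lines: a model fitted to (k-mer code, suffix-array rank) pairs predicts where a query should fall, and binary search is confined to a window bounded by the model's worst-case error. The sketch below uses a single global least-squares line rather than Sapling's piecewise linear model, and all function names are illustrative, not the tool's API:

```python
from bisect import bisect_left

def build_suffix_array(text):
    # Naive O(n^2 log n) construction; production tools use linear-time builders.
    return sorted(range(len(text)), key=lambda i: text[i:])

def kmer_code(s, k, alphabet="ACGT"):
    # Map the first k characters to an integer whose numeric order
    # matches the lexicographic order of the k-mers.
    code = 0
    for ch in s[:k]:
        code = code * len(alphabet) + alphabet.index(ch)
    return code

def fit_linear(text, sa, k):
    # Least-squares fit of suffix-array rank against k-mer code, plus
    # the maximum residual, which bounds the search window.
    xs = [kmer_code(text[pos:], k) for pos in sa]
    n = len(xs)
    mx, my = sum(xs) / n, (n - 1) / 2
    var = sum((x - mx) ** 2 for x in xs) or 1.0
    slope = sum((x - mx) * (r - my) for r, x in enumerate(xs)) / var
    intercept = my - slope * mx
    max_err = max(abs(r - (slope * x + intercept)) for r, x in enumerate(xs))
    return slope, intercept, int(max_err) + 1

def query(text, sa, pattern, k, slope, intercept, max_err):
    # Predict the rank, then binary-search only the error window
    # instead of the whole suffix array.
    pred = int(slope * kmer_code(pattern, k) + intercept)
    lo, hi = max(0, pred - max_err), min(len(sa), pred + max_err + 1)
    window = [text[sa[i]:sa[i] + len(pattern)] for i in range(lo, hi)]
    j = bisect_left(window, pattern)
    if j < len(window) and window[j] == pattern:
        return sa[lo + j]
    return -1  # pattern absent from the text
```

On a real genome the error window of a well-fitted piecewise model is orders of magnitude smaller than the full array, which is where the cache-miss savings come from.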


Author(s):  
Sacha J. van Albada ◽  
Jari Pronold ◽  
Alexander van Meegen ◽  
Markus Diesmann

AbstractWe are entering an age of ‘big’ computational neuroscience, in which neural network models are increasing in size and in numbers of underlying data sets. Consolidating the zoo of models into large-scale models simultaneously consistent with a wide range of data is only possible through the effort of large teams, which can be spread across multiple research institutions. To ensure that computational neuroscientists can build on each other’s work, it is important to make models publicly available as well-documented code. This chapter describes such an open-source model, which relates the connectivity structure of all vision-related cortical areas of the macaque monkey with their resting-state dynamics. We give a brief overview of how to use the executable model specification, which employs NEST as simulation engine, and show its runtime scaling. The solutions found serve as an example for organizing the workflow of future models from the raw experimental data to the visualization of the results, expose the challenges, and give guidance for the construction of an ICT infrastructure for neuroscience.


2020 ◽  
Vol 36 (12) ◽  
pp. 3693-3702 ◽  
Author(s):  
Dandan Zheng ◽  
Guansong Pang ◽  
Bo Liu ◽  
Lihong Chen ◽  
Jian Yang

Abstract Motivation Identification of virulence factors (VFs) is critical to the elucidation of bacterial pathogenesis and prevention of related infectious diseases. Current computational methods for VF prediction focus on binary classification or involve only a few VF classes with sufficient samples. However, thousands of VF classes are present in real-world scenarios, and many of them have only a very limited number of samples available. Results We first construct a large VF dataset, covering 3446 VF classes with 160 495 sequences, and then propose deep convolutional neural network models for VF classification. We show that (i) for common VF classes with sufficient samples, our models can achieve state-of-the-art performance with an overall accuracy of 0.9831 and an F1-score of 0.9803; (ii) for uncommon VF classes with limited samples, our models can learn transferable features from auxiliary data and achieve good performance with accuracy ranging from 0.9277 to 0.9512 and F1-score ranging from 0.9168 to 0.9446 when combined with different predefined features, outperforming traditional classifiers by 1–13% in accuracy and by 1–16% in F1-score. Availability and implementation All of our datasets are made publicly available at http://www.mgc.ac.cn/VFNet/, and the source code of our models is publicly available at https://github.com/zhengdd0422/VFNet. Supplementary information Supplementary data are available at Bioinformatics online.
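The building block of such sequence CNNs is easy to illustrate. The sketch below is a plain-NumPy stand-in for one convolution-plus-global-max-pooling layer (not the authors' actual architecture or code): each filter acts as a learned motif detector sliding over a one-hot-encoded protein sequence, and pooling records the strongest match anywhere in the sequence.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    # Encode a protein sequence as a (length, 20) one-hot matrix.
    idx = {a: i for i, a in enumerate(AMINO_ACIDS)}
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        mat[pos, idx[aa]] = 1.0
    return mat

def conv1d_maxpool(x, filters):
    # One convolutional layer with global max pooling: slide every
    # filter over the sequence and keep each filter's best score.
    k = filters.shape[1]                 # filters: (n_filters, k, 20)
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    scores = np.einsum('wka,fka->wf', windows, filters)
    return scores.max(axis=0)            # (n_filters,) feature vector
```

In a trained network the pooled feature vector would feed fully connected layers that output class probabilities over the VF classes.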


2018 ◽  
Author(s):  
Simen Tennøe ◽  
Geir Halnes ◽  
Gaute T. Einevoll

Abstract Computational models in neuroscience typically contain many parameters that are poorly constrained by experimental data. Uncertainty quantification and sensitivity analysis provide rigorous procedures to quantify how the model output depends on this parameter uncertainty. Unfortunately, the application of such methods is not yet standard within the field of neuroscience. Here we present Uncertainpy, an open-source Python toolbox tailored to perform uncertainty quantification and sensitivity analysis of neuroscience models. Uncertainpy aims to make it easy and quick to get started with uncertainty analysis, without any need for detailed prior knowledge. The toolbox allows uncertainty quantification and sensitivity analysis to be performed on already existing models without needing to modify the model equations or model implementation. Uncertainpy bases its analysis on polynomial chaos expansions, which are more efficient than the more standard Monte Carlo-based approaches. Uncertainpy is tailored for neuroscience applications by its built-in capability for calculating characteristic features in the model output. The toolbox does not merely perform a point-to-point comparison of the “raw” model output (e.g. membrane voltage traces), but can also calculate the uncertainty and sensitivity of salient model response features such as spike timing, action potential width, mean interspike interval, and other features relevant for various neural and neural network models. Uncertainpy comes with several common models and features built in, and including custom models and new features is easy. The aim of the current paper is to present Uncertainpy for the neuroscience community in a user-oriented manner. 
To demonstrate its broad applicability, we perform an uncertainty quantification and sensitivity analysis on three case studies relevant for neuroscience: the original Hodgkin-Huxley point-neuron model for action potential generation, a multi-compartmental model of a thalamic interneuron implemented in the NEURON simulator, and a sparsely connected recurrent network model implemented in the NEST simulator. Significance Statement A major challenge in computational neuroscience is to specify the often large number of parameters that define neuron and neural network models. Many of these parameters have an inherent variability, and some may even be actively regulated and change with time. It is important to know how the uncertainty in model parameters affects the model predictions. To address this need, we here present Uncertainpy, an open-source Python toolbox tailored to perform uncertainty quantification and sensitivity analysis of neuroscience models.
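The underlying workflow, sample the uncertain parameters, run the model, and summarize a feature of the output, can be sketched with plain Monte Carlo (Uncertainpy itself uses the more efficient polynomial chaos expansions; the toy model, parameter ranges, and feature below are all assumptions chosen for illustration):

```python
import math
import random
import statistics

def membrane_peak(gbar, tau, t_end=50.0, dt=0.1):
    # Toy point-neuron surrogate: peak of g * t * exp(-t / tau), an
    # alpha-function-like response standing in for a real simulator run.
    peak, t = 0.0, 0.0
    while t < t_end:
        peak = max(peak, gbar * t * math.exp(-t / tau))
        t += dt
    return peak

def monte_carlo_uq(n=2000, seed=1):
    # Draw parameters from their assumed uncertainty ranges, evaluate
    # the feature (peak response) per sample, and summarize mean and sd.
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        gbar = rng.uniform(0.8, 1.2)   # assumed +/-20% conductance uncertainty
        tau = rng.uniform(5.0, 15.0)   # assumed time-constant range (ms)
        samples.append(membrane_peak(gbar, tau))
    return statistics.mean(samples), statistics.stdev(samples)
```

Polynomial chaos replaces the thousands of random model evaluations above with a small, carefully chosen set of evaluations plus a polynomial surrogate, which is the efficiency gain the abstract refers to.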


2012 ◽  
Vol 466-467 ◽  
pp. 789-793
Author(s):  
Hui Qin Sun ◽  
Zhi Hong Xue ◽  
Ke Jun Sun ◽  
Su Zhi Wang ◽  
Yun Du

The backpropagation (BP) neural network is currently the most widely used neural network model in practical transformer fault diagnosis. However, BP is a local search algorithm that easily drives the network into local minima, so training results can be poor. This paper discusses the PSO-BP algorithm, which combines particle swarm optimization (PSO) with the BP algorithm: the PSO algorithm is used to optimize the BP network’s weights and thresholds. The method is applied to power transformer fault diagnosis. Experimental results show that the PSO-BP network achieves higher fault diagnosis accuracy than the BP algorithm alone.
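The hybrid's core step is replacing gradient descent with PSO's population-based search over the weight space. The sketch below applies textbook PSO to the weights and threshold of a single sigmoid neuron learning an AND gate, a deliberately tiny stand-in for the paper's BP diagnosis network; all constants are illustrative choices, not the authors' settings:

```python
import math
import random

def neuron(w, x):
    # Single sigmoid neuron: the smallest possible "BP network".
    s = w[0] * x[0] + w[1] * x[1] + w[2]   # two weights + threshold
    return 1.0 / (1.0 + math.exp(-s))

DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND gate

def mse(w):
    return sum((neuron(w, x) - y) ** 2 for x, y in DATA) / len(DATA)

def pso(n_particles=20, iters=150, seed=3):
    # PSO searches the weight vector globally: each particle tracks its
    # personal best, and the swarm shares a global best, which makes the
    # search less prone to the local minima that trap plain BP.
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(n_particles)]
    vel = [[0.0] * 3 for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=mse)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(3):
                r1, r2 = rng.random(), rng.random()
                v = (0.7 * vel[i][d]                       # inertia
                     + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                     + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
                vel[i][d] = max(-4.0, min(4.0, v))
                pos[i][d] += vel[i][d]
            if mse(pos[i]) < mse(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=mse)
    return gbest
```

In the paper's setting the fitness function would be the diagnosis error of the full BP network on dissolved-gas data rather than this toy MSE.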


Author(s):  
Asha

The optimization of problems significantly improves the solution of complex problems. Reducing feature dimensionality is enormously salient for removing redundant features and improving system accuracy. In this paper, an amalgamation of different concepts is proposed to optimize the features and improve classification. The experiment is performed on facial expression detection by combining deep neural network models with variants of the gravitational search algorithm. Facial expressions are movements of facial components such as the lips, nose, and eyes, which are considered as features for classifying human emotions into different classes. Initial feature extraction is performed with the local binary pattern. The extracted feature set is optimized with variants of the gravitational search algorithm (GSA): the standard GSA (SGSA), the binary GSA (BGSA), and the fast discrete GSA (FDGSA). The deep neural network models of the deep convolutional neural network (DCNN) and the extended deep convolutional neural network (EDCNN) are employed for classifying emotions from the JAFFE and KDEF imagery datasets. Fixed-pose images from both datasets are acquired, and a comparison based on average recognition accuracy is performed. The comparative analysis of the mentioned techniques and state-of-the-art techniques illustrates the superior recognition accuracy of FDGSA with the EDCNN technique.
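The first stage of that pipeline, the local binary pattern, is compact enough to sketch directly. Each pixel is described by comparing it with its eight neighbours, and the histogram of the resulting 8-bit codes is the texture feature vector (the bit ordering and the >= convention below are one common choice among several):

```python
def lbp_code(img, r, c):
    # 8-neighbour LBP: each neighbour contributes one bit, set when its
    # intensity is >= the centre pixel's, read clockwise from top-left.
    center = img[r][c]
    neighbours = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                  img[r][c+1], img[r+1][c+1], img[r+1][c],
                  img[r+1][c-1], img[r][c-1]]
    return sum((1 << i) for i, n in enumerate(neighbours) if n >= center)

def lbp_histogram(img):
    # Feature vector: histogram of LBP codes over all interior pixels
    # (border pixels are skipped because they lack a full neighbourhood).
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

It is this 256-bin histogram that the GSA variants would then prune to a smaller, less redundant feature subset before classification.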


2021 ◽  
Vol 11 (19) ◽  
pp. 8861
Author(s):  
Philipp Ruf ◽  
Manav Madan ◽  
Christoph Reich ◽  
Djaffar Ould-Abdeslam

Nowadays, machine learning projects have become increasingly relevant to real-world use cases. The success of complex neural network models depends on many factors, creating a need for structured, machine learning-centric project development and management. With the multitude of tools available for the different operational phases, responsibilities and requirements become increasingly unclear. In this work, Machine Learning Operations (MLOps) technologies and tools for every part of the overall project pipeline, as well as the roles involved, are examined and clearly defined. Focusing on the interconnectivity of specific tools and comparing them against well-selected MLOps requirements, model performance, input data, and system quality metrics are briefly discussed. By identifying aspects of machine learning that can be reused from project to project, open-source tools that help with specific parts of the pipeline, and possible combinations, an overview of MLOps support is given. Deep learning has revolutionized the field of image processing, and building an automated machine learning workflow for object detection is of great interest to many organizations. To this end, a simple MLOps workflow for object detection with images is portrayed.


2020 ◽  
Vol 5 ◽  
pp. 140-147 ◽  
Author(s):  
T.N. Aleksandrova ◽  
E.K. Ushakov ◽  
A.V. Orlova ◽  
...  
