pyconsFold: A fast and easy tool for modelling and docking using distance predictions

Mapping Intimacies ◽

10.1101/2021.02.08.430195 ◽

2021 ◽

Author(s):

J Lamb ◽

A Elofsson

Keyword(s):

State Of The Art ◽

Source Code ◽

Link Type ◽

Contact Distance ◽

Protein Dimers ◽

Strong Focus ◽

Distance Distributions ◽

Regular Contact ◽

Contact Predictions ◽

Viable Method

Abstract. Motivation: Contact prediction within a protein has recently become a viable method for accurate prediction of protein structure. Using predicted distance distributions has been shown in many cases to be superior to using only a binary contact annotation. Predicted inter-protein distances have also been shown to be able to dock some protein dimers. Results: Here we present pyconsFold. Using CNS as its underlying folding mechanism together with predicted contact distances, it outperforms regular contact-prediction-based modelling on our dataset of 210 proteins. It performs marginally worse than the state-of-the-art pyRosetta folding pipeline but is on average about 20 times faster per model. More importantly, pyconsFold can also be used as a fold-and-dock protocol by using predicted inter-protein contacts to simultaneously fold and dock two protein chains. Availability and implementation: pyconsFold is implemented in Python 3 with a strong focus on using as few dependencies as possible for longevity. It is available both as a pip package in Python 3 and as source code on GitHub and is published under the GPLv3 license. Supplementary material: Install instructions, examples and parameters can be found in the supplemental notes. Availability of data: The data underlying this article together with source code are available on GitHub, at https://github.com/johnlamb/pyconsfold.
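To illustrate the difference between a predicted distance distribution and a binary contact annotation, here is a minimal sketch. It is not pyconsFold code: the function name `summarize_pair`, the bin centres, and the 8 Å contact cutoff (a common convention, not stated in the abstract) are all assumptions made for illustration.

```python
def summarize_pair(bin_centers, probs, contact_cutoff=8.0):
    """Collapse a predicted distance distribution for one residue pair.

    bin_centers: centre of each distance bin in angstroms (hypothetical binning).
    probs: predicted probability mass per bin (normalised here if needed).
    Returns (expected_distance, contact_probability, binary_contact).
    """
    total = sum(probs)
    probs = [p / total for p in probs]
    # A distance restraint keeps the expected separation of the pair.
    expected = sum(c * p for c, p in zip(bin_centers, probs))
    # A binary contact keeps only whether most mass lies below the cutoff.
    p_contact = sum(p for c, p in zip(bin_centers, probs) if c < contact_cutoff)
    return expected, p_contact, p_contact >= 0.5


expected, p_contact, is_contact = summarize_pair(
    [5.0, 7.0, 9.0, 11.0], [0.1, 0.2, 0.4, 0.3]
)
```

In this toy pair the binary annotation would say only "no contact", while the distribution still tells a folding engine that the residues sit roughly 9 Å apart.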


GraphAligner: rapid and versatile sequence-to-graph alignment

Genome Biology ◽

10.1186/s13059-020-02157-2 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 1

Author(s):

Mikko Rautiainen ◽

Tobias Marschall

Keyword(s):

Genetic Variation ◽

Error Correction ◽

Genome Assembly ◽

State Of The Art ◽

Source Code ◽

The State ◽

Graph Alignment ◽

Link Type ◽

Long Reads

Abstract: Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
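As a rough picture of what aligning a sequence to a genome graph means, the sketch below computes the minimum edit distance between a read and any path through a small directed acyclic graph. It is a textbook dynamic programme chosen for clarity, not GraphAligner's algorithm (which the abstract does not describe). The function name, the single-character nodes, and the assumption that nodes are numbered in topological order are all illustrative.

```python
def graph_edit_distance(read, labels, edges):
    """Minimum edit distance between `read` and any path in a DAG.

    labels[v] is the single character on node v. Nodes are assumed to be
    numbered in topological order, so every edge (u, v) has u < v.
    The path may start and end at any node.
    """
    n = len(read)
    preds = {v: [] for v in range(len(labels))}
    for u, v in edges:
        preds[v].append(u)

    # Virtual start row: empty path, so read[:i] costs i insertions.
    start = list(range(n + 1))
    best = start[n]
    rows = []
    for v, ch in enumerate(labels):
        # A path may begin at any node, so the start row is always a predecessor.
        prev_rows = [start] + [rows[u] for u in preds[v]]
        row = [min(p[0] for p in prev_rows) + 1]  # node character left unmatched
        for i in range(1, n + 1):
            sub = min(p[i - 1] for p in prev_rows) + (read[i - 1] != ch)
            dele = min(p[i] for p in prev_rows) + 1  # skip the node character
            ins = row[i - 1] + 1                     # extra read character
            row.append(min(sub, dele, ins))
        rows.append(row)
        best = min(best, row[n])  # the path may end at any node
    return best


# Toy variation graph: A -> (C | T) -> G -> T, i.e. a single SNP bubble.
labels = ["A", "C", "T", "G", "T"]
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
```

Both `"ACGT"` and `"ATGT"` align to this graph with distance 0, because each allele of the bubble is a valid path, whereas a linear reference would force one of them to carry a mismatch.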


GraphAligner: Rapid and Versatile Sequence-to-Graph Alignment

10.1101/810812 ◽

2019 ◽

Cited By ~ 9

Author(s):

Mikko Rautiainen ◽

Tobias Marschall

Keyword(s):

Genetic Variation ◽

Error Correction ◽

Genome Assembly ◽

State Of The Art ◽

Source Code ◽

Graph Alignment ◽

Link Type ◽

Long Reads ◽

Reference Genomes ◽

Genome Graph

Abstract: Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pan-genome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes. When employing GraphAligner for error correction, we find it to be almost 3x more accurate and over 15x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner


AKT: Ancestry and Kinship Toolkit

10.1101/047829 ◽

2016 ◽

Author(s):

Rudy Arthur ◽

Ole Schulz-Trieglaff ◽

Anthony J. Cox ◽

Jared Michael O’Connell

Keyword(s):

Data Clustering ◽

State Of The Art ◽

Source Code ◽

Statistical Genetics ◽

Data Sets ◽

Whole Genome ◽

Link Type ◽

Art Methods ◽

Invaluable Tool

Abstract: Ancestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterise sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. We believe it will be an invaluable tool for the curation of large WGS data-sets. Availability: The source code is available at https://illumina.github.io/[email protected], [email protected]


cinaR: A comprehensive R package for the differential analyses and functional interpretation of ATAC-seq data

10.1101/2021.03.05.434143 ◽

2021 ◽

Author(s):

E Onur Karakaslar ◽

Duygu Ucar

Keyword(s):

State Of The Art ◽

Source Code ◽

R Package ◽

Chromatin Accessibility ◽

Functional Enrichment ◽

Multiple Sources ◽

Functional Interpretation ◽

Link Type ◽

Data Analyses ◽

Gene Sets

Abstract. Summary: ATAC-seq is a frequently used assay to study chromatin accessibility levels. Differential chromatin accessibility analyses between biological groups and functional interpretation of these differential regions are essential in ATAC-seq data analyses. Although distinct methods and analysis pipelines have been developed for this purpose, a stand-alone R package that combines state-of-the-art differential and functional enrichment analysis pipelines is missing. To fill this gap, we developed cinaR (Chromatin Analyses in R), which is a single wrapper function and provides users with various data analysis and visualization options, including functional enrichment analyses with gene sets curated from multiple sources. Availability and implementation: cinaR is an R/CRAN package under the GPL-3 License, and its source code is freely accessible at https://CRAN.R-project.org/package=cinaR. Gene sets are available at https://CRAN.R-project.org/package=cinaRgenesets. Bone marrow ATAC-seq data is available at https://www.ncbi.nlm.nih.gov/geo/query/[email protected] or [email protected]


Repeat aware evaluation of scaffolding tools

10.1101/148932 ◽

2017 ◽

Cited By ~ 1

Author(s):

Igor Mandric ◽

Sergey Knyazev ◽

Alex Zelikovsky

Keyword(s):

State Of The Art ◽

Source Code ◽

Evaluation Framework ◽

Whole Genome ◽

Accurate Assessment ◽

Challenging Problem ◽

Scalable Algorithm ◽

Link Type ◽

Representative Subset ◽

Evaluation Problem

Abstract. Summary: Genomic sequences are assembled into a variable but large number of contigs that should be scaffolded (ordered and oriented) to facilitate comparative or functional analysis. Finding a scaffolding is computationally challenging due to misassemblies, inconsistent coverage across the genome, and long repeats. An accurate assessment of scaffolding tools should take into account multiple locations of the same contig on the reference scaffolding rather than matching a repeat to a single best location. This makes mapping inferred scaffoldings onto the reference a computationally challenging problem. This paper formulates the repeat-aware scaffolding evaluation problem, which is to find a mapping of the inferred scaffolding onto the reference that maximizes the number of correct links, and proposes a scalable algorithm capable of handling large whole-genome datasets. Our novel scaffolding validation pipeline has been applied to assess most state-of-the-art scaffolding tools on a representative subset of GAGE datasets. Availability: The source code of this evaluation framework is available at https://github.com/mandricigor/repeat-aware. The documentation is hosted at https://mandricigor.github.io/repeat-aware.


Image Restoration by Learning Morphological Opening-Closing Network

Mathematical Morphology - Theory and Applications ◽

10.1515/mathm-2020-0103 ◽

2020 ◽

Vol 4 (1) ◽

pp. 87-107

Author(s):

Ranjan Mondal ◽

Moni Shankar Dey ◽

Bhabatosh Chanda

Keyword(s):

Neural Network ◽

Image Restoration ◽

State Of The Art ◽

Source Code ◽

Back Propagation ◽

Image Features ◽

Main Difficulty ◽

The Right ◽

Right Order ◽

Morphological Opening

Abstract: Mathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing a mathematical morphological algorithm is deciding the order of operators/filters and the corresponding structuring elements (SEs). In this work, we develop a morphological network composed of alternating sequences of dilation and erosion layers, which, depending on the learned SEs, may form opening or closing layers. These layers, in the right order and along with a linear combination of their outputs, are useful in extracting image features and processing them. Structuring elements in the network are learned by back-propagation, guided by minimization of the loss function. The efficacy of the proposed network is established by applying it to two interesting image restoration problems, namely de-raining and de-hazing. Results are comparable to those of many state-of-the-art algorithms for most of the images. It is also worth mentioning that the number of network parameters to handle is much smaller than that of popular convolutional neural networks for similar tasks. The source code can be found at https://github.com/ranjanZ/Mophological-Opening-Closing-Net
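The dilation and erosion operations that the abstract composes into opening and closing layers can be sketched in a few lines. This is a plain, non-learned illustration using the standard grayscale definitions, not the authors' network: the function names are illustrative, the structuring element is assumed to be square with odd side length, and pixels outside the image are simply ignored.

```python
def dilate(img, se):
    """Grayscale dilation: out[x] = max over y of img[x - y] + se[y]."""
    h, w, k = len(img), len(img[0]), len(se) // 2
    out = [[float("-inf")] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r - dr, c - dc
                    if 0 <= rr < h and 0 <= cc < w:  # ignore pixels outside the image
                        out[r][c] = max(out[r][c], img[rr][cc] + se[dr + k][dc + k])
    return out


def erode(img, se):
    """Grayscale erosion: out[x] = min over y of img[x + y] - se[y]."""
    h, w, k = len(img), len(img[0]), len(se) // 2
    out = [[float("inf")] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] = min(out[r][c], img[rr][cc] - se[dr + k][dc + k])
    return out


def opening(img, se):
    """Erosion followed by dilation: removes bright details smaller than the SE."""
    return dilate(erode(img, se), se)


def closing(img, se):
    """Dilation followed by erosion: fills dark details smaller than the SE."""
    return erode(dilate(img, se), se)
```

With a flat 3x3 structuring element (all zeros), `opening` erases an isolated bright pixel and `closing` fills an isolated dark one. In a learned setting the entries of `se` would be trainable parameters instead of fixed values.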


The smallest extraction problem

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476293 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2445-2458

Author(s):

Valerio Cetorelli ◽

Paolo Atzeni ◽

Valter Crescenzi ◽

Franco Milicchio

Keyword(s):

Unsupervised Learning ◽

Optimization Problem ◽

Learning Algorithm ◽

State Of The Art ◽

Data Extraction ◽

Source Code ◽

Web Data ◽

Web Data Extraction ◽

New Family ◽

Context Free

We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML source code of pages published by large, templated websites and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction, which, despite over twenty years of research, has been largely neglected by the approaches presented in the literature. We then formalize the Smallest Extraction Problem (SEP), an optimization problem for finding the grammar of a family that best describes a set of pages and contextually extracts their data. Finally, we present an unsupervised learning algorithm to induce a landmark grammar from a set of pages sharing a common HTML template, and we present an automatic Web data extraction system. The experiments on consolidated benchmarks show that the approach can substantially contribute to improving the state of the art.


DANNP: an efficient artificial neural network pruning tool

PeerJ Computer Science ◽

10.7717/peerj-cs.137 ◽

2017 ◽

Vol 3 ◽

pp. e137 ◽

Cited By ~ 7

Author(s):

Mona Alshahrani ◽

Othman Soufan ◽

Arturo Magana-Mora ◽

Vladimir B. Bajic

Keyword(s):

Neural Network ◽

State Of The Art ◽

Model Performance ◽

Training Data ◽

Classification Problems ◽

Link Type ◽

On Line ◽

Pruning Algorithms ◽

Artificial Neural ◽

The Impact

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of the ANNs is not trivial as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to efficiently cope with intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software implemented in C++ to considerably reduce the running time of the ANN pruning algorithms we implemented. In addition to the performance evaluation of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up the ANN pruning up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of the ANN pruning by the DANNP tool, we used 16 datasets from different domains. In eight out of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99%, while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to those obtained by the classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows the users to identify the most discriminant features of the problem at hand.
To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and on-line accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.


idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽

2020 ◽

Author(s):

Xun Zhu ◽

Ti-Cheng Chang ◽

Richard Webby ◽

Gang Wu

Keyword(s):

Personal Computer ◽

Source Code ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Public Dataset ◽

Virus Isolates

Abstract: idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems, using user-uploaded FastQ files directly. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including personal computers, HPC and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.
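The idea of calling a clade from a clade-defining marker list can be pictured with a small sketch. This is not idCOV's implementation, which works from raw sequencing reads. The function name, the clade names, and the marker positions and alleles below are made-up placeholders, and the input is assumed to be an already-built consensus sequence.

```python
def call_clades(consensus, markers):
    """Return the clades whose marker alleles all match the consensus.

    consensus: consensus sequence of an isolate (string of bases).
    markers: {clade: {1-based position: expected base}} (hypothetical marker list).
    """
    hits = []
    for clade, sites in markers.items():
        if all(
            pos <= len(consensus) and consensus[pos - 1] == allele
            for pos, allele in sites.items()
        ):
            hits.append(clade)
    return sorted(hits)


# Placeholder markers, for illustration only; not real SARS-CoV-2 positions.
TOY_MARKERS = {
    "clade_A": {3: "T", 7: "G"},
    "clade_B": {3: "C", 7: "G"},
}
```

An isolate whose consensus carries `T` at position 3 and `G` at position 7 would be reported as `clade_A`; a consensus matching no complete marker set yields an empty list rather than a forced call.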


AtLoc: Attention Guided Camera Localization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6608 ◽

2020 ◽

Vol 34 (06) ◽

pp. 10393-10401

Author(s):

Bing Wang ◽

Changhao Chen ◽

Chris Xiaoxuan Lu ◽

Peijun Zhao ◽

Niki Trigoni ◽

...

Keyword(s):

Deep Learning ◽

Experimental Evidence ◽

State Of The Art ◽

Source Code ◽

Single Image ◽

Saliency Maps ◽

Camera Localization ◽

Camera Pose ◽

Dynamic Objects ◽

Indoor And Outdoor

Deep learning has achieved impressive results in camera localization, but current single-image techniques typically suffer from a lack of robustness, leading to large outliers. To some extent, this has been tackled by sequential (multi-image) or geometry-constraint approaches, which can learn to reject dynamic objects and illumination conditions to achieve better performance. In this work, we show that attention can be used to force the network to focus on more geometrically robust objects and features, achieving state-of-the-art performance on common benchmarks, even when using only a single image as input. Extensive experimental evidence is provided through public indoor and outdoor datasets. Through visualization of the saliency maps, we demonstrate how the network learns to reject dynamic objects, yielding superior global camera pose regression performance. The source code is available at https://github.com/BingCS/AtLoc.
