Repeat aware evaluation of scaffolding tools

Mapping Intimacies ◽

10.1101/148932 ◽

2017 ◽

Cited By ~ 1

Author(s):

Igor Mandric ◽

Sergey Knyazev ◽

Alex Zelikovsky

Keyword(s):

State Of The Art ◽

Source Code ◽

Evaluation Framework ◽

Whole Genome ◽

Accurate Assessment ◽

Challenging Problem ◽

Scalable Algorithm ◽

Link Type ◽

Representative Subset ◽

Evaluation Problem

Abstract

Summary: Genomic sequences are assembled into a variable but large number of contigs that should be scaffolded (ordered and oriented) to facilitate comparative or functional analysis. Finding a scaffolding is computationally challenging due to misassemblies, inconsistent coverage across the genome, and long repeats. An accurate assessment of scaffolding tools should take into account multiple locations of the same contig on the reference scaffolding rather than matching a repeat to a single best location. This makes mapping inferred scaffoldings onto the reference a difficult computational problem. This paper formulates the repeat-aware scaffolding evaluation problem, which is to find a mapping of the inferred scaffolding onto the reference that maximizes the number of correct links, and proposes a scalable algorithm capable of handling large whole-genome datasets. Our novel scaffolding validation pipeline has been applied to assess most state-of-the-art scaffolding tools on a representative subset of the GAGE datasets.

Availability: The source code of this evaluation framework is available at https://github.com/mandricigor/repeat-aware. The documentation is hosted at https://mandricigor.github.io/repeat-aware.
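
To make the "maximize the number of correct links" objective concrete, here is a minimal brute-force sketch. It rests on simplifying assumptions of our own, not the paper's exact formulation: each contig occurrence in an inferred scaffold is mapped to exactly one of its candidate reference locations, no two occurrences share a location, and a link counts as correct when the two mapped locations are adjacent on the same reference scaffold with consistent orientation. The names `link_correct`, `best_mapping` and `example_locations` are hypothetical, and the exhaustive search is exponential, unlike the scalable algorithm the authors propose.

```python
from itertools import product

# A reference location is (scaffold_id, position, orientation); orientation is +1 or -1.
# An inferred scaffold is a list of (contig_name, orientation) pairs.


def link_correct(loc_a, orient_a, loc_b, orient_b):
    """Hypothetical check: does the inferred link a -> b match reference adjacency?"""
    scaf_a, pos_a, ref_a = loc_a
    scaf_b, pos_b, ref_b = loc_b
    if scaf_a != scaf_b:
        return False
    # Relative orientation: +1 if the inferred contig agrees with the reference, -1 if flipped.
    rel_a, rel_b = orient_a * ref_a, orient_b * ref_b
    # Both contigs must be flipped consistently, and b must be the next contig
    # in the reading direction implied by that flip.
    return rel_a == rel_b and pos_b - pos_a == rel_a


def best_mapping(inferred, locations):
    """Try every assignment of occurrences to candidate locations (toy sizes only).

    Returns (max_correct_links, chosen_locations), one location per occurrence.
    """
    best = (-1, None)
    for choice in product(*(locations[name] for name, _ in inferred)):
        if len(set(choice)) < len(choice):  # assumed: occurrences cannot share a location
            continue
        correct = sum(
            link_correct(choice[i], inferred[i][1], choice[i + 1], inferred[i + 1][1])
            for i in range(len(inferred) - 1)
        )
        if correct > best[0]:
            best = (correct, list(choice))
    return best


# Repeat R has two copies in reference scaffold S1 (positions 1 and 4).
example_locations = {
    "C": [("S1", 3, 1)],
    "R": [("S1", 1, 1), ("S1", 4, 1)],
    "D": [("S1", 5, 1)],
}
print(best_mapping([("C", 1), ("R", 1), ("D", 1)], example_locations))
# (2, [('S1', 3, 1), ('S1', 4, 1), ('S1', 5, 1)])
```

Pinning R to its first listed copy (position 1) would score 0 correct links for the same scaffold, which is the kind of undercounting that a repeat-aware mapping avoids.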

AKT: Ancestry and Kinship Toolkit

10.1101/047829 ◽

2016 ◽

Author(s):

Rudy Arthur ◽

Ole Schulz-Trieglaff ◽

Anthony J. Cox ◽

Jared Michael O’Connell

Keyword(s):

Data Clustering ◽

State Of The Art ◽

Source Code ◽

Statistical Genetics ◽

Data Sets ◽

Whole Genome ◽

Link Type ◽

Art Methods ◽

Invaluable Tool

Abstract

Ancestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterise sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. We believe it will be an invaluable tool for the curation of large WGS data sets.

Availability: The source code is available at https://illumina.github.io/
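
As an illustration of what a Mendel-consistency check tests, the sketch below classifies biallelic sites in a parent-offspring trio. It is a minimal example assuming unphased genotypes coded as alternate-allele counts (0, 1, 2) and ignoring missing calls and de novo mutations; `transmissible`, `mendel_consistent` and `mendel_error_rate` are hypothetical names, not AKT's API or command-line interface.

```python
def transmissible(genotype):
    """Alleles (0 = ref, 1 = alt) a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def mendel_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(f + m == child for f in transmissible(father) for m in transmissible(mother))


def mendel_error_rate(trio_sites):
    """Fraction of (father, mother, child) genotype triples that violate Mendelian inheritance."""
    errors = sum(not mendel_consistent(*site) for site in trio_sites)
    return errors / len(trio_sites) if trio_sites else 0.0


sites = [(0, 0, 0), (0, 2, 1), (1, 1, 2), (2, 2, 0)]  # the last site is impossible
print(mendel_error_rate(sites))  # 0.25
```

Aggregated over many sites, an unusually high error rate for a putative trio typically points to a sample swap or a mislabelled pedigree.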

GraphAligner: rapid and versatile sequence-to-graph alignment

Genome Biology ◽

10.1186/s13059-020-02157-2 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 1

Author(s):

Mikko Rautiainen ◽

Tobias Marschall

Keyword(s):

Genetic Variation ◽

Error Correction ◽

Genome Assembly ◽

State Of The Art ◽

Source Code ◽

The State ◽

Graph Alignment ◽

Link Type ◽

Long Reads

Abstract

Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.

Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
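
The core operation, aligning a read to some path through a graph, can be illustrated with a textbook dynamic program over a small acyclic graph whose nodes each carry one base. This is a minimal sketch under stated assumptions (nodes listed in topological order, unit edit costs, free start and end in the graph, whole read aligned) and not GraphAligner's algorithm, which also handles cyclic graphs and is engineered for speed; `align_to_dag` is a hypothetical name.

```python
def align_to_dag(read, labels, edges):
    """Semi-global edit distance of `read` against any path in a DAG.

    labels: one base per node, nodes numbered 0..n-1 in topological order.
    edges: (u, v) pairs with u before v in that order.
    The path may start and end at any node; the whole read must be aligned.
    """
    n, m = len(labels), len(read)
    preds = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)
    INF = float("inf")
    # D[v][j]: best cost of aligning read[:j] to a path that ends at node v.
    D = [[INF] * (m + 1) for _ in range(n)]
    for v in range(n):
        # P[j]: best cost of read[:j] on a path ending just before v;
        # the empty path (free start) costs j, i.e. j inserted read bases.
        P = [min([j] + [D[u][j] for u in preds[v]]) for j in range(m + 1)]
        D[v][0] = P[0] + 1  # node base deleted, no read bases used
        for j in range(1, m + 1):
            D[v][j] = min(
                P[j - 1] + (read[j - 1] != labels[v]),  # match or mismatch
                P[j] + 1,                               # deletion of the node base
                D[v][j - 1] + 1,                        # insertion of a read base
            )
    # Free end: best over all end nodes (or the empty path, cost m).
    return min([m] + [D[v][m] for v in range(n)])


# A tiny SNP "bubble": A -> C -> (G | T) -> A
labels = ["A", "C", "G", "T", "A"]
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
print(align_to_dag("CTA", labels, edges))   # 0 (path C-T-A)
print(align_to_dag("ACGA", labels, edges))  # 0 (path A-C-G-A)
print(align_to_dag("CCA", labels, edges))   # 1 (one mismatch)
```

Each node is processed once per read position, so this sketch costs O((nodes + edges) x read length); production aligners add banding, seeding and bit-parallelism on top of this kind of recurrence.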

GraphAligner: Rapid and Versatile Sequence-to-Graph Alignment

10.1101/810812 ◽

2019 ◽

Cited By ~ 9

Author(s):

Mikko Rautiainen ◽

Tobias Marschall

Keyword(s):

Genetic Variation ◽

Error Correction ◽

Genome Assembly ◽

State Of The Art ◽

Source Code ◽

Graph Alignment ◽

Link Type ◽

Long Reads ◽

Reference Genomes ◽

Genome Graph

Abstract

Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pan-genome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes. When employing GraphAligner for error correction, we find it to be almost 3x more accurate and over 15x faster than extant tools.

Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner

HLA-MA: Simple yet powerful matching of samples using HLA typing results

10.1101/066548 ◽

2016 ◽

Author(s):

Clemens Messerschmidt ◽

Manuel Holtgrewe ◽

Dieter Beule

Keyword(s):

Microsatellite Instability ◽

State Of The Art ◽

The State ◽

Hla Typing ◽

Whole Genome ◽

Consistency Checking ◽

Rna Seq ◽

Simple Method ◽

Link Type ◽

Typing Result

Abstract

Summary: We propose HLA-MA, a simple method for consistency checking in pipelines operating on human HTS data. The method is based on the HLA typing results of the state-of-the-art method OptiType. Provided that there is sufficient coverage of the HLA loci, comparing HLA types allows simple, fast, and robust matching of samples from whole-genome, exome, and RNA-seq data. This approach reliably re-identifies samples even when they carry high mutational loads, e.g., caused by microsatellite instability or POLE1 defects.

Availability and Implementation: The software is implemented in Python 3 and freely available under the MIT license at https://github.com/bihealth/hlama.
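
The matching idea can be sketched as an overlap score between two samples' HLA typing results. This is a hypothetical illustration, not HLA-MA's actual scoring or file format: it assumes each result is a mapping from locus to a list of two allele names, counts shared alleles per locus with multiplicity (so homozygous calls are handled), and reports the shared fraction. The allele names below are example values chosen for illustration.

```python
from collections import Counter


def hla_match_score(a, b):
    """Fraction of alleles shared between two HLA typing results (dict: locus -> alleles)."""
    shared = total = 0
    for locus in sorted(a.keys() & b.keys()):
        # Multiset intersection keeps homozygous alleles counted twice.
        shared += sum((Counter(a[locus]) & Counter(b[locus])).values())
        total += max(len(a[locus]), len(b[locus]))
    return shared / total if total else 0.0


normal = {"A": ["A*01:01", "A*02:01"], "B": ["B*08:01", "B*44:02"], "C": ["C*07:01", "C*05:01"]}
tumor = {"A": ["A*01:01", "A*02:01"], "B": ["B*08:01", "B*44:02"], "C": ["C*07:01", "C*05:01"]}
other = {"A": ["A*03:01", "A*02:01"], "B": ["B*07:02", "B*44:02"], "C": ["C*07:02", "C*05:01"]}
print(hla_match_score(normal, tumor))  # 1.0
print(hla_match_score(normal, other))  # 0.5
```

In a pipeline, each sample would be compared against all others and flagged when its best match is not the expected partner (for example, the matched normal of a tumor).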

SANS serif: alignment-free, whole-genome based phylogenetic reconstruction

10.1101/2020.12.31.424643 ◽

2021 ◽

Author(s):

Andreas Rempel ◽

Roland Wittler

Keyword(s):

Phylogenetic Tree ◽

Source Code ◽

Phylogenetic Reconstruction ◽

Whole Genome ◽

Link Type ◽

Alignment Free ◽

Phylogeny Estimation

Abstract

Summary: SANS serif is a novel software tool for alignment-free, whole-genome-based phylogeny estimation that follows a pangenomic approach to efficiently calculate a set of splits in a phylogenetic tree or network.

Availability and Implementation: Implemented in C++ and supported on Linux, macOS, and Windows. The source code is freely available for download at https://gitlab.ub.uni-bielefeld.de/gi/
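
The idea of deriving splits directly from shared sequence can be sketched by giving each k-mer a "color" (the set of genomes that contain it) and counting how many k-mers support each bipartition. This is a simplified illustration, not SANS serif's implementation: it assumes forward-strand k-mers only (no canonical reverse complements), uses raw counts as weights, and skips filtering the weighted splits into a tree or network. `kmer_split_weights` and `split_key` are hypothetical names.

```python
from collections import Counter


def kmer_split_weights(genomes, k):
    """Count, for every bipartition of the genomes, the k-mers that support it.

    genomes: dict name -> sequence. A k-mer present in exactly the genomes S
    supports the split S | (all - S). Keys are frozenset({S, all - S}).
    """
    colors = {}
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors.setdefault(seq[i:i + k], set()).add(name)
    everyone = frozenset(genomes)
    weights = Counter()
    for present_in in colors.values():
        side = frozenset(present_in)
        if side == everyone:
            continue  # shared by all genomes: separates nothing
        weights[frozenset({side, everyone - side})] += 1
    return weights


def split_key(side, names):
    """Build the dictionary key for the split side | (names - side)."""
    side, names = frozenset(side), frozenset(names)
    return frozenset({side, names - side})


genomes = {"G1": "ACGTAC", "G2": "ACGTTT", "G3": "GGGTTT"}
w = kmer_split_weights(genomes, 3)
for side in (["G3"], ["G1"], ["G2"]):
    print(side, "vs rest:", w[split_key(side, genomes)])
# ['G3'] vs rest: 4
# ['G1'] vs rest: 4
# ['G2'] vs rest: 0
```

Because a split and its complement describe the same bipartition, k-mers found only in G3 and k-mers found only in G1 and G2 add to the same key.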

pyconsFold: A fast and easy tool for modelling and docking using distance predictions

10.1101/2021.02.08.430195 ◽

2021 ◽

Author(s):

J Lamb ◽

A Elofsson

Keyword(s):

State Of The Art ◽

Source Code ◽

Link Type ◽

Contact Distance ◽

Protein Dimers ◽

Strong Focus ◽

Distance Distributions ◽

Regular Contact ◽

Contact Predictions ◽

Viable Method

Abstract

Motivation: Contact predictions within a protein have recently become a viable method for accurate prediction of protein structure. Using predicted distance distributions has been shown in many cases to be superior to using only a binary contact annotation. Predicted inter-protein distances have also been shown to enable docking of some protein dimers.

Results: Here we present pyconsFold. Using CNS as its underlying folding engine together with predicted contact distances, it outperforms regular contact-prediction-based modelling on our dataset of 210 proteins. It performs marginally worse than the state-of-the-art pyRosetta folding pipeline but is on average about 20 times faster per model. More importantly, pyconsFold can also be used as a fold-and-dock protocol, using predicted inter-protein contacts to simultaneously fold and dock two protein chains.

Availability and implementation: pyconsFold is implemented in Python 3 with a strong focus on using as few dependencies as possible for longevity. It is available both as a pip package for Python 3 and as source code on GitHub, and is published under the GPLv3. Install instructions, examples and parameters can be found in the supplemental notes.

Availability of data: The data underlying this article, together with the source code, are available on GitHub at https://github.com/johnlamb/pyconsfold.
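
The contrast between a binary contact annotation and a predicted distance distribution can be made concrete with a small example. The sketch assumes a distogram for one residue pair given as bin edges (in Å) and per-bin probabilities, and uses the common 8 Å contact cutoff; it reports both the contact probability and the most likely distance interval, which a restraint-based folding step could use as bounds. `summarize_distogram` is a hypothetical helper, not part of pyconsFold.

```python
def summarize_distogram(bin_edges, probs, contact_cutoff=8.0):
    """Return (P(distance < cutoff), (lower, upper) of the most probable bin)."""
    if len(bin_edges) != len(probs) + 1:
        raise ValueError("need one more bin edge than probabilities")
    total = sum(probs)
    p = [x / total for x in probs]  # normalise in case the bins do not sum to 1
    # Contact probability: mass of the bins lying entirely below the cutoff.
    p_contact = sum(pk for pk, upper in zip(p, bin_edges[1:]) if upper <= contact_cutoff)
    # Most probable bin, returned as lower/upper distance bounds.
    k = max(range(len(p)), key=p.__getitem__)
    return p_contact, (bin_edges[k], bin_edges[k + 1])


edges = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
probs = [0.0, 0.1, 0.2, 0.5, 0.2]
p_contact, bounds = summarize_distogram(edges, probs)
print(round(p_contact, 2), bounds)  # 0.3 (8.0, 10.0)
```

Here a binary call at a 0.5 threshold would simply say "no contact", while the distribution still places the pair most likely 8 to 10 Å apart, information that a distance restraint can retain.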

cinaR: A comprehensive R package for the differential analyses and functional interpretation of ATAC-seq data

10.1101/2021.03.05.434143 ◽

2021 ◽

Author(s):

E Onur Karakaslar ◽

Duygu Ucar

Keyword(s):

State Of The Art ◽

Source Code ◽

R Package ◽

Chromatin Accessibility ◽

Functional Enrichment ◽

Multiple Sources ◽

Functional Interpretation ◽

Link Type ◽

Data Analyses ◽

Gene Sets

Abstract

Summary: ATAC-seq is a frequently used assay for studying chromatin accessibility levels. Differential chromatin accessibility analyses between biological groups and functional interpretation of these differential regions are essential in ATAC-seq data analyses. Although various methods and analysis pipelines have been developed for this purpose, a stand-alone R package that combines state-of-the-art differential and functional enrichment analysis pipelines has been missing. To fill this gap, we developed cinaR (Chromatin Analyses in R), a single wrapper function that provides users with various data analysis and visualization options, including functional enrichment analyses with gene sets curated from multiple sources.

Availability and implementation: cinaR is an R/CRAN package released under the GPL-3 license, and its source code is freely accessible at https://CRAN.R-project.org/package=cinaR. Gene sets are available at https://CRAN.R-project.org/package=cinaRgenesets. Bone marrow ATAC-seq data are available at https://www.ncbi.nlm.nih.gov/geo/query/
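
Functional enrichment of genes near differentially accessible regions against curated gene sets is commonly scored with an over-representation test. The sketch below shows the standard one-sided hypergeometric test; it is an assumption for illustration, since the abstract does not say which enrichment methods cinaR implements, and `enrichment_p_value` is a hypothetical name.

```python
from math import comb


def enrichment_p_value(overlap, n_selected, set_size, universe):
    """One-sided hypergeometric p-value P(X >= overlap).

    overlap: selected genes that fall in the gene set
    n_selected: genes selected (e.g. near differentially accessible peaks)
    set_size: genes in the gene set
    universe: genes in the background
    """
    upper = min(set_size, n_selected)
    hits = sum(
        comb(set_size, i) * comb(universe - set_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(universe, n_selected)


print(enrichment_p_value(3, 3, 3, 10))  # 1/120, about 0.00833
print(enrichment_p_value(0, 3, 3, 10))  # 1.0
```

With many gene sets tested at once, the resulting p-values would normally be adjusted for multiple testing (for example, by the Benjamini-Hochberg procedure) before reporting enriched pathways.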

Image Restoration by Learning Morphological Opening-Closing Network

Mathematical Morphology - Theory and Applications ◽

10.1515/mathm-2020-0103 ◽

2020 ◽

Vol 4 (1) ◽

pp. 87-107

Author(s):

Ranjan Mondal ◽

Moni Shankar Dey ◽

Bhabatosh Chanda

Keyword(s):

Neural Network ◽

Image Restoration ◽

State Of The Art ◽

Source Code ◽

Back Propagation ◽

Image Features ◽

Main Difficulty ◽

The Right ◽

Right Order ◽

Morphological Opening

Abstract

Mathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing a mathematical morphological algorithm lies in deciding the order of operators/filters and the corresponding structuring elements (SEs). In this work, we develop a morphological network composed of alternating sequences of dilation and erosion layers, which, depending on the learned SEs, may form opening or closing layers. These layers, in the right order and combined with a linear combination of their outputs, are useful for extracting image features and processing them. Structuring elements in the network are learned by back-propagation guided by minimization of the loss function. The efficacy of the proposed network is established by applying it to two interesting image restoration problems, namely de-raining and de-hazing. Results are comparable to those of many state-of-the-art algorithms for most of the images. It is also worth mentioning that the number of network parameters is much smaller than that of popular convolutional neural networks for similar tasks. The source code can be found at https://github.com/ranjanZ/Mophological-Opening-Closing-Net
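
The building blocks are grayscale dilation and erosion with a structuring element; an opening (erosion followed by dilation) or a closing (dilation followed by erosion) arises from their order. The sketch below uses 1-D signals and a fixed structuring element given as offset-to-weight pairs, purely to show the operators; the paper's network works on 2-D images and learns the structuring elements by back-propagation, which is not shown. The function names are hypothetical.

```python
def dilate(f, se):
    """Grayscale dilation: max over offsets s of f[x - s] + se[s] (out-of-range samples skipped)."""
    n = len(f)
    return [max(f[x - s] + w for s, w in se.items() if 0 <= x - s < n) for x in range(n)]


def erode(f, se):
    """Grayscale erosion: min over offsets s of f[x + s] - se[s] (out-of-range samples skipped)."""
    n = len(f)
    return [min(f[x + s] - w for s, w in se.items() if 0 <= x + s < n) for x in range(n)]


def opening(f, se):
    # With a flat SE, removes bright peaks narrower than the SE.
    return dilate(erode(f, se), se)


def closing(f, se):
    # With a flat SE, fills dark pits narrower than the SE.
    return erode(dilate(f, se), se)


flat3 = {-1: 0, 0: 0, 1: 0}  # flat structuring element of width 3 (must contain offset 0)
print(opening([0, 0, 5, 0, 0], flat3))  # [0, 0, 0, 0, 0]  narrow peak removed
print(opening([0, 5, 5, 5, 0], flat3))  # [0, 5, 5, 5, 0]  wide plateau kept
print(closing([5, 5, 0, 5, 5], flat3))  # [5, 5, 5, 5, 5]  narrow pit filled
```

Learning the SE amounts to learning the weights in `se`; because `max` and `min` pass gradients only to the selected element, such layers can still be trained by back-propagation.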

Bottom-up and Layerwise Domain Adaptation for Pedestrian Detection in Thermal Images

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3418213 ◽

2021 ◽

Vol 17 (1) ◽

pp. 1-19

Author(s):

My Kieu ◽

Andrew D. Bagdanov ◽

Marco Bertini

Keyword(s):

Domain Adaptation ◽

State Of The Art ◽

Pedestrian Detection ◽

Challenging Problem ◽

Top Down ◽

Bottom Up ◽

Security Applications ◽

Lighting Conditions ◽

Initial Layers ◽

Single Modality

Pedestrian detection is a canonical problem for safety and security applications, and it remains challenging due to the highly variable lighting conditions in which pedestrians must be detected. This article investigates several domain adaptation approaches to adapt RGB-trained detectors to the thermal domain. Building on our earlier work on domain adaptation for privacy-preserving pedestrian detection, we conduct an extensive experimental evaluation comparing top-down and bottom-up domain adaptation, and we also propose two new bottom-up domain adaptation strategies. For top-down domain adaptation, we leverage a detector pre-trained on RGB imagery and efficiently adapt it to perform pedestrian detection in the thermal domain. Our bottom-up domain adaptation approaches consist of two steps: first, we train an adapter segment, corresponding to the initial layers of the RGB-trained detector, to adapt to the new input distribution; then, we reconnect the adapter segment to the original RGB-trained detector for final adaptation with a top-down loss. To the best of our knowledge, our bottom-up domain adaptation approaches outperform the best-performing single-modality pedestrian detection results on KAIST and outperform the state of the art on FLIR.

App2Vec: Context-Aware Application Usage Prediction

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451396 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1-21

Author(s):

Huandong Wang ◽

Yong Li ◽

Mu Du ◽

Zhenhui Li ◽

Depeng Jin

Keyword(s):

Dirichlet Process ◽

Service Providers ◽

State Of The Art ◽

Representation Learning ◽

Context Aware ◽

Challenging Problem ◽

Performance Gap ◽

Bayesian Mixture Model ◽

Bayesian Mixture ◽

Spatio Temporal

Both app developers and service providers have strong motivations to understand when and where certain apps are used. However, this has been a challenging problem because app usage data are highly skewed and noisy. Moreover, existing studies regard apps as independent items and thus fail to capture the hidden semantics in app usage traces. In this article, we propose App2Vec, a representation learning model that learns semantic embeddings of apps while taking spatio-temporal context into account. Based on the obtained semantic embeddings, we develop a probabilistic model based on a Bayesian mixture model and a Dirichlet process to capture when, where, and what semantics of apps are used in order to predict future usage. We evaluate our model on two different app usage datasets, which involve over 1.7 million users and 2,000+ apps. Evaluation results show that our proposed App2Vec algorithm outperforms the state-of-the-art algorithms in app usage prediction with a performance gap of over 17.0%.
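
Word2vec-style representation learning trains embeddings from (target, context) pairs drawn from a sliding window over a sequence. The sketch below shows how such pairs could be generated from one app usage session; it is an assumption for illustration, since the abstract does not give App2Vec's exact context definition or how spatio-temporal information enters it, and `skipgram_pairs` is a hypothetical name.

```python
def skipgram_pairs(session, window):
    """(target, context) pairs from one usage session, word2vec skip-gram style.

    session: apps in the order they were used; window: how many neighbours
    on each side count as context.
    """
    pairs = []
    for i, target in enumerate(session):
        lo, hi = max(0, i - window), min(len(session), i + window + 1)
        pairs.extend((target, session[j]) for j in range(lo, hi) if j != i)
    return pairs


print(skipgram_pairs(["mail", "maps", "music"], 1))
# [('mail', 'maps'), ('maps', 'mail'), ('maps', 'music'), ('music', 'maps')]
```

Apps that repeatedly appear in each other's windows end up with similar embeddings, which is the kind of "hidden semantics" that treating apps as independent items cannot capture.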