Repeat aware evaluation of scaffolding tools

2017
Author(s): Igor Mandric, Sergey Knyazev, Alex Zelikovsky

Abstract
Summary: Genomic sequences are assembled into a variable but large number of contigs that should be scaffolded (ordered and oriented) to facilitate comparative or functional analysis. Finding a scaffolding is computationally challenging due to misassemblies, inconsistent coverage across the genome, and long repeats. An accurate assessment of scaffolding tools should take into account multiple locations of the same contig on the reference scaffolding rather than matching a repeat to a single best location. This makes mapping inferred scaffoldings onto the reference a computationally challenging problem. This paper formulates the repeat-aware scaffolding evaluation problem, which is to find a mapping of the inferred scaffolding onto the reference that maximizes the number of correct links, and proposes a scalable algorithm capable of handling large whole-genome datasets. Our novel scaffolding validation pipeline has been applied to assess most state-of-the-art scaffolding tools on a representative subset of GAGE datasets.
Availability: The source code of this evaluation framework is available at https://github.com/mandricigor/repeat-aware. The documentation is hosted at https://mandricigor.github.io/repeat-aware.
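To make the notion of a "correct link" concrete, the following is a deliberately simplified Python sketch, not the algorithm from the paper. It assumes a scaffold is a list of `(contig, strand)` pairs and treats an inferred link as correct if the same adjacency occurs anywhere in the reference, so a repeated contig may match at any of its locations. All names (`flip`, `reference_links`, `count_correct_links`) and the toy data are hypothetical. The sketch does not enforce a consistent one-to-one mapping, so it only gives an optimistic count rather than solving the maximization problem the abstract describes.

```python
from typing import List, Set, Tuple

Oriented = Tuple[str, str]  # (contig name, strand "+" or "-"); an assumed encoding


def flip(contig: Oriented) -> Oriented:
    """Return the same contig on the opposite strand."""
    name, strand = contig
    return (name, "-" if strand == "+" else "+")


def reference_links(reference: List[List[Oriented]]) -> Set[Tuple[Oriented, Oriented]]:
    """Collect every adjacency in the reference, in both reading directions."""
    links = set()
    for scaffold in reference:
        for a, b in zip(scaffold, scaffold[1:]):
            links.add((a, b))
            links.add((flip(b), flip(a)))  # same adjacency read from the other strand
    return links


def count_correct_links(inferred: List[List[Oriented]],
                        reference: List[List[Oriented]]) -> int:
    """Count inferred adjacencies that match some reference adjacency."""
    ref = reference_links(reference)
    return sum(1 for s in inferred for a, b in zip(s, s[1:]) if (a, b) in ref)


# Toy data: contig R is a repeat that occurs twice in the reference.
reference = [[("A", "+"), ("R", "+"), ("B", "+"), ("R", "+"), ("C", "+")]]
inferred = [
    [("A", "+"), ("R", "+")],
    [("C", "-"), ("R", "-"), ("B", "-")],  # reverse-strand view of B, R, C
]
correct = count_correct_links(inferred, reference)
```

With a single best location for R, at most one of the two inferred scaffolds could be credited; counting every reference occurrence credits the links in both.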


2016
Author(s): Rudy Arthur, Ole Schulz-Trieglaff, Anthony J. Cox, Jared Michael O’Connell

Abstract: Ancestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterise sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. We believe it will be an invaluable tool for the curation of large WGS data-sets.
Availability: The source code is available at https://illumina.github.io/[email protected], [email protected]



2020, Vol 21 (1)
Author(s): Mikko Rautiainen, Tobias Marschall

Abstract: Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.
Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner



2019
Author(s): Mikko Rautiainen, Tobias Marschall

Abstract: Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pan-genome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes. When employing GraphAligner for error correction, we find it to be almost 3x more accurate and over 15x faster than extant tools.
Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner



2016
Author(s): Clemens Messerschmidt, Manuel Holtgrewe, Dieter Beule

Abstract
Summary: We propose the simple method HLA-MA for consistency checking in pipelines operating on human HTS data. The method is based on the HLA typing result of the state-of-the-art method Opti-Type. Provided that there is sufficient coverage of the HLA loci, comparing HLA types allows for simple, fast, and robust matching of samples from whole genome, exome, and RNA-seq data. This approach is reliable for sample re-identification even for samples with high mutational loads, e.g., caused by microsatellite instability or POLE1 defects.
Availability and Implementation: The software is implemented in Python 3 and freely available under the MIT license at https://github.com/bihealth/hlama and via [email protected]
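The idea of matching samples by comparing HLA types can be illustrated with a minimal Python sketch. This is not the HLA-MA implementation: it assumes each sample has already been reduced to a list of allele calls, and the function name `hla_concordance`, the allele strings, and the way the score is normalised are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List


def hla_concordance(calls_a: List[str], calls_b: List[str]) -> float:
    """Fraction of allele calls shared by two samples (multiset overlap).

    Using Counter keeps homozygous calls (the same allele listed twice)
    from being collapsed into one.
    """
    if not calls_a or not calls_b:
        raise ValueError("both samples need at least one allele call")
    shared = sum((Counter(calls_a) & Counter(calls_b)).values())
    return shared / max(len(calls_a), len(calls_b))


# Illustrative allele calls only; not taken from any real sample.
tumor_exome = ["A*01:01", "A*02:01", "B*07:02", "B*08:01", "C*07:01", "C*07:02"]
tumor_rnaseq = ["A*01:01", "A*02:01", "B*07:02", "B*08:01", "C*07:01", "C*07:02"]
other_donor = ["A*03:01", "A*02:01", "B*07:02", "B*08:01", "C*07:01", "C*07:02"]

same_score = hla_concordance(tumor_exome, tumor_rnaseq)
diff_score = hla_concordance(tumor_exome, other_donor)
```

In a pipeline, a score below some chosen cutoff for two files that are supposed to come from the same donor would flag a possible sample swap; the cutoff itself is a design decision not specified here.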



2021
Author(s): Andreas Rempel, Roland Wittler

Abstract
Summary: SANS serif is a novel software for alignment-free, whole-genome based phylogeny estimation that follows a pangenomic approach to efficiently calculate a set of splits in a phylogenetic tree or network.
Availability and Implementation: Implemented in C++ and supported on Linux, MacOS, and Windows. The source code is freely available for download at https://gitlab.ub.uni-bielefeld.de/gi/[email protected]



2021
Author(s): J Lamb, A Elofsson

Abstract
Motivation: Contact prediction within a protein has recently become a viable method for accurate prediction of protein structure. Using predicted distance distributions has been shown in many cases to be superior to using only a binary contact annotation. Using predicted inter-protein distances has also been shown to be able to dock some protein dimers.
Results: Here we present pyconsFold. Using CNS as its underlying folding mechanism and predicted contact distance, it outperforms regular contact-prediction-based modelling on our dataset of 210 proteins. It performs marginally worse than the state-of-the-art pyRosetta folding pipeline but is on average about 20 times faster per model. More importantly, pyconsFold can also be used as a fold-and-dock protocol by using predicted inter-protein contacts to simultaneously fold and dock two protein chains.
Availability and implementation: pyconsFold is implemented in Python 3 with a strong focus on using as few dependencies as possible for longevity. It is available both as a pip package in Python 3 and as source code on GitHub and is published under the GPLv3 [email protected]
Supplementary material: Install instructions, examples and parameters can be found in the supplemental notes.
Availability of data: The data underlying this article together with source code are available on GitHub, at https://github.com/johnlamb/pyconsfold.



2021
Author(s): E Onur Karakaslar, Duygu Ucar

Abstract
Summary: ATAC-seq is a frequently used assay to study chromatin accessibility levels. Differential chromatin accessibility analyses between biological groups and functional interpretation of these differential regions are essential in ATAC-seq data analyses. Although distinct methods and analysis pipelines have been developed for this purpose, a stand-alone R package that combines state-of-the-art differential and functional enrichment analysis pipelines is missing. To fill this gap, we developed cinaR (Chromatin Analyses in R), which is a single wrapper function and provides users with various data analysis and visualization options, including functional enrichment analyses with gene sets curated from multiple sources.
Availability and implementation: cinaR is an R/CRAN package which is under GPL-3 License and its source code is freely accessible at https://CRAN.R-project.org/package=cinaR. Gene sets are available at https://CRAN.R-project.org/package=cinaRgenesets. Bone marrow ATAC-seq data are available at https://www.ncbi.nlm.nih.gov/geo/query/[email protected] or [email protected]



2020, Vol 4 (1), pp. 87-107
Author(s): Ranjan Mondal, Moni Shankar Dey, Bhabatosh Chanda

Abstract: Mathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing a mathematical morphological algorithm is deciding the order of operators/filters and the corresponding structuring elements (SEs). In this work, we develop a morphological network composed of alternating sequences of dilation and erosion layers, which, depending on the learned SEs, may form opening or closing layers. These layers in the right order, along with a linear combination of their outputs, are useful in extracting image features and processing them. Structuring elements in the network are learned by back-propagation, guided by minimization of the loss function. Efficacy of the proposed network is established by applying it to two interesting image restoration problems, namely de-raining and de-hazing. Results are comparable to those of many state-of-the-art algorithms for most of the images. It is also worth mentioning that the number of network parameters to handle is much smaller than that of popular convolutional neural networks for similar tasks. The source code can be found at https://github.com/ranjanZ/Mophological-Opening-Closing-Net
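For readers unfamiliar with the operators named above, here is a minimal NumPy sketch of grayscale dilation and erosion and of how composing them gives opening and closing. It is not the network from the paper: the structuring element here is fixed rather than learned by back-propagation, it is assumed to be square, odd-sized, and symmetric, and the function names are hypothetical.

```python
import numpy as np


def dilate(img: np.ndarray, se: np.ndarray) -> np.ndarray:
    """Grayscale dilation: max over the SE window of (pixel + SE weight)."""
    k = se.shape[0]
    padded = np.pad(img, k // 2, constant_values=-np.inf)  # padding never wins a max
    out = np.full(img.shape, -np.inf)
    for i in range(k):
        for j in range(k):
            window = padded[i:i + img.shape[0], j:j + img.shape[1]]
            out = np.maximum(out, window + se[i, j])
    return out


def erode(img: np.ndarray, se: np.ndarray) -> np.ndarray:
    """Grayscale erosion: min over the SE window of (pixel - SE weight)."""
    k = se.shape[0]
    padded = np.pad(img, k // 2, constant_values=np.inf)  # padding never wins a min
    out = np.full(img.shape, np.inf)
    for i in range(k):
        for j in range(k):
            window = padded[i:i + img.shape[0], j:j + img.shape[1]]
            out = np.minimum(out, window - se[i, j])
    return out


def opening(img: np.ndarray, se: np.ndarray) -> np.ndarray:
    """Erosion followed by dilation; removes small bright structures."""
    return dilate(erode(img, se), se)


def closing(img: np.ndarray, se: np.ndarray) -> np.ndarray:
    """Dilation followed by erosion; fills small dark structures."""
    return erode(dilate(img, se), se)


flat_se = np.zeros((3, 3))          # flat 3x3 structuring element
bright_spot = np.zeros((5, 5))
bright_spot[2, 2] = 1.0             # isolated bright pixel
dark_spot = np.ones((5, 5))
dark_spot[2, 2] = 0.0               # isolated dark pixel

opened = opening(bright_spot, flat_se)
closed = closing(dark_spot, flat_se)
```

Because a dilation layer followed by an erosion layer (or the reverse) yields a closing (or opening) only when both use matching SEs, a network that learns the SEs can in effect choose which of these filters each pair of layers implements.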



Author(s): My Kieu, Andrew D. Bagdanov, Marco Bertini

Pedestrian detection is a canonical problem for safety and security applications, and it remains a challenging problem due to the highly variable lighting conditions in which pedestrians must be detected. This article investigates several domain adaptation approaches to adapt RGB-trained detectors to the thermal domain. Building on our earlier work on domain adaptation for privacy-preserving pedestrian detection, we conduct an extensive experimental evaluation comparing top-down and bottom-up domain adaptation and also propose two new bottom-up domain adaptation strategies. For top-down domain adaptation, we leverage a detector pre-trained on RGB imagery and efficiently adapt it to perform pedestrian detection in the thermal domain. Our bottom-up domain adaptation approaches include two steps: first, we train an adapter segment, corresponding to the initial layers of the RGB-trained detector, to adapt to the new input distribution; then, we reconnect the adapter segment to the original RGB-trained detector for final adaptation with a top-down loss. To the best of our knowledge, our bottom-up domain adaptation approaches outperform the best-performing single-modality pedestrian detection results on KAIST and outperform the state of the art on FLIR.



2021, Vol 15 (6), pp. 1-21
Author(s): Huandong Wang, Yong Li, Mu Du, Zhenhui Li, Depeng Jin

Both app developers and service providers have strong motivations to understand when and where certain apps are used by users. However, this has been a challenging problem due to highly skewed and noisy app usage data. Moreover, apps are regarded as independent items in existing studies, which fail to capture the hidden semantics in app usage traces. In this article, we propose App2Vec, a powerful representation learning model that learns the semantic embedding of apps while taking spatio-temporal context into account. Based on the obtained semantic embeddings, we develop a probabilistic model based on the Bayesian mixture model and Dirichlet process to capture when, where, and what semantics of apps are used in order to predict future usage. We evaluate our model using two different app usage datasets, which involve over 1.7 million users and 2,000+ apps. Evaluation results show that our proposed App2Vec algorithm outperforms the state-of-the-art algorithms in app usage prediction with a performance gap of over 17.0%.


