HLA-MA: Simple yet powerful matching of samples using HLA typing results

Mapping Intimacies ◽

10.1101/066548 ◽

2016 ◽

Author(s):

Clemens Messerschmidt ◽

Manuel Holtgrewe ◽

Dieter Beule

Keyword(s):

Microsatellite Instability ◽

State Of The Art ◽

The State ◽

Hla Typing ◽

Whole Genome ◽

Consistency Checking ◽

Rna Seq ◽

Simple Method ◽

Link Type ◽

Typing Result

AbstractSummaryWe propose the simple method HLA-MA for consistency checking in pipelines operating on human HTS data. The method is based on the HLA typing result of the state-of-the-art method Opti-Type. Provided that there is sufficient coverage of the HLA loci, comparing HLA types allows for simple, fast, and robust matching of samples from whole genome, exome, and RNA-seq data. This approach is reliable for sample re-identification even for samples with high mutational loads, e.g., caused by microsatellite instability or POLE1 defects.Availability and ImplementationThe software is implemented In Python 3 and freely available under the MIT license at https://github.com/bihealth/hlama and via [email protected]

Graphmap2 - splice-aware RNA-seq mapper for long reads

10.1101/720458 ◽

2019 ◽

Cited By ~ 1

Author(s):

Josip Marić ◽

Ivan Sović ◽

Krešimir Križanović ◽

Niranjan Nagarajan ◽

Mile Šikić

Keyword(s):

State Of The Art ◽

The State ◽

Rna Seq ◽

Link Type ◽

Pacific Biosciences ◽

Long Reads ◽

Oxford Nanopore

AbstractIn this paper we present Graphmap2, a splice-aware mapper built on our previously developed DNA mapper Graphmap. Graphmap2 is tailored for long reads produced by Pacific Biosciences and Oxford Nanopore devices. It uses several newly developed algorithms which enable higher precision and recall of correctly detected transcripts and exon boundaries. We compared its performance with the state-of-the-art tools Minimap2 and Gmap. On both simulated and real datasets Graphmap2 achieves higher mappability and more correctly recognized exons and their ends. In addition we present an analysis of potential of splice aware mappers and long reads for the identification of previously unknown isoforms and even genes. The Graphmap2 tool is publicly available at https://github.com/lbcb-sci/graphmap2.

The state of the art in soybean transcriptomics resources and gene coexpression networks

in silico Plants ◽

10.1093/insilicoplants/diab005 ◽

2021 ◽

Author(s):

Fabricio Almeida-Silva ◽

Kanhu C Moharana ◽

Thiago M Venancio

Keyword(s):

State Of The Art ◽

The State ◽

Gene Coexpression Network ◽

Rna Seq ◽

Transcriptomic Data ◽

The Past ◽

Gene Coexpression ◽

Genomics Research ◽

Public Repositories ◽

Coexpression Networks

Abstract In the past decade, over 3000 samples of soybean transcriptomic data have accumulated in public repositories. Here, we review the state of the art in soybean transcriptomics, highlighting the major microarray and RNA-seq studies that investigated soybean transcriptional programs in different tissues and conditions. Further, we propose approaches for integrating such big data using gene coexpression network and outline important web resources that may facilitate soybean data acquisition and analysis, contributing to the acceleration of soybean breeding and functional genomics research.

GraphAligner: rapid and versatile sequence-to-graph alignment

Genome Biology ◽

10.1186/s13059-020-02157-2 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 1

Author(s):

Mikko Rautiainen ◽

Tobias Marschall

Keyword(s):

Genetic Variation ◽

Error Correction ◽

Genome Assembly ◽

State Of The Art ◽

Source Code ◽

The State ◽

Graph Alignment ◽

Link Type ◽

Long Reads

Abstract Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools.Availability: Package manager: https://anaconda.org/bioconda/graphalignerand source code: https://github.com/maickrau/GraphAligner

PlasClass improves plasmid sequence classification

10.1101/783571 ◽

2019 ◽

Cited By ~ 1

Author(s):

David Pellow ◽

Itzik Mizrahi ◽

Ron Shamir

Keyword(s):

State Of The Art ◽

Bacterial Genome ◽

Unknown Origin ◽

The State ◽

Sequence Classification ◽

Genome Sequences ◽

Plasmid Sequence ◽

Link Type ◽

Classification Tool ◽

Metagenomic Assembly

AbstractBackgroundMany bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained using a fraction of the known plasmids, and can be difficult to use in practice.ResultsWe present PlasClass, a new plasmid classifier. It uses a set of standard classifiers trained on the most current set of known plasmid sequences for different sequence lengths. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, while using less time and memory.ConclusionsPlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available from: https://github.com/Shamir-Lab/PlasClass

Abstract B12: Independent comparison of the state of the art in-silico HLA typing software

10.1158/2326-6074.tumimm16-b12 ◽

2017 ◽

Author(s):

Alvin Ng ◽

Steve Rozen

Keyword(s):

In Silico ◽

State Of The Art ◽

The State ◽

Hla Typing

Genomics: the state of the art in RNA-seq analysis

Nature Methods ◽

10.1038/nmeth.2735 ◽

2013 ◽

Vol 10 (12) ◽

pp. 1165-1166 ◽

Cited By ~ 17

Author(s):

Ian Korf

Keyword(s):

State Of The Art ◽

The State ◽

Rna Seq

Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression

10.1101/023705 ◽

2015 ◽

Author(s):

John Wiedenhoeft ◽

Eric Brugel ◽

Alexander Schliep

Keyword(s):

Hidden Markov Models ◽

Copy Number ◽

Markov Models ◽

State Of The Art ◽

Hidden Markov ◽

Copy Number Variants ◽

Computational Effort ◽

The State ◽

Haar Wavelets ◽

Link Type

AbstractBy combining Haar wavelets with Bayesian Hidden Markov Models, we improve detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. At the same time, we achieve drastically reduced running times, as the method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://bioinformatics.rutgers.edu/Software/HaMMLET/. The web supplement is at http://bioinformatics.rutgers.edu/Supplements/HaMMLET/.Author SummaryIdentifying large-scale genome deletions and duplications, or copy number variants (CNV), accurately in populations or individual patients is a crucial step in indicating disease factors or diagnosing an individual patient's disease type. Hidden Markov Models (HMM) are a type of statistical model widely used for CNV detection, as well as other biological applications such as the analysis of gene expression time course data or the analysis of discrete-valued DNA and protein sequences.As with many statistical models, there are two fundamentally different inference approaches. In the frequentist framework, a single estimate of the model parameters would be used as a basis for subsequent inference, making the identification of CNV dependent on the quality of that estimate. This is an acute problem for HMM as methods for finding globally optimal parameters are not known. Alternatively, one can use a Bayesian approach and integrate over all possible parameter choices. While the latter is known to lead to significantly better results, the much—up to hundreds of times—larger computational effort prevents wide adaptation so far.Our proposed method addresses this by combining Haar wavelets and HMM. We greatly accelerate fully Bayesian HMMs, while simultaneously increasing convergence and thus the accuracy of the Gibbs sampler used for Bayesian computations, leading to substantial improvements over the state-of-the-art.

Valid post-clustering differential analysis for single-cell RNA-Seq

10.1101/463265 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jesse M. Zhang ◽

Govinda M. Kamath ◽

David N. Tse

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

Differential Expression Analysis ◽

Differential Analysis ◽

Rna Seq ◽

Analysis Framework ◽

Link Type ◽

False Discoveries

SummarySingle-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.

Quark enables semi-reference-based compression of RNA-seq data

10.1101/085878 ◽

2016 ◽

Author(s):

Hirak Sarkar ◽

Rob Patro

Keyword(s):

State Of The Art ◽

Reference Sequence ◽

Rna Seq ◽

Sequencing Data ◽

The Past ◽

Link Type ◽

Exponential Increase

AbstractMotivationThe past decade has seen an exponential increase in biological sequencing capacity, and there has been a simultaneous effort to help organize and archive some of the vast quantities of sequencing data that are being generated. While these developments are tremendous from the perspective of maximizing the scientific utility of available data, they come with heavy costs. The storage and transmission of such vast amounts of sequencing data is expensive.ResultsWe present Quark, a semi-reference-based compression tool designed for RNA-seq data. Quark makes use of a reference sequence when encoding reads, but produces a representation that can be decoded independently, without the need for a reference. This allows Quark to achieve markedly better compression rates than existing reference-free schemes, while still relieving the burden of assuming a specific, shared reference sequence between the encoder and decoder. We demonstrate that Quark achieves state-of-the-art compression rates, and that, typically, only a small fraction of the reference sequence must be encoded along with the reads to allow reference-free decompression.AvailabilityQuark is implemented in C++11, and is available under a GPLv3 license at www.github.com/COMBINE-lab/[email protected]

AKT: Ancestry and Kinship Toolkit

10.1101/047829 ◽

2016 ◽

Author(s):

Rudy Arthur ◽

Ole Schulz-Trieglaff ◽

Anthony J. Cox ◽

Jared Michael O’Connell

Keyword(s):

Data Clustering ◽

State Of The Art ◽

Source Code ◽

Statistical Genetics ◽

Data Sets ◽

Whole Genome ◽

Link Type ◽

Art Methods ◽

Invaluable Tool

AbstractAncestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterise sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. We believe it will be an invaluable tool for the curation of large WGS data-sets.AvailabilityThe source code is available at https://illumina.github.io/[email protected], [email protected]