RERconverge: an R package for associating evolutionary rates with convergent traits

Amanda Kowalczyk; Wynn K Meyer; Raghavendran Partha; Weiguang Mao; Nathan L Clark; Maria Chikina

doi:10.1093/bioinformatics/btz468

RERconverge: an R package for associating evolutionary rates with convergent traits

Bioinformatics ◽

10.1093/bioinformatics/btz468 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4815-4817 ◽

Cited By ~ 6

Author(s):

Amanda Kowalczyk ◽

Wynn K Meyer ◽

Raghavendran Partha ◽

Weiguang Mao ◽

Nathan L Clark ◽

...

Keyword(s):

Molecular Basis ◽

Evolutionary Rate ◽

Source Code ◽

R Package ◽

Evolutionary Rates ◽

Supplementary Information ◽

Supplementary Data ◽

Genome Sequences ◽

Convergent Rate ◽

Tests For Association

Abstract

Motivation: When different lineages of organisms independently adapt to similar environments, selection often acts repeatedly upon the same genes, leading to signatures of convergent evolutionary rate shifts at these genes. With the increasing availability of genome sequences for organisms displaying a variety of convergent traits, the ability to identify genes with such convergent rate signatures would enable new insights into the molecular basis of these traits.

Results: Here we present the R package RERconverge, which tests for association between relative evolutionary rates of genes and the evolution of traits across a phylogeny. RERconverge can perform associations with binary and continuous traits, and it contains tools for visualization and enrichment analyses of association results.

Availability and implementation: RERconverge source code, documentation and a detailed usage walk-through are freely available at https://github.com/nclark-lab/RERconverge. Datasets for mammals, Drosophila and yeast are available at https://bit.ly/2J2QBnj.

Supplementary information: Supplementary data are available at Bioinformatics online.
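The association test described in this abstract can be illustrated with a toy calculation. The sketch below is not RERconverge's implementation; it assumes, for illustration only, that a gene's relative rates are the residuals from regressing its branch lengths on genome-wide average branch lengths, and that association with a binary trait is measured by a plain Pearson correlation. All function names and numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def residuals(y, x):
    """Residuals of an ordinary least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5

# Hypothetical data: one value per branch of the phylogeny.
avg_branch = [1.0, 2.0, 3.0, 4.0]    # genome-wide average branch lengths
gene_branch = [1.0, 2.5, 3.0, 4.5]   # branch lengths in one gene tree
trait = [0, 1, 0, 1]                 # 1 = branch carries the convergent trait

# Assumed definition: relative rate = deviation from the genome-wide expectation.
relative_rates = residuals(gene_branch, avg_branch)
association = pearson(relative_rates, trait)
```

A positive `association` here means the gene evolves faster than expected on trait-bearing branches; a real analysis would also need a significance test and a correction for multiple genes.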


RERconverge: an R package for associating evolutionary rates with convergent traits

10.1101/451138 ◽

2018 ◽

Cited By ~ 5

Author(s):

Amanda Kowalczyk ◽

Wynn K Meyer ◽

Raghavendran Partha ◽

Weiguang Mao ◽

Nathan L Clark ◽

...

Keyword(s):

Molecular Basis ◽

Evolutionary Rate ◽

Source Code ◽

R Package ◽

Evolutionary Rates ◽

Supplementary Information ◽

Genome Sequences ◽

Link Type ◽

Convergent Rate ◽

Tests For Association

Abstract

Motivation: When different lineages of organisms independently adapt to similar environments, selection often acts repeatedly upon the same genes, leading to signatures of convergent evolutionary rate shifts at these genes. With the increasing availability of genome sequences for organisms displaying a variety of convergent traits, the ability to identify genes with such convergent rate signatures would enable new insights into the molecular basis of these traits.

Results: Here we present the R package RERconverge, which tests for association between relative evolutionary rates of genes and the evolution of traits across a phylogeny. RERconverge can perform associations with binary and continuous traits, and it contains tools for visualization and enrichment analyses of association results.

Availability: RERconverge source code, documentation, and a detailed usage walk-through are freely available at https://github.com/nclark-lab/RERconverge. Datasets for mammals, Drosophila, and yeast are available at https://bit.ly/2J2QBnj.

Contact: [email protected]

Supplementary information: Supplementary information, containing detailed vignettes for usage of RERconverge, is available at Bioinformatics online.


DepLogo: visualizing sequence dependencies in R

Bioinformatics ◽

10.1093/bioinformatics/btz507 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4812-4814 ◽

Cited By ~ 2

Author(s):

Jan Grau ◽

Martin Nettling ◽

Jens Keilwagen

Keyword(s):

Mutual Information ◽

Sequence Data ◽

Source Code ◽

Protein Sequences ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Sequence Logos ◽

End Sequences ◽

Dependency Structures

Abstract

Summary: Statistical dependencies are present in a variety of sequence data, but are not discernible from traditional sequence logos. Here, we present the R package DepLogo for visualizing inter-position dependencies in aligned sequence data as dependency logos. Dependency logos make dependency structures, which correspond to regular co-occurrences of symbols at dependent positions, visually perceptible. To this end, sequences are partitioned based on their symbols at highly dependent positions as measured by mutual information, and each partition obtains its own visual representation. We illustrate the utility of the DepLogo package in several use cases generating dependency logos from DNA, RNA and protein sequences.

Availability and implementation: The DepLogo R package is available from CRAN and its source code is available at https://github.com/Jstacs/DepLogo.

Supplementary information: Supplementary data are available at Bioinformatics online.
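The two ingredients named in this abstract, mutual information between alignment positions and partitioning of sequences by the symbol at a dependent position, can be sketched in a few lines. This is a generic illustration, not DepLogo's code; the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(seqs, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )

def partition_by_position(seqs, i):
    """Group sequences by the symbol they carry at column i."""
    groups = {}
    for s in seqs:
        groups.setdefault(s[i], []).append(s)
    return groups

# Toy alignment: columns 0 and 1 always co-vary, column 3 varies independently.
aligned = ["ACGT", "ACGA", "TGGT", "TGGA"]
mi_dependent = mutual_information(aligned, 0, 1)
mi_independent = mutual_information(aligned, 0, 3)
groups = partition_by_position(aligned, 0)
```

In this toy alignment the co-varying pair of columns carries high mutual information and the independent pair carries none, so splitting on column 0 yields two groups that could each be drawn as a separate logo.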


DysRegSig: an R package for identifying gene dysregulations and building mechanistic signatures in cancer

Bioinformatics ◽

10.1093/bioinformatics/btaa688 ◽

2020 ◽

Author(s):

Quanxue Li ◽

Wentao Dai ◽

Jixiang Liu ◽

Qingqing Sang ◽

Yi-Xue Li ◽

...

Keyword(s):

Source Code ◽

R Package ◽

Supplementary Information ◽

High Dimensional ◽

Supplementary Data ◽

Analysis Framework ◽

Gene Dysregulation ◽

Cell Processes ◽

Effective Path

Abstract

Summary: Dysfunctional regulation of gene expression programs relevant to fundamental cell processes can drive carcinogenesis. Therefore, systematically identifying dysregulation events is an effective path for understanding carcinogenesis and provides insightful clues to build predictive signatures with mechanistic interpretability for cancer precision medicine. Here, we implemented a machine learning-based gene dysregulation analysis framework in an R package, DysRegSig, which is capable of exploring gene dysregulations from high-dimensional data and building mechanistic signatures based on gene dysregulations. DysRegSig can serve as an easy-to-use tool to facilitate gene dysregulation analysis and follow-up analysis.

Availability and implementation: The source code and user’s guide of DysRegSig are freely available on GitHub: https://github.com/SCBIT-YYLab/DysRegSig.

Supplementary information: Supplementary data are available at Bioinformatics online.


KEC: unique sequence search by K-mer exclusion

Bioinformatics ◽

10.1093/bioinformatics/btab196 ◽

2021 ◽

Author(s):

Pavel Beran ◽

Dagmar Stehlíková ◽

Stephen P Cohen ◽

Vladislav Čurn

Keyword(s):

Amino Acid ◽

Nucleic Acid ◽

Source Code ◽

Unique Sequence ◽

Supplementary Information ◽

Supplementary Data ◽

Laptop Computers ◽

Sequence Search ◽

Target Sequences ◽

Cross Reference

Abstract

Summary: Searching for amino acid or nucleic acid sequences unique to one organism may be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications.

Availability and implementation: KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC.

Supplementary information: Supplementary data are available at Bioinformatics online.
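The general idea of k-mer exclusion, collecting k-mers from the target sequences and discarding every k-mer that also occurs in a non-target sequence, can be sketched with set operations. This is only an illustration of the principle under that assumption, not KEC's algorithm or interface; the function names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def unique_kmers(targets, non_targets, k):
    """K-mers found in any target sequence but in no non-target sequence."""
    remaining = set()
    for seq in targets:
        remaining |= kmers(seq, k)
    for seq in non_targets:
        remaining -= kmers(seq, k)  # cross-reference and eliminate shared k-mers
    return remaining

# Hypothetical toy input.
targets = ["ACGTAC"]
non_targets = ["CGTA"]
result = unique_kmers(targets, non_targets, 3)
```

Holding every k-mer in an in-memory set is the simplest possible design; a genome-scale tool would need a more compact representation.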


BioCommons: a robust java library for RNA structural bioinformatics

Bioinformatics ◽

10.1093/bioinformatics/btab069 ◽

2021 ◽

Author(s):

Tomasz Zok

Keyword(s):

Source Code ◽

Structural Bioinformatics ◽

Supplementary Information ◽

Supplementary Data ◽

Bioinformatic Tools ◽

Data Formats ◽

Central Repository ◽

Diverse Data ◽

2D And 3D ◽

Java Library

Abstract

Motivation: Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders analysis and the creation of new structure-oriented bioinformatic tools. Therefore, the need for robust libraries of data processing functions is still growing.

Results: BioCommons is an open-source Java library for structural bioinformatics. It contains many functions working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA.

Availability and implementation: The library is available in Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons.

Supplementary information: Supplementary data are available at Bioinformatics online.


DEsingle for detecting three types of differential expression in single-cell RNA-seq data

10.1101/173997 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zhun Miao ◽

Ke Deng ◽

Xiaowo Wang ◽

Xuegong Zhang

Keyword(s):

Single Cell ◽

Differential Expression ◽

Negative Binomial ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Binomial Model ◽

Supplementary Data ◽

Rna Seq ◽

Real Zeros

Abstract

Summary: The excess zeros in single-cell RNA-seq data include “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a zero-inflated negative binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy.

Availability and implementation: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration.

Contact: [email protected]

Supplementary information: Supplementary data are available at bioRxiv online.
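A zero-inflated negative binomial (ZINB) distribution mixes a point mass at zero with an ordinary negative binomial, so an observed zero can come from either component. The sketch below shows the standard textbook form of that mixture and how the share of zeros from each component follows from it. It is not DEsingle's code; the parameter names (`theta`, `r`, `p`) and values are illustrative, and which component corresponds to "real" versus "dropout" zeros is left to the paper.

```python
from math import exp, lgamma, log

def nb_pmf(k, r, p):
    """Negative binomial P(X = k) with size r > 0 and success probability 0 < p < 1."""
    log_coef = lgamma(k + r) - lgamma(r) - lgamma(k + 1)
    return exp(log_coef + r * log(p) + k * log(1 - p))

def zinb_pmf(k, theta, r, p):
    """ZINB P(X = k): point mass at zero with weight theta, NB with weight 1 - theta."""
    nb_part = (1 - theta) * nb_pmf(k, r, p)
    return theta + nb_part if k == 0 else nb_part

def point_mass_share_of_zeros(theta, r, p):
    """Fraction of all expected zeros contributed by the point-mass component."""
    return theta / zinb_pmf(0, theta, r, p)

# Illustrative parameter values only.
theta, r, p = 0.2, 2.0, 0.5
prob_zero = zinb_pmf(0, theta, r, p)
share = point_mass_share_of_zeros(theta, r, p)
```

With these example values the negative binomial alone already puts noticeable mass on zero, which is why the two sources of zeros cannot be separated by counting zeros and must be estimated through the fitted mixture.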


SVIM-asm: Structural variant detection from haploid and diploid genome assemblies

10.1101/2020.10.27.356907 ◽

2020 ◽

Author(s):

David Heller ◽

Martin Vingron

Keyword(s):

Genetic Information ◽

Source Code ◽

Supplementary Information ◽

Supplementary Data ◽

Diploid Genome ◽

Insertions And Deletions ◽

Structural Variant ◽

Sequencing Technologies ◽

Variant Detection ◽

Genome Assemblies

Abstract

Motivation: With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping or detect only a limited set of SV classes.

Results: We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reaches higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual.

Availability and implementation: SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/[email protected]

Supplementary information: Supplementary data are available online.


ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions

Bioinformatics ◽

10.1093/bioinformatics/btz431 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4754-4756 ◽

Cited By ~ 29

Author(s):

Egor Dolzhenko ◽

Viraj Deshpande ◽

Felix Schlesinger ◽

Peter Krusche ◽

Roman Petrovski ◽

...

Keyword(s):

Tandem Repeat ◽

Broad Class ◽

Source Code ◽

Computational Method ◽

Supplementary Information ◽

Dna Repeats ◽

Supplementary Data ◽

Sequence Graph ◽

Version 2.0 ◽

Short Tandem

Abstract

Summary: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci.

Availability and implementation: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/.

Supplementary information: Supplementary data are available at Bioinformatics online.


LIONS: analysis suite for detecting and quantifying transposable element initiated transcription from RNA-seq

Bioinformatics ◽

10.1093/bioinformatics/btz130 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3839-3841 ◽

Cited By ~ 6

Author(s):

Artem Babaian ◽

I Richard Thompson ◽

Jake Lever ◽

Liane Gagnier ◽

Mohammad M Karimi ◽

...

Keyword(s):

Transposable Elements ◽

Transposable Element ◽

Test Data ◽

Source Code ◽

Supplementary Information ◽

Transcriptional Networks ◽

Supplementary Data ◽

Rna Seq ◽

Transcriptional Initiation ◽

Instruction Manual

Abstract

Summary: Transposable elements (TEs) influence the evolution of novel transcriptional networks, yet the specific and meaningful interpretation of how TE-derived transcriptional initiation contributes to the transcriptome has been marred by computational and methodological deficiencies. We developed LIONS for the analysis of RNA-seq data to specifically detect and quantify TE-initiated transcripts.

Availability and implementation: Source code, container, test data and instruction manual are freely available at www.github.com/ababaian/LIONS.

Supplementary information: Supplementary data are available at Bioinformatics online.


OpenBioLink: a benchmarking framework for large-scale biomedical link prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa274 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4097-4098 ◽

Cited By ~ 3

Author(s):

Anna Breit ◽

Simon Ott ◽

Asan Agibetov ◽

Matthias Samwald

Keyword(s):

Link Prediction ◽

Large Scale ◽

Source Code ◽

Machine Learning Algorithms ◽

Knowledge Networks ◽

Supplementary Information ◽

Supplementary Data ◽

Biomedical Knowledge ◽

High Quality ◽

Baseline Evaluation

Abstract

Summary: Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results.

Availability and implementation: Source code and data are openly available at https://github.com/OpenBioLink/OpenBioLink.

Supplementary information: Supplementary data are available at Bioinformatics online.
