Multiscale part mutual information for quantifying nonlinear direct associations in networks

Abstract Motivation For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. Results In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed to effectively quantify nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI theoretically and derived five important properties. Second, an experiment on a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, the MPMI was evaluated and compared with the PMI, NPA and conditional mutual information on simulated datasets and on datasets from the DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. The results show that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. Availability The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. Supplementary information Supplementary data are available at Bioinformatics online.
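
The abstract benchmarks MPMI against conditional mutual information (CMI) and part mutual information (PMI). The MPMI formula itself is not given here, so the following is only a minimal plug-in sketch of those two baselines for discrete samples, assuming the usual PMI definition PMI(X;Y|Z) = Σ p(x,y,z) log[p(x,y|z) / (p*(x|z) p*(y|z))] with p*(x|z) = Σ_y p(x|z,y) p(y). The function names are hypothetical and are not taken from the MPMI repository; continuous data would need discretization or a kernel or k-NN estimator first.

```python
from collections import Counter
from math import log


def _probs(samples):
    """Empirical (plug-in) probabilities of hashable outcomes."""
    n = len(samples)
    return {k: c / n for k, c in Counter(samples).items()}


def _tables(x, y, z):
    """Joint and marginal plug-in tables shared by both estimators."""
    return (_probs(list(zip(x, y, z))), _probs(list(zip(x, z))),
            _probs(list(zip(y, z))), _probs(list(z)),
            _probs(list(x)), _probs(list(y)))


def conditional_mi(x, y, z):
    """Plug-in I(X;Y|Z) in nats: sum p(x,y,z) log[p(x,y,z)p(z) / (p(x,z)p(y,z))]."""
    pxyz, pxz, pyz, pz, _, _ = _tables(x, y, z)
    return sum(p * log(p * pz[c] / (pxz[(a, c)] * pyz[(b, c)]))
               for (a, b, c), p in pxyz.items())


def part_mi(x, y, z):
    """Plug-in PMI(X;Y|Z) in nats for discrete samples (hypothetical helper)."""
    pxyz, pxz, pyz, pz, px, py = _tables(x, y, z)

    def p_star_x(a, c):
        # p*(x|z) = sum_y p(x|z,y) p(y), skipping (y, z) pairs never observed
        return sum(pxyz.get((a, b, c), 0.0) / pyz[(b, c)] * py[b]
                   for b in py if (b, c) in pyz)

    def p_star_y(b, c):
        # p*(y|z) = sum_x p(y|z,x) p(x), skipping (x, z) pairs never observed
        return sum(pxyz.get((a, b, c), 0.0) / pxz[(a, c)] * px[a]
                   for a in px if (a, c) in pxz)

    # PMI = sum p(x,y,z) log[ p(x,y|z) / (p*(x|z) p*(y|z)) ]
    return sum(p * log((p / pz[c]) / (p_star_x(a, c) * p_star_y(b, c)))
               for (a, b, c), p in pxyz.items())


x = [0, 1, 0, 1]
z = [0, 0, 0, 0]
print(conditional_mi(x, x, z), part_mi(x, x, z))
```

With X = Y and a constant Z, conditioning removes nothing, so both estimators return log 2 ≈ 0.693 nats.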

DepLogo: visualizing sequence dependencies in R

Bioinformatics ◽

10.1093/bioinformatics/btz507 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4812-4814 ◽

Cited By ~ 2

Author(s):

Jan Grau ◽

Martin Nettling ◽

Jens Keilwagen

Keyword(s):

Mutual Information ◽

Sequence Data ◽

Source Code ◽

Protein Sequences ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Sequence Logos ◽

End Sequences ◽

Dependency Structures

Abstract Summary Statistical dependencies are present in a variety of sequence data, but are not discernible from traditional sequence logos. Here, we present the R package DepLogo for visualizing inter-position dependencies in aligned sequence data as dependency logos. Dependency logos make dependency structures, which correspond to regular co-occurrences of symbols at dependent positions, visually perceptible. To this end, sequences are partitioned based on their symbols at highly dependent positions as measured by mutual information, and each partition obtains its own visual representation. We illustrate the utility of the DepLogo package in several use cases generating dependency logos from DNA, RNA and protein sequences. Availability and implementation The DepLogo R package is available from CRAN and its source code is available at https://github.com/Jstacs/DepLogo. Supplementary information Supplementary data are available at Bioinformatics online.
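
To make the partitioning step concrete, here is a simplified, single-split sketch: each alignment column is scored by its summed mutual information with all other columns, and sequences are grouped by their symbol at the top-scoring column. This is an illustration under assumptions, not DepLogo's implementation (an R package whose scoring, recursion and stopping rules may differ); the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def column_mi(seqs, i, j):
    """Plug-in mutual information (nats) between alignment columns i and j."""
    n = len(seqs)
    pij = Counter((s[i], s[j]) for s in seqs)
    pi = Counter(s[i] for s in seqs)
    pj = Counter(s[j] for s in seqs)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b)
    return sum(c / n * log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())


def most_dependent_position(seqs):
    """Column whose summed MI with all other columns is largest."""
    length = len(seqs[0])
    scores = [sum(column_mi(seqs, i, j) for j in range(length) if j != i)
              for i in range(length)]
    return max(range(length), key=scores.__getitem__)


def partition(seqs):
    """Split equal-length aligned sequences by their symbol at that column."""
    pos = most_dependent_position(seqs)
    groups = defaultdict(list)
    for s in seqs:
        groups[s[pos]].append(s)
    return pos, dict(groups)
```

For ["ACGA", "ACGA", "TCGT", "TCGT"], columns 0 and 3 always co-vary (A with A, T with T), so the split lands on one of them and yields an "A" group and a "T" group; a full dependency logo would recurse into each group.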

KEC: unique sequence search by K-mer exclusion

Bioinformatics ◽

10.1093/bioinformatics/btab196 ◽

2021 ◽

Author(s):

Pavel Beran ◽

Dagmar Stehlíková ◽

Stephen P Cohen ◽

Vladislav Čurn

Keyword(s):

Amino Acid ◽

Nucleic Acid ◽

Source Code ◽

Unique Sequence ◽

Supplementary Information ◽

Supplementary Data ◽

Laptop Computers ◽

Sequence Search ◽

Target Sequences ◽

Cross Reference

Abstract Summary Searching for amino acid or nucleic acid sequences unique to one organism may be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications. Availability and implementation KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. Supplementary information Supplementary data are available at Bioinformatics online.
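
The k-mer exclusion idea can be sketched as follows, assuming exact matching on the forward strand only: every k-mer from the non-target sequences goes into a set, and maximal stretches of the target covered solely by k-mers outside that set are reported. The function names and the choice of k are illustrative; KEC's actual options, strand handling and output format may differ.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(target, non_targets, k):
    """Maximal target substrings whose every k-mer is absent from all non-targets."""
    excluded = set()
    for nt in non_targets:
        excluded |= kmers(nt, k)
    regions, start = [], None
    for i in range(len(target) - k + 1):
        if target[i:i + k] not in excluded:
            if start is None:
                start = i  # first unique k-mer of a new run
        elif start is not None:
            # the run ends at the last unique k-mer, which starts at i - 1
            regions.append(target[start:i - 1 + k])
            start = None
    if start is not None:
        regions.append(target[start:])
    return regions
```

For example, unique_regions("AAAACCCCGGGG", ["AAAAGGGG"], 4) returns ["AAACCCCGGG"]: the shared k-mers AAAA and GGGG at the two ends are excluded, and everything in between survives.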

BioCommons: a robust java library for RNA structural bioinformatics

Bioinformatics ◽

10.1093/bioinformatics/btab069 ◽

2021 ◽

Author(s):

Tomasz Zok

Keyword(s):

Source Code ◽

Structural Bioinformatics ◽

Supplementary Information ◽

Supplementary Data ◽

Bioinformatic Tools ◽

Data Formats ◽

Central Repository ◽

Diverse Data ◽

2D And 3D ◽

Java Library

Abstract Motivation Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders analytics and the creation of new structure-oriented bioinformatic tools. Therefore, the need for robust libraries of data processing functions is still growing. Results BioCommons is an open-source Java library for structural bioinformatics. It contains many functions for working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA. Availability and implementation The library is available in the Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons. Supplementary information Supplementary data are available at Bioinformatics online.
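
As an illustration of the kind of 2D-structure processing such a library provides, the following sketch converts a dot-bracket string into a list of 1-based base pairs. It is written in Python for consistency with the other examples here and does not reflect BioCommons' Java API; the function name and the set of supported bracket types are assumptions.

```python
def dot_bracket_pairs(structure):
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string.

    Separate stacks for () [] {} <> let simple pseudoknots parse as well.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)
```

dot_bracket_pairs("(([..)).]") returns [(1, 7), (2, 6), (3, 9)], where the square brackets encode a pair that crosses the nested stem.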

Efficient multi-sensor exploration using dependent observations and conditional mutual information

2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR) ◽

10.1109/ssrr.2016.7784275 ◽

2016 ◽

Cited By ~ 3

Author(s):

Wennie Tabib ◽

Red Whittaker ◽

Nathan Michael

Keyword(s):

Mutual Information ◽

Conditional Mutual Information ◽

Dependent Observations

Multi-dimensional conditional mutual information with application on the EEG signal analysis for spatial cognitive ability evaluation

Neural Networks ◽

10.1016/j.neunet.2021.12.010 ◽

2021 ◽

Author(s):

Dong Wen ◽

Rou Li ◽

Mengmeng Jiang ◽

Jingjing Li ◽

Yijun Liu ◽

...

Keyword(s):

Mutual Information ◽

Cognitive Ability ◽

Signal Analysis ◽

Eeg Signal ◽

Conditional Mutual Information ◽

Eeg Signal Analysis ◽

Ability Evaluation

SMILE: Mutual Information Learning for Integration of Single-cell Omics Data

Bioinformatics ◽

10.1093/bioinformatics/btab706 ◽

2021 ◽

Author(s):

Yang Xu ◽

Priyojit Das ◽

Rachel Patton McCord

Keyword(s):

Deep Learning ◽

Mutual Information ◽

Single Cell ◽

Learning Algorithm ◽

Cellular Systems ◽

Supplementary Information ◽

Omics Data ◽

Learning Approaches ◽

Rna Seq ◽

Integrate Data

Abstract Motivation Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data via maximizing mutual information, SMILE (Single-cell Mutual Information Learning). Results Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single-source data. Supplementary information Supplementary data are available at Bioinformatics online. The source code of SMILE, including analyses of key results in the study, can be found at: https://github.com/rpmccordlab/SMILE.
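
Maximizing mutual information between paired views is commonly done with a contrastive objective. The sketch below implements a generic InfoNCE loss on paired embeddings (a[i] and b[i] being, say, two views of the same cell, with all other b[j] acting as negatives); log N minus the loss is a lower bound on the mutual information. This is an assumption for illustration: SMILE's actual loss, encoder and cell-pairing scheme are not reproduced, and the temperature tau is a hypothetical default.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(a, b, tau=0.1):
    """Mean InfoNCE loss where a[i] and b[i] form the positive pair."""
    a = [_normalize(v) for v in a]
    b = [_normalize(v) for v in b]
    n = len(a)
    total = 0.0
    for i in range(n):
        # cosine similarity of a[i] to every b[j], scaled by temperature tau
        logits = [sum(p * q for p, q in zip(a[i], b[j])) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denominator - logits[i]
    return total / n


def mi_lower_bound(a, b, tau=0.1):
    """InfoNCE bound: I(A;B) >= log(N) - loss."""
    return math.log(len(a)) - info_nce(a, b, tau)
```

When the pairs are correctly matched, the loss approaches zero and the bound approaches log N; shuffling the pairing drives the loss up, which is the signal a training loop would minimize.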

Bulk private curves require large conditional mutual information

Journal of High Energy Physics ◽

10.1007/jhep09(2021)042 ◽

2021 ◽

Vol 2021 (9) ◽

Author(s):

Alex May

Keyword(s):

Mutual Information ◽

Boundary Region ◽

Theoretic Approach ◽

Strong Correlations ◽

Conditional Mutual Information ◽

Information Theoretic ◽

Causal Curve ◽

Resource Requirements ◽

Theoretic Argument ◽

Information Theoretic Approach

Abstract We prove a theorem showing that the existence of “private” curves in the bulk of AdS implies two regions of the dual CFT share strong correlations. A private curve is a causal curve which avoids the entanglement wedge of a specified boundary region $\mathcal{U}$. The implied correlation is measured by the conditional mutual information $I(\mathcal{V}_1 : \mathcal{V}_2 \mid \mathcal{U})$, which is $O(1/G_N)$ when a private causal curve exists. The regions $\mathcal{V}_1$ and $\mathcal{V}_2$ are specified by the endpoints of the causal curve and the placement of the region $\mathcal{U}$. This gives a causal perspective on the conditional mutual information in AdS/CFT, analogous to the causal perspective on the mutual information given by earlier work on the connected wedge theorem. We give an information theoretic argument for our theorem, along with a bulk geometric proof. In the geometric perspective, the theorem follows from the maximin formula and entanglement wedge nesting. In the information theoretic approach, the theorem follows from resource requirements for sending private messages over a public quantum channel.
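
For reference, the conditional mutual information above is the standard combination of entanglement entropies, and strong subadditivity guarantees it is nonnegative. At leading order in holography each entropy is the area of an extremal surface $\gamma_A$ divided by $4G_N$ (the Ryu-Takayanagi formula, of which the maximin construction is a covariant form), so the statement that the quantity is $O(1/G_N)$ means the corresponding combination of areas does not cancel at leading order.

```latex
I(\mathcal{V}_1 : \mathcal{V}_2 \mid \mathcal{U})
  = S(\mathcal{V}_1 \mathcal{U}) + S(\mathcal{V}_2 \mathcal{U})
  - S(\mathcal{U}) - S(\mathcal{V}_1 \mathcal{V}_2 \mathcal{U}) \;\ge\; 0,
\qquad
S(A) \approx \frac{\operatorname{Area}(\gamma_A)}{4 G_N}
```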

SVIM-asm: Structural variant detection from haploid and diploid genome assemblies

10.1101/2020.10.27.356907 ◽

2020 ◽

Author(s):

David Heller ◽

Martin Vingron

Keyword(s):

Genetic Information ◽

Source Code ◽

Supplementary Information ◽

Supplementary Data ◽

Diploid Genome ◽

Insertions And Deletions ◽

Structural Variant ◽

Sequencing Technologies ◽

Variant Detection ◽

Genome Assemblies

Abstract Motivation With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping, or detect only a limited set of SV classes. Results We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reaches higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. Availability and Implementation SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/[email protected] information Supplementary data are available online.
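
A heavily simplified view of assembly-based indel detection: walk the CIGAR string of one contig-to-reference alignment and report insertions and deletions above a size cutoff. The 50 bp threshold is a common SV convention assumed here, coordinates are 0-based, and this is not SVIM-asm's algorithm, which covers six SV classes, genotyping and pairs of haplotype assemblies; the function name is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(ref_start, cigar, min_size=50):
    """Return (type, ref_position, length) for indels of at least min_size bp.

    ref_start is the 0-based reference coordinate where the alignment begins.
    """
    calls = []
    ref_pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I" and length >= min_size:
            calls.append(("INS", ref_pos, length))
        elif op == "D" and length >= min_size:
            calls.append(("DEL", ref_pos, length))
        if op in "MDN=X":  # only these operations consume reference bases
            ref_pos += length
    return calls
```

For indels_from_cigar(100, "1000M60I500M80D10M5D200M"), the 60 bp insertion is reported at reference position 1100 and the 80 bp deletion at 1600, while the 5 bp deletion falls below the cutoff.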

Efficient variable selection method using conditional mutual information

Journal of the Korean Data and Information Science Society ◽

10.7465/jkdi.2014.25.5.1079 ◽

2014 ◽

Vol 25 (5) ◽

pp. 1079-1094

Author(s):

Chi Kyung Ahn ◽

Donguk Kim

Keyword(s):

Mutual Information ◽

Variable Selection ◽

Selection Method ◽

Conditional Mutual Information ◽

Variable Selection Method

GalaxyCloudRunner: enhancing scalable computing for Galaxy

10.1101/2020.05.28.121772 ◽

2020 ◽

Author(s):

N Goonasekera ◽

A Mahmoud ◽

J Chilton ◽

E Afgan

Keyword(s):

Source Code ◽

Supplementary Information ◽

Scalable Computing ◽

Link Type ◽

Cloud Providers ◽

Galaxy Server ◽

Cloud Resources

Abstract Summary The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability and implementation GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact Enis Afgan ([email protected]) Supplementary information None
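
Rule-based routing of this kind can be pictured as an ordered list of predicates with a fallback destination. Everything in this sketch (the Job fields, thresholds, tool IDs and destination names) is hypothetical and is not GalaxyCloudRunner's configuration format or API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description; real Galaxy jobs carry far more state."""
    tool_id: str
    cores: int
    memory_gb: float


# Hypothetical tool IDs treated as compute-heavy.
HEAVY_TOOLS = {"bwa_mem", "spades"}

# Ordered (predicate, destination) rules; the first matching rule wins.
RULES = [
    (lambda job: job.memory_gb > 64, "cloud-highmem"),
    (lambda job: job.cores > 8 or job.tool_id in HEAVY_TOOLS, "cloud-compute"),
]
DEFAULT_DESTINATION = "local"


def route(job, rules=RULES, default=DEFAULT_DESTINATION):
    """Return the destination of the first rule that accepts the job."""
    for predicate, destination in rules:
        if predicate(job):
            return destination
    return default
```

Ordering matters in this design: a 16-core job needing 128 GB goes to "cloud-highmem" because the memory rule is checked first, while small jobs that match no rule stay on the local server.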
