Multi-layered network-based pathway activity inference using directed random walks: application to predicting clinical outcomes in urologic cancer

Bioinformatics ◽

10.1093/bioinformatics/btab086 ◽

2021 ◽

Author(s):

So Yeon Kim ◽

Eun Kyung Choe ◽

Manu Shivakumar ◽

Dokyoon Kim ◽

Kyung-Ah Sohn

Keyword(s):

Supplementary Information ◽

Integrative Approach ◽

R Software ◽

Pathway Activity ◽

Source Codes ◽

Prognostic Features ◽

Molecular Features ◽

Urologic Cancer ◽

Inference Methods ◽

Urologic Cancers

Abstract Motivation To better understand the molecular features of cancers, a comprehensive analysis using multi-omics data has been conducted. In addition, a pathway activity inference method has been developed to facilitate the integrative effects of multiple genes. In this respect, we have recently proposed a novel integrative pathway activity inference approach, iDRW, and demonstrated the effectiveness of the method with respect to dichotomizing two survival groups. However, there were several limitations, such as a lack of generality. In this study, we designed a directed gene–gene graph using pathway information by assigning interactions between genes in multiple layers of networks. Results As a proof-of-concept study, the approach was evaluated using three genomic profiles of urologic cancer patients. The proposed integrative approach achieved improved outcome prediction performances compared with a single genomic profile alone and other existing pathway activity inference methods. The integrative approach also identified common/cancer-specific candidate driver pathways as predictive prognostic features in urologic cancers. Furthermore, it provides better biological insights into the prioritized pathways and genes in an integrated view using a multi-layered gene–gene network. Our framework is not specifically designed for urologic cancers and is generally applicable to various datasets. Availability and implementation iDRW is implemented as an R software package. The source codes are available at https://github.com/sykim122/iDRW. Supplementary information Supplementary data are available at Bioinformatics online.
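The abstract above names directed random walks on a gene–gene graph but does not spell out the update rule. As a rough illustration only, the following Python sketch runs a generic random walk with restart on a directed graph. The function name, the restart probability of 0.7, the convergence tolerance and the toy graph are all assumptions made for this example; they are not taken from the iDRW package, whose actual implementation is in R. A multi-layered graph could be fed in the same way if each node stands for a gene in one genomic profile, which is likewise an assumption here.

```python
import numpy as np

def directed_random_walk(adj, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a directed graph (illustrative).

    adj[i, j] = 1 means there is an edge from node i to node j.
    w0 holds non-negative initial node weights (not all zero).
    Returns the converged node weight vector.
    """
    adj = np.asarray(adj, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    w0 = w0 / w0.sum()  # normalise initial weights to sum to one
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-normalise the adjacency matrix; nodes without outgoing edges
    # keep an all-zero row.
    trans = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)
    w = w0.copy()
    for _ in range(max_iter):
        # Follow edges with probability (1 - restart), jump back to the
        # initial distribution with probability restart.
        w_next = (1.0 - restart) * (trans.T @ w) + restart * w0
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next
    return w

# Toy chain 0 -> 1 -> 2: weight accumulates on downstream nodes.
toy_adj = [[0, 1, 0],
           [0, 0, 1],
           [0, 0, 0]]
toy_weights = directed_random_walk(toy_adj, [1.0, 1.0, 1.0])
```

In a pathway setting, the converged weights could serve as per-gene importance scores when aggregating gene-level values into a pathway activity score; how iDRW performs that aggregation is not stated in the abstract.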

Multi-layered network-based pathway activity inference using directed random walks: application to predicting clinical outcomes in urologic cancer

10.1101/2020.07.22.163949 ◽

2020 ◽

Author(s):

So Yeon Kim ◽

Eun Kyung Choe ◽

Manu Shivakumar ◽

Dokyoon Kim ◽

Kyung-Ah Sohn

Keyword(s):

Integrative Approach ◽

Proof Of Concept ◽

R Software ◽

Pathway Activity ◽

Source Codes ◽

Prognostic Features ◽

Molecular Features ◽

Urologic Cancer ◽

Inference Methods ◽

Urologic Cancers

Abstract Motivation To better understand the molecular features of cancers, a comprehensive analysis using multi-omics data has been conducted. Additionally, a pathway activity inference method has been developed to facilitate the integrative effects of multiple genes. In this respect, we have recently proposed a novel integrative pathway activity inference approach, iDRW, and demonstrated the effectiveness of the method with respect to dichotomizing two survival groups. However, there were several limitations, such as a lack of generality. In this study, we designed a directed gene-gene graph using pathway information by assigning interactions between genes in multiple layers of networks. Results As a proof-of-concept study, it was evaluated using three genomic profiles of urologic cancer patients. The proposed integrative approach achieved improved outcome prediction performances compared with a single genomic profile alone and other existing pathway activity inference methods. The integrative approach also identified common/cancer-specific candidate driver pathways as predictive prognostic features in urologic cancers. Furthermore, it provides better biological insights into the prioritized pathways and genes in an integrated view using a multi-layered gene-gene network. Our framework is not specifically designed for urologic cancers and can be generally applicable for various datasets. Availability iDRW is implemented as the R software package. The source codes are available at https://github.com/sykim122/iDRW.

DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays

Bioinformatics ◽

10.1093/bioinformatics/bty1054 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3055-3062 ◽

Cited By ~ 72

Author(s):

Amrit Singh ◽

Casey P Shannon ◽

Benoît Gautier ◽

Florian Rohart ◽

Michaël Vacher ◽

...

Keyword(s):

Data Integration ◽

Biomarker Discovery ◽

Predictive Performance ◽

Supplementary Information ◽

Integrative Approach ◽

Data Types ◽

Common Information ◽

Molecular Features ◽

Study Designs ◽

Integration Analysis

Abstract Motivation In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. Results Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. Availability and implementation DIABLO is implemented in the mixOmics R Bioconductor package with functions for parameters’ choice and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. Supplementary information Supplementary data are available at Bioinformatics online.

Structure-aware protein–protein interaction site prediction using deep graph convolutional network

Bioinformatics ◽

10.1093/bioinformatics/btab643 ◽

2021 ◽

Author(s):

Qianmu Yuan ◽

Jianwen Chen ◽

Huiying Zhao ◽

Yaoqi Zhou ◽

Yuedong Yang

Keyword(s):

Protein Interactions ◽

Spatial Information ◽

Screening Tools ◽

Supplementary Information ◽

Protein Protein Interactions ◽

Convolutional Network ◽

Source Codes ◽

Site Prediction ◽

Protein Protein Interaction ◽

Mapping Techniques

Abstract Motivation Protein–protein interactions (PPI) play crucial roles in many biological processes, and identifying PPI sites is an important step for mechanistic understanding of diseases and design of novel drugs. Since experimental approaches for PPI site identification are expensive and time-consuming, many computational methods have been developed as screening tools. However, these methods are mostly based on features of neighboring residues in sequence, and are thus limited in capturing spatial information. Results We propose a deep graph-based framework, deep Graph convolutional network for Protein–Protein-Interacting Site prediction (GraphPPIS), in which PPI site prediction is converted into a graph node classification task and solved by deep learning using the initial residual and identity mapping techniques. We showed that a deeper architecture (up to eight layers) allows significant performance improvement over other sequence-based and structure-based methods by more than 12.5% and 10.5% on AUPRC and MCC, respectively. Further analyses indicated that the predicted interacting sites by GraphPPIS are more spatially clustered and closer to the native ones even when false-positive predictions are made. The results highlight the importance of capturing spatially neighboring residues for interacting site prediction. Availability and implementation The datasets, the pre-computed features, and the source codes along with the pre-trained models of GraphPPIS are available at https://github.com/biomed-AI/GraphPPIS. The GraphPPIS web server is freely available at https://biomed.nscc-gz.cn/apps/GraphPPIS. Supplementary information Supplementary data are available at Bioinformatics online.
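The "initial residual and identity mapping techniques" mentioned in the abstract above can be illustrated with a small NumPy sketch of one such graph convolution layer, following the general formulation in which each layer mixes the propagated features with the first-layer features and shrinks the weight matrix toward the identity. This is not the GraphPPIS code: the function names, the values of `alpha` and `lam`, the schedule for `beta`, and the toy graph are assumptions for illustration, and the sketch omits the input and output linear layers a real model would have.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalise A + I (self-loops added)."""
    a = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcnii_layer(h, h0, p, weight, alpha, beta):
    """One layer with initial residual and identity mapping, then ReLU."""
    # Initial residual: blend propagated features with the first-layer input.
    support = (1.0 - alpha) * (p @ h) + alpha * h0
    # Identity mapping: interpolate between the identity and the weights.
    mapping = (1.0 - beta) * np.eye(weight.shape[0]) + beta * weight
    return np.maximum(support @ mapping, 0.0)

def forward(features, adj, weights, alpha=0.1, lam=0.5):
    """Stack layers; beta decays with depth (assumed schedule)."""
    p = normalized_adjacency(adj)
    h0 = h = np.asarray(features, dtype=float)
    for layer, weight in enumerate(weights, start=1):
        beta = np.log(lam / layer + 1.0)
        h = gcnii_layer(h, h0, p, weight, alpha, beta)
    return h

# Toy example: 3 residues in a chain, 4 features each, eight layers.
rng = np.random.default_rng(0)
toy_adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
toy_x = rng.random((3, 4))
toy_weights = [rng.normal(size=(4, 4)) for _ in range(8)]
toy_out = forward(toy_x, toy_adj, toy_weights)
```

Both terms are meant to counter over-smoothing in deep stacks: the residual to the initial features keeps node identity from washing out, and the near-identity mapping limits how much each layer can distort the representation.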

SimiC: A Single Cell Gene Regulatory Network Inference method with Similarity Constraints

10.1101/2020.04.03.023002 ◽

2020 ◽

Author(s):

Jianhao Peng ◽

Ullas V. Chembazhi ◽

Sushant Bangru ◽

Ian M. Traniello ◽

Auinash Kalsotra ◽

...

Keyword(s):

Single Cell ◽

Network Inference ◽

Regional Analysis ◽

Supplementary Information ◽

Inference Method ◽

Gene Regulatory Network Inference ◽

Inference Problem ◽

Cell State ◽

Gene Regulatory ◽

Inference Methods

Abstract Motivation With the use of single-cell RNA sequencing (scRNA-Seq) technologies, it is now possible to acquire gene expression data for each individual cell in samples containing up to millions of cells. These cells can be further grouped into different states along an inferred cell differentiation path, which are potentially characterized by similar, but distinct enough, gene regulatory networks (GRNs). Hence, it would be desirable for scRNA-Seq GRN inference methods to capture the GRN dynamics across cell states. However, current GRN inference methods produce a unique GRN per input dataset (or independent GRNs per cell state), failing to capture these regulatory dynamics. Results We propose a novel single-cell GRN inference method, named SimiC, that jointly infers the GRNs corresponding to each state. SimiC models the GRN inference problem as a LASSO optimization problem with an added similarity constraint, on the GRNs associated to contiguous cell states, that captures the inter-cell-state homogeneity. We show on a mouse hepatocyte single-cell data generated after partial hepatectomy that, contrary to previous GRN methods for scRNA-Seq data, SimiC is able to capture the transcription factor (TF) dynamics across liver regeneration, as well as the cell-level behavior for the regulatory program of each TF across cell states. In addition, on a honey bee scRNA-Seq experiment, SimiC is able to capture the increased heterogeneity of cells on whole-brain tissue with respect to a regional analysis tissue, and the TFs associated specifically to each sequenced tissue. Availability SimiC is written in Python and includes an R API. It can be downloaded from https://github.com/jianhao2016/[email protected], [email protected] Supplementary information Supplementary data are available at the code repository.
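The abstract above describes the method as a LASSO problem with a similarity constraint between contiguous cell states. One plausible way to write such an objective is sketched below. The symbols are assumptions introduced here and do not come from the abstract: for each of K ordered cell states k, X_k is the TF expression matrix over the n_k cells in that state, Y_k the target-gene expression matrix, and W_k the GRN weight matrix, while λ1 and λ2 are tuning parameters. The exact norms and scaling used by SimiC may differ.

```latex
\min_{W_1,\dots,W_K}\;
\sum_{k=1}^{K} \frac{1}{n_k}\,\lVert Y_k - X_k W_k \rVert_F^2
\;+\; \lambda_1 \sum_{k=1}^{K} \lVert W_k \rVert_1
\;+\; \lambda_2 \sum_{k=1}^{K-1} \lVert W_k - W_{k+1} \rVert_F^2
```

Under this reading, setting λ2 to zero would decouple the states into independent LASSO fits, one GRN per state, while a large λ2 would push all states toward a single shared GRN; these are the two extremes the abstract attributes to existing methods.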

VisFeature: a stand-alone program for visualizing and analyzing statistical features of biological sequences

Bioinformatics ◽

10.1093/bioinformatics/btz689 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jun Wang ◽

Pu-Feng Du ◽

Xin-Yu Xue ◽

Guang-Ping Li ◽

Yuan-Ke Zhou ◽

...

Keyword(s):

Sequence Data ◽

Software Tool ◽

Data Retrieval ◽

Supplementary Information ◽

Statistical Features ◽

Biological Sequence ◽

Sequence Alignments ◽

Multiple Sequence ◽

Source Codes ◽

Multiple Sequence Alignments

Abstract Summary Many efforts have been made in developing bioinformatics algorithms to predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to intuitively analyze and to understand the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, which aims to be a helpful software tool that allows the users to intuitively visualize and analyze statistical features of all types of biological sequence, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignments and statistical feature generation functions. Availability and implementation VisFeature is a desktop application that is implemented using JavaScript/Electron and R. The source codes of VisFeature are freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases). Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

An integrative approach for fine-mapping chromatin interactions

Bioinformatics ◽

10.1093/bioinformatics/btz843 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1704-1711

Author(s):

Artur Jaroszewicz ◽

Jason Ernst

Keyword(s):

Gene Regulation ◽

High Resolution ◽

Biological Significance ◽

Computational Method ◽

Supplementary Information ◽

Integrative Approach ◽

Genome Architecture ◽

Open Chromatin ◽

Chromatin Interactions ◽

Genome Wide

Abstract Motivation Chromatin interactions play an important role in genome architecture and gene regulation. The Hi-C assay generates such interaction maps genome-wide, but at relatively low resolutions (e.g. 5-25 kb), which is substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of such interactions. Results To predict the sources of Hi-C-identified interactions at a high resolution (e.g. 100 bp), we developed a computational method that integrates data from DNase-seq and ChIP-seq of TFs and histone marks. Our method, χ-CNN, uses this data to first train a convolutional neural network (CNN) to discriminate between called Hi-C interactions and non-interactions. χ-CNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show these predictions recover original Hi-C peaks after extending them to be coarser. We also show χ-CNN predictions enrich for evolutionarily conserved bases, eQTLs and CTCF motifs, supporting their biological significance. χ-CNN provides an approach for analyzing important aspects of genome architecture and gene regulation at a higher resolution than previously possible. Availability and implementation χ-CNN software is available on GitHub (https://github.com/ernstlab/X-CNN). Supplementary information Supplementary data are available at Bioinformatics online.

ScaleQC: a scalable lossy to lossless solution for NGS data compression

Bioinformatics ◽

10.1093/bioinformatics/btaa543 ◽

2020 ◽

Vol 36 (17) ◽

pp. 4551-4559 ◽

Cited By ~ 1

Author(s):

Rongshan Yu ◽

Wenxian Yang

Keyword(s):

Lossless Compression ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Source Codes ◽

Compression Performance ◽

Data Rates ◽

Quality Value ◽

Ngs Data ◽

Bit Stream

Abstract Motivation Per-base quality values in Next Generation Sequencing data take a significant portion of storage even after compression. Lossy compression technologies could further reduce the space used by quality values. However, in many applications, lossless compression is still desired. Hence, sequencing data in multiple file formats have to be prepared for different applications. Results We developed a scalable lossy to lossless compression solution for quality values named ScaleQC (Scalable Quality value Compression). ScaleQC provides so-called bit-stream level scalability: the losslessly compressed bit-stream produced by ScaleQC can be further truncated to lower data rates without incurring an expensive transcoding operation. Despite its scalability, ScaleQC still achieves comparable compression performance at both lossless and lossy data rates compared to the existing lossless or lossy compressors. Availability and implementation ScaleQC has been integrated with SAMtools as a special quality value encoding mode for CRAM. Its source codes can be obtained from our integrated SAMtools (https://github.com/xmuyulab/samtools) with dependency on integrated HTSlib (https://github.com/xmuyulab/htslib). Supplementary information Supplementary data are available at Bioinformatics online.

Network-based characterization of disease–disease relationships in terms of drugs and therapeutic targets

Bioinformatics ◽

10.1093/bioinformatics/btaa439 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i516-i524

Author(s):

Midori Iida ◽

Michio Iwata ◽

Yoshihiro Yamanishi

Keyword(s):

Large Scale ◽

Expression Patterns ◽

Therapeutic Targets ◽

Molecular Networks ◽

Supplementary Information ◽

New Associations ◽

Disease States ◽

Molecular Features ◽

Novel Approach

Abstract Motivation Disease states are distinguished from each other in terms of differing clinical phenotypes, but characteristic molecular features are often common to various diseases. Similarities between diseases can be explained by characteristic gene expression patterns. However, most disease–disease relationships remain uncharacterized. Results In this study, we proposed a novel approach for network-based characterization of disease–disease relationships in terms of drugs and therapeutic targets. We performed large-scale analyses of omics data and molecular interaction networks for 79 diseases, including adrenoleukodystrophy, leukaemia, Alzheimer's disease, asthma, atopic dermatitis, breast cancer, cystic fibrosis and inflammatory bowel disease. We quantified disease–disease similarities based on proximities of abnormally expressed genes in various molecular networks, and showed that similarities between diseases could be explained by characteristic molecular network topologies. Furthermore, we developed a kernel matrix regression algorithm to predict the commonalities of drugs and therapeutic targets among diseases. Our comprehensive prediction strategy indicated many new associations among phenotypically diverse diseases. Supplementary information Supplementary data are available at Bioinformatics online.

ADFinder: accurate detection of programmed DNA elimination using NGS high-throughput sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa226 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3632-3636 ◽

Cited By ~ 2

Author(s):

Weibo Zheng ◽

Jing Chen ◽

Thomas G Doak ◽

Weibo Song ◽

Ying Yan

Keyword(s):

High Throughput ◽

Large Scale ◽

High Throughput Sequencing ◽

Supplementary Information ◽

Sequencing Data ◽

Source Codes ◽

High Throughput Sequencing Data ◽

Dna Elimination ◽

Multiple Alternative ◽

Dna Splicing

Abstract Motivation Programmed DNA elimination (PDE) plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software specific for the detection of DNA splicing events is scarce. In this paper, we describe Accurate Deletion Finder (ADFinder), an efficient detector of PDEs using high-throughput sequencing data. ADFinder can predict PDEs with relatively low sequencing coverage, detect multiple alternative splicing forms in the same genomic location and calculate the frequency for each splicing event. This software will facilitate research of PDEs and all down-stream analyses. Results By analyzing genome-wide DNA splicing events in two micronuclear genomes of Oxytricha trifallax and Tetrahymena thermophila, we prove that ADFinder is effective in predicting large scale PDEs. Availability and implementation The source codes and manual of ADFinder are available in our GitHub website: https://github.com/weibozheng/ADFinder. Supplementary information Supplementary data are available at Bioinformatics online.

Beyond IDH-Mutation: Emerging Molecular Diagnostic and Prognostic Features in Adult Diffuse Gliomas

Cancers ◽

10.3390/cancers12071817 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1817 ◽

Cited By ~ 4

Author(s):

Kanish Mirchia ◽

Timothy E. Richardson

Keyword(s):

Central Nervous System ◽

Nervous System ◽

Prognostic Factors ◽

The United States ◽

Molecular Diagnostic ◽

Grading System ◽

Diffuse Glioma ◽

Prognostic Features ◽

Molecular Features ◽

Diffuse Gliomas

Diffuse gliomas are among the most common adult central nervous system tumors with an annual incidence of more than 16,000 cases in the United States. Until very recently, the diagnosis of these tumors was based solely on morphologic features; however, with the publication of the WHO Classification of Tumours of the Central Nervous System, revised 4th edition in 2016, certain molecular features are now included in the official diagnostic and grading system. One of the most significant of these changes has been the division of adult astrocytomas into IDH-wildtype and IDH-mutant categories in addition to histologic grade as part of the main-line diagnosis, although a great deal of heterogeneity in the clinical outcome still remains to be explained within these categories. Since then, numerous groups have been working to identify additional biomarkers and prognostic factors in diffuse gliomas to help further stratify these tumors in hopes of producing a more complete grading system, as well as understanding the underlying biology that results in differing outcomes. The field of neuro-oncology is currently in the midst of a “molecular revolution” in which increasing emphasis is being placed on genetic and epigenetic features driving current diagnostic, prognostic, and predictive considerations. In this review, we focus on recent advances in adult diffuse glioma biomarkers and prognostic factors and summarize the state of the field.
