scholarly journals PAFway: pairwise associations between functional annotations in biological networks and pathways

2020 ◽  
Vol 36 (19) ◽  
pp. 4963-4964
Author(s):  
Mahiar Mahjoub ◽  
Daphne Ezer

Abstract Motivation Large gene networks can be dense and difficult to interpret in a biologically meaningful way. Results Here, we introduce PAFway, which estimates pairwise associations between functional annotations in biological networks and pathways. It answers the biological question: do genes that have a specific function tend to regulate genes that have a different specific function? The results can be visualized as a heatmap or a network of biological functions. We apply this package to reveal associations between functional annotations in an Arabidopsis thaliana gene network. Availability and implementation PAFway is submitted to CRAN. Currently available here: https://github.com/ezer/PAFway. Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Vol 36 (9) ◽  
pp. 2649-2656 ◽  
Author(s):  
Van Dinh Tran ◽  
Alessandro Sperduti ◽  
Rolf Backofen ◽  
Fabrizio Costa

Abstract Motivation The identification of disease–gene associations is a task of fundamental importance in human health research. A typical approach consists in first encoding large gene/protein relational datasets as networks due to the natural and intuitive property of graphs for representing objects’ relationships and then utilizing graph-based techniques to prioritize genes for successive low-throughput validation assays. Since different types of interactions between genes yield distinct gene networks, there is the need to integrate different heterogeneous sources to improve the reliability of prioritization systems. Results We propose an approach based on three phases: first, we merge all sources in a single network, then we partition the integrated network according to edge density introducing a notion of edge type to distinguish the parts and finally, we employ a novel node kernel suitable for graphs with typed edges. We show how the node kernel can generate a large number of discriminative features that can be efficiently processed by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease–gene associations and on a time-stamped benchmark containing 42 newly discovered associations. Availability and implementation Source code: https://github.com/dinhinfotech/DiGI.git. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (10) ◽  
pp. 1797-1798 ◽  
Author(s):  
Han Cao ◽  
Jiayu Zhou ◽  
Emanuel Schwarz

Abstract Motivation Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research. Results We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable for regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data. Availability and implementation The RMTL package is an open source R package and is freely available at https://github.com/transbioZI/RMTL. RMTL will also be available on cran.r-project.org. Supplementary information Supplementary data are available at Bioinformatics online.


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Eric Wang ◽  
Shinpei Kawaoka ◽  
Jae-Seok Roe ◽  
Junwei Shi ◽  
Anja F Hohmann ◽  
...  

Most mammalian transcription factors (TFs) and cofactors occupy thousands of genomic sites and modulate the expression of large gene networks to implement their biological functions. In this study, we describe an exception to this paradigm. TRIM33 is identified here as a lineage dependency in B cell neoplasms and is shown to perform this essential function by associating with a single cis element. ChIP-seq analysis of TRIM33 in murine B cell leukemia revealed a preferential association with two lineage-specific enhancers that harbor an exceptional density of motifs recognized by the PU.1 TF. TRIM33 is recruited to these elements by PU.1, yet acts to antagonize PU.1 function. One of the PU.1/TRIM33 co-occupied enhancers is upstream of the pro-apoptotic gene Bim, and deleting this enhancer renders TRIM33 dispensable for leukemia cell survival. These findings reveal an essential role for TRIM33 in preventing apoptosis in B lymphoblastic leukemia by interfering with enhancer-mediated Bim activation.


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Francisco Gómez-Vela ◽  
Domingo S. Rodriguez-Baena ◽  
José Luis Vázquez-Noguera

In the last few years, gene networks have become one of most important tools to model biological processes. Among other utilities, these networks visually show biological relationships between genes. However, due to the large amount of the currently generated genetic data, their size has grown to the point of being unmanageable. To solve this problem, it is possible to use computational approaches, such as heuristics-based methods, to analyze and optimize gene network’s structure by pruning irrelevant relationships. In this paper we present a new method, called GeSOp, to optimize large gene network structures. The method is able to perform a considerably prune of the irrelevant relationships comprising the input network. To do so, the method is based on a greedy heuristic to obtain the most relevant subnetwork. The performance of our method was tested by means of two experiments on gene networks obtained from different organisms. The first experiment shows how GeSOp is able not only to carry out a significant reduction in the size of the network, but also to maintain the biological information ratio. In the second experiment, the ability to improve the biological indicators of the network is checked. Hence, the results presented show that GeSOp is a reliable method to optimize and improve the structure of large gene networks.


2019 ◽  
Vol 35 (19) ◽  
pp. 3617-3627 ◽  
Author(s):  
Zebulun Arendsee ◽  
Jing Li ◽  
Urminder Singh ◽  
Arun Seetharam ◽  
Karin Dorman ◽  
...  

Abstract Motivation The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene’s phylostratum. Results We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades, but not intermediate ones; proteome quality assessments; false-positive diagnostics, and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparisons of the influence of analysis parameters or genomes on phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae. Availability and implementation Source code available at https://github.com/arendsee/phylostratr. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3916-3917 ◽  
Author(s):  
Daniele Mercatelli ◽  
Gonzalo Lopez-Garcia ◽  
Federico M Giorgi

Abstract Motivation Gene network inference and master regulator analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses but most require a computer cluster or large amounts of RAM to be executed. Results We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for copy-number variations and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets. Availability and implementation Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and Github (development release) https://github.com/federicogiorgi/corto. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (24) ◽  
pp. 5226-5234 ◽  
Author(s):  
Sam F L Windels ◽  
Noël Malod-Dognin ◽  
Nataša Pržulj

Abstract Motivation Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be ‘adjacent’ if they simultaneously touch a given graphlet. Results We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying Graphlet Laplacian-based spectral embedding, we visually demonstrate that Graphlet Laplacians capture biological functions. This result is quantified by applying Graphlet Laplacian-based spectral clustering, which uncovers clusters enriched in biological functions dependent on the underlying graphlet. We explain the complementarity of biological functions captured by different Graphlet Laplacians by showing that they capture different local topologies. Finally, diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer-related genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks. Availability and implementation http://www0.cs.ucl.ac.uk/staff/natasa/graphlet-laplacian/index.html Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Yuelei Zhang ◽  
Xiao Chang ◽  
Xiaoping Liu

Abstract Motivation Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on the verification of the specific dataset. However, it is difficult to establish directed topological networks that are both suitable for time-series and non-time-series datasets due to the complexity and diversity of biological networks. Results Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression) to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonous pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. Availabilityand implementation The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (23) ◽  
pp. 4898-4906
Author(s):  
Minhyeok Lee ◽  
Sung Won Han ◽  
Junhee Seok

Abstract Motivation Network-based analysis of biomedical data has been extensively studied over the last decades. As a successful application, gene networks have been used to illustrate interactions among genes and explain the associated phenotypes. However, the gene network approaches have not been actively applied for survival analysis, which is one of the main interests of biomedical research. In addition, a few previous studies using gene networks for survival analysis construct networks mainly from prior knowledge, such as pathways, regulations and gene sets, while the performance considerably depends on the selection of prior knowledge. Results In this paper, we propose a data-driven construction method for survival risk-gene networks as well as a survival risk prediction method using the network structure. The proposed method constructs risk-gene networks with survival-associated genes using penalized regression. Then, gene expression indices are hierarchically adjusted through the networks to reduce the variance intrinsic in datasets. By illustrating risk-gene structure, the proposed method is expected to provide an intuition for the relationship between genes and survival risks. The risk-gene network is applied to a low grade glioma dataset, and produces a hypothesis of the relationship between genetic biomarkers of low and high grade glioma. Moreover, with multiple datasets, we demonstrate that the proposed method shows superior prediction performance compared to other conventional methods. Availability and implementation The R package of risk-gene networks is freely available in the web at http://cdal.korea.ac.kr/NetDA/. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (13) ◽  
pp. 4070-4079 ◽  
Author(s):  
Evangelos Karatzas ◽  
Margarita Zachariou ◽  
Marilena M Bourdakou ◽  
George Minadakis ◽  
Anastasis Oulas ◽  
...  

Abstract Motivation Understanding the underlying biological mechanisms and respective interactions of a disease remains an elusive, time consuming and costly task. Computational methodologies that propose pathway/mechanism communities and reveal respective relationships can be of great value as they can help expedite the process of identifying how perturbations in a single pathway can affect other pathways. Results We present a random-walks-based methodology called PathWalks, where a walker crosses a pathway-to-pathway network under the guidance of a disease-related map. The latter is a gene network that we construct by integrating multi-source information regarding a specific disease. The most frequent trajectories highlight communities of pathways that are expected to be strongly related to the disease under study. We apply the PathWalks methodology on Alzheimer's disease and idiopathic pulmonary fibrosis and establish that it can highlight pathways that are also identified by other pathway analysis tools as well as are backed through bibliographic references. More importantly, PathWalks produces additional new pathways that are functionally connected with those already established, giving insight for further experimentation. Availability and implementation https://github.com/vagkaratzas/PathWalks. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document