Prediction of survival risks with adjusted gene expression through risk-gene networks

2019 ◽  
Vol 35 (23) ◽  
pp. 4898-4906
Author(s):  
Minhyeok Lee ◽  
Sung Won Han ◽  
Junhee Seok

Abstract Motivation Network-based analysis of biomedical data has been extensively studied over the last decades. As a successful application, gene networks have been used to illustrate interactions among genes and explain the associated phenotypes. However, gene network approaches have not been actively applied to survival analysis, which is one of the main interests of biomedical research. In addition, the few previous studies using gene networks for survival analysis construct networks mainly from prior knowledge, such as pathways, regulations and gene sets, so their performance depends considerably on the selection of prior knowledge. Results In this paper, we propose a data-driven construction method for survival risk-gene networks as well as a survival risk prediction method using the network structure. The proposed method constructs risk-gene networks with survival-associated genes using penalized regression. Then, gene expression indices are hierarchically adjusted through the networks to reduce the variance intrinsic in datasets. By illustrating risk-gene structure, the proposed method is expected to provide an intuition for the relationship between genes and survival risks. The risk-gene network is applied to a low-grade glioma dataset, and produces a hypothesis of the relationship between genetic biomarkers of low- and high-grade glioma. Moreover, with multiple datasets, we demonstrate that the proposed method shows superior prediction performance compared to other conventional methods. Availability and implementation The R package of risk-gene networks is freely available on the web at http://cdal.korea.ac.kr/NetDA/. Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Vol 36 (12) ◽  
pp. 3916-3917 ◽  
Author(s):  
Daniele Mercatelli ◽  
Gonzalo Lopez-Garcia ◽  
Federico M Giorgi

Abstract Motivation Gene network inference and master regulator analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses but most require a computer cluster or large amounts of RAM to be executed. Results We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for copy-number variations and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets. Availability and implementation Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and Github (development release) https://github.com/federicogiorgi/corto. Supplementary information Supplementary data are available at Bioinformatics online.


2012 ◽  
Vol 07 (01n02) ◽  
pp. 41-70 ◽  
Author(s):  
JASON SHULMAN ◽  
LARS SEEMANN ◽  
GREGG W. ROMAN ◽  
GEMUNU H. GUNARATNE

Networks are used to abstract large, highly coupled sets of objects. Their analyses have included network classification into a few broad classes and selection of small substructures that perform simple yet common tasks. One issue that has received little attention is how the state of a network can be moved according to a pre-specified set of guidelines. In this paper, we address this question in the context of gene networks. In general, neither the full membership of the gene network associated with a biological process nor the precise form of interactions between nodes is known. What is available, through microarrays or sequencing, are gene expression profiles of an organism or its viable mutants. Our approach relies only on these expression profiles, and not on the availability of an accurate model for the network. The first step is to select a small set of core or master nodes, such as transcription factors or microRNAs, that can be used to alter the levels of many of the remaining genes in the network. We ask how the state (or solution) of the gene network changes as the levels of these master nodes are altered externally. The object of our study is not the network but the surface of these solutions. We argue that it can be approximated using gene expression profiles of the organism and single manipulation of master node activity. This is done through an "effective model." The effective model as well as error estimates for its predictions can be derived from experimental data. The method is validated using synthetic gene networks that have stationary solutions and those that are periodically driven, e.g., circadian networks. An effective model for the oxygen-deprivation network of E. coli is constructed using previously published gene expression profiles, and used to predict the expression levels in a double knockout mutant. Less than 30% of the predictions lie outside the 5% confidence level.
We propose the use of the effective model methodology to compute how Drosophila melanogaster in the normal state can be genetically altered into a pre-defined sleep-deprived-like state.


Author(s):  
Daniele Mercatelli ◽  
Gonzalo Lopez-Garcia ◽  
Federico M. Giorgi

Abstract Motivation Gene Network Inference and Master Regulator Analysis (MRA) have been widely adopted to define specific transcriptional perturbations from gene expression signatures. Several tools exist to perform such analyses, but most require a computer cluster or large amounts of RAM to be executed. Results We developed corto, a fast and lightweight R package to infer gene networks and perform MRA from gene expression data, with optional corrections for Copy Number Variations (CNVs) and able to run on signatures generated from RNA-Seq or ATAC-Seq data. We extensively benchmarked it to infer context-specific gene networks in 39 human tumor and 27 normal tissue datasets. Availability Cross-platform and multi-threaded R package on CRAN (stable version) https://cran.r-project.org/package=corto and Github (development release) https://github.com/federicogiorgi/corto.


2019 ◽  
Author(s):  
Yi Yang ◽  
Xingjie Shi ◽  
Yuling Jiao ◽  
Jian Huang ◽  
Min Chen ◽  
...  

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) [42] was proposed to jointly interrogate the genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data. Availability and implementation The implementation of CoMM-S2 is included in the CoMM package that can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.


F1000Research ◽  
2022 ◽  
Vol 9 ◽  
pp. 1159
Author(s):  
Qian (Vicky) Wu ◽  
Wei Sun ◽  
Li Hsu

Gene expression data have been used to infer gene-gene networks (GGN) where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with L1 (lasso), L2 (ridge), or elastic net penalty, which spans the range of L1 to L2 penalty. However, for high-dimensional gene expression data, a penalty that spans the range of L0 and L1 penalty, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available at GitHub, https://github.com/wuqian77/SpaceLog
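The penalty families named in the abstract above can be compared on a single coefficient. The following is only an illustrative sketch, not the space-log implementation: the function names, the parameters `lam`, `alpha` and `tau`, and the shifted form of the log penalty (chosen here so that it equals zero at zero) are all assumptions made for this example.

```python
import math


def l1_penalty(beta, lam):
    """Lasso penalty: grows linearly with |beta|."""
    return lam * abs(beta)


def l2_penalty(beta, lam):
    """Ridge penalty: grows quadratically with beta."""
    return lam * beta ** 2


def elastic_net_penalty(beta, lam, alpha):
    """Mix of lasso and ridge; alpha=1 is pure lasso, alpha=0 is pure ridge."""
    return lam * (alpha * abs(beta) + (1.0 - alpha) * beta ** 2)


def log_penalty(beta, lam, tau):
    """Assumed form of a log penalty: lam * log(1 + |beta| / tau).

    This is lam * log(|beta| + tau) shifted by a constant so it is 0 at beta=0.
    tau > 0 is a hypothetical tuning parameter for this sketch.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return lam * math.log1p(abs(beta) / tau)


if __name__ == "__main__":
    # Print each penalty for a few coefficient sizes.
    for beta in (0.0, 0.5, 1.0, 2.0):
        print(
            beta,
            l1_penalty(beta, 1.0),
            l2_penalty(beta, 1.0),
            elastic_net_penalty(beta, 1.0, 0.5),
            log_penalty(beta, 1.0, 0.1),
        )
```

Under this assumed form the log penalty is concave in |beta|: each additional unit of coefficient size costs less than the previous one, so large coefficients are penalized less steeply than under the lasso, while small coefficients are still pushed toward zero.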


2018 ◽  
Vol 35 (13) ◽  
pp. 2226-2234 ◽  
Author(s):  
Ameen Eetemadi ◽  
Ilias Tagkopoulos

Abstract Motivation Gene expression prediction is one of the grand challenges in computational biology. The availability of transcriptomics data combined with recent advances in artificial neural networks provides an unprecedented opportunity to create predictive models of gene expression with far-reaching applications. Results We present the Genetic Neural Network (GNN), an artificial neural network for predicting genome-wide gene expression given gene knockouts and master regulator perturbations. At its core, the GNN maps existing gene regulatory information in its architecture and it uses cell nodes that have been specifically designed to capture the dependencies and non-linear dynamics that exist in gene networks. These two key features make the GNN architecture capable of capturing complex relationships without the need for large training datasets. As a result, GNNs were 40% more accurate on average than competing architectures (MLP, RNN, BiRNN) when compared on hundreds of curated and inferred transcription modules. Our results argue that GNNs can become the architecture of choice when building predictors of gene expression from the exponentially growing corpus of genome-wide transcriptomics data. Availability and implementation https://github.com/IBPA/GNN Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (24) ◽  
pp. 5137-5145 ◽  
Author(s):  
Onur Dereli ◽  
Ceyda Oğuz ◽  
Mehmet Gönen

Abstract Motivation Survival analysis methods that integrate pathways/gene sets into their learning model could identify molecular mechanisms that determine survival characteristics of patients. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that conjointly performs these two steps using multiple kernel learning. Results We extensively tested our Path2Surv algorithm on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained comparable predictive performance against survival support vector machine (SVM) using significantly fewer gene expression features (i.e. less than 10% of what survival RF and survival SVM used). Availability and implementation Our implementations of survival SVM and Path2Surv algorithms in R are available at https://github.com/mehmetgonen/path2surv together with the scripts that replicate the reported experiments. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Jinhui Zhang ◽  
Ting Wang ◽  
Xinghao Yu ◽  
Shuiping Huang ◽  
Huashuo Zhao ◽  
...  

Abstract Background: Multiple genes were previously identified to be associated with cervical cancer; however, the genetic architecture of cervical cancer remains unknown and many causal genes have yet to be discovered. Methods: To explore causal genes related to cervical cancer, a two-stage causal inference approach was proposed within the framework of Mendelian randomization, where the gene expression was treated as exposure, with methylations located within that gene serving as instrumental variables. Five prediction models were first utilized to characterize the relationship between the expression and methylations for each gene; then the methylation-regulated gene expression (MReX) was obtained and the association was evaluated via a Cox mixed-effects model based on MReX. We further implemented the harmonic mean p-value (HMP) combination to take advantage of the respective strengths of these prediction models while accounting for dependency among the p-values. Results: A total of 14 causal genes were discovered to be associated with the survival risk of cervical cancer in TCGA when the five prediction models were separately employed. The total number of causal genes was brought to 23 when conducting HMP. Some of the newly discovered genes may be novel (e.g. YJEFN3, SPATA5L1, IMMP1L, C5orf55, PPIP5K2, ZNF330, CRYZL1, PPM1A, ESCO2, ZNF605, ZNF225, ZNF266, FICD and OSTC). Functional analyses showed these genes were enriched in tumor-associated pathways. Additionally, four genes (i.e. COL6A1, SYDE1, ESCO2 and GIPC1) were differentially expressed. Conclusion: Overall, our study discovered promising candidate genes that are causally associated with the survival risk of cervical cancer and thus provided new insights into the genetic etiology of cervical cancer.
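The harmonic mean p-value combination mentioned above can be sketched in a few lines. This is a minimal illustration of the raw statistic only, not the authors' pipeline: the function name, the equal default weights, and the five example p-values are hypothetical, and any calibration step applied to the combined value in the study is not reproduced here.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values: sum(w) / sum(w / p).

    With no weights given, every p-value gets the same weight.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


if __name__ == "__main__":
    # Hypothetical p-values for one gene from five prediction models.
    gene_p_values = [0.02, 0.15, 0.008, 0.30, 0.05]
    print(harmonic_mean_p(gene_p_values))
```

Because the harmonic mean is dominated by its smallest terms, a gene that is strongly supported by one prediction model can still yield a small combined value even when the other models give weak evidence.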


Blood ◽  
2005 ◽  
Vol 106 (11) ◽  
pp. 2381-2381
Author(s):  
Tomoiku Takaku ◽  
Junko H. Ohyashiki ◽  
Yu Zhang ◽  
Kazuma Ohyashiki

Abstract The immune response to viral infection involves a complex network of dynamic gene and protein interactions. Comprehensive gene expression analysis of the host immune response against viruses has been extensively studied; however, the mechanism of the virus-induced immune response is not completely understood. This might be due in part to the difficulty of finding pathologically relevant genes, despite the fact that DNA microarray technology can simultaneously monitor the expression of thousands of genes. Likewise, it is hard to estimate how each gene interferes during the viral infection. Thus, construction of gene networks from microarray gene expression data is becoming an important challenge in the post-genome era. Human herpesvirus 6 (HHV-6) is a β-herpesvirus that is closely related to human cytomegalovirus. HHV-6 shows a predominant tropism for CD4 T lymphocytes, on which it exerts marked cytopathic effects. Understanding of the clinical spectrum of HHV-6 is still evolving; however, in vitro interactions between HHV-6 and other viruses, such as the human immunodeficiency virus (HIV), and their relevance to the in vivo situation have become increasingly apparent. We present here the dynamic gene network of the host immune response during HHV-6 infection in an adult T cell leukemia (ATL) cell line. Using a pathway-focused oligonucleotide DNA microarray, we found a possible association between chemokine genes regulating Th1/Th2 balance and genes regulating T-cell proliferation during HHV-6B infection. Gene network analysis using an integrated comprehensive workbench, VoyaGene®, revealed that a gene encoding a TEC-family kinase, ITK, might be a putative modulator in the host immune response against HHV-6B infection. We conclude that a Th2-dominated inflammatory reaction in host cells may play an important role in HHV-6B-infected T cells, thereby suggesting the possibility that ITK might be a therapeutic target in diseases related to dysregulation of Th1/Th2 balance. This study describes a novel approach to finding genes related to the complex host-virus interaction using microarray data and the Bayesian statistical framework.

