Gene prioritization based on random walks with restarts and absorbing states, to define gene sets regulating drug pharmacodynamics from single-cell analyses

Abstract Motivation: Prioritizing genes for their role in drug sensitivity is an important step in understanding drugs' mechanisms of action and discovering new molecular targets for co-treatment. To formalize this problem, we consider two sets of genes X and P, respectively composing the predictive gene signature of sensitivity to a drug and the genes involved in its mechanism of action, as well as a protein interaction network (PPIN) containing the products of X and P as nodes. We introduce GENetRank, a method to prioritize the genes in X for their likelihood to regulate the genes in P. Results: GENetRank uses asymmetric random walks with restarts and absorbing states to focus on certain nodes of the PPIN, as well as novel saturation indices providing insights into the visited regions of the PPIN. Using MINT as the underlying network, we apply GENetRank to a predictive gene signature of cancer cell sensitivity to tumor-necrosis-factor-related apoptosis-inducing ligand (TRAIL), obtained in single cells. Our ranking provides biological insights into drug sensitivity and a gene set considerably enriched in genes regulating TRAIL pharmacodynamics when compared to the most significant differentially expressed genes obtained from a statistical analysis framework alone. We also introduce gene expression radars, a visualization tool to assess all pairwise interactions at a glance. Availability and Implementation: GENetRank is made available in the Structural Bioinformatics Library (https://sbl.inria.fr/doc/Genetrank-user-manual.html). It should prove useful for mining gene sets in conjunction with a signaling pathway, whenever other approaches yield relatively large sets of genes.
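The general idea of a random walk with restarts and absorbing states can be illustrated with a small sketch. This is not GENetRank's actual algorithm or API: the function name `rwr_with_absorbing`, the uniform transition probabilities, the restart value and the toy graph are all assumptions made for illustration. The walker starts on seed nodes, jumps back to a seed with probability `restart`, and stops for good when it steps onto an absorbing node.

```python
def rwr_with_absorbing(adj, seeds, absorbing, restart=0.3, n_steps=1000):
    """Propagate probability mass of a walker that restarts and can be absorbed.

    adj: dict node -> list of neighbours (hypothetical toy network).
    Returns (visits, absorbed): expected visits per transient node and the
    probability of ending in each absorbing node.
    """
    if set(seeds) & set(absorbing):
        raise ValueError("seed nodes must not be absorbing")
    seed_p = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(seed_p)                      # mass currently on transient nodes
    visits = {v: 0.0 for v in adj}
    absorbed = {v: 0.0 for v in absorbing}
    for _ in range(n_steps):
        nxt = {v: 0.0 for v in adj}
        for u, mass in p.items():
            if mass == 0.0:
                continue
            visits[u] += mass
            for s in seeds:               # restart: jump back to a seed
                nxt[s] += restart * mass * seed_p[s]
            share = (1.0 - restart) * mass / len(adj[u])
            for v in adj[u]:              # otherwise move to a uniform neighbour
                if v in absorbing:
                    absorbed[v] += share  # mass leaves the walk permanently
                else:
                    nxt[v] += share
        p = nxt
    return visits, absorbed


# Toy network: A - B - C - D, with E attached to B; D is absorbing.
adj = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
visits, absorbed = rwr_with_absorbing(adj, seeds={"A"}, absorbing={"D"})
ranking = sorted((v for v in adj if v != "D"), key=lambda v: -visits[v])
```

Ranking transient nodes by expected visits is only one possible scoring choice; the saturation indices mentioned in the abstract are not reproduced here.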

Identification and validation of a hub gene prognostic index for hepatocellular carcinoma

Future Oncology ◽

10.2217/fon-2020-1112 ◽

2021 ◽

Author(s):

Q Shi ◽

Z Meng ◽

XX Tian ◽

YF Wang ◽

WH Wang

Keyword(s):

Hepatocellular Carcinoma ◽

Cox Regression ◽

Prognostic Index ◽

Interaction Network ◽

Gene Signature ◽

Gene Expression Omnibus ◽

Cox Regression Analysis ◽

Protein Protein Interaction ◽

Normal Tissues ◽

Significant Independent Prognostic Factor

Aims: We aim to provide new insights into the mechanisms of hepatocellular carcinoma (HCC) and identify key genes as biomarkers for the prognosis of HCC. Materials & methods: Differentially expressed genes between HCC tissues and normal tissues were identified via the Gene Expression Omnibus tool. The top ten hub genes screened by the degree of the protein nodes in the protein–protein interaction network also showed significant associations with overall survival in HCC patients. Results: A prognostic model containing a five-gene signature was constructed to predict the prognosis of HCC via multivariate Cox regression analysis. Conclusion: This study identified a novel five-gene signature (CDK1, CCNB1, CCNB2, BUB1 and KIF11) as a significant independent prognostic factor.
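Screening hub genes by node degree, as described in the methods above, amounts to counting interaction partners and keeping the top-ranked nodes. The sketch below is a minimal illustration under stated assumptions: the edge list and the labels `G1` to `G5` are hypothetical placeholders, not data from the study, and `top_hubs` is an illustrative helper rather than any tool's API.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k nodes with the highest degree in an undirected network.

    Duplicate edges and self-loops are ignored; ties are broken by name.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy interaction list (placeholder gene labels).
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
             ("G2", "G3"), ("G3", "G2"), ("G5", "G5")]
hubs = top_hubs(toy_edges, k=2)
```

A fixed degree cutoff could be used instead of a top-k rule; which one is appropriate depends on the network.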

Who is the boss? Identifying key roles in telecom fraud network via centrality-guided deep random walk

Data Technologies and Applications ◽

10.1108/dta-05-2020-0103 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Yi-Chun Chang ◽

Kuan-Ting Lai ◽

Seng-Cho T. Chou ◽

Wei-Chuan Chiang ◽

Yuan-Chen Lin

Keyword(s):

Social Network ◽

Random Walk ◽

Social Network Analysis ◽

Network Analysis ◽

Random Walks ◽

Interpersonal Relationships ◽

Interaction Network ◽

Data Set ◽

Content Type ◽

Structural Equivalence

Purpose: Telecommunication (telecom) fraud is one of the most common crimes and causes the greatest financial losses. To effectively eradicate fraud groups, the key fraudsters must be identified and captured. One strategy is to analyze the fraud interaction network using social network analysis. However, the underlying structures of fraud networks are different from those of common social networks, which makes traditional indicators such as centrality not directly applicable. Recently, a new line of research called deep random walk has emerged. These methods utilize random walks to explore local information and then apply deep learning algorithms to learn the representative feature vectors. Although effective for many types of networks, random walk is used for discovering local structural equivalence and does not consider the global properties of nodes. Design/methodology/approach: The authors proposed a new method to combine the merits of deep random walk and social network analysis, which is called centrality-guided deep random walk. By using the centrality of nodes as edge weights, the authors’ biased random walks implicitly consider the global importance of nodes and can thus find key fraudster roles more accurately. To evaluate the authors’ algorithm, a real telecom fraud data set with around 562 fraudsters was built, which is the largest telecom fraud network to date. Findings: The authors’ proposed method achieved better results than traditional centrality indices and various deep random walk algorithms and successfully identified key roles in a fraud network. Research limitations/implications: The study used co-offending and flight records to construct a criminal network; more interpersonal relationships of fraudsters, such as friendship and kinship ties, can be included in the future. Originality/value: This paper proposed a novel algorithm, centrality-guided deep random walk, and applied it to a new telecom fraud data set. Experimental results show that the authors’ method can successfully identify the key roles in a fraud group and outperform other baseline methods. To the best of the authors’ knowledge, this is the largest analysis of a telecom fraud network to date.
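The idea of using node centrality as edge weights for a biased random walk can be sketched as follows. This is an illustration only, not the authors' implementation: degree centrality is assumed as the centrality measure, the graph is a made-up toy example, and the function names are hypothetical. The deep learning step that would consume the walks is omitted.

```python
import random


def degree_centrality(adj):
    """Degree divided by (n - 1), for an undirected adjacency dict."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def transition_probs(adj, centrality, u):
    """Probability of stepping from u to each neighbour, proportional to the
    neighbour's centrality (the 'centrality as edge weight' assumption)."""
    total = sum(centrality[v] for v in adj[u])
    return {v: centrality[v] / total for v in adj[u]}


def biased_walk(adj, centrality, start, length, rng):
    """Sample one walk of `length` nodes using the biased transitions."""
    walk = [start]
    while len(walk) < length:
        probs = transition_probs(adj, centrality, walk[-1])
        nbrs = list(probs)
        step = rng.choices(nbrs, weights=[probs[v] for v in nbrs], k=1)[0]
        walk.append(step)
    return walk


# Hypothetical toy network; node "b" has the highest degree.
adj = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
cent = degree_centrality(adj)
walk = biased_walk(adj, cent, start="a", length=10, rng=random.Random(0))
```

Walks sampled this way visit high-centrality nodes more often than an unbiased walk would, which is the property the abstract relies on.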

The REST Gene Signature Predicts Drug Sensitivity in Neuroblastoma Cell Lines and Is Significantly Associated with Neuroblastoma Tumor Stage

International Journal of Molecular Sciences ◽

10.3390/ijms150711220 ◽

2014 ◽

Vol 15 (7) ◽

pp. 11220-11233 ◽

Cited By ~ 16

Author(s):

Jianfeng Liang ◽

Pan Tong ◽

Wanni Zhao ◽

Yaqiao Li ◽

Li Zhang ◽

...

Keyword(s):

Cell Lines ◽

Drug Sensitivity ◽

Neuroblastoma Cell ◽

Tumor Stage ◽

Gene Signature ◽

Neuroblastoma Tumor ◽

Neuroblastoma Cell Lines

Gene set inference from single-cell sequencing data using a hybrid of matrix factorization and variational autoencoders

10.1101/740415 ◽

2019 ◽

Author(s):

Soeren Lukassen ◽

Foo Wei Ten ◽

Roland Eils ◽

Christian Conrad

Keyword(s):

Neural Network ◽

Single Cell ◽

Network Model ◽

Neural Network Model ◽

Matrix Factorization ◽

Latent Variable ◽

Single Cells ◽

Sequencing Data ◽

Gene Set ◽

Gene Sets

Abstract Recent advances in single-cell RNA sequencing (scRNA-Seq) have driven the simultaneous measurement of the expression of 1,000s of genes in 1,000s of single cells. These growing data sets allow us to model gene sets in biological networks at an unprecedented level of detail, in spite of heterogeneous cell populations. Here, we propose an unsupervised deep neural network model that is a hybrid of matrix factorization and conditional variational autoencoders (CVA), which utilizes weights as matrix factorizations to obtain gene sets, while class-specific inputs to the latent variable space facilitate a plausible identification of cell types. This artificial neural network model seamlessly integrates functional gene set inference, experimental batch effect correction, and static gene identification, which we conceptually prove here for three single-cell RNA-Seq datasets and suggest for future single-cell-gene analytics.

Encircling the regions of the pharmacogenomic landscape that determine drug response

10.1101/383588 ◽

2018 ◽

Cited By ~ 2

Author(s):

Adrià Fernández-Torras ◽

Miquel Duran-Frigola ◽

Patrick Aloy

Keyword(s):

Large Scale ◽

Drug Response ◽

Drug Sensitivity ◽

Drug Repositioning ◽

Gene Sets ◽

Diffusion Analysis ◽

Genome Wide ◽

Molecular Determinants ◽

Gene Modules

Abstract Background: The integration of large-scale drug sensitivity screens and genome-wide experiments is changing the field of pharmacogenomics, revealing molecular determinants of drug response without the need for previous knowledge about drug action. In particular, transcriptional signatures of drug sensitivity may guide drug repositioning, prioritize drug combinations and point to new therapeutic biomarkers. However, the inherent complexity of transcriptional signatures, with thousands of differentially expressed genes, makes them hard to interpret, thus giving poor mechanistic insights and hampering translation to clinics. Methods: To simplify drug signatures, we have developed a network-based methodology to identify functionally coherent gene modules. Our strategy starts with the calculation of drug-gene correlations and is followed by a pathway-oriented filtering and a network-diffusion analysis across the interactome. Results: We apply our approach to 189 drugs tested in 671 cancer cell lines and observe a connection between gene expression levels of the modules and mechanisms of action of the drugs. Further, we characterize multiple aspects of the modules, including their functional categories, tissue-specificity and prevalence in clinics. Finally, we prove the predictive capability of the modules and demonstrate how they can be used as gene sets in conventional enrichment analyses. Conclusions: Network biology strategies like module detection are able to digest the outcome of large-scale pharmacogenomic initiatives, thereby contributing to their interpretability and improving the characterization of the drugs screened.

Scattome: A Single-Cell Analysis of Targeted Transcriptome Program to Predict Drug Sensitivity of Single Cells within Human Myeloma Tumors

Blood ◽

10.1182/blood.v126.23.4249.4249 ◽

2015 ◽

Vol 126 (23) ◽

pp. 4249-4249

Author(s):

Amit Kumar Mitra ◽

Ujjal Mukherjee ◽

Taylor Harding ◽

Holly Stessman ◽

Ying Li ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Lines ◽

Research Funding ◽

Drug Response ◽

Drug Sensitivity ◽

Single Cell Analysis ◽

Single Cells ◽

Pcr Analysis ◽

Cell Analysis

Abstract Multiple myeloma (MM) is characterized by significant genetic diversity at subclonal levels that likely plays a defining role in the heterogeneity of tumor progression, clinical aggressiveness and drug sensitivity. Such heterogeneity is a driving factor in the evolution of MM, from founder clones through outgrowth of subclonal fractions. DNA sequencing studies on MM samples have indeed demonstrated such heterogeneity in subclonal architecture at diagnosis based on recurrent mutations in pathologically relevant genes that may ultimately lead to relapse. However, no study so far has reported a predictive gene expression signature that can identify, distinguish and quantify drug-sensitive and drug-resistant subpopulations within a bulk population of myeloma cells. In recent years, our laboratory has successfully developed a gene expression profile (GEP)-based signature that could not only distinguish drug response of MM cell lines, but also was effective in stratifying patient outcomes when applied to GEP profiles from MM clinical trials using proteasome inhibitors (PI) as chemotherapeutic agents. Further, we noted that myeloma cell lines that responded to the drug often contained a residual subpopulation of cells that did not respond, and likely were selectively propagated during drug treatment in vitro, and in patients. In this study, we performed targeted qRT-PCR analysis of single cells using a gene panel that included PI sensitivity genes and gene signatures that could discriminate between low and high-risk myeloma followed by intensive bioinformatics and statistical analysis for the classification and prediction of PI response in individual cells within bulk multiple myeloma tumors. 
Fluidigm's C1 Single-Cell Auto Prep System was used to perform automated single-cell capture, processing and cDNA synthesis on 576 pre-treatment cells from 12 cell lines representing a wide range of PI-sensitivity and 370 cells from 7 patient samples undergoing PI treatment followed by targeted gene expression profiling of single cells using automated, high-throughput on-chip qRT-PCR analysis using 96.96 Dynamic Array IFCs on the BioMark HD System. Probability of resistance for each individual cell was predicted using a pipeline that employed the machine learning methods Random Forest, Support Vector Machine (radial and sigmoidal), LASSO and kNN (k Nearest Neighbor) for making single-cell GEP data-driven predictions/decisions. The weighted probabilities from each of the algorithms were used to quantify resistance of each individual cell and plotted using an ensemble forecasting algorithm. Using our drug response GEP signature at the single cell level, we could successfully identify distinct subpopulations of tumor cells that were predicted to be sensitive or resistant to PIs. Subsequently, we developed an R statistical analysis package (http://cran.r-project.org), SCATTome (Single Cell Analysis of Targeted Transcriptome), that can restructure data obtained from a Fluidigm qPCR analysis run, filter missing data, perform scaling of filtered data, build classification models and successfully predict drug response of individual cells and classify each cell's probability of response based on the targeted transcriptome. We will present the program output as graphical displays of single cell response probabilities. This package provides a novel classification method that has the potential to predict subclonal response to a variety of therapeutic agents. 
Disclosures Kumar: Skyline: Consultancy, Honoraria; BMS: Consultancy; Onyx: Consultancy, Research Funding; Sanofi: Consultancy, Research Funding; Janssen: Consultancy, Research Funding; Novartis: Research Funding; Takeda: Consultancy, Research Funding; Celgene: Consultancy, Research Funding.
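Combining weighted per-algorithm probabilities into one resistance score per cell, as the abstract above describes, can be sketched as a weighted average. This is an assumption about the combination rule, not the SCATTome implementation (which is an R package); the function name, the example probabilities and the weights are hypothetical.

```python
def ensemble_probability(probs, weights):
    """Weighted average of per-model resistance probabilities for one cell.

    probs: probabilities from each classifier (e.g. one per algorithm).
    weights: non-negative model weights (hypothetical, e.g. validation accuracy).
    """
    if not probs or len(probs) != len(weights):
        raise ValueError("probs and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probs, weights)) / total


# Hypothetical outputs of three classifiers for a single cell.
score = ensemble_probability([0.9, 0.6, 0.3], [2.0, 1.0, 1.0])
```

Because the weights are normalized, the result stays in the [0, 1] range whenever the inputs do.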

Regulatory T Cell-Related Gene Biomarkers in the Deterioration of Atherosclerosis

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.661709 ◽

2021 ◽

Vol 8 ◽

Author(s):

Meng Xia ◽

Qingmeng Wu ◽

Pengfei Chen ◽

Cheng Qian

Keyword(s):

Gene Expression ◽

Cardiovascular Events ◽

Molecular Mechanisms ◽

Inflammatory Responses ◽

Expression Patterns ◽

Interaction Network ◽

Gene Signature ◽

Smooth Muscle Contraction ◽

Gene Expression Omnibus ◽

Scoring Method

Background: Regulatory T cells (Tregs) have been shown to be protective against the development of atherosclerosis, a major pathological cause of cardiovascular events. Here, we aim to explore the roles of Tregs-related genes in atherosclerosis deterioration. Methods and Results: We downloaded the gene expression profile of 29 atherosclerotic samples from the Gene Expression Omnibus database with an accession number of GSE28829. The abundance of Tregs estimated by the CIBERSORT algorithm was negatively correlated with the atherosclerotic stage. Using the limma test and correlation analysis, a total of 159 differentially expressed Tregs-related genes (DETregRGs) between early and advanced atherosclerotic plaques were documented. Functional annotation analysis using the DAVID tool indicated that the DETregRGs were mainly enriched in inflammatory responses, immune-related mechanisms, and pathways such as complement and coagulation cascades, platelet activation, leukocyte trans-endothelial migration, vascular smooth muscle contraction, and so on. A protein-protein interaction network of the DETregRGs was then constructed, and five hub genes (PTPRC, C3AR1, CD53, TLR2, and CCR1) were derived from the network with node degrees ≥20. The expression patterns of these hub DETregRGs were further validated in several independent datasets. Finally, a single sample scoring method was used to build a gene signature for the five DETregRGs, which could distinguish patients with myocardial infarction from those with stable coronary disease. Conclusion: The results of this study will improve our understanding of the Tregs-associated molecular mechanisms in the progression of atherosclerosis and facilitate the discovery of novel biomarkers for acute cardiovascular events.

Detecting cancer vulnerabilities through gene networks under purifying selection in 4,700 cancer genomes

10.1101/222687 ◽

2017 ◽

Author(s):

Anika Gupta ◽

Heiko Horn ◽

Parisa Razaz ◽

April Kim ◽

Michael Lawrence ◽

...

Keyword(s):

Gene Networks ◽

Large Scale ◽

Significant Proportion ◽

Low Frequency ◽

Interaction Network ◽

Purifying Selection ◽

Sequencing Data ◽

Gene Sets ◽

Sequencing Studies ◽

Significant Enrichment

Abstract Large-scale cancer sequencing studies have uncovered dozens of mutations critical to cancer initiation and progression. However, a significant proportion of genes linked to tumor propagation remain hidden, often due to noise in sequencing data confounding low frequency alterations. Further, genes in networks under purifying selection (NPS), or those that are mutated in cancers less frequently than would be expected by chance, may play crucial roles in sustaining cancers but have largely been overlooked. We describe here a statistical framework that identifies genes that have a first order protein interaction network significantly depleted for mutations, to elucidate key genetic contributors to cancers. Not reliant on, and thus unbiased by, the gene of interest’s mutation rate, our approach has identified 685 putative genes linked to cancer development. Comparative analysis indicates statistically significant enrichment of NPS genes in previously validated cancer vulnerability gene sets, while further identifying novel cancer-specific candidate gene targets. As more tumor genomes are sequenced, integrating systems level mutation data through this network approach should become increasingly useful in pinpointing gene targets for cancer diagnosis and treatment.
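One way to picture a test for a first-order network "depleted for mutations" is a lower-tail binomial probability over the neighbours of a gene. The abstract does not specify the statistical test used, so the sketch below is a simplified stand-in under loud assumptions: a single uniform `background_rate` per gene and sample, independence between genes, and hypothetical names and toy inputs throughout. The gene of interest itself is excluded from the count, in line with the abstract's statement that the approach does not rely on its mutation rate.

```python
from math import comb


def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def neighborhood_depletion(gene, adj, mutated_samples, n_samples, background_rate):
    """Lower-tail probability that the neighbours of `gene` carry as few
    mutations as observed (assumed model: each neighbour is mutated in each
    sample independently with probability `background_rate`)."""
    neighbors = adj[gene]
    n_trials = len(neighbors) * n_samples
    observed = sum(mutated_samples.get(v, 0) for v in neighbors)
    return binom_lower_tail(observed, n_trials, background_rate)


# Hypothetical toy input: gene "g" has two neighbours, never mutated in 5 samples.
toy_adj = {"g": ["a", "b"]}
p_value = neighborhood_depletion("g", toy_adj, {"a": 0, "b": 0},
                                 n_samples=5, background_rate=0.5)
```

A realistic analysis would need gene-specific background rates and multiple-testing correction, neither of which is attempted here.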

Network-based Biased Tree Ensembles (NetBiTE) for Drug Sensitivity Prediction and Drug Sensitivity Biomarker Identification in Cancer

Scientific Reports ◽

10.1038/s41598-019-52093-w ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 6

Author(s):

Ali Oskooei ◽

Matteo Manica ◽

Roland Mathis ◽

María Rodríguez Martínez

Keyword(s):

Drug Administration ◽

Weight Distribution ◽

Drug Targets ◽

Drug Sensitivity ◽

Target Genes ◽

Computational Cost ◽

Interaction Network ◽

Biomarker Identification ◽

Drug Sensitivity Prediction ◽

Bias Weight

Abstract We present the Network-based Biased Tree Ensembles (NetBiTE) method for drug sensitivity prediction and drug sensitivity biomarker identification in cancer using a combination of prior knowledge and gene expression data. Our devised method consists of a biased tree ensemble that is built according to a probabilistic bias weight distribution. The bias weight distribution is obtained by assigning high weights to the drug targets and propagating the assigned weights over a protein-protein interaction network such as STRING. The propagation of weights defines neighborhoods of influence around the drug targets and as such simulates the spread of perturbations within the cell following drug administration. Using a synthetic dataset, we showcase how application of biased tree ensembles (BiTE) results in significant accuracy gains at a much lower computational cost compared to the unbiased random forests (RF) algorithm. We then apply NetBiTE to the Genomics of Drug Sensitivity in Cancer (GDSC) dataset and demonstrate that NetBiTE outperforms RF in predicting IC50 drug sensitivity, only for drugs that target membrane receptor pathways (MRPs): RTK, EGFR and IGFR signaling pathways. We propose, based on the NetBiTE results, that for drugs that inhibit MRPs, the expression of target genes prior to drug administration is a biomarker for IC50 drug sensitivity following drug administration. We further verify and reinforce this proposition through control studies on PI3K/MTOR signaling pathway inhibitors, a drug category that does not target MRPs, and through assignment of dummy targets to MRP-inhibiting drugs and investigating the variation in NetBiTE accuracy.
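The two steps the abstract describes, propagating weights from drug targets over an interaction network and then using the result as a sampling bias for features, can be sketched as follows. This is not the NetBiTE code: the propagation rule (a restart-style diffusion with parameter `alpha`), the function names and the toy path network are assumptions chosen for illustration.

```python
import random


def propagate_weights(adj, targets, alpha=0.5, n_iter=50):
    """Diffuse initial weight from target nodes over an undirected network.

    Each iteration keeps a fraction `alpha` of the initial weights and spreads
    the rest evenly along edges; the result is normalized to sum to 1.
    """
    w0 = {v: (1.0 if v in targets else 0.0) for v in adj}
    w = dict(w0)
    for _ in range(n_iter):
        w = {
            v: alpha * w0[v]
            + (1.0 - alpha) * sum(w[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
    total = sum(w.values())
    return {v: x / total for v, x in w.items()}


def sample_features(weights, n_features, rng):
    """Draw candidate features (genes) with probability given by the weights."""
    genes = list(weights)
    return rng.choices(genes, weights=[weights[g] for g in genes], k=n_features)


# Hypothetical path network T - x - y - z with a single drug target "T".
adj = {"T": ["x"], "x": ["T", "y"], "y": ["x", "z"], "z": ["y"]}
weights = propagate_weights(adj, targets={"T"})
features = sample_features(weights, n_features=3, rng=random.Random(0))
```

In a biased tree ensemble, a draw like `features` would replace the uniform feature subsampling of a standard random forest, so genes close to the drug targets are considered at splits more often.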

Characterizing gene sets using discriminative random walks with restart on heterogeneous biological networks

Bioinformatics ◽

10.1093/bioinformatics/btw151 ◽

2016 ◽

Vol 32 (14) ◽

pp. 2167-2175 ◽

Cited By ~ 27

Author(s):

Charles Blatti ◽

Saurabh Sinha

Keyword(s):

Random Walks ◽

Biological Networks ◽

Gene Sets
