Integration of gene interaction information into a reweighted Lasso-Cox model for accurate survival prediction

Author(s):  
Wei Wang ◽  
Wei Liu

Abstract Motivation: Accurately predicting the risk of cancer patients is a central challenge for clinical cancer research. For high-dimensional gene expression data, the Cox proportional hazards model with the least absolute shrinkage and selection operator for variable selection (Lasso-Cox) is one of the most popular feature selection and risk prediction algorithms. However, the Lasso-Cox model treats all genes equally, ignoring the biological characteristics of the genes themselves. As a result, it often shows poor prognostic performance on independent datasets. Results: Here, we propose a Reweighted Lasso-Cox (RLasso-Cox) model to ameliorate this problem by integrating gene interaction information. It is based on the hypothesis that topologically important genes in the gene interaction network tend to have stable expression changes. We used a random walk to evaluate the topological weight of genes, and then highlighted topologically important genes to improve the generalization ability of the RLasso-Cox model. Experiments on datasets of three cancer types showed that the RLasso-Cox model improves prognostic accuracy and robustness compared with the Lasso-Cox model and several existing network-based methods. More importantly, the RLasso-Cox model has the advantage of identifying small gene sets with high prognostic performance on independent datasets, which may play an important role in identifying robust survival biomarkers for various cancer types. Availability and implementation: http://bioconductor.org/packages/devel/bioc/html/RLassoCox.html Supplementary information: Supplementary data are available at Bioinformatics online.
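The reweighting idea in this abstract can be illustrated with a minimal sketch. It assumes a PageRank-style random walk with uniform restart on an undirected network; the abstract does not specify the walk, so the damping value, the function names `random_walk_weights` and `penalty_factors`, the inverse-weight mapping and the toy gene labels are all illustrative assumptions, not the RLassoCox implementation.

```python
def random_walk_weights(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style weights for an undirected gene network (assumed scheme)."""
    # Build symmetric adjacency sets from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    genes = sorted(neighbors)
    n = len(genes)
    weight = {g: 1.0 / n for g in genes}  # start from a uniform distribution
    for _ in range(max_iter):
        new = {}
        for g in genes:
            # Each neighbour passes on an equal share of its current weight.
            inflow = sum(weight[h] / len(neighbors[h]) for h in neighbors[g])
            new[g] = (1.0 - damping) / n + damping * inflow
        delta = sum(abs(new[g] - weight[g]) for g in genes)
        weight = new
        if delta < tol:
            break
    return weight


def penalty_factors(weight):
    """Hypothetical mapping: a larger topological weight gives a smaller penalty."""
    mean_w = sum(weight.values()) / len(weight)
    return {g: mean_w / w for g, w in weight.items()}


# Toy network with made-up gene labels: geneA is the hub, geneB a leaf.
toy_edges = [("geneA", "geneB"), ("geneA", "geneC"),
             ("geneA", "geneD"), ("geneC", "geneD")]
weights = random_walk_weights(toy_edges)
factors = penalty_factors(weights)
```

In a penalized Cox fit, factors of this kind could serve as per-gene penalty multipliers, so that the coefficients of hub genes are shrunk less than those of peripheral genes.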

Author(s):  
Moritz Herrmann ◽  
Philipp Probst ◽  
Roman Hornung ◽  
Vindi Jurinovic ◽  
Anne-Laure Boulesteix

Abstract Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. Different prediction methods from machine learning and statistics were applied to 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database ‘The Cancer Genome Atlas’ (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan–Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno’s C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups, especially clinical variables, from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: [email protected], +49 89 2180 3198 Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
All analyses are reproducible using R code freely available on GitHub.
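To make the C-index concrete, here is a minimal sketch of Harrell's concordance index for right-censored data. This is a simpler relative of the Uno's C-index used in the study: Uno's version additionally reweights pairs by the inverse probability of censoring, which this sketch omits. The function name `harrell_c_index` and the toy data are assumptions for illustration.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs in which the higher risk fails earlier."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_order = harrell_c_index(times, events, [0.1, 0.4, 0.7, 0.9])
```

A value of 0.5 corresponds to random ranking, which is why a reference method without covariates, such as the Kaplan–Meier estimate, is a natural baseline.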


2020 ◽  
Vol 36 (16) ◽  
pp. 4466-4472 ◽  
Author(s):  
Tianyi Zhao ◽  
Yang Hu ◽  
Jiajie Peng ◽  
Liang Cheng

Abstract Motivation: Although long non-coding RNAs (lncRNAs) have limited capacity for encoding proteins, they have been verified as biomarkers in the occurrence and development of complex diseases. Recent wet-lab experiments have shown that lncRNAs function by regulating the expression of protein-coding genes (PCGs), which could also be the mechanism responsible for causing diseases. Currently, lncRNA-related biological data are increasing rapidly. However, no computational methods have been designed for predicting novel target genes of lncRNAs. Results: In this study, we present a graph convolutional network (GCN) based method, named DeepLGP, for prioritizing target PCGs of lncRNAs. First, gene and lncRNA features were selected; these included their location in the genome, expression in 13 tissues and miRNA-mediated lncRNA–gene pairs. Next, a GCN was applied to convolve a gene interaction network for encoding the features of genes and lncRNAs. Then, these features were used by a convolutional neural network for prioritizing target genes of lncRNAs. In 10-fold cross-validation on two independent datasets, DeepLGP obtained high areas under the curve (0.90–0.98) and areas under the precision-recall curve (0.91–0.98). We found that lncRNA pairs with high similarity had more overlapping target genes. Further experiments showed that genes targeted by the same lncRNA sets had a strong likelihood of causing the same diseases, which could help in identifying disease-causing PCGs. Availability and implementation: https://github.com/zty2009/LncRNA-target-gene. Supplementary information: Supplementary data are available at Bioinformatics online.
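The graph convolution step can be sketched as a single propagation layer. The sketch assumes the widely used symmetric-normalization rule, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); the abstract does not state which GCN variant DeepLGP uses, so this rule, the helper names and the toy matrices are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph-convolution layer with self-loops and symmetric normalization."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Scale every edge by 1 / sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


# Two connected nodes with one-hot features and an identity weight matrix.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, [[1.0, 0.0], [0.0, 1.0]])
```

After one layer each node's representation is an average over itself and its neighbour, which is how features spread across the interaction network.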


2021 ◽  
pp. postgradmedj-2021-139981
Author(s):  
Shimin Tang ◽  
Hao Jiang ◽  
Zhijun Cao ◽  
Qiang Zhou

Introduction: Prostate cancer is a common malignancy in men that is difficult to treat and carries a high risk of death. miR-219-5p is expressed in reduced amounts in many malignancies. However, the prognostic value of miR-219-5p for patients with prostate cancer remains unclear. Methods: We retrospectively analysed data from 213 prostate cancer patients from 10 June 2012 to 9 May 2015. Overall survival was assessed by Kaplan-Meier analysis and Cox regression models. In addition, a prediction model was constructed, and its accuracy was evaluated with calibration curves. Results: Of the 213 patients, a total of 72 (33.8%) died and the median survival time was 60.0 months. We found by multifactorial analysis that miR-219-5p deficiency increased the risk of death by nearly fourfold (HR: 3.86, 95% CI: 2.01 to 7.44, p<0.001) and the risk of progression by twofold (HR: 2.79, 95% CI: 1.68 to 4.64, p<0.001). To quantify each covariate’s weight on prognosis, we screened variables with a Cox model to construct a predictive model. The nomogram showed excellent accuracy in estimating the risk of death, with a corrected C-index of 0.778. Conclusions: miR-219-5p can be used as a biomarker to predict death risk in prostate cancer patients. The mortality risk prediction model constructed based on miR-219-5p has good consistency and validity in assessing patient prognosis.


2021 ◽  
Vol 12 ◽  
Author(s):  
Genís Calderer ◽  
Marieke L. Kuijjer

Networks are useful tools to represent and analyze interactions on a large, or genome-wide scale and have therefore been widely used in biology. Many biological networks—such as those that represent regulatory interactions, drug-gene, or gene-disease associations—are of a bipartite nature, meaning they consist of two different types of nodes, with connections only forming between the different node sets. Analysis of such networks requires methodologies that are specifically designed to handle their bipartite nature. Community structure detection is a method used to identify clusters of nodes in a network. This approach is especially helpful in large-scale biological network analysis, as it can find structure in networks that often resemble a “hairball” of interactions in visualizations. Often, the communities identified in biological networks are enriched for specific biological processes and thus allow one to assign drugs, regulatory molecules, or diseases to such processes. In addition, comparison of community structures between different biological conditions can help to identify how network rewiring may lead to tissue development or disease, for example. In this mini review, we give a theoretical basis of different methods that can be applied to detect communities in bipartite biological networks. We introduce and discuss different scores that can be used to assess the quality of these community structures. We then apply a wide range of methods to a drug-gene interaction network to highlight the strengths and weaknesses of these methods in their application to large-scale, bipartite biological networks.
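One score commonly used to assess a community structure in a bipartite network is Barber's bipartite modularity, Q = (1/m) Σ (B_ij − k_i d_j / m) δ(c_i, c_j), where the sum runs over pairs with i in one node set and j in the other, m is the number of edges, and k_i and d_j are node degrees. Whether this exact score is among those discussed in the review is not stated in the abstract; the sketch below, with its function name `barber_modularity` and toy drug-gene edges, is an assumption for illustration.

```python
def barber_modularity(edges, community):
    """Barber's modularity for a bipartite edge list of (top, bottom) pairs."""
    edge_set = set(edges)  # assumes an unweighted network without duplicates
    m = len(edge_set)
    top_deg, bottom_deg = {}, {}
    for u, v in edge_set:
        top_deg[u] = top_deg.get(u, 0) + 1
        bottom_deg[v] = bottom_deg.get(v, 0) + 1
    q = 0.0
    for u, k_u in top_deg.items():
        for v, d_v in bottom_deg.items():
            if community[u] == community[v]:
                observed = 1.0 if (u, v) in edge_set else 0.0
                q += observed - k_u * d_v / m  # observed minus expected edges
    return q / m


# Toy drug-gene network made of two disconnected drug-gene pairs.
toy_edges = [("drug1", "gene1"), ("drug2", "gene2")]
split = {"drug1": 0, "gene1": 0, "drug2": 1, "gene2": 1}
merged = {"drug1": 0, "gene1": 0, "drug2": 0, "gene2": 0}
q_split = barber_modularity(toy_edges, split)
q_merged = barber_modularity(toy_edges, merged)
```

The partition that separates the two pairs scores higher than the one that merges everything, matching the intuition that good communities contain more edges than expected by chance.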


2019 ◽  
Vol 8 (11) ◽  
pp. 1799 ◽  
Author(s):  
Maria Bencivenga ◽  
Giuseppe Verlato ◽  
Valentina Mengardo ◽  
Lorenzo Scorsone ◽  
Michele Sacco ◽  
...  

Background: Although the Japan Clinical Oncology Group (JCOG) 9501 trial did not find that prophylactic D3 lymphadenectomy led to any survival advantage over D2 lymphadenectomy, it did find that the prognosis of subserosal and N0 gastric cancer patients improved. The aim of this retrospective observational study was to compare survival after D2 or D3 lymphadenectomy in different patient subgroups. Methods: The study considered all of the patients who underwent D2 or D3 lymphadenectomy at a high-volume center in Verona (Italy) between 1992 and 2011. After excluding patients with Borrmann IV or neuroendocrine tumors, early gastric cancers, or non-curative resections, the analysis involved 301 R0 patients: 100 who underwent D2, and 201 who underwent D3 lymphadenectomy. Post-operative deaths and deaths due to recurrences were considered as terminal events in the survival analysis. Results: The D2 patients were significantly older than the D3 patients at baseline (69.8 ± 2.3 vs. 62.2 ± 10.7 years). The median number of retrieved nodes was 29 (interquartile range: 24.5–39) after D2, and 43 (34–52) after D3. The five-year disease-related survival rate was similar after D2 (44%, 95% confidence interval (CI) 34–54%) and D3 (41%, 34–48%) (p = 0.766). A Cox model controlling for sex, age, tumor site, Laurén histology, and T and N stages showed that the risk of cancer-related death after D3 was similar to that recorded after D2 (hazard ratio 0.97, 95% CI 0.67–1.42). There was a significant interaction between the T status and the extension of the lymphadenectomy (p = 0.012), with the prognosis being better after D2 in T2 and T4b patients, and after D3 in T3 patients. Conclusions: The findings of this study suggest that D3 lymphadenectomy is not routinely indicated for patients with advanced gastric cancer, although differences in survival after D3 across T tiers deserve further consideration.


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Hao Yu ◽  
Yang Liu ◽  
Chao Li ◽  
Jianhao Wang ◽  
Bo Yu ◽  
...  

Background. Neuropathic pain (NP) is a devastating complication following nerve injury, and it can be alleviated by regulating neuroimmune direction. We aimed to explore the neuroimmune mechanism and identify new diagnostic or therapeutic targets for NP treatment via bioinformatic analysis. Methods. The microarray dataset GSE18803 was downloaded and analyzed using R. A Venn diagram was drawn to find neuroimmune-related differentially expressed genes (DEGs) in neuropathic pain. Gene Ontology (GO), pathway enrichment, and protein-protein interaction (PPI) network analyses were used to characterize the DEGs. In addition, the identified hub genes were submitted to the DGIdb database to find relevant therapeutic drugs. Results. A total of 91 neuroimmune-related DEGs were identified. The results of GO and pathway enrichment analyses were closely related to immune and inflammatory responses. PPI analysis showed two important modules and 8 hub genes: PTPRC, CD68, CTSS, RAC2, LAPTM5, FCGR3A, CD53, and HCK. The drug-hub gene interaction network was constructed with Cytoscape, and it included 24 candidate drugs and 3 hub genes. Conclusion. The present study helps us better understand the neuroimmune mechanism of neuropathic pain and provides some novel insights on NP treatment, such as modulation of microglia polarization and targeting bone resorption. In addition, CD68, CTSS, LAPTM5, FCGR3A, and CD53 may be used as early diagnostic biomarkers, and the gene HCK may be a therapeutic target.


10.1186/gm404 ◽  
2012 ◽  
Vol 4 (12) ◽  
Author(s):  
Raymond J Louie ◽  
Jingyu Guo ◽  
John W Rodgers ◽  
Rick White ◽  
Najaf A Shah ◽  
...  
