Improved Personalized Survival Prediction of Patients With Diffuse Large B-cell Lymphoma Using Gene Expression Profiling

2020 ◽  
Author(s):  
Adrián Mosquera Orgueira ◽  
José Ángel Díaz Arias ◽  
Miguel Cid López ◽  
Andrés Peleteiro Raíndo ◽  
Beatriz Antelo Rodríguez ◽  
...  

Abstract Background 30-40% of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical evolution. The increased understanding of DLBCL biology has shed light on the clinical evolution of this pathology, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed in order to enable survival predictions at the patient level. This study investigated new machine learning models of survival based on transcriptomic and clinical data. Methods Gene expression profiling (GEP) data from 2 different publicly available retrospective cohorts were analyzed. Cox regression and unsupervised clustering were performed in order to identify probes associated with overall survival in the largest cohort. Random forests were created to model survival using combinations of GEP data, cell-of-origin (COO) classification and clinical information. Cross-validation was used to compare model results in the training set, and Harrell’s concordance index (c-index) was used to assess each model’s predictive ability. Results were validated in an independent test set. Results 233 and 64 patients were included in the training and test sets, respectively. Initially we derived and validated a 4-gene expression clusterization that was independently associated with lower survival in 20% of patients. These genes were TNFRSF9, BIRC3, BCL2L1 and G3BP2. Thereafter, we applied machine-learning models to predict survival. A set of 102 genes was highly predictive of disease outcome, outperforming available clinical information and COO classification. The final best model integrated clinical information, COO classification, 4-gene-based clusterization and expression data for 50 genes (training set c-index, 0.8404; test set c-index, 0.7942).
Conclusion This study indicates that modelling DLBCL survival with transcriptomic-based machine learning algorithms can largely outperform other important prognostic variables such as disease stage and COO.

BMC Cancer ◽  
2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Adrián Mosquera Orgueira ◽  
José Ángel Díaz Arias ◽  
Miguel Cid López ◽  
Andrés Peleteiro Raíndo ◽  
Beatriz Antelo Rodríguez ◽  
...  

Abstract Background Thirty to forty percent of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical evolution. The increased understanding of DLBCL biology has shed light on the clinical evolution of this pathology, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed in order to enable survival predictions at the patient level. In this study we investigated new machine learning-based models of survival using transcriptomic and clinical data. Methods Gene expression profiling (GEP) data from 2 different publicly available retrospective DLBCL cohorts were analyzed. Cox regression and unsupervised clustering were performed in order to identify probes associated with overall survival in the largest cohort. Random forests were created to model survival using combinations of GEP data, cell-of-origin (COO) classification and clinical information. Cross-validation was used to compare model results in the training set, and Harrell’s concordance index (c-index) was used to assess each model’s predictive ability. Results were validated in an independent test set. Results Two hundred thirty-three and sixty-four patients were included in the training and test sets, respectively. Initially we derived and validated a 4-gene expression clusterization that was independently associated with lower survival in 20% of patients. This pattern included the following genes: TNFRSF9, BIRC3, BCL2L1 and G3BP2. Thereafter, we applied machine-learning models to predict survival. A set of 102 genes was highly predictive of disease outcome, outperforming available clinical information and COO classification. The final best model integrated clinical information, COO classification, 4-gene-based clusterization and the expression levels of 50 individual genes (training set c-index, 0.8404; test set c-index, 0.7942).
Conclusion Our results indicate that DLBCL survival models based on the application of machine learning algorithms to gene expression and clinical data can largely outperform other important prognostic variables such as disease stage and COO. Head-to-head comparisons with other risk stratification models are needed to assess their usefulness.
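The concordance index used above to score the survival models can be illustrated with a minimal sketch. It is not the authors' code: the function name `harrell_c_index` and the toy follow-up data are assumptions made for illustration, and the sketch uses the common convention that a pair of patients is comparable only when the patient with the shorter follow-up had an observed event, with tied risk scores counting as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored survival data.

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable when patient i had an observed
            # event strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death, higher risk: agrees
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (not data from the study).
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.5, 0.3, 0.1]
c = harrell_c_index(times, events, risk)
print(round(c, 4))  # 7 of 8 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.8404 (training) and 0.7942 (test) values should be read.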


2021 ◽  
Vol 3 (3) ◽  
pp. 720-739
Author(s):  
Joaquim Carreras ◽  
Rifat Hamoudi

Predictive analytics using artificial intelligence is a useful tool in cancer research. A multilayer perceptron neural network used gene expression data to predict the lymphoma subtypes of 290 cases of non-Hodgkin lymphoma (GSE132929). The input layer included both the whole array of 20,863 genes and a cancer transcriptome panel of 1769 genes. The output layer was the lymphoma subtype: follicular lymphoma, mantle cell lymphoma, diffuse large B-cell lymphoma, Burkitt lymphoma, or marginal zone lymphoma. The neural networks successfully classified the cases consistently with the lymphoma subtypes, with an area under the curve (AUC) that ranged from 0.87 to 0.99. The most relevant predictive genes were LCE2B, KNG1, IGHV7_81, TG, C6, FGB, ZNF750, CTSV, INGX, and COL4A6 for the whole set; and ARG1, MAGEA3, AKT2, IL1B, S100A7A, CLEC5A, WIF1, TREM1, DEFB1, and GAGE1 for the cancer panel. The characteristic predictive genes for each lymphoma subtype were also identified with high accuracy (AUC = 0.95, incorrect predictions = 6.2%). Finally, the 30 most relevant genes of the whole set, which belonged to apoptosis, cell proliferation, metabolism, and antigen presentation pathways, predicted not only the lymphoma subtypes but also the overall survival of diffuse large B-cell lymphoma (series GSE10846, n = 414 cases) and of the most relevant cancer subtypes of The Cancer Genome Atlas (TCGA) consortium, including breast, colorectal, lung, prostate, and gastric carcinomas, melanoma, etc. (7441 cases). In conclusion, neural networks predicted the non-Hodgkin lymphoma subtypes with high accuracy, and the highlighted genes also predicted the survival of a pan-cancer series.
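The per-subtype AUC values quoted above can be read as one-vs-rest comparisons: for each subtype, how often a case of that subtype receives a higher network score than a case of any other subtype. The sketch below shows that calculation under stated assumptions. The function names `roc_auc` and `one_vs_rest_auc`, the subtype labels, and the scores are all hypothetical and are not taken from the paper, which does not describe how its AUCs were computed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    labels -- 1 for cases of the subtype of interest, 0 otherwise
    scores -- classifier score for that subtype, one per case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_classes, class_scores):
    """Compute one AUC per subtype, treating all other subtypes as negatives."""
    return {
        subtype: roc_auc([1 if t == subtype else 0 for t in true_classes], scores)
        for subtype, scores in class_scores.items()
    }


# Hypothetical output-layer scores for four cases (illustration only).
true_classes = ["FL", "MCL", "FL", "DLBCL"]
class_scores = {
    "FL": [0.8, 0.3, 0.6, 0.1],
    "MCL": [0.1, 0.5, 0.3, 0.6],
    "DLBCL": [0.1, 0.2, 0.1, 0.3],
}
aucs = one_vs_rest_auc(true_classes, class_scores)
for subtype, value in aucs.items():
    print(subtype, round(value, 3))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a few hundred cases; a rank-based formula would be preferable for larger series.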


2021 ◽  
Author(s):  
Mohamad Zamani-Ahmadmahmudi ◽  
Seyed Mahdi Nassiri ◽  
Amir Asadabadi

Abstract Gene expression profiling has been widely used to identify genes that can predict the clinical outcome in patients with diverse cancers, including diffuse large B-cell lymphoma (DLBCL). With the aid of bioinformatics and computational analysis of gene expression data, various prognostic gene signatures for DLBCL have recently been developed. The major drawback of the previous signatures is their inability to correctly predict survival in external datasets; in other words, they are not reproducible. Hence, in this study, we sought to determine the gene(s) that can reproducibly and robustly predict survival in patients with DLBCL. Gene expression data were extracted from 7 datasets containing 1636 patients (GSE10846 [n=420], GSE31312 [n=470], GSE11318 [n=203], GSE32918 [n=172], GSE4475 [n=123], GSE69051 [n=157], and GSE34171 [n=91]). Genes significantly associated with overall survival were detected using univariate Cox proportional hazards analysis with a P value <0.001 and a false discovery rate (FDR) <5%. Thereafter, significant genes common to all the datasets were extracted. Additionally, chromosomal aberrations in the corresponding region of the final common gene(s) were evaluated as copy number alterations using the single nucleotide polymorphism (SNP) data of 570 patients with DLBCL (GSE58718 [n=242], GSE57277 [n=148], and GSE34171 [n=180]). Our results indicated that reticulon family gene 1 (RTN1) was the only gene that met our rigorous pipeline criteria and was associated with a favorable clinical outcome in all the datasets (P<0.001, FDR<5%). In the multivariate Cox proportional hazards analysis, this gene remained independent of the routine international prognostic index components (i.e., age, stage, lactate dehydrogenase level, Eastern Cooperative Oncology Group [ECOG] performance status, and number of extranodal sites) (P<0.0001). Furthermore, no significant chromosomal aberration was found in the RTN1 genomic region (14q23.1: Start 59,595,976/ End 59,870,966).


2019 ◽  
Author(s):  
Mohamad Zamani-Ahmadmahmudi ◽  
Fatemeh Soltani-Nezhad ◽  
Amir Asadabadi

Abstract Background Gene expression profiling has been widely used to identify genes that can predict the clinical outcome in patients with diverse cancers, including diffuse large B-cell lymphoma (DLBCL). With the aid of bioinformatics and computational analysis of gene expression data, various prognostic gene signatures for DLBCL have recently been developed. The major drawback of the previous signatures is their inability to correctly predict survival in external datasets; in other words, they are not reproducible. Hence, in this study, we sought to determine the gene(s) that can reproducibly and robustly predict survival in patients with DLBCL. Methods Gene expression data were extracted from 7 datasets containing 1636 patients (GSE10846 [n=420], GSE31312 [n=470], GSE11318 [n=203], GSE32918 [n=172], GSE4475 [n=123], GSE69051 [n=157], and GSE34171 [n=91]). Genes significantly associated with overall survival were detected using univariate Cox proportional hazards analysis with a P value <0.001 and a false discovery rate (FDR) <5%. Thereafter, significant genes common to all the datasets were extracted. Additionally, chromosomal aberrations in the corresponding region of the final common gene(s) were evaluated as copy number alterations using the single nucleotide polymorphism (SNP) data of 570 patients with DLBCL (GSE58718 [n=242], GSE57277 [n=148], and GSE34171 [n=180]). The results were experimentally confirmed using quantitative real-time PCR (qRT-PCR) analysis. Results Our results indicated that reticulon family gene 1 (RTN1) was the only gene that met our rigorous pipeline criteria and was associated with a favorable clinical outcome in all the datasets (P<0.001, FDR<5%). In the multivariate Cox proportional hazards analysis, this gene remained independent of the routine international prognostic index components (i.e., age, stage, lactate dehydrogenase level, Eastern Cooperative Oncology Group [ECOG] performance status, and number of extranodal sites) (P<0.0001). Our experimental step confirmed the results and revealed that the expression of RTN1 in the long-survival group was significantly higher than that in the short-survival group. Furthermore, no significant chromosomal aberration was found in the RTN1 genomic region (14q23.1: Start 59,595,976/ End 59,870,966). Conclusion In light of the results of the present study, RTN1 can be considered a potential prognostic gene that can robustly predict survival in patients with DLBCL.
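The selection pipeline described above (per-dataset filtering at P<0.001 and FDR<5%, then keeping only genes significant in every dataset) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment is an assumption here, and the function names, cohort names, gene names, and p-values are all hypothetical. The per-gene p-values are taken as given; in the study they would come from univariate Cox models.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is the minimum over all equal-or-larger ranks (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(gene_pvalues, p_cut=0.001, fdr_cut=0.05):
    """Genes passing both the raw p-value and the FDR threshold in one dataset."""
    genes = list(gene_pvalues)
    pvals = [gene_pvalues[g] for g in genes]
    qvals = benjamini_hochberg(pvals)
    return {g for g, p, q in zip(genes, pvals, qvals) if p < p_cut and q < fdr_cut}


def reproducible_genes(datasets):
    """Genes that are significant in every dataset."""
    sets = [significant_genes(d) for d in datasets.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical univariate Cox p-values for two cohorts (illustration only).
datasets = {
    "cohort_1": {"GENE_A": 0.0002, "GENE_B": 0.0004, "GENE_C": 0.2},
    "cohort_2": {"GENE_A": 0.0001, "GENE_B": 0.03, "GENE_C": 0.0005},
}
common = reproducible_genes(datasets)
print(sorted(common))
```

Requiring significance in every dataset is a strict criterion: a gene that fails in a single cohort is dropped, which is consistent with only one gene surviving across the 7 datasets.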


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 8047-8047
Author(s):  
Selin Merdan ◽  
Kritika Subramanian ◽  
Turgay Ayer ◽  
Jean Louise Koff ◽  
Andres Chang ◽  
...  

8047 Background: The current clinical risk stratification of Diffuse Large B-cell Lymphoma (DLBCL) relies on the International Prognostic Index (IPI), which comprises a limited number of clinical variables and is imperfect in identifying high-risk disease. Our study aimed to: (1) develop a risk prediction model based on genetic and clinical features; and (2) evaluate the model’s biological implications in association with the estimated profiles of immune infiltration. Methods: Gene-expression profiling was performed on 718 patients with DLBCL for whom RNA sequencing data and clinical covariates were made available by Reddy et al. (2017). Unsupervised and supervised machine learning methods were used to discover and identify the best set of survival-associated gene signatures for prediction. A multivariate model of survival from these signatures was constructed in the training set and validated in an independent test set. The compositions of the tumor-infiltrating immune cells were enumerated using CIBERSORT for deconvolution analysis. Results: A four gene-signature-based score was developed that separated patients into high- and low-risk groups with a significant difference in survival in the training, validation and complete cohorts (p < 0.001), independently of the IPI. The combination of the gene-expression-based score with the IPI improved the discrimination on the validation and complete sets. The area-under-the-curve at 2 and 5 years increased from 0.71 and 0.69 to 0.75 and 0.74 in the validation set, respectively. Conclusions: By analyzing the gene-expression data with a systematic approach, we developed and validated a risk prediction model that outperforms existing risk assessment methods. Our study, which integrated the profiles of immune infiltration with prognostic prediction, unraveled important associations that have the potential to identify patients who could benefit from various therapeutic interventions and to highlight possible targets for new drugs.

