Improved Personalized Survival Prediction of Patients With Diffuse Large B-cell Lymphoma Using Gene Expression Profiling

Abstract Background: 30-40% of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical evolution. The increased understanding of DLBCL biology has shed light on the clinical evolution of this disease, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed to enable survival predictions at the patient level. This study investigated new machine learning models of survival based on transcriptomic and clinical data. Methods: Gene expression profiling (GEP) data from 2 different publicly available retrospective cohorts were analyzed. Cox regression and unsupervised clustering were performed to identify probes associated with overall survival in the larger cohort. Random forests were created to model survival using combinations of GEP data, cell-of-origin (COO) classification and clinical information. Cross-validation was used to compare model results in the training set, and Harrell's concordance index (c-index) was used to assess each model's predictive performance. Results were validated in an independent test set. Results: A total of 233 and 64 patients were included in the training and test sets, respectively. Initially, we derived and validated a 4-gene expression clustering that was independently associated with lower survival in 20% of patients. These genes were TNFRSF9, BIRC3, BCL2L1 and G3BP2. Thereafter, we applied machine learning models to predict survival. A set of 102 genes was highly predictive of disease outcome, outperforming available clinical information and COO classification. The final best model integrated clinical information, COO classification, the 4-gene-based clustering and the expression of 50 genes (training set c-index, 0.8404; test set c-index, 0.7942).
Conclusion: This study indicates that modelling DLBCL survival with transcriptomic-based machine learning algorithms can largely outperform other important prognostic variables such as disease stage and COO.
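
The abstract evaluates every model with Harrell's concordance index. As a minimal sketch in plain Python (not the authors' code; the function name and toy data are hypothetical), the c-index is the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. For simplicity, this version skips pairs with tied follow-up times, which full implementations handle explicitly.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time for each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Put the patient with the shorter observed time first.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is usable only if the shorter time ends in an observed death.
        # Pairs with tied times are skipped to keep this sketch simple.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier death was ranked as riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical data): the risk ranking agrees with survival.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.6, 0.4, 0.1]))  # 1.0
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking; this is the scale on which the training (0.8404) and test (0.7942) values above should be read.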

Improved personalized survival prediction of patients with diffuse large B-cell Lymphoma using gene expression profiling

BMC Cancer, 2020, Vol 20 (1). DOI: 10.1186/s12885-020-07492-y

Author(s): Adrián Mosquera Orgueira, José Ángel Díaz Arias, Miguel Cid López, Andrés Peleteiro Raíndo, Beatriz Antelo Rodríguez, ...

Keyword(s): Gene Expression, Machine Learning, Expression Profiling, Cell Lymphoma, Clinical Information, B Cell Lymphoma, Training Set, Test Set, Large B Cell Lymphoma, Large B Cell

Abstract Background: Thirty to forty percent of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical evolution. The increased understanding of DLBCL biology has shed light on the clinical evolution of this disease, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed to enable survival predictions at the patient level. In this study we investigated new machine learning-based models of survival using transcriptomic and clinical data. Methods: Gene expression profiling (GEP) data from 2 different publicly available retrospective DLBCL cohorts were analyzed. Cox regression and unsupervised clustering were performed to identify probes associated with overall survival in the larger cohort. Random forests were created to model survival using combinations of GEP data, cell-of-origin (COO) classification and clinical information. Cross-validation was used to compare model results in the training set, and Harrell's concordance index (c-index) was used to assess each model's predictive performance. Results were validated in an independent test set. Results: Two hundred thirty-three and sixty-four patients were included in the training and test sets, respectively. Initially, we derived and validated a 4-gene expression clustering that was independently associated with lower survival in 20% of patients. This pattern included the following genes: TNFRSF9, BIRC3, BCL2L1 and G3BP2. Thereafter, we applied machine learning models to predict survival. A set of 102 genes was highly predictive of disease outcome, outperforming available clinical information and COO classification. The final best model integrated clinical information, COO classification, the 4-gene-based clustering and the expression levels of 50 individual genes (training set c-index, 0.8404; test set c-index, 0.7942).
Conclusion: Our results indicate that DLBCL survival models based on the application of machine learning algorithms to gene expression and clinical data can largely outperform other important prognostic variables such as disease stage and COO. Head-to-head comparisons with other risk stratification models are needed to establish their relative usefulness.
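
The abstract does not state which unsupervised method produced the 4-gene clustering. Purely as an assumed stand-in, the sketch below z-scores the four named genes and applies k-means with k = 2, using deterministic farthest-first seeding so the toy run is reproducible. The function names and expression values are hypothetical; in practice the resulting cluster label would then be tested against overall survival, as the abstract describes.

```python
import math

# The four genes named in the abstract; the expression values below are made up.
GENES = ["TNFRSF9", "BIRC3", "BCL2L1", "G3BP2"]


def zscore_columns(rows):
    """Standardize each gene (column) to mean 0 and SD 1 across patients."""
    standardized_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        # Fall back to 1.0 if a gene has zero variance, to avoid dividing by 0.
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1)) or 1.0
        standardized_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*standardized_cols)]


def farthest_first_init(points, k):
    """Start from the first patient, then repeatedly add the patient
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k=2, n_iter=100):
    """Lloyd's k-means; returns one cluster label per patient and the centroids."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Hypothetical log-expression matrix: rows = patients, columns = GENES.
expression = [
    [5.0, 5.0, 5.0, 5.0],
    [5.2, 4.9, 5.1, 5.0],
    [4.8, 5.1, 5.0, 4.9],
    [1.0, 1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0, 1.2],
    [0.9, 1.0, 1.1, 0.8],
]
labels, _ = kmeans(zscore_columns(expression), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```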

Novel prognostic genes of diffuse large B-cell lymphoma revealed by survival analysis of gene expression data

OncoTargets and Therapy, 2015, pp. 3407. DOI: 10.2147/ott.s90057

Cited by ~5

Author(s): Chenglong Li, Biao Zhu, Jiao Chen, Xiaobing Huang

Keyword(s): Gene Expression, Survival Analysis, B Cell, Gene Expression Data, Cell Lymphoma, B Cell Lymphoma, Expression Data, Large B Cell Lymphoma, Large B Cell

Artificial Neural Network Analysis of Gene Expression Data Predicted Non-Hodgkin Lymphoma Subtypes with High Accuracy

Machine Learning and Knowledge Extraction, 2021, Vol 3 (3), pp. 720-739. DOI: 10.3390/make3030036

Author(s): Joaquim Carreras, Rifat Hamoudi

Keyword(s): Neural Network, Gene Expression, Hodgkin Lymphoma, Cell Lymphoma, B Cell Lymphoma, High Accuracy, Expression Data, Non Hodgkin Lymphoma, Large B Cell Lymphoma, Large B Cell

Predictive analytics using artificial intelligence is a useful tool in cancer research. A multilayer perceptron neural network used gene expression data to predict the lymphoma subtypes of 290 cases of non-Hodgkin lymphoma (GSE132929). The input layer included both the whole array of 20,863 genes and a cancer transcriptome panel of 1769 genes. The output layer was the lymphoma subtype: follicular lymphoma, mantle cell lymphoma, diffuse large B-cell lymphoma, Burkitt lymphoma, or marginal zone lymphoma. The neural networks classified the cases consistently with their lymphoma subtypes, with an area under the curve (AUC) ranging from 0.87 to 0.99. The most relevant predictive genes were LCE2B, KNG1, IGHV7_81, TG, C6, FGB, ZNF750, CTSV, INGX, and COL4A6 for the whole set, and ARG1, MAGEA3, AKT2, IL1B, S100A7A, CLEC5A, WIF1, TREM1, DEFB1, and GAGE1 for the cancer panel. The characteristic predictive genes for each lymphoma subtype were also identified with high accuracy (AUC = 0.95, incorrect predictions = 6.2%). Finally, the 30 most relevant genes of the whole set, which belonged to the apoptosis, cell proliferation, metabolism, and antigen presentation pathways, predicted not only the lymphoma subtypes but also the overall survival of diffuse large B-cell lymphoma (series GSE10846, n = 414 cases) and of the most relevant cancer subtypes of The Cancer Genome Atlas (TCGA) consortium, including breast, colorectal, lung, prostate, and gastric carcinomas and melanoma, among others (7441 cases). In conclusion, neural networks predicted the non-Hodgkin lymphoma subtypes with high accuracy, and the highlighted genes also predicted survival in a pan-cancer series.
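
The abstract reports per-subtype areas under the ROC curve (0.87 to 0.99) for a multiclass network but does not say how they were computed. A common approach, assumed here, is one-vs-rest: each subtype's AUC is the Mann-Whitney probability that a random case of that subtype receives a higher predicted probability for it than a random case of any other subtype. The function names, class labels and probabilities are hypothetical.

```python
def auc_mann_whitney(scores, is_positive):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, true_labels, classes):
    """Per-class AUC: column i of prob_rows scores classes[i] against all others."""
    return {
        c: auc_mann_whitney([row[i] for row in prob_rows],
                            [label == c for label in true_labels])
        for i, c in enumerate(classes)
    }


# Hypothetical softmax outputs for four cases and three subtypes.
classes = ["FL", "MCL", "DLBCL"]
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],  # a DLBCL case misclassified as MCL
    [0.6, 0.3, 0.1],
]
truth = ["FL", "MCL", "DLBCL", "FL"]
print(one_vs_rest_auc(probs, truth, classes))
```

In this toy run the misclassified DLBCL case lowers the MCL and DLBCL AUCs while FL stays perfectly separated, which is how a per-subtype range such as 0.87 to 0.99 can arise.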

PCN10 MACHINE LEARNING PREDICTION OF SURVIVAL IN DIFFUSE LARGE B-CELL LYMPHOMA BASED ON GENE-EXPRESSION PROFILING

Value in Health, 2020, Vol 23, pp. S23-S24. DOI: 10.1016/j.jval.2020.04.1517

Author(s): S. Merdan, K. Subramanian, T. Ayer, J. Weyenbergh, J. Koff, ...

Keyword(s): Gene Expression, Machine Learning, B Cell, Gene Expression Profiling, Expression Profiling, Cell Lymphoma, B Cell Lymphoma, Large B Cell Lymphoma, Prediction Of Survival, Large B Cell

Prognostic Efficacy of the RTN1 Gene in Patients with Diffuse Large B-Cell Lymphoma

2021. DOI: 10.21203/rs.2.11441/v2

Author(s): Mohamad Zamani-Ahmadmahmudi, Seyed Mahdi Nassiri, Amir Asadabadi

Keyword(s): Gene Expression, Clinical Outcome, Gene Expression Data, Proportional Hazards, Cell Lymphoma, B Cell Lymphoma, Cox Proportional Hazards, Expression Data, Large B Cell Lymphoma, Large B Cell

Abstract Gene expression profiling has been widely used to identify genes that can predict clinical outcome in patients with diverse cancers, including diffuse large B-cell lymphoma (DLBCL). With the aid of bioinformatics and computational analysis of gene expression data, various prognostic gene signatures for DLBCL have been developed in recent years. The major drawback of these signatures is that they fail to predict survival correctly in external datasets; in other words, they are not reproducible. Hence, in this study, we sought to identify the gene(s) that can reproducibly and robustly predict survival in patients with DLBCL. Gene expression data were extracted from 7 datasets containing 1636 patients (GSE10846 [n=420], GSE31312 [n=470], GSE11318 [n=203], GSE32918 [n=172], GSE4475 [n=123], GSE69051 [n=157], and GSE34171 [n=91]). Genes significantly associated with overall survival were detected using univariate Cox proportional hazards analysis with a P value <0.001 and a false discovery rate (FDR) <5%. Thereafter, the significant genes common to all the datasets were extracted. Additionally, chromosomal aberrations in the genomic region of the final common gene(s) were evaluated as copy number alterations using single nucleotide polymorphism (SNP) data from 570 patients with DLBCL (GSE58718 [n=242], GSE57277 [n=148], and GSE34171 [n=180]). Our results indicated that reticulon family gene 1 (RTN1) was the only gene that met our rigorous pipeline criteria and was associated with a favorable clinical outcome in all the datasets (P<0.001, FDR<5%). In the multivariate Cox proportional hazards analysis, this gene remained independent of the routine International Prognostic Index components (i.e., age, stage, lactate dehydrogenase level, Eastern Cooperative Oncology Group [ECOG] performance status, and number of extranodal sites) (P<0.0001).
Furthermore, no significant chromosomal aberration was found in the RTN1 genomic region (14q23.1: Start 59,595,976/ End 59,870,966).
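
The gene-selection step above combines a raw P-value cutoff, an FDR cutoff and an intersection across cohorts. The abstract does not name the FDR procedure; the sketch below assumes the Benjamini-Hochberg method, the most common choice. The function names, all p-values, and the gene names other than RTN1 are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(pvals_by_gene, p_cut=0.001, fdr_cut=0.05):
    """Genes passing both the raw P < 0.001 and the FDR < 5% thresholds."""
    genes = list(pvals_by_gene)
    qvals = benjamini_hochberg([pvals_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, qvals)
            if pvals_by_gene[g] < p_cut and q < fdr_cut}


def common_significant(datasets):
    """Genes significant in every dataset (set intersection)."""
    return set.intersection(*(significant_genes(d) for d in datasets))


# Hypothetical univariate Cox p-values for two cohorts.
cohort_1 = {"RTN1": 1e-5, "GENE_A": 1e-4, "GENE_B": 0.2}
cohort_2 = {"RTN1": 1e-6, "GENE_A": 0.01, "GENE_B": 1e-5}
print(common_significant([cohort_1, cohort_2]))  # {'RTN1'}
```

Requiring significance in every cohort is deliberately strict: GENE_A and GENE_B each pass in one cohort only, so only the gene that replicates everywhere survives.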

Prognostic Efficacy of the RTN1 Gene in Patients with Diffuse Large B-Cell Lymphoma

2019. DOI: 10.21203/rs.2.11441/v1

Author(s): Mohamad Zamani-Ahmadmahmudi, Fatemeh Soltani-Nezhad, Amir Asadabadi

Keyword(s): Gene Expression, Clinical Outcome, Proportional Hazards, Cell Lymphoma, B Cell Lymphoma, Cox Proportional Hazards, Expression Data, Survival Group, Large B Cell Lymphoma, Large B Cell

Abstract Background: Gene expression profiling has been widely used to identify genes that can predict clinical outcome in patients with diverse cancers, including diffuse large B-cell lymphoma (DLBCL). With the aid of bioinformatics and computational analysis of gene expression data, various prognostic gene signatures for DLBCL have been developed in recent years. The major drawback of these signatures is that they fail to predict survival correctly in external datasets; in other words, they are not reproducible. Hence, in this study, we sought to identify the gene(s) that can reproducibly and robustly predict survival in patients with DLBCL. Methods: Gene expression data were extracted from 7 datasets containing 1636 patients (GSE10846 [n=420], GSE31312 [n=470], GSE11318 [n=203], GSE32918 [n=172], GSE4475 [n=123], GSE69051 [n=157], and GSE34171 [n=91]). Genes significantly associated with overall survival were detected using univariate Cox proportional hazards analysis with a P value <0.001 and a false discovery rate (FDR) <5%. Thereafter, the significant genes common to all the datasets were extracted. Additionally, chromosomal aberrations in the genomic region of the final common gene(s) were evaluated as copy number alterations using single nucleotide polymorphism (SNP) data from 570 patients with DLBCL (GSE58718 [n=242], GSE57277 [n=148], and GSE34171 [n=180]). The results were experimentally confirmed using quantitative real-time PCR (qRT-PCR) analysis. Results: Our results indicated that reticulon family gene 1 (RTN1) was the only gene that met our rigorous pipeline criteria and was associated with a favorable clinical outcome in all the datasets (P<0.001, FDR<5%).
In the multivariate Cox proportional hazards analysis, this gene remained independent of the routine International Prognostic Index components (i.e., age, stage, lactate dehydrogenase level, Eastern Cooperative Oncology Group [ECOG] performance status, and number of extranodal sites) (P<0.0001). Our experimental step confirmed these results and revealed that RTN1 expression in the long-survival group was significantly higher than in the short-survival group. Furthermore, no significant chromosomal aberration was found in the RTN1 genomic region (14q23.1: Start 59,595,976/ End 59,870,966). Conclusion: In light of the results of the present study, RTN1 can be considered a potential prognostic gene that can robustly predict survival in patients with DLBCL.

Classification of diffuse large B cell lymphoma gene expression data based on two-layer particle swarm optimization

2013 10th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2013. DOI: 10.1109/fskd.2013.6816234

Author(s): Yajie Liu, Xinling Shi, Guoliang Huang, Baolei Li, Lei Zhao

Keyword(s): Gene Expression, Particle Swarm Optimization, Gene Expression Data, Cell Lymphoma, B Cell Lymphoma, Expression Data, Swarm Optimization, Large B Cell Lymphoma, Large B Cell

Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning

Nature, 2001. DOI: 10.1038/news011227-7

Cited by ~1

Keyword(s): Gene Expression, Machine Learning, B Cell, Expression Profiling, Outcome Prediction, Cell Lymphoma, B Cell Lymphoma, Supervised Machine Learning, Large B Cell Lymphoma, Large B Cell

Microarray gene expression data with linked survival phenotypes: diffuse large-B-cell lymphoma revisited

Biostatistics, 2005, Vol 7 (2), pp. 268-285. DOI: 10.1093/biostatistics/kxj006

Cited by ~55

Author(s): Mark R. Segal

Keyword(s): Gene Expression, Gene Expression Data, Cell Lymphoma, B Cell Lymphoma, Microarray Gene Expression Data, Expression Data, Microarray Gene Expression, Large B Cell Lymphoma, Microarray Gene, Large B Cell

Machine learning prediction of survival in diffuse large B-cell lymphoma based on gene-expression profiling.

Journal of Clinical Oncology, 2020, Vol 38 (15_suppl), pp. 8047-8047. DOI: 10.1200/jco.2020.38.15_suppl.8047

Author(s): Selin Merdan, Kritika Subramanian, Turgay Ayer, Jean Louise Koff, Andres Chang, ...

Keyword(s): Gene Expression, Machine Learning, Prediction Model, B Cell, Gene Expression Profiling, Expression Profiling, Cell Lymphoma, B Cell Lymphoma, Large B Cell Lymphoma, Large B Cell

Background: The current clinical risk stratification of Diffuse Large B-cell Lymphoma (DLBCL) relies on the International Prognostic Index (IPI), which comprises a limited number of clinical variables and is imperfect at identifying high-risk disease. Our study aimed to: (1) develop a risk prediction model based on genetic and clinical features; and (2) evaluate the model's biological implications in association with the estimated profiles of immune infiltration. Methods: Gene-expression profiling was performed on 718 patients with DLBCL for whom RNA sequencing data and clinical covariates had been made available by Reddy et al. (2017). Unsupervised and supervised machine learning methods were used to discover and select the best set of survival-associated gene signatures for prediction. A multivariate survival model based on these signatures was constructed in the training set and validated in an independent test set. The composition of tumor-infiltrating immune cells was estimated by deconvolution analysis with CIBERSORT. Results: A four-gene-signature-based score was developed that separated patients into high- and low-risk groups with a significant difference in survival in the training, validation and complete cohorts (p < 0.001), independently of the IPI. Combining the gene-expression-based score with the IPI improved discrimination in the validation and complete sets: in the validation set, the area under the curve at 2 and 5 years increased from 0.71 and 0.69 to 0.75 and 0.74, respectively. Conclusions: By analyzing the gene-expression data with a systematic approach, we developed and validated a risk prediction model that outperforms existing risk assessment methods.
Our study, which integrated the profiles of immune infiltration with prognostic prediction, revealed important associations that could help identify patients who may benefit from different therapeutic interventions and highlight possible targets for new drugs.
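
The abstract does not list the four signature genes, their weights, the cutoff that defines high and low risk, or the test behind p < 0.001. Assuming a weighted linear score, a median cutoff and a two-group log-rank test, the dichotomization step can be sketched as below. The function names, gene names GENE_1 to GENE_4, weights, scores and survival times are all hypothetical.

```python
import math
from statistics import median


def risk_score(expression, weights):
    """Weighted sum of signature-gene expression (hypothetical linear score)."""
    return sum(w * expression[gene] for gene, w in weights.items())


def split_by_median(scores):
    """1 = high risk (score above the median), 0 = low risk."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        n = sum(1 for u in times if u >= t)  # at risk overall
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g)  # at risk, high-risk group
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # deaths at t
        d1 = sum(1 for u, e, g in zip(times, events, groups) if u == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


weights = {"GENE_1": 0.8, "GENE_2": 0.5, "GENE_3": -0.4, "GENE_4": 0.3}
example_patient = {"GENE_1": 1.0, "GENE_2": 2.0, "GENE_3": 0.5, "GENE_4": 1.0}
print(risk_score(example_patient, weights))

# Hypothetical precomputed scores and survival for eight patients.
scores = [2.1, 1.8, 1.7, 1.5, 0.4, 0.3, 0.2, 0.1]
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = split_by_median(scores)
chi2, p = logrank_test(times, events, groups)
print(groups, f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The median split is only one option; an optimized cutoff chosen on the training set would need to be fixed before being applied to the validation set to avoid optimistic p-values.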
