Medical Survival Analysis Through Transduction of Semi-Supervised Regression Targets

Author(s): Faisal M. Khan, Qiuhua Liu

A crucial challenge in predictive modeling for survival analysis applications such as medical prognosis is accounting for censored observations in the data. While these time-to-event predictions inherently represent a regression problem, traditional regression approaches are challenged by the censored nature of the data. In such problems the true target times of a majority of instances are unknown; what is known is a censored target representing some indeterminate time before the true target time. While censored samples can be considered as semi-supervised targets, the current limited efforts in semi-supervised regression do not take into account the partial nature of unsupervised information; samples are treated as either fully labeled or unlabeled. This paper presents a novel semi-supervised learning approach in which the true target times are approximated from the censored times through transduction. The method can be employed to adapt traditional regression methods for survival analysis or to enhance existing state-of-the-art survival analysis methods for improved predictive performance. The proposed approach represents one of the first applications of semi-supervised regression to survival analysis and yields a significant improvement in performance over the state of the art in prostate and breast cancer prognosis applications.
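The abstract does not spell out the transduction procedure, but a minimal sketch of the general idea, assuming Python with scikit-learn, would fit a regressor on the current targets and iteratively raise each censored target to the larger of its censoring time and the model's prediction. Function and variable names below are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def transduce_censored_targets(X, t, observed, n_rounds=5):
    """Iteratively approximate true event times for censored samples.

    X        : (n, d) feature matrix
    t        : (n,) observed times (event time if observed, censoring time otherwise)
    observed : (n,) boolean, True where the event was observed
    """
    t_hat = t.astype(float).copy()
    model = GradientBoostingRegressor()
    for _ in range(n_rounds):
        # Fit on the current targets: uncensored targets stay fixed, censored ones evolve.
        model.fit(X, t_hat)
        pred = model.predict(X)
        # A censored time is a lower bound on the true time, so the transduced
        # target is never allowed to fall below it.
        t_hat[~observed] = np.maximum(t[~observed], pred[~observed])
    return t_hat, model

# Example usage with synthetic right-censored data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_t = np.exp(X[:, 0]) + rng.gamma(2.0, 1.0, size=200)
censor = rng.uniform(0.5, 5.0, size=200)
observed = true_t <= censor
t = np.where(observed, true_t, censor)
t_hat, reg = transduce_censored_targets(X, t, observed)
```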

2004, Vol 1 (1), pp. 131-142
Author(s): Ljupčo Todorovski, Sašo Džeroski, Peter Ljubič

Both equation discovery and regression methods aim at inducing models of numerical data. While equation discovery methods are usually evaluated in terms of the comprehensibility of the induced model, the evaluation of regression methods emphasizes their predictive accuracy. In this paper, we present Ciper, an efficient method for the discovery of polynomial equations, and empirically evaluate its predictive performance on standard regression tasks. The evaluation shows that polynomials compare favorably, in terms of degree of fit and complexity, to linear and piecewise regression models induced by existing state-of-the-art regression methods.
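Ciper's search procedure is not described in the abstract; as a rough, hypothetical illustration of the trade-off the evaluation measures (degree of fit versus complexity of a polynomial model), one can fit polynomials of increasing degree with scikit-learn and count the resulting terms:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 2))
y = 1.5 * X[:, 0] ** 2 - 0.8 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 300)

# Fit polynomial models of increasing degree and compare fit against complexity.
for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    mse = mean_squared_error(y, model.predict(X))
    n_terms = model.named_steps["polynomialfeatures"].n_output_features_
    print(f"degree={degree}  terms={n_terms}  training MSE={mse:.4f}")
```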


2020, Vol 41 (Supplement_2)
Author(s): S Rao, Y Li, R Ramakrishnan, A Hassaine, D Canoy, ...

Abstract
Background/Introduction: Predicting incident heart failure has been challenging. Deep learning models applied to rich electronic health records (EHR) offer some theoretical advantages. However, empirical evidence for their superior performance is limited and they commonly remain uninterpretable, hampering their wider use in medical practice.
Purpose: We developed a deep learning framework for more accurate and yet interpretable prediction of incident heart failure.
Methods: We used longitudinally linked EHR from practices across England, involving 100,071 patients, 13% of whom had been diagnosed with incident heart failure during follow-up. We investigated the predictive performance of a novel transformer deep learning model, "Transformer for Heart Failure" (BEHRT-HF), and validated it using both an external held-out dataset and an internal five-fold cross-validation mechanism, using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Predictor groups included all outpatient and inpatient diagnoses within their temporal context, medications, age, and calendar year for each encounter. Treating diagnoses as anchors, we alternately removed the other modalities (ablation study) to understand the importance of individual modalities to the performance of incident heart failure prediction. Using perturbation-based techniques, we investigated the importance of associations between selected predictors and heart failure to improve model interpretability.
Results: BEHRT-HF achieved high accuracy with AUROC 0.932 and AUPRC 0.695 for external validation, and AUROC 0.933 (95% CI: 0.928, 0.938) and AUPRC 0.700 (95% CI: 0.682, 0.718) for internal validation. Compared to the state-of-the-art recurrent deep learning model, RETAIN-EX, BEHRT-HF outperformed it by 0.079 and 0.030 in terms of AUPRC and AUROC, respectively. The ablation study showed that medications were strong predictors, and calendar year was more important than age. Utilising perturbation, we identified and ranked the intensity of associations between diagnoses and heart failure. For instance, the method showed that established risk factors, including myocardial infarction, atrial fibrillation and flutter, and hypertension, were all strongly associated with the heart failure prediction. Additionally, when the population was stratified into different age groups, incident occurrence of a given disease generally contributed more to heart failure prediction at younger ages than when diagnosed later in life.
Conclusions: Our state-of-the-art deep learning framework outperforms the predictive performance of existing models whilst enabling a data-driven way of exploring the relative contribution of a range of risk factors in the context of other temporal information.
Funding Acknowledgement: Type of funding source: Private grant(s) and/or Sponsorship. Main funding source(s): National Institute for Health Research, Oxford Martin School, Oxford Biomedical Research Centre.
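As a generic illustration of the two validation metrics reported above (not the authors' pipeline), AUROC and AUPRC can be computed with scikit-learn; the labels and scores below are random stand-ins for the incident-heart-failure outcome and a model's risk scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=1000)       # hypothetical outcome labels
scores = rng.uniform(size=1000)         # hypothetical model risk scores

auroc = roc_auc_score(y, scores)
auprc = average_precision_score(y, scores)  # common estimator of the PR-curve area
print(f"AUROC={auroc:.3f}  AUPRC={auprc:.3f}")

# Five-fold evaluation of the same metric, mirroring an internal cross-validation:
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auroc = [roc_auc_score(y[test], scores[test])
              for _, test in cv.split(scores.reshape(-1, 1), y)]
print("per-fold AUROC:", np.round(fold_auroc, 3))
```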


2021, Vol ahead-of-print (ahead-of-print)
Author(s): Minh Thanh Vo, Anh H. Vo, Tuong Le

Purpose: Medical images are increasingly popular; therefore, deep-learning-based analysis of these images to help diagnose diseases is becoming more and more essential. Recently, the shoulder implant X-ray image classification (SIXIC) dataset, which includes X-ray images of implanted shoulder prostheses produced by four manufacturers, was released. Detecting the implant's model helps to select the correct equipment and procedures for the upcoming surgery.
Design/methodology/approach: This study proposes a robust model named X-Net to improve predictive performance for shoulder implant X-ray image classification on the SIXIC dataset. The X-Net model utilizes a Squeeze-and-Excitation (SE) block integrated into a Residual Network (ResNet) module. The SE module weights each feature map extracted by ResNet, which aids in improving performance. The feature extraction process of the X-Net model is performed by both modules, ResNet and SE. The final feature is obtained by combining the features extracted in the above steps, which captures more important characteristics of the X-ray images in the input dataset. Next, X-Net uses this fine-grained feature to classify the input images into four classes (Cofield, Depuy, Zimmer and Tornier) in the SIXIC dataset.
Findings: Experiments are conducted to show the proposed approach's effectiveness compared with other state-of-the-art methods for SIXIC. The experimental results indicate that the approach outperforms the other experimental methods on several performance metrics. In addition, the proposed approach provides new state-of-the-art results in all performance metrics, such as accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset.
Originality/value: The proposed method, with its high predictive performance, can be used to assist in the treatment of injured shoulder joints.
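The abstract does not give the exact X-Net architecture, but a minimal sketch of a standard squeeze-and-excitation block applied to ResNet-style feature maps (assuming PyTorch; shapes and names are illustrative, not the paper's code) looks like this:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block: global-average-pool each channel,
    pass the channel summary through a small bottleneck MLP, and rescale the maps."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))            # squeeze: (b, c) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)   # excitation: per-channel weights in (0, 1)
        return x * w                      # reweight the feature maps

# Example: reweight the output of a ResNet stage and classify into 4 implant types.
features = torch.randn(8, 256, 14, 14)    # hypothetical ResNet feature maps
se = SEBlock(256)
pooled = se(features).mean(dim=(2, 3))    # global pooling of reweighted maps
logits = nn.Linear(256, 4)(pooled)        # Cofield, Depuy, Zimmer, Tornier
```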


2020, pp. 181-218
Author(s): Bendix Carstensen

This chapter describes survival analysis. Survival analysis concerns data where the outcome is a length of time, namely the time from inclusion in the study (such as diagnosis of some disease) until death or some other event; hence the term 'time-to-event analysis' is also used. Two primary targets are normally addressed in survival analysis: survival probabilities and event rates. The chapter then looks at the life-table estimator of the survival function and the Kaplan–Meier estimator of survival. It also considers the Cox model and its relationship with Poisson models, as well as the Fine–Gray approach to competing risks.
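For orientation, a minimal sketch of the Kaplan–Meier and Cox estimators mentioned above, using the Python lifelines package with invented toy data (the chapter itself may use different software and data):

```python
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

# Hypothetical toy cohort: follow-up time in months, event indicator (1 = death),
# and one covariate; real analyses would use a proper dataset.
df = pd.DataFrame({
    "time":  [5, 8, 12, 12, 20, 24, 30, 36],
    "event": [1, 0, 1, 1, 0, 1, 0, 1],
    "age":   [70, 65, 72, 68, 60, 75, 58, 80],
})

# Non-parametric survival curve (Kaplan-Meier estimator).
kmf = KaplanMeierFitter()
kmf.fit(durations=df["time"], event_observed=df["event"])
print(kmf.survival_function_)

# Semi-parametric regression on the hazard (Cox proportional hazards model).
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()
```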


Plants, 2020, Vol 9 (5), pp. 617
Author(s): Alessandro Romano, Piergiorgio Stevanato

Germination data are analyzed by several methods, which can be mainly classified as germination indexes and traditional regression techniques that fit non-linear parametric functions to the temporal sequence of cumulative germination. However, due to the nature of germination data, which often differ from other biological data, the abovementioned methods may have some limits, especially when ungerminated seeds are present at the end of an experiment. A class of methods that could address these issues is the so-called "time-to-event analysis", better known in other scientific fields as "survival analysis" or "reliability analysis". There is relatively little literature about the application of these methods to germination data, and some reviews dealt only with parts of the possible approaches, such as either non-parametric and semi-parametric or parametric ones. The present study aims to contribute to the knowledge about the reliability of these methods by applying all the main approaches to the same germination data from sugar beet (Beta vulgaris L.) seed cohorts. The results obtained confirmed that, although the different approaches present advantages and disadvantages, they can generally represent a valuable tool for analyzing germination data, providing parameters whose usefulness depends on the purpose of the research.
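As a hypothetical sketch of the time-to-event treatment of germination data (not the study's actual analysis), ungerminated seeds can be handled as right-censored and inspection intervals as interval-censored observations; the Python lifelines package offers one way to fit a parametric (Weibull) model to such data, with inspection days invented here for illustration:

```python
import numpy as np
import pandas as pd
from lifelines import WeibullFitter

# Hypothetical germination test: dishes are inspected on fixed days, so a seed
# germinating between day 3 and day 5 is interval-censored in (3, 5]; seeds
# still ungerminated at the last inspection (day 14) are right-censored (upper = inf).
records = pd.DataFrame({
    "lower": [1, 1, 3, 3, 5, 7, 14, 14],
    "upper": [3, 3, 5, 5, 7, 10, np.inf, np.inf],
})

wf = WeibullFitter()
wf.fit_interval_censoring(records["lower"], records["upper"])
print(wf.summary)                 # estimated scale (lambda_) and shape (rho_)
print(wf.median_survival_time_)   # estimated median germination time
```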


2019, Vol 40 (Supplement_1)
Author(s): L E Juarez-Orozco, J W Benjamins, T Maaniitty, A Saraste, P Van Der Harst, ...

Abstract
Background: Deep learning (DL) is revolutionizing cardiovascular medicine through complex data-pattern recognition. In spite of its success in the diagnosis of coronary artery disease (CAD), DL implementation for prognostic evaluation of cardiovascular events is still limited. Traditional survival models (e.g. Cox) notably incorporate the effect of time-to-event but are unable to exploit complex non-linear dependencies between large numbers of predictors. On the other hand, DL has not systematically incorporated time-to-event for prognostic evaluations. Long-term registries of hybrid PET/CT imaging represent a suitable substrate for DL-based survival analysis due to the large number of time-dependent structured variables that they convey. Therefore, we sought to evaluate the feasibility and performance of DL survival analysis in predicting the occurrence of myocardial infarction (MI) and death in a long-term registry of cardiac hybrid PET/CT.
Methods: Data from our PET/CT registry of symptomatic patients with intermediate CAD risk who underwent sequential CT angiography and 15O-water PET for suspected ischemia were analyzed. The sample has been followed for an average of 6 years for MI or death. Ten clinical variables were extracted from electronic records, including cardiovascular risk factors, dyspnea and early revascularization. CT angiography images were evaluated segmentally for presence of plaque, % of luminal stenosis and calcification (58 variables). Absolute stress PET myocardial perfusion data were evaluated globally and regionally across vascular territories (4 variables). Cox-Nnet (a deep survival neural network) was implemented in a 5-fold cross-validated 80:20 split for training and testing. The resulting DL hazard ratios were operationalized and compared to the observed events during follow-up. The performance of Cox-Nnet evaluating structured CT, PET/CT, and PET/CT+clinical variables was compared to expert interpretation (operationalized as: normal coronaries, non-obstructive CAD, obstructive CAD) and to calcium score (CaSc), through the concordance (c)-index.
Results: There were 426 men and 525 women with a mean age of 61±9 years. Twenty-four MIs and 49 deaths occurred during follow-up (1 month–9.6 years), while 11.5% of patients underwent early revascularization. Cox-Nnet evaluation of PET/CT data (c-index=0.75) outperformed categorical expert interpretation (c-index=0.54) and CaSc (c-index=0.65), while hybrid PET/CT and PET/CT+clinical (c-index=0.75) variables demonstrated incremental performance overall, independent of early revascularization.
Conclusion: Deep learning survival analysis is feasible in the evaluation of cardiovascular prognostic data. It might enhance the value of cardiac hybrid PET/CT imaging data for predicting the long-term development of myocardial infarction and death. Further research into the implementation of deep learning for prognostic analyses in CAD is warranted.
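A minimal sketch of the core ingredient used by deep survival networks of this kind, a negative Cox partial log-likelihood training loss, together with c-index evaluation, is shown below (assuming PyTorch and lifelines; the network, data, and names are illustrative, not the registry analysis):

```python
import torch
import torch.nn as nn
from lifelines.utils import concordance_index

def neg_cox_partial_log_likelihood(risk: torch.Tensor,
                                   time: torch.Tensor,
                                   event: torch.Tensor) -> torch.Tensor:
    """Negative Cox partial log-likelihood (no tie correction).

    risk  : (n,) predicted log-hazard scores from the network
    time  : (n,) follow-up times
    event : (n,) 1 if the event (MI/death) occurred, 0 if censored
    """
    order = torch.argsort(time, descending=True)   # sort so risk sets are prefixes
    risk, event = risk[order], event[order]
    log_cumsum = torch.logcumsumexp(risk, dim=0)   # log of summed exp(risk) over each risk set
    return -torch.sum((risk - log_cumsum) * event) / event.sum()

# Tiny illustrative network and one training step on random data.
net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
X = torch.randn(128, 10)
time = torch.rand(128) * 10
event = (torch.rand(128) < 0.3).float()

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss = neg_cox_partial_log_likelihood(net(X).squeeze(-1), time, event)
loss.backward()
opt.step()

# Discrimination measured with the concordance index (higher scores should
# predict longer survival, hence the negated risk).
with torch.no_grad():
    c = concordance_index(time.numpy(), -net(X).squeeze(-1).numpy(), event.numpy())
print(f"c-index={c:.2f}")
```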


2018, Vol 18 (3-4), pp. 322-345
Author(s): Moritz Berger, Matthias Schmid

Abstract: Time-to-event models are a popular tool to analyse data where the outcome variable is the time to the occurrence of a specific event of interest. Here, we focus on the analysis of time-to-event outcomes that are either intrinsically discrete or grouped versions of continuous event times. In the literature, there exists a variety of regression methods for such data. This tutorial provides an introduction to how these models can be applied using open source statistical software. In particular, we consider semiparametric extensions comprising the use of smooth nonlinear functions and tree-based methods. All methods are illustrated by data on the duration of unemployment of US citizens.
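A rough sketch of the standard discrete-time hazard formulation (person-period expansion followed by a binary regression on the hazard), rendered here in Python with invented unemployment durations, illustrates the data layout such models use; the tutorial's own examples rely on open-source statistical software and may differ in detail:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical grouped unemployment durations (in months) with an event flag
# (1 = re-employed, 0 = censored) and one covariate.
data = pd.DataFrame({
    "duration": [2, 3, 3, 5, 6, 6, 8],
    "event":    [1, 1, 0, 1, 1, 0, 1],
    "age":      [25, 40, 33, 52, 29, 47, 36],
})

# Person-period expansion: one row per subject per discrete period at risk.
rows = []
for _, r in data.iterrows():
    for t in range(1, int(r["duration"]) + 1):
        rows.append({
            "period": t,
            "age": r["age"],
            # The hazard indicator is 1 only in the period where the event occurs.
            "y": int(t == r["duration"] and r["event"] == 1),
        })
pp = pd.DataFrame(rows)

# Discrete-time hazard model: logistic regression of the event indicator on the
# period (here a simple numeric trend for brevity) and the covariate.
model = LogisticRegression().fit(pp[["period", "age"]], pp["y"])
print(model.coef_, model.intercept_)
```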


Entropy, 2020, Vol 22 (10), pp. 1143
Author(s): Zhenwu Wang, Tielin Wang, Benting Wan, Mengjie Han

Multi-label classification (MLC) is a supervised learning problem in which an object is naturally associated with multiple concepts because it can be described from various dimensions. How to exploit the resulting label correlations is the key issue in MLC problems. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels. However, CC suffers from two obvious drawbacks: (1) the label ordering is decided at random, although it usually has a strong effect on predictive performance; (2) all labels are inserted into the chain, although some of them may carry irrelevant information that interferes with the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits the label correlations between the label and feature spaces and thus solves the two previously mentioned problems simultaneously. In the PCC-FS algorithm, feature selection is performed by learning the covariance between the feature set and the label set, thus eliminating irrelevant features that can diminish classification performance. Couplings in the label set are extracted, and the coupled labels of each label are inserted simultaneously into the chain structure to perform training and prediction. The experimental results on five metrics demonstrate that, in comparison to eight state-of-the-art MLC algorithms, the proposed method achieves a significant improvement over existing multi-label classification methods.
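As a baseline illustration of the classifier-chain idea that PCC-FS builds on (not the PCC-FS algorithm itself), scikit-learn's ClassifierChain with a random label order can be run on synthetic multi-label data:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic multi-label data: each instance can carry several labels at once.
X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

# A classifier chain feeds earlier labels' predictions as extra features to
# later classifiers, so the (random) label order matters, which is one of the
# drawbacks PCC-FS is designed to address.
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=1)
chain.fit(X_tr, Y_tr)
print("micro-F1:", f1_score(Y_te, chain.predict(X_te), average="micro"))
```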


2017, Vol 2017, pp. 1-9
Author(s): Yanhua Wu, Zhifang Jia, Donghui Cao, Chuan Wang, Xing Wu, ...

Gastric cancer (GC) is one of the most prominent global cancer-related health threats. Genes play a key role in the precise mechanisms of gastric cancer. SNPs in miRNAs can affect mRNA expression and in turn affect the risk and prognosis of GC. First, we performed a case-control study that included 897 GC patients and 992 controls to evaluate the association of the miR-219-1 rs213210, miR-938 rs2505901, miR-34b/c rs4938723, and miR-218 rs11134527 polymorphisms with gastric cancer susceptibility. Second, among the 897 GC patients above, 755 cases who underwent a radical operation, without distant metastasis and with negative surgical margins, were included in the survival analysis to evaluate the association of the four SNPs above with gastric cancer prognosis. The C/T or C/C genotypes of rs213210 were related to a lower GC risk (OR = 0.76, 95% CI: 0.62–0.93, P = 0.009) compared to the T/T genotype. Rs11134527 in miR-218 was associated with GC survival, and the G/A and G/G genotypes of rs11134527 resulted in a decreased risk of death compared with the A/A genotype (HR = 0.75, 95% CI: 0.61–0.95, P = 0.016). This study found that the miR-219-1 rs213210 polymorphism was associated with GC susceptibility and that rs11134527 in miR-218 was positively correlated with GC prognosis.
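The abstract reports odds ratios and hazard ratios with 95% confidence intervals; as a purely illustrative sketch, an odds ratio and its Wald confidence interval can be computed from a 2x2 genotype-by-status table (the counts below are invented, not the study's data):

```python
import numpy as np

# Hypothetical 2x2 table: carriers of the protective genotype (C/T or C/C)
# versus T/T, in cases and controls. Counts are invented for illustration only.
#                    carrier  non-carrier
cases    = np.array([300,     597])
controls = np.array([400,     592])

a, b = cases      # exposed cases, unexposed cases
c, d = controls   # exposed controls, unexposed controls

odds_ratio = (a * d) / (b * c)
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)   # Wald standard error on the log scale
ci_low, ci_high = np.exp(np.log(odds_ratio) + np.array([-1.96, 1.96]) * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```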

