Cox-sMBPLS: An Algorithm for Disease Survival Prediction and Multi-Omics Module Discovery Incorporating Cis-Regulatory Quantitative Effects

2021 ◽  
Vol 12 ◽  
Author(s):  
Nasim Vahabi ◽  
Caitrin W. McDonough ◽  
Ankit A. Desai ◽  
Larisa H. Cavallari ◽  
Julio D. Duarte ◽  
...  

Background The development of high-throughput techniques has enabled profiling large numbers of biomolecules across multiple molecular compartments. The challenge then becomes to integrate such multimodal Omics data to gain insights into biological processes and into disease onset and progression mechanisms. Further, given the high dimensionality of such data, incorporating prior biological information on interactions between molecular compartments when developing statistical models for data integration is beneficial, especially in settings involving a small number of samples. Results We develop a supervised model for time-to-event data (e.g., death, biochemical recurrence) that simultaneously accounts for redundant information within Omics profiles and leverages prior biological associations between them through a multi-block PLS framework. The interactions between data from different molecular compartments (e.g., epigenome, transcriptome, methylome) were captured by using cis-regulatory quantitative effects in the proposed model. The model, coined Cox-sMBPLS, exhibits superior prediction performance and improved feature selection based on both simulation studies and analysis of data from heart failure patients. Conclusion The proposed supervised Cox-sMBPLS model can effectively incorporate prior biological information into the survival prediction system, leading to improved prediction performance and feature selection. It also enables the identification of multi-Omics modules of biomolecules that impact the patients' survival probability, and it provides insights into potential relevant risk factors that merit further investigation.
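As a rough illustration of the integration strategy described above (not the Cox-sMBPLS algorithm itself), the sketch below reduces each Omics block to a few PLS components and feeds the pooled components into a Cox proportional hazards model; the libraries (scikit-learn, lifelines), the synthetic data, and the use of log survival time as a PLS target are simplifying assumptions.

```python
# Illustrative sketch only: block-wise PLS reduction followed by a Cox model.
# This is NOT the Cox-sMBPLS algorithm; the cis-regulatory weighting and the
# supervised multi-block deflation steps of the paper are omitted.
import numpy as np
import pandas as pd
from sklearn.cross_decomposition import PLSRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 120
blocks = {                                   # three hypothetical Omics blocks
    "snp":  rng.normal(size=(n, 200)),
    "expr": rng.normal(size=(n, 300)),
    "meth": rng.normal(size=(n, 150)),
}
time = rng.exponential(scale=5.0, size=n)    # survival times
event = rng.integers(0, 2, size=n)           # 1 = event observed, 0 = censored

# Reduce each block to a few latent components; log(time) is used here as a
# crude PLS target purely for illustration (it ignores censoring).
components = []
for name, X in blocks.items():
    pls = PLSRegression(n_components=3).fit(X, np.log(time))
    scores = pls.transform(X)
    components.append(pd.DataFrame(scores, columns=[f"{name}_c{i}" for i in range(3)]))

df = pd.concat(components, axis=1)
df["time"], df["event"] = time, event

# Fit a Cox proportional hazards model on the pooled latent components.
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "p"]])
```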

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Yuto Sugai ◽  
Noriyuki Kadoya ◽  
Shohei Tanaka ◽  
Shunpei Tanabe ◽  
Mariko Umeda ◽  
...  

Abstract Background Radiomics is a new technology for noninvasively predicting survival prognosis from quantitative features extracted from medical images. Most radiomics-based prognostic studies of non-small-cell lung cancer (NSCLC) patients have used mixed datasets of different subgroups. Therefore, we investigated radiomics-based survival prediction of NSCLC patients by focusing on subgroups with identical characteristics. Methods A total of 304 NSCLC (Stage I–IV) patients treated with radiotherapy at our hospital were included. We extracted 107 radiomic features (i.e., 14 shape features, 18 first-order statistical features, and 75 texture features) from the gross tumor volume drawn on the free-breathing planning computed tomography image. Three feature selection methods [i.e., test–retest and multiple segmentation (FS1), Pearson's correlation analysis (FS2), and a method combining FS1 and FS2 (FS3)] were used to clarify how they affect survival prediction performance. The subgroup analyses for each histological subtype and each T stage applied the selection method that performed best in the analysis of All data. We used a least absolute shrinkage and selection operator Cox regression model for all analyses and evaluated prognostic performance using the concordance index (C-index) and the Kaplan–Meier method. For subgroup analysis, fivefold cross-validation was applied to ensure model reliability. Results In the analysis of All data, the C-index for the test dataset was 0.62 (FS1), 0.63 (FS2), and 0.62 (FS3). The subgroup analysis indicated that prediction models based on specific histological subtypes and T stages had a higher C-index for the test dataset than the model based on All data (All data, 0.64 vs. SCCall, 0.60; ADCall, 0.69; T1, 0.68; T2, 0.65; T3, 0.66; T4, 0.70). In addition, the prediction models built for each T stage within each histological subtype showed different trends in the test-dataset C-index between ADC-related and SCC-related models (ADCT1–ADCT4, 0.72–0.83; SCCT1–SCCT4, 0.58–0.71). Conclusions Our results showed that feature selection methods moderately affected survival prediction performance. In addition, prediction models based on specific subgroups may improve prediction performance. These results may prove useful for determining the optimal radiomics-based prediction model.
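The workflow sketched below illustrates the Pearson-correlation filtering and LASSO-penalized Cox modeling evaluated in this study, using lifelines on synthetic data; the correlation threshold, penalty strength, and train/test split are assumptions, not the study's settings.

```python
# Illustrative sketch: Pearson-correlation pruning of redundant radiomic
# features, then a LASSO-penalized Cox model evaluated by the C-index.
# Thresholds and the synthetic data are assumptions, not the study's settings.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(1)
n, p = 300, 107                                   # patients x radiomic features
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"feat_{i}" for i in range(p)])
time = rng.exponential(scale=24.0, size=n)        # months
event = rng.integers(0, 2, size=n)

# FS2-style step: drop one of every pair of features with |r| > 0.8.
corr = X.corr().abs()
keep = []
for col in corr.columns:
    if all(corr.loc[col, k] <= 0.8 for k in keep):
        keep.append(col)

df = X[keep].copy()
df["time"], df["event"] = time, event
train, test = df.iloc[:200], df.iloc[200:]

# LASSO-penalized Cox model (l1_ratio=1.0 gives a pure L1 penalty).
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(train, duration_col="time", event_col="event")

# Concordance index on the held-out patients (higher is better).
risk = cph.predict_partial_hazard(test)
cindex = concordance_index(test["time"], -risk, test["event"])
print(f"test C-index: {cindex:.3f}")
```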


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1505 ◽  
Author(s):  
Michail Tsagris ◽  
Ioannis Tsamardinos

Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as R packages, and those typically offer few options. The R package MXM offers a variety of feature selection algorithms and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time-to-event (survival), binary, nominal, ordinal, clustered, counts, left-censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time-to-event data the user can choose among Cox, Weibull, log-logistic, or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features; plainly speaking, two features carry statistically equivalent information when substituting one for the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data that cannot be loaded into R (on a machine with 16 GB of RAM, for example, R cannot directly load a 16 GB dataset; by utilizing the proper package, the data can be loaded and feature selection then performed). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM's algorithms using real high-dimensional data from various applications.
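Since MXM is an R package, the Python sketch below only mimics the flavour of a basic forward feature-selection loop for time-to-event data with a Cox model; it is not MXM's API and omits its equivalence-set detection and out-of-memory capabilities.

```python
# Naive forward feature selection for survival data with a Cox model, meant
# only to illustrate the kind of task MXM automates in R. Not MXM's API.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n, p = 200, 30
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{i}" for i in range(p)])
time = rng.exponential(scale=10.0 * np.exp(-0.5 * X["x0"]))   # x0 truly matters
event = rng.integers(0, 2, size=n)
df = X.assign(time=time, event=event)

selected, alpha = [], 0.05
while True:
    best_feat, best_p = None, alpha
    for feat in X.columns.difference(selected):
        cols = selected + [feat, "time", "event"]
        cph = CoxPHFitter().fit(df[cols], duration_col="time", event_col="event")
        pval = cph.summary.loc[feat, "p"]
        if pval < best_p:
            best_feat, best_p = feat, pval
    if best_feat is None:
        break                       # no remaining feature is significant
    selected.append(best_feat)

print("selected features:", selected)
```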


2003 ◽  
Vol 82 (5) ◽  
pp. 345-349 ◽  
Author(s):  
C.F. Spiekerman ◽  
P.P. Hujoel ◽  
T.A. DeRouen

Non-causal associations between periodontitis and systemic diseases may be spuriously induced by smoking because of its strong relationship to both. The goal of this study was to evaluate whether adjustment for self-reported smoking removes tobacco-related confounding and eliminates such spurious associations. Using NHANES III data, we evaluated associations between attachment loss and serum cotinine after adjustment for the self-reported number of cigarettes smoked. Cotinine, a metabolite of nicotine, should not be related to attachment loss if self-reported smoking captures the effect of tobacco on attachment levels. Adjustment for self-reported cigarette smoking did not completely remove the correlation between attachment loss and serum cotinine level (r = 0.075, n = 1507, p = 0.003). Simulation studies indicated similar results for time-to-event data. These findings demonstrate the difficulty in distinguishing the effects of periodontitis from those of smoking with respect to a smoking-related outcome. Future studies should report results of analyses on separate subcohorts of never-smokers and smokers.
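The sketch below illustrates the adjustment logic tested in the study: regress attachment loss and cotinine on self-reported cigarette consumption and correlate the residuals; if self-report fully captured tobacco exposure, the residual correlation would be near zero. The variable names and synthetic data are assumptions.

```python
# Sketch of the adjustment logic: correlate attachment loss and cotinine after
# removing the linear effect of self-reported cigarettes per day from each.
# Synthetic data; a non-zero residual correlation mimics residual confounding.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1500
true_exposure = rng.gamma(shape=2.0, scale=5.0, size=n)          # actual tobacco dose
self_report = true_exposure + rng.normal(scale=4.0, size=n)      # noisy self-report
cotinine = 10 * true_exposure + rng.normal(scale=20.0, size=n)   # biomarker of dose
attach_loss = 0.05 * true_exposure + rng.normal(scale=1.0, size=n)

def residuals(y, x):
    """Residuals of y after ordinary least-squares adjustment for x."""
    slope, intercept, *_ = stats.linregress(x, y)
    return y - (intercept + slope * x)

r_adj, p_adj = stats.pearsonr(residuals(attach_loss, self_report),
                              residuals(cotinine, self_report))
print(f"residual correlation after adjustment: r = {r_adj:.3f}, p = {p_adj:.3g}")
```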


2021 ◽  
pp. 096228022110028
Author(s):  
T Baghfalaki ◽  
M Ganjali

Joint modeling of zero-inflated count and time-to-event data is usually performed by applying a shared random effect model. This kind of joint model can be considered a latent Gaussian model. In this paper, the integrated nested Laplace approximation (INLA) approach is used to perform approximate Bayesian inference for the joint model. We propose a zero-inflated hurdle model under a Poisson or negative binomial distributional assumption as the sub-model for the count data. A Weibull model is used as the survival-time sub-model. In addition to the usual joint linear model, a joint partially linear model is also considered to take into account the non-linear effect of time on the longitudinal count response. The performance of the method is investigated in simulation studies and compared with the usual approach based on Markov chain Monte Carlo (MCMC). We also apply the proposed method to analyze two real data sets: the first from a longitudinal study of pregnancy, and the second from an HIV study.
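The sketch below shows only the hurdle count sub-model in its Poisson form, under assumed parameter names; it is not an INLA implementation and omits the shared random effect and the Weibull survival sub-model.

```python
# Hurdle-Poisson probability mass function: a logistic part decides zero vs.
# positive, and positive counts follow a zero-truncated Poisson. Parameter
# names (pi0, lam) are assumptions for illustration; no INLA machinery here.
import numpy as np
from scipy import stats

def hurdle_poisson_pmf(y, pi0, lam):
    """P(Y = y) with P(Y = 0) = pi0 and truncated-Poisson positive counts."""
    y = np.asarray(y)
    trunc = stats.poisson.pmf(y, lam) / (1.0 - stats.poisson.pmf(0, lam))
    return np.where(y == 0, pi0, (1.0 - pi0) * trunc)

y = np.arange(0, 15)
pmf = hurdle_poisson_pmf(y, pi0=0.4, lam=3.0)
print(pmf.round(4), "sums to ~", pmf.sum().round(4))   # close to 1 over this support
```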


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Ulrike Baum ◽  
Sangita Kulathinal ◽  
Kari Auranen

Abstract Background Non-sensitive and non-specific observation of outcomes in time-to-event data affects event counts as well as the risk sets, thus biasing the estimation of hazard ratios. We investigate how imperfect observation of incident events affects the estimation of vaccine effectiveness based on hazard ratios. Methods Imperfect time-to-event data contain two classes of events: a portion of the true events of interest, and false-positive events mistakenly recorded as events of interest. We develop an estimation method utilising a weighted partial likelihood and probabilistic deletion of false-positive events, assuming the sensitivity and the false-positive rate are known. The performance of the method is evaluated using simulated and Finnish register data. Results The novel method enables unbiased semiparametric estimation of hazard ratios from imperfect time-to-event data. Small false-positive rates can be approximated as zero without inducing bias. The method is robust to misspecification of the sensitivity as long as the ratio of the sensitivity in the vaccinated and the unvaccinated is specified correctly and the cumulative risk of the true event is small. Conclusions The weighted partial likelihood can be used to adjust for outcome measurement errors in the estimation of hazard ratios and effectiveness but requires specifying the sensitivity and the false-positive rate. In the absence of exact information about these parameters, the method works as a tool for assessing the potential magnitude of bias given a range of likely parameter values.
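The sketch below illustrates the two ingredients named above, probabilistic deletion of recorded events and a weighted Cox partial likelihood, using lifelines; the deletion probability and observation weights shown are simplified placeholders rather than the authors' estimator, which derives them from the sensitivity and the false-positive rate.

```python
# Simplified illustration: probabilistic deletion of recorded events plus a
# weighted Cox partial likelihood (lifelines' weights_col). The deletion
# probability and weights are placeholders, not the paper's derived quantities.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 1000
vaccinated = rng.integers(0, 2, size=n)
time = rng.exponential(scale=365.0, size=n)
recorded_event = rng.integers(0, 2, size=n)          # imperfectly observed outcome

fp_fraction = 0.10    # assumed share of recorded events that are false positives
sensitivity = 0.85    # assumed probability that a true event is recorded

df = pd.DataFrame({"vaccinated": vaccinated, "time": time, "event": recorded_event})

# Probabilistic deletion: recode each recorded event as censored with
# probability equal to the assumed false-positive fraction.
flip = rng.random(n) < fp_fraction
df.loc[(df["event"] == 1) & flip, "event"] = 0

# Placeholder observation weights; events are up-weighted by 1/sensitivity here.
df["w"] = np.where(df["event"] == 1, 1.0 / sensitivity, 1.0)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event",
                        weights_col="w", robust=True)
print(cph.summary.loc["vaccinated", ["coef", "exp(coef)"]])
```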


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Jianlei Zhang ◽  
Yukun Zeng ◽  
Binil Starly

Abstract Data-driven approaches for machine tool wear diagnosis and prognosis have gained attention in the past few years. The goal of our study is to advance the adaptability, flexibility, prediction performance, and prediction horizon for online monitoring and prediction. This paper proposes the use of a recent deep learning method based on a gated recurrent neural network architecture, including Long Short-Term Memory (LSTM), which captures long-term dependencies better than a regular recurrent neural network for modeling sequential data, together with a mechanism to realize online diagnosis, prognosis, and remaining useful life (RUL) prediction from indirect measurements collected during the manufacturing process. Existing models are usually tool-specific and can hardly be generalized to other scenarios, such as different tools or operating environments. Unlike current methods, the proposed model requires no prior knowledge about the system and thus can be generalized to different scenarios and machine tools. With inherent memory units, the proposed model can also capture long-term dependencies while learning from sequential data such as those collected by condition monitoring sensors, which means it can accommodate machine tools with varying lifetimes and improve prediction performance. To demonstrate the validity of the proposed approach, we conducted multiple experiments on a milling machine cutting tool and applied the model to online diagnosis and RUL prediction. Without loss of generality, we incorporated a system transition function and a system observation function into the neural network and trained it with signal data from a minimally intrusive vibration sensor. The experimental results showed that our LSTM-based model achieved the best overall accuracy among the compared methods, with minimal mean square error (MSE) for tool wear prediction and RUL prediction, respectively.
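A minimal PyTorch sketch of the kind of LSTM regressor described above, mapping a window of sensor readings to an RUL estimate, is given below; the layer sizes, window length, synthetic data, and training loop are illustrative assumptions, not the paper's architecture.

```python
# Minimal LSTM regressor for RUL prediction from sensor sequences (PyTorch).
# Layer sizes, sequence length, and training setup are illustrative only.
import torch
import torch.nn as nn

class RULLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)   # use the last time step

# Synthetic stand-in for windows of vibration-sensor readings and RUL targets.
torch.manual_seed(0)
X = torch.randn(256, 50, 3)                  # 256 windows, 50 steps, 3 channels
y = torch.rand(256) * 100.0                  # remaining useful life (arbitrary units)

model = RULLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5):                       # short demo training loop
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: MSE = {loss.item():.2f}")
```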


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Janne J. Näppi ◽  
Tomoki Uemura ◽  
Chinatsu Watari ◽  
Toru Hironaka ◽  
Tohru Kamiya ◽  
...  

Abstract The rapid increase of patients with coronavirus disease 2019 (COVID-19) has introduced major challenges to healthcare services worldwide. Therefore, fast and accurate clinical assessment of COVID-19 progression and mortality is vital for the management of COVID-19 patients. We developed an automated image-based survival prediction model, called U-survival, which combines deep learning of chest CT images with the established survival analysis methodology of an elastic-net Cox survival model. In an evaluation of 383 COVID-19-positive patients from two hospitals, the prognostic bootstrap prediction performance of U-survival was significantly higher (P < 0.0001) than those of existing laboratory and image-based reference predictors, both for COVID-19 progression (maximum concordance index: 91.6% [95% confidence interval 91.5, 91.7]) and for mortality (88.7% [88.6, 88.9]), and the separation between the Kaplan–Meier survival curves of patients stratified into low- and high-risk groups was largest for U-survival (P < 3 × 10⁻¹⁴). The results indicate that U-survival can be used to provide automated and objective prognostic predictions for the management of COVID-19 patients.
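The sketch below illustrates the elastic-net Cox and Kaplan–Meier stratification components mentioned above, using lifelines on synthetic features standing in for image-derived predictors; the penalty settings, the median-risk split, and the data are assumptions, and the CT feature extraction is omitted entirely.

```python
# Sketch of an elastic-net Cox model on (synthetic) image-derived features,
# followed by median-risk stratification and a log-rank test on the two groups.
# Penalty settings and the deep-feature extraction step are not from the paper.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(5)
n, p = 383, 20
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"img_feat_{i}" for i in range(p)])
df = X.assign(time=rng.exponential(scale=30.0, size=n),
              event=rng.integers(0, 2, size=n))

# Elastic-net penalty: penalizer > 0 with 0 < l1_ratio < 1 mixes L1 and L2.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)
cph.fit(df, duration_col="time", event_col="event")

# Stratify patients into low- and high-risk groups at the median risk score.
risk = cph.predict_partial_hazard(df)
high = risk > risk.median()

km = KaplanMeierFitter()
for label, mask in [("low risk", ~high), ("high risk", high)]:
    km.fit(df.loc[mask, "time"], df.loc[mask, "event"], label=label)
    print(label, "median survival:", km.median_survival_time_)

res = logrank_test(df.loc[~high, "time"], df.loc[high, "time"],
                   df.loc[~high, "event"], df.loc[high, "event"])
print("log-rank p-value:", res.p_value)
```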


Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1815
Author(s):  
Diego I. Gallardo ◽  
Mário de Castro ◽  
Héctor W. Gómez

A cure rate model under the competing risks setup is proposed. For the number of competing causes related to the occurrence of the event of interest, we posit the one-parameter Bell distribution, which accommodates overdispersed counts. The model is parameterized in the cure rate, which is linked to covariates. Parameter estimation is based on the maximum likelihood method. Estimates are computed via the EM algorithm. In order to compare different models, a selection criterion for non-nested models is implemented. Results from simulation studies indicate that the estimation method and the model selection criterion have a good performance. A dataset on melanoma is analyzed using the proposed model as well as some models from the literature.
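The sketch below shows the one-parameter Bell distribution posited for the number of competing causes, and the cure fraction P(N = 0) it implies; the parameter value is illustrative, and the EM estimation and covariate link of the proposed model are not reproduced.

```python
# Sketch of the one-parameter Bell distribution for the latent number of
# competing causes, and the implied cure fraction P(N = 0).
# Parameter value is illustrative; the paper's EM estimation is not shown.
import math

def bell_numbers(m):
    """Bell numbers B_0..B_m computed with the Bell triangle recurrence."""
    row, bells = [1], [1]
    for _ in range(m):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
        bells.append(row[0])
    return bells

def bell_pmf(n, theta, bells):
    """P(N = n) = theta^n * exp(-(e^theta - 1)) * B_n / n!  (overdispersed counts)."""
    return theta**n * math.exp(-(math.exp(theta) - 1.0)) * bells[n] / math.factorial(n)

theta = 0.7
bells = bell_numbers(30)
pmf = [bell_pmf(n, theta, bells) for n in range(31)]
cure_fraction = pmf[0]                       # P(N = 0): no latent cause, i.e. cured
print(f"cure fraction: {cure_fraction:.4f}, pmf sums to {sum(pmf):.6f}")
```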


2021 ◽  
pp. 153537022199201
Author(s):  
Runmin Li ◽  
Guosheng Wang ◽  
ZhouJie Wu ◽  
HuaGuang Lu ◽  
Gen Li ◽  
...  

High-throughput multi-omics sequencing has laid a solid foundation for identifying genes associated with cancer prognosis. Multi-omics analysis can reveal the mechanisms of cancer occurrence and development from several perspectives. The prognosis of osteosarcoma remains poor, so a genetic marker is needed to predict clinical overall survival. First, the Office of Cancer Genomics (OCG TARGET) provided RNA-Seq data, copy number variation information, and clinical follow-up data. Prognosis-related genes and genes exhibiting copy number differences were screened in the training set, and these genes were integrated for feature selection with the least absolute shrinkage and selection operator (Lasso), yielding effective biomarkers. Lastly, this study built and validated a gene-based prognostic model on the test set and a Gene Expression Omnibus validation set. In total, 512 prognosis-related genes (P < 0.01), 336 copy-number-amplified genes (P < 0.05), and 36 copy-number-deleted genes (P < 0.05) were obtained; the genes carrying these genomic variants are closely associated with tumor occurrence and development. Ten candidate genes were generated by integrating the genomic variant genes with the prognosis-related genes. Six genes (MYC, CHIC2, CCDC152, LYL1, GPR142, and MMP27) were obtained by Lasso feature selection and stepwise multivariate regression; many of these have been reported to be related to tumor progression. Cox regression was then used to build a six-gene signature as a single prognostic factor for osteosarcoma cases. The signature stratified samples by risk in the training set, test set, and external validation set. The AUC for five-year survival in the training and validation sets exceeded 0.85, indicating superior predictive performance compared with existing studies. In summary, the six-gene signature provides a new prognostic marker for assessing the survival of osteosarcoma patients.
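The sketch below shows one simple way to compute a five-year-survival AUC from a gene-signature risk score; the hypothetical coefficients and the crude handling of censoring (excluding patients censored before five years) are assumptions, not the study's method.

```python
# Sketch of computing an AUC for five-year survival from a gene-signature risk
# score. The coefficients are hypothetical and the censoring handling is crude.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
genes = ["MYC", "CHIC2", "CCDC152", "LYL1", "GPR142", "MMP27"]
coefs = pd.Series(rng.normal(size=6), index=genes)      # hypothetical Cox coefficients

n = 250
expr = pd.DataFrame(rng.normal(size=(n, 6)), columns=genes)
time_years = rng.exponential(scale=6.0, size=n)
event = rng.integers(0, 2, size=n)                       # 1 = death observed

risk_score = expr @ coefs                                # linear predictor per patient

# Keep only patients whose five-year status is known: death before 5 years,
# or follow-up reaching 5 years. Patients censored before 5 years are excluded.
known = ((event == 1) & (time_years < 5)) | (time_years >= 5)
died_within_5y = ((event == 1) & (time_years < 5))[known].astype(int)

auc = roc_auc_score(died_within_5y, risk_score[known])
print(f"5-year survival AUC: {auc:.3f}")
```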

