Cox-sMBPLS: An Algorithm for Disease Survival Prediction and Multi-Omics Module Discovery Incorporating Cis-Regulatory Quantitative Effects

2021 ◽  
Vol 12 ◽  
Author(s):  
Nasim Vahabi ◽  
Caitrin W. McDonough ◽  
Ankit A. Desai ◽  
Larisa H. Cavallari ◽  
Julio D. Duarte ◽  
...  

Background The development of high-throughput techniques has enabled profiling large numbers of biomolecules across multiple molecular compartments. The challenge then becomes to integrate such multimodal Omics data to gain insights into biological processes and into disease onset and progression mechanisms. Further, given the high dimensionality of such data, incorporating prior biological information on interactions between molecular compartments when developing statistical models for data integration is beneficial, especially in settings involving a small number of samples. Results We develop a supervised model for time-to-event data (e.g., death, biochemical recurrence) that simultaneously accounts for redundant information within Omics profiles and leverages prior biological associations between them through a multi-block PLS framework. The interactions between data from different molecular compartments (e.g., epigenome, transcriptome, methylome) were captured by using cis-regulatory quantitative effects in the proposed model. The model, coined Cox-sMBPLS, exhibits superior prediction performance and improved feature selection based on both simulation studies and analysis of data from heart failure patients. Conclusion The proposed supervised Cox-sMBPLS model can effectively incorporate prior biological information into the survival prediction system, leading to improved prediction performance and feature selection. It also enables the identification of multi-Omics modules of biomolecules that impact the patients' survival probability, and it provides insights into potential relevant risk factors that merit further investigation.
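As a rough illustration of the integration strategy described above (not the Cox-sMBPLS algorithm itself), the sketch below reduces each Omics block to a few PLS components and feeds the pooled components into a Cox proportional hazards model; the libraries (scikit-learn, lifelines), the synthetic data, and the use of log survival time as a PLS target are simplifying assumptions.

```python
# Illustrative sketch only: block-wise PLS reduction followed by a Cox model.
# This is NOT the Cox-sMBPLS algorithm; the cis-regulatory weighting and the
# supervised multi-block deflation steps of the paper are omitted.
import numpy as np
import pandas as pd
from sklearn.cross_decomposition import PLSRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 120
blocks = {                                   # three hypothetical Omics blocks
    "snp":  rng.normal(size=(n, 200)),
    "expr": rng.normal(size=(n, 300)),
    "meth": rng.normal(size=(n, 150)),
}
time = rng.exponential(scale=5.0, size=n)    # survival times
event = rng.integers(0, 2, size=n)           # 1 = event observed, 0 = censored

# Reduce each block to a few latent components; log(time) is used here as a
# crude PLS target purely for illustration (it ignores censoring).
components = []
for name, X in blocks.items():
    pls = PLSRegression(n_components=3).fit(X, np.log(time))
    scores = pls.transform(X)
    components.append(pd.DataFrame(scores, columns=[f"{name}_c{i}" for i in range(3)]))

df = pd.concat(components, axis=1)
df["time"], df["event"] = time, event

# Fit a Cox proportional hazards model on the pooled latent components.
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "p"]])
```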

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Yuto Sugai ◽  
Noriyuki Kadoya ◽  
Shohei Tanaka ◽  
Shunpei Tanabe ◽  
Mariko Umeda ◽  
...  

Abstract Background Radiomics is a new technology for noninvasively predicting survival prognosis from quantitative features extracted from medical images. Most radiomics-based prognostic studies of non-small-cell lung cancer (NSCLC) patients have used mixed datasets of different subgroups. Therefore, we investigated radiomics-based survival prediction of NSCLC patients by focusing on subgroups with identical characteristics. Methods A total of 304 NSCLC (Stage I–IV) patients treated with radiotherapy at our hospital were included. We extracted 107 radiomic features (i.e., 14 shape features, 18 first-order statistical features, and 75 texture features) from the gross tumor volume drawn on the free-breathing planning computed tomography image. Three feature selection methods [i.e., test–retest and multiple segmentation (FS1), Pearson's correlation analysis (FS2), and a method combining FS1 and FS2 (FS3)] were used to clarify how they affect survival prediction performance. The subgroup analyses for each histological subtype and each T stage applied the selection method that performed best in the analysis of All data. We used a least absolute shrinkage and selection operator Cox regression model for all analyses and evaluated prognostic performance using the concordance index (C-index) and the Kaplan–Meier method. For subgroup analysis, fivefold cross-validation was applied to ensure model reliability. Results In the analysis of All data, the C-index for the test dataset was 0.62 (FS1), 0.63 (FS2), and 0.62 (FS3). The subgroup analysis indicated that prediction models based on specific histological subtypes and T stages had a higher C-index for the test dataset than the model based on All data (All data, 0.64 vs. SCCall, 0.60; ADCall, 0.69; T1, 0.68; T2, 0.65; T3, 0.66; T4, 0.70). In addition, the prediction models built for each T stage within each histological subtype showed different trends in the test-dataset C-index between ADC-related and SCC-related models (ADCT1–ADCT4, 0.72–0.83; SCCT1–SCCT4, 0.58–0.71). Conclusions Our results showed that feature selection methods moderately affected survival prediction performance. In addition, prediction models based on specific subgroups may improve prediction performance. These results may prove useful for determining the optimal radiomics-based prediction model.
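The workflow sketched below illustrates the Pearson-correlation filtering and LASSO-penalized Cox modeling evaluated in this study, using lifelines on synthetic data; the correlation threshold, penalty strength, and train/test split are assumptions, not the study's settings.

```python
# Illustrative sketch: Pearson-correlation pruning of redundant radiomic
# features, then a LASSO-penalized Cox model evaluated by the C-index.
# Thresholds and the synthetic data are assumptions, not the study's settings.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(1)
n, p = 300, 107                                   # patients x radiomic features
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"feat_{i}" for i in range(p)])
time = rng.exponential(scale=24.0, size=n)        # months
event = rng.integers(0, 2, size=n)

# FS2-style step: drop one of every pair of features with |r| > 0.8.
corr = X.corr().abs()
keep = []
for col in corr.columns:
    if all(corr.loc[col, k] <= 0.8 for k in keep):
        keep.append(col)

df = X[keep].copy()
df["time"], df["event"] = time, event
train, test = df.iloc[:200], df.iloc[200:]

# LASSO-penalized Cox model (l1_ratio=1.0 gives a pure L1 penalty).
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(train, duration_col="time", event_col="event")

# Concordance index on the held-out patients (higher is better).
risk = cph.predict_partial_hazard(test)
cindex = concordance_index(test["time"], -risk, test["event"])
print(f"test C-index: {cindex:.3f}")
```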


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1505 ◽  
Author(s):  
Michail Tsagris ◽  
Ioannis Tsamardinos

Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as R packages, and those typically offer few options. The R package MXM offers a variety of feature selection algorithms and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time-to-event (survival), binary, nominal, ordinal, clustered, counts, left-censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time-to-event data the user can choose among Cox, Weibull, log-logistic, or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features; plainly speaking, two features carry statistically equivalent information when substituting one for the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data that cannot be loaded into R (on a machine with 16 GB of RAM, for example, R cannot directly load a 16 GB dataset; by utilizing the proper package, the data can be loaded and feature selection then performed). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM's algorithms using real high-dimensional data from various applications.
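Since MXM is an R package, the Python sketch below only mimics the flavour of a basic forward feature-selection loop for time-to-event data with a Cox model; it is not MXM's API and omits its equivalence-set detection and out-of-memory capabilities.

```python
# Naive forward feature selection for survival data with a Cox model, meant
# only to illustrate the kind of task MXM automates in R. Not MXM's API.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n, p = 200, 30
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{i}" for i in range(p)])
time = rng.exponential(scale=10.0 * np.exp(-0.5 * X["x0"]))   # x0 truly matters
event = rng.integers(0, 2, size=n)
df = X.assign(time=time, event=event)

selected, alpha = [], 0.05
while True:
    best_feat, best_p = None, alpha
    for feat in X.columns.difference(selected):
        cols = selected + [feat, "time", "event"]
        cph = CoxPHFitter().fit(df[cols], duration_col="time", event_col="event")
        pval = cph.summary.loc[feat, "p"]
        if pval < best_p:
            best_feat, best_p = feat, pval
    if best_feat is None:
        break                       # no remaining feature is significant
    selected.append(best_feat)

print("selected features:", selected)
```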


2003 ◽  
Vol 82 (5) ◽  
pp. 345-349 ◽  
Author(s):  
C.F. Spiekerman ◽  
P.P. Hujoel ◽  
T.A. DeRouen

Non-causal associations between periodontitis and systemic diseases may be spuriously induced by smoking because of its strong relationship to both. The goal of this study was to evaluate whether adjustment for self-reported smoking removes tobacco-related confounding and eliminates such spurious associations. Using NHANES III data, we evaluated associations between attachment loss and serum cotinine after adjustment for the self-reported number of cigarettes smoked. Cotinine, a metabolite of nicotine, should not be related to attachment loss if self-reported smoking captures the effect of tobacco on attachment levels. Adjustment for self-reported cigarette smoking did not completely remove the correlation between attachment loss and serum cotinine level (r = 0.075, n = 1507, p = 0.003). Simulation studies indicated similar results for time-to-event data. These findings demonstrate the difficulty in distinguishing the effects of periodontitis from those of smoking with respect to a smoking-related outcome. Future studies should report results of analyses on separate subcohorts of never-smokers and smokers.
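The sketch below illustrates the adjustment logic tested in the study: regress attachment loss and cotinine on self-reported cigarette consumption and correlate the residuals; if self-report fully captured tobacco exposure, the residual correlation would be near zero. The variable names and synthetic data are assumptions.

```python
# Sketch of the adjustment logic: correlate attachment loss and cotinine after
# removing the linear effect of self-reported cigarettes per day from each.
# Synthetic data; a non-zero residual correlation mimics residual confounding.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1500
true_exposure = rng.gamma(shape=2.0, scale=5.0, size=n)          # actual tobacco dose
self_report = true_exposure + rng.normal(scale=4.0, size=n)      # noisy self-report
cotinine = 10 * true_exposure + rng.normal(scale=20.0, size=n)   # biomarker of dose
attach_loss = 0.05 * true_exposure + rng.normal(scale=1.0, size=n)

def residuals(y, x):
    """Residuals of y after ordinary least-squares adjustment for x."""
    slope, intercept, *_ = stats.linregress(x, y)
    return y - (intercept + slope * x)

r_adj, p_adj = stats.pearsonr(residuals(attach_loss, self_report),
                              residuals(cotinine, self_report))
print(f"residual correlation after adjustment: r = {r_adj:.3f}, p = {p_adj:.3g}")
```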


2021 ◽  
pp. 096228022110028
Author(s):  
T Baghfalaki ◽  
M Ganjali

Joint modeling of zero-inflated count and time-to-event data is usually performed by applying a shared random effect model. This kind of joint model can be considered a latent Gaussian model. In this paper, the integrated nested Laplace approximation (INLA) approach is used to perform approximate Bayesian inference for the joint model. We propose a zero-inflated hurdle model under a Poisson or negative binomial distributional assumption as the sub-model for the count data. A Weibull model is used as the survival-time sub-model. In addition to the usual joint linear model, a joint partially linear model is also considered to take into account the non-linear effect of time on the longitudinal count response. The performance of the method is investigated in simulation studies and compared with the usual approach based on Markov chain Monte Carlo (MCMC). We also apply the proposed method to analyze two real data sets: the first from a longitudinal study of pregnancy, and the second from an HIV study.
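The sketch below shows only the hurdle count sub-model in its Poisson form, under assumed parameter names; it is not an INLA implementation and omits the shared random effect and the Weibull survival sub-model.

```python
# Hurdle-Poisson probability mass function: a logistic part decides zero vs.
# positive, and positive counts follow a zero-truncated Poisson. Parameter
# names (pi0, lam) are assumptions for illustration; no INLA machinery here.
import numpy as np
from scipy import stats

def hurdle_poisson_pmf(y, pi0, lam):
    """P(Y = y) with P(Y = 0) = pi0 and truncated-Poisson positive counts."""
    y = np.asarray(y)
    trunc = stats.poisson.pmf(y, lam) / (1.0 - stats.poisson.pmf(0, lam))
    return np.where(y == 0, pi0, (1.0 - pi0) * trunc)

y = np.arange(0, 15)
pmf = hurdle_poisson_pmf(y, pi0=0.4, lam=3.0)
print(pmf.round(4), "sums to ~", pmf.sum().round(4))   # close to 1 over this support
```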


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Ulrike Baum ◽  
Sangita Kulathinal ◽  
Kari Auranen

Abstract Background Non-sensitive and non-specific observation of outcomes in time-to-event data affects event counts as well as the risk sets, thus biasing the estimation of hazard ratios. We investigate how imperfect observation of incident events affects the estimation of vaccine effectiveness based on hazard ratios. Methods Imperfect time-to-event data contain two classes of events: a portion of the true events of interest, and false-positive events mistakenly recorded as events of interest. We develop an estimation method utilising a weighted partial likelihood and probabilistic deletion of false-positive events, assuming the sensitivity and the false-positive rate are known. The performance of the method is evaluated using simulated and Finnish register data. Results The novel method enables unbiased semiparametric estimation of hazard ratios from imperfect time-to-event data. Small false-positive rates can be approximated as zero without inducing bias. The method is robust to misspecification of the sensitivity as long as the ratio of the sensitivity in the vaccinated and the unvaccinated is specified correctly and the cumulative risk of the true event is small. Conclusions The weighted partial likelihood can be used to adjust for outcome measurement errors in the estimation of hazard ratios and effectiveness but requires specifying the sensitivity and the false-positive rate. In the absence of exact information about these parameters, the method works as a tool for assessing the potential magnitude of bias given a range of likely parameter values.
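The sketch below illustrates the two ingredients named above, probabilistic deletion of recorded events and a weighted Cox partial likelihood, using lifelines; the deletion probability and observation weights shown are simplified placeholders rather than the authors' estimator, which derives them from the sensitivity and the false-positive rate.

```python
# Simplified illustration: probabilistic deletion of recorded events plus a
# weighted Cox partial likelihood (lifelines' weights_col). The deletion
# probability and weights are placeholders, not the paper's derived quantities.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 1000
vaccinated = rng.integers(0, 2, size=n)
time = rng.exponential(scale=365.0, size=n)
recorded_event = rng.integers(0, 2, size=n)          # imperfectly observed outcome

fp_fraction = 0.10    # assumed share of recorded events that are false positives
sensitivity = 0.85    # assumed probability that a true event is recorded

df = pd.DataFrame({"vaccinated": vaccinated, "time": time, "event": recorded_event})

# Probabilistic deletion: recode each recorded event as censored with
# probability equal to the assumed false-positive fraction.
flip = rng.random(n) < fp_fraction
df.loc[(df["event"] == 1) & flip, "event"] = 0

# Placeholder observation weights; events are up-weighted by 1/sensitivity here.
df["w"] = np.where(df["event"] == 1, 1.0 / sensitivity, 1.0)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event",
                        weights_col="w", robust=True)
print(cph.summary.loc["vaccinated", ["coef", "exp(coef)"]])
```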


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Jianlei Zhang ◽  
Yukun Zeng ◽  
Binil Starly

Abstract Data-driven approaches for machine tool wear diagnosis and prognosis have gained attention in the past few years. The goal of our study is to advance the adaptability, flexibility, prediction performance, and prediction horizon for online monitoring and prediction. This paper proposes the use of a recent deep learning method based on a gated recurrent neural network architecture, including Long Short-Term Memory (LSTM), which captures long-term dependencies better than a regular recurrent neural network for modeling sequential data, together with a mechanism to realize online diagnosis, prognosis, and remaining useful life (RUL) prediction from indirect measurements collected during the manufacturing process. Existing models are usually tool-specific and can hardly be generalized to other scenarios, such as different tools or operating environments. Unlike current methods, the proposed model requires no prior knowledge about the system and thus can be generalized to different scenarios and machine tools. With inherent memory units, the proposed model can also capture long-term dependencies while learning from sequential data such as those collected by condition monitoring sensors, which means it can accommodate machine tools with varying lifetimes and improve prediction performance. To demonstrate the validity of the proposed approach, we conducted multiple experiments on a milling machine cutting tool and applied the model to online diagnosis and RUL prediction. Without loss of generality, we incorporated a system transition function and a system observation function into the neural network and trained it with signal data from a minimally intrusive vibration sensor. The experimental results showed that our LSTM-based model achieved the best overall accuracy among the compared methods, with minimal mean square error (MSE) for tool wear prediction and RUL prediction, respectively.
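A minimal PyTorch sketch of the kind of LSTM regressor described above, mapping a window of sensor readings to an RUL estimate, is given below; the layer sizes, window length, synthetic data, and training loop are illustrative assumptions, not the paper's architecture.

```python
# Minimal LSTM regressor for RUL prediction from sensor sequences (PyTorch).
# Layer sizes, sequence length, and training setup are illustrative only.
import torch
import torch.nn as nn

class RULLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)   # use the last time step

# Synthetic stand-in for windows of vibration-sensor readings and RUL targets.
torch.manual_seed(0)
X = torch.randn(256, 50, 3)                  # 256 windows, 50 steps, 3 channels
y = torch.rand(256) * 100.0                  # remaining useful life (arbitrary units)

model = RULLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5):                       # short demo training loop
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: MSE = {loss.item():.2f}")
```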


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Janne J. Näppi ◽  
Tomoki Uemura ◽  
Chinatsu Watari ◽  
Toru Hironaka ◽  
Tohru Kamiya ◽  
...  

Abstract The rapid increase of patients with coronavirus disease 2019 (COVID-19) has introduced major challenges to healthcare services worldwide. Therefore, fast and accurate clinical assessment of COVID-19 progression and mortality is vital for the management of COVID-19 patients. We developed an automated image-based survival prediction model, called U-survival, which combines deep learning of chest CT images with the established survival analysis methodology of an elastic-net Cox survival model. In an evaluation of 383 COVID-19-positive patients from two hospitals, the prognostic bootstrap prediction performance of U-survival was significantly higher (P < 0.0001) than those of existing laboratory and image-based reference predictors, both for COVID-19 progression (maximum concordance index: 91.6% [95% confidence interval 91.5, 91.7]) and for mortality (88.7% [88.6, 88.9]), and the separation between the Kaplan–Meier survival curves of patients stratified into low- and high-risk groups was largest for U-survival (P < 3 × 10⁻¹⁴). The results indicate that U-survival can be used to provide automated and objective prognostic predictions for the management of COVID-19 patients.
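The sketch below illustrates the elastic-net Cox and Kaplan–Meier stratification components mentioned above, using lifelines on synthetic features standing in for image-derived predictors; the penalty settings, the median-risk split, and the data are assumptions, and the CT feature extraction is omitted entirely.

```python
# Sketch of an elastic-net Cox model on (synthetic) image-derived features,
# followed by median-risk stratification and a log-rank test on the two groups.
# Penalty settings and the deep-feature extraction step are not from the paper.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(5)
n, p = 383, 20
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"img_feat_{i}" for i in range(p)])
df = X.assign(time=rng.exponential(scale=30.0, size=n),
              event=rng.integers(0, 2, size=n))

# Elastic-net penalty: penalizer > 0 with 0 < l1_ratio < 1 mixes L1 and L2.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)
cph.fit(df, duration_col="time", event_col="event")

# Stratify patients into low- and high-risk groups at the median risk score.
risk = cph.predict_partial_hazard(df)
high = risk > risk.median()

km = KaplanMeierFitter()
for label, mask in [("low risk", ~high), ("high risk", high)]:
    km.fit(df.loc[mask, "time"], df.loc[mask, "event"], label=label)
    print(label, "median survival:", km.median_survival_time_)

res = logrank_test(df.loc[~high, "time"], df.loc[high, "time"],
                   df.loc[~high, "event"], df.loc[high, "event"])
print("log-rank p-value:", res.p_value)
```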


Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1815
Author(s):  
Diego I. Gallardo ◽  
Mário de Castro ◽  
Héctor W. Gómez

A cure rate model under the competing risks setup is proposed. For the number of competing causes related to the occurrence of the event of interest, we posit the one-parameter Bell distribution, which accommodates overdispersed counts. The model is parameterized in the cure rate, which is linked to covariates. Parameter estimation is based on the maximum likelihood method. Estimates are computed via the EM algorithm. In order to compare different models, a selection criterion for non-nested models is implemented. Results from simulation studies indicate that the estimation method and the model selection criterion have a good performance. A dataset on melanoma is analyzed using the proposed model as well as some models from the literature.
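The sketch below shows the one-parameter Bell distribution posited for the number of competing causes, and the cure fraction P(N = 0) it implies; the parameter value is illustrative, and the EM estimation and covariate link of the proposed model are not reproduced.

```python
# Sketch of the one-parameter Bell distribution for the latent number of
# competing causes, and the implied cure fraction P(N = 0).
# Parameter value is illustrative; the paper's EM estimation is not shown.
import math

def bell_numbers(m):
    """Bell numbers B_0..B_m computed with the Bell triangle recurrence."""
    row, bells = [1], [1]
    for _ in range(m):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
        bells.append(row[0])
    return bells

def bell_pmf(n, theta, bells):
    """P(N = n) = theta^n * exp(-(e^theta - 1)) * B_n / n!  (overdispersed counts)."""
    return theta**n * math.exp(-(math.exp(theta) - 1.0)) * bells[n] / math.factorial(n)

theta = 0.7
bells = bell_numbers(30)
pmf = [bell_pmf(n, theta, bells) for n in range(31)]
cure_fraction = pmf[0]                       # P(N = 0): no latent cause, i.e. cured
print(f"cure fraction: {cure_fraction:.4f}, pmf sums to {sum(pmf):.6f}")
```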


2021 ◽  
pp. 153537022199201
Author(s):  
Runmin Li ◽  
Guosheng Wang ◽  
ZhouJie Wu ◽  
HuaGuang Lu ◽  
Gen Li ◽  
...  

High-throughput multi-omics sequencing has laid a solid foundation for identifying genes associated with cancer prognosis. Multi-omics analysis can reveal the mechanisms of cancer occurrence and development from several perspectives. The prognosis of osteosarcoma remains poor, so a genetic marker is needed to predict clinical overall survival. First, the Office of Cancer Genomics (OCG TARGET) provided RNA-Seq data, copy number variation information, and clinical follow-up data. Prognosis-related genes and genes exhibiting copy number differences were screened in the training set, and these genes were integrated for feature selection with the least absolute shrinkage and selection operator (Lasso), yielding effective biomarkers. Lastly, this study built and validated a gene-based prognostic model on the test set and a Gene Expression Omnibus validation set. In total, 512 prognosis-related genes (P < 0.01), 336 copy-number-amplified genes (P < 0.05), and 36 copy-number-deleted genes (P < 0.05) were obtained; the genes carrying these genomic variants are closely associated with tumor occurrence and development. Ten candidate genes were generated by integrating the genomic variant genes with the prognosis-related genes. Six genes (MYC, CHIC2, CCDC152, LYL1, GPR142, and MMP27) were obtained by Lasso feature selection and stepwise multivariate regression; many of these have been reported to be related to tumor progression. Cox regression was then used to build a six-gene signature as a single prognostic factor for osteosarcoma cases. The signature stratified samples by risk in the training set, test set, and external validation set. The AUC for five-year survival in the training and validation sets exceeded 0.85, indicating superior predictive performance compared with existing studies. In summary, the six-gene signature provides a new prognostic marker for assessing the survival of osteosarcoma patients.
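The sketch below shows one simple way to compute a five-year-survival AUC from a gene-signature risk score; the hypothetical coefficients and the crude handling of censoring (excluding patients censored before five years) are assumptions, not the study's method.

```python
# Sketch of computing an AUC for five-year survival from a gene-signature risk
# score. The coefficients are hypothetical and the censoring handling is crude.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
genes = ["MYC", "CHIC2", "CCDC152", "LYL1", "GPR142", "MMP27"]
coefs = pd.Series(rng.normal(size=6), index=genes)      # hypothetical Cox coefficients

n = 250
expr = pd.DataFrame(rng.normal(size=(n, 6)), columns=genes)
time_years = rng.exponential(scale=6.0, size=n)
event = rng.integers(0, 2, size=n)                       # 1 = death observed

risk_score = expr @ coefs                                # linear predictor per patient

# Keep only patients whose five-year status is known: death before 5 years,
# or follow-up reaching 5 years. Patients censored before 5 years are excluded.
known = ((event == 1) & (time_years < 5)) | (time_years >= 5)
died_within_5y = ((event == 1) & (time_years < 5))[known].astype(int)

auc = roc_auc_score(died_within_5y, risk_score[known])
print(f"5-year survival AUC: {auc:.3f}")
```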

