scholarly journals Predict multicategory causes of death in lung cancer patients using clinicopathologic factors

Author(s):  
Fei Deng ◽  
Haijun Zhou ◽  
Yong Lin ◽  
John Heim ◽  
Lanlan Shen ◽  
...  

Background: Random forest model is a recently developed machine-learning algorithm, and superior to other machine learning and regression models for its classification function and better accuracy. But it is rarely used for predicting causes of death in lung cancer patients. On the other hand, specific causes of death in lung cancer patients are poorly classified or predicted, largely due to its categorical nature (versus binary death/survival). Methods: We therefore tuned and employed a random forest algorithm (Stata, version 15) to classify and predict specific causes of death in lung cancer patients, using the surveillance, epidemiology and end results-18 and several clinicopathological factors. The lung cancer diagnosed during 2004 were included for the completeness in their follow-up and death causes. The patients were randomly divided into training and validation sets (1:1 match). We also compared the accuracies of the final random forest and multinomial regression models. Results: We identified and randomly selected 40,000 lung cancers for the analyses, including 20,000 cases for either set. The causes of death were, in descending ranking order, were lung cancer (72.45 %), other causes or alive (14.38%), non-lung cancer (6.87%), cardiovascular disease (5.35%), and infection (0.95%). We found more 250 iterations and the 10 variables produced the best prediction, whose best accuracy was 69.8% (error-rate 30.2%). The final random forest model with 300 iterations and 10 variables reached an accuracy higher than that of multinomial regression model (69.8% vs 64.6%). The top-10 most important factors in the random-forest model were sex, chemotherapy status, age (65+ vs <65 years), radiotherapy status, nodal status, T category, histology type and laterality, which were also independently associated with 5-category causes of death. Conclusion: We optimized a random forest model of machine learning to predict the specific cause of death in lung cancer patients using a set of clinicopathologic factors. The model also appears more accurate than multinomial regression model.

Cancers ◽  
2021 ◽  
Vol 13 (23) ◽  
pp. 6013
Author(s):  
Hyun-Soo Park ◽  
Kwang-sig Lee ◽  
Bo-Kyoung Seo ◽  
Eun-Sil Kim ◽  
Kyu-Ran Cho ◽  
...  

This prospective study enrolled 147 women with invasive breast cancer who underwent low-dose breast CT (80 kVp, 25 mAs, 1.01–1.38 mSv) before treatment. From each tumor, we extracted eight perfusion parameters using the maximum slope algorithm and 36 texture parameters using the filtered histogram technique. Relationships between CT parameters and histological factors were analyzed using five machine learning algorithms. Performance was compared using the area under the receiver-operating characteristic curve (AUC) with the DeLong test. The AUCs of the machine learning models increased when using both features instead of the perfusion or texture features alone. The random forest model that integrated texture and perfusion features was the best model for prediction (AUC = 0.76). In the integrated random forest model, the AUCs for predicting human epidermal growth factor receptor 2 positivity, estrogen receptor positivity, progesterone receptor positivity, ki67 positivity, high tumor grade, and molecular subtype were 0.86, 0.76, 0.69, 0.65, 0.75, and 0.79, respectively. Entropy of pre- and postcontrast images and perfusion, time to peak, and peak enhancement intensity of hot spots are the five most important CT parameters for prediction. In conclusion, machine learning using texture and perfusion characteristics of breast cancer with low-dose CT has potential value for predicting prognostic factors and risk stratification in breast cancer patients.


2021 ◽  
Vol 129 ◽  
pp. 104161
Author(s):  
Fei Deng ◽  
Haijun Zhou ◽  
Yong Lin ◽  
John A. Heim ◽  
Lanlan Shen ◽  
...  

2021 ◽  
Author(s):  
Zhenhao Li

UNSTRUCTURED Tuberculosis (TB) is a precipitating cause of lung cancer. Lung cancer patients coexisting with TB is difficult to differentiate from isolated TB patients. The aim of this study is to develop a prediction model in identifying those two diseases between the comorbidities and TB. In this work, based on the laboratory data from 389 patients, 81 features, including main laboratory examination of blood test, biochemical test, coagulation assay, tumor markers and baseline information, were initially used as integrated markers and then reduced to form a discrimination system consisting of 31 top-ranked indices. Patients diagnosed with TB PCR >1mtb/ml as negative samples, lung cancer patients with TB were confirmed by pathological examination and TB PCR >1mtb/ml as positive samples. We used Spatially Uniform ReliefF (SURF) algorithm to determine feature importance, and the predictive model was built using machine learning algorithm Random Forest. For cross-validation, the samples were randomly split into four training set and one test set. The selected features are composed of four tumor markers (Scc, Cyfra21-1, CEA, ProGRP and NSE), fifteen blood biochemical indices (GLU, IBIL, K, CL, Ur, NA, TBA, CHOL, SA, TG, A/G, AST, CA, CREA and CRP), six routine blood indices (EO#, EO%, MCV, RDW-S, LY# and MPV) and four coagulation indices (APTT ratio, APTT, PTA, TT ratio). This model presented a robust and stable classification performance, which can easily differentiate the comorbidity group from the isolated TB group with AUC, ACC, sensitivity and specificity of 0.8817, 0.8654, 0.8594 and 0.8656 for the training set, respectively. Overall, this work may provide a novel strategy for identifying the TB patients with lung cancer from routine admission lab examination with advantages of being timely and economical. It also indicated that our model with enough indices may further increase the effectiveness and efficiency of diagnosis.


Author(s):  
Ting Jin ◽  
Nam D Nguyen ◽  
Flaminia Talos ◽  
Daifeng Wang

Abstract Motivation Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods were typically limited to building an accurate prediction model as a ‘black box’, barely providing biological and clinical interpretability from the box. Results To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. Particularly, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification allowing lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. With application to the gene expression data of non-small-cell lung cancer patients, we found that ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified the biomarker genes and gene networks implying the regulatory mechanisms in the lung cancer development. In addition, ECMarker demonstrates clinical interpretability as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value &lt; 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates on early cancer medicine. Availabilityand implementation ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Manesh Chawla ◽  
Amreek Singh

Abstract. Fast downslope release of snow (avalanche) is a serious hazard to people living in snow bound mountains. Released snow mass can gain sufficient momentum on its down slope path to kill humans, uproot trees and rocks, destroy buildings. Direct reduction of avalanche threat is done by building control structures to add mechanical support to snowpack and reduce or deflect downward avalanche flow. On large terrains it is economically infeasible to use these methods on each high risk site.Therefore predicting and avoiding avalanches is the only feasible method to reduce threat but sufficient snow stability data for accurate forecasting is generally unavailable and difficult to collect. Forecasters infer snow stability from their knowledge of local weather, terrain and sparsely available snowpack observations. This inference process is vulnerable to human bias therefore machine learning models are used to find patterns from past data and generate helpful outputs to minimise and quantify uncertainty in forecasting process. These machine learning techniques require long past records of avalanches which are difficult to obtain. In this paper we propose a data efficient Random Forest model to address this problem. The model can generate a descriptive forecast showing reasoning and patterns which are difficult to observe manually. Our model advances the field by being inexpensive and convenient for operational forecasting due to its data efficiency, ease of automation and ability to describe its decisions.


Author(s):  
Qian Zhao ◽  
Ning Xu ◽  
Hui Guo ◽  
Jianguo Li

Background: Sepsis is a life-threatening disease caused by the dysregulated host response to the infection, and being the major cause of death to patients in intensive care unit (ICU). Objective: Early diagnosis of sepsis could significantly reduce in-hospital mortality. Though generated from infection, the development of sepsis follows its own psychological process and disciplines, alters with gender, health status and other factors. Hence, the analysis of mass data by bioinformatic tools and machine learning is a promising method for exploring early diagnosis manners. Method: We collected miRNA and mRNA expression data of sepsis blood samples from Gene Expression Omnibus (GEO) and ArrayExpress databases, screened out differentially expressed genes (DEGs) by R software, predicted miRNA targets on TargetScanHuman and miRTarBase websites, conducted Gene Ontology (GO) term and KEGG pathway enrichment based on overlapping DEGs. The STRING database and Cytoscape were used to build protein-protein interaction (PPI) network and predict hub genes. Then we constructed a Random Forest model by using the hub genes to assess sample type. Results: Bioinformatic analysis of GEO dataset revealed 46 overlapping DEGs in sepsis. The PPI network analysis identified five hub genes, SOCS3, KBTBD6, FBXL5, FEM1C and WSB1. Random Forest model based on these five hub genes was used to assess GSE95233 and GSE95233 datasets, and the area under curve (AUC) of ROC are 0.900 and 0.7988, respectively, which confirmed the efficacy of this model. Conclusion: The integrated analysis of gene expression in sepsis and the effective Random Forest model built in this study may provide promising diagnostic methods for sepsis.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Eun Kyung Park ◽  
Kwang-sig Lee ◽  
Bo Kyoung Seo ◽  
Kyu Ran Cho ◽  
Ok Hee Woo ◽  
...  

AbstractRadiogenomics investigates the relationship between imaging phenotypes and genetic expression. Breast cancer is a heterogeneous disease that manifests complex genetic changes and various prognosis and treatment response. We investigate the value of machine learning approaches to radiogenomics using low-dose perfusion computed tomography (CT) to predict prognostic biomarkers and molecular subtypes of invasive breast cancer. This prospective study enrolled a total of 723 cases involving 241 patients with invasive breast cancer. The 18 CT parameters of cancers were analyzed using 5 machine learning models to predict lymph node status, tumor grade, tumor size, hormone receptors, HER2, Ki67, and the molecular subtypes. The random forest model was the best model in terms of accuracy and the area under the receiver-operating characteristic curve (AUC). On average, the random forest model had 13% higher accuracy and 0.17 higher AUC than the logistic regression. The most important CT parameters in the random forest model for prediction were peak enhancement intensity (Hounsfield units), time to peak (seconds), blood volume permeability (mL/100 g), and perfusion of tumor (mL/min per 100 mL). Machine learning approaches to radiogenomics using low-dose perfusion breast CT is a useful noninvasive tool for predicting prognostic biomarkers and molecular subtypes of invasive breast cancer.


Sign in / Sign up

Export Citation Format

Share Document