Predict multicategory causes of death in lung cancer patients using clinicopathologic factors

2021, Vol 129, pp. 104161
Author(s): Fei Deng, Haijun Zhou, Yong Lin, John A. Heim, Lanlan Shen, ...

2020
Author(s): Fei Deng, Haijun Zhou, Yong Lin, John Heim, Lanlan Shen, ...

Background: Random forest is a machine-learning algorithm valued for its classification performance, and it is often reported to be more accurate than other machine-learning and regression models. However, it has rarely been used to predict causes of death in lung cancer patients. Moreover, specific causes of death in these patients are poorly classified or predicted, largely because the outcome is multicategorical rather than binary (death vs. survival).
Methods: We therefore tuned and applied a random forest algorithm (Stata, version 15) to classify and predict specific causes of death in lung cancer patients, using data from the Surveillance, Epidemiology, and End Results (SEER) 18 registries and several clinicopathologic factors. Lung cancers diagnosed in 2004 were included because their follow-up and cause-of-death data were complete. The patients were randomly divided into training and validation sets at a 1:1 ratio. We also compared the accuracy of the final random forest model with that of a multinomial regression model.
Results: We identified and randomly selected 40,000 lung cancers for the analyses, with 20,000 cases in each set. In descending order of frequency, the outcome categories were lung cancer (72.45%), other causes or alive (14.38%), non-lung cancer (6.87%), cardiovascular disease (5.35%), and infection (0.95%). We found that more than 250 iterations with 10 variables produced the best prediction, with a best accuracy of 69.8% (error rate 30.2%). The final random forest model, with 300 iterations and 10 variables, was more accurate than the multinomial regression model (69.8% vs. 64.6%). The 10 most important factors in the random forest model included sex, chemotherapy status, age (65+ vs. <65 years), radiotherapy status, nodal status, T category, histology type, and laterality; these factors were also independently associated with the five-category cause of death.
Conclusion: We optimized a random forest machine-learning model to predict the specific cause of death in lung cancer patients from a set of clinicopathologic factors. The model also appears to be more accurate than a multinomial regression model.
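The abstract describes a random forest classifier for a multicategory outcome. The minimal, stdlib-only Python sketch below shows how such a model works. It is not the authors' Stata 15 implementation. The names (`fit_forest`, `n_trees`, `n_vars`, `depth`) and the design choices (Gini impurity, equality splits on categorical predictors, a fixed depth limit) are hypothetical and are our own. We also assume that "iterations" means the number of bootstrapped trees and that "variables" means the number of predictors sampled as split candidates at each node. The abstract defines neither term.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, value) equality split with the lowest weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(labels)
    for f in features:
        for v in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, v), score
    return best


def build_tree(rows, labels, n_vars, depth, rng):
    """Grow a depth-limited tree; each node considers n_vars randomly sampled features.
    Internal nodes are tuples (feature, value, left, right); leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    n_features = len(rows[0])
    features = rng.sample(range(n_features), min(n_vars, n_features))
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, v = split
    left = [i for i, r in enumerate(rows) if r[f] == v]
    right = [i for i, r in enumerate(rows) if r[f] != v]

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx], n_vars, depth - 1, rng)

    return (f, v, grow(left), grow(right))


def predict_tree(node, row):
    """Follow equality splits down to a leaf (class labels must not themselves be tuples)."""
    while isinstance(node, tuple):
        f, v, left, right = node
        node = left if row[f] == v else right
    return node


def fit_forest(rows, labels, n_trees=300, n_vars=2, depth=5, seed=0):
    """Bagging: grow each tree on a bootstrap resample of the training rows.
    Mapping n_trees to "iterations" and n_vars to "variables" is an assumption."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], n_vars, depth, rng))
    return forest


def predict_forest(forest, row):
    """Predict by majority vote across all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def accuracy(forest, rows, labels):
    """Share of rows whose predicted class matches the label; error rate = 1 - accuracy."""
    hits = sum(predict_forest(forest, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Hypothetical toy records (sex, age group, chemotherapy) with a made-up rule; NOT SEER data.
    rng = random.Random(1)
    data = []
    for _ in range(400):
        row = (rng.choice("MF"), rng.choice(["<65", "65+"]), rng.choice(["yes", "no"]))
        label = "lung cancer" if row[2] == "no" else ("cardiovascular" if row[1] == "65+" else "other/alive")
        data.append((row, label))
    train, valid = data[:200], data[200:]  # 1:1 split; records were already drawn at random
    forest = fit_forest([r for r, _ in train], [y for _, y in train], n_trees=300, n_vars=2)
    acc = accuracy(forest, [r for r, _ in valid], [y for _, y in valid])
    print(f"validation accuracy {acc:.3f}, error rate {1 - acc:.3f}")
```

One way to mimic the tuning step is to refit the model over a grid of `n_trees` and `n_vars` values and keep the pair with the highest validation accuracy. The error rate is then 1 minus that accuracy, just as the reported 69.8% accuracy corresponds to a 30.2% error rate.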


2004, Vol 66 (6), pp. 602-607
Author(s): Miho UCHIHIRA, Takahiro EJIMA, Takao UCHIHIRA, Jun ARAKI, Toshiaki KAMEI
