Comparison of Machine Learning Classifiers to Predict Patient Survival and Genetics of High-Grade Glioma: Towards a Standardized Model for Clinical Implementation (Preprint)

2021 ◽  
Author(s):  
Luca Pasquini ◽  
Antonio Napolitano ◽  
Martina Lucignani ◽  
Emanuela Tagliente ◽  
Francesco Dellepiane ◽  
...  

BACKGROUND Radiomic models outperform clinical data for outcome prediction in high-grade gliomas (HGG). Many machine learning (ML) radiomic models have been developed, mostly employing single classifiers with variable results. However, comparative analyses of different ML models for clinically relevant tasks are lacking in the literature. OBJECTIVE We aimed to compare well-established ML classifiers, including single and ensemble learners, on clinically relevant prediction tasks in HGG patients: overall survival (OS), isocitrate dehydrogenase (IDH) mutation, O-6-methylguanine-DNA-methyltransferase (MGMT) promoter methylation, epidermal growth factor receptor (EGFR) amplification and Ki-67 expression, based on radiomic features from conventional and advanced MRI. Our objective was to identify the best algorithm for each task in terms of prediction accuracy. METHODS 156 adult patients with a pathologic diagnosis of HGG were included. Three tumoral regions were manually segmented: contrast-enhancing tumor, necrosis and non-enhancing tumor. Radiomic features were extracted with a custom version of Pyradiomics and selected through the Boruta algorithm. A grid search was applied over K-fold cross-validation (K=10) repeated 4 times, to obtain the highest mean and lowest spread of accuracy. Model performance was assessed as area under the receiver operating characteristic curve (AUC-ROC). RESULTS Ensemble classifiers showed the best performance across tasks. Extreme Gradient Boosting (xGB) obtained the highest accuracy for OS (74.5%), and AdaBoost (AB) for IDH mutation (88%), MGMT methylation (71.7%), Ki-67 expression (86.6%), and EGFRvIII amplification (81.6%). CONCLUSIONS The best-performing features shed light on possible correlations between MRI and tumor histology.

2021 ◽  
Vol 11 ◽  
Author(s):  
Luca Pasquini ◽  
Antonio Napolitano ◽  
Martina Lucignani ◽  
Emanuela Tagliente ◽  
Francesco Dellepiane ◽  
...  

Radiomic models outperform clinical data for outcome prediction in high-grade gliomas (HGG). However, lack of parameter standardization limits clinical applications. Many machine learning (ML) radiomic models employ single classifiers rather than ensemble learning, which is known to boost performance, and comparative analyses are lacking in the literature. We aimed to compare ML classifiers on clinically relevant prediction tasks for HGG: overall survival (OS), isocitrate dehydrogenase (IDH) mutation, O-6-methylguanine-DNA-methyltransferase (MGMT) promoter methylation, epidermal growth factor receptor vIII (EGFR) amplification, and Ki-67 expression, based on radiomic features from conventional and advanced magnetic resonance imaging (MRI). Our objective was to identify the best algorithm for each task. One hundred fifty-six adult patients with a pathologic diagnosis of HGG were included. Three tumoral regions were manually segmented: contrast-enhancing tumor, necrosis, and non-enhancing tumor. Radiomic features were extracted with a custom version of Pyradiomics and selected through the Boruta algorithm. A grid search was applied over K-fold cross-validation (K=10) repeated ten times, to obtain the highest mean and lowest spread of accuracy. Model performance was assessed as mean AUC-ROC values with 95% confidence intervals (CI). Extreme Gradient Boosting (xGB) obtained the highest accuracy for OS (74.5%), and AdaBoost (AB) for IDH mutation (87.5%), MGMT methylation (70.8%), Ki-67 expression (86%), and EGFR amplification (81%). Ensemble classifiers showed the best performance across tasks. High-scoring radiomic features shed light on possible correlations between MRI and tumor histology.
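
The model-selection step described above (a grid search scored by repeated K-fold cross-validation, preferring the highest mean accuracy and then the lowest spread) can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the synthetic one-dimensional data, the toy nearest-neighbour classifier, and the names `repeated_kfold`, `knn_predict` and `grid_search` are hypothetical and are not the study's pipeline or classifiers.

```python
import random
import statistics


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` reshuffled K-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def knn_predict(train_x, train_y, x, n_neighbors):
    """Toy 1-D k-nearest-neighbours classifier (stand-in for a real model)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = sum(train_y[i] for i in order[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def grid_search(xs, ys, grid, k=10, repeats=10, seed=0):
    """Score every grid value on the same splits; prefer high mean, low spread."""
    splits = list(repeated_kfold(len(xs), k, repeats, seed))
    scores = {}
    for n_neighbors in grid:
        accs = []
        for train, test in splits:
            tx = [xs[i] for i in train]
            ty = [ys[i] for i in train]
            hits = sum(knn_predict(tx, ty, xs[i], n_neighbors) == ys[i] for i in test)
            accs.append(hits / len(test))
        scores[n_neighbors] = (statistics.mean(accs), statistics.stdev(accs))
    # Highest mean accuracy first; ties broken by the smallest standard deviation.
    best = max(scores, key=lambda g: (scores[g][0], -scores[g][1]))
    return best, scores


# Synthetic two-class data, purely illustrative (not the study's radiomic features).
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(30)] + [rng.gauss(2, 1) for _ in range(30)]
ys = [0] * 30 + [1] * 30

best, scores = grid_search(xs, ys, grid=[1, 3, 5])
for g, (mean_acc, spread) in scores.items():
    print(f"n_neighbors={g}: mean accuracy={mean_acc:.3f}, std={spread:.3f}")
print("selected:", best)
```

Reusing the same splits for every grid value keeps the comparison between candidate settings paired, so differences in mean accuracy are not driven by different shuffles.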


Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1078 ◽  
Author(s):  
Furqan Rustam ◽  
Imran Ashraf ◽  
Arif Mehmood ◽  
Saleem Ullah ◽  
Gyu Choi

The use of data from social networks such as Twitter has increased during the last few years to improve political campaigns, quality of products and services, sentiment analysis, etc. Tweet classification based on user sentiment is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to support sentiment analysis for such organizations. The VC is based on logistic regression (LR) and a stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. Moreover, the performance of a deep long short-term memory (LSTM) network was analyzed on the selected dataset. The results show that the proposed VC performs better than the other classifiers, achieving an accuracy of 0.789 and 0.791 with TF and TF-IDF feature extraction, respectively. The results demonstrate that ensemble classifiers achieve higher accuracy than non-ensemble classifiers. Experiments further showed that machine learning classifiers perform better when TF-IDF is used as the feature extraction method. Word2vec feature extraction performs worse than TF and TF-IDF feature extraction. The LSTM achieves a lower accuracy than the machine learning classifiers.
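
As a rough illustration of the soft voting mechanism mentioned above, the sketch below averages the per-class probabilities of two classifiers and picks the class with the highest average. The probability values and the name `soft_vote` are hypothetical; they are not outputs or code from the paper, and real LR and SGDC models would supply the probabilities.

```python
def soft_vote(prob_rows, classes, weights=None):
    """Average per-class probabilities across classifiers; return (label, averages)."""
    if weights is None:
        weights = [1.0] * len(prob_rows)
    total = sum(weights)
    averaged = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(len(classes))
    ]
    winner = max(range(len(classes)), key=lambda c: averaged[c])
    return classes[winner], averaged


classes = ["negative", "neutral", "positive"]

# Hypothetical probabilities for one tweet from two already-trained models.
lr_probs = [0.2, 0.5, 0.3]    # logistic regression leans "neutral"
sgd_probs = [0.1, 0.3, 0.6]   # SGD classifier leans "positive"

label, averaged = soft_vote([lr_probs, sgd_probs], classes)
print(label, [round(p, 2) for p in averaged])
```

In this example the two models disagree on the top class, so a hard (majority) vote would be tied; averaging the probabilities resolves the tie in favour of the more confident model.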


Author(s):  
S. L. Belyakov ◽  
S. M. Karpov

This work is devoted to the problem of automatic detection of fraudulent financial transactions. The article describes the causes of fraudulent transactions and their typical attributes, as well as the basic principle of detection. The concepts of fraudulent and honest transactions are defined. Examples of algorithms for identifying suspicious financial transactions in antifraud systems are given. Modern approaches to monitoring and detecting fraud in remote banking systems are considered, and the positive and negative aspects of each approach are described. Particular attention is paid to the problem of optimal recognition of transaction classes in highly unbalanced data. Methods for handling unbalanced data are considered. The choice of metrics for evaluating the machine learning model is justified considering the specifics of the data distribution. As a solution, we propose an approach based on ensemble classifiers combined with balanced sampling algorithms, the key feature of which is to create a balanced sample not for the entire classifier, but for each learner in the ensemble separately. Based on data on bank credit card fraud, a comparison is made and the best classifier is selected among the ensemble algorithms random forest, adaptive boosting and bagging of decision trees. To create balanced subsets for the base estimators of the ensemble algorithms, random undersampling is used. To search for the optimal parameters of the classifiers, a randomized search over a parameter grid is used. The results of the experimental comparison of the selected methods are presented. The advantages of the proposed approach are analyzed, and the limits of its applicability are discussed.
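
The central idea above, drawing a separate balanced sample for each ensemble member instead of one for the whole classifier, might look like the following minimal sketch. It is an illustration under stated assumptions: the data are synthetic, the base learner is a toy threshold rule rather than a decision tree, and the names `balanced_subset`, `fit_stump`, `fit_ensemble` and `predict` are hypothetical.

```python
import random
import statistics


def balanced_subset(labels, rng):
    """Random undersampling: all minority samples plus an equal-sized majority draw."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def fit_stump(xs, ys, idx):
    """Toy base learner: threshold halfway between the two class means."""
    mean_pos = statistics.mean(xs[i] for i in idx if ys[i] == 1)
    mean_neg = statistics.mean(xs[i] for i in idx if ys[i] == 0)
    return (mean_pos + mean_neg) / 2


def fit_ensemble(xs, ys, n_learners=5, seed=0):
    """Each learner is trained on its own freshly drawn balanced subset."""
    rng = random.Random(seed)
    return [fit_stump(xs, ys, balanced_subset(ys, rng)) for _ in range(n_learners)]


def predict(thresholds, x):
    """Majority vote; assumes the fraud class (1) has the larger feature values."""
    votes = sum(x > t for t in thresholds)
    return 1 if 2 * votes > len(thresholds) else 0


# Synthetic, highly unbalanced data: 200 honest (0) and 10 fraudulent (1) samples.
data_rng = random.Random(7)
xs = [data_rng.gauss(0, 1) for _ in range(200)] + [data_rng.gauss(4, 1) for _ in range(10)]
ys = [0] * 200 + [1] * 10

model = fit_ensemble(xs, ys, n_learners=5, seed=0)
print("thresholds:", [round(t, 2) for t in model])
print("prediction for x=10.0:", predict(model, 10.0))
```

Because every learner sees a different random draw of the majority class, the ensemble as a whole uses more of the majority data than any single undersampled training set would.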

