A Deep Learning Ensemble Approach for Automated COVID-19 Detection from Chest CT Images

Gaetano Zazzaro; Francesco Martone; Gianpaolo Romano; Luigi Pavone

doi:10.3390/jcm10245982

Journal of Clinical Medicine ◽

2021 ◽

Vol 10 (24) ◽

pp. 5982

Keyword(s):

High Performance ◽

Information Gain ◽

CT Images ◽

Classification Performance ◽

Majority Voting ◽

K Nearest Neighbors ◽

Complete Dataset ◽

Learning Technique ◽

Feasible Option ◽

Fold Cross Validation

Background: The aim of this study was to evaluate the performance of an automated COVID-19 detection method based on a transfer learning technique that makes use of chest computed tomography (CT) images. Method: We used a publicly available multiclass CT scan dataset containing 4171 CT scans of 210 different patients. We extracted features from the CT images using a set of convolutional neural networks (CNNs) that had been pretrained on the ImageNet dataset, and we then selected a subset of these features using the Information Gain filter. The resulting feature vectors were used to train a set of k Nearest Neighbors classifiers, with 10-fold cross validation, to assess the classification performance of the features extracted by each CNN. Finally, a majority voting approach was used to classify each image into one of two classes: COVID-19 and NO COVID-19. Results: A total of 414 images of the test set (10% of the complete dataset) were correctly classified, and only 4 were misclassified, yielding a final classification accuracy of 99.04%. Conclusions: The high performance achieved by the method could make it a feasible option for assisting radiologists in COVID-19 diagnosis through the use of CT images.
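The final voting step and the reported accuracy figure can be illustrated with a minimal Python sketch. It assumes that each per-CNN k Nearest Neighbors classifier has already produced a label for an image. The names `majority_vote` and `accuracy_percent` and the example votes are hypothetical, and the tie-breaking rule is an assumption, since the abstract does not specify one.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the label seen first (an assumption; the abstract
    does not say how ties are handled).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


def accuracy_percent(correct, misclassified):
    """Accuracy as a percentage of all classified images."""
    return 100.0 * correct / (correct + misclassified)


# Hypothetical labels from three kNN classifiers, one per CNN feature set.
votes = ["COVID-19", "NO COVID-19", "COVID-19"]
final_label = majority_vote(votes)

# Figures taken from the abstract: 414 correct and 4 misclassified test images.
reported = round(accuracy_percent(414, 4), 2)
```

With the abstract's counts, 414 correct out of 418 test images reproduces the stated 99.04%.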

Use of Machine Learning to Investigate the Quantitative Checklist for Autism in Toddlers (Q-CHAT) towards Early Autism Screening

Diagnostics ◽

10.3390/diagnostics11030574 ◽

2021 ◽

Vol 11 (3) ◽

pp. 574

Author(s):

Gennaro Tartarisco ◽

Giovanni Cicceri ◽

Davide Di Pietro ◽

Elisa Leonardi ◽

Stefania Aiello ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Behavioral Science ◽

Autistic Traits ◽

Classification Performance ◽

Recursive Feature Elimination ◽

Diagnostic Tools ◽

Support Vector ◽

K Nearest Neighbors ◽

Autism Screening

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.

Collaborative Classification Approach for Airline Tweets Using Sentiment Analysis

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.1639 ◽

2021 ◽

Vol 12 (3) ◽

pp. 3597-3603

Author(s):

M. Veera Kumari et al.

Keyword(s):

Sentiment Analysis ◽

Cross Validation ◽

Majority Voting ◽

Quality Of Services ◽

Classification Methods ◽

K Nearest Neighbors ◽

Classification Techniques ◽

Improve Accuracy ◽

Fold Cross Validation

Many airlines around the world offer a range of services to their customers, who may or may not be satisfied with them. Because customers cannot always give their comments immediately, airlines use Twitter to collect feedback on their services. Twitter is increasingly used to improve the quality of services[4]. This paper develops different classification techniques to improve the accuracy of sentiment analysis. Tweets about airline services are classified into three polarities: positive, negative, and neutral. The classification methods are Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Decision Tree (DTC), Extreme Gradient Boost (XGB), combinations of two, three, and four classification techniques with a majority Voting Classifier, and AdaBoost. In the validation phase, the accuracy achieved by each method was measured using 20-fold and 30-fold cross validation. The paper also proposes a new ensemble Bagging approach for different classifiers[10]. The sentiment analysis metrics precision, recall, f1-score, micro average, macro average, and accuracy are reported for all of the above classification techniques. In addition, the average predictions of the classifiers and the accuracy of those average predictions were calculated, with the aim of improving the quality of services. The results show that bagging classifiers achieve better accuracy than non-bagging classifiers.
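The k-fold cross validation used in the validation phase can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the folds are contiguous and unshuffled, `predict_fold` is a hypothetical callback standing in for any of the classifiers listed above, and the 60 balanced toy labels are invented for the example.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_accuracy(labels, predict_fold, k):
    """Mean accuracy over k folds.

    `predict_fold(train, test)` is a hypothetical callback that fits on the
    training indices and returns one predicted label per test index.
    """
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        predicted = predict_fold(train, test)
        hits = sum(p == labels[i] for p, i in zip(predicted, test))
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Toy data: 60 tweets with perfectly balanced polarities (invented).
labels = ["positive", "negative", "neutral"] * 20


def most_common_baseline(train, test):
    """Baseline that predicts the most frequent training label everywhere."""
    top = Counter(labels[i] for i in train).most_common(1)[0][0]
    return [top] * len(test)


folds = list(k_fold_indices(len(labels), 20))
baseline_accuracy = cross_validated_accuracy(labels, most_common_baseline, 20)
```

On balanced three-class data the baseline scores about one third, which gives a floor against which the voting and bagging ensembles can be compared.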

Data Attribute Selection with Information Gain to Improve Credit Approval Classification Performance using K-Nearest Neighbor Algorithm

International Journal of Islamic Business and Economics (IJIBEC) ◽

10.28918/ijibec.v1i1.882 ◽

2017 ◽

pp. 13

Author(s):

Ivandari Ivandari ◽

Tria Titiani Chasanah ◽

Sattriedi Wahyu Binabar ◽

M. Adib Adib Al Karomi

Keyword(s):

Credit Card ◽

Nearest Neighbor ◽

Information Gain ◽

Classification Performance ◽

Attribute Selection ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Community Needs ◽

Public Data ◽

Calculation Results

Credit is one of the modern economic behaviors. In practice, credit can mean either borrowing a certain amount of money or purchasing goods through gradual payments within an agreed timeframe. Unfavorable economic conditions and high community needs lead people to buy goods on credit. Unfortunately, high needs are sometimes not matched by the ability to make payments in accordance with the initial agreement. This disrupts the payment process, a situation known as “bad credit”. This research uses a public credit card dataset from the UCI repository and a private credit approval dataset from a local bank. The information gain algorithm is used to calculate the weight of each attribute. The calculation results show that the attributes have different weights, and the study concludes that not all data attributes influence the classification result. For example, attribute A1 in the UCI dataset and the loan type attribute in the local dataset have an information gain weight of 0 (zero). Classification with the K-Nearest Neighbors algorithm shows an increase in classification performance of 7.53% for the UCI dataset and 3.26% for the local dataset after feature selection on both datasets.
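How an attribute can end up with an information gain weight of zero is easiest to see in a small worked example. The sketch below computes information gain as the entropy of the class labels minus the conditional entropy after splitting on the attribute. The toy records and the attribute names `income` and `loan_type` are hypothetical, and a constant attribute is only one way to obtain a zero weight; the abstract does not say why the attributes it mentions scored zero.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels):
    """Entropy of the labels minus their entropy after splitting on `values`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records, not taken from either dataset in the study.
approved = ["yes", "yes", "no", "no"]
income = ["high", "high", "low", "low"]    # separates the classes perfectly
loan_type = ["car", "car", "car", "car"]   # constant, so it carries no information

gain_income = information_gain(income, approved)
gain_loan_type = information_gain(loan_type, approved)
```

Here `income` receives the maximum possible weight for a balanced binary target (1 bit), while the constant `loan_type` receives 0 and could be dropped before running K-Nearest Neighbors.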

Vulnerability in Deep Transfer Learning Models to Adversarial Fast Gradient Sign Attack for COVID-19 Prediction from Chest Radiography Images

Applied Sciences ◽

10.3390/app11094233 ◽

2021 ◽

Vol 11 (9) ◽

pp. 4233

Author(s):

Biprodip Pal ◽

Debashis Gupta ◽

Md. Rashed-Al-Mahfuz ◽

Salem A. Alyami ◽

Mohammad Ali Moni

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Chest Radiography ◽

High Sensitivity ◽

CT Images ◽

Classification Performance ◽

Learning Models ◽

X Ray ◽

Image Capturing ◽

Fast Gradient

The COVID-19 pandemic requires the rapid isolation of infected patients. Thus, high-sensitivity radiology images could be a key technique to diagnose patients besides the polymerase chain reaction approach. Deep learning algorithms have been proposed in several studies to detect COVID-19 symptoms, motivated by their success in chest radiography image classification, cost efficiency, the lack of expert radiologists, and the need for faster processing in pandemic areas. Most of the promising algorithms proposed in different studies are based on pre-trained deep learning models. Such open-source models and the lack of variation in the radiology image-capturing environment make the diagnosis system vulnerable to adversarial attacks such as the fast gradient sign method (FGSM) attack. This study therefore explored the potential vulnerability of pre-trained convolutional neural network algorithms to the FGSM attack in terms of two frequently used models, VGG16 and Inception-v3. Firstly, we developed two transfer learning models for X-ray and CT image-based COVID-19 classification and analyzed the performance extensively in terms of accuracy, precision, recall, and AUC. Secondly, our study illustrates that misclassification can occur with a very minor perturbation magnitude, such as 0.009 and 0.003 for the FGSM attack in these models for X-ray and CT images, respectively, without any effect on the visual perceptibility of the perturbation. In addition, we demonstrated that a successful FGSM attack can decrease the classification performance to 16.67% and 55.56% for X-ray images, as well as 36% and 40% in the case of CT images, for VGG16 and Inception-v3, respectively, without any human-recognizable perturbation effects in the adversarial images. Finally, we found that the correct-class probability of a test image, which should be 1, can drop for both models as the perturbation increases; it can drop to 0.24 and 0.17 for the VGG16 model for X-ray and CT images, respectively. Thus, despite the need for data sharing and automated diagnosis, practical deployment of such programs requires more robustness.
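The FGSM update itself is short: the input is moved by a step of size epsilon in the direction of the sign of the loss gradient with respect to the input. The sketch below applies it to a tiny logistic regression model rather than to VGG16 or Inception-v3, purely to keep the example self-contained. The weights, the input, and the deliberately large epsilon are all hypothetical and bear no relation to the perturbation magnitudes reported above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of the positive class under a logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, y, epsilon):
    """FGSM step: x_adv = x + epsilon * sign(dLoss/dx).

    For logistic regression with binary cross-entropy loss, the gradient
    with respect to input feature i is (p - y) * w_i.
    """
    p = predict_proba(weights, bias, x)
    gradient = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical toy model and input whose true class is 1.
weights, bias = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1

p_clean = predict_proba(weights, bias, x)
x_adv = fgsm(weights, bias, x, y, epsilon=0.5)
p_adv = predict_proba(weights, bias, x_adv)
```

In this toy case the probability assigned to the correct class falls from roughly 0.82 to 0.5 after one step, which mirrors, on a much smaller scale, the drop in correct-class probability described in the abstract.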

A new approach: information gain algorithm-based k-nearest neighbors hybrid diagnostic system for Parkinson’s disease

Physical and Engineering Sciences in Medicine ◽

10.1007/s13246-021-01001-6 ◽

2021 ◽

Author(s):

Cüneyt Yücelbaş

Keyword(s):

Parkinson's Disease ◽

Information Gain ◽

Diagnostic System ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

New Approach

Metalearning approach for leukemia informative genes prioritization

Journal of Integrative Bioinformatics ◽

10.1515/jib-2019-0069 ◽

2020 ◽

Vol 17 (1) ◽

Author(s):

Vânia Rodrigues ◽

Sérgio Deusdado

Keyword(s):

High Performance ◽

Information Gain ◽

Pearson Correlation ◽

Microarray Gene Expression Data ◽

Microarray Gene Expression ◽

The Public ◽

Leukemia Diagnosis ◽

Chi Squared ◽

Logistic Regression Method ◽

Single Attribute

The discovery of diagnostic or prognostic biomarkers is fundamental to optimizing therapeutics for patients. By enhancing the interpretability of the prediction model, this work aims to optimize leukemia diagnosis while retaining high performance in the identification of informative genes. For this purpose, we used an optimal parameterization of the Kernel Logistic Regression method to classify leukemia microarray gene expression data, applying metalearners to select attributes and reduce the data dimensionality before passing the data to the classifier. Pearson correlation and the chi-squared statistic were the attribute evaluators applied in the metalearners, with information gain as the single-attribute evaluator. The implemented models relied on 10-fold cross-validation. The metalearner approach identified 12 common genes, with a highest average merit of 0.999. The practical work was carried out using the public data mining software WEKA.

Combining a convolutional neural network with autoencoders to predict the survival chance of COVID-19 patients

Scientific Reports ◽

10.1038/s41598-021-93543-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Fahime Khozeimeh ◽

Danial Sharifrazi ◽

Navid Hoseini Izadi ◽

Javad Hassannataj Joloudari ◽

Afshin Shoeibi ◽

...

Keyword(s):

Clinical Data ◽

Data Augmentation ◽

Clinical Information ◽

CT Images ◽

Classification Performance ◽

Survival Chance ◽

Average Accuracy ◽

Novel Method ◽

Aided Diagnosis ◽

Augmentation Procedure

COVID-19 has caused many deaths worldwide. The automation of the diagnosis of this virus is highly desired. Convolutional neural networks (CNNs) have shown outstanding classification performance on image datasets. To date, it appears that COVID computer-aided diagnosis systems based on CNNs and clinical information have not yet been analysed or explored. We propose a novel method, named the CNN-AE, to predict the survival chance of COVID-19 patients using a CNN trained with clinical information. Notably, the required resources to prepare CT images are expensive and limited compared to those required to collect clinical data, such as blood pressure, liver disease, etc. We evaluated our method using a publicly available clinical dataset that we collected. The dataset properties were carefully analysed to extract important features and compute the correlations of features. A data augmentation procedure based on autoencoders (AEs) was proposed to balance the dataset. The experimental results revealed that the average accuracy of the CNN-AE (96.05%) was higher than that of the CNN (92.49%). To demonstrate the generality of our augmentation method, we trained some existing mortality risk prediction methods on our dataset (with and without data augmentation) and compared their performances. We also evaluated our method using another dataset for further generality verification. To show that clinical data can be used for COVID-19 survival chance prediction, the CNN-AE was compared with multiple pre-trained deep models that were tuned based on CT images.

Feature Selection using Genetic Programming

Zambia ICT Journal ◽

10.33260/zictjournal.v3i2.62 ◽

2019 ◽

Vol 3 (2) ◽

pp. 11-18

Author(s):

George Mweshi

Keyword(s):

Feature Selection ◽

Genetic Programming ◽

Information Gain ◽

Principal Component ◽

Search Space ◽

Classification Performance ◽

The Other ◽

Searching Strategy ◽

Mining Algorithms ◽

Feature Selection Techniques

Extracting useful and novel information from the large amount of collected data has become a necessity for corporations wishing to maintain a competitive advantage. One of the biggest issues in handling these significantly large datasets is the curse of dimensionality. As the dimension of the data increases, the performance of the data mining algorithms employed to mine the data deteriorates. This deterioration is mainly caused by the large search space created as a result of having irrelevant, noisy and redundant features in the data. Feature selection is one of the various techniques that can be used to remove these unnecessary features. Feature selection consequently reduces the dimension of the data as well as the search space, which in turn increases the efficiency and the accuracy of the mining algorithms. In this paper, we investigate the ability of Genetic Programming (GP), an evolutionary algorithm searching strategy capable of automatically finding solutions in complex and large search spaces, to perform feature selection. We implement a basic GP algorithm and perform feature selection on 5 benchmark classification datasets from the UCI repository. To test the competitiveness and feasibility of the GP approach, we examine the classification performance of four classifiers, namely J48, Naive Bayes, PART, and Random Forests, using the GP-selected features, all the original features, and the features selected by other commonly used feature selection techniques, i.e. principal component analysis, information gain, Relief-F and CFS. The experimental results show that not only does GP select a smaller set of features from the original features, but classifiers using GP-selected features also achieve better classification performance than those using all the original features. Furthermore, compared to the other well-known feature selection techniques, GP achieves very competitive results.

A new data classification improvement approach based on kernel clustering

Journal of Physics Conference Series ◽

10.1088/1742-6596/2082/1/012021 ◽

2021 ◽

Vol 2082 (1) ◽

pp. 012021

Author(s):

Bingsen Guo

Keyword(s):

High Performance ◽

Real Life ◽

Data Classification ◽

Classification Performance ◽

Training Set ◽

Kernel Clustering ◽

Training Samples ◽

Critical Issues ◽

Extensive Performance ◽

Query Sample

Data classification is one of the most critical issues in data mining, with a large number of real-life applications. In many practical classification problems, there are various forms of anomalies in the real dataset. For example, the training set may contain outliers, often enough to confuse the classifier and reduce its ability to learn from the data. In this paper, we propose a new data classification improvement approach based on kernel clustering. The proposed method can improve the classification performance by optimizing the training set. We first use an existing kernel clustering method to cluster the training set and optimize it based on the similarity between the training samples in each class and the corresponding class center. Then, the optimized, reliable training set is used to train a standard classifier in the kernel space, which classifies each query sample. Extensive performance analysis shows that the proposed method achieves high performance, thus improving the classifier’s effectiveness.
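One possible reading of the training-set optimization step is sketched below in Python. It is not the paper's algorithm. It skips the clustering stage, treats the kernel-space mean of a class as its center, uses an RBF kernel, and keeps a fixed fraction of the closest samples. All of these choices, along with the names `kernel_distances_to_center`, `prune_class`, and `keep_fraction`, are assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    squared = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared)


def kernel_distances_to_center(samples, kernel=rbf_kernel):
    """Squared kernel-space distance of each sample to the class mean.

    ||phi(x) - m||^2 = k(x, x) - (2/n) * sum_j k(x, x_j)
                       + (1/n^2) * sum_j sum_l k(x_j, x_l)
    """
    n = len(samples)
    gram = [[kernel(a, b) for b in samples] for a in samples]
    center_norm = sum(sum(row) for row in gram) / (n * n)
    return [gram[i][i] - 2.0 * sum(gram[i]) / n + center_norm for i in range(n)]


def prune_class(samples, keep_fraction=0.75):
    """Keep the samples closest to the class center (hypothetical rule)."""
    distances = kernel_distances_to_center(samples)
    n_keep = max(1, int(len(samples) * keep_fraction))
    ranked = sorted(range(len(samples)), key=lambda i: distances[i])
    kept = sorted(ranked[:n_keep])  # restore the original sample order
    return [samples[i] for i in kept]


# Toy class with one obvious outlier at (5, 5).
class_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
distances = kernel_distances_to_center(class_a)
cleaned = prune_class(class_a)
```

The outlier has the largest kernel-space distance to the class center and is the one sample removed, so a classifier trained on `cleaned` no longer sees it.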

A Quantitative Assessment of Pre-Operative MRI Reports in Glioma Patients: Report Metrics and IDH Prediction Ability

Frontiers in Oncology ◽

10.3389/fonc.2020.600327 ◽

2021 ◽

Vol 10 ◽

Author(s):

Hang Cao ◽

E. Zeynep Erson-Omay ◽

Murat Günel ◽

Jennifer Moliterno ◽

Robert K. Fulbright

Keyword(s):

High Performance ◽

Prediction Models ◽

Rank Correlation ◽

Support Vector ◽

Wild Type ◽

K Nearest Neighbors ◽

Prediction Ability ◽

Spearman’s Rank Correlation ◽

Negative Findings ◽

T1 Contrast

Objectives: To measure the metrics of glioma pre-operative MRI reports and build IDH prediction models. Methods: Pre-operative MRI reports of 144 glioma patients in a single institution were collected retrospectively. Words were transformed to lowercase letters. White space, punctuation, and stop words were removed. Stemming was performed. A word cloud method applied to the processed text matrix visualized language behavior. Spearman’s rank correlation assessed the correlation between the subjective descriptions of the enhancement pattern. The T1-contrast images associated with enhancement descriptions were selected. The keywords associated with IDH status were evaluated by χ² value ranking. Random forest, k-nearest neighbors, and support vector machine algorithms were used to train models based on report features and age. All statistical analyses used two-tailed tests with significance at p < .05. Results: Longer word counts occurred in reports of older patients, higher grade gliomas, and wild type IDH gliomas. We identified 30 glioma enhancement descriptions, eight of which were commonly used: peripheral, heterogeneous, irregular, nodular, thick, rim, large, and ring. Five of eight patterns were correlated. IDH mutant tumors were characterized by words related to normal, symmetric or negative findings. IDH wild type tumors were characterized by words related to pathological MR findings like enhancement, necrosis and FLAIR foci. An integrated KNN model based on report features and age demonstrated high performance (AUC: 0.89, 95% CI: 0.88–0.90). Conclusion: Report length depended on age, glioma grade, and IDH status. Description of glioma enhancement was varied. Report descriptions differed for IDH wild and mutant gliomas. Report features can be used to predict glioma IDH status.
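The text preprocessing steps listed in the methods (lowercasing, removal of punctuation and stop words, stemming) can be sketched in plain Python. The stop word list and the suffix-stripping `naive_stem` function below are deliberately tiny, hypothetical stand-ins. The abstract does not say which stop word list or stemmer the authors used, and the example sentence is invented.

```python
import string

# Hypothetical, deliberately small stop word list.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "with"}

# Map every punctuation character to a space so that words stay separated.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(report):
    """Lowercase, drop punctuation and stop words, then stem each token."""
    cleaned = report.lower().translate(_PUNCT_TO_SPACE)
    tokens = cleaned.split()  # split() also collapses repeated white space
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


# Invented example sentence, not taken from any report in the study.
tokens = preprocess("The tumor shows Peripheral, irregular enhancement.")
```

The resulting token lists are the kind of input from which a term matrix can be built for the word cloud, χ² keyword ranking, and classifier training described above.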