Explainable Machine Learning Framework for Image Classification Problems: Case Study on Glioma Cancer Prediction

Emmanuel Pintelas; Meletis Liaskos; Ioannis E. Livieris; Sotiris Kotsiantis; Panagiotis Pintelas

doi:10.3390/jimaging6060037

Explainable Machine Learning Framework for Image Classification Problems: Case Study on Glioma Cancer Prediction

Journal of Imaging ◽

10.3390/jimaging6060037 ◽

2020 ◽

Vol 6 (6) ◽

pp. 37

Author(s):

Emmanuel Pintelas ◽

Meletis Liaskos ◽

Ioannis E. Livieris ◽

Sotiris Kotsiantis ◽

Panagiotis Pintelas

Keyword(s):

Machine Learning ◽

Image Classification ◽

Prediction Accuracy ◽

Classification Problems ◽

Cancer Prediction ◽

Learning Framework ◽

Box Models ◽

Black Box Models

Image classification is a very popular machine learning domain in which deep convolutional neural networks have mainly emerged on such applications. These networks manage to achieve remarkable performance in terms of prediction accuracy but they are considered as black box models since they lack the ability to interpret their inner working mechanism and explain the main reasoning of their predictions. There is a variety of real world tasks, such as medical applications, in which interpretability and explainability play a significant role. Making decisions on critical issues such as cancer prediction utilizing black box models in order to achieve high prediction accuracy but without provision for any sort of explanation for its prediction, accuracy cannot be considered as sufficient and ethnically acceptable. Reasoning and explanation is essential in order to trust these models and support such critical predictions. Nevertheless, the definition and the validation of the quality of a prediction model’s explanation can be considered in general extremely subjective and unclear. In this work, an accurate and interpretable machine learning framework is proposed, for image classification problems able to make high quality explanations. For this task, it is developed a feature extraction and explanation extraction framework, proposing also three basic general conditions which validate the quality of any model’s prediction explanation for any application domain. The feature extraction framework will extract and create transparent and meaningful high level features for images, while the explanation extraction framework will be responsible for creating good explanations relying on these extracted features and the prediction model’s inner function with respect to the proposed conditions. As a case study application, brain tumor magnetic resonance images were utilized for predicting glioma cancer. Our results demonstrate the efficiency of the proposed model since it managed to achieve sufficient prediction accuracy being also interpretable and explainable in simple human terms.

Download Full-text

A Personalized Machine-Learning-Enabled Method for Efficient Research in Ethnopharmacology. The Case of the Southern Balkans and the Coastal Zone of Asia Minor

Applied Sciences ◽

10.3390/app11135826 ◽

2021 ◽

Vol 11 (13) ◽

pp. 5826

Author(s):

Evangelos Axiotis ◽

Andreas Kontogiannis ◽

Eleftherios Kalpoutzakis ◽

George Giannakopoulos

Keyword(s):

Machine Learning ◽

Coastal Zone ◽

Extraction Process ◽

Asia Minor ◽

Efficiency And Effectiveness ◽

Effectiveness And Efficiency ◽

Intelligent Tools ◽

Southern Balkans

Ethnopharmacology experts face several challenges when identifying and retrieving documents and resources related to their scientific focus. The volume of sources that need to be monitored, the variety of formats utilized, and the different quality of language use across sources present some of what we call “big data” challenges in the analysis of this data. This study aims to understand if and how experts can be supported effectively through intelligent tools in the task of ethnopharmacological literature research. To this end, we utilize a real case study of ethnopharmacology research aimed at the southern Balkans and the coastal zone of Asia Minor. Thus, we propose a methodology for more efficient research in ethnopharmacology. Our work follows an “expert–apprentice” paradigm in an automatic URL extraction process, through crawling, where the apprentice is a machine learning (ML) algorithm, utilizing a combination of active learning (AL) and reinforcement learning (RL), and the expert is the human researcher. ML-powered research improved the effectiveness and efficiency of the domain expert by 3.1 and 5.14 times, respectively, fetching a total number of 420 relevant ethnopharmacological documents in only 7 h versus an estimated 36 h of human-expert effort. Therefore, utilizing artificial intelligence (AI) tools to support the researcher can boost the efficiency and effectiveness of the identification and retrieval of appropriate documents.

Download Full-text

The Role of Textualisation and Argumentation in Understanding the Machine Learning Process

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/765 ◽

2017 ◽

Author(s):

Kacper Sokol ◽

Peter Flach

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Spatial Perception ◽

Black Box ◽

High Dimensional ◽

Box Models ◽

Machine Learning Applications ◽

Black Box Models ◽

Machine Learning Models

Understanding data, models and predictions is important for machine learning applications. Due to the limitations of our spatial perception and intuition, analysing high-dimensional data is inherently difficult. Furthermore, black-box models achieving high predictive accuracy are widely used, yet the logic behind their predictions is often opaque. Use of textualisation -- a natural language narrative of selected phenomena -- can tackle these shortcomings. When extended with argumentation theory we could envisage machine learning models and predictions arguing persuasively for their choices.

Download Full-text

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.21203/rs.3.rs-91905/v1 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Data Science ◽

State Of The Art ◽

Hybrid Models ◽

The Other ◽

Learning Models ◽

Comprehensive Review

Abstract This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods advances in data science are investigated. The data science advances are investigated in three individual classes of deep learning models, ensemble models, and hybrid models. Application domains include stock market, marketing, E-commerce, corporate banking, and cryptocurrency. Prisma method, a systematic literature review methodology is used to ensure the quality of the survey. The findings revealed that the trends are on advancement of hybrid models as more than 51% of the reviewed articles applied hybrid model. On the other hand, it is found that based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms. While it is expected the trends go toward the advancements of deep learning models.

Download Full-text

Combine Clustering and Machine Learning for Enhancing the Efficiency of Energy Baseline of Chiller System

Energies ◽

10.3390/en13174368 ◽

2020 ◽

Vol 13 (17) ◽

pp. 4368 ◽

Cited By ~ 1

Author(s):

Chun-Wei Chen ◽

Chun-Chang Li ◽

Chen-Yu Lin

Keyword(s):

Machine Learning ◽

Prediction Accuracy ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Learning Models ◽

Important Method ◽

Gap Statistic ◽

Machine Learning Model ◽

Key Variables

Energy baseline is an important method for measuring the energy-saving benefits of chiller system, and the benefits can be calculated by comparing prediction models and actual results. Currently, machine learning is often adopted as a prediction model for energy baselines. Common models include regression, ensemble learning, and deep learning models. In this study, we first reviewed several machine learning algorithms, which were used to establish prediction models. Then, the concept of clustering to preprocess chiller data was adopted. Data mining, K-means clustering, and gap statistic were used to successfully identify the critical variables to cluster chiller modes. Applying these key variables effectively enhanced the quality of the chiller data, and combining the clustering results and the machine learning model effectively improved the prediction accuracy of the model and the reliability of the energy baselines.

Download Full-text

Interpreting Undesirable Pixels for Image Classification on Black-Box Models

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) ◽

10.1109/iccvw.2019.00523 ◽

2019 ◽

Author(s):

Sin-Han Kang ◽

Hong-Gyu Jung ◽

Seong-Whan Lee

Keyword(s):

Image Classification ◽

Black Box ◽

Box Models ◽

Black Box Models

Download Full-text

PISIoT: A Machine Learning and IoT-Based Smart Health Platform for Overweight and Obesity Control

Applied Sciences ◽

10.3390/app9153037 ◽

2019 ◽

Vol 9 (15) ◽

pp. 3037 ◽

Cited By ~ 1

Author(s):

Isaac Machorro-Cano ◽

Giner Alor-Hernández ◽

Mario Andrés Paredes-Valverde ◽

Uriel Ramos-Deonati ◽

José Luis Sánchez-Cervantes ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Overweight And Obesity ◽

Process Data ◽

Associated Conditions ◽

Smart Health ◽

Medical Recommendations ◽

And Control

Overweight and obesity are affecting productivity and quality of life worldwide. The Internet of Things (IoT) makes it possible to interconnect, detect, identify, and process data between objects or services to fulfill a common objective. The main advantages of IoT in healthcare are the monitoring, analysis, diagnosis, and control of conditions such as overweight and obesity and the generation of recommendations to prevent them. However, the objects used in the IoT have limited resources, so it has become necessary to consider other alternatives to analyze the data generated from monitoring, analysis, diagnosis, control, and the generation of recommendations, such as machine learning. This work presents PISIoT: a machine learning and IoT-based smart health platform for the prevention, detection, treatment, and control of overweight and obesity, and other associated conditions or health problems. Weka API and the J48 machine learning algorithm were used to identify critical variables and classify patients, while Apache Mahout and RuleML were used to generate medical recommendations. Finally, to validate the PISIoT platform, we present a case study on the prevention of myocardial infarction in elderly patients with obesity by monitoring biomedical variables.

Download Full-text

Wrapping practical problems into a machine learning framework: using water pipe failure prediction as a case study

International Journal of Intelligent Systems Technologies and Applications ◽

10.1504/ijista.2017.10005998 ◽

2017 ◽

Vol 16 (3) ◽

pp. 191

Author(s):

Fang Chen ◽

Jianlong Zhou ◽

Jinjun Sun ◽

Yang Wang

Keyword(s):

Machine Learning ◽

Failure Prediction ◽

Water Pipe ◽

Pipe Failure ◽

Learning Framework

Download Full-text

Iterative Guided Machine Learning-Assisted Systematic Literature Reviews: A Diabetes Case Study

10.21203/rs.3.rs-48078/v2 ◽

2021 ◽

Author(s):

John Zimmerman ◽

Robin Soler ◽

Lavinder James ◽

Murphy Sarah ◽

Atkins Charisma ◽

...

Keyword(s):

Machine Learning ◽

Scientific Literature ◽

Relevant Information ◽

Text Analytics ◽

Inclusion Criteria ◽

Literature Reviews ◽

Study Test ◽

Model Training

Abstract Background: Systematic Reviews (SR), studies of studies, use a formal process to evaluate the quality of scientific literature and determine ensuing effectiveness from qualifying articles to establish consensus findings around a hypothesis. Their value is increasing as the conduct and publication of research and evaluation has expanded and the process of identifying key insights becomes more time consuming. Text analytics and Machine Learning (ML) techniques may help overcome this problem of scale while still maintaining the level of rigor expected of SRs.Methods: In this article, we discuss an approach that uses existing examples of SRs to build and test a method for assisting the SR title and abstract pre-screening by reducing the initial pool of potential articles down to articles that meet inclusion criteria. Our approach differs from previous approaches to using ML as a SR tool in that it incorporates ML configurations guided by previously conducted SRs, and human confirmation on ML predictions of relevant articles during multiple iterative reviews on smaller tranches of citations. We applied the tailored method to a new SR review effort to validate performance. Results: the case study test of the approach proved a sensitivity (recall) in finding relevant articles during down selection that may rival many traditional processes. Conclusions: We believe this iterative method can help overcome bias in initial ML model training by having humans reinforce ML models with new and relevant information, and is an applied step towards transfer learning for ML in SR.

Download Full-text

Data Science in Economics: Comprehensive Review of Advanced Machine Learning and Deep Learning Methods

10.20944/preprints202010.0263.v1 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Filip Ferdinand ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Data Science ◽

State Of The Art ◽

Hybrid Models ◽

The Other ◽

Learning Models ◽

Comprehensive Review

This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods advances in data science are investigated. The data science advances are investigated in three individual classes of deep learning models, ensemble models, and hybrid models. Application domains include stock market, marketing, E-commerce, corporate banking, and cryptocurrency. Prisma method, a systematic literature review methodology is used to ensure the quality of the survey. The findings revealed that the trends are on advancement of hybrid models as more than 51% of the reviewed articles applied hybrid model. On the other hand, it is found that based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than other algorithms. While it is expected the trends go toward the advancements of deep learning models.

Download Full-text

Improving the Accuracy of Convolutional Neural Networks by Identifying and Removing Outlier Images in Datasets Using t-SNE

Mathematics ◽

10.3390/math8050662 ◽

2020 ◽

Vol 8 (5) ◽

pp. 662 ◽

Cited By ~ 3

Author(s):

Husein Perez ◽

Joseph H. M. Tah

Keyword(s):

Machine Learning ◽

Density Distribution ◽

Image Classification ◽

High Dimensional Data ◽

Supervised Machine Learning ◽

Learning Problems ◽

High Dimensional ◽

Feature Engineering ◽

Outlier Data

In the field of supervised machine learning, the quality of a classifier model is directly correlated with the quality of the data that is used to train the model. The presence of unwanted outliers in the data could significantly reduce the accuracy of a model or, even worse, result in a biased model leading to an inaccurate classification. Identifying the presence of outliers and eliminating them is, therefore, crucial for building good quality training datasets. Pre-processing procedures for dealing with missing and outlier data, commonly known as feature engineering, are standard practice in machine learning problems. They help to make better assumptions about the data and also prepare datasets in a way that best expose the underlying problem to the machine learning algorithms. In this work, we propose a multistage method for detecting and removing outliers in high-dimensional data. Our proposed method is based on utilising a technique called t-distributed stochastic neighbour embedding (t-SNE) to reduce high-dimensional map of features into a lower, two-dimensional, probability density distribution and then use a simple descriptive statistical method called interquartile range (IQR) to identifying any outlier values from the density distribution of the features. t-SNE is a machine learning algorithm and a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualisation in a low-dimensional space of two or three dimensions. We applied this method on a dataset containing images for training a convolutional neural network model (ConvNet) for an image classification problem. The dataset contains four different classes of images: three classes contain defects in construction (mould, stain, and paint deterioration) and a no-defect class (normal). We used the transfer learning technique to modify a pre-trained VGG-16 model. We used this model as a feature extractor and as a benchmark to evaluate our method. We have shown that, when using this method, we can identify and remove the outlier images in the dataset. After removing the outlier images from the dataset and re-training the VGG-16 model, the results have also shown that the accuracy of the classification has significantly improved and the number of misclassified cases has also dropped. While many feature engineering techniques for handling missing and outlier data are common in predictive machine learning problems involving numerical or categorical data, there is little work on developing techniques for handling outliers in high-dimensional data which can be used to improve the quality of machine learning problems involving images such as ConvNet models for image classification and object detection problems.

Download Full-text