Cancer Cell Profiling Using Image Moments and Neural Networks with Model Agnostic Explainability: A Case Study of Breast Cancer Histopathological (BreakHis) Database

Mathematics ◽  
2021 ◽  
Vol 9 (20) ◽  
pp. 2616
Author(s):  
Dmitry Kaplun ◽  
Alexander Krasichkov ◽  
Petr Chetyrbok ◽  
Nikolay Oleinikov ◽  
Anupam Garg ◽  
...  

With the evolution of modern digital pathology, the examination of cancer cell tissues has made it possible to quantify subtle symptoms, for example, by means of image staining procedures using Eosin and Hematoxylin. Breast and lung cancer tissues are particularly challenging to examine by manual expert analysis. Relying solely on the characteristics observable by histopathologists for cell profiling may limit the scale and diagnostic quality of the analysis, because the work is tedious and repetitive and demands constant concentration. Thus, automatic analysis of cancer cells has been proposed, using algorithmic and soft-computing techniques to improve speed and reliability. The paper’s novelty lies in the use of Zernike image moments to extract complex features from cancer cell images and of simple neural networks for classification, followed by explanation of the test results using the Local Interpretable Model-Agnostic Explanations (LIME) technique and Explainable Artificial Intelligence (XAI). The general workflow of the proposed high-throughput strategy involves acquiring the BreakHis public dataset, which consists of microscopic images, followed by the application of image processing and machine learning techniques. The recommended technique has been mathematically substantiated and compared with the state of the art to justify its empirical basis. The proposed system is able to classify malignant and benign cancer cell images of 40× resolution with a 100% recognition rate. XAI interprets and explains the test results obtained from the machine learning model, making it reliable and transparent for analysis and parameter tuning.

2021 ◽  
Author(s):  
Rogini Runghen ◽  
Daniel B Stouffer ◽  
Giulio Valentino Dalla Riva

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both for understanding the structure of large-scale data and for predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn in order to predict interactions in a network grows quickly with the number of nodes, so the task becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure that combines a graph embedding method with machine learning techniques to predict interactions on the basis of nodes' metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space. The position of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping from the nodes' metadata to their position in a latent feature space corresponds to a classic, low-dimensional machine learning problem. In our current study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the position of nodes in the latent feature space. Flexible machine learning techniques for mapping the nodes onto their latent positions make it possible to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand. We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well-understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks and to a wide variety of data from different systems. As such, from both a network science and a data science perspective, our work offers a flexible and generalisable procedure for link prediction.
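
The prediction step described in this abstract can be illustrated with a minimal sketch. Under a Random Dot Product Graph model, the probability of an interaction between two nodes is the dot product of their latent positions. The code below is not the authors' implementation: the function names, the two-dimensional positions, and the node labels are hypothetical, the network is treated as undirected for simplicity, and the step that maps metadata to a latent position (a neural network in the study) is assumed to have already produced `new_position`.

```python
def dot(u, v):
    """Dot product of two equal-length latent vectors."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def link_probability(x_i, x_j):
    """RDPG-style interaction probability: the dot product of the two
    latent positions, clipped to the valid probability range [0, 1]."""
    return min(1.0, max(0.0, dot(x_i, x_j)))


def rank_partners(new_position, known_positions):
    """Rank known nodes by predicted interaction probability with a new node.

    new_position: latent position predicted for the new node (hypothetical;
                  in the study this would come from a model fed with metadata).
    known_positions: dict mapping node name -> latent position.
    """
    scores = {
        name: link_probability(new_position, position)
        for name, position in known_positions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical latent positions for two existing nodes and one new node.
known = {"a": [0.9, 0.1], "b": [0.1, 0.8]}
ranked = rank_partners([0.8, 0.2], known)
```

Here node `a` ranks first with a predicted probability of about 0.74, against about 0.24 for node `b`. Because each prediction is a single dot product in a low-dimensional space, scoring a newly added node against every existing node stays cheap even for large networks.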


2020 ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E Braat ◽  
...  

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and to a lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large data set of 62294 patients from the United States with 97 predictors, selected on clinical/statistical grounds from more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables.
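
To make the C-index mentioned among the predictive measures concrete, here is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration under simplifying assumptions, not the evaluation code used in the study: the function name and the toy data are hypothetical, pairs with tied times are skipped, and the quadratic loop would be too slow for a cohort of tens of thousands of patients.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up time per subject.
    events: 1 if the event (e.g. graft failure or death) was observed,
            0 if the subject was censored.
    risk_scores: model output where a higher score means higher predicted risk.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the model gives subject i the higher risk; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third one censored.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to a model that carries no ranking information, and 0.0 means every pair is ordered the wrong way round. The C-index measures ranking only, which is why the abstract reports the Brier score and Integrated Brier Score alongside it.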


Author(s):  
Mehmet Fatih Bayramoglu ◽  
Cagatay Basarir

Investing in developed markets offers investors the opportunity to diversify internationally by investing in foreign firms. In other words, it provides the possibility of reducing systematic risk. For this reason, investors are very interested in developed markets. However, developed markets are more efficient than emerging markets, so both risk and return can be low in these markets. For this reason, developed market investors often use machine learning techniques to increase their gains while reducing their risks. In this chapter, artificial neural networks (ANNs), one of the machine learning techniques, are tested as a way to improve internationally diversified portfolio performance. The results of the ANNs are also compared with the performance of traditional portfolios and of the benchmark portfolio. The portfolios are derived by ANNs from the data of 16 foreign companies quoted on the NYSE and are held for 30 trading days. According to the results, the portfolio derived by ANNs gained a 10.30% return, while traditional portfolios gained a 5.98% return.


Author(s):  
Juan Gómez-Sanchis ◽  
Emilio Soria-Olivas ◽  
Marcelino Martinez-Sober ◽  
Jose Blasco ◽  
Juan Guerrero ◽  
...  

This work presents a new approach for one of the main problems in the analysis of atmospheric phenomena, the prediction of atmospheric concentrations of different elements. The proposed methodology is more efficient than other classical approaches and is used in this work to predict tropospheric ozone concentration. The relevance of this problem stems from the fact that excessive ozone concentrations may cause several problems related to public health. Previous research by the authors of this work has shown that the classical approach to this problem (linear models) does not achieve satisfactory results in tropospheric ozone concentration prediction. The authors’ approach is based on Machine Learning (ML) techniques, which include algorithms related to neural networks, fuzzy systems and advanced statistical techniques for data processing. In this work, the authors focus on one of the main ML techniques, namely, neural networks. These models demonstrate their suitability for this problem both in terms of prediction accuracy and information extraction.


Author(s):  
Hesham M. Al-Ammal

Detection of anomalies in a given data set is a vital step in several cybersecurity applications, including intrusion detection, fraud detection, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and inter-dependencies. Although anomaly detection in graphs dates back to the 1990s, recent research has applied machine learning methods to anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies in machine learning for anomaly detection in graphs. These include methods such as support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.


Author(s):  
Pablo Díaz-Moreno ◽  
Juan José Carrasco ◽  
Emilio Soria-Olivas ◽  
José M. Martínez-Martínez ◽  
Pablo Escandell-Montero ◽  
...  

Neural Networks (NN) are one of the most widely used machine learning techniques in different areas of knowledge. This has led to the emergence of a large number of courses on Neural Networks around the world, including in areas where the users of this technique do not have strong programming skills. Current software that implements these models, such as Matlab®, has a number of important limitations for teaching. In some cases, the implementation of an MLP requires a thorough knowledge of the software and of the instructions that train and validate these systems. In other cases, the architecture of the model is fixed and the software does not allow an automatic sweep of the parameters that determine the architecture of the network. This chapter presents a teaching tool for use in courses on neural models that solves some of the above-mentioned limitations. This tool is based on Matlab® software.


2019 ◽  
Vol 36 (1) ◽  
pp. 272-279 ◽  
Author(s):  
Hannah F Löchel ◽  
Dominic Eger ◽  
Theodor Sperlea ◽  
Dominik Heider

Abstract Motivation: Classification of protein sequences is a major task in bioinformatics and has many applications. Different machine learning methods exist and are applied to these problems, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods have in common that protein sequences have to be made machine-readable and comparable in the first step, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, due to the outstanding performance of deep neural networks (DNN) on image recognition, we used frequency matrix chaos game representation (FCGR) for encoding protein sequences into images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work for protein sequences as well, resulting in an n-flakes representation, an image with several icosagons. Results: We could show that all applied machine learning techniques (RF, SVM and DNN) show promising results compared to the state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, and that FCGR is a promising new encoding method for protein sequences. Availability and implementation: https://cran.r-project.org/. Supplementary information: Supplementary data are available at Bioinformatics online.
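
As background for the encoding discussed in this abstract, the following minimal sketch shows the original chaos game representation for nucleotide sequences and its frequency-matrix form. It does not implement the authors' protein variant (the n-flakes representation): the corner assignment is one common convention and is an assumption here, and the function names are hypothetical.

```python
# One common corner convention for the unit square (an assumption here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Chaos game: start at the centre of the unit square and, for each
    base, move halfway towards that base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        corner_x, corner_y = CORNERS[base]  # KeyError on non-ACGT symbols
        x, y = (x + corner_x) / 2.0, (y + corner_y) / 2.0
        points.append((x, y))
    return points


def fcgr(sequence, k):
    """Frequency matrix CGR: count chaos-game points in a 2^k x 2^k grid.

    After at least k steps, the grid cell of a point is determined by the
    last k bases, so each cell count is a k-mer frequency. The first k - 1
    points are skipped because they do not yet encode a full k-mer.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


# With k = 1 the four cells simply count the individual bases.
matrix = fcgr("AACG", 1)
```

For the sequence `AACG` with `k = 1`, `matrix` is `[[2, 0], [1, 1]]`: two A's in the bottom-left cell, one C and one G in the second row, and no T. The resulting matrix can be treated as a small greyscale image, which is what makes image-oriented classifiers applicable to sequence data.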


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E. Braat ◽  
...  

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and to a lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to a large data set of 62294 patients from the United States with 97 predictors, selected on clinical/statistical grounds from more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is placed on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures from the survival field are employed (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and the random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables. Trial registration: Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.


Author(s):  
Muthu Ram Prabhu Elenchezhian ◽  
Md Rassel Raihan ◽  
Kenneth Reifsnider

Recurrent neural networks (RNN) have been used to interpret data in situations wherein our knowledge of the active physics is incomplete. The currency of these methods is the data that are generated by a physical system. Unfortunately, if we are uncertain about the physics of the system, we also do not know the level of uncertainty in the data that we use to represent it. Typically, the data supplied to an RNN come from measurements of system state information, e.g., data that define speed, position, accelerations, and configurations of system elements (like the flaps and elevators on an airplane). Recently, however, data are also being collected that indicate the state of the materials used to construct the system. Material state may include the defect state of the materials, such as the crack density and patterns in composite material in structural elements (obtained from health monitoring data). In this paper, we address the question of teaching a control system (e.g., for testing equipment, aircraft control systems, health monitoring systems, etc.) to recognize composite material degradation during service and to adjust applied loads and fields as part of a control scheme to avoid failure of the material during service. Topics include defining a proper cost function for the above objectives, formulating a ‘failure hypothesis’ as a regression function, and quantifying uncertainty when the physics of the situation is not completely defined. Examples of machine learning techniques for uniaxial fatigue loading of composite coupons with a circular hole are presented. Example models are random forest regression algorithms and artificial neural networks for linear regression.

