Semantic Explanation for Deep Neural Networks Using Feature Interactions

Author(s):  
Bohui Xia ◽  
Xueting Wang ◽  
Toshihiko Yamasaki

Given the promising results obtained by deep learning techniques in multimedia analysis, the explainability of predictions made by networks has become important in practical applications. We present a method to generate semantic and quantitative explanations that are easily interpretable by humans. Previous work on such explanations focused on the contribution of each individual feature, taking their sum to be the prediction result for the target variable; the lack of discriminative power due to this simple additive formulation led to low explanatory performance. Our method considers not only individual features but also their interactions, for a more detailed interpretation of the decisions made by networks. The algorithm is based on the factorization machine, a prediction method that calculates factor vectors for each feature. We conducted experiments on multiple datasets with different models to validate our method, achieving higher performance than previous work. We show that including interactions not only generates explanations but makes them richer, conveying more information. We show examples of the produced explanations in a simple visual format and verify that they are easily interpretable and plausible.
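
As an illustrative sketch only (not the authors' implementation), the snippet below shows how a factorization machine scores an example as a bias plus per-feature linear terms plus pairwise interaction terms computed from factor vectors, and how individual and pairwise contributions could be read off for an interaction-aware explanation; all variable names and dimensions are assumptions.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Factorization-machine prediction: global bias + linear terms
    + pairwise interactions <v_i, v_j> * x_i * x_j."""
    linear = w0 + np.dot(w, x)
    # O(kn) identity: sum_{i<j} <v_i,v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + interactions

def contribution_breakdown(x, w, V):
    """Per-feature and per-pair contributions: the kind of quantities an
    interaction-aware explanation could report for a single prediction."""
    n = len(x)
    single = {i: w[i] * x[i] for i in range(n) if x[i] != 0}
    pairs = {(i, j): np.dot(V[i], V[j]) * x[i] * x[j]
             for i in range(n) for j in range(i + 1, n)
             if x[i] != 0 and x[j] != 0}
    return single, pairs

rng = np.random.default_rng(0)
x = rng.random(5)                  # one example with 5 features
w0, w = 0.1, rng.normal(size=5)    # bias and linear weights
V = rng.normal(size=(5, 3))        # factor vectors, k = 3
print(fm_score(x, w0, w, V))
print(contribution_breakdown(x, w, V)[1][(0, 1)])   # contribution of pair (0, 1)
```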

2019 ◽  
Vol 12 (3) ◽  
pp. 156-161 ◽  
Author(s):  
Aman Dureja ◽  
Payal Pahwa

Background: Activation functions play an important role in the design of deep neural networks, and the choice of activation function affects both optimization and the quality of the results. Several activation functions have been introduced in machine learning for practical applications, but it has not been established which activation function should be used at the hidden layers of deep neural networks. Objective: The primary objective of this analysis was to determine which activation function should be used at the hidden layers of deep neural networks to solve complex non-linear problems. Methods: The comparative model was configured on a two-class (Cat/Dog) dataset. The network used three convolutional layers, with a pooling layer introduced after each convolutional layer. The dataset was divided into two parts: the first 8,000 images were used for training the network and the remaining 2,000 images for testing it. Results: The experimental comparison was performed by analyzing the network with different activation functions (ReLU, Tanh, SELU, PReLU, ELU) at the hidden layers of the CNN, recording the validation error and accuracy on the Cat/Dog dataset. Overall, ReLU gave the best performance, with a validation loss of 0.3912 and a validation accuracy of 0.8320 at the 25th epoch. Conclusion: A CNN model with ReLU at the hidden layers (three hidden layers here) gives the best results and improves overall performance in terms of both accuracy and speed. These advantages of ReLU at the hidden layers of a CNN support effective and fast retrieval of images from databases.
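
A minimal Keras sketch of the kind of three-convolutional-layer network compared in this study, with a pooling layer after each convolutional layer and the hidden-layer activation passed as a parameter; the filter counts, input size, and optimizer are assumptions, not values reported in the paper.

```python
from tensorflow.keras import layers, models

def build_cnn(activation="relu", input_shape=(64, 64, 3)):
    """Three conv blocks, each followed by max pooling, then a dense head
    for the binary Cat/Dog task. Filter counts and image size are assumed."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation=activation),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation=activation),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation=activation),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation=activation),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Swap the hidden-layer activation to reproduce the comparison
# (PReLU is a separate Keras layer and is omitted from this string-based sketch).
for act in ["relu", "tanh", "selu", "elu"]:
    model = build_cnn(activation=act)
```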


2020 ◽  
Author(s):  
Mohammad Alarifi ◽  
Somaieh Goudarzvand ◽  
Abdulrahman Jabour ◽  
Doreen Foy ◽  
Maryam Zolnoori

BACKGROUND The rate of antidepressant prescription is increasing globally. A large proportion of patients stop taking their medications, which can lead to many adverse outcomes, including relapse and anxiety. OBJECTIVE The aim of this study was to develop a drug-continuity prediction model and identify the factors associated with drug continuity using online patient forums. METHODS We retrieved 982 antidepressant drug reviews from the online patient forum AskaPatient.com. We followed the Analytical Framework Method to extract structured data from unstructured data. Using the structured data, we examined the factors associated with antidepressant discontinuation and developed a predictive model using multiple machine learning techniques. RESULTS We tested multiple machine learning techniques, which yielded accuracies ranging from 65% to 82%. The Random Forest algorithm provided the best prediction, with 82% accuracy, 78% precision, 88.03% recall, and an 84.2% F1-score. The factors most strongly associated with drug discontinuation were withdrawal symptoms, effectiveness/ineffectiveness, perceived distress from adverse drug reactions, rating, and perceived distress related to withdrawal symptoms. CONCLUSIONS Although the nature of data available on online forums differs from data collected through surveys, we found that online patient forums can be a valuable source of data for drug-continuity prediction and for understanding patients' experiences. The factors identified through our techniques were consistent with the findings of prior studies that used surveys.
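
A hedged sketch of the Random Forest stage using scikit-learn; the feature matrix below is a random placeholder standing in for the structured variables extracted from the forum reviews (withdrawal symptoms, perceived effectiveness, adverse-reaction distress, rating, etc.).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Placeholder data: 982 reviews, 6 illustrative structured features.
rng = np.random.default_rng(42)
X = rng.random((982, 6))
y = rng.integers(0, 2, size=982)   # 1 = continued medication, 0 = discontinued

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy ", accuracy_score(y_test, pred))
print("precision", precision_score(y_test, pred))
print("recall   ", recall_score(y_test, pred))
print("f1       ", f1_score(y_test, pred))
# Feature importances indicate which factors drive the discontinuation prediction.
print("importances", clf.feature_importances_)
```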


2021 ◽  
Author(s):  
Vladislav Vasilevich Alekseev ◽  
Denis Mihaylovich Orlov ◽  
Dmitry Anatolevich Koroteev

Abstract Approaches for building digital core models and methods for using them are developing rapidly. These methods make it possible to obtain petrophysical information quickly and non-destructively. Digital rock physics includes two main stages: constructing models and simulating various physical processes on the obtained models. Our work proposes using deep learning methods for mineral and pore-space segmentation instead of classical methods such as threshold image processing. Deep neural networks have long demonstrated their advantages in many areas of computer vision. This paper proposes and tests methods that help identify different minerals in images from a scanning electron microscope. We used images of rocks of the Achimov formation, which are arkoses, as samples. We tested various deep neural networks, such as LinkNet, U-Net, ResUNet, and pix2pix, and identified those that performed best in segmentation.
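
The sketch below shows, in PyTorch, the basic encoder-decoder-with-skip-connection shape shared by the tested architectures; it is a single-level toy U-Net with assumed channel counts and class count, not the networks actually trained on the SEM images.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One downsampling step and one skip connection. Real models
    (U-Net, ResUNet, LinkNet) are deeper; this only shows the structure."""
    def __init__(self, n_classes=4):           # e.g. pores plus three mineral classes (assumed)
        super().__init__()
        self.enc = conv_block(1, 32)            # greyscale SEM input
        self.down = nn.MaxPool2d(2)
        self.mid = conv_block(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = conv_block(64, 32)
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        d = self.dec(torch.cat([self.up(m), e], dim=1))
        return self.head(d)                     # per-pixel class logits

logits = TinyUNet()(torch.randn(1, 1, 128, 128))
print(logits.shape)                             # torch.Size([1, 4, 128, 128])
```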


2020 ◽  
Vol 34 (04) ◽  
pp. 6470-6477
Author(s):  
Canran Xu ◽  
Ming Wu

Learning representations of feature interactions to model user behavior is critical for recommendation systems and click-through rate (CTR) prediction. Recent advances in this area are empowered by deep learning methods, which can learn sophisticated feature interactions and achieve state-of-the-art results in an end-to-end manner. These approaches require a large number of training parameters integrated with the low-level representations, and are thus memory- and computation-inefficient. In this paper, we propose a new model named “LorentzFM” that learns feature interactions embedded in a hyperbolic space, in which the triangle inequality for Lorentz distances can be violated. The learned representations benefit from the peculiar geometric properties of hyperbolic triangles, resulting in a significant reduction in the number of parameters (20% to 80%) because the top deep learning layers are no longer required. With such a lightweight architecture, LorentzFM achieves comparable and even materially better results than deep learning methods such as DeepFM, xDeepFM and Deep & Cross in both recommendation and CTR prediction tasks.
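
To illustrate the geometry only: the snippet computes the Lorentzian inner product and geodesic distance for embeddings lifted onto the hyperboloid model of hyperbolic space. The exact scoring function that LorentzFM builds from these quantities is more specific than what is shown here.

```python
import numpy as np

def lift_to_hyperboloid(z):
    """Map a Euclidean vector z to the hyperboloid model of hyperbolic space:
    x = (sqrt(1 + ||z||^2), z), so that <x, x>_L = -1."""
    x0 = np.sqrt(1.0 + np.sum(z * z))
    return np.concatenate(([x0], z))

def lorentz_inner(u, v):
    """Lorentzian inner product: -u0*v0 + sum_i ui*vi."""
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def lorentz_distance(u, v):
    """Geodesic distance on the hyperboloid: d = arccosh(-<u, v>_L)."""
    return np.arccosh(np.clip(-lorentz_inner(u, v), 1.0, None))

# Two feature embeddings lifted onto the hyperboloid; a pairwise interaction
# score can be built from their Lorentz inner product / distance.
rng = np.random.default_rng(0)
u = lift_to_hyperboloid(rng.normal(size=8) * 0.1)
v = lift_to_hyperboloid(rng.normal(size=8) * 0.1)
print(lorentz_inner(u, v), lorentz_distance(u, v))
```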


2018 ◽  
Vol 30 (3) ◽  
pp. 164-170 ◽  
Author(s):  
Péter Martinek ◽  
Oliver Krammer

Purpose This paper aims to present a robust prediction method for estimating the quality of electronic products assembled with pin-in-paste soldering technology. A specific board quality factor was also defined, which describes the expected yield of the board assembly. Design/methodology/approach Experiments were performed to obtain the required input data for developing a prediction method based on decision-tree learning techniques. A Type 4 lead-free solder paste (particle size 20–38 µm) was deposited by stencil printing at different printing speeds (from 20 mm/s to 70 mm/s) into the through-holes (0.8 mm, 1 mm, 1.1 mm, 1.4 mm) of an FR4 board. Hole filling was investigated by X-ray analysis. Three test cases were evaluated. Findings The optimal parameters of the algorithm were determined to be: subsample 0.5, learning rate 0.001, maximum tree depth 6 and 10,000 boosting iterations. After optimisation, the mean absolute error, root mean square error and mean absolute percentage error for predicting the hole-filling value from printing speed and hole diameter were 0.024, 0.03 and 3.5, respectively, on average. Our method is able to predict hole filling in pin-in-paste technology for different through-hole diameters. Originality/value No research works are available in the current literature on machine learning techniques for pin-in-paste technology. We therefore developed a method using decision-tree learning techniques to support the design of the stencil printing process for through-hole components and pin-in-paste technology. The first-pass yield of the assembly can be enhanced, and the reflow soldering failures of pin-in-paste technology can be significantly reduced.
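
The quoted hyper-parameters (subsample 0.5, learning rate 0.001, maximum depth 6, 10,000 boosting iterations) correspond to a gradient-boosted decision-tree setup. Below is a hedged sketch using scikit-learn's GradientBoostingRegressor on synthetic placeholder data; the paper's actual implementation and measurements are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Placeholder data: printing speed (mm/s) and hole diameter (mm) -> hole filling (0..1).
rng = np.random.default_rng(1)
speed = rng.uniform(20, 70, 200)
diameter = rng.choice([0.8, 1.0, 1.1, 1.4], 200)
X = np.column_stack([speed, diameter])
y = np.clip(1.0 - 0.004 * speed - 0.2 * (diameter - 0.8)
            + rng.normal(0, 0.02, 200), 0, 1)

# Hyper-parameters as quoted in the abstract.
model = GradientBoostingRegressor(subsample=0.5, learning_rate=0.001,
                                  max_depth=6, n_estimators=10_000)
model.fit(X, y)
pred = model.predict(X)
print("MAE ", mean_absolute_error(y, pred))
print("RMSE", np.sqrt(mean_squared_error(y, pred)))
```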


2018 ◽  
Vol 8 (12) ◽  
pp. 2512 ◽  
Author(s):  
Ghouthi Boukli Hacene ◽  
Vincent Gripon ◽  
Nicolas Farrugia ◽  
Matthieu Arzel ◽  
Michel Jezequel

Deep learning-based methods have reached state-of-the-art performance by relying on large quantities of available data and computational power. Such methods remain largely unsuited to a major open machine learning problem: incrementally learning new classes and examples over time. Combining the outstanding performance of Deep Neural Networks (DNNs) with the flexibility of incremental learning techniques is a promising avenue of research. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA is based on pre-trained DNNs as feature extractors, robust selection of feature vectors in subspaces using a nearest-class-mean-based technique, majority votes, and data augmentation at both the training and prediction stages. Experiments on challenging vision datasets demonstrate the ability of the proposed method to perform low-complexity incremental learning while achieving significantly better accuracy than existing incremental counterparts.
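
A minimal sketch of the nearest-class-mean stage that TILDA-style pipelines run on top of fixed DNN features, illustrating how new classes can be added incrementally without retraining; the data augmentation, subspace splitting and majority voting described in the abstract are omitted.

```python
import numpy as np

class NearestClassMean:
    """Incremental nearest-class-mean classifier over fixed feature vectors.
    In a TILDA-style pipeline the features would come from a pre-trained DNN."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def partial_fit(self, features, labels):
        # Running per-class sums allow new classes/examples to arrive at any time.
        for f, y in zip(features, labels):
            if y not in self.sums:
                self.sums[y] = np.zeros_like(f, dtype=float)
                self.counts[y] = 0
            self.sums[y] += f
            self.counts[y] += 1

    def predict(self, features):
        classes = sorted(self.sums)
        means = np.stack([self.sums[c] / self.counts[c] for c in classes])
        dists = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
        return np.array([classes[i] for i in dists.argmin(axis=1)])

rng = np.random.default_rng(0)
ncm = NearestClassMean()
ncm.partial_fit(rng.normal(0, 1, (20, 16)), np.zeros(20, int))   # class 0 arrives first
ncm.partial_fit(rng.normal(3, 1, (20, 16)), np.ones(20, int))    # class 1 added later
print(ncm.predict(rng.normal(3, 1, (5, 16))))                    # expected: mostly 1s
```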


Author(s):  
Le Hui ◽  
Xiang Li ◽  
Chen Gong ◽  
Meng Fang ◽  
Joey Tianyi Zhou ◽  
...  

Convolutional Neural Networks (CNNs) have shown great power in various classification tasks and have achieved remarkable results in practical applications. However, the distinct learning difficulties in discriminating different pairs of classes are largely ignored by existing networks. For instance, in the CIFAR-10 dataset, distinguishing cats from dogs is usually harder than distinguishing horses from ships. By carefully studying the behavior of CNN models during training, we observe that the confusion level between two classes is strongly correlated with their angular separability in the feature space: the larger the inter-class angle, the lower the confusion. Based on this observation, we propose a novel loss function dubbed “Inter-Class Angular Loss” (ICAL), which explicitly models class correlation and can be directly applied to many existing deep networks. By minimizing the proposed ICAL, a network can effectively discriminate between examples of similar classes by enlarging the angle between their corresponding class vectors. Thorough experimental results on a series of vision and non-vision datasets confirm that ICAL critically improves the discriminative ability of various representative deep neural networks and yields superior performance compared to the original networks with conventional softmax loss.
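
As a simplified stand-in for the idea (not the exact ICAL formulation), the snippet below adds a regularizer to cross-entropy that penalizes small angles, i.e. large cosine similarities, between class weight vectors.

```python
import torch
import torch.nn.functional as F

def inter_class_angle_penalty(class_weights):
    """Penalize small angles between class weight vectors: the larger the
    pairwise cosine similarity, the larger the penalty. This is only a
    simplified illustration, not the published ICAL loss."""
    w = F.normalize(class_weights, dim=1)           # unit-norm class vectors
    cos = w @ w.t()                                 # pairwise cosine similarities
    off_diag = cos - torch.eye(cos.size(0), device=cos.device)
    return off_diag.clamp(min=0).sum() / (cos.size(0) * (cos.size(0) - 1))

def total_loss(logits, targets, class_weights, lam=0.1):
    """Conventional cross-entropy plus the angular regularizer."""
    return F.cross_entropy(logits, targets) + lam * inter_class_angle_penalty(class_weights)

# Example: a linear classifier head with 10 classes over 64-dim features.
head = torch.nn.Linear(64, 10)
feats = torch.randn(8, 64)
targets = torch.randint(0, 10, (8,))
loss = total_loss(head(feats), targets, head.weight)
loss.backward()
print(loss.item())
```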


Author(s):  
Jun Xiao ◽  
Hao Ye ◽  
Xiangnan He ◽  
Hanwang Zhang ◽  
Fei Wu ◽  
...  

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Despite their effectiveness, FMs can be hindered by modelling all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. For example, interactions with useless features may even introduce noise and degrade performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named Attentional Factorization Machine (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, we show that on the regression task AFM betters FM with an 8.6% relative improvement, and consistently outperforms the state-of-the-art deep learning methods Wide&Deep [Cheng et al., 2016] and DeepCross [Shan et al., 2016] with a much simpler structure and fewer model parameters. Our implementation of AFM is publicly available at: https://github.com/hexiangnan/attentional_factorization_machine
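
A minimal PyTorch sketch of the AFM idea: element-wise products of embedding pairs are scored by a small attention network and summed with those weights before the final projection. Embedding sizes are assumptions, and the linear/bias terms of the published model are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFM(nn.Module):
    """Attentional Factorization Machine sketch: attention over pairwise
    element-wise products of feature embeddings (simplified)."""
    def __init__(self, n_features, k=8, attn_dim=16):
        super().__init__()
        self.embed = nn.Embedding(n_features, k)
        self.attn = nn.Sequential(nn.Linear(k, attn_dim), nn.ReLU(),
                                  nn.Linear(attn_dim, 1))
        self.out = nn.Linear(k, 1)

    def forward(self, feature_ids):
        e = self.embed(feature_ids)                   # (batch, fields, k)
        i, j = torch.triu_indices(e.size(1), e.size(1), offset=1)
        pair = e[:, i] * e[:, j]                      # (batch, n_pairs, k)
        a = F.softmax(self.attn(pair), dim=1)         # attention weight per pair
        return self.out((a * pair).sum(dim=1)).squeeze(-1)

model = AFM(n_features=100)
x = torch.randint(0, 100, (4, 6))                     # 4 examples, 6 categorical fields
print(model(x).shape)                                 # torch.Size([4])
```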


Algorithms ◽  
2021 ◽  
Vol 14 (9) ◽  
pp. 258
Author(s):  
Tran Dinh Khang ◽  
Manh-Kien Tran ◽  
Michael Fowler

Clustering is an unsupervised machine learning method with many practical applications that has gathered extensive research interest. It is a technique for dividing data elements into clusters such that elements in the same cluster are similar. Clustering belongs to the group of unsupervised machine learning techniques, meaning that no information about the labels of the elements is used. However, when knowledge about some data points is available in advance, it can be beneficial to use a semi-supervised algorithm. Among the many clustering techniques available, fuzzy C-means clustering (FCM) is a common one. To make the FCM algorithm semi-supervised, prior work proposed using an auxiliary matrix to adjust the membership grades of the elements, forcing them into certain clusters during the computation. In this study, instead of using the auxiliary matrix, we propose using multiple fuzzification coefficients to implement the semi-supervision component. After deriving the proposed semi-supervised fuzzy C-means clustering algorithm with multiple fuzzification coefficients (sSMC-FCM), we demonstrate the convergence of the algorithm and validate the efficiency of the method through a numerical example.
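
For orientation, a plain fuzzy C-means iteration is sketched below with a single fuzzification coefficient m; the sSMC-FCM algorithm in the paper instead uses multiple fuzzification coefficients to pull labelled elements toward their known clusters, so its update rules differ from this baseline.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy C-means: alternate between updating cluster centers
    and membership grades. The semi-supervised variant would use a
    per-element fuzzification coefficient instead of a single m."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # membership grades sum to 1 per element
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (dist ** (2.0 / (m - 1.0)))
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

# Two well-separated synthetic blobs as a toy numerical example.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
U, centers = fcm(X, n_clusters=2)
print(centers)
```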

