Explainable Artificial Intelligence for Sarcasm Detection in Dialogues

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Akshi Kumar ◽  
Shubham Dikshit ◽  
Victor Hugo C. Albuquerque

Sarcasm detection in dialogues has been gaining popularity among natural language processing (NLP) researchers with the increased use of conversational threads on social media. Capturing the knowledge of the domain of discourse, context propagation during the course of dialogue, and situational context and tone of the speaker are some important features to train the machine learning models for detecting sarcasm in real time. As situational comedies vibrantly represent human mannerism and behaviour in everyday real-life situations, this research demonstrates the use of an ensemble supervised learning algorithm to detect sarcasm in the benchmark dialogue dataset, MUStARD. The punch-line utterance and its associated context are taken as features to train the eXtreme Gradient Boosting (XGBoost) method. The primary goal is to predict sarcasm in each utterance of the speaker using the chronological nature of a scene. Further, it is vital to prevent model bias and help decision makers understand how to use the models in the right way. Therefore, as a twin goal of this research, we make the learning model used for conversational sarcasm detection interpretable. This is done using two post hoc interpretability approaches, Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive exPlanations (SHAP), to generate explanations for the output of a trained classifier. The classification results clearly depict the importance of capturing the intersentence context to detect sarcasm in conversational threads. The interpretability methods show the words (features) that influence the decision of the model the most and help the user understand how the model is making the decision for detecting sarcasm in dialogues.
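The post hoc explanations described above attribute a model's decision to individual words. As a minimal illustrative sketch (an ablation-style attribution, simpler than LIME or SHAP themselves), one can score an utterance with and without each word and rank words by how much their removal changes the output. The cue-word scorer below is a hypothetical toy stand-in, not the paper's trained XGBoost classifier.

```python
# Simplified word-ablation explanation: score an utterance with and
# without each word, and rank words by how much their removal changes
# the sarcasm score. A toy stand-in for LIME/SHAP-style attribution.

# Hypothetical toy scorer: a small lexicon of sarcasm cue words.
CUE_WORDS = {"totally": 0.4, "great": 0.3, "sure": 0.2, "genius": 0.5}

def sarcasm_score(tokens):
    """Return a score in [0, 1] from cue-word weights (toy model)."""
    return min(1.0, sum(CUE_WORDS.get(t.lower(), 0.0) for t in tokens))

def explain(tokens):
    """Map each word to the drop in score when that word is removed."""
    base = sarcasm_score(tokens)
    return {
        w: round(base - sarcasm_score(tokens[:i] + tokens[i + 1:]), 3)
        for i, w in enumerate(tokens)
    }

utterance = "Oh sure you are such a genius".split()
contributions = explain(utterance)
# Words with the largest positive contribution drive the prediction.
top = max(contributions, key=contributions.get)
print(top, contributions[top])
```

A real LIME explanation additionally fits a weighted local linear model over many random perturbations; the ablation above keeps only the core idea of perturbing inputs and observing the classifier.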

Author(s):  
Gabriel Tavares ◽  
Saulo Mastelini ◽  
Sylvio Jr.

This paper proposes a technique for classifying user accounts on social networks to detect fraud in Online Social Networks (OSNs). The main purpose of the classification is to recognize whether an account belongs to a human, a bot, or a cyborg. Classic, consolidated Text Mining approaches employ textual features from Natural Language Processing (NLP) for classification, but drawbacks such as computational cost and the huge amount of data can arise in real-life scenarios. This work instead uses statistical parameters of the users' posting frequency to distinguish the types of users without any textual content. We performed the experiment on a Twitter dataset and, as learning algorithms for the classification task, compared Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Gradient Boosting Machine (GBM) and Extreme Gradient Boosting (XGBoost). Using the standard parameters of each algorithm, we achieved accuracies of 88% and 84% with RF and XGBoost, respectively.
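The text-free features described above can be sketched as simple statistics over an account's posting timestamps. The feature names below are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch of NLP-free user features: statistical parameters of posting
# frequency computed from timestamps alone. Bots often post at highly
# regular intervals, so the variance of inter-post gaps is informative.
from statistics import mean, pstdev

def posting_features(timestamps):
    """Inter-post interval statistics for one account (seconds)."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return {
        "n_posts": len(ts),
        "mean_gap": mean(gaps),
        "std_gap": pstdev(gaps),   # near zero for clockwork-like bots
        "min_gap": min(gaps),
    }

# A bot posting every 60 s vs. a human with irregular gaps:
bot = posting_features([0, 60, 120, 180, 240])
human = posting_features([0, 35, 400, 410, 2000])
print(bot["std_gap"], human["std_gap"])
```

Feature vectors like these would then be fed to the compared classifiers (RF, SVM, k-NN, GBM, XGBoost) in place of textual features.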


Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein–protein interaction (PPI) analysis can be used to identify proteins that have a supporting function for a main protein, especially in the synthesis process. Insulin is synthesized by proteins that share the same molecular function while covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) terms gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin, using centrality methods as feature extractors and extreme gradient boosting as the classification algorithm. Characterization using the centrality methods produces features describing each protein's central role in the network. Classification results are measured using accuracy, precision, recall and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost performs above the average of other machine learning methods.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mateusz Szczepański ◽  
Marek Pawlicki ◽  
Rafał Kozik ◽  
Michał Choraś

The ubiquity of social media and their deep integration into contemporary society have granted new ways to interact, exchange information, form groups, or earn money, all on a scale never seen before. Those possibilities, paired with their widespread popularity, contribute to the level of impact that social media display. Unfortunately, the benefits they bring come at a cost. Social media can be employed by various entities to spread disinformation, so-called 'Fake News', either to make a profit or to influence the behaviour of society. To reduce the impact and spread of Fake News, a diverse array of countermeasures has been devised. These include linguistic-based approaches, which often utilise Natural Language Processing (NLP) and Deep Learning (DL). However, as the latest advancements in the Artificial Intelligence (AI) domain show, a model's high performance is no longer enough. The explainability of the system's decisions is equally crucial in real-life scenarios. Therefore, the objective of this paper is to present a novel explainability approach for BERT-based fake news detectors. This approach does not require extensive changes to the system and can be attached as an extension to operating detectors. For this purpose, two Explainable Artificial Intelligence (xAI) techniques, Local Interpretable Model-Agnostic Explanations (LIME) and Anchors, are used and evaluated on fake news data, i.e., short pieces of text forming tweets or headlines. The focus of this paper is on the explainability approach for fake news detectors, as the detectors themselves were part of the authors' previous works.


Water ◽  
2021 ◽  
Vol 13 (19) ◽  
pp. 2633
Author(s):  
Jie Yu ◽  
Yitong Cao ◽  
Fei Shi ◽  
Jiegen Shi ◽  
Dibo Hou ◽  
...  

Three-dimensional fluorescence spectroscopy has become increasingly useful in the detection of organic pollutants. However, this approach is limited by decreased accuracy in identifying low-concentration pollutants. In this research, a new identification method for organic pollutants in drinking water is accordingly proposed using three-dimensional fluorescence spectroscopy data and a deep learning algorithm. A novel application of a convolutional autoencoder was designed to process the high-dimensional fluorescence data and extract multi-scale features from the spectra of drinking water samples containing organic pollutants. Extreme Gradient Boosting (XGBoost), an implementation of gradient-boosted decision trees, was used to identify the organic pollutants based on the obtained features. The method's identification performance was validated on three typical organic pollutants at different concentrations in a scenario of accidental pollution. Results showed that the proposed method achieved improved accuracy for both high- (>10 μg/L) and low- (≤10 μg/L) concentration pollutant samples. Compared to traditional spectrum processing techniques, the convolutional autoencoder-based approach obtained more detailed features from the fluorescence spectral data. Moreover, evidence indicated that the proposed method maintained its detection ability under changing background water conditions. It can effectively reduce the rate of misjudgments associated with fluctuations in drinking water quality. This study demonstrates the possibility of using deep learning algorithms for spectral processing and contamination detection in drinking water.
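The paper's feature extractor is a trained convolutional autoencoder; as a far simpler stand-in that only illustrates the idea of summarising an excitation-emission matrix (EEM) at several scales, the sketch below applies average pooling at two window sizes and flattens the result into a feature vector.

```python
# Simplified stand-in for multi-scale feature extraction from an EEM:
# non-overlapping average pooling at two window sizes. This is NOT a
# convolutional autoencoder, only an illustration of multi-scale
# summarisation of a 2D spectrum before a classifier such as XGBoost.
def avg_pool(matrix, k):
    """Non-overlapping k x k average pooling over a 2D list."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            sum(matrix[i + di][j + dj]
                for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, cols - k + 1, k)
        ]
        for i in range(0, rows - k + 1, k)
    ]

# Toy 4x4 EEM intensity grid (arbitrary units, made-up values).
eem = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
coarse = avg_pool(eem, 2)                       # one value per 2x2 block
feature_vector = [v for row in coarse for v in row]
print(feature_vector)
```

An autoencoder learns its compression from data instead of using fixed pooling, which is what lets it retain the fine spectral detail the abstract describes.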


West Nile Virus (WNV) is a mosquito-borne disease; human beings become infected through the bite of an infected mosquito. The disease is considered a serious threat to society, especially in the United States, where it is frequently found in localities with water bodies. The traditional approach is to collect mosquito traps from a locality and check whether the mosquitoes are infected with the virus. If the virus is found, that locality is sprayed with pesticides. But this process is very time-consuming and requires a lot of financial support. Machine learning methods can provide an efficient approach to predict the presence of the virus in a locality using data related to the location and weather. This paper uses a dataset from Kaggle which includes information about the traps found in each locality as well as the locality's weather. The dataset is imbalanced, so the Synthetic Minority Oversampling Technique (SMOTE), an upsampling method, is used to balance it. Ensemble learning classifiers such as random forest, gradient boosting and Extreme Gradient Boosting (XGB) are then trained, and their performance is compared with that of the best-performing supervised learning algorithm, SVM. Among the models, XGB gave the highest F1 score of 92.93, performing marginally better than random forest (92.78) and SVM (91.16).
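The SMOTE step described above can be sketched in a few lines: new minority samples are synthesised by interpolating between a minority point and its nearest minority neighbour. This is a simplification of the full algorithm (no k-nearest-neighbour pool parameter), on made-up 2D data.

```python
# Minimal SMOTE-style oversampling sketch: interpolate between a
# minority point and its nearest minority neighbour to create a
# synthetic sample. Simplified from the SMOTE algorithm (k fixed to 1).
import random
from math import dist

def smote_upsample(minority, n_new, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # nearest minority neighbour other than a itself
        b = min((p for p in minority if p != a), key=lambda p: dist(a, p))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

majority = [(x / 10, 0.0) for x in range(50)]        # 50 negatives
minority = [(0.1, 1.0), (0.2, 1.1), (0.9, 1.0)]      # 3 positives
new_points = smote_upsample(minority, n_new=47)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))
```

The balanced data would then be passed to the ensemble classifiers; in practice the `imbalanced-learn` library's `SMOTE` class implements the full method.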


Author(s):  
He Yang ◽  
Emma Li ◽  
Yi Fang Cai ◽  
Jiapei Li ◽  
George X. Yuan

The purpose of this paper is to establish a framework for extracting early-warning risk features for predicting financial distress, based on an XGBoost model and SHAP. It is well known that how the early-warning risk features used to predict the financial distress of companies are constructed is very important. Compared with traditional statistical methods, data-driven machine learning models for financial early warning perform better in terms of prediction accuracy, but they also bring difficulties: in particular, the resulting model may not be well explained. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient-boosted trees, has become a hot topic in the machine learning research field due to its strong nonlinear information-recognition ability and high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early-warning features for predicting the financial distress of listed companies: 76 financial risk features from seven categories and 14 non-financial risk features from four categories are collected to establish an early-warning system for the prediction of financial distress. In empirical testing with respect to AUC, KS and Kappa, the numerical results show that, compared with a Logistic model, the XGBoost-based method established in this paper has a much better ability to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable explanation of the important risk features and of the ways they visibly influence financial distress.
The results of this paper show that the XGBoost approach to modelling early-warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant for the early identification of financial distress risk for listed companies in practice.
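The SHAP framework rests on the Shapley value: a feature's contribution is its average marginal effect over all orderings in which features can be added. For a tiny model this can be computed exactly, as sketched below; the two "risk features" and the distress-score function are hypothetical toy stand-ins, not the paper's 90-feature model (for which SHAP's TreeExplainer computes the values efficiently).

```python
# Exact Shapley values for a toy model, illustrating the attribution
# idea behind SHAP: average each feature's marginal contribution over
# all feature orderings.
from itertools import permutations

def shapley(features, value):
    """Exact Shapley value per feature for set function `value`."""
    names = list(features)
    contrib = {f: 0.0 for f in names}
    perms = list(permutations(names))
    for order in perms:
        present = {}
        for f in order:
            before = value(present)
            present[f] = features[f]
            contrib[f] += value(present) - before   # marginal effect
    return {f: c / len(perms) for f, c in contrib.items()}

# Hypothetical "distress score": leverage adds 0.3, negative cash flow
# adds 0.2, and their combination adds a 0.1 interaction effect.
def score(x):
    s = 0.3 * x.get("high_leverage", 0) + 0.2 * x.get("neg_cash_flow", 0)
    if x.get("high_leverage") and x.get("neg_cash_flow"):
        s += 0.1
    return s

phi = shapley({"high_leverage": 1, "neg_cash_flow": 1}, score)
# Efficiency property: contributions sum to the full-model score.
print(phi, sum(phi.values()))
```

Note how the 0.1 interaction effect is split evenly between the two features, which is exactly the symmetric credit assignment that makes Shapley-based explanations attractive for risk features.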


2021 ◽  
Vol 10 (9) ◽  
pp. 1875
Author(s):  
I-Min Chiu ◽  
Chi-Yung Cheng ◽  
Wun-Huei Zeng ◽  
Ying-Hsien Huang ◽  
Chun-Hung Richard Lin

Background: The aim of this study was to develop and evaluate a machine learning (ML) model to predict invasive bacterial infections (IBIs) in young febrile infants visiting the emergency department (ED). Methods: This retrospective study was conducted in the EDs of three medical centers across Taiwan from 2011 to 2018. We included patients aged 0–60 days who visited the ED with clinical symptoms of fever. We developed three different ML algorithms, including logistic regression (LR), support vector machine (SVM) and extreme gradient boosting (XGBoost), and compared their performance at predicting IBIs to that of a previously validated scoring system (IBI score). Results: During the study period, 4211 patients were included, of whom 126 (3.1%) had an IBI. A total of eight, five, and seven features were used in LR, SVM, and XGBoost, respectively, through the feature selection process. The ML models achieved better AUROC values when predicting IBIs in young infants than the IBI score (LR: 0.85 vs. SVM: 0.84 vs. XGBoost: 0.85 vs. IBI score: 0.70, p-value < 0.001). Using a cost-sensitive learning algorithm, all ML models showed better specificity in predicting IBIs at a 90% sensitivity level than an IBI score > 2 (LR: 0.59 vs. SVM: 0.60 vs. XGBoost: 0.57 vs. IBI score > 2: 0.43, p-value < 0.001). Conclusions: All ML models developed in this study outperformed the traditional scoring system in stratifying low-risk febrile infants at the standardized sensitivity level.
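Comparing specificity "at a 90% sensitivity level", as in the results above, amounts to sweeping the decision threshold until sensitivity first reaches the target and reporting specificity at that operating point. The scores and labels below are made-up toy data, not the study's cohort.

```python
# Pick the decision threshold that first reaches the target sensitivity
# (recall on positives), then report specificity at that threshold.
def specificity_at_sensitivity(scores, labels, target_sens=0.9):
    pairs = sorted(zip(scores, labels), reverse=True)  # threshold sweep
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    for score, y in pairs:
        tp += y
        fp += 1 - y
        if tp / n_pos >= target_sens:
            return (n_neg - fp) / n_neg   # specificity at this point
    return 0.0

# Toy predicted probabilities and true IBI labels (illustrative only).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    1,    1,    0,    0,    0,    0]
print(specificity_at_sensitivity(scores, labels))
```

The cost-sensitive learning mentioned in the abstract additionally reweights the rare positive class during training, which shifts the score distribution so that this operating point yields a better specificity.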


Symmetry ◽  
2018 ◽  
Vol 10 (11) ◽  
pp. 621 ◽  
Author(s):  
Majdoleen Qamar ◽  
Nasruddin Hassan

Neutrosophic triplet structure yields a symmetric property of truth membership on the left, indeterminacy membership in the centre and falsity membership on the right, as do the point of an object, the centre and the image of a reflection. As an extension of the neutrosophic set, the Q-neutrosophic set was introduced to handle two-dimensional uncertain and inconsistent situations. We extend the soft expert set to the generalized Q-neutrosophic soft expert set by incorporating the idea of the soft expert set into the concept of the Q-neutrosophic set and attaching the parameter of the fuzzy set while defining a Q-neutrosophic soft expert set. This pattern carries the benefits of Q-neutrosophic sets and soft sets, enabling decision makers to recognize the views of specialists without the need for extra cumbersome tasks, thus making it highly suitable for use in decision-making problems that involve imprecise, indeterminate and inconsistent two-dimensional data. Some essential operations, namely subset, equality, complement, union, intersection, AND and OR, as well as several properties relating to the notion of the generalized Q-neutrosophic soft expert set, are characterized. Finally, an algorithm on the generalized Q-neutrosophic soft expert set is proposed and applied to a real-life example to show the efficiency of this notion in handling such problems.
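The symmetric triplet described above assigns, to each pair of an object and a Q-parameter, three independent membership degrees. Under the standard conventions of neutrosophic set theory (the paper's exact formulation may differ in detail), a Q-neutrosophic set can be written as:

```latex
% A Q-neutrosophic set \Gamma_Q over X \times Q (standard convention):
\Gamma_Q = \bigl\{\, \langle (x,q),\; T(x,q),\; I(x,q),\; F(x,q) \rangle
           \;:\; x \in X,\; q \in Q \,\bigr\},
\qquad T, I, F : X \times Q \to [0,1],
\qquad 0 \le T(x,q) + I(x,q) + F(x,q) \le 3 .
```

Here T, I and F are the truth, indeterminacy and falsity memberships, corresponding to the left, centre and right components of the triplet; the soft expert extension then indexes such sets by parameter-expert-opinion triples.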


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 958
Author(s):  
Shahar Weksler ◽  
Offer Rozenstein ◽  
Nadav Haish ◽  
Menachem Moshelion ◽  
Rony Wallach ◽  
...  

Potassium is a macro element in plants that is typically supplied to crops in excess throughout the season to avoid a deficit leading to reduced crop yield. Transpiration rate is a momentary physiological attribute that is indicative of soil water content, the plant’s water requirements, and abiotic stress factors. In this study, two systems were combined to create a hyperspectral–physiological plant database for classification of potassium treatments (low, medium, and high) and estimation of momentary transpiration rate from hyperspectral images. PlantArray 3.0 was used to control fertigation, log ambient conditions, and calculate transpiration rates. In addition, a semi-automated platform carrying a hyperspectral camera was triggered every hour to capture images of a large array of pepper plants. The combined attributes and spectral information on an hourly basis were used to classify plants into their given potassium treatments (average accuracy = 80%) and to estimate transpiration rate (RMSE = 0.025 g/min, R2 = 0.75) using the advanced ensemble learning algorithm XGBoost (extreme gradient boosting algorithm). Although potassium has no direct spectral absorption features, the classification results demonstrated the ability to label plants according to potassium treatments based on a remotely measured hyperspectral signal. The ability to estimate transpiration rates for different potassium applications using spectral information can aid in irrigation management and crop yield optimization. These combined results are important for decision-making during the growing season, and particularly at the early stages when potassium levels can still be corrected to prevent yield loss.
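Several abstracts in this listing rely on XGBoost; its core idea (without XGBoost's regularisation, shrinkage schedule, or deep trees) can be shown with a minimal gradient-boosting regressor built from decision stumps. The data below are synthetic, not the hyperspectral measurements described above.

```python
# Minimal gradient boosting for squared loss: each round fits a
# decision stump to the current residuals (the negative gradient) and
# adds it, scaled by a learning rate, to the running prediction.
def fit_stump(x, y):
    """Best single-split stump minimising squared error on 1D data."""
    best = None
    for split in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= split]
        right = [yi for xi, yi in zip(x, y) if xi > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lm if xi <= split else rm)) ** 2
                  for xi, yi in zip(x, y))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return lambda xi: lm if xi <= split else rm

def gradient_boost(x, y, n_rounds=20, lr=0.5):
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]  # L2 gradient
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]   # step-like target
model = gradient_boost(x, y)
mse = sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
print(round(mse, 4))
```

XGBoost adds second-order gradient information, explicit regularisation of tree complexity, and column/row subsampling on top of this basic additive scheme.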

