Explainable Artificial Intelligence for Sarcasm Detection in Dialogues

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Akshi Kumar ◽  
Shubham Dikshit ◽  
Victor Hugo C. Albuquerque

Sarcasm detection in dialogues has been gaining popularity among natural language processing (NLP) researchers with the increased use of conversational threads on social media. Capturing the knowledge of the domain of discourse, context propagation during the course of dialogue, and situational context and tone of the speaker are some important features to train the machine learning models for detecting sarcasm in real time. As situational comedies vibrantly represent human mannerism and behaviour in everyday real-life situations, this research demonstrates the use of an ensemble supervised learning algorithm to detect sarcasm in the benchmark dialogue dataset, MUStARD. The punch-line utterance and its associated context are taken as features to train the eXtreme Gradient Boosting (XGBoost) method. The primary goal is to predict sarcasm in each utterance of the speaker using the chronological nature of a scene. Further, it is vital to prevent model bias and help decision makers understand how to use the models in the right way. Therefore, as a twin goal of this research, we make the learning model used for conversational sarcasm detection interpretable. This is done using two post hoc interpretability approaches, Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive exPlanations (SHAP), to generate explanations for the output of a trained classifier. The classification results clearly depict the importance of capturing the intersentence context to detect sarcasm in conversational threads. The interpretability methods show the words (features) that influence the decision of the model the most and help the user understand how the model is making the decision for detecting sarcasm in dialogues.
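The post hoc explanations described above attribute a model's decision to individual words. As a minimal illustrative sketch (an ablation-style attribution, simpler than LIME or SHAP themselves), one can score an utterance with and without each word and rank words by how much their removal changes the output. The cue-word scorer below is a hypothetical toy stand-in, not the paper's trained XGBoost classifier.

```python
# Simplified word-ablation explanation: score an utterance with and
# without each word, and rank words by how much their removal changes
# the sarcasm score. A toy stand-in for LIME/SHAP-style attribution.

# Hypothetical toy scorer: a small lexicon of sarcasm cue words.
CUE_WORDS = {"totally": 0.4, "great": 0.3, "sure": 0.2, "genius": 0.5}

def sarcasm_score(tokens):
    """Return a score in [0, 1] from cue-word weights (toy model)."""
    return min(1.0, sum(CUE_WORDS.get(t.lower(), 0.0) for t in tokens))

def explain(tokens):
    """Map each word to the drop in score when that word is removed."""
    base = sarcasm_score(tokens)
    return {
        w: round(base - sarcasm_score(tokens[:i] + tokens[i + 1:]), 3)
        for i, w in enumerate(tokens)
    }

utterance = "Oh sure you are such a genius".split()
contributions = explain(utterance)
# Words with the largest positive contribution drive the prediction.
top = max(contributions, key=contributions.get)
print(top, contributions[top])
```

A real LIME explanation additionally fits a weighted local linear model over many random perturbations; the ablation above keeps only the core idea of perturbing inputs and observing the classifier.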

Author(s):  
Gabriel Tavares ◽  
Saulo Mastelini ◽  
Sylvio Jr.

This paper proposes a technique for classifying user accounts on social networks to detect fraud in Online Social Networks (OSNs). The main purpose of the classification is to recognize whether an account belongs to a human, a bot, or a cyborg. Classic, consolidated Text Mining approaches employ textual features from Natural Language Processing (NLP) for classification, but drawbacks such as computational cost and the huge amount of data can arise in real-life scenarios. This work instead uses statistical parameters of the users' posting frequency to distinguish the types of users without any textual content. We performed the experiment on a Twitter dataset and, as learning algorithms for the classification task, compared Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Gradient Boosting Machine (GBM) and Extreme Gradient Boosting (XGBoost). Using the standard parameters of each algorithm, we achieved accuracies of 88% and 84% with RF and XGBoost, respectively.
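The text-free features described above can be sketched as simple statistics over an account's posting timestamps. The feature names below are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch of NLP-free user features: statistical parameters of posting
# frequency computed from timestamps alone. Bots often post at highly
# regular intervals, so the variance of inter-post gaps is informative.
from statistics import mean, pstdev

def posting_features(timestamps):
    """Inter-post interval statistics for one account (seconds)."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return {
        "n_posts": len(ts),
        "mean_gap": mean(gaps),
        "std_gap": pstdev(gaps),   # near zero for clockwork-like bots
        "min_gap": min(gaps),
    }

# A bot posting every 60 s vs. a human with irregular gaps:
bot = posting_features([0, 60, 120, 180, 240])
human = posting_features([0, 35, 400, 410, 2000])
print(bot["std_gap"], human["std_gap"])
```

Feature vectors like these would then be fed to the compared classifiers (RF, SVM, k-NN, GBM, XGBoost) in place of textual features.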


Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein–protein interaction (PPI) analysis can be used to identify proteins that have a supporting function for a main protein, especially in the synthesis process. Insulin is synthesized by proteins that share the same molecular function while covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) terms gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin, using centrality methods as feature extractors and extreme gradient boosting as the classification algorithm. Characterization using the centrality methods produces features describing each protein's central role in the network. Classification results are measured using accuracy, precision, recall and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost performs above the average of other machine learning methods.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mateusz Szczepański ◽  
Marek Pawlicki ◽  
Rafał Kozik ◽  
Michał Choraś

The ubiquity of social media and their deep integration into contemporary society have granted new ways to interact, exchange information, form groups, or earn money, all on a scale never seen before. Those possibilities, paired with their widespread popularity, contribute to the level of impact that social media display. Unfortunately, the benefits they bring come at a cost. Social media can be employed by various entities to spread disinformation, so-called 'Fake News', either to make a profit or to influence the behaviour of society. To reduce the impact and spread of Fake News, a diverse array of countermeasures has been devised. These include linguistic-based approaches, which often utilise Natural Language Processing (NLP) and Deep Learning (DL). However, as the latest advancements in the Artificial Intelligence (AI) domain show, a model's high performance is no longer enough. The explainability of the system's decisions is equally crucial in real-life scenarios. Therefore, the objective of this paper is to present a novel explainability approach for BERT-based fake news detectors. This approach does not require extensive changes to the system and can be attached as an extension to operating detectors. For this purpose, two Explainable Artificial Intelligence (xAI) techniques, Local Interpretable Model-Agnostic Explanations (LIME) and Anchors, are used and evaluated on fake news data, i.e., short pieces of text forming tweets or headlines. The focus of this paper is on the explainability approach for fake news detectors, as the detectors themselves were part of the authors' previous works.


Water ◽  
2021 ◽  
Vol 13 (19) ◽  
pp. 2633
Author(s):  
Jie Yu ◽  
Yitong Cao ◽  
Fei Shi ◽  
Jiegen Shi ◽  
Dibo Hou ◽  
...  

Three-dimensional fluorescence spectroscopy has become increasingly useful in the detection of organic pollutants. However, this approach is limited by decreased accuracy in identifying low-concentration pollutants. In this research, a new identification method for organic pollutants in drinking water is accordingly proposed using three-dimensional fluorescence spectroscopy data and a deep learning algorithm. A novel application of a convolutional autoencoder was designed to process the high-dimensional fluorescence data and extract multi-scale features from the spectra of drinking water samples containing organic pollutants. Extreme Gradient Boosting (XGBoost), an implementation of gradient-boosted decision trees, was used to identify the organic pollutants based on the obtained features. The method's identification performance was validated on three typical organic pollutants at different concentrations in a scenario of accidental pollution. Results showed that the proposed method achieved improved accuracy for both high- (>10 μg/L) and low- (≤10 μg/L) concentration pollutant samples. Compared to traditional spectrum processing techniques, the convolutional autoencoder-based approach obtained more detailed features from the fluorescence spectral data. Moreover, evidence indicated that the proposed method maintained its detection ability under changing background water conditions. It can effectively reduce the rate of misjudgments associated with fluctuations in drinking water quality. This study demonstrates the possibility of using deep learning algorithms for spectral processing and contamination detection in drinking water.
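The paper's feature extractor is a trained convolutional autoencoder; as a far simpler stand-in that only illustrates the idea of summarising an excitation-emission matrix (EEM) at several scales, the sketch below applies average pooling at two window sizes and flattens the result into a feature vector.

```python
# Simplified stand-in for multi-scale feature extraction from an EEM:
# non-overlapping average pooling at two window sizes. This is NOT a
# convolutional autoencoder, only an illustration of multi-scale
# summarisation of a 2D spectrum before a classifier such as XGBoost.
def avg_pool(matrix, k):
    """Non-overlapping k x k average pooling over a 2D list."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            sum(matrix[i + di][j + dj]
                for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, cols - k + 1, k)
        ]
        for i in range(0, rows - k + 1, k)
    ]

# Toy 4x4 EEM intensity grid (arbitrary units, made-up values).
eem = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
coarse = avg_pool(eem, 2)                       # one value per 2x2 block
feature_vector = [v for row in coarse for v in row]
print(feature_vector)
```

An autoencoder learns its compression from data instead of using fixed pooling, which is what lets it retain the fine spectral detail the abstract describes.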


West Nile Virus (WNV) is a mosquito-borne disease; human beings become infected through the bite of an infected mosquito. The disease is considered a serious threat to society, especially in the United States, where it is frequently found in localities with water bodies. The traditional approach is to collect mosquito traps from a locality and check whether the mosquitoes are infected with the virus. If the virus is found, that locality is sprayed with pesticides. But this process is very time-consuming and requires a lot of financial support. Machine learning methods can provide an efficient approach to predict the presence of the virus in a locality using data related to the location and weather. This paper uses a dataset from Kaggle which includes information about the traps found in each locality as well as the locality's weather. The dataset is imbalanced, so the Synthetic Minority Oversampling Technique (SMOTE), an upsampling method, is used to balance it. Ensemble learning classifiers such as random forest, gradient boosting and Extreme Gradient Boosting (XGB) are then trained, and their performance is compared with that of the best-performing supervised learning algorithm, SVM. Among the models, XGB gave the highest F1 score of 92.93, performing marginally better than random forest (92.78) and SVM (91.16).
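The SMOTE step described above can be sketched in a few lines: new minority samples are synthesised by interpolating between a minority point and its nearest minority neighbour. This is a simplification of the full algorithm (no k-nearest-neighbour pool parameter), on made-up 2D data.

```python
# Minimal SMOTE-style oversampling sketch: interpolate between a
# minority point and its nearest minority neighbour to create a
# synthetic sample. Simplified from the SMOTE algorithm (k fixed to 1).
import random
from math import dist

def smote_upsample(minority, n_new, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # nearest minority neighbour other than a itself
        b = min((p for p in minority if p != a), key=lambda p: dist(a, p))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

majority = [(x / 10, 0.0) for x in range(50)]        # 50 negatives
minority = [(0.1, 1.0), (0.2, 1.1), (0.9, 1.0)]      # 3 positives
new_points = smote_upsample(minority, n_new=47)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))
```

The balanced data would then be passed to the ensemble classifiers; in practice the `imbalanced-learn` library's `SMOTE` class implements the full method.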


Author(s):  
He Yang ◽  
Emma Li ◽  
Yi Fang Cai ◽  
Jiapei Li ◽  
George X. Yuan

The purpose of this paper is to establish a framework for extracting early-warning risk features for predicting financial distress, based on an XGBoost model and SHAP. It is well known that how the early-warning risk features used to predict the financial distress of companies are constructed is very important. Compared with traditional statistical methods, data-driven machine learning models for financial early warning perform better in terms of prediction accuracy, but they also bring difficulties: in particular, the resulting model may not be well explained. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient-boosted trees, has become a hot topic in the machine learning research field due to its strong nonlinear information-recognition ability and high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early-warning features for predicting the financial distress of listed companies: 76 financial risk features from seven categories and 14 non-financial risk features from four categories are collected to establish an early-warning system for the prediction of financial distress. In empirical testing with respect to AUC, KS and Kappa, the numerical results show that, compared with a Logistic model, the XGBoost-based method established in this paper has a much better ability to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable explanation of the important risk features and of the ways they visibly influence financial distress.
The results of this paper show that the XGBoost approach to modelling early-warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant for the early identification of financial distress risk for listed companies in practice.
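The SHAP framework rests on the Shapley value: a feature's contribution is its average marginal effect over all orderings in which features can be added. For a tiny model this can be computed exactly, as sketched below; the two "risk features" and the distress-score function are hypothetical toy stand-ins, not the paper's 90-feature model (for which SHAP's TreeExplainer computes the values efficiently).

```python
# Exact Shapley values for a toy model, illustrating the attribution
# idea behind SHAP: average each feature's marginal contribution over
# all feature orderings.
from itertools import permutations

def shapley(features, value):
    """Exact Shapley value per feature for set function `value`."""
    names = list(features)
    contrib = {f: 0.0 for f in names}
    perms = list(permutations(names))
    for order in perms:
        present = {}
        for f in order:
            before = value(present)
            present[f] = features[f]
            contrib[f] += value(present) - before   # marginal effect
    return {f: c / len(perms) for f, c in contrib.items()}

# Hypothetical "distress score": leverage adds 0.3, negative cash flow
# adds 0.2, and their combination adds a 0.1 interaction effect.
def score(x):
    s = 0.3 * x.get("high_leverage", 0) + 0.2 * x.get("neg_cash_flow", 0)
    if x.get("high_leverage") and x.get("neg_cash_flow"):
        s += 0.1
    return s

phi = shapley({"high_leverage": 1, "neg_cash_flow": 1}, score)
# Efficiency property: contributions sum to the full-model score.
print(phi, sum(phi.values()))
```

Note how the 0.1 interaction effect is split evenly between the two features, which is exactly the symmetric credit assignment that makes Shapley-based explanations attractive for risk features.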


2021 ◽  
Vol 10 (9) ◽  
pp. 1875
Author(s):  
I-Min Chiu ◽  
Chi-Yung Cheng ◽  
Wun-Huei Zeng ◽  
Ying-Hsien Huang ◽  
Chun-Hung Richard Lin

Background: The aim of this study was to develop and evaluate a machine learning (ML) model to predict invasive bacterial infections (IBIs) in young febrile infants visiting the emergency department (ED). Methods: This retrospective study was conducted in the EDs of three medical centers across Taiwan from 2011 to 2018. We included patients aged 0–60 days who visited the ED with clinical symptoms of fever. We developed three different ML algorithms, including logistic regression (LR), support vector machine (SVM) and extreme gradient boosting (XGBoost), and compared their performance at predicting IBIs to that of a previously validated scoring system (IBI score). Results: During the study period, 4211 patients were included, of whom 126 (3.1%) had an IBI. A total of eight, five, and seven features were used in LR, SVM, and XGBoost, respectively, through the feature selection process. The ML models achieved better AUROC values when predicting IBIs in young infants than the IBI score (LR: 0.85 vs. SVM: 0.84 vs. XGBoost: 0.85 vs. IBI score: 0.70, p-value < 0.001). Using a cost-sensitive learning algorithm, all ML models showed better specificity in predicting IBIs at a 90% sensitivity level than an IBI score > 2 (LR: 0.59 vs. SVM: 0.60 vs. XGBoost: 0.57 vs. IBI score > 2: 0.43, p-value < 0.001). Conclusions: All ML models developed in this study outperformed the traditional scoring system in stratifying low-risk febrile infants at the standardized sensitivity level.
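Comparing specificity "at a 90% sensitivity level", as in the results above, amounts to sweeping the decision threshold until sensitivity first reaches the target and reporting specificity at that operating point. The scores and labels below are made-up toy data, not the study's cohort.

```python
# Pick the decision threshold that first reaches the target sensitivity
# (recall on positives), then report specificity at that threshold.
def specificity_at_sensitivity(scores, labels, target_sens=0.9):
    pairs = sorted(zip(scores, labels), reverse=True)  # threshold sweep
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    for score, y in pairs:
        tp += y
        fp += 1 - y
        if tp / n_pos >= target_sens:
            return (n_neg - fp) / n_neg   # specificity at this point
    return 0.0

# Toy predicted probabilities and true IBI labels (illustrative only).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    1,    1,    0,    0,    0,    0]
print(specificity_at_sensitivity(scores, labels))
```

The cost-sensitive learning mentioned in the abstract additionally reweights the rare positive class during training, which shifts the score distribution so that this operating point yields a better specificity.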


Symmetry ◽  
2018 ◽  
Vol 10 (11) ◽  
pp. 621 ◽  
Author(s):  
Majdoleen Qamar ◽  
Nasruddin Hassan

Neutrosophic triplet structure yields a symmetric property of truth membership on the left, indeterminacy membership in the centre and falsity membership on the right, as do the point of an object, the centre and the image of a reflection. As an extension of the neutrosophic set, the Q-neutrosophic set was introduced to handle two-dimensional uncertain and inconsistent situations. We extend the soft expert set to the generalized Q-neutrosophic soft expert set by incorporating the idea of the soft expert set into the concept of the Q-neutrosophic set and attaching the parameter of the fuzzy set while defining a Q-neutrosophic soft expert set. This pattern carries the benefits of Q-neutrosophic sets and soft sets, enabling decision makers to recognize the views of specialists without the need for extra cumbersome tasks, thus making it highly suitable for use in decision-making problems that involve imprecise, indeterminate and inconsistent two-dimensional data. Some essential operations, namely subset, equality, complement, union, intersection, AND and OR, as well as several properties relating to the notion of the generalized Q-neutrosophic soft expert set, are characterized. Finally, an algorithm on the generalized Q-neutrosophic soft expert set is proposed and applied to a real-life example to show the efficiency of this notion in handling such problems.
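The symmetric triplet described above assigns, to each pair of an object and a Q-parameter, three independent membership degrees. Under the standard conventions of neutrosophic set theory (the paper's exact formulation may differ in detail), a Q-neutrosophic set can be written as:

```latex
% A Q-neutrosophic set \Gamma_Q over X \times Q (standard convention):
\Gamma_Q = \bigl\{\, \langle (x,q),\; T(x,q),\; I(x,q),\; F(x,q) \rangle
           \;:\; x \in X,\; q \in Q \,\bigr\},
\qquad T, I, F : X \times Q \to [0,1],
\qquad 0 \le T(x,q) + I(x,q) + F(x,q) \le 3 .
```

Here T, I and F are the truth, indeterminacy and falsity memberships, corresponding to the left, centre and right components of the triplet; the soft expert extension then indexes such sets by parameter-expert-opinion triples.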


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 958
Author(s):  
Shahar Weksler ◽  
Offer Rozenstein ◽  
Nadav Haish ◽  
Menachem Moshelion ◽  
Rony Wallach ◽  
...  

Potassium is a macro element in plants that is typically supplied to crops in excess throughout the season to avoid a deficit leading to reduced crop yield. Transpiration rate is a momentary physiological attribute that is indicative of soil water content, the plant’s water requirements, and abiotic stress factors. In this study, two systems were combined to create a hyperspectral–physiological plant database for classification of potassium treatments (low, medium, and high) and estimation of momentary transpiration rate from hyperspectral images. PlantArray 3.0 was used to control fertigation, log ambient conditions, and calculate transpiration rates. In addition, a semi-automated platform carrying a hyperspectral camera was triggered every hour to capture images of a large array of pepper plants. The combined attributes and spectral information on an hourly basis were used to classify plants into their given potassium treatments (average accuracy = 80%) and to estimate transpiration rate (RMSE = 0.025 g/min, R2 = 0.75) using the advanced ensemble learning algorithm XGBoost (extreme gradient boosting algorithm). Although potassium has no direct spectral absorption features, the classification results demonstrated the ability to label plants according to potassium treatments based on a remotely measured hyperspectral signal. The ability to estimate transpiration rates for different potassium applications using spectral information can aid in irrigation management and crop yield optimization. These combined results are important for decision-making during the growing season, and particularly at the early stages when potassium levels can still be corrected to prevent yield loss.
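Several abstracts in this listing rely on XGBoost; its core idea (without XGBoost's regularisation, shrinkage schedule, or deep trees) can be shown with a minimal gradient-boosting regressor built from decision stumps. The data below are synthetic, not the hyperspectral measurements described above.

```python
# Minimal gradient boosting for squared loss: each round fits a
# decision stump to the current residuals (the negative gradient) and
# adds it, scaled by a learning rate, to the running prediction.
def fit_stump(x, y):
    """Best single-split stump minimising squared error on 1D data."""
    best = None
    for split in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= split]
        right = [yi for xi, yi in zip(x, y) if xi > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lm if xi <= split else rm)) ** 2
                  for xi, yi in zip(x, y))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return lambda xi: lm if xi <= split else rm

def gradient_boost(x, y, n_rounds=20, lr=0.5):
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]  # L2 gradient
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]   # step-like target
model = gradient_boost(x, y)
mse = sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
print(round(mse, 4))
```

XGBoost adds second-order gradient information, explicit regularisation of tree complexity, and column/row subsampling on top of this basic additive scheme.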

