Context-Aware Misinformation Detection: A benchmark of Deep Learning Architectures using Word Embeddings

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier to find linguistic clues that determine whether a text is satirical or not. For this, the state of the art relies on neural networks fed with word embeddings, which are capable of learning relevant characteristics of the way humans communicate. However, to our knowledge, there are no comprehensive studies that evaluate these techniques for satire identification in Spanish. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experiments revealed that term-counting features achieved results similar to those of deep-learning approaches based on word embeddings, with both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine learning models provide competitive results for automatic satire identification, slightly outperforming state-of-the-art models.
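The term-counting baseline mentioned above can be illustrated with a minimal bag-of-words sketch. The tokenizer, the function names and the toy tweets are assumptions made for illustration; the abstract does not say how the authors computed their features.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(texts):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: index for index, tok in enumerate(tokens)}


def term_count_vector(text, vocabulary):
    """Return one raw count per vocabulary entry, in vocabulary order."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocabulary]


# Hypothetical toy tweets, not taken from the datasets used in the paper.
tweets = ["el gato come pescado", "el perro come y el gato duerme"]
vocab = build_vocabulary(tweets)
vectors = [term_count_vector(tweet, vocab) for tweet in tweets]
```

Each row of `vectors` could then be passed to any traditional classifier; which classifier and which weighting scheme work best is an empirical question that the sketch does not address.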


Classification of Legislations using Deep Learning

The International Arab Journal of Information Technology ◽ 10.34028/iajit/18/5/4 ◽ 2021 ◽ Vol 18 (5)

Author(s): Sameerchand Pudaruth ◽ Sunjiv Soyjaudah ◽ Rajendra Gunputh

Keyword(s): Neural Networks ◽ Deep Learning ◽ Short Term Memory ◽ Support Vector ◽ Word Embeddings ◽ Legal Professionals ◽ Domain Specific ◽ The Republic ◽ Learning Architectures

Laws are often developed piecemeal, and many provisions of a similar nature are found in different legislations. Therefore, there is a need to classify legislations into legal topics to help legal professionals in their daily activities. In this study, we experimented with various deep learning architectures for the automatic classification of 490 legislations from the Republic of Mauritius into 30 categories. Our results demonstrate that a Deep Neural Network (DNN) with three hidden layers delivered the best performance compared with other architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). A mean classification accuracy of 60.9% was achieved with the DNN, 56.5% with the CNN and 33.7% with Long Short-Term Memory (LSTM). Comparisons were also made with traditional machine learning classifiers such as support vector machines and decision trees, and the DNN was found to be superior, by at least 10%, in all runs. Both general pre-trained word embeddings such as Word2vec and domain-specific word embeddings such as Law2vec were used in combination with the above deep learning architectures, but Word2vec performed best. To our knowledge, this is the first application of deep learning to the categorisation of legislations.
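To make the shape of a network with three hidden layers concrete, the following is a minimal, untrained forward pass in plain Python. Only the number of hidden layers and the 30 output categories come from the abstract; the layer widths, the ReLU and softmax activations, the random weights and the averaged-embedding input are all assumptions for illustration.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(document_vector, layers):
    """Apply ReLU after each hidden layer and softmax after the output layer."""
    activations = document_vector
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


rng = random.Random(0)
# Hypothetical sizes; only NUM_CATEGORIES = 30 is taken from the abstract.
EMBEDDING_DIM, HIDDEN_UNITS, NUM_CATEGORIES = 8, 16, 30
sizes = [EMBEDDING_DIM, HIDDEN_UNITS, HIDDEN_UNITS, HIDDEN_UNITS, NUM_CATEGORIES]
layers = [init_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]

# Hypothetical input, e.g. the mean of a document's word embeddings.
document_vector = [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
probabilities = forward(document_vector, layers)
predicted_category = max(range(NUM_CATEGORIES), key=probabilities.__getitem__)
```

With untrained weights the predicted category is meaningless; the sketch only shows how a document vector flows through three hidden layers to a distribution over the 30 categories.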


Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽ 10.1186/s40537-021-00488-w ◽ 2021 ◽ Vol 8 (1)

Author(s): Yahya Albalawi ◽ Jim Buckley ◽ Nikola S. Nikolov

Keyword(s): Social Media ◽ Deep Learning ◽ Comprehensive Evaluation ◽ Classification Problem ◽ Data Sets ◽ Word Embeddings ◽ Data Set ◽ Lower Accuracy ◽ Health Related ◽ The Impact

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four of the 26 pre-processing techniques improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to an F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with a lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.
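Because the abstract reports both accuracy and F1, it may help to see how the two metrics are computed and how they can rank models differently on imbalanced data. The sketch below assumes binary F1 on the positive class (the abstract does not state which F1 variant was used), and the labels and predictions are hypothetical toy data, not an attempt to reproduce the reported figures.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all samples that were labelled correctly."""
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical toy data: 2 health-related tweets (label 1) out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # never predicts the positive class
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # finds both positives, 3 false alarms

accuracy_a, f1_a = accuracy(y_true, model_a), f1_score(y_true, model_a)
accuracy_b, f1_b = accuracy(y_true, model_b), f1_score(y_true, model_b)
```

In this toy case `model_a` has the higher accuracy while `model_b` has the higher F1, which is why reporting both metrics gives a fuller picture than either one alone.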


A Generalization Performance Study Using Deep Learning Networks in Embedded Systems

Sensors ◽ 10.3390/s21041031 ◽ 2021 ◽ Vol 21 (4) ◽ pp. 1031

Author(s): Joseba Gorospe ◽ Rubén Mulero ◽ Olatz Arbelaitz ◽ Javier Muguerza ◽ Miguel Ángel Antón

Keyword(s): Deep Learning ◽ Embedded Systems ◽ Embedded System ◽ General Purpose ◽ Learning Networks ◽ Performance Study ◽ Learning Techniques ◽ Wide Range ◽ Learning Architectures

Deep learning techniques are increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the growing amount of data available as a result of the digitalisation of society in general and the industrial world in particular. In addition, the emergence of edge computing, which focuses on integrating artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without the need to transfer all of the data to centralised servers. The combination of these two concepts can lead to systems with the capacity to make correct decisions and act on them immediately and in situ. However, the low capacity of embedded systems greatly hinders this integration, so being able to deploy deep learning models on a wide range of micro-controllers can be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general-purpose embedded system, allowing the introduction of deep learning architectures. The experiments reported here show that the proposed system is competitive with other commercial systems.


Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures.

Computer Methods and Programs in Biomedicine ◽ 10.1016/j.cmpb.2021.106130 ◽ 2021 ◽ pp. 106130

Author(s): Candelaria Mosquera ◽ Facundo Nahuel Diaz ◽ Fernando Binder ◽ José Martín Rabellino ◽ Sonia Elizabeth Benitez ◽ ...

Keyword(s): Deep Learning ◽ Clinical Implementation ◽ X Ray ◽ Different Types ◽ Chest X Ray ◽ Learning Architectures


Multimodal Deep Learning and Visible-Light and Hyperspectral Imaging for Fruit Maturity Estimation

Sensors ◽ 10.3390/s21041288 ◽ 2021 ◽ Vol 21 (4) ◽ pp. 1288

Author(s): Cinmayii A. Garillos-Manliguez ◽ John Y. Chiang

Keyword(s): Deep Learning ◽ Visible Light ◽ Hyperspectral Imaging ◽ Morphological Changes ◽ Consumer Preference ◽ Hyperspectral Data ◽ Sensitivity Analyses ◽ Deep Convolutional Neural Networks ◽ Fruit Maturity ◽ Learning Architectures

Fruit maturity is a critical factor in the supply chain, consumer preference, and the agriculture industry. Most fruit maturity classification methods identify only two classes, ripe and unripe, but this paper estimates six maturity stages of papaya fruit. Deep learning architectures have gained respect and brought breakthroughs in unimodal processing. This paper suggests a novel non-destructive and multimodal classification using deep convolutional neural networks that estimate fruit maturity by feature concatenation of data acquired from two imaging modes: visible-light and hyperspectral imaging systems. Morphological changes in the sample fruits can be easily measured with RGB images, while spectral signatures that provide high sensitivity and high correlation with the internal properties of fruits can be extracted from hyperspectral images with a wavelength range between 400 nm and 900 nm; both are factors that must be considered when building a model. This study further modified the AlexNet, VGG16, VGG19, ResNet50, ResNeXt50, MobileNet, and MobileNetV2 architectures to utilize multimodal data cubes composed of RGB and hyperspectral data for sensitivity analyses. These multimodal variants can achieve F1 scores of up to 0.90 and a top-2 error rate of 1.45% for the classification of six stages. Overall, multimodal input coupled with powerful deep convolutional neural network models can classify fruit maturity even at the refined level of six stages. This indicates that multimodal deep learning architectures and multimodal imaging have great potential for real-time in-field fruit maturity estimation, which can help estimate optimal harvest time, and for other in-field industrial applications.
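Two ideas from this abstract, feature concatenation and the top-2 error rate, can be sketched in a few lines of Python. The function names and the toy scores are hypothetical; the abstract does not describe at which layer the authors concatenate features or how they computed the error rate, so the top-k definition used here is the common one and is stated as an assumption.

```python
def concatenate_features(rgb_features, hyperspectral_features):
    """Fusion by concatenation: join the two feature vectors end to end."""
    return list(rgb_features) + list(hyperspectral_features)


def top_k_error(scores, labels, k=2):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        ranked = sorted(range(len(sample_scores)),
                        key=lambda cls: sample_scores[cls],
                        reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical feature vectors from the two imaging modes.
fused = concatenate_features([0.2, 0.7], [0.1, 0.9, 0.4])

# Hypothetical per-class scores for four samples over six maturity stages.
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.20, 0.50, 0.10],
    [0.60, 0.20, 0.10, 0.05, 0.03, 0.02],
]
labels = [1, 0, 5, 1]

top1_error = top_k_error(scores, labels, k=1)
top2_error = top_k_error(scores, labels, k=2)
```

The top-2 error can never exceed the top-1 error, since a prediction counts as correct whenever the true stage is either of the two highest-scoring classes.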
