Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization

Language and vision are two of the most essential parts of human intelligence for interpreting the world around us. How to connect language and vision is a key question in current research. Multimodal methods such as visual semantic embedding, which map images and their corresponding texts into the same feature space, have been widely studied recently. Inspired by recent developments in text data augmentation, in particular a simple but powerful technique called EDA (easy data augmentation), we use EDA to expand the information available in the given data and improve model performance. In this paper, we take advantage of text data augmentation and word embedding initialization for multimodal retrieval. We use EDA for text data augmentation, initialize the word embeddings of a text encoder based on recurrent neural networks, and minimize the gap between the two spaces with a triplet ranking loss with hard negative mining. On two Flickr-based datasets, we achieve the same recall with only 60% of the training data as normal training achieves with the full available data. Experimental results show the improvement of our proposed model: on all datasets in this paper (Flickr8k, Flickr30k, and MS-COCO), it performs better on image annotation and image retrieval tasks. The experiments also demonstrate that text data augmentation is more suitable for smaller datasets, while word embedding initialization is more suitable for larger ones.
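Two ingredients named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so everything here is an assumption: the loss follows one common hinge-based form that keeps only the hardest negative in each retrieval direction, and the two augmentation helpers cover only the random swap and random deletion operations of EDA (synonym replacement and random insertion need a thesaurus and are omitted). All function names and the margin value are hypothetical.

```python
import random


def random_swap(words, n_swaps, rng):
    """EDA-style random swap: exchange two random positions, n_swaps times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """EDA-style random deletion: drop each word with probability p."""
    words = list(words)
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty caption; fall back to one random word.
    return kept if kept else [rng.choice(words)]


def hardest_negative_triplet_loss(sim, margin=0.2):
    """Hinge loss that uses only the hardest negative in each direction.

    sim[i][j] is the similarity between image i and caption j; the diagonal
    holds the matching pairs. Assumes at least two pairs in the batch.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest wrong caption for image i (image annotation direction).
        hard_cap = max(sim[i][j] for j in range(n) if j != i)
        # Hardest wrong image for caption i (image retrieval direction).
        hard_img = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hard_cap - pos)
        total += max(0.0, margin + hard_img - pos)
    return total
```

For example, `hardest_negative_triplet_loss([[0.5, 0.6], [0.1, 0.9]])` penalizes only the first image, whose wrong caption (0.6) scores higher than its matching caption (0.5).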


The Image Annotation Refinement in Embedding Feature Space based on Mutual Information

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2022.16.23 ◽

2022 ◽

Vol 16 ◽

pp. 191-201

Author(s):

Wei Li ◽

Haiyu Song ◽

Hongda Zhang ◽

Houjie Li ◽

Pengjie Wang

Keyword(s):

Mutual Information ◽

Image Annotation ◽

State Of The Art ◽

Feature Space ◽

Semantic Space ◽

Visual Features ◽

Novel Approach ◽

Proposed Model ◽

Information Method ◽

Available Information

The ever-increasing size of images has made automatic image annotation one of the most important tasks in the fields of machine learning and computer vision. Despite continuous efforts to invent new annotation algorithms and models, the results of state-of-the-art image annotation methods are often unsatisfactory. In this paper, to further improve annotation refinement performance, a novel approach based on weighted mutual information is proposed to automatically refine the original annotations of images. Unlike traditional refinement models that use only visual features, the proposed model uses semantic embedding to map labels and visual features to a meaningful semantic space. To accurately measure the relevance between a particular image and its original annotations, the proposed model utilizes all available information, including image-to-image, label-to-label, and image-to-label relations. Experiments conducted on three typical datasets show not only the validity of the refinement, but also the superiority of the proposed algorithm over existing ones. The improvement largely benefits from the proposed mutual information method and from utilizing all available information.


A Correlated Topic Model Using Word Embeddings

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/588 ◽

2017 ◽

Cited By ~ 20

Author(s):

Guangxu Xun ◽

Yaliang Li ◽

Wayne Xin Zhao ◽

Jing Gao ◽

Aidong Zhang

Keyword(s):

Data Augmentation ◽

Topic Model ◽

Semantic Relatedness ◽

Word Embedding ◽

Word Embeddings ◽

Word Level ◽

Logistic Normal Distribution ◽

Proposed Model ◽

Correlation Information ◽

Correlated Topic Model

Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, via cosine values. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model enables us to exploit the additional word-level correlation information in word embeddings and directly model topic correlation in the continuous word embedding space. In the model, words in documents are replaced with meaningful word embeddings, topics are modeled as multivariate Gaussian distributions over the word embeddings and topic correlations are learned among the continuous Gaussian topics. A Gibbs sampling solution with data augmentation is given to perform inference. We evaluate our model on the 20 Newsgroups dataset and the Reuters-21578 dataset qualitatively and quantitatively. The experimental results show the effectiveness of our proposed model.
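The remark that semantic relatedness can be computed "via cosine values" in the embedding space can be made concrete with a small Python sketch. The function name is hypothetical and any vectors passed to it are toy values, not embeddings from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction give a value of 1, orthogonal vectors give 0, so related words are expected to score closer to 1 than unrelated ones.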


An Efficient CNN Model for COVID-19 Disease Detection Based on X-Ray Image Classification

Complexity ◽

10.1155/2021/6621607 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Aijaz Ahmad Reshi ◽

Furqan Rustam ◽

Arif Mehmood ◽

Abdulaziz Alhossan ◽

Ziyad Alrabiah ◽

...

Keyword(s):

Image Analysis ◽

Image Classification ◽

Data Augmentation ◽

Medical Image Analysis ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Test Scenario ◽

X Ray ◽

Proposed Model ◽

Chest X Ray

Artificial intelligence (AI) techniques in general, and convolutional neural networks (CNNs) in particular, have attained successful results in medical image analysis and classification. This paper proposes a deep CNN architecture for the diagnosis of COVID-19 based on chest X-ray image classification. Because no chest X-ray image dataset of sufficient size and good quality was available, effective and accurate CNN classification was a challenge. To deal with a very small, imbalanced dataset with image-quality issues, the dataset was preprocessed in several phases using different techniques to obtain an effective training dataset on which the proposed CNN model could attain its best performance. The preprocessing stages performed in this study include dataset balancing, medical experts’ image analysis, and data augmentation. The experimental results show an overall accuracy as high as 99.5%, which demonstrates the capability of the proposed CNN model in this application domain. The CNN model was tested in two scenarios. In the first, the model was tested on 100 X-ray images from the original processed dataset and achieved an accuracy of 100%. In the second, the model was tested on an independent dataset of COVID-19 X-ray images, where its performance was as high as 99.5%. To further show that the proposed model outperforms other models, a comparative analysis was carried out against several machine learning algorithms. The proposed model outperformed all of them, particularly when testing used an independent test set.


Convolution neural network and histogram equalization for COVID-19 diagnosis system

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i1.pp420-427 ◽

2021 ◽

Vol 24 (1) ◽

pp. 420

Author(s):

Bashra Kadhim Oleiwi Chabor Alwawi ◽

Layla H. Abood

Keyword(s):

Neural Network ◽

Data Augmentation ◽

Training Model ◽

Histogram Equalization ◽

Convolution Neural Network ◽

Training Dataset ◽

Learning Technology ◽

Detection Model ◽

Proposed Model ◽

Model Training

Coronavirus disease 2019 (COVID-19) is spreading quickly and globally as a pandemic and is the biggest problem facing humanity today. Medical resources have become insufficient in many areas. Fast diagnosis of positive cases is increasingly important to prevent further spread of the pandemic. In this study, a deep learning approach for COVID-19 dataset expansion and detection is proposed. In the first stage of the proposed model, a COVID-19 dataset of chest X-ray images was collected and pre-processed, then expanded using data augmentation and enhanced using image processing and histogram equalization techniques. In the second stage, a new convolution neural network (CNN) architecture was built and trained to classify each image as a COVID-19 (infected) or normal (uninfected) case. A graphical user interface (GUI) was designed with Tkinter for the proposed COVID-19 detection model. Training was carried out online on a Google Colaboratory graphics processing unit (GPU). The proposed model classified COVID-19 with an accuracy of 93.8% on the training dataset and 92.1% on the validation dataset, reaching the target with a minimal number of training epochs and satisfactory results.
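Histogram equalization, one of the enhancement steps named in this abstract, can be sketched in plain Python. This is the textbook cumulative-histogram remapping applied to a flat list of gray levels; it is an illustration under that assumption, not the authors' preprocessing code, and the function name is hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    if not pixels:
        return []
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of the gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:
        # Constant image: there is no contrast to spread out.
        return list(pixels)
    scale = (levels - 1) / (n - cdf_min)
    # Map each gray level through the rescaled cumulative distribution.
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]
```

The darkest level present maps to 0 and the brightest to `levels - 1`, which stretches a low-contrast X-ray across the full intensity range.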


AN EFFICIENT MACHINE LEARNING MODEL FOR PREDICTION OF ACUTE MYOCARDIAL INFARCTION

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction. Methods: The proposed model uses mean imputation to handle missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is given as input to four popular classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) for training, and the remaining 30% is used to evaluate the output of the machine learning model using a confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: The outputs of the classifiers were evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models. It can predict the presence of acute myocardial infarction in humans from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
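Parts of the pipeline described in the Methods section can be sketched in plain Python: mean imputation, a 70/30 split, and the confusion-matrix metrics. The PCA step and the classifiers themselves are left out, the split is done without shuffling for simplicity, and all function names are hypothetical. This is an illustration under those assumptions, not the authors' code.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


def split_70_30(rows):
    """First 70% of the rows for training, the remaining 30% for testing."""
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and F1 from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}
```

In practice the rows would be shuffled (or stratified by class) before splitting, and the imputation means would be computed on the training portion only to avoid leaking test information.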


A Survey of Text Data Augmentation

2020 International Conference on Computer Communication and Network Security (CCNS) ◽

10.1109/ccns50731.2020.00049 ◽

2020 ◽

Author(s):

Pei Liu ◽

Xuemin Wang ◽

Chao Xiang ◽

Weiye Meng

Keyword(s):

Data Augmentation ◽

Text Data


GPR B-Scan Image Denoising via Multi-Scale Convolutional Autoencoder with Data Augmentation

Electronics ◽

10.3390/electronics10111269 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1269

Author(s):

Jiabin Luo ◽

Wentai Lei ◽

Feifei Hou ◽

Chenghao Wang ◽

Qiang Ren ◽

...

Keyword(s):

Image Denoising ◽

Data Augmentation ◽

Noise Suppression ◽

Random Noise ◽

Similarity Index ◽

Structural Similarity ◽

Training Dataset ◽

Generative Adversarial Network ◽

Multi Scale ◽

Convolutional Autoencoder

Ground-penetrating radar (GPR), as a non-invasive instrument, has been widely used in civil engineering. GPR B-scan images may contain random noise caused by the environment and the equipment hardware, which complicates the interpretation of the useful information. Many methods have been proposed to eliminate or suppress this random noise. However, existing methods denoise poorly when the image is severely contaminated. This paper proposes a multi-scale convolutional autoencoder (MCAE) to denoise GPR data. To address the shortage of training data, we also designed a data augmentation strategy based on a Wasserstein generative adversarial network (WGAN) to enlarge the training dataset of the MCAE. Experiments conducted on simulated, generated, and field datasets demonstrated that the proposed scheme has promising performance for image denoising. In terms of three indexes, the peak signal-to-noise ratio (PSNR), the time cost, and the structural similarity index (SSIM), the proposed scheme achieves better random noise suppression than state-of-the-art competing methods (e.g., CAE, BM3D, WNNM).
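Of the three indexes listed, the peak signal-to-noise ratio has a compact standard definition, 10·log10(MAX² / MSE), which can be sketched in Python. The function name and the default peak value of 255 (8-bit images) are assumptions for illustration; the abstract does not state the image bit depth.

```python
import math


def psnr(reference, denoised, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if len(reference) != len(denoised) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the clean reference and the denoised image.
    mse = sum((r - d) ** 2 for r, d in zip(reference, denoised)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the denoised image is closer to the clean reference, so a better denoiser yields a larger value on the same test image.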


UAV Image Multi-Labeling with Data-Efficient Transformers

Applied Sciences ◽

10.3390/app11093974 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3974

Author(s):

Laila Bashmal ◽

Yakoub Bazi ◽

Mohamad Mahmoud Al Rahhal ◽

Haikel Alhichri ◽

Naif Al Ajlan

Keyword(s):

Data Augmentation ◽

Feature Representation ◽

Aerial Image ◽

Remote Sensing Images ◽

Training Set ◽

Proposed Model ◽

Class Labels ◽

Using Data ◽

Uav Image

In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generated a second view of each training image using data augmentation. Both the image and its augmented version were then reshaped into a sequence of flattened patches and fed to the transformer encoder. The encoder extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On top of the encoder, we mounted two classifiers, a token classifier and a distiller classifier. During training, we minimized a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we took the average of the two classifiers' outputs as the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano with a ground resolution of two centimeters demonstrated the effectiveness of the proposed model.
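Two mechanical steps from this description, reshaping an image into flattened patches and averaging the outputs of two classifiers at test time, can be sketched in plain Python. The single-channel grid input, the row-major patch order, and both function names are assumptions for illustration; the transformer encoder itself is not shown.

```python
def image_to_patches(image, patch):
    """Split an H x W grid (list of rows) into flattened patch x patch blocks."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    # Walk the grid block by block, left to right and top to bottom.
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch)
                            for dx in range(patch)])
    return patches


def average_predictions(token_scores, distiller_scores):
    """Element-wise mean of the per-class scores from the two classifiers."""
    return [(a + b) / 2 for a, b in zip(token_scores, distiller_scores)]
```

For a multi-label task, the averaged scores would typically be thresholded per class (for example at 0.5) to decide which labels are assigned; that threshold is an assumption, not a value from the abstract.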


A Densely Connected GRU Neural Network Based on Coattention Mechanism for Chinese Rice-Related Question Similarity Matching

Agronomy ◽

10.3390/agronomy11071307 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1307

Author(s):

Haoriqin Wang ◽

Huaji Zhu ◽

Huarui Wu ◽

Xiaomin Wang ◽

Xiao Han ◽

...

Keyword(s):

Semi-Supervised Aspect-Based Sentiment Analysis for Case-Related Microblog Reviews Using Case Knowledge Graph Embedding

International Journal of Asian Language Processing ◽

10.1142/s2717554520500125 ◽

2021 ◽

pp. 2050012

Author(s):

Peilian Zhao ◽

Cunli Mao ◽

Zhengtao Yu

Keyword(s):

Sentiment Analysis ◽

Domain Knowledge ◽

Opinion Mining ◽

Data Augmentation ◽

Training Data ◽

Knowledge Graph ◽

Fine Grained ◽

Learning Framework ◽

Proposed Model ◽

Real World Applications

Aspect-Based Sentiment Analysis (ABSA) is a fine-grained opinion mining task that aims to extract the sentiment toward a specific target from text. It is important in many real-world applications, especially in the legal field. In this paper, we study two problems of End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) in the legal field: the limited amount of labeled training data and the neglect of in-domain knowledge representation. We propose a new method under a deep learning framework, named Semi-ETEKGs, which applies an E2E framework using knowledge graph (KG) embedding in the legal field after data augmentation (DA). Specifically, we pre-trained the BERT embedding and the in-domain KG embedding on unlabeled data and on labeled data with case elements after DA, and then fed the two embeddings into the E2E framework to classify the polarity of the target entity. Finally, we built a case-related dataset based on a popular ABSA benchmark to demonstrate the efficiency of Semi-ETEKGs. Experiments on the case-related dataset of microblog comments show that the proposed model significantly outperforms the compared methods.
