A Cross-Modal Image and Text Retrieval Method Based on Efficient Feature Extraction and Interactive Learning CAE

2022 ◽  
Vol 2022 ◽  
pp. 1-12
Author(s):  
Xiuye Yin ◽  
Liyong Chen

To address the complexity of multimodal environments and the inability of existing shallow network structures to achieve high-precision image-text retrieval, a cross-modal image and text retrieval method combining efficient feature extraction with an interactive-learning convolutional autoencoder (CAE) is proposed. First, the residual network's convolution kernels are improved by incorporating two-dimensional principal component analysis (2DPCA) to extract image features, while text features are extracted with long short-term memory (LSTM) networks and word vectors. Then, cross-modal retrieval of images and text is realized with the interactive-learning CAE: the image and text features are fed into the two input terminals of the dual-modal CAE, and an image-text relationship model is obtained through interactive learning in the middle layer. Finally, the proposed method is evaluated on the Flickr30K, MSCOCO, and Pascal VOC 2007 datasets. The results show that it performs accurate image and text retrieval, with a mean average precision (MAP) above 0.3 and a larger area under the precision-recall (PR) curve than the comparison methods.
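As a hedged illustration of the 2DPCA image-feature step described above (function names and array shapes are my own; the residual-network integration and the CAE itself are omitted), a minimal NumPy sketch:

```python
import numpy as np

def two_dpca(images, k):
    """Project each image onto the top-k eigenvectors of the
    image-covariance matrix G (2DPCA), keeping image rows intact."""
    mean = images.mean(axis=0)                       # (h, w) mean image
    G = np.zeros((images.shape[2], images.shape[2]))
    for A in images:
        d = A - mean
        G += d.T @ d                                 # accumulate column covariance
    G /= len(images)
    _, vecs = np.linalg.eigh(G)                      # eigenvectors, ascending eigenvalues
    X = vecs[:, -k:]                                 # top-k projection axes
    return np.stack([A @ X for A in images])         # (n, h, k) feature matrices
```

In such a pipeline the per-image `(h, k)` feature matrices would then be flattened and passed to the retrieval network.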

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Hongli Zhang

A cross-modal speech-text retrieval method using an interactive-learning convolutional autoencoder (CAE) is proposed. First, an interactive-learning autoencoder structure is proposed, with speech and text inputs and processing stages of encoding, hidden-layer interaction, and decoding, to model cross-modal speech-text retrieval. Then, the raw audio signal is preprocessed and Mel-frequency cepstral coefficient (MFCC) features are extracted. In addition, a bag-of-words model is used to extract text features, and an attention mechanism combines the text and speech features. Through the interactive-learning CAE, features shared by the speech and text modalities are obtained and sent to a modality classifier to identify modal information, thereby realizing cross-modal speech-text retrieval. Finally, experiments show that the proposed algorithm outperforms the comparison algorithms in recall, precision, and false recognition rate.
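The MFCC step above can be sketched compactly. This is a simplified textbook pipeline (pre-emphasis, framing, power spectrum, mel filterbank, DCT), not the paper's exact configuration; the frame sizes and coefficient counts are assumptions:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: frame the signal, take the power spectrum,
    apply a mel filterbank, then a DCT to decorrelate."""
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)             # 25 ms / 10 ms
    n_frames = 1 + (len(emph) - frame_len) // hop
    frames = np.stack([emph[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank.
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        fb[i - 1, bins[i - 1]:bins[i]] = np.linspace(0, 1, bins[i] - bins[i - 1], endpoint=False)
        fb[i - 1, bins[i]:bins[i + 1]] = np.linspace(1, 0, bins[i + 1] - bins[i], endpoint=False)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II matrix, keeping the first n_ceps coefficients.
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * np.arange(n_mels) + 1) / (2 * n_mels)))
    return logmel @ dct.T                            # (n_frames, n_ceps)
```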


2021 ◽  
pp. 1-13
Author(s):  
Shuo Shi ◽  
Changwei Huo ◽  
Yingchun Guo ◽  
Stephen Lean ◽  
Gang Yan ◽  
...  

Person re-identification with natural language description is the task of retrieving the corresponding person's image from an image dataset according to a textual description of that person. The key challenge in this cross-modal task is to extract visual and text features and construct loss functions that achieve cross-modal matching between text and image. Firstly, we designed a two-branch network framework for person re-identification with natural language description: a Bi-directional Long Short-Term Memory (Bi-LSTM) network extracts text features, a proposed truncated attention mechanism selects the principal components of those features, and a MobileNet extracts image features. Secondly, we proposed a Cascade Loss Function (CLF), comprising a cross-modal matching loss and a single-modal classification loss, both built on the relative entropy function, to fully exploit identity-level information. Experimental results on the CUHK-PEDES dataset demonstrate that our method achieves better Top-5 and Top-10 results than ten current state-of-the-art algorithms.
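One plausible reading of "truncated attention" is soft attention that keeps only the largest weights; a NumPy sketch under that assumption (the function and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def truncated_attention(word_feats, query, k=3):
    """Soft attention over per-word features, truncated to the k largest
    weights: a sketch of selecting the principal text components."""
    scores = word_feats @ query                 # (T,) relevance scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                                # softmax attention weights
    keep = np.argsort(w)[-k:]                   # indices of the top-k weights
    mask = np.zeros_like(w)
    mask[keep] = w[keep]
    mask /= mask.sum()                          # renormalise the kept weights
    return mask @ word_feats                    # (d,) pooled text feature
```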


2021 ◽  
Vol 32 (4) ◽  
pp. 1-13
Author(s):  
Xia Feng ◽  
Zhiyi Hu ◽  
Caihua Liu ◽  
W. H. Ip ◽  
Huiying Chen

In recent years, deep learning has achieved remarkable results in the text-image retrieval task. However, most methods consider only global image features and ignore vital local information, so the text is not matched well. Since object-level image features can aid the matching between text and image, this article proposes a text-image retrieval method that fuses salient image feature representations. Fusing salient features at the object level improves the understanding of image semantics and thus the performance of text-image retrieval. Experimental results show that the proposed method is comparable to the latest methods, and the recall of some retrieval results exceeds that of current work.


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Tsun-Kuo Lin

This paper developed a principal component analysis (PCA)-integrated algorithm for feature identification in manufacturing, based on an adaptive PCA scheme for identifying image features in vision-based inspection. PCA is a commonly used statistical method for pattern recognition, but an effective PCA-based approach for identifying suitable image features in manufacturing has yet to be developed. Unsuitable image features tend to yield poor results in conventional visual inspection, and research has shown that unsuitable or redundant features can degrade object detection performance. To address these problems, the adaptive PCA-based algorithm developed in this study identifies suitable image features using a support vector machine (SVM) model for inspecting various object images; this approach addresses the detection problems that arise when the extraction contains challenging image features in manufacturing processes. Experimental results indicated that the proposed algorithm, which combines image feature extraction with PCA/SVM classification to detect patterns in manufacturing, adaptively selects appropriate image features, achieves high-performance detection, and outperforms existing methods.
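A minimal sketch of one way such an adaptive PCA/SVM scheme could look in scikit-learn, choosing the PCA dimensionality by cross-validated SVM accuracy; the synthetic data, candidate range, and selection criterion are my assumptions, not the paper's procedure:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for inspection-image feature vectors.
X, y = make_classification(n_samples=200, n_features=40,
                           n_informative=8, random_state=0)

def best_n_components(X, y, candidates=range(2, 21, 2)):
    """Pick the PCA dimensionality whose PCA+SVM pipeline scores
    highest under 3-fold cross-validation (adaptive selection sketch)."""
    return max(candidates, key=lambda k: cross_val_score(
        make_pipeline(PCA(n_components=k), SVC()), X, y, cv=3).mean())
```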


1997 ◽  
Vol 3 (4) ◽  
pp. 261-287 ◽  
Author(s):  
Michael L. Best

I introduce a new alife model, an ecology based on a corpus of text, and apply it to the analysis of posts to USENET News. In this corporal ecology posts are organisms, the newsgroups of NetNews define an environment, and human posters situated in their wider context make up a scarce resource. I apply latent semantic indexing (LSI), a text retrieval method based on principal component analysis, to distill from the corpus those replicating units of text. LSI arrives at suitable replicators because it discovers word co-occurrences that segregate and recombine with appreciable frequency. I argue that natural selection is necessarily in operation because sufficient conditions for its occurrence are met: replication, mutagenicity, and trait/fitness covariance. I describe a set of experiments performed on a static corpus of over 10,000 posts. In these experiments I study average population fitness, a fundamental element of population ecology. My study of fitness arrives at the unhappy discovery that a flame-war, centered around an overly prolific poster, is the king of the jungle.
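The core of LSI is a truncated SVD of the term-document matrix; a minimal NumPy sketch (function name and shapes are illustrative):

```python
import numpy as np

def lsi_embed(term_doc, k):
    """Latent semantic indexing via truncated SVD: map each document
    (a column of the term-document count matrix) into a k-dim space
    where co-occurring terms collapse onto shared latent axes."""
    U, s, Vt = np.linalg.svd(term_doc.astype(float), full_matrices=False)
    return (s[:k, None] * Vt[:k]).T          # (n_docs, k) document vectors
```

Similarity between posts can then be measured by cosine distance in the k-dimensional latent space.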


Author(s):  
Soumia Kerrache ◽  
Beladgham Mohammed ◽  
Hamza Aymen ◽  
Kadri Ibrahim

Feature extraction is an essential process in person-identification biometrics because the effectiveness of the system depends on it. The success of multiresolution analysis can be exploited in person identification and pattern recognition systems. In this paper, we present a feature extraction method for two-dimensional face and iris authentication. Our approach combines principal component analysis (PCA) and the curvelet transform into an improved fusion approach for feature extraction. The proposed fusion approach denoises images with the 2D curvelet transform to obtain compact representations of curve singularities, then applies PCA as a fusion rule to improve the spatial resolution. Because PCA alone suffers from poor recognition speed and a heavy mathematical computing load, the curvelet transform is applied to reduce these limitations. To assess the performance of the presented method, we employed three classification techniques: neural networks (NN), K-nearest neighbor (KNN), and support vector machines (SVM). The results reveal that image feature extraction is more efficient using Curvelet/PCA.
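The PCA fusion rule can be sketched as follows; the curvelet denoising is assumed to have been applied upstream (it needs a dedicated library), and the function name and shapes are my own:

```python
import numpy as np

def pca_fuse(face_feats, iris_feats, k):
    """Fusion-rule sketch: concatenate per-sample face and iris feature
    vectors, centre them, and project onto the top-k principal components.
    Inputs are assumed already curvelet-denoised."""
    X = np.hstack([face_feats, iris_feats])      # (n, d_face + d_iris)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # (n, k) fused features
```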


2020 ◽  
Vol 37 (5) ◽  
pp. 847-854
Author(s):  
Shuang Lu ◽  
Qian Zhang ◽  
Yi Liu ◽  
Lei Liu ◽  
Qing Zhu ◽  
...  

The thriving of information technology (IT) has raised the demand for intelligent query and retrieval of information about tourist attractions of interest, which is the basis for preparing convenient, personalized itineraries. To realize accurate and rapid querying of tourist attraction information (not limited to text), this paper proposes a spatiotemporal feature extraction method and a ranking and retrieval method for multiple spatiotemporally correlated images (MSCIs) of tourist attractions based on a deeply-recursive convolutional network (DRCN). Firstly, the authors introduced the acquisition process for candidate spatiotemporally correlated images of tourist attractions, including both coarse and fine screening. Next, the workflow of spatiotemporal feature extraction from tourist attraction images was explained, as well as the proposed convolutional long short-term memory (ConvLSTM) algorithm. After that, the ranking model of MSCIs was constructed and derived. Experimental results demonstrate that our strategy is effective for retrieving tourist attraction images, and the findings shed light on fast and accurate retrieval of other image types.
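Once spatiotemporal features are extracted, the final ranking reduces to ordering candidates by similarity to the query; a generic cosine-similarity stand-in (not the paper's learned ranking model):

```python
import numpy as np

def rank_candidates(query_feat, image_feats):
    """Rank candidate attraction images by cosine similarity of their
    feature vectors to the query, most similar first."""
    q = query_feat / np.linalg.norm(query_feat)
    F = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    return np.argsort(-(F @ q))                  # indices, best match first
```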


2021 ◽  
Vol 25 (4) ◽  
pp. 40-50
Author(s):  
Ansam H. Rashed ◽  
Muthana H. Hamd

An automatic face recognition system is proposed in this work on the basis of appearance-based features focusing on the whole image, as well as local features focusing on critical face points such as the eyes, mouth, and nose, to generate further detail. Face detection is the major phase in face recognition systems; the Viola-Jones detector can process images efficiently and achieve high detection rates in real-time systems. Dimension reduction and feature extraction approaches are then applied to the cropped image produced by detection. The Local Binary Pattern Histogram (LBPH) is a simple yet effective way to extract image features, while Principal Component Analysis (PCA) has been widely used in pattern recognition, and Linear Discriminant Analysis (LDA), which overcomes PCA's limitations, has been used efficiently in face recognition. Classification follows feature extraction, using the machine learning algorithms PART and J48. The proposed system shows high detection accuracy with Viola-Jones (98.75%), whereas the features extracted by LDA with J48 provide the best F-measure, recall, and precision.
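The LBPH feature mentioned above can be sketched in a few lines of NumPy (basic 8-neighbour LBP over the whole crop; real LBPH implementations typically also tile the image into regions):

```python
import numpy as np

def lbp_histogram(img):
    """256-bin histogram of 8-neighbour local binary pattern codes:
    a minimal LBPH feature for a cropped face image."""
    c = img[1:-1, 1:-1].astype(np.int16)         # centre pixels
    code = np.zeros(c.shape, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy : img.shape[0] - 1 + dy,
                 1 + dx : img.shape[1] - 1 + dx].astype(np.int16)
        code |= ((nb >= c).astype(np.uint8) << bit)  # set bit if neighbour >= centre
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()                     # normalised 256-bin descriptor
```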


2019 ◽  
Vol 8 (3) ◽  
pp. 3305-3310

With the advent of medical endoscopes, earth-observation satellites, and personal phones, content-based image retrieval (CBIR) has attracted significant attention, driven by its broad applications, e.g., medical image analysis, remote sensing, and person re-identification. However, designing effective feature extraction remains a challenging problem. In this paper, to overcome these feature extraction problems, a hybrid Tile-Based Feature Extraction (TBFE) method is introduced. The TBFE algorithm hybridizes the local binary pattern (LBP) and the local derivative pattern (LDP); together these extract color image features automatically. A support vector machine (SVM) is used as the classifier in this image retrieval approach to retrieve images from the database. The hybrid TBFE combined with the SVM classifier is named IR-TBFE-SVM. Experiments show that IR-TBFE-SVM delivers higher precision and recall than single-feature retrieval systems, along with good weight balancing and query efficiency.


2019 ◽  
Vol 9 (8) ◽  
pp. 1599 ◽  
Author(s):  
Yuanyao Lu ◽  
Hongbo Li

With the improvement of computer performance, virtual reality (VR), as a new mode of visual operation and interaction, gives automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment, the user's state can be captured through lip movements, allowing the user's real-time thinking to be analyzed. Owing to complex image processing, hard-to-train classifiers, and long recognition times, traditional lip-reading recognition systems struggle to meet the requirements of practical applications. In this paper, a convolutional neural network (CNN) for image feature extraction is combined with an attention-based recurrent neural network (RNN) for automatic lip-reading recognition. Our proposed method can be divided into three steps. Firstly, we extract keyframes from our own independently established database (English pronunciation of the numbers zero to nine by three males and three females). Then, we use the Visual Geometry Group (VGG) network to extract lip image features; the extracted features prove fault-tolerant and effective. Finally, we compare two lip-reading models: (1) a fusion model with an attention mechanism and (2) a fusion model of two networks. The results show that the proposed model reaches 88.2% accuracy on the test dataset versus 84.9% for the contrastive model, so our proposed method is superior to traditional lip-reading recognition methods and general neural networks.
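The keyframe-extraction step can be illustrated with a simple frame-difference heuristic; this is a stand-in sketch (the paper does not specify its selection rule), with names and shapes assumed:

```python
import numpy as np

def extract_keyframes(frames, n_key=4):
    """Keyframe-selection sketch: keep the frames that differ most
    from their predecessor, in temporal order."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    idx = np.sort(np.argsort(diffs)[-n_key:] + 1)   # +1: diff i scores frame i+1
    return frames[idx]                               # (n_key, h, w)
```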

