A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval

Hanqing Chen; Chunyan Hu; Feifei Lee; Chaowei Lin; Wei Yao; Lu Chen; Qiu Chen

doi:10.3390/s21093094

A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval

Sensors ◽

10.3390/s21093094 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3094

Author(s):

Hanqing Chen ◽

Chunyan Hu ◽

Feifei Lee ◽

Chaowei Lin ◽

Wei Yao ◽

...

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Convolutional Neural Network ◽

Large Scale ◽

Video Retrieval ◽

Video Frames ◽

Video Feature ◽

Video Hashing ◽

Short Video ◽

The Stability

Recently, with the popularization of camera-equipped devices such as mobile phones and the rise of short video platforms, large numbers of videos are uploaded to the Internet around the clock, which makes a fast and precise video retrieval system necessary. Content-based video retrieval (CBVR) has therefore attracted the interest of many researchers. A typical CBVR system contains two essential parts: video feature extraction and similarity comparison. Video feature extraction is very challenging. Previous video retrieval methods mostly extract features from single video frames, which loses the temporal information in the videos. Hashing methods are widely used in multimedia information retrieval because of their retrieval efficiency, but most of them are currently applied only to image retrieval. To solve these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH), which employs a 3D convolutional neural network (CNN) to obtain spatial-temporal features of videos and then trains a set of hash functions by supervised hashing to map the video features into binary space and obtain compact binary codes for the videos. We use triplet loss for network training. We conduct extensive experiments on three public video datasets, UCF-101, JHMDB and HMDB-51, and the results show that the proposed method has advantages over many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP on the UCF-101 dataset is improved by 9.3%, and the minimum improvement on the JHMDB dataset is 0.3%. We also demonstrate the stability of the algorithm on the HMDB-51 dataset.
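As a rough illustration of two ideas named in the abstract above (compact binary codes compared by Hamming distance, and triplet-loss training), the following pure-Python sketch may help. It is not the DSVH implementation: the abstract does not state the binarization rule or the exact form of the loss, so thresholding at zero and a squared-Euclidean triplet loss with a margin are assumptions, and every function name here is hypothetical.

```python
def binarize(features, threshold=0.0):
    """Map real-valued features to a binary code (assumed rule: threshold at 0)."""
    return [1 if x > threshold else 0 for x in features]


def hamming(code_a, code_b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Assumed standard form: max(0, d(a, p) - d(a, n) + margin), squared Euclidean d."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


def rank_by_hamming(query_code, database):
    """Return database keys ordered from nearest to farthest in Hamming distance."""
    return sorted(database, key=lambda name: hamming(query_code, database[name]))
```

With codes this short the ranking is trivial, but the same Hamming comparison is what makes retrieval over long binary codes cheap compared with distances over real-valued features.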


An Advanced Relevance Feedback Method to Improve Performance of CBIR using Convolutional Neural Network and Comprehensive Values

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2741.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 5427-5438

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Image Retrieval ◽

Convolutional Neural Network ◽

Large Scale ◽

Activation Function ◽

Image Feature ◽

Similarity Measurement ◽

Query Image ◽

Image Production

Content-Based Image Retrieval (CBIR) is an extensively used technique for image retrieval from large image databases. However, users are not satisfied with conventional image retrieval techniques. In addition, with the advent of web development and transmission networks, the number of images available to users continues to increase, so digital images are being produced continuously and in considerable volume in many areas. Quick access to images similar to a given query image in such an extensive collection poses great challenges and requires proficient techniques. From the query image to the retrieval of relevant images, CBIR has key phases such as feature extraction, similarity measurement, and retrieval of relevant images, and feature extraction is one of the most important steps. Recently, Convolutional Neural Networks (CNNs) have shown good results in computer vision because of their ability to extract features from images. AlexNet is a classical deep CNN for image feature extraction. We modify the AlexNet architecture with a few changes and propose a novel framework to improve its ability for feature extraction and similarity measurement. The proposed approach optimizes the pooling layers of AlexNet. In particular, average pooling is replaced by max-avg pooling, and the non-linear activation function Maxout is used after every convolution layer for better feature extraction. This paper introduces a CNN for feature extraction from images in a CBIR system and also uses Euclidean distance along with the Comprehensive Values for better results. The proposed framework goes beyond image retrieval and extends to large-scale databases. The performance of the proposed work is evaluated using precision. The proposed work shows better results than existing works.
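The three building blocks mentioned in the abstract above (max-avg pooling, Maxout, and Euclidean distance) can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the abstract does not define max-avg pooling, so it is taken here to be the mean of the max-pooled and average-pooled values of each window, and the 1D, non-overlapping windows and the function names are hypothetical simplifications.

```python
import math


def max_avg_pool(values, window=2):
    """Assumed definition: average of max pooling and average pooling per window."""
    pooled = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        pooled.append(0.5 * (max(chunk) + sum(chunk) / window))
    return pooled


def maxout(pieces):
    """Maxout activation: elementwise maximum over k candidate feature vectors."""
    return [max(column) for column in zip(*pieces)]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

In a retrieval setting, the query image's feature vector would be compared against each database vector with `euclidean`, and the smallest distances returned first.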


CricShotClassify: An Approach to Classifying Batting Shots from Cricket Videos Using a Convolutional Neural Network and Gated Recurrent Unit

Sensors ◽

10.3390/s21082846 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2846

Author(s):

Anik Sen ◽

Kaushik Deb ◽

Pranab Kumar Dhar ◽

Takeshi Koshiba

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Convolutional Neural Network ◽

Network Architecture ◽

Deep Neural Network ◽

Learning Models ◽

Neural Network Architecture ◽

Automatic Feature Extraction ◽

Video Frames ◽

Gated Recurrent Unit

Recognizing the sport of cricket on the basis of different batting shots can be a significant part of context-based advertisement to users watching cricket, of generating sensor-based commentary systems, and of coaching assistants. Due to the similarity between different batting shots, manual feature extraction from video frames is tedious. This paper proposes a hybrid deep-neural-network architecture for classifying 10 different cricket batting shots from offline videos. We composed a novel dataset, CricShot10, comprising batting shots of uneven length recorded under unpredictable illumination conditions. Motivated by the enormous success of deep-learning models, we used a convolutional neural network (CNN) for automatic feature extraction and a gated recurrent unit (GRU) to deal with long temporal dependency. Initially, conventional CNN and dilated CNN-based architectures were developed. Following that, different transfer-learning models (VGG16, InceptionV3, Xception, and DenseNet169) were investigated with all layers frozen. Experimental results demonstrated that the VGG16–GRU model outperformed the other models by attaining 86% accuracy. We further explored VGG16 and developed two models, one by freezing all but the final 4 VGG16 layers and another by freezing all but the final 8 VGG16 layers. On our CricShot10 dataset, these two models were 93% accurate. These results verify the effectiveness of our proposed architecture compared with other methods in terms of accuracy.
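To make the role of the GRU in the abstract above more concrete, here is a minimal scalar GRU step in pure Python. It is a sketch, not the CricShotClassify model: real per-frame CNN features are vectors and the weights are matrices, the parameter dictionary `p` and both function names are hypothetical, and the update convention `h_new = (1 - z) * h + z * h_cand` is one of two common conventions (some libraries swap the roles of `z` and `1 - z`).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One GRU update for a scalar input x and scalar hidden state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_cand


def encode_sequence(frame_features, p, h0=0.0):
    """Fold a sequence of per-frame features into one final hidden state."""
    h = h0
    for x in frame_features:
        h = gru_step(x, h, p)
    return h
```

The final hidden state summarizes the whole clip, which is why clips of uneven length can be fed to the same classifier head.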


Non-Blind Image Deconvolution Based on “Ringing” Removal Using Convolutional Neural Network

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-180 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 181-1-181-7

Author(s):

Takahiro Kudo ◽

Takanori Fujisawa ◽

Takuro Yamaguchi ◽

Masaaki Ikehara

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Architecture ◽

Large Scale ◽

Blind Deconvolution ◽

Training Dataset ◽

Image Deconvolution ◽

Classic Problem ◽

Key Points ◽

Blind Image

Image deconvolution has been an important issue recently. There are two kinds of approaches: non-blind and blind. Non-blind deconvolution is a classic problem of image deblurring, which assumes that the point spread function (PSF) is known and does not vary across the image. Recently, Convolutional Neural Networks (CNNs) have been used for non-blind deconvolution. Though CNNs can deal with complex changes for unknown images, some conventional CNN-based methods can only handle small PSFs and do not consider the large PSFs that occur in the real world. In this paper we propose a non-blind deconvolution framework based on a CNN that can remove large-scale ringing in a deblurred image. Our method has three key points. The first is that our network architecture is able to preserve both large and small features in the image. The second is that the training dataset is created to preserve the details. The third is that we extend the images to minimize the effects of large ringing on the image borders. In our experiments, we used three kinds of large PSFs and observed high-precision results from our method both quantitatively and qualitatively.


Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413453 ◽

2021 ◽

Author(s):

Chao-Han Huck Yang ◽

Jun Qi ◽

Samuel Yen-Chi Chen ◽

Pin-Yu Chen ◽

Sabato Marco Siniscalchi ◽

...

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Speech Recognition ◽

Convolutional Neural Network ◽

Automatic Speech Recognition


Deep convolutional neural networks for human movement detection using wireless signals

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189629 ◽

2021 ◽

pp. 1-10

Author(s):

Chien-Cheng Lee ◽

Zhongjian Gao ◽

Xiu-Chi Huang

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Convolutional Neural Network ◽

Detection System ◽

Deep Convolutional Neural Network ◽

Human Detection ◽

Two Dimensional ◽

Dimensional Matrix ◽

State Classification ◽

Propagation Paths

This paper proposes a Wi-Fi-based indoor human detection system using a deep convolutional neural network. The system detects different human states in various situations, including different environments and propagation paths. The main improvement offered by the system is that it requires no overhead cameras and no mounted sensors. The system captures useful amplitude information from the channel state information and converts this information into an image-like two-dimensional matrix. Next, the two-dimensional matrix is used as the input to a deep convolutional neural network (CNN) to distinguish human states. In this work, a deep residual network (ResNet) architecture is used to perform human state classification with hierarchical topological feature extraction. Several combinations of datasets for different environments and propagation paths are used in this study. ResNet’s powerful inference simplifies feature extraction and improves the accuracy of human state classification. The experimental results show that the fine-tuned ResNet-18 model performs well in indoor human detection across three states: people not present, people still, and people moving. Compared with traditional machine learning using handcrafted features, this method is simple and effective.
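The conversion of channel state information (CSI) amplitudes into an image-like matrix, described in the abstract above, could look roughly like the sketch below. The details are assumptions rather than the paper's procedure: each packet is taken to be a list of complex CSI values (one per subcarrier), rows are packets and columns are subcarriers, and amplitudes are min-max scaled to the 0..255 range. The function name `csi_to_image` is hypothetical.

```python
def csi_to_image(csi_packets):
    """Turn complex CSI samples into a 2D grid of integer intensities in 0..255.

    csi_packets: list of packets, each a list of complex values per subcarrier.
    """
    # Amplitude is the magnitude of each complex CSI value.
    amplitudes = [[abs(value) for value in packet] for packet in csi_packets]
    lowest = min(min(row) for row in amplitudes)
    highest = max(max(row) for row in amplitudes)
    span = (highest - lowest) or 1.0  # avoid division by zero for constant input
    return [
        [round(255 * (a - lowest) / span) for a in row]
        for row in amplitudes
    ]
```

The resulting grid can then be treated like a single-channel image and passed to an image classifier.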


Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM

Sensors ◽

10.3390/s21082852 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2852

Author(s):

Parvathaneni Naga Srinivasu ◽

Jalluri Gnana SivaSai ◽

Muhammad Fazal Ijaz ◽

Akash Kumar Bhoi ◽

Wonjoon Kim ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Network ◽

Skin Disease ◽

Network Architecture ◽

Large Scale ◽

Short Term Memory ◽

Convolutional Networks ◽

Occurrence Matrix

Deep learning models are efficient in learning the features that assist in understanding complex patterns precisely. This study proposes a computerized process for classifying skin disease through deep-learning-based MobileNet V2 and Long Short Term Memory (LSTM). The MobileNet V2 model proved to be efficient, with better accuracy, and can run on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture expanded with a few changes. The HAM10000 dataset is used, and the proposed method outperforms the other methods with more than 85% accuracy. It recognizes the affected region much faster, with almost 2× fewer computations than the conventional MobileNet model, which results in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.
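For readers unfamiliar with the grey-level co-occurrence matrix (GLCM) mentioned in the abstract above, the following pure-Python sketch shows the general technique. It does not reproduce how the paper uses the GLCM to assess disease progress: the pixel offset, the number of grey levels, and the choice of the contrast statistic are illustrative assumptions, and the function names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of (i - j)^2 weighted by normalized co-occurrence counts."""
    total = sum(sum(row) for row in matrix)
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total
```

For example, `glcm([[0, 0, 1], [1, 1, 0]], levels=2)` counts the four horizontal neighbor pairs in a tiny two-level image, and `contrast` turns that matrix into a single texture number.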


Predicting the pandemic: sentiment evaluation and predictive analysis from large-scale tweets on Covid-19 by deep convolutional neural network

Evolutionary Intelligence ◽

10.1007/s12065-021-00598-7 ◽

2021 ◽

Author(s):

Sourav Das ◽

Anup Kumar Kolya

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Large Scale ◽

Deep Convolutional Neural Network ◽

Predictive Analysis


Programmable phase-change metasurfaces on waveguides for multimode photonic convolutional neural network

Nature Communications ◽

10.1038/s41467-020-20365-z ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Changming Wu ◽

Heshan Yu ◽

Seokhyeong Lee ◽

Ruoming Peng ◽

Ichiro Takeuchi ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Phase Change ◽

Convolutional Neural Network ◽

Large Scale ◽

Phase Change Materials ◽

Refractive Index Change ◽

Optical Computing ◽

Machine Learning Algorithms ◽

Matrix Vector Multiplication

Neuromorphic photonics has recently emerged as a promising hardware accelerator, with significant potential speed and energy advantages over digital electronics for machine learning algorithms, such as neural networks of various types. Integrated photonic networks are particularly powerful in performing analog computing of matrix-vector multiplication (MVM) as they afford unparalleled speed and bandwidth density for data transmission. Incorporating nonvolatile phase-change materials in integrated photonic devices enables indispensable programming and in-memory computing capabilities for on-chip optical computing. Here, we demonstrate a multimode photonic computing core consisting of an array of programmable mode converters based on on-waveguide metasurfaces made of phase-change materials. The programmable converters utilize the refractive index change of the phase-change material Ge2Sb2Te5 during phase transition to control the waveguide spatial modes with a very high precision of up to 64 levels in modal contrast. This contrast is used to represent the matrix elements, with 6-bit resolution and both positive and negative values, to perform MVM computation in neural network algorithms. We demonstrate a prototypical optical convolutional neural network that can perform image processing and recognition tasks with high accuracy. With a broad operation bandwidth and a compact device footprint, the demonstrated multimode photonic core is promising toward large-scale photonic neural networks with ultrahigh computation throughputs.
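The link between "64 levels" and "6-bit resolution" in the abstract above is simply 2^6 = 64. The sketch below illustrates what a matrix-vector multiplication with 6-bit signed weights looks like numerically. It models only the arithmetic, not the photonic device: the symmetric weight range [-1, 1], the uniform spacing of levels, and the function names are assumptions made for illustration.

```python
def quantize(weight, levels=64, w_max=1.0):
    """Snap a weight to one of `levels` evenly spaced values in [-w_max, w_max]."""
    clipped = max(-w_max, min(w_max, weight))
    step = 2 * w_max / (levels - 1)
    index = round((clipped + w_max) / step)  # integer level index, 0..levels-1
    return -w_max + index * 2 * w_max / (levels - 1)


def mvm(matrix, vector):
    """Plain matrix-vector multiplication."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def quantized_mvm(matrix, vector, levels=64):
    """MVM in which every matrix element is first quantized to `levels` values."""
    return mvm([[quantize(w, levels) for w in row] for row in matrix], vector)
```

Under these assumptions each quantized weight is within half a step (1/63) of its original value, which bounds the error each matrix element contributes to the product.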


A Study of Spatial-Spectral Feature Extraction frameworks with 3D Convolutional Neural Network for Robust Hyperspectral Imagery Classification

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ◽

10.1109/jstars.2020.3046414 ◽

2020 ◽

pp. 1-1

Author(s):

Bishwas Praveen ◽

Vineetha Menon

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Convolutional Neural Network ◽

Hyperspectral Imagery ◽

Spectral Feature


AUTOMATED SCREENING OF DIABETIC RETINOPATHY WITH OPTIMIZED DEEP CONVOLUTIONAL NEURAL NETWORK: ENHANCED MOTH FLAME MODEL

Journal of Mechanics in Medicine and Biology ◽

10.1142/s0219519421500056 ◽

2021 ◽

Vol 21 (01) ◽

pp. 2150005

Author(s):

ARUN T NAIR ◽

K. MUTHUVEL

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Blood Vessel ◽

Convolutional Neural Network ◽

Optimization Algorithm ◽

Vessel Segmentation ◽

Deep Convolutional Neural Network ◽

Gray Level ◽

Pass Filter ◽

Blood Vessel Segmentation

Retinal image analysis is currently one of the challenging areas of study. Numerous retinal diseases can be recognized by analyzing the variations taking place in the retina. However, the main disadvantage of existing studies is that they do not achieve high recognition accuracy. The proposed framework includes four phases: (i) Blood Vessel Segmentation, (ii) Feature Extraction, (iii) Optimal Feature Selection, and (iv) Classification. Initially, the input fundus image is subjected to blood vessel segmentation, from which two binary thresholded images (one from a High Pass Filter (HPF) and the other from top-hat reconstruction) are acquired. These two images are compared: the areas common to both are taken to be the major vessels, and the leftover regions are fused to form the vessel sub-image. These vessel sub-images are classified with a Gaussian Mixture Model (GMM) classifier, and the result is combined with the major vessels to form the segmented blood vessels. The segmented images are subjected to feature extraction, where features such as the proposed Local Binary Pattern (LBP), Gray-Level Co-Occurrence Matrix (GLCM), and Gray Level Run Length Matrix (GLRM) are extracted. Because the curse of dimensionality is the greatest issue, it is important to select the appropriate features from the extracted ones for classification. In this paper, a new improved optimization algorithm, Moth Flame with New Distance Formulation (MF-NDF), is introduced for selecting the optimal features. Finally, the selected optimal features are passed to a Deep Convolutional Neural Network (DCNN) model for classification. Further, in order to make a precise diagnosis, the weights of the DCNN are optimally tuned by the same optimization algorithm. The performance of the proposed algorithm is compared against conventional algorithms in terms of positive and negative measures.
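As background for the Local Binary Pattern (LBP) feature named in the abstract above, here is a minimal sketch of the basic 8-neighbor LBP code for one 3×3 patch. The abstract refers to a "proposed" LBP variant whose details are not given, so this is the textbook version only; the clockwise neighbor order starting at the top-left pixel and the function name `lbp_code` are assumptions.

```python
def lbp_code(patch):
    """Basic LBP: one bit per neighbor, set when the neighbor >= center pixel.

    patch: 3x3 list of lists of grey values.
    """
    center = patch[1][1]
    # Neighbors read clockwise, starting from the top-left pixel.
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code
```

A histogram of these codes over all pixels of a segmented image is the usual way to turn LBP into a fixed-length texture feature vector.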
