scholarly journals A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3094
Author(s):  
Hanqing Chen ◽  
Chunyan Hu ◽  
Feifei Lee ◽  
Chaowei Lin ◽  
Wei Yao ◽  
...  

Recently, with the popularization of camera tools such as mobile phones and the rise of various short video platforms, a lot of videos are being uploaded to the Internet at all times, for which a video retrieval system with fast retrieval speed and high precision is very necessary. Therefore, content-based video retrieval (CBVR) has aroused the interest of many researchers. A typical CBVR system mainly contains the following two essential parts: video feature extraction and similarity comparison. Feature extraction of video is very challenging, previous video retrieval methods are mostly based on extracting features from single video frames, while resulting the loss of temporal information in the videos. Hashing methods are extensively used in multimedia information retrieval due to its retrieval efficiency, but most of them are currently only applied to image retrieval. In order to solve these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH), which employs a 3D convolutional neural network (CNN) to obtain spatial-temporal features of videos, then train a set of hash functions by supervised hashing to transfer the video features into binary space and get the compact binary codes of videos. Finally, we use triplet loss for network training. We conduct a lot of experiments on three public video datasets UCF-101, JHMDB and HMDB-51, and the results show that the proposed method has advantages over many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP value of UCF-101 dataset is improved by 9.3%, and the minimum improvement on JHMDB dataset is also increased by 0.3%. At the same time, we also demonstrate the stability of the algorithm in the HMDB-51 dataset.

Content-Based Image Retrieval (CBIR) is extensively used technique for image retrieval from large image databases. However, users are not satisfied with the conventional image retrieval techniques. In addition, the advent of web development and transmission networks, the number of images available to users continues to increase. Therefore, a permanent and considerable digital image production in many areas takes place. Quick access to the similar images of a given query image from this extensive collection of images pose great challenges and require proficient techniques. From query by image to retrieval of relevant images, CBIR has key phases such as feature extraction, similarity measurement, and retrieval of relevant images. However, extracting the features of the images is one of the important steps. Recently Convolutional Neural Network (CNN) shows good results in the field of computer vision due to the ability of feature extraction from the images. Alex Net is a classical Deep CNN for image feature extraction. We have modified the Alex Net Architecture with a few changes and proposed a novel framework to improve its ability for feature extraction and for similarity measurement. The proposal approach optimizes Alex Net in the aspect of pooling layer. In particular, average pooling is replaced by max-avg pooling and the non-linear activation function Maxout is used after every Convolution layer for better feature extraction. This paper introduces CNN for features extraction from images in CBIR system and also presents Euclidean distance along with the Comprehensive Values for better results. The proposed framework goes beyond image retrieval, including the large-scale database. The performance of the proposed work is evaluated using precision. The proposed work show better results than existing works.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2846
Author(s):  
Anik Sen ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Recognizing the sport of cricket on the basis of different batting shots can be a significant part of context-based advertisement to users watching cricket, generating sensor-based commentary systems and coaching assistants. Due to the similarity between different batting shots, manual feature extraction from video frames is tedious. This paper proposes a hybrid deep-neural-network architecture for classifying 10 different cricket batting shots from offline videos. We composed a novel dataset, CricShot10, comprising uneven lengths of batting shots and unpredictable illumination conditions. Impelled by the enormous success of deep-learning models, we utilized a convolutional neural network (CNN) for automatic feature extraction, and a gated recurrent unit (GRU) to deal with long temporal dependency. Initially, conventional CNN and dilated CNN-based architectures were developed. Following that, different transfer-learning models were investigated—namely, VGG16, InceptionV3, Xception, and DenseNet169—which freeze all the layers. Experiment results demonstrated that the VGG16–GRU model outperformed the other models by attaining 86% accuracy. We further explored VGG16 and two models were developed, one by freezing all but the final 4 VGG16 layers, and another by freezing all but the final 8 VGG16 layers. On our CricShot10 dataset, these two models were 93% accurate. These results verify the effectiveness of our proposed architecture compared with other methods in terms of accuracy.


2020 ◽  
Vol 2020 (10) ◽  
pp. 181-1-181-7
Author(s):  
Takahiro Kudo ◽  
Takanori Fujisawa ◽  
Takuro Yamaguchi ◽  
Masaaki Ikehara

Image deconvolution has been an important issue recently. It has two kinds of approaches: non-blind and blind. Non-blind deconvolution is a classic problem of image deblurring, which assumes that the PSF is known and does not change universally in space. Recently, Convolutional Neural Network (CNN) has been used for non-blind deconvolution. Though CNNs can deal with complex changes for unknown images, some CNN-based conventional methods can only handle small PSFs and does not consider the use of large PSFs in the real world. In this paper we propose a non-blind deconvolution framework based on a CNN that can remove large scale ringing in a deblurred image. Our method has three key points. The first is that our network architecture is able to preserve both large and small features in the image. The second is that the training dataset is created to preserve the details. The third is that we extend the images to minimize the effects of large ringing on the image borders. In our experiments, we used three kinds of large PSFs and were able to observe high-precision results from our method both quantitatively and qualitatively.


2021 ◽  
pp. 1-10
Author(s):  
Chien-Cheng Leea ◽  
Zhongjian Gao ◽  
Xiu-Chi Huanga

This paper proposes a Wi-Fi-based indoor human detection system using a deep convolutional neural network. The system detects different human states in various situations, including different environments and propagation paths. The main improvements proposed by the system is that there is no cameras overhead and no sensors are mounted. This system captures useful amplitude information from the channel state information and converts this information into an image-like two-dimensional matrix. Next, the two-dimensional matrix is used as an input to a deep convolutional neural network (CNN) to distinguish human states. In this work, a deep residual network (ResNet) architecture is used to perform human state classification with hierarchical topological feature extraction. Several combinations of datasets for different environments and propagation paths are used in this study. ResNet’s powerful inference simplifies feature extraction and improves the accuracy of human state classification. The experimental results show that the fine-tuned ResNet-18 model has good performance in indoor human detection, including people not present, people still, and people moving. Compared with traditional machine learning using handcrafted features, this method is simple and effective.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu ◽  
Jalluri Gnana SivaSai ◽  
Muhammad Fazal Ijaz ◽  
Akash Kumar Bhoi ◽  
Wonjoon Kim ◽  
...  

Deep learning models are efficient in learning the features that assist in understanding complex patterns precisely. This study proposed a computerized process of classifying skin disease through deep learning based MobileNet V2 and Long Short Term Memory (LSTM). The MobileNet V2 model proved to be efficient with a better accuracy that can work on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by Visual Geometry Group (VGG), and convolutional neural network architecture that expanded with few changes. The HAM10000 dataset is used and the proposed method has outperformed other methods with more than 85% accuracy. Its robustness in recognizing the affected region much faster with almost 2× lesser computations than the conventional MobileNet model results in minimal computational efforts. Furthermore, a mobile application is designed for instant and proper action. It helps the patient and dermatologists identify the type of disease from the affected region’s image at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Changming Wu ◽  
Heshan Yu ◽  
Seokhyeong Lee ◽  
Ruoming Peng ◽  
Ichiro Takeuchi ◽  
...  

AbstractNeuromorphic photonics has recently emerged as a promising hardware accelerator, with significant potential speed and energy advantages over digital electronics for machine learning algorithms, such as neural networks of various types. Integrated photonic networks are particularly powerful in performing analog computing of matrix-vector multiplication (MVM) as they afford unparalleled speed and bandwidth density for data transmission. Incorporating nonvolatile phase-change materials in integrated photonic devices enables indispensable programming and in-memory computing capabilities for on-chip optical computing. Here, we demonstrate a multimode photonic computing core consisting of an array of programable mode converters based on on-waveguide metasurfaces made of phase-change materials. The programmable converters utilize the refractive index change of the phase-change material Ge2Sb2Te5 during phase transition to control the waveguide spatial modes with a very high precision of up to 64 levels in modal contrast. This contrast is used to represent the matrix elements, with 6-bit resolution and both positive and negative values, to perform MVM computation in neural network algorithms. We demonstrate a prototypical optical convolutional neural network that can perform image processing and recognition tasks with high accuracy. With a broad operation bandwidth and a compact device footprint, the demonstrated multimode photonic core is promising toward large-scale photonic neural networks with ultrahigh computation throughputs.


2021 ◽  
Vol 21 (01) ◽  
pp. 2150005
Author(s):  
ARUN T NAIR ◽  
K. MUTHUVEL

Nowadays, analysis on retinal image exists as one of the challenging area for study. Numerous retinal diseases could be recognized by analyzing the variations taking place in retina. However, the main disadvantage among those studies is that, they do not have higher recognition accuracy. The proposed framework includes four phases namely, (i) Blood Vessel Segmentation (ii) Feature Extraction (iii) Optimal Feature Selection and (iv) Classification. Initially, the input fundus image is subjected to blood vessel segmentation from which two binary thresholded images (one from High Pass Filter (HPF) and other from top-hat reconstruction) are acquired. These two images are differentiated and the areas that are common to both are said to be the major vessels and the left over regions are fused to form vessel sub-image. These vessel sub-images are classified with Gaussian Mixture Model (GMM) classifier and the resultant is summed up with the major vessels to form the segmented blood vessels. The segmented images are subjected to feature extraction process, where the features like proposed Local Binary Pattern (LBP), Gray-Level Co-Occurrence Matrix (GLCM) and Gray Level Run Length Matrix (GLRM) are extracted. As the curse of dimensionality seems to be the greatest issue, it is important to select the appropriate features from the extracted one for classification. In this paper, a new improved optimization algorithm Moth Flame with New Distance Formulation (MF-NDF) is introduced for selecting the optimal features. Finally, the selected optimal features are subjected to Deep Convolutional Neural Network (DCNN) model for classification. Further, in order to make the precise diagnosis, the weights of DCNN are optimally tuned by the same optimization algorithm. The performance of the proposed algorithm will be compared against the conventional algorithms in terms of positive and negative measures.


Sign in / Sign up

Export Citation Format

Share Document