Sound Can Help Us See More Clearly

Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 599
Author(s):  
Yongsheng Li ◽  
Tengfei Tu ◽  
Hua Zhang ◽  
Jishuai Li ◽  
Zhengping Jin ◽  
...  

In the field of video action classification, existing network frameworks often use only video frames as input. When the object involved in an action does not appear in a prominent position in the video frame, the network cannot classify it accurately. We introduce a new neural network structure that uses sound to assist in such tasks. The original sound wave is converted into a sound texture that serves as the network's input. Furthermore, to exploit the rich modal information (images and sound) in video, we designed and used a two-stream framework. In this work, we assume that sound data can be used to solve action recognition tasks. To demonstrate this, we designed a neural network based on sound texture to perform video action classification. We then fuse this network with a deep neural network that uses consecutive video frames, constructing a two-stream network called A-IN. Finally, on the Kinetics dataset, we compare the proposed A-IN with an image-only network. The experimental results show that the recognition accuracy of the two-stream neural network model that uses sound features is 7.6% higher than that of the network using video frames alone. This proves that rational use of the rich information in video can improve classification performance.
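A minimal sketch of the two-stream idea follows: one CNN over a sound-texture map, one over stacked video frames, fused by averaging class logits. All layer sizes, class counts, and module names here are illustrative assumptions, not the paper's A-IN configuration, and the sound-texture extraction itself is not shown.

```python
# Hedged two-stream sketch in the spirit of A-IN (hypothetical layer sizes).
import torch
import torch.nn as nn

class SoundStream(nn.Module):
    """CNN over a sound-texture map, treated as a 1-channel 2-D input."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class FrameStream(nn.Module):
    """CNN over T consecutive RGB frames concatenated on the channel axis."""
    def __init__(self, num_classes, t_frames=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 * t_frames, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class TwoStream(nn.Module):
    """Late fusion: average the per-stream class logits."""
    def __init__(self, num_classes=400):
        super().__init__()
        self.audio = SoundStream(num_classes)
        self.video = FrameStream(num_classes)

    def forward(self, sound_texture, frames):
        return 0.5 * (self.audio(sound_texture) + self.video(frames))

model = TwoStream(num_classes=400)  # Kinetics-400 has 400 classes
logits = model(torch.randn(2, 1, 128, 128), torch.randn(2, 24, 112, 112))
print(logits.shape)  # torch.Size([2, 400])
```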

2018 ◽  
Vol 8 (12) ◽  
pp. 2529 ◽  
Author(s):  
Xiaoqing Wang ◽  
Xiangjun Wang

When large-scale annotated data are not available for certain image classification tasks, training a deep convolutional neural network model becomes challenging. Some recent domain adaptation methods try to solve this problem using generative adversarial networks and have achieved promising results. However, these methods rest on a shared-latent-space assumption and do not consider the situation in which shared high-level representations across domains do not exist or are not as ideal as assumed. To overcome this limitation, we propose a neural network structure called coupled generative adversarial autoencoders (CGAA) that allows a pair of generators to learn the high-level differences between two domains by sharing only part of the high-level layers. Additionally, by introducing into the generator optimization a class-consistency loss computed by a stand-alone classifier, our model is able to generate class-invariant style-transferred images suitable for classification tasks in domain adaptation. We apply CGAA to several domain-transferred image classification scenarios, including several benchmark datasets. Experimental results show that our method achieves state-of-the-art classification results.
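The sketch below illustrates only the partial weight sharing: each domain keeps private low-level layers while a shared module sits at the top of the encoder. All shapes are hypothetical, and the adversarial discriminators and the class-consistency classifier are deliberately omitted.

```python
# Minimal partial-sharing sketch inspired by CGAA (losses other than
# reconstruction are omitted; shapes are placeholders).
import torch
import torch.nn as nn

shared_top = nn.Sequential(          # high-level layers shared by both domains
    nn.Linear(256, 128), nn.ReLU(),
)

def make_encoder():                  # private low-level layers per domain
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())

def make_decoder():                  # private decoder/generator per domain
    return nn.Sequential(nn.Linear(128, 28 * 28), nn.Tanh())

enc_a, enc_b = make_encoder(), make_encoder()
dec_a, dec_b = make_decoder(), make_decoder()

x_a = torch.randn(4, 1, 28, 28)      # a batch from domain A
z_a = shared_top(enc_a(x_a))         # shared high-level code
x_ab = dec_b(z_a)                    # style transfer: domain A -> domain B
recon = nn.functional.mse_loss(dec_a(z_a), x_a.flatten(1))  # reconstruction term
```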


2018 ◽  
Vol 8 (12) ◽  
pp. 2472 ◽  
Author(s):  
Fuji Ren ◽  
Jiawen Deng

As a foundational and typical task in natural language processing, text classification has been widely applied in many fields. However, most existing corpora underlying text classification are imbalanced, which often biases the classifier toward the categories with more texts. In this paper, we propose a background-knowledge-based multi-stream neural network to compensate for the imbalance or insufficient information caused by the limitations of the training corpus. The multi-stream network mainly consists of a basal stream, which retains the original sequence information, and background-knowledge-based streams. The background knowledge is composed of keywords and co-occurring words extracted from an external corpus. The background-knowledge-based streams supply supplemental information and reinforce the basal stream. To better fuse the features extracted from the different streams, an early-fusion strategy and two after-fusion strategies are employed. Results on both a Chinese corpus and an English corpus demonstrate that the proposed background-knowledge-based multi-stream neural network performs well on classification tasks.
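As a rough sketch of the stream structure, the following model runs an LSTM basal stream over the token sequence and pools embeddings of externally extracted keywords as one knowledge stream, fusing the two by concatenation before classification. Vocabulary size, dimensions, the single-knowledge-stream setup, and the fusion point are assumptions for illustration, not the paper's configuration.

```python
# Basal stream + one background-knowledge stream with simple fusion
# (keyword extraction from the external corpus is not shown).
import torch
import torch.nn as nn

class MultiStreamClassifier(nn.Module):
    def __init__(self, vocab=10000, emb=128, hidden=128, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.basal = nn.LSTM(emb, hidden, batch_first=True)  # keeps word order
        self.classify = nn.Linear(hidden + emb, num_classes)

    def forward(self, text_ids, keyword_ids):
        _, (h, _) = self.basal(self.embed(text_ids))   # basal stream
        kw = self.embed(keyword_ids).mean(dim=1)       # knowledge stream: pooled keywords
        fused = torch.cat([h[-1], kw], dim=-1)         # fuse features by concatenation
        return self.classify(fused)

model = MultiStreamClassifier()
out = model(torch.randint(0, 10000, (2, 50)),   # a padded token sequence
            torch.randint(0, 10000, (2, 8)))    # extracted keyword ids
print(out.shape)  # torch.Size([2, 5])
```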


Author(s):  
Wei Jia ◽  
Li Li ◽  
Zhu Li ◽  
Xiang Zhang ◽  
Shan Liu

The block-based coding structure in the hybrid video coding framework inevitably introduces compression artifacts such as blocking and ringing. To compensate for these artifacts, extensive in-loop filtering techniques have been proposed for video codecs, capable of boosting both the subjective and objective quality of reconstructed videos. Recently, neural network-based filters have been presented, drawing on the power of deep learning over large volumes of data. Though coding efficiency has improved over traditional methods in High Efficiency Video Coding (HEVC), the rich features and information generated by the compression pipeline have not been fully utilized in the design of these neural networks. Therefore, in this article, we propose the Residual-Reconstruction-based Convolutional Neural Network (RRNet) to further improve coding efficiency, where compression features derived from the bitstream, in the form of prediction residuals, are fed into the network as an additional input alongside the reconstructed frame. In essence, the residual signal can provide valuable information about block partitions and can aid reconstruction of edge and texture regions in a picture; more adaptive parameters can thus be trained to handle different texture characteristics. The experimental results show that our proposed RRNet approach yields significant BD-rate savings compared to HEVC and state-of-the-art CNN-based schemes, indicating that the residual signal plays a significant role in enhancing video frame reconstruction.
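The core input arrangement can be sketched in a few lines: the residual map enters as an extra channel next to the reconstructed frame, and the network predicts only a correction to the reconstruction. The three-layer body below is a placeholder, far simpler than the real RRNet topology.

```python
# Simplified sketch of residual-aware in-loop filtering (single luma channel).
import torch
import torch.nn as nn

class ResidualAwareFilter(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 64, 3, padding=1), nn.ReLU(),   # reconstruction + residual
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, recon, residual):
        x = torch.cat([recon, residual], dim=1)          # residual as extra input
        return recon + self.body(x)                      # learn a correction only

f = ResidualAwareFilter()
out = f(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```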


2020 ◽  
Vol 8 (6) ◽  
pp. 5069-5073

Deep learning has been used to solve complex problems in various domains. As it advances, it also enables applications that pose a major threat to our privacy, security, and even our democracy. One such recently developed application is the "deepfake". Deepfake models can create fake images and videos that humans cannot distinguish from genuine ones. A counter-application that automatically detects and analyzes digital visual media is therefore necessary in today's world. This paper details retraining image classification models to capture the features of each deepfake video frame. Different sets of deepfake video frames are fed through a pretrained network, and a bottleneck-layer output is computed for every frame; this layer contains condensed data for each image and exposes the artificial manipulations in deepfake videos. When checking deepfake videos, this technique achieved more than 87% accuracy. The technique has been tested on the FaceForensics dataset and obtained good detection accuracy.
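A hedged sketch of the bottleneck-feature pipeline: a pretrained backbone (ResNet-18 here stands in for whichever model the paper retrained) yields a condensed feature per frame, a small head scores each frame, and the per-frame scores are averaged over the video. The binary head and the averaging rule are assumptions for illustration.

```python
# Per-frame bottleneck features from a pretrained CNN for deepfake detection.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()          # keep the 512-d bottleneck, drop the head
backbone.eval()

head = nn.Linear(512, 2)             # real vs. fake, trained on extracted features

frames = torch.rand(16, 3, 224, 224) # a batch of frames from one video
with torch.no_grad():
    feats = backbone(frames)         # condensed per-frame representation
logits = head(feats)                 # per-frame real/fake scores
video_score = logits.softmax(dim=1)[:, 1].mean()  # aggregate over frames
```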


Author(s):  
Deepak S. Dharrao ◽  
Nilesh J. Uke

Face recognition from low-quality videos is one of the major challenges in video surveillance systems. Several works have contributed towards face recognition, but they suffer from the fact that in low-quality videos the face region has low resolution; traditional feature extraction schemes also make the recognition process difficult. This paper introduces a novel feature descriptor and classification scheme for recognizing faces in low-quality videos. Here, the video frames are sequentially provided to the Viola–Jones algorithm for detecting the face region, and the quality of the face region is improved by applying a bicubic-interpolation-based super-resolution scheme. The features of the enhanced video frame are then extracted using the proposed local direction feature descriptor, namely the scattering wavelet-based local directional pattern (SW-LDP). The extracted features are fed as input to an actor-critic neural network, which is trained using the newly developed fractional-calculus-based krill–lion (fractional KL) algorithm. The proposed fractional KL-ACNN algorithm is evaluated on the standard FAMED database. The analysis shows that the proposed classifier achieved a low FAR of 3.89%, a low FRR of 4.04%, and a high accuracy of 95%.
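The detection and enhancement front end can be reproduced with standard OpenCV calls, as sketched below; the file name and the 4x scale factor are arbitrary assumptions, and the SW-LDP descriptor and fractional KL training are not shown.

```python
# Viola-Jones face detection followed by bicubic upscaling (OpenCV).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.png")                      # one low-quality video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face = frame[y:y + h, x:x + w]
    # 4x bicubic interpolation as a simple super-resolution step
    enhanced = cv2.resize(face, (w * 4, h * 4), interpolation=cv2.INTER_CUBIC)
    cv2.imwrite("face_sr.png", enhanced)             # hand off to SW-LDP features
```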


2020 ◽  
Vol 2020 (17) ◽  
pp. 2-1-2-6
Author(s):  
Shih-Wei Sun ◽  
Ting-Chen Mou ◽  
Pao-Chi Chang

To improve workout efficiency and to provide body-movement suggestions to users in a "smart gym" environment, we propose using a depth camera to capture a user's body parts and mounting multiple inertial sensors on those body parts, generating deadlift behavior models with a recurrent neural network structure. The contribution of this paper is threefold: 1) the multimodal sensing signals obtained from multiple devices are fused to generate the deadlift behavior classifiers, 2) the recurrent neural network structure can analyze the information from the synchronized skeletal and inertial sensing data, and 3) a Vaplab dataset is generated for evaluating the deadlift behavior recognition capability of the proposed method.
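A minimal recurrent sketch of the fusion idea: synchronized skeletal and inertial feature vectors are concatenated per timestep and fed to an LSTM that classifies the repetition. The feature sizes, sensor counts, and class count below are placeholders, not the paper's configuration.

```python
# Fusing synchronized skeletal and IMU sequences with an LSTM classifier.
import torch
import torch.nn as nn

class DeadliftRNN(nn.Module):
    def __init__(self, skel_dim=75, imu_dim=36, hidden=128, num_classes=5):
        super().__init__()
        self.rnn = nn.LSTM(skel_dim + imu_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, skel_seq, imu_seq):
        x = torch.cat([skel_seq, imu_seq], dim=-1)  # per-timestep fusion
        _, (h, _) = self.rnn(x)
        return self.fc(h[-1])                       # classify the whole repetition

model = DeadliftRNN()
skel = torch.randn(2, 120, 75)   # 120 timesteps of 25 joints x 3 coordinates
imu = torch.randn(2, 120, 36)    # e.g. 6 IMUs x 6 channels each
print(model(skel, imu).shape)    # torch.Size([2, 5])
```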


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1065
Author(s):  
Moshe Bensimon ◽  
Shlomo Greenberg ◽  
Moshe Haiut

This work presents a new approach to sound preprocessing and classification based on a spiking neural network. The proposed approach is biologically inspired, using spiking neurons and a Spike-Timing-Dependent Plasticity (STDP)-based learning rule. We propose a biologically plausible sound classification framework that uses a Spiking Neural Network (SNN) to detect the embedded frequencies contained within an acoustic signal. This work also demonstrates an efficient hardware implementation of the SNN based on the low-power Spike Continuous Time Neuron (SCTN). The proposed sound classification framework suggests direct Pulse Density Modulation (PDM) interfacing of the acoustic sensor with the SCTN-based network, avoiding the use of costly digital-to-analog conversions. This paper presents a new connectivity approach applied to Spiking Neuron (SN)-based neural networks. We suggest considering the SCTN as a basic building block in the design of programmable analog electronic circuits. Usually, a neuron is used as a repeated modular element in a neural network structure, and the connectivity between neurons located at different layers is well defined, generating a modular neural network composed of several layers with full or partial connectivity. The proposed approach instead controls the behavior of the spiking neurons and applies smart connectivity to enable the design of simple analog circuits based on an SNN. Unlike existing NN-based solutions, in which the preprocessing phase is carried out using analog circuits and analog-to-digital conversion, we suggest integrating the preprocessing phase into the network. This allows treating the basic SCTN as an analog module, enabling the design of simple analog circuits based on an SNN with unique interconnections between the neurons. The efficiency of the proposed approach is demonstrated by implementing SCTN-based resonators for sound feature extraction and classification. The proposed SCTN-based sound classification approach achieves a classification accuracy of 98.73% on the Real World Computing Partnership (RWCP) database.
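For readers unfamiliar with STDP, the sketch below shows the textbook pair-based update rule from the family the paper builds on: a synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened otherwise, with exponentially decaying magnitude. The constants are conventional illustrative values; the SCTN-specific dynamics are not reproduced here.

```python
# Pair-based STDP weight update (illustrative constants, not the paper's).
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Potentiate when pre fires before post, depress otherwise;
    magnitude decays exponentially with the spike-time gap (ms)."""
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau)    # causal pair -> strengthen
    else:
        w -= a_minus * np.exp(dt / tau)    # anti-causal pair -> weaken
    return float(np.clip(w, 0.0, 1.0))     # keep the weight bounded

w = 0.5
w = stdp_update(w, t_pre=10.0, t_post=15.0)  # causal pair: weight grows
print(w)
```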


2020 ◽  
Vol 34 (07) ◽  
pp. 10607-10614 ◽  
Author(s):  
Xianhang Cheng ◽  
Zhenzhong Chen

Learning to synthesize non-existing frames from the original consecutive video frames is a challenging task. Recent kernel-based interpolation methods predict pixels with a single convolution step, replacing the dependency on optical flow. However, when scene motion is larger than the pre-defined kernel size, these methods yield poor results even though they take thousands of neighboring pixels into account. To solve this problem, in this paper we propose deformable separable convolution (DSepConv) to adaptively estimate kernels, offsets, and masks, allowing the network to gather information from far fewer but more relevant pixels. In addition, we show that kernel-based methods and conventional flow-based methods are specific instances of the proposed DSepConv. Experimental results demonstrate that our method significantly outperforms other kernel-based interpolation methods and performs on par with or even better than the state-of-the-art algorithms, both qualitatively and quantitatively.
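To make the kernel-based baseline concrete, the sketch below has a tiny CNN predict per-pixel separable (vertical and horizontal 1-D) kernels and apply them to one input frame; the offsets and masks that make DSepConv deformable are omitted, and all layer sizes are assumptions.

```python
# Stripped-down adaptive separable convolution for frame interpolation
# (the deformable offsets/masks of DSepConv are not included).
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 5  # per-pixel kernel size

class SeparableKernelNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU())  # two RGB frames stacked
        self.kv = nn.Conv2d(32, K, 3, padding=1)        # vertical 1-D kernels
        self.kh = nn.Conv2d(32, K, 3, padding=1)        # horizontal 1-D kernels

    def forward(self, f0, f1):
        feat = self.trunk(torch.cat([f0, f1], dim=1))
        kv = self.kv(feat).softmax(dim=1)               # (B, K, H, W)
        kh = self.kh(feat).softmax(dim=1)
        # gather a KxK neighborhood around every pixel of frame 0
        patches = F.unfold(f0, K, padding=K // 2)       # (B, 3*K*K, H*W)
        b, _, h, w = f0.shape
        patches = patches.view(b, 3, K, K, h, w)
        # apply the outer product of the two 1-D kernels at each pixel
        return torch.einsum('bcijhw,bihw,bjhw->bchw', patches, kv, kh)

net = SeparableKernelNet()
mid = net(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(mid.shape)  # torch.Size([1, 3, 64, 64])
```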


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Jian-ye Yuan ◽  
Xin-yuan Nan ◽  
Cheng-rong Li ◽  
Le-le Sun

Considering that garbage classification is urgent, a 23-layer convolutional neural network (CNN) model is designed in this paper, with an emphasis on real-time garbage classification, to address the low accuracy of garbage classification and recycling and the difficulty of manual recycling. Firstly, depthwise separable convolution was used to reduce the Params of the model. Then, an attention mechanism was used to improve the accuracy of the garbage classification model. Finally, model fine-tuning was used to further improve its performance. Besides, we compared the model with classic image classification models, including AlexNet, VGG16, and ResNet18, and with lightweight classification models, including MobileNetV2 and ShuffleNetV2, and found that the model, GAF_dense, has a higher accuracy rate and fewer Params and FLOPs. To further check the performance of the model, we tested it on the CIFAR-10 dataset and found that the accuracy rates of GAF_dense are 0.018 and 0.03 higher than those of ResNet18 and ShuffleNetV2, respectively. On the ImageNet dataset, the accuracy rates of GAF_dense are 0.225 and 0.146 higher than those of ResNet18 and ShuffleNetV2, respectively. Therefore, the garbage classification model proposed in this paper is suitable for garbage classification and other classification tasks that protect the ecological environment, and can be applied to tasks in areas such as environmental science, children's education, and environmental protection.
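The two ingredients named above combine naturally in one block, sketched below: a depthwise separable convolution (which cuts Params and FLOPs versus a standard convolution) followed by squeeze-and-excitation style channel attention. The block sizes are illustrative assumptions, not GAF_dense's actual layers.

```python
# Depthwise separable convolution + channel attention in one block.
import torch
import torch.nn as nn

class DWSepSEBlock(nn.Module):
    def __init__(self, c_in, c_out, reduction=4):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)
        self.se = nn.Sequential(                      # squeeze-and-excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out // reduction, 1), nn.ReLU(),
            nn.Conv2d(c_out // reduction, c_out, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        x = torch.relu(self.pointwise(self.depthwise(x)))
        return x * self.se(x)                         # reweight the channels

block = DWSepSEBlock(32, 64)
print(block(torch.rand(1, 32, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```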

