video frame
Recently Published Documents

TOTAL DOCUMENTS: 631 (five years: 304)
H-INDEX: 21 (five years: 7)

2022 · Vol 34 (2) · pp. 1-18
Author(s): Lele Qin, Guojuan Zhang, Li You

Video command and dispatch systems have become essential communication safeguards in emergency rescue and epidemic prevention and control scenarios, where data security is especially important. Building on the requirements of voice and video dispatch, this paper proposes an end-to-end encryption method for multimedia information that introduces a multiple-protection mechanism combining selective encryption and selective integrity protection. The method defines a network access authentication and service encryption workflow that embeds startup authentication and key distribution into the control signaling procedure. It constructs a key pool from the three-dimensional Lorenz system, the four-dimensional cellular neural network (CNN) system, and the four-dimensional Chen system, where the key source system and its initial conditions are determined by the plaintext video frame itself. The chaotic sequences are then optimized to further enhance system security.
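As a rough, non-authoritative illustration of the key-pool idea: the paper draws on three chaotic systems with frame-dependent initial conditions, while the sketch below uses only the Lorenz system, and the SHA-256 seeding, parameter values, and quantization step are assumptions of this sketch rather than the authors' scheme.

```python
# Illustrative sketch only: derive a chaotic key stream whose initial
# conditions depend on the plaintext video frame, loosely following the
# key-pool idea (Lorenz system as one of the chaotic sources).
import hashlib
import numpy as np

def lorenz_keystream(frame_bytes: bytes, n_bytes: int,
                     sigma=10.0, rho=28.0, beta=8.0 / 3.0, dt=0.001):
    # Seed the initial state from a hash of the plaintext frame
    # (an assumed stand-in for the paper's frame-dependent seeding).
    digest = hashlib.sha256(frame_bytes).digest()
    x, y, z = (int.from_bytes(digest[i:i + 8], "big") / 2**64 + 0.1
               for i in (0, 8, 16))
    stream = bytearray()
    # Discard a transient, then quantize the trajectory into key bytes.
    for step in range(1000 + n_bytes):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        if step >= 1000:
            stream.append(int(abs(x * 1e6)) % 256)
    return bytes(stream)

key = lorenz_keystream(b"\x00" * 1024, 16)  # toy 1 KB "frame"
```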


2022 · Vol 18 (1) · pp. 1-27
Author(s): Ran Xu, Rakesh Kumar, Pengcheng Wang, Peter Bai, Ganga Meghanath, ...

Because videos take a long time to transport over the network, running analytics on live video directly on embedded or mobile devices has become an important system driver. Such devices, e.g., surveillance cameras or AR/VR gadgets, are resource constrained, and although there has been significant work on lightweight deep neural networks (DNNs) for such clients, none of them can adapt to changing runtime conditions, e.g., changes in resource availability on the device, in content characteristics, or in user requirements. In this article, we introduce ApproxNet, a video object classification system for embedded or mobile clients. It uses novel dynamic approximation techniques to achieve the desired inference latency and accuracy trade-off under changing runtime conditions. It does so by exposing two approximation knobs within a single DNN model rather than creating and maintaining an ensemble of models, as in MCDNN [MobiSys-16]. We show that ApproxNet adapts seamlessly to these changes at runtime, provides low and stable latency for image and video frame classification, and improves on the accuracy and latency of ResNet [CVPR-16], MCDNN [MobiSys-16], MobileNets [Google-17], NestDNN [MobiCom-18], and MSDNet [ICLR-18].
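A hypothetical sketch of the two-knob idea behind such an adaptive DNN: pick a configuration (here an input resolution and an early-exit depth, both assumed knob choices) from an offline latency/accuracy profile so that expected latency fits the current budget. The profile numbers below are made up for illustration and are not ApproxNet's measurements.

```python
# Made-up offline profile: (input_size, exit_layer, est_latency_ms, est_accuracy)
PROFILE = [
    (224, 4, 42.0, 0.76),
    (160, 4, 28.0, 0.72),
    (160, 3, 19.0, 0.68),
    (112, 2, 11.0, 0.61),
]

def choose_knobs(latency_budget_ms: float):
    """Return the most accurate configuration within the latency budget."""
    feasible = [c for c in PROFILE if c[2] <= latency_budget_ms]
    if not feasible:
        return PROFILE[-1]  # degrade gracefully to the cheapest setting
    return max(feasible, key=lambda c: c[3])

size, exit_layer, lat, acc = choose_knobs(latency_budget_ms=20.0)
print(f"run at {size}px, exit after block {exit_layer} (~{lat} ms)")
```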


Sensors · 2022 · Vol 22 (2) · pp. 599
Author(s): Yongsheng Li, Tengfei Tu, Hua Zhang, Jishuai Li, Zhengping Jin, ...

In the field of video action classification, existing network frameworks often use only video frames as input. When the object involved in an action does not appear in a prominent position in the video frame, such networks cannot classify it accurately. We introduce a new neural network structure that uses sound to assist with such tasks. The original sound wave is converted into a sound texture that serves as the network input. Furthermore, to exploit the rich multimodal information (images and sound) in video, we designed and used a two-stream framework. In this work, we assume that sound data can help solve action recognition tasks. To demonstrate this, we designed a neural network based on sound texture for video action classification. We then fuse this network with a deep neural network that consumes continuous video frames to construct a two-stream network, called A-IN. Finally, on the Kinetics dataset, we compare the proposed A-IN with an image-only network. The experimental results show that the recognition accuracy of the two-stream model that uses sound features is 7.6% higher than that of the network using video frames alone. This shows that making rational use of the rich information in video can improve classification performance.
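A minimal sketch of the two-stream idea, under the assumption that a log-mel spectrogram stands in for the paper's sound texture (their exact texture representation may differ) and that fusion is a simple weighted average of per-stream class scores; the function names and weights here are illustrative.

```python
# Sketch, not the authors' A-IN implementation.
import numpy as np
import librosa  # assumed available for audio feature extraction

def sound_texture(wav_path: str, n_mels: int = 64) -> np.ndarray:
    """Convert a waveform into an image-like log-mel 'texture'."""
    y, sr = librosa.load(wav_path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)  # shape: (n_mels, time)

def fuse_scores(video_logits: np.ndarray, audio_logits: np.ndarray,
                w_video: float = 0.5) -> int:
    """Late fusion: weighted sum of the two streams' class scores."""
    fused = w_video * video_logits + (1.0 - w_video) * audio_logits
    return int(np.argmax(fused))
```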


2022 · Vol 2022 · pp. 1-10
Author(s): Biao Ma, Minghui Ji

Both the human body and its motion are inherently three-dimensional, so traditional feature descriptions of two-person interactions based on RGB video alone discriminate poorly for lack of depth information. Exploiting the respective advantages and complementary characteristics of RGB and depth video, a retrieval algorithm based on multisource motion feature fusion is proposed. First, the algorithm combines spatiotemporal interest points with a bag-of-words model to represent the RGB video features. Then, a histogram of oriented gradients represents each depth video frame, and statistical features over key frames summarize the histogram features of the depth video. Finally, a multifeature image fusion algorithm fuses the two video features. The experimental results show that multisource feature fusion can greatly improve the retrieval accuracy of motion features.
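A hedged sketch of the fusion step: a bag-of-words histogram from the RGB stream and HOG statistics over depth key frames are concatenated into one descriptor. The mean/std summary and the L2-normalized concatenation are assumptions of this sketch; the paper's multifeature fusion algorithm may weight the sources differently.

```python
import numpy as np
from skimage.feature import hog  # assumed for the depth-frame HOG

def depth_descriptor(depth_key_frames):
    """Mean and std of per-frame HOG vectors over selected key frames."""
    hogs = np.stack([hog(f, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for f in depth_key_frames])
    return np.concatenate([hogs.mean(axis=0), hogs.std(axis=0)])

def fuse(rgb_bow_hist: np.ndarray, depth_desc: np.ndarray) -> np.ndarray:
    # Simple early fusion by L2-normalized concatenation.
    v = np.concatenate([rgb_bow_hist, depth_desc])
    return v / (np.linalg.norm(v) + 1e-12)
```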


2022 · Vol 8 · pp. e843
Author(s): Murat Hacimurtazaoglu, Kemal Tutuncu

Background: In terms of data-hiding capacity, video steganography is more advantageous than other steganography techniques because it uses video as the cover medium. Any video steganography must establish and maintain a good trade-off among robustness, imperceptibility, and payload. Despite its capacity advantage, video steganography has a robustness problem, especially when it is implemented in the spatial domain: transformation operations and statistical attacks can harm the secret data. The ideal video steganography technique must therefore provide high imperceptibility, high payload, and resistance to visual, statistical, and transformation-based steganalysis attacks.

Methods: One of the most common spatial methods for hiding data within a cover medium is the Least Significant Bit (LSB) method. This study proposes an LSB-based video steganography application that uses a poly-pattern key block matrix (KBM) as the key. The key is a 64 × 64 pixel block matrix consisting of 16 sub-pattern blocks of 16 × 16 pixels each. To increase security, the sub-patterns in the KBM may shift in four directions and rotate by up to 270°, depending on user preference and logical operations. For additional security, XOR and AND logical operations determine whether to choose the next predetermined 64 × 64 pixel block or to jump to another pixel block in the cover video frame when placing a KBM to embed the secret data. The combination of the variable KBM structure and logical operations for secret-data embedding distinguishes the proposed algorithm from previous LSB-based video steganography studies.

Results: Mean Squared Error (MSE), Structural Similarity Index (SSIM), and Peak Signal-to-Noise Ratio (PSNR) were calculated to assess the imperceptibility (i.e., resistance to visual attacks) of the proposed algorithm. Depending on secret message length, the algorithm obtained MSE, SSIM, and PSNR values of 0.00066, 0.99999, and 80.01458 dB for a 42.8 Kb secret message and 0.00173, 0.99999, and 75.72723 dB for a 109 Kb secret message, respectively. These results surpass classic LSB and the LSB-based video steganography studies in the literature. Because the proposed system embeds an equal amount of data in each video frame, less data is lost under transformation operations, and lost fragments can be recovered from the surrounding text with natural language processing. The variable structure of the KBM, the logical operations, and the extra security precautions make the proposed system more secure and complex, increasing unpredictability and resistance to statistical attacks. Thus, the proposed method provides high imperceptibility and resistance to visual, statistical, and transformation-based attacks while allowing an acceptable, even high, payload.
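An illustrative toy version of the embedding step only: the real scheme rotates and shifts 16 × 16 sub-patterns and uses XOR/AND tests to pick the next block, whereas this sketch embeds one secret bit per KBM position flagged 1 inside a single 64 × 64 region, with a random matrix standing in for the actual poly-pattern key.

```python
import numpy as np

def embed_block(frame: np.ndarray, kbm: np.ndarray, bits, top: int, left: int):
    """Write secret bits into the LSBs of pixels selected by the KBM pattern."""
    block = frame[top:top + 64, left:left + 64]  # view into the cover frame
    it = iter(bits)
    for r, c in zip(*np.nonzero(kbm)):           # positions flagged by the key
        try:
            bit = next(it)
        except StopIteration:
            break
        block[r, c] = (block[r, c] & 0xFE) | bit  # overwrite the LSB
    return frame

rng = np.random.default_rng(seed=42)              # toy stand-in for the key
kbm = (rng.random((64, 64)) > 0.5).astype(np.uint8)
frame = rng.integers(0, 256, (480, 640), dtype=np.uint8)
stego = embed_block(frame, kbm, bits=[1, 0, 1, 1], top=0, left=0)
```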


2022 · Vol 2161 (1) · pp. 012024
Author(s): Padmashree Desai, C Sujatha, Saumyajit Chakraborty, Saurav Ansuman, Sanika Bhandari, ...

Intelligent decision-making systems require the capacity to forecast, foresee, and reason about future events. The problem of video frame prediction has attracted considerable attention owing to its usefulness in computer vision applications such as autonomous vehicles and robots. Recent deep learning advances have significantly improved video prediction performance; nevertheless, as top-performing systems attempt to foresee frames further into the future, their predictions become increasingly blurry. We developed a method for predicting a future frame from a series of prior frames using the Convolutional Long Short-Term Memory (ConvLSTM) model. The input video is segmented into frames and fed to the ConvLSTM model, which extracts features and forecasts a future frame, a capability useful in a variety of applications. We used two metrics to measure the quality of the predicted frame: the structural similarity index (SSIM) and perceptual distance, which capture the difference between the actual and predicted frames. The UCF101 dataset, a collection of realistic action videos taken from YouTube spanning 101 action categories, is used for training and testing. The ConvLSTM model is trained and tested on 24 categories from this dataset and predicts a future frame with satisfactory results: we obtained an SSIM of 0.95 and a perceptual distance of 24.28. The results are also compared with state-of-the-art approaches, and the suggested work is shown to be superior.
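A minimal sketch of a ConvLSTM next-frame predictor in Keras, under the assumption of 64 × 64 grayscale inputs; the layer widths and depth are illustrative and not the paper's exact architecture.

```python
from tensorflow.keras import layers, models

def build_predictor(frames=10, h=64, w=64, ch=1):
    model = models.Sequential([
        layers.Input(shape=(frames, h, w, ch)),
        layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=True),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=False),
        # Map the final hidden state to one predicted frame.
        layers.Conv2D(ch, (3, 3), padding="same", activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_predictor()
# model.fit(x, y) with x: (N, 10, 64, 64, 1) clips, y: (N, 64, 64, 1) next frames
```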


Author(s): Asma Zahra, Mubeen Ghafoor, Kamran Munir, Ata Ullah, Zain Ul Abideen

Smart video surveillance helps to build a more robust smart city environment. Cameras at varied angles act as smart sensors, collecting visual data from the smart city environment and transmitting it for further visual analysis. The transmitted visual data must be of high quality for efficient analysis, which is challenging when transmitting video over low-bandwidth communication channels. In the latest smart surveillance cameras, high-quality video transmission is maintained through video coding techniques such as High Efficiency Video Coding. However, these techniques still offer limited capabilities, and the demand for high-quality encoding of salient regions, such as pedestrians, vehicles, cyclists/motorcyclists, and the road, in video surveillance systems remains unmet. This work contributes an efficient salient-region-based surveillance framework for smart cities. The proposed framework integrates a deep-learning-based video surveillance technique that extracts salient regions from a video frame without information loss and then encodes the frame at reduced size. We applied this approach in diverse smart city case study environments to test the applicability of the framework. The outcome of the proposed work is a successful result of 56.92% in bitrate, 5.35 dB in peak signal-to-noise ratio, and salient-region-based segmentation accuracies of 92% and 96% on two different benchmark datasets. Consequently, the generation of less computationally demanding region-based video data makes the framework well suited to improving surveillance solutions in smart cities.
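One common way to realize saliency-aware encoding, shown here only as a hedged stand-in for the paper's pipeline: keep pixels inside the saliency mask intact and heavily smooth the background so that a standard encoder (e.g., H.265) spends fewer bits on it. The segmentation model producing the mask is assumed and not shown.

```python
import cv2
import numpy as np

def roi_preserving_preprocess(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """frame: HxWx3 uint8; mask: HxW uint8, 1 = salient region."""
    background = cv2.GaussianBlur(frame, (31, 31), 0)  # cheap to encode
    m3 = np.repeat(mask[:, :, None], 3, axis=2).astype(bool)
    # Salient pixels pass through untouched; the rest comes from the blur.
    return np.where(m3, frame, background)
```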


Electronics · 2021 · Vol 11 (1) · pp. 31
Author(s): Jianqiang Xu, Haoyu Zhao, Weidong Min, Yi Zou, Qiyan Fu

Crowd gathering detection plays an important role in the security supervision of public areas. Existing image-processing-based methods are not robust in complex scenes, and deep-learning-based gathering detection methods mainly focus on network design, ignoring the inner features of the crowd gathering action. To alleviate these problems, this work proposes Detection of Group Gathering (DGG), a novel framework that detects crowd gathering by combining a deep-learning-based crowd counting method with statistics. DGG comprises three parts: Detecting the Candidate Frame of Gathering (DCFG), Gathering Area Detection (GAD), and Gathering Judgement (GJ). DCFG uses the crowd counting method to find the index of the frame in a video with the maximum number of people; this frame indicates that the crowd has gathered, and the specific gathering area is detected next. GAD uses a sliding search box to detect the local area with the maximum crowd density in that frame. This local area carries the inner feature of the gathering action and indicates where the crowd gathers; it is denoted by grid coordinates in the video frame. Based on the results of DCFG and GAD, GJ analyzes the statistical relationship between the local area and the global area to find a stable pattern for the crowd gathering action. Experiments on benchmarks show that the proposed DGG provides a robust representation of the gathering feature and high detection accuracy, with potential applications in the social security and smart city domains.
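An illustrative sketch of the DCFG and GAD steps as described above, assuming a crowd counter that returns one density map per frame (the counting model itself is not shown); box size and stride are arbitrary choices of this sketch.

```python
import numpy as np

def candidate_frame(density_maps):
    """DCFG: index of the frame with the maximum estimated head count."""
    counts = [d.sum() for d in density_maps]
    return int(np.argmax(counts))

def gathering_area(density: np.ndarray, box=64, stride=16):
    """GAD: slide a search box over the density map; return the densest window."""
    best, best_rc = -1.0, (0, 0)
    h, w = density.shape
    for r in range(0, h - box + 1, stride):
        for c in range(0, w - box + 1, stride):
            s = density[r:r + box, c:c + box].sum()
            if s > best:
                best, best_rc = s, (r, c)
    return best_rc, best  # top-left grid coordinate and its density mass
```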


Author(s): Qingbo Yang, Fangzhou Xu, Jiancai Leng

Robotic arms are powerful assistants in many industrial production environments, running periodically through preset actions to complete specified operations. However, they may act abnormally when encountering unexpected situations, leading to unnecessary losses. Recognizing abnormal robotic arm actions from surveillance video can automatically reveal their operating status and flag possible abnormalities in time. We designed a deep learning architecture based on 3D convolution for abnormal action recognition. The 3D convolutional layers extract the spatial and temporal features of the robotic arm's movements from the video frame-difference sequence; max pooling layers compress and streamline these features into a concise and effective representation of the arm's actions; and a fully connected layer classifies the features to recognize abnormal tasks. A support vector data description (SVDD) model is employed to detect abnormal actions of the robotic arm, and the well-trained SVDD model distinguishes normal actions from the three kinds of abnormal actions with an Area Under the Curve (AUC) of 99.17%.
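A hedged sketch of the pipeline's shape: a small 3D-conv feature extractor over frame-difference clips, with scikit-learn's OneClassSVM standing in for SVDD (the two are closely related under an RBF kernel). Layer sizes, clip dimensions, and the nu parameter are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn
from sklearn.svm import OneClassSVM

class Action3DFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),  # spatio-temporal features
            nn.ReLU(),
            nn.MaxPool3d(2),                             # compress space-time
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                     # global pooling
            nn.Flatten(),                                # 32-d clip descriptor
        )

    def forward(self, clips):                            # clips: (N, 1, T, H, W)
        return self.net(clips)

extractor = Action3DFeatures().eval()
with torch.no_grad():
    feats = extractor(torch.randn(8, 1, 16, 64, 64)).numpy()
svdd = OneClassSVM(kernel="rbf", nu=0.05).fit(feats)     # fit on normal actions
flags = svdd.predict(feats)                              # -1 marks abnormal clips
```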

