video frames: Recently Published Documents

TOTAL DOCUMENTS: 738 (five years: 313)
H-INDEX: 23 (five years: 8)

Author(s):  
Chunling Tu ◽  
Shengzhi Du

Vehicle and vehicle license plate detection have achieved remarkable results in recent years and are widely used in real traffic scenarios, such as intelligent traffic monitoring systems, automatic parking systems, and vehicle services. Computer vision has attracted much attention in vehicle and license plate detection, benefiting from advances in image processing and machine learning. However, existing methods still struggle with vehicle and license plate recognition, especially in complex environments. In this paper, we propose a multivehicle detection and license plate recognition system based on a hierarchical region convolutional neural network (RCNN). Firstly, a higher-level RCNN is employed to extract vehicles from the original images or video frames. Secondly, the regions of the detected vehicles are fed to a lower-level (smaller) RCNN to detect the license plate. Thirdly, the detected license plate is split into single numbers. Finally, the individual numbers are recognized by an even smaller RCNN. Experiments on a real traffic database validate the proposed method. Compared with the commonly used all-in-one deep learning structure, the proposed hierarchical method decomposes license plate recognition into sub-tasks handled at multiple levels, which allows the network size and structure to be tailored to the complexity of each sub-task and thereby reduces the computational load.
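A minimal sketch of the hierarchical cascade described above, assuming numpy-style image arrays and three stand-in RCNN detectors of decreasing size; the detector interfaces and the naive character splitter are illustrative assumptions, not the authors' implementation:

```python
def split_characters(plate_roi, n_chars=7):
    # Naive stand-in for the plate-splitting step: slice the plate image
    # into n_chars equal vertical strips (the actual segmentation method
    # is not described in the abstract).
    h, w = plate_roi.shape[:2]
    step = w // n_chars
    return [plate_roi[:, i * step:(i + 1) * step] for i in range(n_chars)]

def recognize_plates(frame, vehicle_rcnn, plate_rcnn, char_rcnn):
    plates = []
    # Level 1: the largest RCNN extracts vehicles from the full frame.
    for vx, vy, vw, vh in vehicle_rcnn.detect(frame):
        vehicle_roi = frame[vy:vy + vh, vx:vx + vw]
        # Level 2: a smaller RCNN searches only the vehicle region for plates.
        for px, py, pw, ph in plate_rcnn.detect(vehicle_roi):
            plate_roi = vehicle_roi[py:py + ph, px:px + pw]
            # Level 3: recognize each split character with an even smaller RCNN.
            chars = [char_rcnn.classify(c) for c in split_characters(plate_roi)]
            plates.append("".join(chars))
    return plates
```

Because each level operates only on the region passed down from the level above, the lower networks can be kept small, which is the source of the reduced computation load claimed in the abstract.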


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 599
Author(s):  
Yongsheng Li ◽  
Tengfei Tu ◽  
Hua Zhang ◽  
Jishuai Li ◽  
Zhengping Jin ◽  
...  

In the field of video action classification, existing network frameworks often use only video frames as input. When the object involved in an action does not appear in a prominent position in the frame, the network cannot classify it accurately. We introduce a new neural network structure that uses sound to assist in such tasks: the original sound wave is converted into a sound texture that serves as input to the network. Furthermore, to exploit the rich modal information (images and sound) in video, we designed a two-stream framework. In this work, we assume that sound data can be used to solve action recognition tasks. To demonstrate this, we designed a neural network based on sound textures to perform video action classification. We then fuse this network with a deep neural network that uses consecutive video frames, constructing a two-stream network called A-IN. Finally, on the Kinetics dataset, we compare the proposed A-IN with an image-only network. The experimental results show that the recognition accuracy of the two-stream model that incorporates sound features is 7.6% higher than that of the network using video frames alone. This proves that rational use of the rich information in video can improve classification performance.
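A minimal PyTorch-style sketch of a two-stream fusion in the spirit of A-IN; the backbone feature size (512) and the score-averaging fusion are assumptions, since the abstract does not specify the fusion mechanism:

```python
import torch.nn as nn

class TwoStreamAIN(nn.Module):
    # Illustrative two-stream fusion: one stream encodes a sound-texture
    # image, the other encodes stacked video frames. The 512-d features
    # and late fusion by score averaging are assumptions.
    def __init__(self, sound_net, frame_net, num_classes, feat_dim=512):
        super().__init__()
        self.sound_net = sound_net            # encodes a sound-texture image
        self.frame_net = frame_net            # encodes consecutive video frames
        self.sound_head = nn.Linear(feat_dim, num_classes)
        self.frame_head = nn.Linear(feat_dim, num_classes)

    def forward(self, sound_texture, frames):
        s_logits = self.sound_head(self.sound_net(sound_texture))
        f_logits = self.frame_head(self.frame_net(frames))
        # Late fusion: average the per-stream class scores.
        return (s_logits + f_logits) / 2
```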


2022 ◽  
Author(s):  
Nabeel Durrani ◽  
Damjan Vukovic ◽  
Maria Antico ◽  
Jeroen van der Burgt ◽  
Ruud JG van Sloun ◽  
...  

Our automated deep learning-based approach identifies consolidation/collapse in LUS images to aid in the diagnosis of late stages of COVID-19 induced pneumonia, where consolidation/collapse is one of the possible associated pathologies. A common challenge in training such models is that annotating each frame of an ultrasound video requires high labelling effort, which in practice becomes prohibitive for large ultrasound datasets. To understand the impact of various degrees of labelling precision, we compare labelling strategies to train fully supervised models (frame-based method, higher labelling effort) and inaccurately supervised models (video-based methods, lower labelling effort), both of which yield binary predictions for LUS videos on a frame-by-frame level. Moreover, we introduce a novel sampled quaternary method which randomly samples only 10% of the LUS video frames and subsequently assigns (ordinal) categorical labels to all frames in the video based on the fraction of positively annotated samples. This method outperformed the inaccurately supervised video-based method of our previous work on pleural effusions. More surprisingly, despite being a form of inaccurate learning, it also outperformed the supervised frame-based approach with respect to metrics suited to the class imbalance of our dataset, such as precision-recall area under the curve (PR-AUC) and F1 score. This may be due to the combination of a significantly smaller dataset size compared to our previous work and the higher complexity of consolidation/collapse compared to pleural effusion, two factors which contribute to label noise and overfitting; specifically, we argue that our video-based method is more robust with respect to label noise and mitigates overfitting in a manner similar to label smoothing. Using clinical expert feedback, separate criteria were developed to exclude data from the training and test sets respectively for our ten-fold cross-validation results, which yielded a PR-AUC score of 73% and an accuracy of 89%. While the efficacy of our classifier using the sampled quaternary method must be verified on a larger consolidation/collapse dataset, when considering the complexity of the pathology, our proposed classifier using the sampled quaternary video-based method is clinically comparable with trained experts and improves over the video-based method of our previous work on pleural effusions.
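A sketch of the sampled quaternary labelling strategy described above: only 10% of frames are annotated, and an ordinal label derived from the fraction of positive annotations is propagated to every frame. The four bin thresholds are illustrative assumptions; the abstract does not state them.

```python
import random

def sampled_quaternary_labels(video_frames, annotate, sample_rate=0.1):
    # Annotate only a random 10% of the frames; annotate(frame) -> 0 or 1.
    n = max(1, int(len(video_frames) * sample_rate))
    sampled = random.sample(video_frames, n)
    positive_fraction = sum(annotate(f) for f in sampled) / n
    # Map the fraction to one of four ordinal categories
    # (bin edges are assumed for illustration).
    if positive_fraction == 0.0:
        label = 0            # no pathology in any sampled frame
    elif positive_fraction < 1 / 3:
        label = 1
    elif positive_fraction < 2 / 3:
        label = 2
    else:
        label = 3            # pathology in most sampled frames
    # The same video-level label is assigned to every frame.
    return [label] * len(video_frames)
```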


Author(s):  
Andreas Leibetseder ◽  
Klaus Schoeffmann ◽  
Jörg Keckstein ◽  
Simon Keckstein

Endometriosis is a common gynecologic condition typically treated via laparoscopic surgery. Its visual versatility makes it hard to identify for non-specialized physicians and challenging to classify or localize via computer-aided analysis. In this work, we take a first step toward localized endometriosis recognition in laparoscopic gynecology videos using the region-based deep neural networks Faster R-CNN and Mask R-CNN. In particular, we use and further develop publicly available data for transfer learning of deep detection models according to distinctive visual lesion characteristics. Subsequently, we evaluate the performance impact of different data augmentation techniques, including selected geometrical and visual transformations, specular reflection removal, and region tracking across video frames. Finally, particular attention is given to creating reasonable data segmentation for training, validation, and testing. Surprisingly, the best result is achieved by randomly applying simple cropping combined with rotation, yielding a mean average segmentation precision of 32.4% at 50-95% intersection-over-union overlap (64.2% at 50% overlap).
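A minimal sketch of the best-performing augmentation (random cropping combined with rotation), applied identically to a frame and its lesion mask; the crop ratio and angle range are assumptions, as the paper reports only the operations themselves:

```python
import random
from PIL import Image

def crop_rotate(image: Image.Image, mask: Image.Image):
    # Random crop followed by rotation, applied with identical parameters
    # to the frame and its segmentation mask so they stay aligned.
    w, h = image.size
    cw, ch = int(w * 0.8), int(h * 0.8)      # assumed 80% crop size
    x = random.randint(0, w - cw)
    y = random.randint(0, h - ch)
    box = (x, y, x + cw, y + ch)
    angle = random.uniform(-30, 30)          # assumed rotation range
    image = image.crop(box).rotate(angle, resample=Image.BILINEAR)
    mask = mask.crop(box).rotate(angle, resample=Image.NEAREST)
    return image, mask
```

Nearest-neighbour resampling on the mask keeps the label values discrete, while bilinear resampling keeps the image smooth.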


Author(s):  
Mritunjay Rai ◽  
Rohit Sharma ◽  
Suresh Chandra Satapathy ◽  
Dileep Kumar Yadav ◽  
Tanmoy Maity ◽  
...  

2022 ◽  
Vol 11 (1) ◽  
pp. 34
Author(s):  
Bashar Alsadik ◽  
Yousif Hussein Khalaf

Ongoing developments in video resolution, in both consumer-grade and professional cameras, have opened opportunities for different applications such as sports broadcasting and digital cinematography. In the field of geoinformation science and photogrammetry, image-based 3D city modeling is expected to benefit from this technological development. Highly detailed 3D point clouds with low noise are expected when using ultra-high-definition (UHD) videos (e.g., 4K, 8K), and a greater benefit is expected when the UHD videos are captured from the air by consumer-grade or professional drones. To the best of our knowledge, no studies have been published that quantify the expected outputs of UHD cameras in terms of 3D modeling and point cloud density. In this paper, we quantify the expected point cloud and orthophoto quality when using UHD videos from consumer-grade drones and review the applications in which they can be used. The results show that improvements of ≅65% in relative accuracy and ≅90% in point density can be attained when using 8K video frames compared with HD video frames, which will open a wide range of applications and business cases in the near future.
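A back-of-envelope check of why higher-resolution frames yield denser point clouds: dense image matching produces at most on the order of one 3D point per pixel, so the per-frame pixel budget bounds the achievable density. The frame sizes below are standard video formats, not the paper's measured values:

```python
# Pixel budget per frame for common video resolutions.
frames = {"HD (1080p)": 1920 * 1080, "4K UHD": 3840 * 2160, "8K UHD": 7680 * 4320}
hd = frames["HD (1080p)"]
for name, pixels in frames.items():
    print(f"{name}: {pixels / 1e6:.1f} MP, {pixels / hd:.0f}x the HD pixel budget")
# 8K carries 16x the pixels of HD, consistent with the large point-density
# gains reported above (actual gains also depend on overlap, GSD, and noise).
```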


2022 ◽  
Vol 31 (2) ◽  
pp. 917-928
Author(s):  
I. Muthumani ◽  
N. Malmurugan ◽  
L. Ganesan
Keyword(s):  

2022 ◽  
pp. 1522-1531
Author(s):  
Ayan Chatterjee ◽  
Nikhilesh Barik

Today, in the era of internet-based communication, steganography is an important approach: secret information is embedded in a cover medium with minimal distortion of that medium. Here, a video steganography scheme is developed in the frequency-domain category; the frequency domain is more effective than the spatial domain because the data-insertion domain varies. To change the actual domain of the entropy pixels of the video frames, the uniform crossover operator of a Genetic Algorithm (GA) is used. Data is then inserted into the video frames using a single-layer perceptron, a form of Artificial Neural Network. This approach to information security is attractive due to its high security during wireless communication. The effectiveness of the proposed technique is analyzed using the parameters PSNR (Peak Signal-to-Noise Ratio), IF, and payload (bpb).
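A generic sketch of the uniform crossover operator mentioned above; how the scheme maps entropy-pixel coordinates onto chromosomes is not described in the abstract, so the encoding here is purely illustrative:

```python
import random

def uniform_crossover(parent_a, parent_b, swap_prob=0.5):
    # Uniform crossover: each gene position is swapped between the two
    # parents independently with probability swap_prob.
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if random.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

# Example: chromosomes encoding candidate pixel positions for embedding
# (hypothetical values).
a, b = uniform_crossover([12, 40, 7, 93], [55, 3, 81, 26])
```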


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Modern artificial intelligence systems have revolutionized approaches to scientific and technological challenges in a variety of fields, and remarkable improvements in the quality of state-of-the-art computer vision and other techniques are observed; object tracking in video frames is a vital field of research that provides information about objects and their trajectories. This paper presents an object tracking method based on the optical flow generated between frames and a ConvNet method. Initially, optical-flow center displacement is employed to estimate the likely bounding-box center of the tracked object. Then, CenterNet is used for object position correction. Given the initial set of points (i.e., the bounding box) in the first frame, the tracker follows the motion of the center of these points by examining its direction of change in the optical flow calculated with the next frame; a correction mechanism waits for accumulated motion that surpasses a correction threshold before launching a position correction.
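A minimal sketch of that tracking loop, assuming a dense HxWx2 optical-flow field per frame pair and a hypothetical detector.predict_center() standing in for the CenterNet correction; the drift-accumulation rule and median-flow aggregation are illustrative assumptions:

```python
import numpy as np

class FlowTracker:
    def __init__(self, center, box_size, detector, threshold=20.0):
        self.cx, self.cy = center
        self.w, self.h = box_size
        self.detector = detector          # hypothetical CenterNet wrapper
        self.threshold = threshold        # pixels of motion before correction
        self.drift = 0.0

    def step(self, flow):
        # Median optical-flow vector inside the current bounding box.
        y0, y1 = int(self.cy - self.h / 2), int(self.cy + self.h / 2)
        x0, x1 = int(self.cx - self.w / 2), int(self.cx + self.w / 2)
        dx, dy = np.median(flow[y0:y1, x0:x1].reshape(-1, 2), axis=0)
        self.cx, self.cy = self.cx + dx, self.cy + dy
        self.drift += float(np.hypot(dx, dy))
        if self.drift > self.threshold:   # correction threshold surpassed
            self.cx, self.cy = self.detector.predict_center(self.cx, self.cy)
            self.drift = 0.0
        return self.cx, self.cy
```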

