video frames: Recently Published Documents

TOTAL DOCUMENTS: 738 (five years: 313)
H-INDEX: 23 (five years: 8)

Author(s):  
Chunling Tu ◽  
Shengzhi Du

Vehicle and vehicle license plate detection have achieved remarkable results in recent years and are widely used in real traffic scenarios, such as intelligent traffic monitoring systems, automatic parking systems, and vehicle services. Computer vision has attracted much attention in vehicle and license plate detection, benefiting from advances in image processing and machine learning. However, existing methods still struggle with vehicle and license plate recognition, especially in complex environments. In this paper, we propose a multivehicle detection and license plate recognition system based on a hierarchical region convolutional neural network (RCNN). Firstly, a higher-level RCNN is employed to extract vehicles from the original images or video frames. Secondly, the regions of the detected vehicles are fed to a lower-level (smaller) RCNN to detect the license plate. Thirdly, the detected license plate is split into single numbers. Finally, the individual numbers are recognized by an even smaller RCNN. Experiments on a real traffic database validate the proposed method. Compared with the commonly used all-in-one deep learning structure, the proposed hierarchical method decomposes license plate recognition into sub-tasks handled at multiple levels, which allows the network size and structure to be tailored to the complexity of each sub-task and thereby reduces the computational load.
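A minimal sketch of the hierarchical cascade described above, assuming numpy-style image arrays and three stand-in RCNN detectors of decreasing size; the detector interfaces and the naive character splitter are illustrative assumptions, not the authors' implementation:

```python
def split_characters(plate_roi, n_chars=7):
    # Naive stand-in for the plate-splitting step: slice the plate image
    # into n_chars equal vertical strips (the actual segmentation method
    # is not described in the abstract).
    h, w = plate_roi.shape[:2]
    step = w // n_chars
    return [plate_roi[:, i * step:(i + 1) * step] for i in range(n_chars)]

def recognize_plates(frame, vehicle_rcnn, plate_rcnn, char_rcnn):
    plates = []
    # Level 1: the largest RCNN extracts vehicles from the full frame.
    for vx, vy, vw, vh in vehicle_rcnn.detect(frame):
        vehicle_roi = frame[vy:vy + vh, vx:vx + vw]
        # Level 2: a smaller RCNN searches only the vehicle region for plates.
        for px, py, pw, ph in plate_rcnn.detect(vehicle_roi):
            plate_roi = vehicle_roi[py:py + ph, px:px + pw]
            # Level 3: recognize each split character with an even smaller RCNN.
            chars = [char_rcnn.classify(c) for c in split_characters(plate_roi)]
            plates.append("".join(chars))
    return plates
```

Because each level operates only on the region passed down from the level above, the lower networks can be kept small, which is the source of the reduced computation load claimed in the abstract.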


Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 599
Author(s):  
Yongsheng Li ◽  
Tengfei Tu ◽  
Hua Zhang ◽  
Jishuai Li ◽  
Zhengping Jin ◽  
...  

In the field of video action classification, existing network frameworks often use only video frames as input. When the object involved in an action does not appear in a prominent position in the frame, the network cannot classify it accurately. We introduce a new neural network structure that uses sound to assist in such tasks: the original sound wave is converted into a sound texture that serves as input to the network. Furthermore, to exploit the rich modal information (images and sound) in video, we designed a two-stream framework. In this work, we assume that sound data can be used to solve action recognition tasks. To demonstrate this, we designed a neural network based on sound textures to perform video action classification. We then fuse this network with a deep neural network that uses consecutive video frames, constructing a two-stream network called A-IN. Finally, on the Kinetics dataset, we compare the proposed A-IN with an image-only network. The experimental results show that the recognition accuracy of the two-stream model that incorporates sound features is 7.6% higher than that of the network using video frames alone. This proves that rational use of the rich information in video can improve classification performance.
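A minimal PyTorch-style sketch of a two-stream fusion in the spirit of A-IN; the backbone feature size (512) and the score-averaging fusion are assumptions, since the abstract does not specify the fusion mechanism:

```python
import torch.nn as nn

class TwoStreamAIN(nn.Module):
    # Illustrative two-stream fusion: one stream encodes a sound-texture
    # image, the other encodes stacked video frames. The 512-d features
    # and late fusion by score averaging are assumptions.
    def __init__(self, sound_net, frame_net, num_classes, feat_dim=512):
        super().__init__()
        self.sound_net = sound_net            # encodes a sound-texture image
        self.frame_net = frame_net            # encodes consecutive video frames
        self.sound_head = nn.Linear(feat_dim, num_classes)
        self.frame_head = nn.Linear(feat_dim, num_classes)

    def forward(self, sound_texture, frames):
        s_logits = self.sound_head(self.sound_net(sound_texture))
        f_logits = self.frame_head(self.frame_net(frames))
        # Late fusion: average the per-stream class scores.
        return (s_logits + f_logits) / 2
```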


2022 ◽  
Author(s):  
Nabeel Durrani ◽  
Damjan Vukovic ◽  
Maria Antico ◽  
Jeroen van der Burgt ◽  
Ruud JG van Sloun ◽  
...  

Our automated deep learning-based approach identifies consolidation/collapse in LUS images to aid in the diagnosis of late stages of COVID-19 induced pneumonia, where consolidation/collapse is one of the possible associated pathologies. A common challenge in training such models is that annotating each frame of an ultrasound video requires high labelling effort, which in practice becomes prohibitive for large ultrasound datasets. To understand the impact of various degrees of labelling precision, we compare labelling strategies to train fully supervised models (frame-based method, higher labelling effort) and inaccurately supervised models (video-based methods, lower labelling effort), both of which yield binary predictions for LUS videos on a frame-by-frame level. Moreover, we introduce a novel sampled quaternary method which randomly samples only 10% of the LUS video frames and subsequently assigns (ordinal) categorical labels to all frames in the video based on the fraction of positively annotated samples. This method outperformed the inaccurately supervised video-based method of our previous work on pleural effusions. More surprisingly, despite being a form of inaccurate learning, it also outperformed the supervised frame-based approach with respect to metrics suited to the class imbalance of our dataset, such as precision-recall area under the curve (PR-AUC) and F1 score. This may be due to the combination of a significantly smaller dataset size compared to our previous work and the higher complexity of consolidation/collapse compared to pleural effusion, two factors which contribute to label noise and overfitting; specifically, we argue that our video-based method is more robust with respect to label noise and mitigates overfitting in a manner similar to label smoothing. Using clinical expert feedback, separate criteria were developed to exclude data from the training and test sets respectively for our ten-fold cross-validation results, which yielded a PR-AUC score of 73% and an accuracy of 89%. While the efficacy of our classifier using the sampled quaternary method must be verified on a larger consolidation/collapse dataset, when considering the complexity of the pathology, our proposed classifier using the sampled quaternary video-based method is clinically comparable with trained experts and improves over the video-based method of our previous work on pleural effusions.
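A sketch of the sampled quaternary labelling strategy described above: only 10% of frames are annotated, and an ordinal label derived from the fraction of positive annotations is propagated to every frame. The four bin thresholds are illustrative assumptions; the abstract does not state them.

```python
import random

def sampled_quaternary_labels(video_frames, annotate, sample_rate=0.1):
    # Annotate only a random 10% of the frames; annotate(frame) -> 0 or 1.
    n = max(1, int(len(video_frames) * sample_rate))
    sampled = random.sample(video_frames, n)
    positive_fraction = sum(annotate(f) for f in sampled) / n
    # Map the fraction to one of four ordinal categories
    # (bin edges are assumed for illustration).
    if positive_fraction == 0.0:
        label = 0            # no pathology in any sampled frame
    elif positive_fraction < 1 / 3:
        label = 1
    elif positive_fraction < 2 / 3:
        label = 2
    else:
        label = 3            # pathology in most sampled frames
    # The same video-level label is assigned to every frame.
    return [label] * len(video_frames)
```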


Author(s):  
Andreas Leibetseder ◽  
Klaus Schoeffmann ◽  
Jörg Keckstein ◽  
Simon Keckstein

Endometriosis is a common gynecologic condition typically treated via laparoscopic surgery. Its visual versatility makes it hard to identify for non-specialized physicians and challenging to classify or localize via computer-aided analysis. In this work, we take a first step toward localized endometriosis recognition in laparoscopic gynecology videos using the region-based deep neural networks Faster R-CNN and Mask R-CNN. In particular, we use and further develop publicly available data for transfer learning of deep detection models according to distinctive visual lesion characteristics. Subsequently, we evaluate the performance impact of different data augmentation techniques, including selected geometrical and visual transformations, specular reflection removal, and region tracking across video frames. Finally, particular attention is given to creating reasonable data segmentation for training, validation, and testing. Surprisingly, the best result is achieved by randomly applying simple cropping combined with rotation, yielding a mean average segmentation precision of 32.4% at 50-95% intersection-over-union overlap (64.2% at 50% overlap).
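A minimal sketch of the best-performing augmentation (random cropping combined with rotation), applied identically to a frame and its lesion mask; the crop ratio and angle range are assumptions, as the paper reports only the operations themselves:

```python
import random
from PIL import Image

def crop_rotate(image: Image.Image, mask: Image.Image):
    # Random crop followed by rotation, applied with identical parameters
    # to the frame and its segmentation mask so they stay aligned.
    w, h = image.size
    cw, ch = int(w * 0.8), int(h * 0.8)      # assumed 80% crop size
    x = random.randint(0, w - cw)
    y = random.randint(0, h - ch)
    box = (x, y, x + cw, y + ch)
    angle = random.uniform(-30, 30)          # assumed rotation range
    image = image.crop(box).rotate(angle, resample=Image.BILINEAR)
    mask = mask.crop(box).rotate(angle, resample=Image.NEAREST)
    return image, mask
```

Nearest-neighbour resampling on the mask keeps the label values discrete, while bilinear resampling keeps the image smooth.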


Author(s):  
Mritunjay Rai ◽  
Rohit Sharma ◽  
Suresh Chandra Satapathy ◽  
Dileep Kumar Yadav ◽  
Tanmoy Maity ◽  
...  

2022 ◽  
Vol 11 (1) ◽  
pp. 34
Author(s):  
Bashar Alsadik ◽  
Yousif Hussein Khalaf

Ongoing developments in video resolution, in both consumer-grade and professional cameras, have opened opportunities for different applications such as sports broadcasting and digital cinematography. In the field of geoinformation science and photogrammetry, image-based 3D city modeling is expected to benefit from this technological development. Highly detailed 3D point clouds with low noise are expected when using ultra-high-definition (UHD) videos (e.g., 4K, 8K), and a greater benefit is expected when the UHD videos are captured from the air by consumer-grade or professional drones. To the best of our knowledge, no studies have been published that quantify the expected outputs of UHD cameras in terms of 3D modeling and point cloud density. In this paper, we quantify the expected point cloud and orthophoto quality when using UHD videos from consumer-grade drones and review the applications in which they can be used. The results show that improvements of ≅65% in relative accuracy and ≅90% in point density can be attained when using 8K video frames compared with HD video frames, which will open a wide range of applications and business cases in the near future.
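A back-of-envelope check of why higher-resolution frames yield denser point clouds: dense image matching produces at most on the order of one 3D point per pixel, so the per-frame pixel budget bounds the achievable density. The frame sizes below are standard video formats, not the paper's measured values:

```python
# Pixel budget per frame for common video resolutions.
frames = {"HD (1080p)": 1920 * 1080, "4K UHD": 3840 * 2160, "8K UHD": 7680 * 4320}
hd = frames["HD (1080p)"]
for name, pixels in frames.items():
    print(f"{name}: {pixels / 1e6:.1f} MP, {pixels / hd:.0f}x the HD pixel budget")
# 8K carries 16x the pixels of HD, consistent with the large point-density
# gains reported above (actual gains also depend on overlap, GSD, and noise).
```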


2022 ◽  
Vol 31 (2) ◽  
pp. 917-928
Author(s):  
I. Muthumani ◽  
N. Malmurugan ◽  
L. Ganesan
Keyword(s):  

2022 ◽  
pp. 1522-1531
Author(s):  
Ayan Chatterjee ◽  
Nikhilesh Barik

Today, in the era of internet-based communication, steganography is an important approach: secret information is embedded in a cover medium with minimal distortion of that medium. Here, a video steganography scheme is developed in the frequency-domain category; the frequency domain is more effective than the spatial domain because the data-insertion domain varies. To change the actual domain of the entropy pixels of the video frames, the uniform crossover operator of a Genetic Algorithm (GA) is used. Data is then inserted into the video frames using a single-layer perceptron, a form of Artificial Neural Network. This approach to information security is attractive due to its high security during wireless communication. The effectiveness of the proposed technique is analyzed using the parameters PSNR (Peak Signal-to-Noise Ratio), IF, and payload (bpb).
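A generic sketch of the uniform crossover operator mentioned above; how the scheme maps entropy-pixel coordinates onto chromosomes is not described in the abstract, so the encoding here is purely illustrative:

```python
import random

def uniform_crossover(parent_a, parent_b, swap_prob=0.5):
    # Uniform crossover: each gene position is swapped between the two
    # parents independently with probability swap_prob.
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if random.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

# Example: chromosomes encoding candidate pixel positions for embedding
# (hypothetical values).
a, b = uniform_crossover([12, 40, 7, 93], [55, 3, 81, 26])
```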


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Modern artificial intelligence systems have revolutionized approaches to scientific and technological challenges in a variety of fields, and remarkable improvements in the quality of state-of-the-art computer vision and other techniques are observed; object tracking in video frames is a vital field of research that provides information about objects and their trajectories. This paper presents an object tracking method based on the optical flow generated between frames and a ConvNet method. Initially, optical-flow center displacement is employed to estimate the likely bounding-box center of the tracked object. Then, CenterNet is used for object position correction. Given the initial set of points (i.e., the bounding box) in the first frame, the tracker follows the motion of the center of these points by examining its direction of change in the optical flow calculated with the next frame; a correction mechanism waits for accumulated motion that surpasses a correction threshold before launching a position correction.
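A minimal sketch of that tracking loop, assuming a dense HxWx2 optical-flow field per frame pair and a hypothetical detector.predict_center() standing in for the CenterNet correction; the drift-accumulation rule and median-flow aggregation are illustrative assumptions:

```python
import numpy as np

class FlowTracker:
    def __init__(self, center, box_size, detector, threshold=20.0):
        self.cx, self.cy = center
        self.w, self.h = box_size
        self.detector = detector          # hypothetical CenterNet wrapper
        self.threshold = threshold        # pixels of motion before correction
        self.drift = 0.0

    def step(self, flow):
        # Median optical-flow vector inside the current bounding box.
        y0, y1 = int(self.cy - self.h / 2), int(self.cy + self.h / 2)
        x0, x1 = int(self.cx - self.w / 2), int(self.cx + self.w / 2)
        dx, dy = np.median(flow[y0:y1, x0:x1].reshape(-1, 2), axis=0)
        self.cx, self.cy = self.cx + dx, self.cy + dy
        self.drift += float(np.hypot(dx, dy))
        if self.drift > self.threshold:   # correction threshold surpassed
            self.cx, self.cy = self.detector.predict_center(self.cx, self.cy)
            self.drift = 0.0
        return self.cx, self.cy
```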

