Making Video Recognition Models Robust to Common Corruptions With Supervised Contrastive Learning

2021 ◽  
Author(s):  
Tomu Hirata ◽  
Yusuke Mukuta ◽  
Tatsuya Harada

Author(s):
Ilseo Kim ◽  
Sangmin Oh ◽  
Arash Vahdat ◽  
Kevin Cannons ◽  
A.G. Amitha Perera ◽  
...  

Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Pratik Doshi ◽  
John Tanaka ◽  
Jedrek Wosik ◽  
Natalia M Gil ◽  
Martin Bertran ◽  
...  

Introduction: There is a need for innovative solutions to better screen and diagnose the 7 million patients with chronic heart failure. A key component of assessing these patients is monitoring fluid status by evaluating the presence and height of jugular venous distension (JVD). We hypothesize that video analysis of a patient's neck using machine learning algorithms and image recognition can identify the amount of JVD. We propose the use of high-fidelity video recordings taken with a mobile device camera to determine the presence or absence of JVD, which we will use to develop a point-of-care testing tool for early detection of acute exacerbations of heart failure. Methods: In this feasibility study, patients undergoing right heart catheterization in the Duke cardiac catheterization lab were enrolled. RGB and infrared videos of the patient's neck were captured to detect JVD and correlated with right atrial pressure from the heart catheterization. We designed an adaptive filter, based on biological priors, that enhances spatially consistent frequency anomalies and detects jugular vein distension; the implementation was done in Python. Results: We captured and analyzed footage for six patients using our model. Four of these six patients showed a similarly strong signal outlier within the 95–200 bpm frequency band when using a conservative threshold, indicating the presence of JVD. We did not perform statistical analysis given the small size of our cohort, but in the patients with a positive JVD signal the mean right atrial (RA) pressure was 20.25 mmHg and the mean pulmonary capillary wedge pressure (PCWP) was 24.3 mmHg. Conclusions: We have demonstrated the ability to evaluate for JVD via infrared video and found a relationship with right heart catheterization (RHC) values. Our project is innovative because it uses video recognition and enables novel patient interactions through a non-invasive screening technique for heart failure. This tool could become a non-invasive standard both to screen for and to help manage heart failure patients.
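The frequency-band screening described in the Methods can be illustrated with a minimal sketch. This is not the study's actual adaptive filter; it only shows the underlying idea of flagging an intensity trace whose spectral energy concentrates in the 95–200 bpm band. The function name and the 0.5 decision threshold are illustrative assumptions.

```python
import numpy as np

def jvd_band_energy(trace, fps, low_bpm=95.0, high_bpm=200.0):
    """Fraction of a temporal intensity trace's spectral energy that
    falls inside the low_bpm-high_bpm band (DC component excluded)."""
    sig = trace - np.mean(trace)
    power = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)  # in Hz
    band = (freqs >= low_bpm / 60.0) & (freqs <= high_bpm / 60.0)
    total = power[1:].sum()  # skip the DC bin
    return power[band].sum() / total if total > 0 else 0.0

# Synthetic 10 s trace at 30 fps: a 2 Hz (120 bpm) pulsation plus noise.
fps = 30.0
t = np.arange(0, 10, 1.0 / fps)
rng = np.random.default_rng(0)
trace = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.normal(size=t.size)
print(jvd_band_energy(trace, fps) > 0.5)  # → True: in-band energy dominates
```

In a video setting this score would be computed per pixel (or per region) over time; spatially consistent clusters of high-scoring pixels would then mark a candidate JVD signal.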


2018 ◽  
Vol 5 (2) ◽  
pp. 258-270
Author(s):  
Aris Budianto

Automatic License Plate Recognition (ALPR) has become a new trend in transportation-system automation: a vehicle's license plate can be extracted without human intervention. Although the technology has been widely adopted in developed countries, developing countries remain far from implementing such sophisticated image and video recognition, for several reasons. This paper discusses the challenges and possibilities of implementing Automatic License Plate Recognition under Indonesia's circumstances. Prior findings from the literature and the state of the art of automatic recognition technology are collected for consideration in future research and practice.


2021 ◽  
Author(s):  
Matteo Tomei ◽  
Lorenzo Baraldi ◽  
Simone Bronzin ◽  
Rita Cucchiara

2020 ◽  
Vol 34 (07) ◽  
pp. 10941-10948
Author(s):  
Fei He ◽  
Naiyu Gao ◽  
Qiaozhe Li ◽  
Senyao Du ◽  
Xin Zhao ◽  
...  

Video object detection is a challenging task because of appearance deterioration in certain video frames. One typical solution is to aggregate neighboring features to enhance per-frame appearance features. However, such a method ignores the temporal relations between the aggregated frames, which are critical for improving video recognition accuracy. To handle the appearance deterioration problem, this paper proposes a temporal context enhanced network (TCENet) that exploits temporal context information through temporal aggregation for video object detection. To handle the displacement of objects in videos, a novel DeformAlign module is proposed to align spatial features from frame to frame. Instead of adopting a fixed-length window fusion strategy, a temporal stride predictor is proposed to adaptively select video frames for aggregation, which allows exploiting variable temporal information while requiring fewer video frames for aggregation to achieve better results. Our TCENet achieves state-of-the-art performance on the ImageNet VID dataset with a faster runtime. Without bells and whistles, our TCENet achieves 80.3% mAP by aggregating only 3 frames.
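The aggregation idea the paper builds on can be sketched independently of its specific modules. Below is a generic similarity-weighted temporal feature aggregation in NumPy, assuming the per-frame feature vectors have already been spatially aligned; TCENet's DeformAlign and temporal stride predictor are not reproduced here, and all names are illustrative.

```python
import numpy as np

def aggregate_features(ref_feat, support_feats):
    """Aggregate aligned support-frame features onto a reference frame,
    weighting each frame by its cosine similarity to the reference
    (softmax-normalized across frames)."""
    feats = [ref_feat] + list(support_feats)
    sims = []
    for f in feats:
        num = float(np.dot(ref_feat, f))
        den = np.linalg.norm(ref_feat) * np.linalg.norm(f) + 1e-8
        sims.append(num / den)
    weights = np.exp(sims) / np.sum(np.exp(sims))
    return sum(w * f for w, f in zip(weights, feats))

# Per-frame feature vectors that largely agree with the reference frame:
ref = np.array([1.0, 0.0, 0.0, 0.0])
agg = aggregate_features(ref, [np.array([0.9, 0.1, 0.0, 0.0]),
                               np.array([1.1, -0.1, 0.0, 0.0])])
```

A support frame whose features disagree with the reference (e.g. due to motion blur or occlusion) receives a lower weight, which is how this style of aggregation suppresses deteriorated frames.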


Author(s):  
Rashmi B Hiremath ◽  
Ramesh M Kagalkar

Sign language is a way of expressing oneself through body language, in which expressions, intentions, and sentiments are conveyed by physical behaviors such as facial expressions, body posture, gestures, eye movements, touch, and the use of space. Non-verbal communication exists in both animals and humans, but this article concentrates on interpreting human non-verbal or sign language into Hindi textual expression. The proposed implementation uses image processing methods and artificial intelligence strategies to achieve sign video recognition. It applies image processing methods such as frame-analysis-based tracking, edge detection, wavelet transform, erosion, dilation, blur elimination, and noise elimination to the training videos. It also uses elliptical Fourier descriptors and SIFT for shape feature extraction, and principal component analysis for feature set optimization and reduction. For result analysis, this paper uses videos from different categories, such as signs for weeks, months, and relations. The database of extracted outcomes is compared with the signer's video fed to the system as input, using a trained fuzzy inference system.
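The erosion and dilation steps in the preprocessing pipeline above can be sketched in pure NumPy. These minimal 3x3 binary versions are for illustration only; a real implementation would normally use OpenCV's cv2.erode and cv2.dilate.

```python
import numpy as np

def dilate(mask):
    """3x3 binary dilation: a pixel is on if any of its 8 neighbours
    (or the pixel itself) is on."""
    out = np.zeros_like(mask)
    pad = np.pad(mask, 1)  # one-pixel False border
    h, w = mask.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

def erode(mask):
    """3x3 binary erosion via duality: erode(m) == ~dilate(~m)."""
    return ~dilate(~mask)

# An isolated foreground pixel grows into a 3x3 block under dilation,
# and erosion shrinks it back to the single pixel.
m = np.zeros((5, 5), dtype=bool)
m[2, 2] = True
print(dilate(m).sum())         # → 9
print(erode(dilate(m)).sum())  # → 1
```

Erosion followed by dilation (opening) removes small noise specks, while dilation followed by erosion (closing) fills small holes, which is presumably how the pipeline's noise-elimination step would use these operators on hand-region masks.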

