An EfficientNet-like Feature Extractor and Focal CTC Loss for Image-based Sequence Recognition

Author(s): Dinh Viet Sang, Nguyen Hoang Thuan

2021, Vol 11 (1)
Author(s): Fintan Nagle, Alan Johnston

Encoding and recognising complex natural sequences provides a challenge for human vision. We found that observers could recognise a previously presented segment of a video of a hearth fire when it was embedded in a longer sequence. Recognition performance declined when the test video was spatially inverted, but not when it was hue-reversed or temporally reversed. Sampled motion degraded forwards/reversed playback discrimination, indicating that observers were sensitive to the asymmetric pattern of motion of the flames. For brief targets, performance increased with target length. More generally, performance depended on the relative lengths of the target and the embedding sequence. Increased errors with embedding-sequence length were driven by positive responses to non-target sequences (false alarms) rather than by omissions. Taken together, these observations favour interpreting performance in terms of an incremental decision-making model based on a sequential statistical analysis in which evidence accrues for one of two alternatives. We also suggest that prediction could provide a means of generating and evaluating evidence in such a sequential analysis model.
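The "sequential statistical analysis in which evidence accrues for one of two alternatives" is in the spirit of Wald's sequential probability ratio test. The following is a minimal illustrative sketch only, not the authors' fitted model: the per-frame evidence function `log_lr` and the error rates `alpha`/`beta` that set the decision bounds are assumptions.

```python
import math

def sequential_decision(frames, log_lr, alpha=0.05, beta=0.05):
    """Wald-style SPRT: accumulate log-likelihood-ratio evidence frame by
    frame until one of two decision bounds is crossed."""
    upper = math.log((1 - beta) / alpha)   # crossing -> respond "target"
    lower = math.log(beta / (1 - alpha))   # crossing -> respond "non-target"
    evidence = 0.0
    for n, frame in enumerate(frames, start=1):
        evidence += log_lr(frame)          # evidence contributed by this frame
        if evidence >= upper:
            return "target", n             # positive response (hit or false alarm)
        if evidence <= lower:
            return "non-target", n
    return "undecided", len(frames)        # sequence ended before a bound was hit

# Toy usage: each frame contributes a fixed log-likelihood ratio of 0.4,
# so evidence crosses the upper bound log(19) ~ 2.94 on the 8th frame.
decision, n = sequential_decision([0.4] * 20, log_lr=lambda x: x)
print(decision, n)  # -> target 8
```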


2021, Vol 11 (9), pp. 3952
Author(s): Shimin Tang, Zhiqiang Chen

With the ubiquitous use of mobile imaging devices, the collection of perishable disaster-scene data has become unprecedentedly easy. However, existing computing methods struggle to understand these images, which exhibit significant complexity and uncertainty. In this paper, the authors investigate the problem of disaster-scene understanding through a deep-learning approach. Two image attributes are considered: hazard type and damage level. Three deep-learning models are trained and their performance is assessed. The best model for hazard-type prediction achieves an overall accuracy (OA) of 90.1%, and the best damage-level classification model achieves an explainable OA of 62.6%; both adopt the Faster R-CNN architecture with a ResNet50 network as the feature extractor. It is concluded that hazard types are more identifiable than damage levels in disaster-scene images. Further insights are revealed: damage-level recognition suffers more from inter- and intra-class variation, and the treatment of hazard-agnostic damage leveling further contributes to the underlying uncertainties.
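The abstract names the architecture but not the training setup. As an illustrative sketch only, a Faster R-CNN with a ResNet50 feature extractor can be instantiated via torchvision as below; the five-class label set is a hypothetical placeholder, not the paper's hazard or damage taxonomy.

```python
import torchvision

# Hypothetical label set for illustration; the paper's actual classes
# (hazard types / damage levels) are not reproduced here.
CLASSES = ["background", "earthquake", "flood", "wind", "fire"]

# Faster R-CNN detector whose feature extractor is a ResNet50 + FPN backbone.
# weights=None builds an untrained model; weights_backbone can instead load
# an ImageNet-pretrained ResNet50 for fine-tuning.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None,
    num_classes=len(CLASSES),
)
model.eval()  # inference mode; switch to model.train() for fine-tuning
```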


Author(s): Fei Wang, Chen Li, Zhen Zeng, Ke Xu, Sirui Cheng, ...

1991, Vol 19 (24), pp. 7003
Author(s): Y. Pommier, G. Capranico, A. Orr, K.W. Kohn
