D-MONA: A dilated mixed-order non-local attention network for speaker and language recognition

2021 ◽  
Author(s):  
Xiaoxiao Miao ◽  
Ian McLoughlin ◽  
Wenchao Wang ◽  
Pengyuan Zhang
Author(s):  
Zhibo Rao ◽  
Mingyi He ◽  
Yuchao Dai ◽  
Zhidong Zhu ◽  
Bo Li ◽  
...  

Accurate disparity prediction is an active research topic in computer vision, and efficiently exploiting contextual information is the key to improving performance. In this paper, we propose a simple yet effective non-local context attention network that exploits global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning module that obtains a more discriminative representation by taking advantage of multi-scale features and forms them into a variance-based cost volume. Then, we construct a non-local attention matching module from the non-local block and hierarchical 3D convolutions, which effectively regularizes the cost volume and captures global contextual information. Finally, we adopt a geometry refinement module to refine the disparity map and further improve performance. Moreover, we add a warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on the KITTI and SceneFlow datasets in end-point error and the fraction of erroneous pixels ($D_1$); (2) our proposed method achieves superior performance particularly in reflective regions and occluded areas.
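To make the idea of a non-local block concrete, the following is a minimal sketch of the generic non-local operation, in which every position is updated with a softmax-weighted sum over all positions. It is not the authors' module: the function name `non_local_block` is hypothetical, the learned projections of a trained network are replaced by identity embeddings, and the input is a flat list of feature vectors rather than a cost volume.

```python
import math


def non_local_block(features):
    """Apply a parameter-free non-local operation to a list of feature vectors.

    Each output is the input plus a softmax-weighted sum over ALL positions,
    so every position can draw on global context.
    Assumption: identity embeddings stand in for learned projections.
    """
    outputs = []
    for x_i in features:
        # Dot-product similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Numerically stable softmax over all positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Global context: weighted sum of the features at all positions.
        context = [
            sum(w * x_j[d] for w, x_j in zip(weights, features))
            for d in range(len(x_i))
        ]
        # Residual connection keeps the original local feature.
        outputs.append([a + c for a, c in zip(x_i, context)])
    return outputs


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = non_local_block(feats)
```

Because the weights span every position, the cost of this operation grows quadratically with the number of positions, which is one reason such blocks are usually applied to downsampled feature maps.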


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Xiaobiao Du ◽  
Saibiao Jiang ◽  
Yujuan Si ◽  
Lina Xu ◽  
Chongjin Liu

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1120
Author(s):  
Lu Meng ◽  
Ronghui Li

Sign language is the most important means of communication for hearing-impaired people, and research on sign language recognition can help hearing people understand it. We reviewed the classic methods of sign language recognition and found that their recognition accuracy is not high enough because of redundant information, finger occlusion, motion blur, the diverse signing styles of different people, and so on. To overcome these shortcomings, we propose a multi-scale and dual sign language recognition Network (SLR-Net) based on a graph convolutional network (GCN). The original input data are RGB videos. We first extract the skeleton data from them and then use the skeleton data for sign language recognition. SLR-Net is mainly composed of three sub-modules: a multi-scale attention network (MSA), a multi-scale spatiotemporal attention network (MSSTA) and an attention enhanced temporal convolution network (ATCN). MSA allows the GCN to learn the dependencies between long-distance vertices; MSSTA can directly learn the spatiotemporal features; ATCN allows the GCN to better learn long temporal dependencies. Three different attention mechanisms (multi-scale, spatiotemporal, and temporal) are proposed to further improve robustness and accuracy. In addition, a keyframe extraction algorithm is proposed, which can greatly improve efficiency at the cost of a small loss in accuracy. Experimental results show that our method reaches a 98.08% accuracy rate on the CSL-500 dataset with a 500-word vocabulary. Even on the challenging DEVISIGN-L dataset with a 2000-word vocabulary, it reaches a 64.57% accuracy rate, outperforming other state-of-the-art sign language recognition methods.


2021 ◽  
pp. 1-12
Author(s):  
Yanhan Zhang ◽  
Shengwei Tian ◽  
Long Yu ◽  
Yuan Ren ◽  
Zhongyu Gao ◽  
...  

In recent years, the incidence of skin diseases has increased significantly, and some malignant tumors caused by skin diseases pose a serious hidden danger to people’s health. To help experts perform lesion measurement and auxiliary diagnosis, automatic segmentation methods are much needed in clinical practice. Deep learning and contextual information extraction methods have been applied to many image segmentation tasks. However, their performance is limited by insufficient training of a large number of parameters, and these parameters sometimes fail to capture long-term dependencies. In addition, because of the many interfering factors in skin disease images, the complex boundaries, and the uncertain size and shape of the lesions, segmentation of skin disease images is still a challenging problem. To solve these problems, we propose a long-distance contextual attention network (LCA-Net). By connecting the non-local module and the channel attention module (CAM) in parallel to form a non-local operation, long-term dependence is captured along the two dimensions of space and channel to enhance the network’s ability to extract features of skin diseases. Our method achieves an average Jaccard index of 0.771 on the ISIC2017 dataset, a 0.6% improvement over the ISIC2017 Challenge champion model. The average Jaccard index under 5-fold cross-validation on the ISIC2018 dataset is 0.8256. We also compared our method with several advanced image segmentation methods, and the experimental results show that it achieves competitive performance.
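The Jaccard index reported above is the ratio of the intersection to the union of the predicted and ground-truth lesion masks. A minimal sketch for a single image follows, assuming flat binary masks; the function name `jaccard_index` and the convention for two empty masks are illustrative choices, not the evaluation code used in the paper or the challenge.

```python
def jaccard_index(pred, target):
    """Jaccard index (intersection over union) of two binary masks.

    Both masks are equal-length flat sequences of 0/1 pixel labels.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labeled as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Pixels labeled as lesion in at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = jaccard_index(pred, target)  # 2 shared pixels / 4 pixels in the union = 0.5
```

An average Jaccard index such as the 0.771 quoted above would then be the mean of this per-image score over the test set.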


Author(s):  
Zhifeng Shao

Recently, low-voltage (≤5 kV) scanning electron microscopes (LVSEMs) have become popular because of their advantages, such as minimized charging effects and reduced specimen damage. Perhaps the most important advantage of LVSEMs is that they may be able to provide ultrahigh resolution, since the interaction volume decreases when the electron energy is reduced. It is obvious that no matter how low the operating voltage is, the resolution is always poorer than the probe radius. To achieve 10 Å resolution at 5 kV (including non-local effects), we would require a probe radius of 5 to 6 Å. At low voltages, we can no longer ignore the effects of chromatic aberration because of the increased ratio δV/V. The 3rd-order spherical aberration is another major limiting factor. The optimized aperture should be calculated as


Author(s):  
Zhifeng Shao ◽  
A.V. Crewe

For scanning electron microscopes, it is plausible that by lowering the primary electron energy, one can decrease the volume of interaction and improve resolution. As shown by Crewe /1/, at V0 = 5 kV a 10 Å resolution (including non-local effects) is possible. To achieve this, we would need a probe size of about 5 Å. However, at low voltages, chromatic aberration becomes the major concern even for field emission sources. In this case, δV/V = 0.1 V / 5 kV = 2×10⁻⁵. As a rough estimate, it has been shown /2/ that the chromatic aberration δC should be less than ⅓ of δ0, the probe size determined by diffraction and spherical aberration, in order to neglect its effect. But this did not take into account the distribution of electron energy. We will show that by using a wave optical treatment, the tolerance on the chromatic aberration is much larger than we expected.
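The two numerical statements in this abstract, the relative energy spread and the one-third rule of thumb, can be checked with a few lines of arithmetic. In the sketch below the function names are hypothetical, and taking δ0 to be 5 Å is an assumption made only for illustration (the abstract gives 5 Å as the target probe size, not as δ0).

```python
def relative_energy_spread(energy_spread_v, beam_voltage_v):
    """Return the ratio δV/V for an energy spread and beam voltage in volts."""
    return energy_spread_v / beam_voltage_v


def chromatic_limit(probe_size, fraction=1 / 3):
    """Largest chromatic aberration that the rule of thumb treats as negligible.

    probe_size is δ0, the probe size set by diffraction and spherical
    aberration; the result has the same unit as probe_size.
    """
    return probe_size * fraction


# Values from the abstract: 0.1 V energy spread at a 5 kV beam voltage.
ratio = relative_energy_spread(0.1, 5_000.0)  # approximately 2e-5

# Assumption for illustration only: δ0 = 5 Å.
limit_angstrom = chromatic_limit(5.0)  # about 1.67 Å
```

Under that assumed δ0, the rule of thumb would require δC below roughly 1.7 Å, which is the tolerance the abstract says the wave optical treatment relaxes.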

