scholarly journals Multi-Scale Dense Attention Network for Stereo Matching

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1881
Author(s):  
Yuhui Chang ◽  
Jiangtao Xu ◽  
Zhiyuan Gao

To improve the accuracy of stereo matching, the multi-scale dense attention network (MDA-Net) is proposed. The network introduces two novel modules in the feature extraction stage to achieve better exploit of context information: dual-path upsampling (DU) block and attention-guided context-aware pyramid feature extraction (ACPFE) block. The DU block is introduced to fuse different scale feature maps. It introduces sub-pixel convolution to compensate for the loss of information caused by the traditional interpolation upsampling method. The ACPFE block is proposed to extract multi-scale context information. Pyramid atrous convolution is adopted to exploit multi-scale features and the channel-attention is used to fuse the multi-scale features. The proposed network has been evaluated on several benchmark datasets. The three-pixel-error evaluated over all ground truth pixels is 2.10% on KITTI 2015 dataset. The experiment results prove that MDA-Net achieves state-of-the-art accuracy on KITTI 2012 and 2015 datasets.

Cancers ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 2031 ◽  
Author(s):  
Taimoor Shakeel Sheikh ◽  
Yonghee Lee ◽  
Migyung Cho

Diagnosis of pathologies using histopathological images can be time-consuming when many images with different magnification levels need to be analyzed. State-of-the-art computer vision and machine learning methods can help automate the diagnostic pathology workflow and thus reduce the analysis time. Automated systems can also be more efficient and accurate, and can increase the objectivity of diagnosis by reducing operator variability. We propose a multi-scale input and multi-feature network (MSI-MFNet) model, which can learn the overall structures and texture features of different scale tissues by fusing multi-resolution hierarchical feature maps from the network’s dense connectivity structure. The MSI-MFNet predicts the probability of a disease on the patch and image levels. We evaluated the performance of our proposed model on two public benchmark datasets. Furthermore, through ablation studies of the model, we found that multi-scale input and multi-feature maps play an important role in improving the performance of the model. Our proposed model outperformed the existing state-of-the-art models by demonstrating better accuracy, sensitivity, and specificity.


Forests ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 937
Author(s):  
Boyang Zhang ◽  
Hongbo Mu ◽  
Mingyu Gao ◽  
Haiming Ni ◽  
Jianfeng Chen ◽  
...  

The precise segmentation of forest areas is essential for monitoring tasks related to forest exploration, extraction, and statistics. However, the effective and accurate segmentation of forest images will be affected by factors such as blurring and discontinuity of forest boundaries. Therefore, a Pyramid Feature Extraction-UNet network (PFE-UNet) based on traditional UNet is proposed to be applied to end-to-end forest image segmentation. Among them, the Pyramid Feature Extraction module (PFE) is introduced in the network transition layer, which obtains multi-scale forest image information through different receptive fields. The spatial attention module (SA) and the channel-wise attention module (CA) are applied to low-level feature maps and PFE feature maps, respectively, to highlight specific segmentation task features while fusing context information and suppressing irrelevant regions. The standard convolution block is replaced by a novel depthwise separable convolutional unit (DSC Unit), which not only reduces the computational cost but also prevents overfitting. This paper presents an extensive evaluation with the DeepGlobe dataset and a comparative analysis with several state-of-the-art networks. The experimental results show that the PFE-UNet network obtains an accuracy of 94.23% in handling the real-time forest image segmentation, which is significantly higher than other advanced networks. This means that the proposed PFE-UNet also provides a valuable reference for the precise segmentation of forest images.


Author(s):  
Yujia Sun ◽  
Geng Chen ◽  
Tao Zhou ◽  
Yi Zhang ◽  
Nian Liu

Camouflaged object detection (COD) is a challenging task due to the low boundary contrast between the object and its surroundings. In addition, the appearance of camouflaged objects varies significantly, e.g., object size and shape, aggravating the difficulties of accurate COD. In this paper, we propose a novel Context-aware Cross-level Fusion Network (C2F-Net) to address the challenging COD task. Specifically, we propose an Attention-induced Cross-level Fusion Module (ACFM) to integrate the multi-level features with informative attention coefficients. The fused features are then fed to the proposed Dual-branch Global Context Module (DGCM), which yields multi-scale feature representations for exploiting rich global context information. In C2F-Net, the two modules are conducted on high-level features using a cascaded manner. Extensive experiments on three widely used benchmark datasets demonstrate that our C2F-Net is an effective COD model and outperforms state-of-the-art models remarkably. Our code is publicly available at: https://github.com/thograce/C2FNet.


2021 ◽  
Vol 2 (2) ◽  
pp. 1-18
Author(s):  
Hongchao Gao ◽  
Yujia Li ◽  
Jiao Dai ◽  
Xi Wang ◽  
Jizhong Han ◽  
...  

Recognizing irregular text from natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks regard this task as a text sequence labeling problem and most networks capture the sequence only from a single-granularity visual representation, which to some extent limits the performance of recognition. In this article, we propose a hierarchical attention network to capture multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, and each block contains a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. The extensive experiments show that our proposed network achieves the state-of-the-art performance on several benchmark datasets including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR datasets under shorter training time.


2020 ◽  
Vol 34 (07) ◽  
pp. 11693-11700 ◽  
Author(s):  
Ao Luo ◽  
Fan Yang ◽  
Xin Li ◽  
Dong Nie ◽  
Zhicheng Jiao ◽  
...  

Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn) which targets to relieve the problem by interweaving the multi-scale features for crowd density as well as its auxiliary task (localization) together and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutual beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs significantly well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6780
Author(s):  
Zhitong Lai ◽  
Rui Tian ◽  
Zhiguo Wu ◽  
Nannan Ding ◽  
Linjian Sun ◽  
...  

Pyramid architecture is a useful strategy to fuse multi-scale features in deep monocular depth estimation approaches. However, most pyramid networks fuse features only within the adjacent stages in a pyramid structure. To take full advantage of the pyramid structure, inspired by the success of DenseNet, this paper presents DCPNet, a densely connected pyramid network that fuses multi-scale features from multiple stages of the pyramid structure. DCPNet not only performs feature fusion between the adjacent stages, but also non-adjacent stages. To fuse these features, we design a simple and effective dense connection module (DCM). In addition, we offer a new consideration of the common upscale operation in our approach. We believe DCPNet offers a more efficient way to fuse features from multiple scales in a pyramid-like network. We perform extensive experiments using both outdoor and indoor benchmark datasets (i.e., the KITTI and the NYU Depth V2 datasets) and DCPNet achieves the state-of-the-art results.


Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3818
Author(s):  
Ye Zhang ◽  
Yi Hou ◽  
Shilin Zhou ◽  
Kewei Ouyang

Recent advances in time series classification (TSC) have exploited deep neural networks (DNN) to improve the performance. One promising approach encodes time series as recurrence plot (RP) images for the sake of leveraging the state-of-the-art DNN to achieve accuracy. Such an approach has been shown to achieve impressive results, raising the interest of the community in it. However, it remains unsolved how to handle not only the variability in the distinctive region scale and the length of sequences but also the tendency confusion problem. In this paper, we tackle the problem using Multi-scale Signed Recurrence Plots (MS-RP), an improvement of RP, and propose a novel method based on MS-RP images and Fully Convolutional Networks (FCN) for TSC. This method first introduces phase space dimension and time delay embedding of RP to produce multi-scale RP images; then, with the use of asymmetrical structure, constructed RP images can represent very long sequences (>700 points). Next, MS-RP images are obtained by multiplying designed sign masks in order to remove the tendency confusion. Finally, FCN is trained with MS-RP images to perform classification. Experimental results on 45 benchmark datasets demonstrate that our method improves the state-of-the-art in terms of classification accuracy and visualization evaluation.


Algorithms ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 60 ◽  
Author(s):  
Wen Liu ◽  
Yankui Sun ◽  
Qingge Ji

Optical coherence tomography (OCT) is an optical high-resolution imaging technique for ophthalmic diagnosis. In this paper, we take advantages of multi-scale input, multi-scale side output and dual attention mechanism and present an enhanced nested U-Net architecture (MDAN-UNet), a new powerful fully convolutional network for automatic end-to-end segmentation of OCT images. We have evaluated two versions of MDAN-UNet (MDAN-UNet-16 and MDAN-UNet-32) on two publicly available benchmark datasets which are the Duke Diabetic Macular Edema (DME) dataset and the RETOUCH dataset, in comparison with other state-of-the-art segmentation methods. Our experiment demonstrates that MDAN-UNet-32 achieved the best performance, followed by MDAN-UNet-16 with smaller parameter, for multi-layer segmentation and multi-fluid segmentation respectively.


2020 ◽  
Vol 10 (23) ◽  
pp. 8434
Author(s):  
Peiran Peng ◽  
Ying Wang ◽  
Can Hao ◽  
Zhizhong Zhu ◽  
Tong Liu ◽  
...  

Fabric defect detection is very important in the textile quality process. Current deep learning algorithms are not effective in detecting tiny and extreme aspect ratio fabric defects. In this paper, we proposed a strong detection method, Priori Anchor Convolutional Neural Network (PRAN-Net), for fabric defect detection to improve the detection and location accuracy of fabric defects and decrease the inspection time. First, we used Feature Pyramid Network (FPN) by selected multi-scale feature maps to reserve more detailed information of tiny defects. Secondly, we proposed a trick to generate sparse priori anchors based on fabric defects ground truth boxes instead of fixed anchors to locate extreme defects more accurately and efficiently. Finally, a classification network is used to classify and refine the position of the fabric defects. The method was validated on two self-made fabric datasets. Experimental results indicate that our method significantly improved the accuracy and efficiency of detecting fabric defects and is more suitable to the automatic fabric defect detection.


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7504
Author(s):  
Udit Sharma ◽  
Bruno Artacho ◽  
Andreas Savakis

We propose GourmetNet, a single-pass, end-to-end trainable network for food segmentation that achieves state-of-the-art performance. Food segmentation is an important problem as the first step for nutrition monitoring, food volume and calorie estimation. Our novel architecture incorporates both channel attention and spatial attention information in an expanded multi-scale feature representation using our advanced Waterfall Atrous Spatial Pooling module. GourmetNet refines the feature extraction process by merging features from multiple levels of the backbone through the two attention modules. The refined features are processed with the advanced multi-scale waterfall module that combines the benefits of cascade filtering and pyramid representations without requiring a separate decoder or post-processing. Our experiments on two food datasets show that GourmetNet significantly outperforms existing current state-of-the-art methods.


Sign in / Sign up

Export Citation Format

Share Document