A Multi-Scale Deep Neural Network for Water Detection from SAR Images in the Mountainous Areas

2020
Vol 12 (19)
pp. 3205
Author(s):  
Lifu Chen ◽  
Peng Zhang ◽  
Jin Xing ◽  
Zhenhong Li ◽  
Xuemin Xing ◽  
...  

Water detection from Synthetic Aperture Radar (SAR) images has been widely utilized in various applications. However, it remains an open challenge due to the high similarity between water and shadow in SAR images. To address this challenge, a new end-to-end deep learning framework is proposed to automatically classify water and shadow areas in SAR images. The framework is composed of three parts: Multi-scale Spatial Feature (MSF) extraction, a Multi-Level Selective Attention Network (MLSAN), and an Improvement Strategy (IS). First, the dataset is input to the MSF for multi-scale low-level feature extraction via three different methods. These low-level features are then fed into the MLSAN, which contains an Encoder and a Decoder. The Encoder generates features at different levels using a 101-layer residual network, while the Decoder extracts geospatial contextual information and fuses the multi-level features into high-level features that are further optimized by the IS. Finally, the classification is implemented with the Softmax function. We name the proposed framework MSF-MLSAN; it is trained and tested on millimeter wave SAR datasets. The classification accuracy reaches 0.8382 and 0.9278 for water and shadow, respectively, while the overall Intersection over Union (IoU) is 0.9076. MSF-MLSAN demonstrates the success of integrating SAR domain knowledge with state-of-the-art deep learning techniques.
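As a rough illustration of the multi-scale extraction and Softmax classification steps, the sketch below runs three parallel convolution branches over a SAR patch; the branch design, channel counts, and three-class output are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class MultiScaleFeatures(nn.Module):
    """Extract low-level features at three scales and concatenate them."""
    def __init__(self, in_ch=1, out_ch=16):
        super().__init__()
        # Three parallel branches with different receptive fields stand in
        # for the paper's three extraction methods (hypothetical choice).
        self.b3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        self.b7 = nn.Conv2d(in_ch, out_ch, kernel_size=7, padding=3)

    def forward(self, x):
        return torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1)

sar = torch.randn(1, 1, 128, 128)        # single-channel SAR patch
feats = MultiScaleFeatures()(sar)        # -> (1, 48, 128, 128)
logits = nn.Conv2d(48, 3, 1)(feats)      # water / shadow / background
probs = torch.softmax(logits, dim=1)     # per-pixel class probabilities
```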

Author(s):  
Shanshan Zhao ◽  
Xi Li ◽  
Omar El Farouk Bourahla

As an important and challenging problem in computer vision, learning-based optical flow estimation aims to discover the intrinsic correspondence structure between two adjacent video frames through statistical learning. A key issue in this area is therefore how to effectively model the multi-scale correspondence structure properties in an adaptive, end-to-end learning fashion. Motivated by this observation, we propose an end-to-end multi-scale correspondence structure learning (MSCSL) approach for optical flow estimation. In principle, the proposed MSCSL approach is capable of effectively capturing the multi-scale inter-image-correlation correspondence structures within a multi-level feature space from deep learning. Moreover, the proposed approach builds a spatial Conv-GRU neural network model to adaptively model the intrinsic dependency relationships among these multi-scale correspondence structures. Finally, the above procedures for correspondence structure learning and multi-scale dependency modeling are implemented in a unified end-to-end deep learning framework. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed approach.
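The spatial Conv-GRU at the heart of this dependency modeling can be sketched as a convolutional GRU cell; the channel counts, kernel size, and coarse-to-fine iteration below are assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """One spatial Conv-GRU step: gates are computed with convolutions so
    the hidden state keeps its 2-D spatial layout."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.zr = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.hc = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.zr(torch.cat([x, h], 1))).chunk(2, 1)
        h_new = torch.tanh(self.hc(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

# Feed correspondence maps one scale at a time (assumed coarse-to-fine).
cell = ConvGRUCell(in_ch=32, hid_ch=32)
h = torch.zeros(1, 32, 64, 64)
for corr in [torch.randn(1, 32, 64, 64) for _ in range(3)]:
    h = cell(corr, h)   # h aggregates multi-scale correspondence structure
```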


2019
Vol 11 (9)
pp. 1044
Author(s):  
Wei Cui ◽  
Fei Wang ◽  
Xin He ◽  
Dongyou Zhang ◽  
Xuxiang Xu ◽  
...  

A comprehensive interpretation of remote sensing images involves not only remote sensing object recognition but also the recognition of spatial relations between objects. Especially in the case of different objects with the same spectrum, the spatial relationship can help interpret remote sensing objects more accurately. Compared with traditional remote sensing object recognition methods, deep learning has the advantages of high accuracy and strong generalizability for scene classification and semantic segmentation. However, it is difficult to recognize remote sensing objects and their spatial relationships simultaneously and end to end relying only on present deep learning networks. To address this problem, we propose a multi-scale remote sensing image interpretation network, called the MSRIN. The architecture of the MSRIN is a parallel deep neural network based on a fully convolutional network (FCN), a U-Net, and a long short-term memory network (LSTM). The MSRIN recognizes remote sensing objects and their spatial relationships through three processes. First, the MSRIN defines a multi-scale remote sensing image caption strategy and simultaneously segments the same image using the FCN and U-Net at different spatial scales, so that a two-scale hierarchy is formed. The outputs of the FCN and U-Net are masked to obtain the locations and boundaries of remote sensing objects. Second, an attention-based LSTM generates remote sensing image captions that describe the remote sensing objects (nouns) and their spatial relationships in natural language. Finally, we designed a remote sensing object recognition and correction mechanism that links the nouns in the captions to the object mask graphs through an attention weight matrix, transferring the spatial relationships from the captions to the object mask graphs. In other words, the MSRIN simultaneously realizes the semantic segmentation of remote sensing objects and the identification of their spatial relationships end to end. Experimental results demonstrate that the matching rate between samples and the mask graph increased by 67.37 percentage points, and the matching rate between nouns and the mask graph increased by 41.78 percentage points, compared to before correction. The proposed MSRIN has achieved remarkable results.
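The final correction step, which ties caption nouns to object mask graphs through an attention weight matrix, can be illustrated with a toy example; the matrix values and the simple argmax assignment are illustrative assumptions, not the MSRIN's exact mechanism.

```python
import torch

nouns = ["river", "forest", "building"]    # nouns parsed from a caption
attn = torch.tensor([[0.7, 0.2, 0.1],      # rows: nouns
                     [0.1, 0.8, 0.1],      # columns: object mask graphs
                     [0.2, 0.1, 0.7]])
match = attn.argmax(dim=1)                 # assign each noun to one mask
for noun, m in zip(nouns, match.tolist()):
    print(f"{noun} -> mask graph {m}")     # spatial relation transfers here
```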


Life
2021
Vol 11 (6)
pp. 582
Author(s):  
Yuchai Wan ◽  
Zhongshu Zheng ◽  
Ran Liu ◽  
Zheng Zhu ◽  
Hongen Zhou ◽  
...  

Many computer-aided diagnosis methods for liver cancers based on medical images, especially ones using deep learning strategies, have been proposed. However, most such methods analyze the images at only one scale, and the resulting deep learning models are usually not explainable. In this paper, we propose a deep learning-based multi-scale and multi-level fusing approach of CNNs for liver lesion diagnosis on magnetic resonance images, termed MMF-CNN. We introduce a multi-scale representation strategy to encode both the local and semi-local complementary information of the images. To take advantage of the complementary information of the multi-scale representations, we propose a multi-level fusion method that hierarchically combines information at both the feature level and the decision level, generating a robust deep learning-based diagnostic classifier. We further explore the explanation of the network's diagnostic decisions by visualizing its areas of interest. A new scoring method is designed to evaluate whether the attention maps highlight the relevant radiological features. The explanation and visualization make the decision-making process of the deep neural network transparent to clinicians. We apply our proposed approach to various state-of-the-art deep learning architectures. The experimental results demonstrate the effectiveness of our approach.
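A minimal sketch of the two fusion levels follows, assuming one CNN branch per scale (a tight lesion crop and a wider context crop) and plain averaging for the hierarchical combination; the crop sizes, heads, and two-class output are hypothetical.

```python
import torch
import torch.nn as nn

def branch(in_ch=1, dim=32):
    """Tiny CNN encoder producing a feature vector per image crop."""
    return nn.Sequential(nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

local_net, semi_local_net = branch(), branch()
feat_head = nn.Linear(64, 2)                      # feature-level fusion
dec_head_a, dec_head_b = nn.Linear(32, 2), nn.Linear(32, 2)

patch_local = torch.randn(4, 1, 32, 32)           # tight lesion crop
patch_semi = torch.randn(4, 1, 64, 64)            # wider context crop
fa, fb = local_net(patch_local), semi_local_net(patch_semi)

feature_level = feat_head(torch.cat([fa, fb], 1)) # fuse features first
decision_level = (dec_head_a(fa) + dec_head_b(fb)) / 2  # fuse decisions
logits = (feature_level + decision_level) / 2     # hierarchical combination
```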


2021
Vol 13 (2)
pp. 38
Author(s):  
Yao Xu ◽  
Qin Yu

Great achievements have been made in pedestrian detection through deep learning. For detectors based on deep learning, making better use of features has become the key to detection performance. Although current pedestrian detectors have made efforts to exploit features better, feature utilization is still inadequate. To solve this problem, we propose the Multi-Level Feature Fusion Module (MFFM) and its Multi-Scale Feature Fusion Unit (MFFU) sub-module, which connect feature maps of the same scale and of different scales using horizontal connections, vertical connections, and shortcut structures. All of these connections carry learnable weights, so the modules act as adaptive multi-level and multi-scale feature fusion modules that fuse the best features. On top of them, we build a complete pedestrian detector, the Adaptive Feature Fusion Detector (AFFDet), an anchor-free one-stage pedestrian detector that makes full use of features for detection. Compared with other methods, AFFDet performs better on the challenging Caltech Pedestrian Detection Benchmark (Caltech) at a competitive speed. It is the current state-of-the-art one-stage pedestrian detection method.
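The learnable-weight fusion behind the MFFU might look like the following sketch; the softmax-normalized scalar weights, nearest-neighbor upsampling, and 1x1 shortcut projection are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionUnit(nn.Module):
    """Fuse a same-scale (horizontal) and a coarser (vertical) feature map
    with learnable weights plus a shortcut, echoing the MFFU idea."""
    def __init__(self, ch):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))   # learned fusion weights
        self.proj = nn.Conv2d(ch, ch, 1)       # lightweight projection

    def forward(self, same, coarse):
        up = F.interpolate(coarse, size=same.shape[-2:], mode="nearest")
        w = torch.softmax(self.w, dim=0)       # keep the weights normalized
        return self.proj(w[0] * same + w[1] * up) + same  # shortcut path

fuse = FusionUnit(64)
out = fuse(torch.randn(1, 64, 80, 80), torch.randn(1, 64, 40, 40))
```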


2021
Vol 251
pp. 04030
Author(s):  
Michael Andrews ◽  
Bjorn Burkle ◽  
Shravan Chaudhari ◽  
Davide DiCroce ◽  
Sergei Gleyzer ◽  
...  

We describe a novel application of the end-to-end deep learning technique to the task of discriminating top quark-initiated jets from those originating from the hadronization of a light quark or a gluon. The end-to-end deep learning technique combines deep learning algorithms with low-level detector representations of the high-energy collision event. In this study, we use low-level detector information from the simulated CMS Open Data samples to construct the top jet classifiers. To optimize classifier performance we progressively add low-level information from the CMS tracking detector, including pixel detector reconstructed hits and impact parameters, and demonstrate the value of additional tracking information even when no new spatial structures are added. Relying only on calorimeter energy deposits and reconstructed pixel detector hits, the end-to-end classifier achieves a ROC-AUC score of 0.975±0.002 for the task of classifying boosted top quark jets. After adding derived track quantities, the ROC-AUC score increases to 0.9824±0.0013, serving as the first performance benchmark for these CMS Open Data samples.
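For reference, the quoted ROC-AUC metric can be computed with scikit-learn; the labels and scores below are synthetic stand-ins, not the CMS Open Data results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)   # 1 = top quark jet, 0 = QCD jet
# A useful classifier scores signal jets higher than background on average.
scores = rng.normal(loc=labels.astype(float), scale=0.7)
print(f"ROC-AUC = {roc_auc_score(labels, scores):.3f}")
```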


2020
Vol 34 (04)
pp. 6422-6429
Author(s):  
Weikun Wu ◽  
Yan Zhang ◽  
David Wang ◽  
Yunqi Lei

Since PointNet was proposed, deep learning on point clouds has been a focus of intense 3D research. However, existing point-based methods are usually inadequate for extracting the local features and the spatial pattern of a point cloud for further shape understanding. This paper presents an end-to-end framework, SK-Net, that jointly optimizes the inference of spatial keypoints and the learning of feature representations of a point cloud for a specific point cloud task. One key process of SK-Net is the generation of spatial keypoints (Skeypoints), which is jointly driven by two proposed regulating losses and a task objective function, without requiring Skeypoint location annotations or proposals. Specifically, our Skeypoints are not sensitive to location consistency but are acutely aware of shape. Another key process of SK-Net is the extraction of the local structure of Skeypoints (detail feature) and the local spatial pattern of normalized Skeypoints (pattern feature). This process generates a comprehensive representation, the pattern-detail (PD) feature, which comprises the local detail information of a point cloud and reveals its spatial pattern through part district reconstruction on normalized Skeypoints. Consequently, our network is prompted to effectively understand the correlation between different regions of a point cloud and to integrate its contextual information. On point cloud tasks such as classification and segmentation, our proposed method performs better than or comparably with state-of-the-art approaches. We also present an ablation study to demonstrate the advantages of SK-Net.
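A toy stand-in for the Skeypoint generation step, assuming a PointNet-style shared MLP with max pooling and a coordinate-regression head; the real SK-Net couples this inference with two regulating losses and a task objective, which are omitted here.

```python
import torch
import torch.nn as nn

class SkeypointPredictor(nn.Module):
    """Regress K spatial keypoints from a raw point cloud (illustrative)."""
    def __init__(self, k=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, k * 3)
        self.k = k

    def forward(self, pts):                     # pts: (B, N, 3)
        feat = self.mlp(pts).max(dim=1).values  # global shape feature
        return self.head(feat).view(-1, self.k, 3)

cloud = torch.randn(2, 1024, 3)                 # two point clouds
skeypoints = SkeypointPredictor()(cloud)        # -> (2, 8, 3)
```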


Author(s):  
Xiaoqi Lu ◽  
Yu Gu ◽  
Lidong Yang ◽  
Baohua Zhang ◽  
Ying Zhao ◽  
...  

Objective: False-positive nodule reduction is a crucial part of a computer-aided detection (CADe) system, which assists radiologists in accurate lung nodule detection. In this research, a novel scheme using a multi-level 3D DenseNet framework is proposed for the false-positive nodule reduction task. Methods: Multi-level 3D DenseNet models were extended to differentiate lung nodules from false-positive nodules. First, different models were fed with 3D cubes of different sizes, encoding multi-level contextual information to meet the challenge of the large variations among lung nodules. In addition, image rotation and flipping were utilized to upsample the positive samples, forming an augmented positive sample set. Furthermore, the 3D DenseNets were designed to preserve low-level nodule information, as the densely connected structures in DenseNet reuse features of lung nodules and thereby boost feature propagation. Finally, an optimal weighted linear combination of all model scores produced the best classification result in this research. Results: The proposed method was evaluated on the LUNA16 dataset, which contains 888 thin-slice CT scans, with performance validated via 10-fold cross-validation. Both the Free-response Receiver Operating Characteristic (FROC) curve and the Competition Performance Metric (CPM) score show that the proposed scheme achieves satisfactory detection performance on the false-positive reduction track of the LUNA16 challenge. Conclusion: The results show that the proposed scheme is effective for the false-positive nodule reduction task.
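The final ensembling step, a weighted linear combination of per-model scores, reduces to a few lines; the cube sizes and weights below are illustrative, whereas the paper searches for the optimal combination.

```python
import torch

scores = {                       # P(true nodule) from each cube-size model
    "cube_20": torch.tensor([0.91, 0.12, 0.64]),
    "cube_30": torch.tensor([0.88, 0.20, 0.71]),
    "cube_40": torch.tensor([0.95, 0.08, 0.55]),
}
weights = {"cube_20": 0.3, "cube_30": 0.3, "cube_40": 0.4}  # hypothetical
combined = sum(w * scores[name] for name, w in weights.items())
print(combined)                  # fused false-positive-reduction scores
```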


2021
Vol 13 (2)
pp. 274
Author(s):  
Guobiao Yao ◽  
Alper Yilmaz ◽  
Li Zhang ◽  
Fei Meng ◽  
Haibin Ai ◽  
...  

Available stereo matching algorithms produce either a large number of false-positive matches or only a few true positives across oblique stereo images with a large baseline, owing to the complex perspective deformation and radiometric distortion across such images. To address this problem, we propose a novel affine-invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine-invariant Hessian regions within a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on negative samples selected with K nearest neighbors, and then generate highly discriminative deep learning-based descriptors with our multiple hard network structure (MTHardNets). Following this step, conjugate features are produced using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform-based least square matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired from ground close-range and unmanned aerial vehicle (UAV) platforms verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms state-of-the-art methods in terms of accuracy, distribution, and correct ratio. The main contributions of this article are: (i) the proposed MTHardNets generate high-quality descriptors; and (ii) the IHesAffNet produces substantial affine-invariant corresponding features with reliable transform parameters.
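The Euclidean distance-ratio test used here as the matching metric can be sketched in a few lines of NumPy; the 0.8 threshold and the random descriptors are assumptions for illustration.

```python
import numpy as np

def ratio_match(desc_a, desc_b, ratio=0.8):
    """Keep a match only when the nearest neighbor is clearly closer
    than the second nearest (Euclidean distance-ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dist)[:2]          # two nearest candidates
        if dist[j] < ratio * dist[k]:        # accept distinctive matches
            matches.append((i, j))
    return matches

a = np.random.rand(50, 128).astype(np.float32)  # descriptors, image 1
b = np.random.rand(60, 128).astype(np.float32)  # descriptors, image 2
print(len(ratio_match(a, b)), "putative correspondences")
```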

