Weakly Supervised Spatial Deep Learning for Earth Image Segmentation Based on Imperfect Polyline Labels

2022 ◽  
Vol 13 (2) ◽  
pp. 1-20
Author(s):  
Zhe Jiang ◽  
Wenchong He ◽  
Marcus Stephen Kirby ◽  
Arpan Man Sainju ◽  
Shaowen Wang ◽  
...  

In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models heavily relies on the availability of large-scale high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such large-scale high-quality training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data such as spatial heterogeneity still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels with imperfect alignment with earth imagery pixels (e.g., through interpreting coarse imagery by non-expert volunteers). However, directly training a deep neural network on imperfect labels with geometric annotation errors could significantly impact model performance. Existing research that overcomes imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vector representation. To fill the gap, this article proposes a weakly supervised learning framework to simultaneously update deep learning model parameters and infer hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially preserve geometric properties (e.g., spatial contiguity within line segments). Evaluations on real-world datasets in the National Hydrography Dataset (NHD) refinement application show that the proposed framework outperforms baseline methods in classification accuracy.
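A minimal sketch of the alternating scheme the abstract describes, assuming a PyTorch segmentation model that outputs (1, 1, H, W) logits: one step infers corrected vertex locations for the imperfect polyline labels, the next updates network parameters on labels derived from them. The local-window snap and the point-wise "rasterization" are simplified stand-ins for the paper's actual inference procedure, and all names (`refine_vertices`, `em_step`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def refine_vertices(prob, pts, radius=3):
    """E-step (simplified): move each polyline vertex to the highest-probability
    pixel inside a small window; the small radius loosely preserves the spatial
    contiguity of line segments."""
    H, W = prob.shape
    new_pts = []
    for y, x in pts.tolist():
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        win = prob[y0:y1, x0:x1]
        idx = int(torch.argmax(win))                 # flat index of best pixel
        new_pts.append((y0 + idx // win.shape[1], x0 + idx % win.shape[1]))
    return torch.tensor(new_pts)

def em_step(model, optimizer, image, pts):
    """One EM-style iteration: infer hidden true label locations, then update
    the network on labels derived from the refined vertices (M-step)."""
    with torch.no_grad():
        prob = model(image).sigmoid()[0, 0]          # (H, W) foreground prob.
        pts = refine_vertices(prob, pts)
    target = torch.zeros_like(prob)
    target[pts[:, 0], pts[:, 1]] = 1.0               # stand-in rasterization
    loss = F.binary_cross_entropy_with_logits(model(image)[0, 0], target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return pts, loss.item()
```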

Author(s):  
Zaid Al-Huda ◽  
Donghai Zhai ◽  
Yan Yang ◽  
Riyadh Nazar Ali Algburi

Deep convolutional neural networks (DCNNs) trained on pixel-level annotated images have achieved improvements in semantic segmentation. Due to the high cost of labeling training data, their applications may be greatly limited. However, weakly supervised segmentation approaches can significantly reduce human labeling efforts. In this paper, we introduce a new framework to generate high-quality initial pixel-level annotations. By using a hierarchical image segmentation algorithm to predict the boundary map, we select the optimal scale of high-quality hierarchies. In the initialization step, scribble annotations and the saliency map are combined to construct a graphical model over the optimal-scale segmentation. Solving the minimum cut problem spreads information from the scribbles to unmarked regions. In the training process, the segmentation network is trained using the initial pixel-level annotations. To iteratively optimize the segmentation, we use a graphical model to refine segmentation masks and retrain the segmentation network to get more precise pixel-level annotations. Experimental results on the Pascal VOC 2012 dataset demonstrate that the proposed framework outperforms most weakly supervised semantic segmentation methods and achieves state-of-the-art performance of [Formula: see text] mIoU.
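A hedged sketch of the scribble-propagation step, written with PyMaxflow's grid-graph API at the pixel level rather than over the paper's optimal-scale hierarchical regions. The unary costs derived from the saliency map and the constant smoothness weight are illustrative assumptions, not the paper's exact energy.

```python
import numpy as np
import maxflow  # pip install PyMaxflow

def propagate_scribbles(saliency, fg_scribble, bg_scribble, smooth=2.0):
    """Spread scribble annotations to unmarked pixels via a minimum s-t cut.
    saliency: (H, W) floats in [0, 1]; scribbles: (H, W) boolean masks."""
    eps = 1e-6
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(saliency.shape)
    g.add_grid_edges(nodes, smooth)                   # 4-connected smoothness
    cost_fg = -np.log(saliency + eps)                 # unary cost of foreground
    cost_bg = -np.log(1.0 - saliency + eps)           # unary cost of background
    cost_fg[fg_scribble], cost_bg[fg_scribble] = 0.0, 1e9   # hard fg seeds
    cost_fg[bg_scribble], cost_bg[bg_scribble] = 1e9, 0.0   # hard bg seeds
    # source capacity = cost of the background label, sink capacity = cost of
    # foreground; the min cut then assigns each pixel its cheaper label
    g.add_grid_tedges(nodes, cost_bg, cost_fg)
    g.maxflow()
    return ~g.get_grid_segments(nodes)                # True = foreground
```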


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to classification accuracy improvements ranging from 1.91% to 6.69%.
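A minimal sketch of the pre-training objective as described: randomly contaminate observations in a pixel's time series and train a Transformer encoder to reconstruct them, with the loss computed only at the contaminated positions (BERT-style). Positional/temporal encodings and the paper's architecture details are omitted; the module and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class MaskedSITSPretrainer(nn.Module):
    """Self-supervised pre-training sketch for satellite image time series."""
    def __init__(self, n_bands=10, d_model=64, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(n_bands, d_model)      # per-timestep embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_bands)       # reconstruction head

    def forward(self, x, mask_ratio=0.15):
        # x: (batch, time, bands) spectral observations of one pixel;
        # temporal/positional encoding omitted for brevity
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        corrupted = x.masked_fill(mask.unsqueeze(-1), 0.0)  # contaminate steps
        recon = self.head(self.encoder(self.embed(corrupted)))
        # reconstruction loss only on the contaminated observations
        return ((recon - x)[mask] ** 2).mean()
```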


2020 ◽  
Vol 12 (18) ◽  
pp. 3053 ◽  
Author(s):  
Thorsten Hoeser ◽  
Felix Bachofer ◽  
Claudia Kuenzer

In Earth observation (EO), large-scale land-surface dynamics are traditionally analyzed by investigating aggregated classes. The increase in data with a very high spatial resolution enables investigations on a fine-grained feature level which can help us to better understand the dynamics of land surfaces by taking object dynamics into account. To extract fine-grained features and objects, the most popular deep-learning model for image analysis, the convolutional neural network (CNN), is commonly used. In this review, we provide a comprehensive overview of the impact of deep learning on EO applications by reviewing 429 studies on image segmentation and object detection with CNNs. We extensively examine the spatial distribution of study sites, employed sensors, used datasets and CNN architectures, and give a thorough overview of applications in EO which used CNNs. Our main finding is that CNNs are in an advanced transition phase from computer vision to EO. Building on this, we argue that in the near future, investigations which analyze object dynamics with CNNs will have a significant impact on EO research. With a focus on EO applications in this Part II, we complete the methodological review provided in Part I.


2020 ◽  
Vol 34 (01) ◽  
pp. 19-26 ◽  
Author(s):  
Chong Chen ◽  
Min Zhang ◽  
Yongfeng Zhang ◽  
Weizhi Ma ◽  
Yiqun Liu ◽  
...  

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite their effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data based neural methods.
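The whole-data learning idea can be illustrated with the known non-sampling trick for a weighted squared loss over all user-item pairs: the quadratic term over every pair collapses into two d x d Gram matrices. This is a simplified single-behavior sketch under assumed uniform weights, not EHCF's full heterogeneous transfer model.

```python
import torch

def non_sampling_loss(P, Q, pos_u, pos_i, c_pos=1.0, c_neg=0.1):
    """Whole-data weighted squared loss computed without negative sampling.
    P: (n_users, d) user embeddings; Q: (n_items, d) item embeddings;
    pos_u, pos_i: LongTensors indexing the observed interactions."""
    pred = (P[pos_u] * Q[pos_i]).sum(-1)              # scores of observed pairs
    # correction term over observed pairs (target 1, weight c_pos)
    loss_pos = ((c_pos - c_neg) * pred.pow(2) - 2.0 * c_pos * pred).sum()
    # quadratic term over ALL user-item pairs via d x d Gram matrices:
    # O((n_users + n_items) * d^2) instead of O(n_users * n_items * d)
    loss_all = c_neg * ((P.t() @ P) * (Q.t() @ Q)).sum()
    return loss_pos + loss_all
```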


Processes ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 649
Author(s):  
Yifeng Liu ◽  
Wei Zhang ◽  
Wenhao Du

Deep learning based on large volumes of high-quality data plays an important role in many industries. However, deep learning is hard to embed directly in real-time systems, because such systems accumulate data through real-time acquisition while their analysis tasks must also be carried out in real time, which makes it impossible to complete those tasks by accumulating data over a long period. To address the problems of high-quality data accumulation, the high timeliness required of data analysis, and the difficulty of embedding deep-learning algorithms directly in real-time systems, this paper proposes a new progressive deep-learning framework and conducts experiments on image recognition. The experimental results show that the proposed framework is effective, performs well, and can reach conclusions similar to those of a deep-learning framework based on large-scale data.
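A heavily simplified sketch of what such a progressive scheme might look like: the model is fine-tuned on each small batch as it is acquired, so analysis can begin immediately instead of waiting for large-scale accumulation. The loop structure and all names here are assumptions, not the paper's algorithm.

```python
import torch.nn.functional as F

def progressive_train(model, optimizer, stream, steps_per_batch=5):
    """Fine-tune incrementally on data as it arrives from a real-time system.
    stream yields (images, labels) batches in acquisition order."""
    for images, labels in stream:
        model.train()
        for _ in range(steps_per_batch):              # brief update per batch
            loss = F.cross_entropy(model(images), labels)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
        yield model          # the model is usable for analysis after each batch
```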


2019 ◽  
Vol 41 (7) ◽  
pp. 1669-1680 ◽  
Author(s):  
Jan Funke ◽  
Fabian Tschopp ◽  
William Grisaitis ◽  
Arlo Sheridan ◽  
Chandan Singh ◽  
...  

2019 ◽  
Author(s):  
Yair Fogel-Dror ◽  
Shaul R. Shenhav ◽  
Tamir Sheafer

The collaborative effort of theory-driven content analysis can benefit significantly from the use of topic analysis methods, which allow researchers to add more categories while developing or testing a theory. This additive approach enables the reuse of previous analysis efforts or even the merging of separate research projects, thereby making these methods more accessible and increasing the discipline's ability to create and share content analysis capabilities. This paper proposes a weakly supervised topic analysis method that uses both a low-cost unsupervised method to compile a training set and supervised deep learning as an additive and accurate text classification method. We test the validity of the method, specifically its additivity, by comparing its results after adding 200 categories to an initial set of 450. We show that the suggested method provides a foundation for a low-cost solution for large-scale topic analysis.
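A hedged sketch of the two-stage idea: a low-cost weak labeler compiles the training set, then a supervised classifier generalizes from it. Seed-word matching and logistic regression stand in for the paper's unsupervised method and deep classifier, and `seed_words` is a hypothetical per-category lexicon; adding a category later only requires a new lexicon and retraining, which is the additivity the abstract emphasizes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def weakly_supervised_topic_classifier(docs, seed_words):
    """docs: list of raw text strings; seed_words: {category: [words]}."""
    # Stage 1: weak labeling by seed-word matching (stand-in for the paper's
    # unsupervised method); documents matching no category are left out
    train_docs, labels = [], []
    for doc in docs:
        hits = {c: sum(w in doc.lower() for w in ws)
                for c, ws in seed_words.items()}
        best = max(hits, key=hits.get)
        if hits[best] > 0:
            train_docs.append(doc)
            labels.append(best)
    # Stage 2: supervised classifier trained on the weakly labeled set
    vec = TfidfVectorizer(max_features=20000)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_docs), labels)
    return vec, clf
```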


Iproceedings ◽  
10.2196/15225 ◽  
2019 ◽  
Vol 5 (1) ◽  
pp. e15225
Author(s):  
Felipe Masculo ◽  
Jorn op den Buijs ◽  
Mariana Simons ◽  
Aki Harma

Background: A Personal Emergency Response Service (PERS) enables an aging population to receive help quickly when an emergency situation occurs. The reasons that trigger a PERS alert are varied, including a sudden worsening of a chronic condition, a fall, or another injury. Every PERS case is documented by the response center using a combination of structured variables and free-text notes. The text notes, in particular, contain a wealth of information about each incident, such as contextual information, details about the situation, symptoms, and more. Analysis of these notes at a population level could provide insight into the various situations that cause PERS medical alerts.

Objective: The objectives of this study were to (1) develop methods to enable the large-scale analysis of text notes from a PERS response center, and (2) apply these methods to a large dataset to gain insight into the different situations that cause medical alerts.

Methods: More than 2.5 million deidentified PERS case text notes were used to train a document embedding model (i.e., a deep-learning recurrent neural network [RNN] that takes the medical alert text note as input and produces a corresponding fixed-length vector representation as output). We applied this model to 100,000 PERS text notes related to medical incidents that resulted in emergency department admission. Finally, we used t-SNE, a nonlinear dimensionality reduction method, to visualize the vector representations of the text notes in 2D as part of a graphical user interface that enabled interactive exploration of the dataset and visual analytics.

Results: Visual analysis of the vectors revealed the existence of several well-separated clusters of incidents such as fall, stroke/numbness, seizure, breathing problems, chest pain, and nausea, each of them related to the emergency situation encountered by the patient as recorded in an existing structured variable. In addition, subclusters were identified within each cluster which grouped cases based on additional features extracted from the PERS text notes and not available in the existing structured variables. For example, the incidents labeled as falls (n=37,842) were split into several subclusters corresponding to falls with bone fracture (n=1437), falls with bleeding (n=4137), falls caused by dizziness (n=519), etc.

Conclusions: The combination of state-of-the-art natural language processing, deep learning, and visualization techniques enables the large-scale analysis of medical alert text notes. This analysis demonstrates that, in addition to fall alerts, the PERS service is broadly used to signal for help in situations often related to underlying chronic conditions and acute symptoms such as respiratory distress, chest pain, diabetic reaction, etc. Moreover, the proposed techniques enable the extraction of structured information related to the medical alert from unstructured text with minimal human supervision. This structured information could be used, for example, to track trends over time, to generate concise medical alert summaries, and to create predictive models for desired outcomes.
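A minimal sketch of the embed-then-project pipeline: a GRU stands in for the trained document-embedding RNN described above, and scikit-learn's t-SNE produces the 2D coordinates for the interactive cluster view. Vocabulary size, dimensions, and all names are assumptions.

```python
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

class NoteEncoder(nn.Module):
    """Maps a tokenized alert note to a fixed-length vector (stand-in for
    the trained document embedding model)."""
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids):                 # (batch, seq_len) int64
        _, h = self.rnn(self.emb(token_ids))
        return h[-1]                              # (batch, dim) note vectors

def project_2d(vectors):
    """Nonlinear 2D projection of note vectors for visual analytics."""
    return TSNE(n_components=2, perplexity=30).fit_transform(
        vectors.detach().numpy())
```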

