Quality Assessment of Heterogeneous Training Data Sets for Classification of Urban Area with Landsat Imagery

2021, Vol 87 (5), pp. 339-348
Author(s): Neema Nicodemus Lyimo, Fang Luo, Qimin Cheng, Hao Peng

Quality assessment of training samples collected from heterogeneous sources has received little attention in the existing literature. Inspired by Euclidean spectral distance metrics, this article derives three quality measures for modeling uncertainty in the spectral information of open-source heterogeneous training samples for classification with Landsat imagery. We prepared eight test-case data sets from volunteered geographic information and open government data sources to assess the proposed measures. The data sets vary significantly in quality, quantity, and data type. A correlation analysis verifies that the proposed measures can successfully rank the quality of heterogeneous training data sets prior to the image classification task. In this era of big data, pre-classification quality assessment measures empower research scientists to select suitable data sets for classification tasks from the available open data sources. The findings demonstrate the versatility of the Euclidean spectral distance function for developing quality metrics that assess open-source training data sets with varying characteristics for urban area classification.
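
The abstract gives no formulas, but the underlying idea, ranking training sets by the spectral consistency of their samples, can be illustrated with a minimal sketch. All names here (e.g., `spectral_quality`) and the use of class-mean spectra as references are illustrative assumptions, not the authors' exact measures.

```python
import numpy as np

def spectral_quality(samples, labels):
    """Score a training set by the mean Euclidean spectral distance of
    each sample to its class-mean spectrum (lower = more internally
    consistent). A hypothetical proxy for the paper's measures."""
    classes = np.unique(labels)
    score = 0.0
    for c in classes:
        class_spectra = samples[labels == c]       # (n_c, n_bands)
        ref = class_spectra.mean(axis=0)           # class reference spectrum
        score += np.linalg.norm(class_spectra - ref, axis=1).mean()
    return score / len(classes)

# Rank two synthetic "heterogeneous" training sets: the set with the
# smaller score is spectrally tighter and likely of higher quality.
rng = np.random.default_rng(0)
X_a = rng.normal(0.3, 0.02, size=(100, 6)); y_a = rng.integers(0, 2, 100)
X_b = rng.normal(0.3, 0.10, size=(100, 6)); y_b = rng.integers(0, 2, 100)
print(spectral_quality(X_a, y_a) < spectral_quality(X_b, y_b))  # True
```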

Author(s): Liming Li, Xiaodong Chai, Shuguang Zhao, Shubin Zheng, Shengchao Su

This paper proposes an effective method to improve the performance of saliency detection via iterative bootstrap learning, which consists of two tasks: saliency optimization and saliency integration. Specifically, multiscale segmentation and feature extraction are first performed on the input image. Second, prior saliency maps are generated using existing saliency models and combined into an initial saliency map. Third, the prior maps are fed into a saliency regressor: training samples are collected from the prior maps at multiple scales, and a random forest regressor is learned from this training data. The initial saliency map and the output of the saliency regressor are then integrated into a coarse saliency map. Finally, to further improve the quality of the saliency map, both the initial and the coarse saliency maps are fed into the saliency regressor together, and the regressor output, the initial saliency map, and the coarse saliency map are integrated into the final saliency map. Experimental results on three public data sets demonstrate that the proposed method consistently achieves the best performance, and that significant improvement can be obtained when applying our method to existing saliency models.
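
The regression step can be sketched as follows. This is a minimal stand-in assuming per-segment appearance features and pseudo-labels taken from the prior maps; the feature design and integration scheme are assumptions, not the paper's specification.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical saliency-regression step: each row holds appearance
# features for one image segment; the target is that segment's value
# in the prior saliency maps (used as a pseudo-label).
rng = np.random.default_rng(1)
segment_features = rng.random((500, 20))   # e.g. color/texture per segment
prior_saliency = rng.random(500)           # pseudo-labels from prior maps

regressor = RandomForestRegressor(n_estimators=200, random_state=1)
regressor.fit(segment_features, prior_saliency)

# Blend the regressor output with the initial map to form a coarse map
# (a simple average here; the paper's integration rule may differ).
initial_map = rng.random(500)
coarse_map = 0.5 * (initial_map + regressor.predict(segment_features))
```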


Geophysics, 2020, Vol 85 (4), pp. WA269-WA277
Author(s): Xudong Duan, Jie Zhang

Picking first breaks from seismic data is often a challenging problem that still requires significant human effort. We have developed an iterative process that applies a traditional automated picking method to obtain preliminary first breaks and then uses a machine learning (ML) method to identify, remove, and fix poor picks based on a multitrace analysis. The ML method involves constructing a convolutional neural network architecture that identifies poor picks across multiple traces and eliminates them; we then refill the picks on empty traces with the help of the trained model. To make the training samples applicable across regions and data sets, we apply a moveout correction with the preliminary picks and process the picks in the flattened input. We collected 11,239,800 labeled seismic traces. During training, the model's classification accuracy reaches 98.2% on the training data set and 97.3% on the validation data set; precision and recall both exceed 94%. For prediction, results on 2D and 3D data sets that differ from the training data sets demonstrate the feasibility of our method.
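
The flattening idea, aligning traces on their preliminary picks so the network sees region-independent inputs, can be sketched as below. Window size, padding behavior, and the function name `flatten_traces` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def flatten_traces(traces, picks, win=32):
    """Extract a window around each preliminary pick so the first break
    sits at a fixed index in every trace -- a simple stand-in for the
    paper's moveout correction. traces: (n_traces, n_samples);
    picks: preliminary pick index per trace."""
    flat = np.zeros((len(traces), 2 * win))
    for i, (trace, p) in enumerate(zip(traces, picks)):
        lo, hi = max(p - win, 0), min(p + win, trace.size)
        flat[i, win - (p - lo): win + (hi - p)] = trace[lo:hi]
    return flat  # CNN input: all picks aligned at column `win`

# Example: two noisy traces whose first breaks differ by 40 samples.
rng = np.random.default_rng(2)
traces = rng.normal(0, 0.1, (2, 500))
traces[0, 200:] += 1.0
traces[1, 240:] += 1.0
aligned = flatten_traces(traces, picks=[200, 240])
```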


Author(s): Xuezhuan Zhao, Ziheng Zhou, Lingling Li, Lishen Pei, Zhaoyi Ye

To address the robustness issues caused by scale transformation and the unbalanced distribution of training samples in scene text detection, a new fusion framework, TSFnet, is proposed in this paper. The framework is composed of a Detection Stream, a Judge Stream, and a Fusion Stream. In the Detection Stream, a loss balance factor (LBF) is introduced to improve the region proposal network (RPN), and the algorithm combines a regression strategy with an instance segmentation method to predict the global text segmentation map. In the Judge Stream, samples are classified based on the Judge Map and the corresponding tags to calculate the overlap rate; to support the Detection Stream, a feature pyramid network is used to extract the Judge Map and calculate the LBF. In the Fusion Stream, a new fusion algorithm is proposed: by fusing the outputs of the two streams, the text area in a natural scene can be located accurately. Finally, the algorithm is evaluated on the standard data sets ICDAR 2015 and ICDAR2017-MLT. The test results show that the [Formula: see text] values are 87.8% and 67.57%, respectively, superior to state-of-the-art models, which demonstrates that the algorithm can handle the robustness issues arising from scale transformation and unbalanced training data.
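
The abstract does not define the LBF. One plausible reading, a per-class loss weight inversely proportional to class frequency to counter the text/background imbalance, is sketched below; the function `balanced_ce` and this weighting are assumptions, not TSFnet's definition.

```python
import numpy as np

def balanced_ce(probs, labels):
    """Cross-entropy with a per-class balance factor inversely
    proportional to class frequency, so rare (text) samples are not
    drowned out by the abundant background class."""
    counts = np.bincount(labels, minlength=probs.shape[1])
    lbf = counts.sum() / np.maximum(counts, 1)   # rare classes weigh more
    lbf = lbf / lbf.sum()
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(lbf[labels] * per_sample))

# Toy example: 90% background (class 0), 10% text (class 1).
rng = np.random.default_rng(3)
labels = (rng.random(1000) < 0.1).astype(int)
probs = np.full((1000, 2), 0.5)
loss = balanced_ce(probs, labels)
```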


Author(s): Lin-Hsuan Hsiao, Ke-sheng Cheng

Supervised land-use/land-cover (LULC) classifications are typically conducted using class assignment rules derived from a set of multiclass training samples. Consequently, classification accuracy varies with the training data set and is thus associated with uncertainty. In this study, we propose a bootstrap resampling and reclassification approach that can be applied to assess not only the uncertainty in classification results across bootstrap training data sets, but also the classification uncertainty of individual pixels in the study area. Two measures of pixel-specific classification uncertainty, namely the maximum class probability and the Shannon entropy, were derived from the class probability vector of individual pixels and used to identify unclassified pixels. Unclassified pixels identified using the traditional chi-square threshold technique represent outliers of individual LULC classes, but they are not necessarily associated with higher classification uncertainty. By contrast, unclassified pixels identified using the equal-likelihood technique are associated with higher classification uncertainty, and they mostly occur on or near the borders of different land-cover classes.
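
Both pixel-level uncertainty measures follow directly from the class probability vector and are easy to compute. A minimal sketch, assuming the probability vectors come from bootstrap reclassification frequencies; the thresholds shown are illustrative, not the authors' choices.

```python
import numpy as np

# class_probs: (n_pixels, n_classes) class probability vectors, e.g.
# the relative class frequencies of each pixel across bootstrap runs.
rng = np.random.default_rng(4)
class_probs = rng.dirichlet(alpha=np.ones(5), size=10_000)

max_prob = class_probs.max(axis=1)  # maximum class probability
entropy = -np.sum(class_probs * np.log(class_probs + 1e-12), axis=1)  # Shannon entropy

# Pixels with a low maximum probability or high entropy are the
# uncertain ones (illustrative thresholds).
uncertain = (max_prob < 0.4) | (entropy > 1.2)
```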


2019, Vol 9 (22), pp. 4749
Author(s): Lingyun Jiang, Kai Qiao, Linyuan Wang, Chi Zhang, Jian Chen, ...

Decoding human brain activity, especially reconstructing human visual stimuli via functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data impose restrictions on satisfactory reconstruction, especially for deep learning reconstruction methods that require huge amounts of labelled samples. Unlike such methods, humans can recognize a new image because the human visual system is naturally capable of extracting features from any object and comparing them. Inspired by this visual mechanism, we introduced the mechanism of comparison into a deep learning method to achieve better visual reconstruction, making full use of each sample and of the relationship within each sample pair by learning to compare. In this way, we propose a Siamese reconstruction network (SRN). Using the SRN, we obtained satisfying results on two fMRI recording datasets: 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this approach increases the training data from about n samples to 2n sample pairs, taking full advantage of the limited quantity of training samples. The SRN learns to draw together sample pairs of the same class and disperse sample pairs of different classes in feature space.
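
The pairing step that turns n samples into 2n training pairs can be sketched as follows; the sampling scheme (one same-class and one different-class partner per sample) and the name `make_pairs` are assumptions consistent with the abstract's 2n claim, not the paper's exact procedure.

```python
import numpy as np

def make_pairs(X, y, rng):
    """Build ~2n training pairs from n labelled samples: for each
    sample, one same-class partner (target 1) and one different-class
    partner (target 0), for contrastive Siamese training."""
    pairs, targets = [], []
    for i in range(len(X)):
        same = np.flatnonzero((y == y[i]) & (np.arange(len(X)) != i))
        diff = np.flatnonzero(y != y[i])
        if len(same) and len(diff):
            pairs.append((X[i], X[rng.choice(same)])); targets.append(1)
            pairs.append((X[i], X[rng.choice(diff)])); targets.append(0)
    return pairs, np.array(targets)

rng = np.random.default_rng(5)
X = rng.random((100, 64))               # e.g. fMRI feature vectors
y = rng.integers(0, 10, 100)            # stimulus classes
pairs, targets = make_pairs(X, y, rng)  # ~2n pairs for Siamese training
```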


Sensors, 2021, Vol 21 (15), pp. 5204
Author(s): Anastasija Nikiforova

Nowadays, governments launch open government data (OGD) portals that provide data that anyone can access and use for their own needs. Although the potential economic value of open (government) data is estimated in the millions and billions, not all open data are reused. Moreover, the open (government) data initiative, as well as users' needs for open (government) data, are changing continuously, and today, in line with IoT and smart city trends, real-time and sensor-generated data attract higher interest from users. These "smarter" open (government) data are also considered one of the crucial drivers of a sustainable economy, and they might have an impact on information and communication technology (ICT) innovation and become a creativity bridge in developing a new ecosystem for Industry 4.0 and Society 5.0. The paper inspects the OGD portals of 60 countries in order to understand how well their content corresponds to the expectations of Society 5.0. The paper reports on the extent to which countries provide these data, focusing on open (government) data success factors for both the portal in general and the data sets of interest in particular. The presence of "smarter" data; their level of accessibility, availability, currency, and timeliness; and the support offered to users are analyzed, and lists of the most competitive countries by data category are provided. This makes it possible to understand which OGD portals react to users' needs and to Industry 4.0 and Society 5.0 requests by opening and updating data for further potential reuse, which is essential in the digital, data-driven world.


2021, Vol 16 (1), pp. 1-24
Author(s): Yaojin Lin, Qinghua Hu, Jinghua Liu, Xingquan Zhu, Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is based on a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
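
The per-label clustering step can be sketched as below: cluster a label's positive and negative instances separately and represent every instance by its distances to the resulting centers, a common LIFT-style construction on which an ensemble like MULFE can be built. The function name and parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def label_specific_features(X, y_label, k=3, seed=0):
    """For one label, cluster its positive and negative instances
    separately and map every instance to its distances to the 2k
    cluster centers -- a sketch of label-specific feature creation."""
    pos, neg = X[y_label == 1], X[y_label == 0]
    centers = np.vstack([
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pos).cluster_centers_,
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(neg).cluster_centers_,
    ])
    # distance of every instance to each positive/negative center
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

rng = np.random.default_rng(6)
X = rng.random((200, 30))
y_label = rng.integers(0, 2, 200)        # one column of the label matrix
Z = label_specific_features(X, y_label)  # (200, 6) label-specific space
```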


Sensors, 2021, Vol 21 (5), pp. 1573
Author(s): Loris Nanni, Giovanni Minchio, Sheryl Brahnam, Gianluca Maguolo, Alessandra Lumini

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids is calculated from the patterns in the training data sets with supervised k-means clustering, and the centroids are used to generate the dissimilarity space via the Siamese networks. Vector space descriptors are extracted by projecting patterns onto the dissimilarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system is competitive with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad hoc optimization of the clustering methods on the tested data sets.
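
A simplified sketch of the dissimilarity-space idea follows. Plain Euclidean distance to per-class k-means centroids stands in for the learned SNN dissimilarity here, so this shows only the descriptor-plus-SVM pipeline, not the full system.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.random((300, 128))              # image descriptors
y = rng.integers(0, 3, 300)

# "Supervised" k-means: cluster each class separately, pool the centers.
centroids = np.vstack([
    KMeans(n_clusters=4, n_init=10, random_state=7).fit(X[y == c]).cluster_centers_
    for c in np.unique(y)
])

# Dissimilarity vector = distances to all centroids (the SNN would
# supply a learned dissimilarity instead of Euclidean distance).
D = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
clf = SVC(kernel="rbf").fit(D, y)       # SVM classifies by dissimilarity vector
```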


Forests, 2021, Vol 12 (6), pp. 692
Author(s): MD Abdul Mueed Choudhury, Ernesto Marcheggiani, Andrea Galli, Giuseppe Modica, Ben Somers

The worsening impacts of urbanization have heightened the importance of monitoring and managing existing urban trees to secure the sustainable use of available green spaces. Urban tree species identification and the evaluation of their role in atmospheric carbon stock (CS) remain prime concerns for city planners aiming to establish a convenient and easily adaptable urban green planning and management system. A detailed methodology for urban tree carbon stock calibration and mapping was applied in the urban area of Brussels, Belgium. A comparative analysis of the mapping outcomes assessed the convenience and efficiency of two different remote sensing data sources, Light Detection and Ranging (LiDAR) and WorldView-3 (WV-3), in the same urban area, and the mapping results were validated against field-estimated carbon stocks. At the initial stage, dominant tree species were identified and classified using the high-resolution WV-3 image, leading to the final carbon stock mapping based on the dominant species. An object-based image analysis approach achieved an overall accuracy (OA) of 71% in the classification of the dominant species. Field estimations of carbon stock for each plot were made using an allometric model based on field tree dendrometric data. Then, based on the correlation between the field data and the variables extracted from the available remote sensing data (the Normalized Difference Vegetation Index, NDVI, and the Crown Height Model, CHM), carbon stock mapping and validation were carried out in a GIS environment. The calibrated NDVI and CHM were used to compute possible carbon stock from the WV-3 image and the LiDAR data, respectively. A comparative discussion highlights the issues, especially for developing countries, where WV-3 data could be a better solution than the rarely available LiDAR data. This study could assist city planners in deciding on the applicability of remote sensing data sources based on their availability and level of expediency, ensuring a sustainable urban green management system.
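
The calibration step, relating field-estimated carbon stock to NDVI and CHM, can be sketched as follows. The linear form, the synthetic numbers, and variable names are assumptions for illustration; only the standard NDVI formula is taken as given.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red bands."""
    return (nir - red) / (nir + red + 1e-12)

# Hypothetical plot-level calibration: regress field-estimated carbon
# stock (from the allometric model) on mean NDVI and crown height (CHM).
rng = np.random.default_rng(8)
plot_ndvi = rng.uniform(0.2, 0.8, 40)
plot_chm = rng.uniform(5, 30, 40)      # crown height model, metres
field_cs = 2.0 * plot_chm + 30 * plot_ndvi + rng.normal(0, 2, 40)  # synthetic

model = LinearRegression().fit(np.column_stack([plot_ndvi, plot_chm]), field_cs)
# The fitted model would then be applied pixel-wise to WV-3 NDVI or
# LiDAR CHM rasters to map carbon stock; only calibration is shown here.
```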

