Target concept learning from ambiguously labeled data

2017
Author(s): Changzhe Jiao

The multiple instance learning problem addresses the case where training data comes with label ambiguity, i.e., the learner has access only to inaccurately labeled data. For example, in target detection from remotely sensed hyperspectral imagery, targets are usually sub-pixel, and ground truthing of the targets based on GPS coordinates can drift by several meters. Thus, the locations of the targets in the hyperspectral image are inaccurate. Training a supervised algorithm or extracting target signatures from such labels is intractable. This dissertation comprehensively investigates target concept learning from ambiguously labeled data, and reviews and proposes several methods that learn either a set of representative or a set of discriminative target concepts. The multiple instance hybrid estimator (MI-HE) maximizes the response of the hybrid detector under a generalized mean framework and estimates a set of discriminative target concepts. MI-HE adopts a linear mixture model and iterates between estimating a set of discriminative target and non-target signatures and solving a sparse unmixing problem. MI-HE preserves bag-level label information for each positive bag and is able to estimate a target concept that is shared among positive bags. Furthermore, MI-HE has the potential to learn multiple signatures to address signature variability. Once the target concept has been learned, a signature-based detector can be applied for target detection. The presented algorithms were tested in many applications, including simulated and real hyperspectral target detection, heartbeat characterization from ballistocardiogram signals, and tree species classification from remotely sensed data. The presented algorithms proved effective in learning high-quality target signatures and consistently outperformed the state-of-the-art comparison algorithms.
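
As a rough illustration of the generalized mean mentioned above, the following Python sketch scores one bag by the generalized mean of its instance-level detector confidences. It is a toy under stated assumptions, not the MI-HE objective or its code: the function name generalized_mean, the exponent p, and the bag values are hypothetical. With p = 1 the score is the plain average, which dilutes a single target pixel among background pixels; as p increases the score approaches the maximum instance response, which matches the multiple instance assumption that a positive bag needs to contain only one target instance.

```python
import math
from typing import Sequence


def generalized_mean(values: Sequence[float], p: float) -> float:
    """Generalized (power) mean of nonnegative values for exponent p > 0.

    M_p(x) = ((1/n) * sum(x_i ** p)) ** (1/p)
    p = 1 gives the arithmetic mean; as p grows, M_p approaches max(values).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if p <= 0:
        raise ValueError("this sketch only handles p > 0")
    if any(v < 0 for v in values):
        raise ValueError("values must be nonnegative")
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


# Hypothetical detector confidences for the instances (pixels) in one positive bag;
# only the last instance actually contains the target.
bag = [0.05, 0.10, 0.08, 0.92]

for p in (1, 5, 20, 100, 1000):
    # The bag-level score rises toward the single strongest response as p increases.
    print(f"p={p:>4}: bag score = {generalized_mean(bag, p):.3f}")
```

The power mean never exceeds the maximum, so a large p behaves like a smooth, differentiable stand-in for "the best instance in the bag".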

2019
Vol 11 (3), pp. 284
Author(s): Linglin Zeng, Shun Hu, Daxiang Xiang, Xiang Zhang, Deren Li, ...

Soil moisture mapping at a regional scale is common, since these data are required in many applications, such as hydrological and agricultural analyses. However, the use of remotely sensed data to estimate deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, a machine-learning approach. To investigate the estimation accuracy of the RF method at both spatial and temporal scales, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (root mean square error (RMSE) ranging from 0.038 to 0.050 m³/m³) and a station-to-station experiment (RMSE ranging from 0.044 to 0.057 m³/m³). The data requirements, the importance of input factors, and spatial and temporal variations in estimation accuracy were then discussed based on results obtained with training data selected by iterated random sampling. The highly accurate estimates of both surface and deep soil moisture for the study area reveal the potential of RF methods for mapping soil moisture at a regional scale, especially given the high heterogeneity of land-cover types and topography in the study area.
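
The two validation designs can be sketched as leave-one-group-out evaluation, in which either a whole year or a whole station is held out in turn and scored with RMSE. This Python sketch is a hypothetical reading of the experiments, not the authors' code: the Observation fields, the helper names, and the toy data are illustrative, and fit_mean_baseline is a placeholder that stands in for a random forest regressor so that the example runs with the standard library alone.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, NamedTuple, Tuple


class Observation(NamedTuple):
    features: Tuple[float, ...]  # hypothetical remotely sensed predictors
    soil_moisture: float         # observed volumetric soil moisture, m^3/m^3
    year: int
    station: str


Predictor = Callable[[Tuple[float, ...]], float]


def rmse(observed: List[float], predicted: List[float]) -> float:
    """Root mean square error, in the same units as the inputs (m^3/m^3)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def leave_one_group_out_rmse(obs: List[Observation],
                             group_of: Callable[[Observation], Hashable],
                             fit: Callable[[List[Observation]], Predictor]) -> Dict[Hashable, float]:
    """Hold out each group (e.g. a year or a station) in turn and score the model on it."""
    groups = defaultdict(list)
    for o in obs:
        groups[group_of(o)].append(o)
    scores = {}
    for held_out, test in groups.items():
        train = [o for o in obs if group_of(o) != held_out]
        predict = fit(train)
        scores[held_out] = rmse([o.soil_moisture for o in test],
                                [predict(o.features) for o in test])
    return scores


def fit_mean_baseline(train: List[Observation]) -> Predictor:
    """Placeholder for a random forest: always predicts the training mean."""
    mean = sum(o.soil_moisture for o in train) / len(train)
    return lambda features: mean


# Toy data: two years x two stations (values are made up for illustration).
obs = [
    Observation((0.1,), 0.20, 2010, "A"),
    Observation((0.2,), 0.22, 2010, "B"),
    Observation((0.3,), 0.30, 2011, "A"),
    Observation((0.4,), 0.28, 2011, "B"),
]
by_year = leave_one_group_out_rmse(obs, lambda o: o.year, fit_mean_baseline)
by_station = leave_one_group_out_rmse(obs, lambda o: o.station, fit_mean_baseline)
print("year-to-year:", by_year)
print("station-to-station:", by_station)
```

Swapping fit_mean_baseline for any function that trains a model and returns a predictor leaves the evaluation loop unchanged; the grouping function alone decides whether the result measures temporal (year-to-year) or spatial (station-to-station) transferability.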


2021
Vol 13 (3), pp. 368
Author(s): Christopher A. Ramezan, Timothy A. Warner, Aaron E. Maxwell, Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, collecting a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. However, NEU required longer processing time.
The k-NN classifier saw a smaller drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classification of HR remotely sensed data, especially when training data are scarce. However, because the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
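
The training-size experiment can be mimicked on a small scale by drawing progressively smaller training subsets and measuring overall accuracy on a fixed test set. The Python sketch below makes several assumptions that the abstract does not state: subsets are drawn with an equal number of samples per class, the classifier is a from-scratch k-NN (one of the six algorithms compared), and the two land-cover classes ("forest", "water") and their Gaussian features are synthetic placeholders rather than GEOBIA image-object variables.

```python
import math
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple

# A sample is (feature vector, class label); labels here are hypothetical.
Sample = Tuple[Tuple[float, ...], str]


def make_two_class_data(n_per_class: int, seed: int) -> List[Sample]:
    """Synthetic stand-in for labeled objects: two Gaussian classes in 2-D."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for _ in range(n_per_class):
        data.append(((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), "forest"))
        data.append(((rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)), "water"))
    return data


def balanced_subsample(samples: List[Sample], n: int, seed: int = 0) -> List[Sample]:
    """Draw n // n_classes samples from each class without replacement."""
    by_class = defaultdict(list)
    for s in samples:
        by_class[s[1]].append(s)
    per_class = n // len(by_class)
    rng = random.Random(seed)
    subset: List[Sample] = []
    for label in sorted(by_class):
        members = by_class[label]
        if per_class > len(members):
            raise ValueError(f"class {label!r} has only {len(members)} samples")
        subset.extend(rng.sample(members, per_class))
    return subset


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> str:
    """Majority label among the k training samples nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def overall_accuracy(train: List[Sample], test: List[Sample], k: int = 3) -> float:
    """Fraction of test samples whose k-NN prediction matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


pool = make_two_class_data(n_per_class=500, seed=1)      # candidate training samples
test_set = make_two_class_data(n_per_class=100, seed=2)  # fixed held-out test set

for n in (400, 100, 40):
    train = balanced_subsample(pool, n)
    print(f"n={n:>3}: overall accuracy = {overall_accuracy(train, test_set):.3f}")
```

Repeating the loop with several seeds per sample size would show the spread in accuracy at each size, not just a single value.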


2016
Vol 186, pp. 64-87
Author(s): Fabian Ewald Fassnacht, Hooman Latifi, Krzysztof Stereńczak, Aneta Modzelewska, Michael Lefsky, ...

2019
Vol 11 (7), pp. 794
Author(s): Karsten Lambers, Wouter Verschoof-van der Vaart, Quentin Bourgeois

Although the history of automated archaeological object detection in remotely sensed data is short, progress and emerging trends are evident. Among them, the shift from rule-based approaches towards machine learning methods is currently raising high expectations, even though basic problems, such as the lack of suitable archaeological training data, are only beginning to be addressed. In a case study in the central Netherlands, we are currently developing novel methods for multi-class archaeological object detection in LiDAR data based on convolutional neural networks (CNNs). This research is embedded in a long-term investigation of the prehistoric landscape of our study region. Here we present an innovative integrated workflow that combines machine learning approaches to automated object detection in remotely sensed data with a two-tier citizen science project that allows us to generate and validate detections of hitherto unknown archaeological objects, thereby contributing to the creation of reliable, labeled archaeological training datasets. We motivate our methodological choices in light of current trends in archaeological prospection, remote sensing, machine learning, and citizen science, and present the first results of implementing the workflow in our research area.


2001
Vol 91 (5), pp. 333-346
Author(s): G. Hendrickx, A. Napala, J.H.W. Slingenbergh, R. De Deken, D.J. Rogers

A raster or grid-based Geographic Information System with data on tsetse, trypanosomiasis, animal production, agriculture and land use has recently been developed in Togo. The area-wide sampling of tsetse flies, aided by satellite imagery, is the subject of two separate papers. This paper follows a first paper, published in this journal, that described the generation of digital tsetse distribution and abundance maps and how these accord with the local climatic and agro-ecological setting. Such maps, when combined with data on the disease, the hosts and their owners, should contribute to knowledge of the spatial epidemiology of trypanosomiasis and assist in planning integrated control operations. Here we address the problem of generating tsetse distribution and abundance maps from remotely sensed data using a restricted amount of field data. Different discriminant models have been applied using contemporary tsetse data and remotely sensed, low-resolution data acquired from the National Oceanic and Atmospheric Administration (NOAA) and Meteosat platforms. The results confirm the potential of satellite data and multivariate analysis for predicting tsetse distribution and abundance. This opens up new avenues, because satellite predictions and field data may be combined to strengthen and/or substitute for one another. The analysis shows how the strategic incorporation of satellite imagery may minimize field data collection. Field surveys may be modified and conducted in two stages, first concentrating on the expected fly distribution limits and thereafter on fly abundance. The study also shows that, when applying satellite data, care should be taken in selecting the optimal number of predictor variables, because this number depends on the amount of training data when predicting abundance, and on the homogeneity of the distribution limits when predicting fly presence.
Finally, it is suggested that, in addition to the use of contemporary training data and predictor variables, the training and predicted data sets should refer to the same eco-geographic zone.
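
Discriminant analysis for presence/absence mapping can be illustrated with a two-class linear discriminant that uses a pooled within-class covariance. The abstract does not specify which discriminant models were used, so this Python/NumPy sketch shows only the textbook linear form; the function names fit_lda and predict_lda, the two predictors (loosely labeled as a temperature-like and a vegetation-index-like variable), and all values are synthetic and hypothetical.

```python
import numpy as np


def fit_lda(X: np.ndarray, y: np.ndarray):
    """Two-class linear discriminant analysis with a pooled covariance matrix.

    X: (n_sites, n_predictors) satellite-derived predictors per survey site.
    y: (n_sites,) 0 = fly absent, 1 = fly present.
    Returns (w, b); a site is predicted 'present' when x @ w + b > 0.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance (unbiased: divide by n - 2).
    scatter = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    cov = scatter / (len(X) - 2)
    w = np.linalg.solve(cov, mu1 - mu0)
    # Boundary sits midway between the class means (in Mahalanobis terms),
    # shifted by the log ratio of the class priors.
    prior1 = len(X1) / len(X)
    b = -0.5 * (mu0 + mu1) @ w + np.log(prior1 / (1.0 - prior1))
    return w, b


def predict_lda(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Return 1 (present) or 0 (absent) for each row of X."""
    return (X @ w + b > 0).astype(int)


# Synthetic, hypothetical predictors: column 0 ~ a temperature-like variable,
# column 1 ~ a vegetation-index-like variable.
rng = np.random.default_rng(0)
absent = rng.normal(loc=[20.0, 0.3], scale=[2.0, 0.05], size=(100, 2))
present = rng.normal(loc=[26.0, 0.6], scale=[2.0, 0.05], size=(100, 2))
X = np.vstack([absent, present])
y = np.array([0] * 100 + [1] * 100)

w, b = fit_lda(X, y)
print("training accuracy:", (predict_lda(X, w, b) == y).mean())
```

The pooled covariance has rank at most n - 2, so with n training sites it becomes singular once the number of predictors exceeds n - 2, and it is poorly conditioned well before that point. This is one concrete reason why the number of predictor variables has to be matched to the amount of training data.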

