Evaluation measure for group-based record linkage

Author(s):  
Charini Nanayakkara ◽  
Peter Christen ◽  
Thilina Ranbaduge ◽  
Eilidh Garrett

Introduction: The robustness of record linkage evaluation measures is of high importance, since linkage techniques are assessed based on them. However, little research has been conducted to evaluate the suitability of existing evaluation measures in the context of linking groups of records. Linkage quality is generally evaluated with traditional measures such as precision and recall. As we show, these traditional measures are not suitable for evaluating groups of linked records because they assess the quality of individual record pairs rather than the quality of records grouped into clusters.

Objectives: We highlight the shortcomings of traditional evaluation measures and propose a novel method to evaluate clustering quality in the context of group-based record linkage.

Methods: The proposed evaluation method assesses how well individual records have been allocated to predicted groups/clusters with respect to ground-truth data. We first identify the best representative predicted cluster for each ground-truth cluster and, based on the resulting mapping, assign each record in a ground-truth cluster to one of seven categories. These categories reflect how well the linkage technique grouped the records.

Results: We empirically evaluate the proposed method on real-world data and show that it better reflects the quality of clusters generated by three group-based record linkage techniques. We also show that traditional measures such as precision and recall can produce ambiguous results, whereas our method does not.

Conclusions: The proposed evaluation method provides unambiguous results for the assessed group-based record linkage approaches. It comprises seven categories that reflect how each record was predicted, providing more detailed information about the quality of the linkage result. This will help make better-informed decisions about which linkage technique is best suited for a given linkage application.
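To make the cluster-mapping step concrete, here is a minimal Python sketch of the idea described above. The abstract does not spell out the seven-category taxonomy, so the three coarse categories used here (and all function names) are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def best_matching_cluster(truth_cluster, predicted_clusters):
    """Return the id of the predicted cluster that shares the most records
    with the given ground-truth cluster (its 'best representative')."""
    overlap = Counter({cid: len(truth_cluster & c)
                       for cid, c in predicted_clusters.items()})
    return overlap.most_common(1)[0][0] if overlap else None

def categorise_records(truth_clusters, predicted_clusters):
    """Assign each ground-truth record a coarse category based on whether it
    landed in the best representative predicted cluster. Hypothetical
    three-category stand-in for the paper's seven categories."""
    categories = {}
    for truth in truth_clusters.values():
        best = best_matching_cluster(truth, predicted_clusters)
        best_records = predicted_clusters.get(best, set())
        for rec in truth:
            if rec in best_records:
                categories[rec] = "in best representative cluster"
            elif any(rec in c for c in predicted_clusters.values()):
                categories[rec] = "assigned to another cluster"
            else:
                categories[rec] = "unassigned"
    return categories
```

Counting records per category then yields a per-record breakdown of linkage quality instead of a single precision/recall pair.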

Author(s):  
Christopher Toth ◽  
Wonho Suh ◽  
Vetri Elango ◽  
Ramik Sadana ◽  
Angshuman Guin ◽  
...  

Basic traffic counts are among the key elements in transportation planning and forecasting. As emerging data collection technologies proliferate, the availability of traffic count data will expand by orders of magnitude. However, the availability of data does not guarantee its accuracy, and it is essential that observed data be compared with ground truth data. Little research or guidance is available on ensuring the quality of the ground truth data against which the count results of automated technologies are compared. To address the quality of ground truth data based on manual counts, a manual traffic counting application was developed for an Android tablet. Unlike other manual count applications, this application allows data collectors to replay and toggle through the video in a supervisory mode to review and correct counts made in the first pass. For system verification, the review function of the application was used to count and recount freeway traffic in videos from the Atlanta, Georgia, metropolitan area. Initial counts and reviewed counts were compared, and improvements in count accuracy were assessed. The results indicated the benefit of the review process and suggested that this application can minimize human error and provide more accurate ground truth traffic count data for use in transportation planning applications and for model verification.


Semantic Web ◽  
2020 ◽  
pp. 1-19
Author(s):  
Anca Dumitrache ◽  
Oana Inel ◽  
Benjamin Timmermans ◽  
Carlos Ortiz ◽  
Robert-Jan Sips ◽  
...  

The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity as a way to address the volume of data and the lack of annotators. Typically, these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives on the information examples. We present an empirically derived methodology for efficiently gathering ground truth data across a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring high-quality ground truth. We demonstrate this by comparing the quality of data aggregated with CrowdTruth metrics against majority vote over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction, and Sound Interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators.
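As a rough illustration of why disagreement matters, the sketch below contrasts plain majority voting with a simple per-unit agreement score that flags ambiguous units. The score is a simplified stand-in for the CrowdTruth metrics, which are defined over annotation vectors rather than raw counts.

```python
def majority_vote(votes):
    """votes: dict mapping label -> number of workers who chose it."""
    return max(votes, key=votes.get)

def unit_clarity(votes):
    """Fraction of workers agreeing on the dominant label; low values mark
    ambiguous units that majority vote would silently resolve."""
    total = sum(votes.values())
    return max(votes.values()) / total if total else 0.0

# A unit on which workers genuinely disagree:
votes = {"protest": 4, "celebration": 3, "accident": 3}
print(majority_vote(votes))  # 'protest'
print(unit_clarity(votes))   # 0.4 -> low clarity, keep the disagreement signal
```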


2020 ◽  
Vol 24 ◽  
pp. 63-86
Author(s):  
Francisco Mena ◽  
Ricardo Ñanculef ◽  
Carlos Valle

The lack of annotated data is one of the major barriers facing machine learning applications today. Learning from crowds, i.e. collecting ground-truth data from multiple inexpensive annotators, has become a common way to cope with this issue. It has recently been shown that modeling the varying quality of the annotations obtained in this way is fundamental to obtaining satisfactory performance in tasks where inexpert annotators may represent the majority but not the most trusted group. Unfortunately, existing techniques represent annotation patterns for each annotator individually, making the models difficult to estimate in large-scale scenarios. In this paper, we present two models to address these problems. Both are based on the hypothesis that collective annotation patterns can be learned by introducing confusion matrices that involve groups of data points or annotators. The first approach clusters data points with a common annotation pattern, regardless of the annotators from which the labels were obtained. Implicitly, this method attributes annotation mistakes to the complexity of the data itself rather than to the variable behavior of the annotators. The second approach explicitly maps annotators to latent groups that are collectively parametrized to learn a common annotation pattern. Our experimental results show that, compared with other methods for learning from crowds, both methods have advantages in scenarios with a large number of annotators and a small number of annotations per annotator.
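A minimal sketch of the second model's core idea, assuming the annotator-to-group mapping is already known (the paper learns it jointly with the rest of the model): a Dawid-Skene-style EM in which all annotators in a latent group share one confusion matrix.

```python
import numpy as np

def grouped_dawid_skene(labels, groups, n_classes, n_groups, n_iter=50):
    """labels: int array of (item, annotator, observed_label) rows.
    groups:  array mapping annotator id -> group id (assumed given here).
    Returns per-item class posteriors and one confusion matrix per group."""
    n_items = labels[:, 0].max() + 1
    # initialise posteriors from raw vote fractions
    post = np.full((n_items, n_classes), 1.0 / n_classes)
    for i, a, l in labels:
        post[i, l] += 1.0
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: one confusion matrix per annotator group, not per annotator
        conf = np.full((n_groups, n_classes, n_classes), 1e-6)
        for i, a, l in labels:
            conf[groups[a], :, l] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: recompute item-label posteriors under the shared matrices
        log_post = np.zeros((n_items, n_classes))
        for i, a, l in labels:
            log_post[i] += np.log(conf[groups[a], :, l])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post, conf
```

Because the number of parameters grows with the number of groups rather than the number of annotators, estimation stays tractable when annotators are many and their individual annotations are few.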


Author(s):  
S. Karam ◽  
M. Peter ◽  
S. Hosseinyalamdary ◽  
G. Vosselman

Abstract. The necessity of modelling building interiors has encouraged researchers in recent years to focus on improving the capturing and modelling techniques for such environments. State-of-the-art indoor mobile mapping systems use a combination of laser scanners and/or cameras mounted on movable platforms and allow for capturing 3D data of building interiors. As GNSS positioning does not work inside buildings, the extensively investigated Simultaneous Localisation and Mapping (SLAM) algorithms seem to offer a suitable solution for the problem. Because of the dead-reckoning nature of SLAM approaches, their results usually suffer from registration errors. Therefore, indoor data acquisition has remained a challenge, and the accuracy of the captured data has to be analysed and investigated. In this paper, we propose to use architectural constraints to partly evaluate the quality of the acquired point cloud in the absence of any ground truth model. The internal consistency of walls is utilized to check the accuracy and correctness of indoor models. In addition, we use a floor plan (if available) as an external information source to check the quality of the generated indoor model. The proposed evaluation method provides an overall impression of the reconstruction accuracy. Our results show that perpendicularity, parallelism, and thickness of walls are important cues in buildings and can be used for an internal consistency check.
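As an illustration of the perpendicularity and parallelism cues, an internal consistency check of this kind can be sketched as below. Wall plane normals are assumed to have been fitted beforehand (e.g. by RANSAC plane fitting), and the tolerance value is an arbitrary assumption.

```python
import numpy as np

def angle_between(n1, n2):
    """Angle in degrees between two plane normals, folded into [0, 90]."""
    c = abs(np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2)))
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))

def consistency_report(wall_normals, tol_deg=2.0):
    """Flag wall pairs that are neither parallel (~0 deg) nor perpendicular
    (~90 deg) within tolerance; such pairs hint at registration errors."""
    flagged = []
    for i in range(len(wall_normals)):
        for j in range(i + 1, len(wall_normals)):
            a = angle_between(wall_normals[i], wall_normals[j])
            if min(a, 90.0 - a) > tol_deg:
                flagged.append((i, j, round(a, 2)))
    return flagged
```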


2021 ◽  
Author(s):  
Michael Tarasiou

This paper presents DeepSatData, a pipeline for automatically generating satellite imagery datasets for training machine learning models. We also discuss design considerations, with emphasis on dense classification tasks such as semantic segmentation. The implementation makes use of freely available Sentinel-2 data, which allows the generation of the large-scale datasets required for training deep neural networks (DNNs). We discuss issues faced from the point of view of DNN training and evaluation, such as checking the quality of ground truth data, and comment on the scalability of the approach.


Author(s):  
Jos Hornikx ◽  
Annemarie Weerman ◽  
Hans Hoeken

According to Mercier and Sperber (2009, 2011, 2017), people have an immediate and intuitive feeling about the strength of an argument. These intuitive evaluations are not captured by current methods for evaluating argument strength, yet they could be important for predicting the extent to which people accept the claim supported by the argument. In an exploratory study, a newly developed intuitive evaluation method for assessing argument strength was therefore compared to an explicit argument strength evaluation method (the PAS scale; Zhao et al., 2011) on their ability to predict claim acceptance (predictive validity) and their sensitivity to differences in the manipulated quality of arguments (construct validity). An experimental study showed that the explicit argument strength evaluation performed well on both validity measures. The intuitive evaluation measure, on the other hand, was not found to be valid. Suggestions for other ways of constructing and testing intuitive evaluation measures are presented.


2018 ◽  
Author(s):  
Naihui Zhou ◽  
Zachary D Siegel ◽  
Scott Zarecor ◽  
Nigel Lee ◽  
Darwin A Campbell ◽  
...  

Abstract. The accuracy of machine learning tasks critically depends on high-quality ground truth data. Producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large volume of training data of good quality. We study an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed, and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices for assessing the quality of ground truth data and for comparing data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at low cost and high quality, especially in the context of high-throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.

Author Summary. Food security is a growing global concern. Farmers, plant breeders, and geneticists are hastening to address the challenges presented to agriculture by climate change, dwindling arable land, and population growth. Scientists in the field of plant phenomics are using satellite and drone images to understand how crops respond to a changing environment and to combine genetics and environmental measures to maximize crop growth efficiency. However, the terabytes of image data require new computational methods to extract useful information. Machine learning algorithms are effective in recognizing select parts of images, but they require high-quality data curated by people to train them, a process that can be laborious and costly. We examined how well crowdsourcing works in providing training data for plant phenomics, specifically, segmenting a corn tassel (the male flower of the corn plant) from the often-cluttered images of a cornfield. We provided images to students and to Amazon MTurkers, the latter being an on-demand workforce brokered by Amazon.com and paid on a task-by-task basis. We report on best practices in crowdsourcing image labeling for phenomics and compare the different groups on measures such as fatigue and accuracy over time. We find that crowdsourcing is a good way of generating quality labeled data, rivaling that of experts.
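One common way to score crowd segmentations against an expert reference, in the spirit of the quality metrics mentioned above, is intersection-over-union; the sketch below is a generic illustration, not the paper's exact metric set.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

def score_workers(crowd_masks, expert_mask):
    """IoU of each worker's tassel mask against the expert's mask."""
    return [iou(m, expert_mask) for m in crowd_masks]
```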


Author(s):  
P. Glira ◽  
N. Pfeifer ◽  
C. Briese ◽  
C. Ressl

Airborne Laser Scanning (ALS) is an efficient method for the acquisition of dense and accurate point clouds over extended areas. To ensure gapless coverage of the area, point clouds are collected strip-wise with considerable overlap. The redundant information contained in these overlap areas can be used, together with ground-truth data, to re-calibrate the ALS system and to compensate for systematic measurement errors. This process, usually denoted as strip adjustment, leads to an improved georeferencing of the ALS strips, in other words, to a higher data quality of the acquired point clouds. We present a fully automatic strip adjustment method that (a) uses the original scanner and trajectory measurements, (b) performs an on-the-job calibration of the entire ALS multisensor system, and (c) corrects the trajectory errors individually for each strip. As in the Iterative Closest Point (ICP) algorithm, correspondences are established iteratively and directly between points of overlapping ALS strips, avoiding a time-consuming segmentation and/or interpolation of the point clouds. The suitability of the method for large amounts of data is demonstrated on an ALS block consisting of 103 strips.
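The correspondence step can be illustrated with a plain nearest-neighbour search between two overlapping strips, directly on the points, as in ICP. This is a simplified sketch only: the actual method additionally re-estimates calibration and trajectory parameters in each iteration, and the distance threshold here is an arbitrary assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def strip_correspondences(strip_a, strip_b, max_dist=0.5):
    """Point-to-point correspondences between two overlapping ALS strips
    (N x 3 arrays), without segmentation or interpolation. Returns index
    pairs whose nearest-neighbour distance is below max_dist metres."""
    tree = cKDTree(strip_b)
    dist, idx = tree.query(strip_a, k=1)
    keep = dist < max_dist
    return np.flatnonzero(keep), idx[keep]
```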


2021 ◽  
Author(s):  
Zheng Duan ◽  
Nina del Rosario ◽  
Jianzhi Dong ◽  
Hongkai Gao ◽  
Jian Peng ◽  
...  

Soil moisture is an Essential Climate Variable (ECV) that plays an important role in land surface-atmosphere interactions. Accurate monitoring of soil moisture is essential for many studies of the water, energy, and carbon cycles. However, soil moisture is characterized by high spatial and temporal variability, and conventional point-based in-situ measurements rarely capture this variability given the lack of dense in-situ networks in most regions. Considerable efforts have been made to explore satellite remote sensing and hydrological and land surface models for estimating and mapping soil moisture, leading to an increasing availability of gridded soil moisture products at various spatial and temporal resolutions. The accuracy of an individual product varies between regions and needs to be evaluated in order to guide the selection of the most suitable products for particular applications; such evaluation also benefits product development and improvement. The most common (traditional) evaluation method is to calculate error metrics of the evaluated products against in-situ measurements taken as ground truth. Triple collocation (TC) analysis has been widely used, and has proven powerful, for evaluating products of various geophysical variables when ground truth is not available.

The Integrated Carbon Observation System (ICOS) is a research infrastructure that aims to quantify the greenhouse gas balance of Europe and adjacent regions. A standardized network of more than 140 research stations in 13 member states has been established and is operated by ICOS to provide direct measurements of climate-relevant variables. The ICOS Carbon Portal offers free access to all ICOS data products at https://www.icos-cp.eu/observations/carbon-portal. This study evaluates, for the first time, a large number of satellite-based and reanalysis surface soil moisture products at varying spatial and temporal resolutions using ICOS measurements from 2015 over Sweden. Evaluated products include ESA CCI, ASCAT, SMAP, SMOS, Sentinel-1-derived, ERA5, and GLDAS products. To quantify the spatial patterns of errors in each product, TC analysis is applied to different combinations of gridded products for spatial evaluation across the whole of Sweden. The performance of the products in different seasons and years is evaluated, and the similarity and difference among products during the 2018 drought period are particularly assessed. This study is expected to improve our understanding of the applicability and limitations of various gridded soil moisture products in the Nordic region.
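For reference, the classical covariance-based TC estimator for three collocated products with mutually independent errors can be sketched as follows; each error variance is expressed in the data space of the respective product.

```python
import numpy as np

def triple_collocation(x, y, z):
    """Classical TC estimates of the error variances of three collocated
    soil moisture time series (1-D arrays over the same time steps)."""
    c = np.cov(np.vstack([x, y, z]))  # 3 x 3 covariance matrix
    var_x = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    var_y = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
    var_z = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
    return var_x, var_y, var_z
```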


2020 ◽  
pp. 1-11
Author(s):  
Huang Wenming

The efficiency of traditional English teaching quality evaluation is relatively low, and compiling evaluation statistics is troublesome. The traditional evaluation method makes teaching evaluation a difficult project; it takes a long time and has low efficiency, which seriously affects the school's operations. To improve the quality of English teaching, this study builds on machine learning technology: it combines Gaussian processes to improve the algorithm, uses Gaussian mixtures to explore the distribution characteristics of samples, and improves the classic relevance vector machine model. Moreover, this study proposes an active learning algorithm that combines sparse Bayesian learning with Gaussian mixtures, strategically selects and labels samples, and constructs a classifier that exploits the distribution characteristics of the samples. In addition, a controlled experiment was designed to analyze the performance of the proposed model. The comparison shows that the model performs well in evaluating English teaching quality for both traditional and online modes of instruction. This indicates that the proposed algorithm has certain advantages and can be applied in practice in intelligent English teaching systems.
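The abstract does not detail the sample-selection strategy; as a loose illustration of how a Gaussian mixture could drive sample selection in active learning, one might query the samples whose component responsibilities are most ambiguous. All names and parameter values below are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_for_labelling(X_unlabelled, n_components=4, n_queries=10):
    """Fit a Gaussian mixture to the unlabelled pool and return indices of
    the samples sitting least clearly inside any single component."""
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    resp = gmm.fit(X_unlabelled).predict_proba(X_unlabelled)
    uncertainty = 1.0 - resp.max(axis=1)  # low max responsibility = ambiguous
    return np.argsort(uncertainty)[-n_queries:]
```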

