Quantifying data quality in a citizen science monitoring program: False negatives, false positives and occupancy trends

<p>An accurate and complete inventory of natural hazard occurrences and their level of impact is a key first step in risk assessment, but it remains a challenge, especially for high-frequency, low-impact events that rarely make it to the news media. This challenge is even greater in rural areas of developing countries such as Uganda, where limited IT facilities prevent dissemination of information through social media. Here we report on a citizen-science initiative to monitor small-scale disasters (landslides and floods) occurring in the Rwenzori Mountains. A network of citizen (geo-)observers was established in February 2017 to collect temporally explicit geo-referenced information on eight different hazards and their impact using smartphone technology. Since then, over 500 hazard occurrences have been reported. However, such a dataset needs to be assessed for its accuracy and potential biases before being used for scientific analysis. In this study, we evaluate the accuracy and completeness of the geo-observer-based disaster reports. First, systematic errors are reduced by peer reviewing the reports and implementing automatic tests to assess potential errors in detection and biases. Then, we compare the geo-observer-based records with two independent inventories collected through systematic field mapping and satellite imagery mapping, focusing on landslide and flood events for the period between May 2019 and May 2020. Results show that over 95% of the geo-observer reports validated in the field were correctly identified and recorded less than 5 days after the occurrence (60% true positives, 1% false positives and 39% false negatives). For the satellite imagery mapping, 29% were true positives, 43% false positives and 28% false negatives. Geo-observers provide near-real-time disaster information on the location and level of impact, something difficult to achieve with systematic field and satellite imagery mapping.
Depending on the topography of the area and the weather conditions, it can take several days to weeks before a cloud-free satellite image of a place can be obtained. The false negatives in the geo-observer data are due to the tendency to report mainly occurrences along roads and rural footpaths, since such occurrences are easily seen and accessed. Isolated, small and inaccessible landslides are often not seen or reported to the geo-observers. While satellite imagery mapping provides an opportunity to record disaster occurrences even in extremely inaccessible places, small landslides are often missed, while shallow ones can easily be confused with vegetation freshly cleared for crop planting. Citizen science-based disaster reporting therefore provides not only the spatial occurrence of disasters but also the temporal and weather-related information necessary for disaster risk analysis.</p>
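The true-positive, false-positive and false-negative shares quoted in the abstract above can be turned into precision and recall for a rough side-by-side comparison. The sketch below is an illustration only: it assumes the three percentages for each method are shares of the same pool of events, and the function and variable names are hypothetical rather than taken from the study.

```python
def precision_recall(tp: float, fp: float, fn: float) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive and false-negative shares."""
    precision = tp / (tp + fp)  # share of reported events that were real
    recall = tp / (tp + fn)     # share of real events that were reported
    return precision, recall


# Shares in percent, as quoted in the abstract.
geo_observers = precision_recall(tp=60, fp=1, fn=39)
satellite = precision_recall(tp=29, fp=43, fn=28)

for name, (p, r) in [("geo-observers", geo_observers), ("satellite", satellite)]:
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
```

Under this reading, the geo-observer reports come out at a precision of about 0.98 and a recall of about 0.61, against roughly 0.40 and 0.51 for the satellite imagery mapping.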


Sky Segmentation for Enhanced Depth Reconstruction and Bokeh Rendering with Efficient Architectures

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.14.coimg-378 ◽

2020 ◽

Vol 2020 (14) ◽

pp. 378-1-378-7

Author(s):

Tyler Nuanes ◽

Matt Elsey ◽

Radek Grzeszczuk ◽

John Paul Shen

Keyword(s):

Real Time ◽

Mobile Device ◽

Computational Cost ◽

False Positives ◽

Compact Model ◽

High Quality ◽

False Negatives ◽

Trade Off ◽

Depth Reconstruction ◽

Binary Classifiers

We present a high-quality sky segmentation model for depth refinement and investigate residual architecture performance to inform optimally shrinking the network. We describe a model that runs in near real-time on a mobile device, present a new, high-quality dataset, and detail a unique weighting to trade off false positives and false negatives in binary classifiers. We show how the optimizations improve bokeh rendering by correcting stereo depth misprediction in sky regions. We detail techniques used to preserve edges, reject false positives, and ensure generalization to the diversity of sky scenes. Finally, we present a compact model and compare the performance of four popular residual architectures (ShuffleNet, MobileNetV2, Resnet-101, and Resnet-34-like) at constant computational cost.
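The abstract does not specify the weighting it uses. As a generic illustration of how a loss can trade off false positives against false negatives in a binary classifier, here is a minimal sketch of a class-weighted binary cross-entropy. The weights `w_fn` and `w_fp` and the sample values are hypothetical, and this is not the authors' formulation.

```python
import math


def weighted_bce(y_true, y_prob, w_fn=1.0, w_fp=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for the two error types.

    w_fn scales the penalty on positives predicted with low probability
    (potential false negatives); w_fp scales the penalty on negatives
    predicted with high probability (potential false positives).
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w_fn * y * math.log(p) + w_fp * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.3, 0.1]  # second entry is a confident false positive

balanced = weighted_bce(labels, probs)
penalise_fp = weighted_bce(labels, probs, w_fp=3.0)
print(balanced, penalise_fp)
```

Raising `w_fp` above 1 makes confident false positives more expensive during training, which pushes the classifier toward fewer false alarms at the cost of more misses; raising `w_fn` does the opposite.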


Automatic Extraction of Acronyms from Text

10.26686/wgtn.12922298 ◽

2020 ◽

Author(s):

Stuart Yeates

Keyword(s):

Digital Library ◽

False Positives ◽

Automatic Extraction ◽

False Negatives ◽

Library Research ◽

Communications Theory ◽

Textual Content

A brief introduction to acronyms is given and the motivation for extracting them in a digital library environment is discussed. A technique for extracting acronyms is given with an analysis of the results. The technique is found to have a low number of false negatives and a high number of false positives. Introduction: Digital library research seeks to build tools to enable access to content while making as few assumptions as possible about the content, since assumptions limit the range of applicability of the tools. Generally, the broader the assumptions the more widely applicable the tools. For example, keyword-based indexing [5] is based on communications theory and applies to all natural human textual languages (allowances for differences in character sets and similar localisation issues notwithstanding). The algorithm described in this paper makes much stronger assumptions about the content. It assumes textual content that contains acronyms, an assumption which is known to hold for...


Avoiding Interest-Based Revenues While Constructing Shariah-Compliant Portfolios: False Negatives and False Positives

SSRN Electronic Journal ◽

10.2139/ssrn.2975790 ◽

2017 ◽

Author(s):

Özgür Arslan-Ayaydin ◽

Kris Boudt ◽

Muhammad Wajid Raza

Keyword(s):

False Positives ◽

False Negatives


A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data

Mathematics ◽

10.3390/math9080875 ◽

2021 ◽

Vol 9 (8) ◽

pp. 875

Author(s):

Jesus Cerquides ◽

Mehmet Oğuz Mülâyim ◽

Jerónimo Hernández-González ◽

Amudha Ravi Shankar ◽

Jose Luis Fernandez-Marquez

Keyword(s):

Data Quality ◽

Citizen Science ◽

Graphical Model ◽

Real Life ◽

Scientific Journals ◽

Probabilistic Framework ◽

Science Data ◽

Label Aggregation ◽

Evaluation Of Data ◽

Model Formalism

Over the last decade, hundreds of thousands of volunteers have contributed to science by collecting or analyzing data. This public participation in science, also known as citizen science, has contributed to significant discoveries and led to publications in major scientific journals. However, little attention has been paid to data quality issues. In this work we argue that being able to determine the accuracy of data obtained by crowdsourcing is a fundamental question and we point out that, for many real-life scenarios, mathematical tools and processes for the evaluation of data quality are missing. We propose a probabilistic methodology for the evaluation of the accuracy of labeling data obtained by crowdsourcing in citizen science. The methodology builds on an abstract probabilistic graphical model formalism, which is shown to generalize some already existing label aggregation models. We show how to make practical use of the methodology through a comparison of data obtained from different citizen science communities analyzing the earthquake that took place in Albania in 2019.
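To make the idea of probabilistic label aggregation concrete, the sketch below combines binary labels from several annotators under a simple conditional-independence assumption. It is one of the simplest models of this kind and is not the framework proposed in the paper. The per-annotator accuracies, assumed here to be the same for both classes, and the prior are hypothetical inputs.

```python
import math


def aggregate_binary(votes, accuracies, prior=0.5):
    """Posterior probability that the true label is 1.

    votes: 0/1 labels, one per annotator.
    accuracies: assumed probability that each annotator reports the true
        label (taken to be the same for both classes).
    prior: assumed prior probability that the true label is 1.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for vote, acc in zip(votes, accuracies):
        llr = math.log(acc / (1.0 - acc))  # evidence carried by one vote
        log_odds += llr if vote == 1 else -llr
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two annotators say 1, one says 0; the first annotator is assumed most reliable.
posterior = aggregate_binary(votes=[1, 1, 0], accuracies=[0.9, 0.6, 0.7])
print(round(posterior, 3))
```

When all annotators are given the same accuracy, this reduces to majority voting; unequal accuracies let a reliable annotator outweigh several unreliable ones.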


The impact of data quality filtering of opportunistic citizen science data on species distribution model performance

Ecological Modelling ◽

10.1016/j.ecolmodel.2021.109453 ◽

2021 ◽

Vol 444 ◽

pp. 109453

Author(s):

Camille Van Eupen ◽

Dirk Maes ◽

Marc Herremans ◽

Kristijn R.R. Swinnen ◽

Ben Somers ◽

...

Keyword(s):

Data Quality ◽

Citizen Science ◽

Species Distribution ◽

Species Distribution Model ◽

Model Performance ◽

Distribution Model ◽

Science Data ◽

Quality Filtering ◽

The Impact


CWDAT—An Open-Source Tool for the Visualization and Analysis of Community-Generated Water Quality Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040207 ◽

2021 ◽

Vol 10 (4) ◽

pp. 207

Author(s):

Annie Gray ◽

Colin Robertson ◽

Rob Feick

Keyword(s):

Water Quality ◽

Data Quality ◽

Capacity Building ◽

Open Source ◽

Citizen Science ◽

Water Resource ◽

Water Quality Monitoring ◽

Quality Monitoring ◽

User Engagement ◽

Community Based

Citizen science initiatives span a wide range of topics, designs, and research needs. Despite this heterogeneity, there are several common barriers to the uptake and sustainability of citizen science projects and the information they generate. One key barrier often cited in the citizen science literature is data quality. Open-source tools for the analysis, visualization, and reporting of citizen science data hold promise for addressing the challenge of data quality, while providing other benefits such as technical capacity-building, increased user engagement, and reinforcing data sovereignty. We developed an operational citizen science tool called the Community Water Data Analysis Tool (CWDAT), an R/Shiny-based web application designed for community-based water quality monitoring. Surveys and facilitated user engagement were conducted among stakeholders during the development of CWDAT. Targeted recruitment was used to gather feedback on the initial CWDAT prototype’s interface, features, and potential to support capacity building in the context of community-based water quality monitoring. Fourteen of thirty-two invited individuals (response rate 44%) contributed feedback via a survey or through facilitated interaction with CWDAT, with eight individuals interacting directly with CWDAT. Overall, CWDAT was received favourably. Participants requested updates and modifications such as water quality thresholds and indices that reflected well-known barriers to citizen science initiatives related to data quality assurance and the generation of actionable information. Our findings support calls to engage end-users directly in citizen science tool design and highlight how design can contribute to users’ understanding of data quality. Enhanced citizen participation in water resource stewardship facilitated by tools such as CWDAT may provide greater community engagement and acceptance of water resource management and policy-making.


A New Automatic Monitoring Network of Surface Waters in Greece: Preliminary Data Quality Checks and Visualization

Hydrology ◽

10.3390/hydrology8010033 ◽

2021 ◽

Vol 8 (1) ◽

pp. 33

Author(s):

Yiannis Panagopoulos ◽

Anna Konstantinidou ◽

Konstantinos Lazogiannis ◽

Anastasios Papadopoulos ◽

Elias Dimitriou

Keyword(s):

Data Quality ◽

Surface Waters ◽

Data Dissemination ◽

Monitoring Program ◽

Monitoring Network ◽

Automatic Monitoring ◽

Water Monitoring ◽

Marine Research ◽

Management Actions ◽

Quality Checks

The monitoring of surface waters is of fundamental importance for their preservation under good quantitative and qualitative conditions, as it can facilitate the understanding of the actual status of water and indicate suitable management actions. Taking advantage of the experience gained from the coordination of the national water monitoring program in Greece and the available funding from two ongoing infrastructure projects, the Institute of Inland Waters of the Hellenic Centre for Marine Research has developed the first homogeneous real-time network of automatic water monitoring across many Greek rivers. In this paper, its installation and maintenance procedures are presented with emphasis on the data quality checks, based on value range and variability tests, that are applied before online publication and dissemination of the data to end-users. Preliminary analyses revealed that the water pH and dissolved oxygen (DO) sensors need increased maintenance, and the data they produce need increased quality checks, compared to the more reliably recorded water stage, temperature (T) and electrical conductivity (EC). Moreover, the data dissemination platform and selected data visualization options are demonstrated, and the need for both this platform and the monitoring network to be maintained and potentially expanded after the termination of the funding projects is highlighted.
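The abstract names two kinds of quality check, value range tests and variability tests, without giving their details. A minimal sketch of what such checks might look like for a sensor time series follows. The thresholds, function names and sample pH readings are hypothetical and are not taken from the monitoring network.

```python
def range_flags(values, lower, upper):
    """Flag readings outside the plausible interval [lower, upper]."""
    return [not (lower <= v <= upper) for v in values]


def variability_flags(values, max_step):
    """Flag readings that differ from the previous reading by more than max_step."""
    if not values:
        return []
    flags = [False]  # the first reading has no predecessor to compare with
    for prev, cur in zip(values, values[1:]):
        flags.append(abs(cur - prev) > max_step)
    return flags


# Hypothetical pH readings with one implausible spike.
ph = [7.8, 7.9, 14.6, 7.9, 7.7]

out_of_range = range_flags(ph, lower=0.0, upper=14.0)
spikes = variability_flags(ph, max_step=1.0)
print(out_of_range)
print(spikes)
```

Note that a simple step test flags both the jump into a spike and the return from it, so the reading after a spike is flagged as well. A production check would likely need to handle that case, along with missing values and sensor-specific limits.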


Evaluation of Positive T- and B-Cell Gene Rearrangement Studies Among Patients Without a Definitive Diagnosis by Other Assays

American Journal of Clinical Pathology ◽

10.1093/ajcp/aqz112.067 ◽

2019 ◽

Vol 152 (Supplement_1) ◽

pp. S35-S36

Author(s):

Hadrian Mendoza ◽

Christopher Tormey ◽

Alexa Siddon

Keyword(s):

T Cell ◽

False Positive ◽

Gene Rearrangement ◽

Hematologic Malignancy ◽

False Negative ◽

False Positives ◽

False Negatives ◽

True Negative ◽

Flow Cytometric ◽

Pathology Reports

In the evaluation of bone marrow (BM) and peripheral blood (PB) for hematologic malignancy, positive immunoglobulin heavy chain (IG) or T-cell receptor (TCR) gene rearrangement results may be detected despite unrevealing results from morphologic, flow cytometric, immunohistochemical (IHC), and/or cytogenetic studies. The significance of positive rearrangement studies in the context of otherwise normal ancillary findings is unknown, and as such, we hypothesized that gene rearrangement studies may be predictive of an emerging B- or T-cell clone in the absence of other abnormal laboratory tests. Data from all patients who underwent IG or TCR gene rearrangement testing at the authors’ affiliated VA hospital between January 1, 2013, and July 6, 2018, were extracted from the electronic medical record. Date of testing; specimen source; and morphologic, flow cytometric, IHC, and cytogenetic characterization of the tissue source were recorded from pathology reports. Gene rearrangement results were categorized as true positive, false positive, false negative, or true negative. Lastly, patient records were reviewed for subsequent diagnosis of hematologic malignancy in patients with positive gene rearrangement results with negative ancillary testing. A total of 136 patients, who had 203 gene rearrangement studies (50 PB and 153 BM), were analyzed. In TCR studies, there were 2 false positives and 1 false negative in 47 PB assays, as well as 7 false positives and 1 false negative in 54 BM assays. Regarding IG studies, 3 false positives and 12 false negatives in 99 BM studies were identified. Sensitivity and specificity, respectively, were calculated for PB TCR studies (94% and 93%), BM IG studies (71% and 95%), and BM TCR studies (92% and 83%). Analysis of PB IG gene rearrangement studies was not performed due to the small number of tests (3; all true negative).
None of the 12 patients with false-positive IG/TCR gene rearrangement studies later developed a lymphoproliferative disorder, although 2 patients were later diagnosed with acute myeloid leukemia. Of the 14 false negatives, 10 (71%) were related to a diagnosis of plasma cell neoplasms. Results from the present study suggest that positive IG/TCR gene rearrangement studies are not predictive of lymphoproliferative disorders in the context of otherwise negative BM or PB findings. As such, when faced with equivocal pathology reports, clinicians can be practically advised that isolated positive IG/TCR gene rearrangement results may not indicate the need for closer surveillance.
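The sensitivity and specificity figures above follow the standard definitions, TP / (TP + FN) and TN / (TN + FP). The abstract reports the false-positive and false-negative counts and the assay totals, but not the true-positive and true-negative counts. The values used below are therefore an assumption: they are one set of counts that matches both the stated totals and the rounded percentages, and they are not necessarily the authors' actual counts.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly positive cases that the assay detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly negative cases that the assay correctly cleared."""
    return tn / (tn + fp)


# (TP, FP, FN, TN). FP, FN and the totals come from the abstract;
# TP and TN are back-calculated assumptions, not reported values.
studies = {
    "PB TCR": (16, 2, 1, 28),   # total 47
    "BM TCR": (12, 7, 1, 34),   # total 54
    "BM IG": (29, 3, 12, 55),   # total 99
}

results = {}
for name, (tp, fp, fn, tn) in studies.items():
    results[name] = (sensitivity(tp, fn), specificity(tn, fp))
    sens, spec = results[name]
    print(f"{name}: sensitivity={sens:.0%}, specificity={spec:.0%}")
```

With these counts the rounded results match the percentages reported in the abstract. The small number of negative cases behind each specificity estimate also shows how much one or two additional false positives would shift the figure.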


Emerging problems of data quality in citizen science

Conservation Biology ◽

10.1111/cobi.12706 ◽

2016 ◽

Vol 30 (3) ◽

pp. 447-449 ◽

Cited By ~ 67

Author(s):

Roman Lukyanenko ◽

Jeffrey Parsons ◽

Yolanda F. Wiersma

Keyword(s):

Data Quality ◽

Citizen Science
