Survey Data Quality in Analyzing Harmonized Indicators of Protest Behavior: A Survey Data Recycling Approach

2021
pp. 000276422110216
Author(s):
Kazimierz M. Slomczynski
Irina Tomescu-Dubrow
Ilona Wysmulek

This article proposes a new approach to analyzing protest participation measured in surveys of uneven quality. Because single international survey projects cover only a fraction of the world's nations in specific periods, researchers increasingly turn to ex-post harmonization of different survey data sets not a priori designed as comparable. However, very few scholars systematically examine the impact of survey data quality on substantive results. We argue that variation in the source data, especially deviations from the standards of survey documentation, data processing, and computer files proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use, is important for analyzing protest behavior. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of survey quality measures on indicators of protest participation must be rejected. Measures of survey documentation, data processing, and computer records, taken together, explain over 5% of the intersurvey variance in the proportions of the populations attending demonstrations or signing petitions.
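
A minimal sketch of the kind of intersurvey analysis described above, assuming hypothetical column names (pct_demonstration and the three quality scores are illustrative stand-ins, not the authors' variables): it regresses survey-level protest proportions on data-quality measures and reports the share of intersurvey variance they explain.

    # Hedged sketch: survey-level regression of protest proportions on
    # data-quality scores. File and column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    surveys = pd.read_csv("harmonized_surveys.csv")  # one row per national survey

    model = smf.ols(
        "pct_demonstration ~ doc_quality + processing_quality + file_quality",
        data=surveys,
    ).fit()

    print(model.summary())
    print(f"Share of intersurvey variance explained (R^2): {model.rsquared:.3f}")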

2016
Vol 32 (3)
pp. 643-660
Author(s):
Samuel De Haas
Peter Winker

Falsified interviews represent a serious threat to empirical research based on survey data, and identifying such cases is important for ensuring data quality. Previous research has shown that applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of their interviews are complete falsifications. This analysis is extended here to the case in which only a share of the questions within each interview provided by an interviewer is fabricated. The assessment is based on synthetic data sets with properties set a priori, constructed from a unique experimental data set containing both real and fabricated data for each respondent. Such a bootstrap approach makes it possible to evaluate the robustness of the method as the share of fabricated answers per interview decreases. The results indicate a substantial loss of discriminatory power in standard cluster analysis when the share of fabricated answers within an interview becomes small. A novel clustering method that allows imposing constraints on cluster sizes improves performance, in particular when only a few falsifiers are present. This new approach will help to increase the robustness of survey data by detecting potential falsifiers more reliably.
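
A rough sketch of the indicator-based clustering idea, using plain k-means as a stand-in (the size-constrained clustering the authors propose is not available in scikit-learn); the indicator and column names are illustrative, not the study's actual variables.

    # Hedged sketch: cluster interviewers on falsification indicators into
    # two groups and treat the smaller cluster as suspicious.
    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    ind = pd.read_csv("interviewer_indicators.csv")  # one row per interviewer

    X = StandardScaler().fit_transform(
        ind[["share_extreme_answers", "share_item_nonresponse", "answer_variance"]]
    )

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    smaller = np.argmin(np.bincount(labels))  # falsifiers are usually the minority
    ind["suspicious"] = labels == smaller
    print(ind.loc[ind["suspicious"], "interviewer_id"])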


2018
Vol 27 (4)
pp. 427-439
Author(s):
Verena Sabine Thaler
Uta Herbst
Michael A. Merz

Purpose: While product scandals generate many negative headlines, the extent of their impact on the scandalized brands' equity remains unclear, and research findings are mixed. This might be because of the limitations of existing measurement approaches for investigating the effects of real crises after they have occurred. This study proposes a new approach for measuring the impact of a real scandal on a high-equity brand using only post-crisis measures.

Design/methodology/approach: To overcome the challenge of comparing a priori and ex post outcome measures, this study draws on the brand management literature to evaluate a real scandal's impact. Volkswagen's emission scandal serves as the failure context, and two consumer experiments examine its impact.

Findings: The results provide (longitudinal) support for the proposed evaluative approach. They reveal new evidence that building brand equity is a means to mitigate negative effects, and they indicate that negative spillover effects within a high-equity brand portfolio are unlikely. Finally, this research identifies situations in which developing a new brand might be more beneficial than leveraging an existing brand.

Practical implications: This research has significant implications for firms with high-equity brands that might be affected by a scandal. The findings help managers navigate their brands through a crisis.

Originality/value: This research adds to the discussion concerning the role of a brand's equity in a crisis, where existing findings are contradictory. It provides new empirical evidence and another view on how to measure "impact".


Author(s):  
David Japikse
Oleg Dubitsky
Kerry N. Oliphant
Robert J. Pelton
Daniel Maynes
...  

In the course of developing the advanced data processing and advanced performance models presented in companion papers, a number of basic scientific and mathematical questions arose. This paper deals with questions of uniqueness, convergence, statistical accuracy, training, and evaluation methodologies. The process of bringing together large data sets and utilizing them, with outside data supplementation, is considered in detail. After these questions are brought into focus, emphasis is placed on how the new models, based on highly refined data processing, can best be used in the design world. The impact of this work on future designs is discussed; this methodology is expected to help designers move beyond contemporary design practices.


Author(s):  
Lindsay J. Benstead

Since the first surveys were conducted there in the late 1980s, survey research has expanded rapidly in the Arab world. Almost every country in the region is now included in the Arab Barometer, Afrobarometer, or World Values Survey. Moreover, the Arab Spring marked a watershed, with the inclusion of Tunisia and Libya and the addition of many topics, such as voting behavior, that were previously considered too sensitive. As a result, political scientists have dozens of largely untapped data sets with which to answer theoretical and policy questions. To make progress toward measuring and reducing total survey error, discussion is needed of quality issues such as high rates of missingness and sampling challenges. Ongoing attention to ethics is also critical. This chapter discusses these developments and frames a substantive and methodological research agenda for improving data quality and survey practice in the Arab world.


2020
Vol 6
pp. 237802312097079
Author(s):
Diana Enriquez
Adam Goldstein

The coronavirus disease 2019 (COVID-19) pandemic has introduced manifold dislocations in Americans' lives. Using novel survey data from samples of Supplemental Nutrition Assistance Program (SNAP) recipients and U.S. Census Bureau Household Pulse Survey data, the authors examine the incidence of COVID-19-induced hardships among low-income/benefits-eligible households during the early months of the crisis. Five repeated online surveys of SNAP recipients measured perceived and realized housing insecurity, food scarcity, new debt accrual, and recent job loss. These data were supplemented with parallel measures constructed for all low-income households in the Household Pulse Survey. Food insecurity and debt accrual grew more prevalent from April to June 2020, and job losses compounded. Although the magnitude of racial differences varies across indicators and data sources, black respondents fared consistently worse than non-Hispanic whites in both survey data sets, and Latinx respondents fared worse than whites in the Household Pulse Survey. These results provide early systematic evidence on the impact of the COVID-19 crisis on poor Americans and on racial disparities therein.
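
As a hedged illustration of how such parallel hardship measures can be compared, the sketch below computes prevalence by wave and race/ethnicity from pooled microdata; the file and column names are hypothetical stand-ins for the measures described above.

    # Sketch: hardship prevalence by survey wave and race/ethnicity.
    import pandas as pd

    resp = pd.read_csv("pooled_hardship_surveys.csv")

    hardships = ["food_insecure", "new_debt", "housing_insecure", "recent_job_loss"]
    prevalence = (
        resp.groupby(["wave", "race_ethnicity"])[hardships]
        .mean()   # indicators coded 0/1, so the mean is the share reporting hardship
        .round(3)
    )
    print(prevalence)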


2020
Author(s):
Christian Borger
Steffen Beirle
Steffen Dörner
Holger Sihler
Thomas Wagner

Atmospheric water plays a key role in the Earth's energy budget and temperature distribution via radiative effects (clouds and vapour) and latent heat transport. The distribution and transport of water vapour are thus closely linked to atmospheric dynamics on different spatio-temporal scales. In this context, global monitoring of the water vapour distribution is essential for numerical weather prediction, climate modeling, and a better understanding of climate feedbacks.

Here, we present a total column water vapour (TCWV) retrieval using the absorption structures of water vapour in the visible blue spectral range. The retrieval consists of the common two-step DOAS approach: first, the spectral analysis is performed within a linearized scheme; then, the retrieved slant column densities are converted to vertical column densities (VCDs) using an iterative scheme for the water vapour a priori profile shape that is based on an empirical parameterization of the water vapour scale height.

We apply this novel retrieval to measurements of the TROPOspheric Monitoring Instrument (TROPOMI) onboard ESA's Sentinel-5P satellite and compare our retrieved H₂O VCDs to a variety of reference data sets. Furthermore, we present a detailed characterization of the retrieval, including theoretical error estimations for different observation conditions. In addition, we investigate the impact of different input data sets (e.g., surface albedo) on the retrieved H₂O VCDs.
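
A toy sketch of the second retrieval step, with made-up numbers throughout (the box air mass factors and the scale-height function are placeholders, not the retrieval's empirical fit): the VCD follows from VCD = SCD / AMF, where the AMF is recomputed each iteration from the exponential a priori profile exp(-z/H).

    # Sketch of the iterative SCD-to-VCD conversion described above.
    import numpy as np

    def vcd_from_scd(scd, z, box_amf, n_iter=5):
        """Iterative SCD-to-VCD conversion with an exponential a priori profile."""
        def scale_height(vcd):
            # Placeholder for the empirical scale-height parameterization
            # (the retrieval's actual fit is not reproduced here); returns km.
            return 1.5 + 0.5 * np.log1p(vcd / 1e22)

        vcd = scd  # crude first guess, equivalent to AMF = 1
        for _ in range(n_iter):
            profile = np.exp(-z / scale_height(vcd))  # a priori shape exp(-z/H)
            profile /= np.trapz(profile, z)           # normalize to unit column
            amf = np.trapz(box_amf * profile, z)      # profile-weighted total AMF
            vcd = scd / amf
        return vcd

    # Toy call with made-up numbers: z in km, box AMFs decreasing with altitude.
    z = np.linspace(0.0, 15.0, 151)
    print(vcd_from_scd(scd=2.0e23, z=z, box_amf=2.0 - 0.05 * z))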


2020
pp. 81-93
Author(s):
D. V. Shalyapin
D. L. Bakirov
M. M. Fattakhov
A. D. Shalyapina
A. V. Melekhov
...  

The article is devoted to the quality of well casing at the Pyakyakhinskoye oil and gas condensate field. Improving the quality of well casing involves many challenges, for example, the large amount of work required to relate laboratory studies to actual field data, and the difficulty of finding logically determined relationships between individual parameters and the final quality of well casing. The article presents a new approach to assessing the impact of various parameters, based on a mathematical apparatus that excludes subjective expert assessments and will, in the future, allow the method to be applied to fields with different rock and geological conditions. We propose applying principles of mathematical processing of large data sets, using neural networks trained to predict characteristics of well-casing quality (the continuity of cement contact with the rock and with the casing). Taking into account the previously identified factors, we developed solutions to improve the tightness of the well casing and the adhesion of cement to the limiting surfaces.
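
A minimal sketch of the proposed neural-network step, with illustrative feature and target names (the field's actual parameter set is not reproduced here):

    # Sketch: small neural network predicting cement-contact continuity
    # from drilling/cementing parameters. All names are hypothetical.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    data = pd.read_csv("cementing_jobs.csv")
    X = data[["slurry_density", "pump_rate", "mud_viscosity", "casing_depth"]]
    y = data["contact_continuity"]  # share of interval with continuous contact

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    )
    model.fit(X_train, y_train)
    print(f"R^2 on held-out cementing jobs: {model.score(X_test, y_test):.2f}")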


Author(s):  
Evan F. Sinar

Data visualization—a set of approaches for applying graphical principles to represent quantitative information—is extremely well matched to the nature of survey data but often underleveraged for this purpose. Surveys produce data sets that are highly structured and comparative across groups and geographies, that often blend numerical and open-text information, and that are designed for repeated administration and analysis. Each of these characteristics aligns well with specific visualization types, use of which has the potential to—when paired with foundational, evidence-based tenets of high-quality graphical representations—substantially increase the impact and influence of data presentations given by survey researchers. This chapter recommends and provides guidance on data visualization techniques fit to purpose for survey researchers, while also describing key risks and missteps associated with these approaches.
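
As a small illustration of one visualization type suited to repeated, comparative survey data, the sketch below plots item agreement across waves by region; the numbers are invented for the example.

    # Sketch: line chart comparing agreement rates across survey waves.
    import matplotlib.pyplot as plt

    waves = ["2018", "2019", "2020"]
    agree = {"North": [62, 64, 59], "South": [55, 58, 61]}  # % agreeing (made up)

    fig, ax = plt.subplots()
    for region, values in agree.items():
        ax.plot(waves, values, marker="o", label=region)
    ax.set_ylabel("% agreeing")
    ax.set_title("Item agreement by region across survey waves")
    ax.legend()
    plt.show()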


2020
Author(s):
Johannes S Neumann
Rob Desalle
Apurva Narechania
Bernd Schierwater
Michael Tessler

There are considerable phylogenetic incongruencies between morphological and phylogenomic data for the deep evolution of animals. This has contributed to a heated debate over the earliest-branching lineage of the animal kingdom: the sister to all other Metazoa (SOM). Here, we use published phylogenomic data sets (~45,000–400,000 characters in size, with ~15–100 taxa) that focus on early metazoan phylogeny to evaluate the impact of incorporating morphological data sets (~15–275 characters). We additionally use small exemplar data sets to quantify how increased taxon sampling can help stabilize phylogenetic inferences. We apply a plethora of common methods, that is, likelihood models and their “equivalent” under parsimony: character weighting schemes. Our results are at odds with the typical view of phylogenomics, that is, that genomic-scale data sets will swamp out inferences from morphological data. Instead, weighting morphological data 2–10× in both likelihood and parsimony can in some cases “flip” which phylum is inferred to be the SOM. This typically results in the molecular hypothesis of Ctenophora as the SOM flipping to Porifera (or occasionally Placozoa). However, greater taxon sampling improves phylogenetic stability, with some of the larger molecular data sets (>200,000 characters and up to ~100 taxa) showing node stability even with ≥100× upweighting of morphological data. Accordingly, our analyses have three strong messages. 1) The assumption that genomic data will automatically “swamp out” morphological data is not always true for the SOM question. Morphological data have a strong influence in our analyses of combined data sets, even when outnumbered thousands of times by molecular data. Morphology therefore should not be counted out a priori. 2) We quantify here for the first time how the stability of the SOM node improves for several genomic data sets when taxon sampling is increased. 3) The patterns of “flipping points” (i.e., the weighting of morphological data it takes to change the inferred SOM) carry information about the phylogenetic stability of matrices. The weighting space is an innovative way to assess the comparability of data sets and could be developed into a new sensitivity analysis tool. [Metazoa; Morphology; Phylogenomics; Weighting.]
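
A stylized sketch of the flipping-point logic, with invented per-partition scores (in a real analysis these would be partition log-likelihoods or parsimony step counts for each candidate SOM topology, obtained from a tree-inference program):

    # Sketch: scan morphology weights and record which hypothesis wins.
    # All scores below are made up for illustration.
    mol_score = {"Ctenophora-sister": -1000.0, "Porifera-sister": -1012.0}
    morph_score = {"Ctenophora-sister": -80.0, "Porifera-sister": -75.0}

    for w in (1, 2, 5, 10):
        combined = {h: mol_score[h] + w * morph_score[h] for h in mol_score}
        best = max(combined, key=combined.get)
        print(f"morphology weight {w:>2}: preferred hypothesis = {best}")

    # Here the molecular partition favors Ctenophora-sister by 12 points and
    # morphology favors Porifera-sister by 5 per unit weight, so the preference
    # flips once w > 12/5 = 2.4 (at w = 5 among the weights tested above).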


2021
Author(s):
Mandi Pratt-Chapman
Jenna Moses
Hannah Arem

BACKGROUND: To assess the impact of COVID-19 on cancer survivors, we fielded a survey promoted via email and social media in winter 2020. Examination of the data showed suspicious patterns that warranted serious review.

OBJECTIVE: The aim of this paper is to review the methods used to identify and prevent fraudulent survey responses.

METHODS: As precautions, we included a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA), a hidden question, and instructions for respondents to type a specific word. To identify likely fraudulent data, we defined a priori indicators that warranted elimination or suspicion; if a survey contained two or more suspicious indicators, it was eliminated. We then examined differences between the retained and eliminated data sets.

RESULTS: Of the total responses (N=1977), nearly three-fourths (n=1408) were dropped and one-fourth (n=569) were retained after data quality checking. Comparisons of the two data sets showed statistically significant differences across almost all demographic characteristics.

CONCLUSIONS: Numerous precautions beyond the inclusion of a CAPTCHA are needed when fielding web-based surveys, particularly if a financial incentive is offered.
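
A hedged sketch of the two-or-more-indicators elimination rule described above; the indicator definitions, keyword, and column names are illustrative, not the study's exact criteria.

    # Sketch: flag a priori fraud indicators and drop responses with >= 2.
    import pandas as pd

    resp = pd.read_csv("survey_responses.csv")

    flags = pd.DataFrame({
        "failed_captcha": ~resp["captcha_passed"].astype(bool),
        "filled_hidden_field": resp["honeypot"].notna(),  # bots fill hidden items
        "wrong_keyword": resp["typed_word"].str.lower() != "purple",  # hypothetical word
        "too_fast": resp["duration_sec"] < 120,           # implausibly quick completion
        "duplicate_ip": resp.duplicated("ip_address", keep=False),
    })

    resp["n_suspicious"] = flags.sum(axis=1)
    retained = resp[resp["n_suspicious"] < 2]  # eliminate at two or more indicators
    print(f"retained {len(retained)} of {len(resp)} responses")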

