The Effect of New Data Collection Technologies on Survey Data Quality

Author(s):  
William L. Nicholls ◽  
Reginald P. Baker ◽  
Jean Martin


2017 ◽  
Vol 59 (2) ◽  
pp. 199-220
Author(s):  
G.W. Roughton ◽  
Iain Mackay

This paper investigates whether a ‘wisdom of the crowd’ approach might offer an alternative to recent political polls that have raised questions about survey data quality. Data collection costs have become so low that, beyond data quality itself, concerns have also been raised about low response rates, professional respondents and respondent interaction, as well as uncertainties about self-selecting ‘samples’. This paper looks at more than 100 such surveys and reports that, in five of the six cases discussed, interviews costing £0.08 each delivered results in line with known outcomes. The results discussed in the paper show that such interviews are not a waste of money.
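
The aggregation step behind a ‘wisdom of the crowd’ reading of cheap, self-selected responses can be sketched in a few lines. The figures below are purely illustrative assumptions, not numbers from the paper (only the £0.08 cost is taken from the abstract).

```python
# Illustrative only: average many cheap, noisy, self-selected responses and
# compare the aggregate to a known outcome. All numbers are made up; none come
# from the paper.
import random

random.seed(42)

true_share = 0.52          # hypothetical known outcome (e.g. a vote share)
n_respondents = 2000       # many low-cost responses
cost_per_interview = 0.08  # pounds per interview, the figure quoted in the abstract

# Each respondent answers yes/no with individual noise standing in for
# self-selection and measurement error.
responses = [1 if random.random() < true_share + random.gauss(0, 0.05) else 0
             for _ in range(n_respondents)]

crowd_estimate = sum(responses) / n_respondents
total_cost = n_respondents * cost_per_interview

print(f"Crowd estimate {crowd_estimate:.3f} vs known outcome {true_share:.3f}; "
      f"total field cost £{total_cost:.2f}")
```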


10.2196/17619 ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. e17619
Author(s):  
Neha Shah ◽  
Diwakar Mohan ◽  
Jean Juste Harisson Bashingwa ◽  
Osama Ummer ◽  
Arpita Chakraborty ◽  
...  

Background: Data quality is vital for ensuring the accuracy, reliability, and validity of survey findings. Strategies for ensuring survey data quality have traditionally relied on quality assurance procedures. Data analytics is an increasingly vital part of survey quality assurance, particularly given the growing use of tablets and other electronic tools, which enable rapid, if not real-time, data access. Routine data analytics most often involve outlier analyses that monitor a series of data quality indicators, including response rates, missing data, and reliability coefficients for test-retest interviews. Machine learning is emerging as a possible tool for enhancing real-time data monitoring by identifying trends in data collection that could compromise quality.

Objective: This study aimed to describe methods for the quality assessment of a household survey using both traditional methods and machine learning analytics.

Methods: In the Kilkari impact evaluation’s end-line survey among postpartum women (n=5095) in Madhya Pradesh, India, we plan to use both traditional and machine learning–based quality assurance procedures to improve the quality of survey data captured on maternal and child health knowledge, care-seeking, and practices. The quality assurance strategy aims to identify biases and other impediments to data quality and includes seven main components: (1) tool development, (2) enumerator recruitment and training, (3) field coordination, (4) field monitoring, (5) data analytics, (6) feedback loops for decision making, and (7) outcomes assessment. Analyses will include basic descriptive and outlier analyses using machine learning algorithms, which will involve creating features from time-stamps, “don’t know” rates, and skip rates. We will also obtain labeled data from self-filled surveys and build models using k-fold cross-validation on a training data set with both supervised and unsupervised learning algorithms. Based on these models, results will be fed back to the field through various feedback loops.

Results: Data collection began in late October 2019 and will span through March 2020. We expect to submit quality assurance results by August 2020.

Conclusions: Machine learning is underutilized as a tool to improve survey data quality in low-resource settings. Study findings are anticipated to improve the overall quality of Kilkari survey data and, in turn, enhance the robustness of the impact evaluation. More broadly, the proposed quality assurance approach has implications for data capture applications used for special surveys as well as for the routine collection of health information by health workers.

International Registered Report Identifier (IRRID): DERR1-10.2196/17619
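
A minimal sketch of the kind of analytics this protocol describes: deriving interview-level features from time-stamps, "don't know" rates, and skip rates, applying a simple outlier rule, and scoring a supervised classifier with k-fold cross-validation. The data, column names, and thresholds below are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative sketch, not the study's actual pipeline: interview-level features
# from time-stamps, "don't know" rates, and skip rates; a simple outlier rule;
# and a supervised model scored with k-fold cross-validation on synthetic data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n = 500  # synthetic interviews standing in for real survey records

interviews = pd.DataFrame({
    "duration_min": rng.normal(35, 8, n),   # derived from start/end time-stamps
    "dont_know_rate": rng.beta(2, 20, n),   # share of "don't know" answers
    "skip_rate": rng.beta(2, 30, n),        # share of skipped items
})

# Simple outlier rule: unusually short/long interviews or unusually high
# "don't know" / skip rates.
z = ((interviews["duration_min"] - interviews["duration_min"].mean())
     / interviews["duration_min"].std())
interviews["outlier_flag"] = (
    (z.abs() > 3)
    | (interviews["dont_know_rate"] > interviews["dont_know_rate"].quantile(0.99))
    | (interviews["skip_rate"] > interviews["skip_rate"].quantile(0.99))
)

# Hypothetical quality labels (e.g. derived from self-filled comparison surveys).
interviews["suspect"] = (rng.random(n) < 0.1).astype(int)

X = interviews[["duration_min", "dont_know_rate", "skip_rate"]]
y = interviews["suspect"]

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print(f"Outliers flagged: {int(interviews['outlier_flag'].sum())}")
print(f"5-fold CV accuracy: {scores.mean():.2f}")
```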


2021 ◽  
pp. 000276422110216
Author(s):  
Kazimierz M. Slomczynski ◽  
Irina Tomescu-Dubrow ◽  
Ilona Wysmulek

This article proposes a new approach to analyzing protest participation measured in surveys of uneven quality. Because single international survey projects cover only a fraction of the world’s nations in specific periods, researchers increasingly turn to ex-post harmonization of different survey data sets not a priori designed to be comparable. However, very few scholars systematically examine the impact of survey data quality on substantive results. We argue that variation in the source data, especially deviations from the standards of survey documentation, data processing, and computer files proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use, is important for analyzing protest behavior. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of survey quality measures on indicators of protest participation must be rejected. Measures of survey documentation, data processing, and computer records, taken together, explain over 5% of the intersurvey variance in the proportions of the populations attending demonstrations or signing petitions.
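
The closing claim, that quality measures explain over 5% of the intersurvey variance, amounts to regressing survey-level protest proportions on quality indicators and reading off the explained variance. The sketch below uses simulated data and is not the Survey Data Recycling authors' code.

```python
# Illustrative only (simulated data, not the SDR source files): regress
# survey-level protest proportions on three quality indicators and read off
# the share of intersurvey variance they explain (R^2).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_surveys = 1184  # matches the number of national survey projects analysed

# Hypothetical survey-level quality scores: documentation, processing, computer files.
quality = rng.normal(size=(n_surveys, 3))

# Hypothetical outcome: proportion reporting demonstration attendance, weakly
# related to the quality scores plus noise.
prop_demonstrated = (0.15
                     + quality @ np.array([0.010, 0.005, 0.008])
                     + rng.normal(0, 0.05, n_surveys))

model = LinearRegression().fit(quality, prop_demonstrated)
print(f"Intersurvey variance explained by quality measures: "
      f"{model.score(quality, prop_demonstrated):.1%}")
```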


Author(s):  
Christopher D O’Connor ◽  
John Ng ◽  
Dallas Hill ◽  
Tyler Frederick

Policing is increasingly being shaped by data collection and analysis. However, we still know little about the quality of the data police services acquire and utilize. Drawing on a survey of analysts from across Canada, this article examines several data collection, analysis, and quality issues. We argue that, as we move towards an era of big data policing, it is imperative that police services pay more attention to the quality of the data they collect. We conclude by discussing the implications of ignoring data quality issues and the need to develop a more robust research culture in policing.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Michelle Amri ◽  
Christina Angelakis ◽  
Dilani Logan

Objective: Through collating observations from various studies and complementing these findings with one author’s study, a detailed overview of the benefits and drawbacks of asynchronous email interviewing is provided. Through this overview, it is evident there is great potential for asynchronous email interviews in the broad field of health, particularly for studies drawing on expertise from participants in academia or professional settings, those across varied geographical settings (i.e. potential for global public health research), and/or in circumstances when face-to-face interactions are not possible (e.g. COVID-19).

Results: Benefits of asynchronous email interviewing and additional considerations for researchers are discussed around: (i) access transcending geographic location and during restricted face-to-face communications; (ii) feasibility and cost; (iii) sampling and inclusion of diverse participants; (iv) facilitating snowball sampling and increased transparency; (v) data collection with working professionals; (vi) anonymity; (vii) verification of participants; (viii) data quality and enhanced data accuracy; and (ix) overcoming language barriers. Similarly, potential drawbacks of asynchronous email interviews are also discussed with suggested remedies, which centre around: (i) time; (ii) participant verification and confidentiality; (iii) technology and sampling concerns; (iv) data quality and availability; and (v) need for enhanced clarity and precision.


2021 ◽  
Vol 13 (6) ◽  
pp. 3320
Author(s):  
Amy R. Villarosa ◽  
Lucie M. Ramjan ◽  
Della Maneze ◽  
Ajesh George

The COVID-19 pandemic has resulted in many changes, including restrictions on indoor gatherings and visitation to residential aged care facilities, hospitals and certain communities. Coupled with potential restrictions imposed by health services and academic institutions, these changes may significantly impact the conduct of population health research. However, the continuance of population health research is beneficial for the provision of health services and sometimes imperative. This paper discusses the impact of COVID-19 restrictions on the conduct of population health research. This discussion unveils important ethical considerations, as well as potential impacts on recruitment methods, face-to-face data collection, data quality and validity. In addition, this paper explores potential recruitment and data collection methods that could replace face-to-face methods. The discussion is accompanied by reflections on the challenges experienced by the authors in their own research at an oral health service during the COVID-19 pandemic and alternative methods that were utilised in place of face-to-face methods. This paper concludes that, although COVID-19 presents challenges to the conduct of population health research, there is a range of alternative methods to face-to-face recruitment and data collection. These alternative methods should be considered in light of project aims to ensure data quality is not compromised.


2004 ◽  
Vol 22 (5) ◽  
pp. 255-265 ◽  
Author(s):  
JAMES A. BOBULA ◽  
LORI S. ANDERSON ◽  
SUSAN K. RIESCH ◽  
JANIE CANTY-MITCHELL ◽  
ANGELA DUNCAN ◽  
...  

2021 ◽  
pp. 147078532098679
Author(s):  
Kylie Brosnan ◽  
Bettina Grün ◽  
Sara Dolnicar

Survey data quality suffers when respondents have difficulty completing complex tasks in questionnaires. Cognitive load theory informed the development of strategies for educators to reduce the cognitive load of learning tasks. We investigate whether these cognitive load reduction strategies can be used in questionnaire design to reduce task difficulty and, in so doing, improve survey data quality. We find that this is not the case and conclude that some traditional survey answer formats, such as grid questions, which have been criticized in the past, lead to equally good data and do not frustrate respondents more than alternative formats.
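
One way to compare data quality across answer formats, for example a grid versus an item-by-item layout, is to contrast a simple indicator such as straightlining. The sketch below uses simulated responses and hypothetical straightlining shares; it only illustrates such a comparison and is not the authors' analysis.

```python
# Assumption-based sketch, not the authors' analysis: compare straightlining
# rates between two questionnaire formats with a two-sample t-test on
# simulated 6-item, 1-5 grid responses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def simulate_grid(n_respondents: int, straightline_share: float) -> np.ndarray:
    """Simulate 6-item grids where a given share of respondents straightline."""
    answers = rng.integers(1, 6, size=(n_respondents, 6))
    liners = rng.random(n_respondents) < straightline_share
    answers[liners] = rng.integers(1, 6, size=(liners.sum(), 1))  # constant rows
    return answers

def straightlining_rate(responses: np.ndarray) -> np.ndarray:
    """Per-respondent flag: 1.0 if every item in the grid got the same answer."""
    return (responses.std(axis=1) == 0).astype(float)

# Hypothetical straightlining shares for a grid layout vs. an item-by-item layout.
grid_format = simulate_grid(300, straightline_share=0.15)
alt_format = simulate_grid(300, straightline_share=0.10)

t, p = stats.ttest_ind(straightlining_rate(grid_format), straightlining_rate(alt_format))
print(f"Straightlining: grid {straightlining_rate(grid_format).mean():.2%} "
      f"vs alternative {straightlining_rate(alt_format).mean():.2%}; t = {t:.2f}, p = {p:.3f}")
```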

