Total Error in a Big Data World: Adapting the TSE Framework to Big Data

2020 ◽  
Vol 8 (1) ◽  
pp. 89-119
Author(s):  
Ashley Amaya ◽  
Paul P Biemer ◽  
David Kinyon

While Big Data offers a potentially less expensive, less burdensome, and more timely alternative to survey data for producing a variety of statistics, it is not without error. The AAPOR Task Force on Big Data and others have called for researchers to evaluate the quality of Big Data using an approach similar to the total survey error (TSE) framework. However, differences in the construction of, access to, and overall data structure between survey data and Big Data make application of TSE difficult. In this article, we seek to develop the Total Error Framework (TEF), an extension of the TSE framework, to be (1) more inclusive and applicable to many types of Big Data, (2) comprehensive in that it considers “total” error, and (3) unified in that it allows researchers to compare errors in Big Data to errors in survey data. After outlining this framework, we then illustrate an application of TEF by comparing error in housing unit area (square footage) estimates collected in a survey (the 2015 Residential Energy Consumption Survey [RECS]) to those estimates found in three Big Data databases (Zillow.com, Acxiom, and CoreLogic).

1982 ◽  
Vol 46 (2) ◽  
pp. 114-123
Author(s):  
Henry Assael ◽  
John Keon

To increase the validity and reliability of survey data, one must minimize total error and its components, sampling and nonsampling error. This article examines and empirically compares the components of total survey error for several research designs and data collection methods. The consistent finding is that nonsampling error is the major contributor to total survey error, while random sampling error is minimal. On this basis guidelines for improving quality of survey data are provided.


Field Methods ◽  
2020 ◽  
Vol 33 (1) ◽  
pp. 68-84
Author(s):  
Rachel Harter ◽  
Katherine B. Morton ◽  
Ashley Amaya ◽  
Derick Brown

The literature has no standard method for estimating the coverage of area probability segments in address-based frames. Versatility is desirable for different study needs, but standardization improves comparability. Many segment estimates are simple ratios of counts of frame addresses to control totals, or net coverage ratios. Challenges to segment ratios include geocoding error, outdated control totals, errors in the address frame, and systematic exclusion of types of addresses. We tested various net coverage ratios on segments selected for the 2015 Residential Energy Consumption Survey, and we share our results and recommendations for using net coverage ratios for estimating coverage for segments.
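As a rough illustration of the simple ratio described in this abstract, a net coverage ratio for a segment can be sketched as the count of frame addresses divided by a control total. The function name, segment labels, and counts below are hypothetical and invented for illustration only; they are not taken from the study, and the authors' actual estimators may differ.

```python
def net_coverage_ratio(frame_address_count: int, control_total: float) -> float:
    """Return frame addresses divided by the control total for one segment.

    Hypothetical helper: a plain ratio of counts, with no adjustment for
    geocoding error or outdated control totals.
    """
    if control_total <= 0:
        raise ValueError("control total must be positive")
    return frame_address_count / control_total


# Invented example segments: (frame address count, control total).
segments = {
    "seg_a": (95, 100),
    "seg_b": (120, 100),
    "seg_c": (40, 80),
}

# One net coverage ratio per segment.
ratios = {
    name: net_coverage_ratio(frame_count, control)
    for name, (frame_count, control) in segments.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Under this sketch, a ratio below 1 suggests the frame lists fewer addresses than the control total, and a ratio above 1 suggests more. Because the ratio is net, undercoverage and overcoverage within a segment can offset each other.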

