Identifying Data Quality Dimensions for Person-Generated Wearable Device Data: A Multi-Method Study (Preprint)
BACKGROUND There is growing interest in using person-generated wearable device data for biomedical research, but concerns about data quality, such as missing or incorrect data, exist. This emphasizes the importance of assessing data quality before conducting research. To perform data quality assessments, it is essential to define what data quality means for person-generated wearable device data by identifying data quality dimensions. OBJECTIVE The goal of this study was to identify data quality dimensions for person-generated wearable device data for research purposes. METHODS The study was conducted in three phases: (1) literature review, (2) survey, and (3) focus group discussion. The literature review was conducted following the PRISMA guideline to identify factors affecting data quality and their associated data quality challenges. In addition, a survey was conducted to confirm and complement the results of the literature review and to understand researchers’ perceptions of data quality dimensions that were previously identified for the secondary use of electronic health record (EHR) data. The survey was sent to researchers with experience in analyzing wearable device data. Focus group discussion sessions were conducted with domain experts to derive data quality dimensions for person-generated wearable device data. Based on the results from the literature review and survey, a facilitator proposed potential data quality dimensions relevant to person-generated wearable device data, and the domain experts accepted or rejected the suggested dimensions. RESULTS Nineteen studies were included in the literature review. Three major themes emerged: device- and technical-related, user-related, and data governance-related factors. The associated data quality problems were incomplete data, incorrect data, and heterogeneous data. Twenty respondents answered the survey.
The major data quality challenges faced by researchers were completeness, accuracy, and plausibility. The importance ratings of data quality dimensions in an existing framework showed that dimensions for the secondary use of EHR data are applicable to person-generated wearable device data. Three focus group sessions were held with domain experts in data quality and wearable device research. The experts concluded that intrinsic data quality features such as conformance, completeness, and plausibility, and contextual/fitness-for-use data quality features such as completeness (breadth and density) and temporal data granularity, are important data quality dimensions for assessing person-generated wearable device data for research purposes. CONCLUSIONS In this study, intrinsic and contextual/fitness-for-use data quality dimensions for person-generated wearable device data were identified. The dimensions were adapted from data quality terminologies and frameworks for the secondary use of EHR data with a few modifications. Further research is needed on how data quality can be assessed with regard to each dimension.