Accuracy of Wristband Fitbit Models in Assessing Sleep: Systematic Review and Meta-Analysis (Preprint)
BACKGROUND Wearable sleep monitors are of high interest to consumers and researchers because they can estimate sleep patterns in free-living conditions at low cost. OBJECTIVE We conducted a systematic review of publications reporting on the performance of wristband <italic>Fitbit</italic> models in assessing sleep parameters and stages. METHODS In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, we comprehensively searched the Cumulative Index to Nursing and Allied Health Literature (CINAHL), Cochrane, Embase, MEDLINE, PubMed, PsycINFO, and Web of Science databases using the keyword <italic>Fitbit</italic> to identify relevant publications meeting predefined inclusion and exclusion criteria. RESULTS The search yielded 3085 candidate articles. After eliminating duplicates and applying the inclusion and exclusion criteria, 22 articles qualified for systematic review, with 8 providing quantitative data for meta-analysis. Compared with polysomnography (PSG), nonsleep-staging <italic>Fitbit</italic> models tended to overestimate total sleep time (TST; range from approximately 7 to 67 min; effect size=-0.51, <italic>P</italic><.001; heterogeneity: I<sup>2</sup>=8.8%, <italic>P</italic>=.36) and sleep efficiency (SE; range from approximately 2% to 15%; effect size=-0.74, <italic>P</italic><.001; heterogeneity: I<sup>2</sup>=24.0%, <italic>P</italic>=.25), and to underestimate wake after sleep onset (WASO; range from approximately 6 to 44 min; effect size=0.60, <italic>P</italic><.001; heterogeneity: I<sup>2</sup>=0%, <italic>P</italic>=.92); there was no significant difference in sleep onset latency (SOL; <italic>P</italic>=.37; heterogeneity: I<sup>2</sup>=0%, <italic>P</italic>=.92).
Compared with PSG, nonsleep-staging <italic>Fitbit</italic> models correctly identified sleep epochs with accuracy values between 0.81 and 0.91, sensitivity values between 0.87 and 0.99, and specificity values between 0.10 and 0.52. Recent-generation <italic>Fitbit</italic> models that use both heart rate variability and body movement to assess sleep stages performed better than early-generation nonsleep-staging ones that use only body movement. Sleep-staging <italic>Fitbit</italic> models, in comparison to PSG, showed no significant difference in measured values of WASO (<italic>P</italic>=.25; heterogeneity: I<sup>2</sup>=0%, <italic>P</italic>=.92), TST (<italic>P</italic>=.29; heterogeneity: I<sup>2</sup>=0%, <italic>P</italic>=.98), and SE (<italic>P</italic>=.19), but they underestimated SOL (<italic>P</italic>=.03; heterogeneity: I<sup>2</sup>=0%, <italic>P</italic>=.66). Sleep-staging <italic>Fitbit</italic> models showed higher sensitivity (0.95-0.96) and specificity (0.58-0.69) values in detecting sleep epochs than both nonsleep-staging models and the values reported in the literature for regular wrist actigraphy. CONCLUSIONS Sleep-staging <italic>Fitbit</italic> models showed promising performance, especially in differentiating wake from sleep. However, although these models are a convenient and economical means for consumers to obtain gross estimates of sleep parameters and time spent in sleep stages, they are of limited specificity and are not a substitute for PSG.
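The accuracy, sensitivity, and specificity values reported above come from epoch-by-epoch comparison against PSG. As an illustration only, the following minimal sketch shows how such values are conventionally derived, assuming each epoch is coded 1 for sleep and 0 for wake and that sleep is treated as the positive class. The function name `epoch_agreement` and the sample data are hypothetical and are not taken from the reviewed studies.

```python
import math
from typing import Dict, Sequence


def epoch_agreement(psg: Sequence[int], device: Sequence[int]) -> Dict[str, float]:
    """Compare device epochs with PSG epochs (1 = sleep, 0 = wake).

    Sleep is the positive class, so sensitivity is the share of PSG sleep
    epochs the device also scored as sleep, and specificity is the share
    of PSG wake epochs the device also scored as wake.
    """
    if len(psg) != len(device):
        raise ValueError("PSG and device recordings must have the same number of epochs")

    tp = sum(1 for p, d in zip(psg, device) if p == 1 and d == 1)  # sleep scored as sleep
    tn = sum(1 for p, d in zip(psg, device) if p == 0 and d == 0)  # wake scored as wake
    fp = sum(1 for p, d in zip(psg, device) if p == 0 and d == 1)  # wake scored as sleep
    fn = sum(1 for p, d in zip(psg, device) if p == 1 and d == 0)  # sleep scored as wake

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else math.nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


# Hypothetical 10-epoch night: PSG scores 6 sleep and 4 wake epochs;
# the device misses half of the wake epochs by scoring them as sleep.
psg_epochs = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
device_epochs = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
result = epoch_agreement(psg_epochs, device_epochs)
print(result)
```

In this made-up example the device detects every sleep epoch (sensitivity 1.0) but only half of the wake epochs (specificity 0.5). Because wake epochs misclassified as sleep inflate the sleep total, this is the same pattern that would accompany overestimated TST and underestimated WASO.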