247 Automated sleep staging using wrist-worn device and deep neural networks

SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A100-A100
Author(s):  
Niranjan Sridhar ◽  
Atiyeh Ghoreyshi ◽  
Lance Myers ◽  
Zachary Owens

Abstract Introduction Heart rate is well known to be modulated by sleep stages. If clinically useful sleep scoring can be performed using only cardiac rhythms, then existing medical and consumer-grade devices that measure heart rate can enable low-cost sleep evaluations. Methods We trained a neural network that uses dilated convolutional blocks to learn both local and long-range features of heart rate extracted from ECG R-wave timing. For every non-overlapping 30 s epoch of the input, the network predicts the probabilities of the epoch being in one of four classes: wake, light sleep, deep sleep, or REM. The largest probability is chosen as the network’s class prediction and used to form the hypnogram. We used the Sleep Heart Health Study (SHHS), Multi-Ethnic Study of Atherosclerosis (MESA), and PhysioNet Computing in Cardiology (CinC) datasets (over 10,000 nights) for training and evaluation. We then deployed the algorithm on PPG-based heart rate measured by a wrist-worn device worn by subjects in a free-living setting. Results On the held-out SHHS test dataset (800 nights, 561 subjects), the overall 4-class staging accuracy was 77% and Cohen’s kappa was 0.66. On the CinC dataset (993 nights, 993 subjects), the overall 4-class accuracy was 72% and Cohen’s kappa was 0.55. The study on free-living subjects is underway, and these novel results will be collated and presented upon completion. Conclusion We hope these results build more trust in automated heart rate-based sleep staging and encourage further research into its clinical application in screening and diagnosis of sleep disorders. Low-cost, high-efficacy devices that can be used in longitudinal studies can lead to breakthroughs in clinical applications of sleep staging for early diagnosis of chronic conditions and novel treatment endpoints.
Support (if any) We recently published the training/testing of the algorithm as well as a population-level analysis showing differences in predicted sleep stages between disease cohorts. The article was published in NPJ Digital Medicine in Aug 2020. The study on free-living subjects is currently underway, and these new results will be presented at the sleep conference. Preliminary results indicate high concordance with our published results.
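The abstract above describes choosing the largest per-epoch probability to form the hypnogram and reports accuracy and Cohen's kappa. The following is a minimal Python sketch of those two steps, not the authors' code. The class ordering in `STAGES` and all function names are assumptions made for illustration.

```python
from collections import Counter

# Assumed class order; the abstract names the four classes but not their order.
STAGES = ("wake", "light", "deep", "rem")


def hypnogram_from_probs(probs):
    """Pick the most probable class for each 30 s epoch."""
    return [STAGES[max(range(len(p)), key=p.__getitem__)] for p in probs]


def accuracy(reference, predicted):
    """Fraction of epochs where the prediction matches the reference stage."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohens_kappa(reference, predicted):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(reference)
    p_o = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal stage frequencies of both scorers.
    p_e = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


# Toy example: four epochs of made-up network output.
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
reference = ["wake", "light", "light", "rem"]
predicted = hypnogram_from_probs(probs)
print(predicted, accuracy(reference, predicted), cohens_kappa(reference, predicted))
```

Kappa is lower than raw accuracy whenever some agreement is expected by chance, which is why the abstract reports both.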

SLEEP ◽  
2020 ◽  
Vol 43 (11) ◽  
Author(s):  
Maurice Abou Jaoude ◽  
Haoqi Sun ◽  
Kyle R Pellerin ◽  
Milena Pavlova ◽  
Rani A Sarkis ◽  
...  

Abstract Study Objectives Develop a high-performing, automated sleep scoring algorithm that can be applied to long-term scalp electroencephalography (EEG) recordings. Methods Using a clinical dataset of polysomnograms from 6,431 patients (MGH–PSG dataset), we trained a deep neural network to classify sleep stages based on scalp EEG data. The algorithm consists of a convolutional neural network for feature extraction, followed by a recurrent neural network that extracts temporal dependencies of sleep stages. The algorithm’s inputs are four scalp EEG bipolar channels (F3-C3, C3-O1, F4-C4, and C4-O2), which can be derived from any standard PSG or scalp EEG recording. We initially trained the algorithm on the MGH–PSG dataset and used transfer learning to fine-tune it on a dataset of long-term (24–72 h) scalp EEG recordings from 112 patients (scalpEEG dataset). Results The algorithm achieved a Cohen’s kappa of 0.74 on the MGH–PSG holdout testing set and cross-validated Cohen’s kappa of 0.78 after optimization on the scalpEEG dataset. The algorithm also performed well on two publicly available PSG datasets, demonstrating high generalizability. Performance on all datasets was comparable to the inter-rater agreement of human sleep staging experts (Cohen’s kappa ~ 0.75 ± 0.11). The algorithm’s performance on long-term scalp EEGs was robust over a wide age range and across common EEG background abnormalities. Conclusion We developed a deep learning algorithm that achieves human expert level sleep staging performance on long-term scalp EEG recordings. This algorithm, which we have made publicly available, greatly facilitates the use of large long-term EEG clinical datasets for sleep-related research.


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Niranjan Sridhar ◽  
Ali Shoeb ◽  
Philip Stephens ◽  
Alaa Kharbouch ◽  
David Ben Shimol ◽  
...  

Abstract Clinical sleep evaluations currently require multimodal data collection and manual review by human experts, making them expensive and unsuitable for longer term studies. Sleep staging using cardiac rhythm is an active area of research because it can be measured much more easily using a wide variety of both medical and consumer-grade devices. In this study, we applied deep learning methods to create an algorithm for automated sleep stage scoring using the instantaneous heart rate (IHR) time series extracted from the electrocardiogram (ECG). We trained and validated an algorithm on over 10,000 nights of data from the Sleep Heart Health Study (SHHS) and Multi-Ethnic Study of Atherosclerosis (MESA). The algorithm has an overall performance of 0.77 accuracy and 0.66 kappa against the reference stages on a held-out portion of the SHHS dataset for classifying every 30 s of sleep into four classes: wake, light sleep, deep sleep, and rapid eye movement (REM). Moreover, we demonstrate that the algorithm generalizes well to an independent dataset of 993 subjects labeled by American Academy of Sleep Medicine (AASM) licensed clinical staff at Massachusetts General Hospital that was not used for training or validation. Finally, we demonstrate that the stages predicted by our algorithm can reproduce previous clinical studies correlating sleep stages with comorbidities such as sleep apnea and hypertension as well as demographics such as age and gender.
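The instantaneous heart rate (IHR) series mentioned above is derived from the timing of successive ECG R-peaks. As a rough illustration only, the sketch below converts R-peak times into beat-by-beat heart rate using 60 divided by the RR interval in seconds. The function name and the choice to timestamp each value at the later R-peak are assumptions; the abstract does not describe the authors' preprocessing.

```python
def instantaneous_heart_rate(r_peak_times_s):
    """Return (time_s, bpm) pairs, one per RR interval.

    Each value is stamped at the later of the two R-peaks (an assumption).
    """
    pairs = []
    for prev, curr in zip(r_peak_times_s, r_peak_times_s[1:]):
        rr = curr - prev  # RR interval in seconds
        if rr <= 0:
            raise ValueError("R-peak times must be strictly increasing")
        pairs.append((curr, 60.0 / rr))
    return pairs


# Toy example: a 1.0 s interval followed by a 0.5 s interval.
print(instantaneous_heart_rate([0.0, 1.0, 1.5]))
```

In practice such a series is irregularly sampled, so it would typically be resampled onto a uniform grid before being fed to a network; that step is omitted here.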


2021 ◽  
Vol 12 ◽  
Author(s):  
Mingyu Fu ◽  
Yitian Wang ◽  
Zixin Chen ◽  
Jin Li ◽  
Fengguo Xu ◽  
...  

This study centers on automatic sleep staging with single-channel electroencephalography (EEG). We proposed a deep learning-based network that integrates an attention mechanism with a bidirectional long short-term memory neural network (AT-BiLSTM) to classify wakefulness, rapid eye movement (REM) sleep and non-REM (NREM) sleep stages N1, N2 and N3. The AT-BiLSTM network outperformed five other networks and achieved an accuracy of 83.78%, a Cohen’s kappa coefficient of 0.766 and a macro F1-score of 82.14% on the PhysioNet Sleep-EDF Expanded dataset, and an accuracy of 81.72%, a Cohen’s kappa coefficient of 0.751 and a macro F1-score of 80.74% on the DREAMS Subjects dataset. The proposed AT-BiLSTM network also achieved a higher accuracy than the existing methods based on traditional feature extraction. Moreover, better performance was obtained by the AT-BiLSTM network with the frontal EEG derivations than with EEG channels located at the central, occipital or parietal lobe. As EEG signals can be easily acquired using dry electrodes on the forehead, our findings might provide a promising solution for automatic sleep scoring without feature extraction and may prove very useful for the screening of sleep disorders.


Circulation ◽  
2021 ◽  
Vol 143 (Suppl_1) ◽  
Author(s):  
Naghmeh Rezaei ◽  
Michael A Grandner

Introduction: Population-level objective estimates of changes in health metrics over the course of the COVID-19 pandemic are sparse. This study evaluated change in resting heart rate (RHR) determined by optical plethysmography and relationships to changes in other lifestyle health behaviors (sleep and activity). Methods: Data were obtained from N=197,988 Fitbit users who wore their heart-rate-enabled Fitbit device to sleep and had detected sleep stages on at least 10 days in the month of January, the baseline period, and synced their devices at least once in the last 10 days of April. In addition, potential participants needed to reside in one of 6 target cities: Chicago, Illinois; Houston, Texas; Los Angeles, California; San Francisco, California; New York City, New York; and Miami, Florida. Users who met these criteria were randomly selected. Daily RHR, sleep duration (minutes), sleep duration variability (standard deviation), bedtime, step count, and active minutes were estimated by the device. Differences between January (before the pandemic) and April (peak of stay-at-home orders) were computed. Correlations between change in RHR and change in other variables were evaluated, stratified by age and sex. Results: For all age groups, in both men and women, mean RHR declined from January to April by about 1 bpm, with the highest reductions in the youngest adults (all p < 1×10^-100). In general, across both genders and all age groups, reductions in RHR were correlated with greater sleep duration, delaying bedtime, reduced sleep variability, and more active minutes. Steps were also associated in younger (but not older) adults. Results for ages 18-29 and >=65 are displayed in the Table. Discussion: During the COVID-19 pandemic, RHR decreased robustly but very slightly. Reductions in RHR were correlated with improvements in other health behaviors (sleep and activity).
Causal relationships could not be evaluated, but future studies may explore whether even small changes in health behaviors can measurably impact population RHR.


SLEEP ◽  
2019 ◽  
Vol 42 (11) ◽  
Author(s):  
Linda Zhang ◽  
Daniel Fabbri ◽  
Raghu Upender ◽  
David Kent

Abstract Study Objectives Polysomnography (PSG) scoring is labor intensive and suffers from variability in inter- and intra-rater reliability. Automated PSG scoring has the potential to reduce the human labor costs and the variability inherent to this task. Deep learning is a form of machine learning that uses neural networks to recognize data patterns by inspecting many examples rather than by following explicit programming. Methods A sleep staging classifier trained using deep learning methods scored PSG data from the Sleep Heart Health Study (SHHS). The training set was composed of 42 560 hours of PSG data from 5213 patients. To capture higher-order data, spectrograms were generated from electroencephalography, electrooculography, and electromyography data and then passed to the neural network. A holdout set of 580 PSGs not included in the training set was used to assess model accuracy and discrimination via weighted F1-score, per-stage accuracy, and Cohen’s kappa (K). Results The optimal neural network model was composed of spectrograms in the input layer feeding into convolutional neural network layers and a long short-term memory layer to achieve a weighted F1-score of 0.87 and K = 0.82. Conclusions The deep learning sleep stage classifier demonstrates excellent accuracy and agreement with expert sleep stage scoring, outperforming human agreement on sleep staging. It achieves comparable or better F1-scores, accuracy, and Cohen’s kappa compared to the literature for automated sleep stage scoring of PSG epochs. Accurate automated scoring of other PSG events may eventually allow for fully automated PSG scoring.


SLEEP ◽  
2020 ◽  
Vol 43 (11) ◽  
Author(s):  
Pierrick J Arnal ◽  
Valentin Thorey ◽  
Eden Debellemaniere ◽  
Michael E Ballard ◽  
Albert Bou Hernandez ◽  
...  

Abstract Study Objectives The development of ambulatory technologies capable of monitoring brain activity during sleep longitudinally is critical for advancing sleep science. The aim of this study was to assess the signal acquisition and the performance of the automatic sleep staging algorithms of a reduced-montage dry-electroencephalographic (EEG) device (Dreem headband, DH) compared to the gold-standard polysomnography (PSG) scored by five sleep experts. Methods A total of 25 subjects who completed an overnight sleep study at a sleep center while wearing both a PSG and the DH simultaneously were included in the analysis. We assessed (1) similarity of measured EEG brain waves between the DH and the PSG; (2) the heart rate, breathing frequency, and respiration rate variability (RRV) agreement between the DH and the PSG; and (3) the performance of the DH’s automatic sleep staging according to American Academy of Sleep Medicine guidelines versus PSG sleep experts’ manual scoring. Results The mean percentage error between the EEG signals acquired by the DH and those from the PSG for the monitoring of α was 15 ± 3.5%, 16 ± 4.3% for β, 16 ± 6.1% for λ, and 10 ± 1.4% for θ frequencies during sleep. The mean absolute error for heart rate, breathing frequency, and RRV was 1.2 ± 0.5 bpm, 0.3 ± 0.2 cpm, and 3.2 ± 0.6%, respectively. Automatic sleep staging reached an overall accuracy of 83.5 ± 6.4% (F1 score: 83.8 ± 6.3) for the DH, compared with an average of 86.4 ± 8.0% (F1 score: 86.3 ± 7.4) for the 5 sleep experts. Conclusions These results demonstrate the capacity of the DH to both monitor sleep-related physiological signals and process them accurately into sleep stages. This device paves the way for large-scale, longitudinal sleep studies. Clinical Trial Registration NCT03725943.


SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A463-A463
Author(s):  
V Thorey ◽  
A Guillot ◽  
K El Kanbi ◽  
M Harris ◽  
P J Arnal

Abstract Introduction The development of new sleep study devices, adapted for daily use, is necessary for diagnosis of sleep disorders. However, such devices must be both suitable for daily use and capable of recording accurate electrophysiological data. This study assesses the signal acquisition of a comfortable sleep headband, using dry electrodes, and the performance of its automatic sleep staging algorithms compared to the gold-standard clinical PSG scored by 4 sleep experts. Methods 42 participants slept at a sleep center wearing both the Dreem headband (DH) and a PSG simultaneously. We measured 1) the EEG signal similarity between both devices, 2) heart rate, breathing frequency and respiration rate variability (RRV) agreement, and 3) the performance of the headband automatic sleep scoring compared to PSG sleep experts’ manual scoring. Results Results demonstrate a strong correlation between the EEG signals acquired by the headband and those from the PSG, and the signals acquired by the headband enable monitoring of alpha (r = 0.75 ± 0.11), beta (r = 0.74 ± 0.14), delta (r = 0.78 ± 0.16), and theta (r = 0.63 ± 0.15) frequencies during sleep. The mean absolute error for heart rate, breathing frequency, and RRV was 2.2 ± 0.8 bpm, 0.3 ± 0.2 cpm and 3.1 ± 0.4%, respectively. Automatic sleep staging reached an overall accuracy of 84.1 ± 7.5% (F1 score: 83.0 ± 8.4) for the headband, compared with an average of 86.4 ± 5.5% (F1 score: 86.5 ± 5.5) for the 4 sleep experts. Conclusion These results demonstrate the capacity of the headband to both precisely monitor sleep-related physiological signals and process them accurately into sleep stages. This device paves the way for high-quality, large-scale, longitudinal sleep studies. Support This study has been supported by Dreem sas.


2021 ◽  
Vol 3 ◽  
Author(s):  
Zilu Liang ◽  
Mario Alberto Chapa-Martell

Consumer wearable activity trackers, such as Fitbit, are widely used in ubiquitous and longitudinal sleep monitoring in free-living environments. However, these devices are known to be inaccurate for measuring sleep stages. In this study, we develop and validate a novel approach that leverages the processed data readily available from consumer activity trackers (i.e., steps, heart rate, and sleep metrics) to predict sleep stages. The proposed approach adopts a selective correction strategy and consists of two levels of classifiers. The level-I classifier judges whether a Fitbit-labeled sleep epoch is misclassified, and the level-II classifier re-classifies misclassified epochs into one of the four sleep stages (i.e., light sleep, deep sleep, REM sleep, and wakefulness). Best epoch-wise performance was achieved when support vector machine and gradient boosting decision tree (XGBoost) with up-sampling were used at the level-I and level-II classification, respectively. The model achieved an overall per-epoch accuracy of 0.731 ± 0.119, Cohen's Kappa of 0.433 ± 0.212, and multi-class Matthews correlation coefficient (MMCC) of 0.451 ± 0.214. Regarding the total duration of individual sleep stages, the mean normalized absolute bias (MAB) of this model was 0.469, which is a 23.9% reduction against the proprietary Fitbit algorithm. The model that combines support vector machine and XGBoost with down-sampling achieved sub-optimal per-epoch accuracy of 0.704 ± 0.097, Cohen's Kappa of 0.427 ± 0.178, and MMCC of 0.439 ± 0.180. The sub-optimal model obtained a MAB of 0.179, a significant reduction of 71.0% compared to the proprietary Fitbit algorithm. We highlight the challenges in machine learning-based sleep stage prediction with consumer wearables, and suggest directions for future research.
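The selective correction strategy described above can be pictured as a two-step decision per epoch: the level-I classifier flags device labels it believes are wrong, and only flagged epochs are passed to the level-II classifier. The Python sketch below shows that control flow only. The function names, the feature dictionary, and the toy rules standing in for the trained support vector machine and XGBoost models are all hypothetical.

```python
def selective_correction(epochs, device_labels, is_misclassified, reclassify):
    """Keep the device label unless level I flags it; then use level II."""
    corrected = []
    for features, label in zip(epochs, device_labels):
        if is_misclassified(features, label):      # level-I decision
            corrected.append(reclassify(features))  # level-II re-classification
        else:
            corrected.append(label)
    return corrected


# Toy stand-ins for the trained classifiers (purely illustrative rules).
def toy_level1(features, label):
    return label == "light" and features["hr"] > 80


def toy_level2(features):
    return "wake"


epochs = [{"hr": 60}, {"hr": 90}]
print(selective_correction(epochs, ["light", "light"], toy_level1, toy_level2))
```

One consequence of this design is that epochs the level-I classifier leaves unflagged keep the device's label unchanged, so the final accuracy depends on how well level I separates correct from incorrect device labels.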


2019 ◽  
Author(s):  
Pierrick J. Arnal ◽  
Valentin Thorey ◽  
Michael E. Ballard ◽  
Albert Bou Hernandez ◽  
Antoine Guillot ◽  
...  

Despite the central role of sleep in our lives and the high prevalence of sleep disorders, sleep is still poorly understood. The development of ambulatory technologies capable of monitoring brain activity during sleep longitudinally is critical to advancing sleep science and facilitating the diagnosis of sleep disorders. We introduced the Dreem headband (DH) as an affordable, comfortable, and user-friendly alternative to polysomnography (PSG). The purpose of this study was to assess the signal acquisition of the DH and the performance of its embedded automatic sleep staging algorithms compared to the gold-standard clinical PSG scored by 5 sleep experts. Thirty-one subjects completed an overnight sleep study at a sleep center while wearing both a PSG and the DH simultaneously. We assessed 1) the EEG signal quality between the DH and the PSG, 2) the heart rate, breathing frequency, and respiration rate variability (RRV) agreement between the DH and the PSG, and 3) the performance of the DH’s automatic sleep staging according to AASM guidelines vs. PSG sleep experts’ manual scoring. Results demonstrate a strong correlation between the EEG signals acquired by the DH and those from the PSG, and the signals acquired by the DH enable monitoring of alpha (r = 0.71 ± 0.13), beta (r = 0.71 ± 0.18), delta (r = 0.76 ± 0.14), and theta (r = 0.61 ± 0.12) frequencies during sleep. The mean absolute error for heart rate, breathing frequency and RRV was 1.2 ± 0.5 bpm, 0.3 ± 0.2 cpm and 3.2 ± 0.6%, respectively. Automatic sleep staging reached an overall accuracy of 83.5 ± 6.4% (F1 score: 83.8 ± 6.3) for the DH, compared with an average of 86.4 ± 8.0% (F1 score: 86.3 ± 7.4) for the five sleep experts. These results demonstrate the capacity of the DH to both precisely monitor sleep-related physiological signals and process them accurately into sleep stages. This device paves the way for high-quality, large-scale, longitudinal sleep studies.


10.2196/14120 ◽  
2019 ◽  
Vol 7 (10) ◽  
pp. e14120 ◽  
Author(s):  
Andre Matthias Müller ◽  
Nan Xin Wang ◽  
Jiali Yao ◽  
Chuen Seng Tan ◽  
Ivan Cherh Chiet Low ◽  
...  

Background Wrist-worn activity trackers are popular, and an increasing number of these devices are equipped with heart rate (HR) measurement capabilities. However, the validity of HR data obtained from such trackers has not been thoroughly assessed outside the laboratory setting. Objective This study aimed to investigate the validity of HR measures of a high-cost consumer-based tracker (Polar A370) and a low-cost tracker (Tempo HR) in the laboratory and free-living settings. Methods Participants underwent a laboratory-based cycling protocol while wearing the two trackers and the chest-strapped Polar H10, which acted as criterion. Participants also wore the devices throughout the waking hours of the following day, during which they were required to conduct at least one 10-min bout of moderate-to-vigorous physical activity (MVPA) to ensure variability in the HR signal. We extracted 10-second values from all devices and time-matched HR data from the trackers with those from the Polar H10. We calculated intraclass correlation coefficients (ICCs), mean absolute errors, and mean absolute percentage errors (MAPEs) between the criterion and the trackers. We constructed decile plots that compared HR data from Tempo HR and Polar A370 with criterion measures across intensity deciles. We investigated how many HR data points within the MVPA zone (≥64% of maximum HR) were detected by the trackers. Results Of the 57 people screened, 55 joined the study (mean age 30.5 [SD 9.8] years). Tempo HR showed moderate agreement and large errors (laboratory: ICC 0.51 and MAPE 13.00%; free-living: ICC 0.71 and MAPE 10.20%). Polar A370 showed moderate-to-strong agreement and small errors (laboratory: ICC 0.73 and MAPE 6.40%; free-living: ICC 0.83 and MAPE 7.10%). Decile plots indicated increasing differences between Tempo HR and the criterion as HRs increased. This trend was less pronounced for the Polar A370 HR data.
Tempo HR identified 62.13% (1872/3013) and 54.27% (5717/10,535) of all MVPA time points in the laboratory phase and free-living phase, respectively. Polar A370 detected 81.09% (2273/2803) and 83.55% (9323/11,158) of all MVPA time points in the laboratory phase and free-living phase, respectively. Conclusions HR data from the examined wrist-worn trackers were reasonably accurate in both settings, with the Polar A370 showing stronger agreement with the Polar H10 and smaller errors. Inaccuracies increased with increasing HRs; this was pronounced for Tempo HR.
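The mean absolute error and mean absolute percentage error (MAPE) used above compare time-matched tracker readings against the criterion device. The Python sketch below shows the standard definitions of both metrics on made-up values. The function names are hypothetical, and the sketch assumes the two series are already time-matched and that no criterion value is zero.

```python
def mean_absolute_error(criterion, tracker):
    """Average absolute difference, in the unit of the inputs (here bpm)."""
    return sum(abs(t - c) for c, t in zip(criterion, tracker)) / len(criterion)


def mean_absolute_percentage_error(criterion, tracker):
    """Average absolute difference relative to the criterion, in percent."""
    total = sum(abs(t - c) / c for c, t in zip(criterion, tracker))
    return 100.0 * total / len(criterion)


# Made-up, time-matched 10-second HR values (bpm).
criterion_hr = [100, 200]
tracker_hr = [110, 180]
print(mean_absolute_error(criterion_hr, tracker_hr))
print(mean_absolute_percentage_error(criterion_hr, tracker_hr))
```

Because MAPE scales each error by the criterion value, the same absolute error counts for less at higher heart rates, which is worth keeping in mind when reading the decile comparisons.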

