Precision Assessment of COVID-19 Phenotypes Using Large-Scale Clinic Visit Audio Recordings: Harnessing the Power of Patient Voice

10.2196/20545 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e20545
Author(s):  
Paul J Barr ◽  
James Ryan ◽  
Nicholas C Jacobson

COVID-19 cases are increasing exponentially worldwide; however, the clinical phenotype of the disease remains unclear. Natural language processing (NLP) and machine learning approaches may yield key methods to rapidly identify individuals at a high risk of COVID-19 and to understand key symptoms upon clinical manifestation and presentation. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. In this scenario, clinicians may focus on documenting widely reported symptoms that indicate a confirmed diagnosis of COVID-19, at the expense of infrequently reported symptoms. While NLP solutions can play a key role in generating clinical phenotypes of COVID-19, they are constrained by the resulting gaps in data from electronic health records (EHRs). A comprehensive record of clinic visits is required, and audio recordings may be the answer: a recording of the clinic visit represents a more complete account of patient-reported symptoms. If done at scale, a combination of EHR data and recordings of clinic visits can be used to power NLP and machine learning models, thus rapidly generating a clinical phenotype of COVID-19. We propose a pipeline extending from audio or video recordings of clinic visits to a model of clinical symptoms that predicts COVID-19 incidence. With vast amounts of data available, we believe that a prediction model can be developed rapidly to promote accurate screening of individuals at a high risk of COVID-19 and to identify patient characteristics that predict a greater risk of more severe infection. If clinical encounters are recorded and our NLP model is adequately refined, benchtop virologic findings would be better informed. While clinic visit recordings are not a panacea for this pandemic, they are a low-cost option with many potential benefits, which have only recently begun to be explored.
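A minimal sketch of the kind of pipeline proposed above, assuming the clinic-visit audio has already been transcribed to text (the speech-to-text step is omitted) and using a simple keyword-based symptom extractor feeding a logistic regression classifier; the symptom list, transcripts, and labels are illustrative placeholders, not data from the paper.

```python
# Sketch only: transcripts stand in for ASR output; symptoms and labels are invented.
import re
from sklearn.linear_model import LogisticRegression

SYMPTOMS = ["fever", "cough", "shortness of breath", "anosmia", "fatigue", "myalgia"]

def symptom_vector(transcript: str) -> list[int]:
    """Binary indicator per symptom, based on a simple keyword match."""
    text = transcript.lower()
    return [int(re.search(rf"\b{re.escape(s)}\b", text) is not None) for s in SYMPTOMS]

# Toy illustration: a handful of transcripts with hypothetical test outcomes.
transcripts = [
    "Patient reports fever and a dry cough for three days.",
    "Complains of fatigue and anosmia; no fever noted.",
    "Seasonal allergies, no cough, no fever.",
    "Severe shortness of breath, myalgia, and cough.",
]
labels = [1, 1, 0, 1]  # hypothetical positive/negative labels

X = [symptom_vector(t) for t in transcripts]
model = LogisticRegression().fit(X, labels)

new_visit = "Patient mentions cough and loss of smell."
print(model.predict_proba([symptom_vector(new_visit)])[0][1])  # predicted risk
```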

2020 ◽  
Author(s):  
Paul J Barr ◽  
James Ryan ◽  
Nicholas C Jacobson

The novel coronavirus (SARS-CoV-2) and its related disease, COVID-19, are spreading exponentially across the world, yet there is still uncertainty about the clinical phenotype. Natural language processing (NLP) and machine learning may hold one key to quickly identifying individuals at high risk of COVID-19 and understanding key symptoms in its clinical manifestation and presentation. In health care, such data often come from the medical record, yet when overburdened, clinicians may focus on documenting widely reported symptoms that appear to confirm the diagnosis of COVID-19, at the expense of infrequently reported symptoms. A comprehensive record of the clinic visit is required, and an audio recording may be the answer. If done at scale, a combination of EHR data and recordings of clinic visits can be used to power NLP and machine learning models, quickly creating a clinical phenotype of COVID-19. We propose the creation of a pipeline from the audio/video recording of clinic visits to a clinical symptomatology model and prediction of COVID-19 infection. With vast amounts of data available, we believe a prediction model can be developed quickly to promote the accurate screening of individuals at risk of COVID-19 and to identify patient characteristics predicting a greater risk of more severe infection. If clinical encounters are recorded and our NLP is adequately refined, then benchtop virology will be better informed and the risk of spread reduced. While recordings of clinic visits are not a panacea for this pandemic, they are a low-cost option with many potential benefits that have only just begun to be explored.


2011 ◽  
Vol 29 (8) ◽  
pp. 1029-1035 ◽  
Author(s):  
Donna L. Berry ◽  
Brent A. Blumenstein ◽  
Barbara Halpenny ◽  
Seth Wolpin ◽  
Jesse R. Fann ◽  
...  

Purpose Although patient-reported cancer symptoms and quality-of-life issues (SQLIs) have been promoted as essential to a comprehensive assessment, efficient and efficacious methods have not been widely tested in clinical settings. The purpose of this trial was to determine the effect of the Electronic Self-Report Assessment–Cancer (ESRA-C) on the likelihood of SQLIs discussed between clinicians and patients with cancer in ambulatory clinic visits. Secondary objectives included comparison of visit duration between groups and usefulness of the ESRA-C as reported by clinicians. Patients and Methods This randomized controlled trial was conducted in 660 patients with various cancer diagnoses and stages at two institutions of a comprehensive cancer center. Patient-reported SQLIs were automatically displayed on a graphical summary and provided to the clinical team before an on-treatment visit (n = 327); in the control group, no summary was provided (n = 333). SQLIs were scored for level of severity or distress. One on-treatment clinic visit was audio recorded for each participant and then scored for discussion of each SQLI. We hypothesized that problematic SQLIs would be discussed more often when the intervention was delivered to the clinicians. Results The likelihood of SQLIs being discussed differed by randomized group and depended on whether an SQLI was first reported as problematic (P = .032). Clinic visits were similar with regard to duration between groups, and clinicians reported the summary as useful. Conclusion The ESRA-C is the first electronic self-report application to increase discussion of SQLIs in a US randomized clinical trial.
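As a rough illustration of the reported interaction between randomized group and problematic status, the following sketch fits a logistic model of whether an SQLI is discussed, with a group-by-problematic interaction term; the data frame is a small hypothetical example, not trial data.

```python
# Hypothetical illustration of the group x problematic interaction analysis.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "discussed":    [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
    "intervention": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "problematic":  [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
})

# Logistic regression with an interaction between randomized group and
# whether the SQLI was first reported as problematic.
result = smf.logit("discussed ~ intervention * problematic", data=df).fit()
print(result.params)
```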


Author(s):  
Ekaterina Kochmar ◽  
Dung Do Vu ◽  
Robert Belfer ◽  
Varun Gupta ◽  
Iulian Vlad Serban ◽  
...  

Abstract Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback automatically, which takes the individual needs of students into account while alleviating the need for expert intervention and hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in a large-scale dialogue-based ITS with around 20,000 students, launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.
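The abstract does not specify the feedback model's internals, so the following is only an illustrative sketch of data-driven feedback personalization: a classifier trained on hypothetical interaction logs predicts, for each candidate feedback type, the probability that the student's next attempt succeeds, and the highest-scoring type is chosen. The feature names, feedback types, and training data are all assumptions.

```python
# Illustrative sketch, not the paper's actual model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

FEEDBACK_TYPES = ["text_hint", "wikipedia_explanation", "worked_example"]

# Hypothetical log: [past_success_rate, attempts_on_exercise, feedback_type_id] -> solved?
X = np.array([
    [0.9, 1, 0], [0.2, 3, 1], [0.5, 2, 2], [0.8, 2, 1],
    [0.1, 4, 0], [0.6, 1, 2], [0.3, 2, 0], [0.7, 3, 1],
])
y = np.array([1, 1, 0, 1, 0, 1, 0, 1])

model = GradientBoostingClassifier().fit(X, y)

def choose_feedback(past_success_rate: float, attempts: int) -> str:
    """Return the feedback type with the highest predicted success probability."""
    candidates = np.array(
        [[past_success_rate, attempts, i] for i in range(len(FEEDBACK_TYPES))]
    )
    probs = model.predict_proba(candidates)[:, 1]
    return FEEDBACK_TYPES[int(np.argmax(probs))]

print(choose_feedback(past_success_rate=0.4, attempts=2))
```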


2021 ◽  
Author(s):  
Xinxu Shen ◽  
Troy Houser ◽  
David Victor Smith ◽  
Vishnu P. Murty

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields for characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the reliability of scoring between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected the measures yielded by hand-scoring, and further that results using USE outperformed those of another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid for individuals varying on clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across both age groups and anxiety groups, which demonstrates the efficacy of our approach for assessing narrative recall in large-scale individual difference analyses. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual difference analyses in research using naturalistic stimuli.
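A minimal sketch of USE-based recall scoring, assuming the publicly available Universal Sentence Encoder module on TensorFlow Hub; the reference description and recall sentence are invented placeholders, and cosine similarity stands in for the scoring rule.

```python
# Sketch: embed a reference event description and a participant's recall with USE,
# then use cosine similarity as the automated recall score.
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

reference = "The magician placed the coin under the cup and it vanished."
recall = "He put a coin beneath a cup and then it disappeared."

ref_vec, rec_vec = embed([reference, recall]).numpy()

score = float(
    np.dot(ref_vec, rec_vec) / (np.linalg.norm(ref_vec) * np.linalg.norm(rec_vec))
)
print(round(score, 3))
```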


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 2063-2063
Author(s):  
Sarah Tressel Gary ◽  
Nadeeka Dias ◽  
Elisa Conrad ◽  
Kenneth G Faulkner

Background: Patient-reported outcomes (PRO) and electronic PRO (ePRO) play an important role in the development and approval of cancer products. Regulatory agencies are encouraging the inclusion of PRO-based endpoints that are indicative of clinical benefit in terms of patient symptoms and overall quality of life (QOL). Compliance with completion of ePRO assessments is an important component for obtaining accurate and high-quality data when conducting clinical trials. Traditionally, ePRO data in oncology trials have been collected mainly at clinic visits because of concerns over poor compliance at home. However, since symptoms and QOL can vary widely over a treatment course, it is often necessary to collect ePRO data more frequently between clinic visits. It has been hypothesized that home completion, length of time in a study, and number of assessments may affect compliance. Methods: To address this hypothesis, ePRO compliance data were analyzed from two clinical studies in prostate cancer. Both studies used a handheld smartphone with an application to collect ePRO data. At the randomization visit, subjects completed ePRO assessments in clinic (2-3 questionnaires). Subsequently, all assessments were completed at home, including a daily diary and 1-4 questionnaires completed every 4-8 weeks for up to 14 months. Compliance was calculated as the number of assessments received divided by the number of assessments expected in a given assessment period. To evaluate assessment burden, each assessment period was categorized as requiring a lower number (daily diary and 1 questionnaire) or higher number (daily diary and 2-4 questionnaires) of assessments. Results: A total of 1,040 patients were included in the analysis. Overall compliance at the single clinic visit was 100%, which was expected since it was a required randomization visit. Overall compliance at home over 14 months was 80%. Compliance ranged from 78% to 89% over the duration of the studies, with no effect of time in the study on compliance. Compliance remained high even as patient numbers declined. Compliance when patients were required to complete a lower number of assessments (80%) was similar to compliance when a higher number was required (79%). Compliance by region varied from 72% (Middle East) to 87% (Asia and Eastern Europe). Conclusions: The collection of ePRO at home yielded high compliance that did not vary with length of time in the study or with assessment burden. At-home ePRO assessments provide an effective and feasible approach for recording symptoms and QOL in prostate cancer patients.
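A small sketch of the compliance metric as defined above (assessments received divided by assessments expected per assessment period); the counts below are illustrative, not the study's data.

```python
# Compliance per assessment period = received / expected; example counts are invented.
from dataclasses import dataclass

@dataclass
class AssessmentPeriod:
    name: str
    received: int
    expected: int

    @property
    def compliance(self) -> float:
        return self.received / self.expected

periods = [
    AssessmentPeriod("randomization visit (in clinic)", 1040, 1040),
    AssessmentPeriod("weeks 1-4 at home (daily diary + 1 questionnaire)", 26_500, 31_200),
]

for p in periods:
    print(f"{p.name}: {p.compliance:.0%}")
```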


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Koen I. Neijenhuijs ◽  
Carel F. W. Peeters ◽  
Henk van Weert ◽  
Pim Cuijpers ◽  
Irma Verdonck-de Leeuw

Abstract Purpose Knowledge regarding symptom clusters may inform targeted interventions. The current study investigated symptom clusters among cancer survivors, using machine learning techniques on a large data set. Methods Data consisted of self-reports of cancer survivors who used 'Oncokompas', a fully automated online application that supports them in their self-management. It does so by 1) monitoring their symptoms through patient-reported outcome measures (PROMs); and 2) providing a personalized overview of supportive care options tailored to their scores, aiming to reduce symptom burden and improve health-related quality of life. In the present study, data on 26 generic symptoms (physical and psychosocial) were used. The result of the PROM for each symptom is presented to the user as a no, moderate, or high well-being risk score. Data of 1032 cancer survivors were analysed using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) on high risk scores and on moderate-to-high risk scores separately. Results When analyzing the high risk scores, seven clusters were extracted: one main cluster containing the most frequently occurring physical and psychosocial symptoms, and six subclusters with different combinations of these symptoms. When analyzing moderate-to-high risk scores, three clusters were extracted: two main clusters, which separated physical symptoms (and their consequences) from psychosocial symptoms, and one subcluster with only body weight issues. Conclusion There appears to be an inherent difference in the co-occurrence of symptoms depending on symptom severity. Among survivors with high risk scores, the clustering showed more connections between physical and psychosocial symptoms, spread across separate subclusters. Among survivors with moderate-to-high risk scores, we observed fewer connections between physical and psychosocial symptoms in the clustering.
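A sketch of the clustering step under stated assumptions: symptoms are clustered by the similarity of their high-risk patterns across survivors using HDBSCAN on a precomputed Jaccard distance matrix. The survivor-by-symptom matrix is random placeholder data, not the Oncokompas data set, and the distance choice is an assumption.

```python
# Sketch: cluster the 26 symptoms by co-occurrence of high-risk scores across survivors.
import numpy as np
import hdbscan
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
high_risk = rng.integers(0, 2, size=(1032, 26))  # survivors x 26 generic symptoms (placeholder)

# Distance between symptoms = 1 - Jaccard similarity of their high-risk patterns.
symptom_dist = squareform(pdist(high_risk.T.astype(bool), metric="jaccard"))

clusterer = hdbscan.HDBSCAN(min_cluster_size=2, metric="precomputed")
labels = clusterer.fit_predict(symptom_dist)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise points
print("symptom clusters found:", n_clusters)
```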


2020 ◽  
Vol 63 (1) ◽  
Author(s):  
S. S. Haas ◽  
G. E. Doucet ◽  
S. Garg ◽  
S. N. Herrera ◽  
C. Sarac ◽  
...  

Abstract Background. Abnormalities in the semantic and syntactic organization of speech have been reported in individuals at clinical high-risk (CHR) for psychosis. The current study seeks to examine whether such abnormalities are associated with changes in brain structure and functional connectivity in CHR individuals. Methods. Automated natural language processing analysis was applied to speech samples obtained from 46 CHR and 22 healthy individuals. Brain structural and resting-state functional imaging data were also acquired from all participants. Sparse canonical correlation analysis (sCCA) was used to ascertain patterns of covariation between linguistic features, clinical symptoms, and measures of brain morphometry and functional connectivity related to the language network. Results. In CHR individuals, we found a significant mode of covariation between linguistic and clinical features (r = 0.73; p = 0.003), with negative symptoms and bizarre thinking covarying mostly with measures of syntactic complexity. In the entire sample, separate sCCAs identified a single mode of covariation linking linguistic features with brain morphometry (r = 0.65; p = 0.05) and resting-state network connectivity (r = 0.63; p = 0.01). In both models, semantic and syntactic features covaried with brain structural and functional connectivity measures of the language network. However, the contribution of diagnosis to both models was negligible. Conclusions. Syntactic complexity appeared sensitive to prodromal symptoms in CHR individuals while the patterns of brain-language covariation seemed preserved. Further studies in larger samples are required to establish the reproducibility of these findings.
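An illustrative stand-in for the brain-language analysis: scikit-learn's (non-sparse) CCA linking a linguistic feature matrix to a brain measure matrix and reporting the first canonical correlation. The paper used sparse CCA (sCCA); the matrices here are random placeholders sized to the CHR sample, and the feature counts are assumptions.

```python
# Sketch: first canonical correlation between linguistic and brain feature matrices.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_subjects = 46
linguistic = rng.normal(size=(n_subjects, 8))   # e.g., semantic coherence, syntactic complexity
brain = rng.normal(size=(n_subjects, 12))       # e.g., morphometry / connectivity of language network

cca = CCA(n_components=1).fit(linguistic, brain)
u, v = cca.transform(linguistic, brain)
r = np.corrcoef(u[:, 0], v[:, 0])[0, 1]         # first canonical correlation
print(round(float(r), 2))
```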


2018 ◽  
Vol 47 (7) ◽  
pp. 451-464 ◽  
Author(s):  
Sean Kelly ◽  
Andrew M. Olney ◽  
Patrick Donnelly ◽  
Martin Nystrand ◽  
Sidney K. D’Mello

Analyzing the quality of classroom talk is central to educational research and improvement efforts. In particular, the presence of authentic teacher questions, where answers are not predetermined by the teacher, helps constitute and serves as a marker of productive classroom discourse. Further, authentic questions can be cultivated to improve teaching effectiveness and consequently student achievement. Unfortunately, current methods to measure question authenticity do not scale because they rely on human observations or coding of teacher discourse. To address this challenge, we set out to use automatic speech recognition, natural language processing, and machine learning to train computers to detect authentic questions in real-world classrooms automatically. Our methods were iteratively refined using classroom audio and human-coded observational data from two sources: (a) a large archival database of text transcripts of 451 observations from 112 classrooms; and (b) a newly collected sample of 132 high-quality audio recordings from 27 classrooms, obtained under technical constraints that anticipate large-scale automated data collection and analysis. Correlations between human-coded and computer-coded authenticity at the classroom level were sufficiently high (r = .602 for archival transcripts and .687 for audio recordings) to provide a valuable complement to human coding in research efforts.
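A sketch of the classroom-level evaluation described above: a Pearson correlation between human-coded and computer-coded proportions of authentic questions per classroom. The per-classroom values are hypothetical placeholders.

```python
# Sketch: classroom-level agreement between human and automated authenticity codes.
import numpy as np
from scipy.stats import pearsonr

human_coded = np.array([0.22, 0.35, 0.10, 0.41, 0.28, 0.15, 0.33, 0.25])     # placeholder proportions
computer_coded = np.array([0.25, 0.31, 0.12, 0.38, 0.30, 0.18, 0.29, 0.27])  # placeholder proportions

r, p = pearsonr(human_coded, computer_coded)
print(f"classroom-level r = {r:.3f} (p = {p:.3f})")
```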


2021 ◽  
Vol 15 ◽  
Author(s):  
Nora Hollenstein ◽  
Cedric Renggli ◽  
Benjamin Glaus ◽  
Maria Barrett ◽  
Marius Troendle ◽  
...  

Until recently, human behavioral data from reading have mainly been of interest to researchers seeking to understand human cognition. However, these human language processing signals can also be beneficial in machine learning-based natural language processing tasks. Using EEG brain activity for this purpose is largely unexplored as of yet. In this paper, we present the first large-scale study systematically analyzing the potential of EEG brain activity data for improving natural language processing tasks, with a special focus on which features of the signal are most beneficial. We present a multi-modal machine learning architecture that learns jointly from textual input as well as from EEG features. We find that filtering the EEG signals into frequency bands is more beneficial than using the broadband signal. Moreover, for a range of word embedding types, EEG data improve binary and ternary sentiment classification and outperform multiple baselines. For more complex tasks such as relation detection, only the contextualized BERT embeddings outperform the baselines in our experiments, which raises the need for further research. Finally, EEG data appear particularly promising when limited training data are available.
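A sketch of one preprocessing step described above: band-pass filtering an EEG channel into standard frequency bands and concatenating the band-power features with a word embedding before feeding a multi-modal model. The sampling rate, band edges, signal, and embedding are assumptions or synthetic placeholders.

```python
# Sketch: frequency-band EEG features combined with text features.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 500  # Hz, assumed sampling rate
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 100)}

def band_power(signal: np.ndarray, low: float, high: float, fs: int = FS) -> float:
    """Mean power of the signal within a frequency band (4th-order Butterworth)."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, signal)
    return float(np.mean(filtered ** 2))

rng = np.random.default_rng(0)
eeg_channel = rng.normal(size=FS * 2)   # two seconds of synthetic EEG
word_embedding = rng.normal(size=300)   # placeholder text features

eeg_features = np.array([band_power(eeg_channel, lo, hi) for lo, hi in BANDS.values()])
joint_input = np.concatenate([word_embedding, eeg_features])  # input to a multi-modal model
print(joint_input.shape)
```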


2021 ◽  
Author(s):  
Tanya Nijhawan ◽  
Girija Attigeri ◽  
Ananthakrishna T

Abstract Cyberspace is a vast soapbox where people post anything they witness in their day-to-day lives. Consequently, it can be a very effective tool for detecting the stress level of an individual based on the posts and comments he or she shares on social networking platforms. We leverage large-scale datasets of tweets to perform sentiment analysis with the aid of machine learning algorithms. We use BERT, a capable pre-trained deep learning model, to address the problems that come with sentiment classification; it outperforms many other well-known models for this task without requiring a sophisticated architecture. We also adopt Latent Dirichlet Allocation, an unsupervised machine learning method that scans a group of documents, recognizes the word and phrase patterns within them, and gathers word groups and similar expressions that most precisely characterize the set of documents; this lets us predict which topic is linked to the textual data. With the aid of the proposed models, we can detect the emotions of users online. We work primarily with Twitter data because Twitter is a platform where people frequently express their thoughts. In conclusion, this proposal is aimed at supporting mental well-being. The results are evaluated using various metrics at the macro and micro levels and indicate that the trained model detects the status of emotions based on social interactions.
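A brief sketch of the two components described above, under the assumption that a default Hugging Face sentiment pipeline and scikit-learn's LDA are acceptable stand-ins for the study's exact models; the tweets are invented placeholders.

```python
# Sketch: pre-trained sentiment classification plus LDA topic modeling on example tweets.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from transformers import pipeline

tweets = [
    "Deadlines piling up, I can barely sleep anymore.",
    "Had a great run this morning, feeling refreshed!",
    "Another exam tomorrow and I am completely drained.",
    "Weekend trip with friends was exactly what I needed.",
]

# Sentiment per tweet with a pre-trained model (default pipeline model).
sentiment = pipeline("sentiment-analysis")
print(sentiment(tweets))

# LDA over a bag-of-words representation to surface topic words.
counts = CountVectorizer(stop_words="english").fit(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts.transform(tweets))
vocab = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    print(f"topic {k}:", [vocab[i] for i in topic.argsort()[-5:]])
```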

