Understanding Psychiatric Illness Through Natural Language Processing (UNDERPIN): Rationale, Design, and Methodology

Author(s):  
Taishiro Kishimoto ◽  
Hironobu Nakamura ◽  
Yoshinobu Kano ◽  
Yoko Eguchi ◽  
Momoko Kitazawa ◽  
...  

Abstract Introduction: Psychiatric disorders are diagnosed according to diagnostic criteria such as the DSM-5 and ICD-11. In practice, psychiatrists elicit symptoms and make a diagnosis by conversing with patients, a process that often lacks objectivity. At the same time, specific linguistic features can be observed in some psychiatric disorders, such as a loosening of associations in schizophrenia. The purposes of the present study are to quantify the language features of psychiatric disorders and neurocognitive disorders using natural language processing and to identify features that differentiate disorders from one another and from healthy subjects. Methods: This study will have a multi-center prospective design. Participants with major depressive disorder, bipolar disorder, schizophrenia, anxiety disorder (including obsessive-compulsive disorder), or major or minor neurocognitive disorders, as well as healthy subjects, will be recruited. A psychiatrist or psychologist will conduct 30-to-60-min interviews with each participant, and these interviews will be recorded using a microphone headset. In addition, the severity of disorders will be assessed using clinical rating scales. Data will be collected from each participant at least twice and at most five times during the study period. Discussion: The overall goal of this proposed study, Understanding Psychiatric Illness Through Natural Language Processing (UNDERPIN), is to develop objective and easy-to-use biomarkers for diagnosing each psychiatric disorder and assessing its severity using natural language processing. As of August 2021, we have collected a total of >900 datasets from >350 participants. To the best of our knowledge, this data sample is one of the largest in this field. Trial registration: UMIN000032141, University Hospital Medical Information Network (UMIN).

10.2196/23230 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23230
Author(s):  
Pei-Fu Chen ◽  
Ssu-Ming Wang ◽  
Wei-Chih Liao ◽  
Lu-Cheng Kuo ◽  
Kuan-Chih Chen ◽  
...  

Background The International Classification of Diseases (ICD) code is widely used as the reference in medical systems and for billing purposes. However, classifying diseases into ICD codes still relies mainly on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and deep learning– and natural language processing–related approaches have been studied to assist disease coders. Objective This paper aims to construct a deep learning model for ICD-10 coding that automatically determines the corresponding diagnosis and procedure codes based solely on free-text medical notes, in order to improve accuracy and reduce human effort. Methods We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors, word to vectors, embeddings from language models, bidirectional encoder representations from transformers, and single head attention recurrent neural network, on the deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced an attention mechanism into the classification model to extract keywords from diagnoses and to visualize the coding reference for training newcomers to ICD-10 coding. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time of coders before and after using our model. Results In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a bidirectional encoder representations from transformers embedding approach in the Gated Recurrent Unit classification model. The trained models were deployed in an ICD-10 web service used for coding and for training ICD-10 users. With this service, the coders' median F1-score increased significantly from 0.832 to 0.922 (P<.05), but coding time was not reduced. Conclusions The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders.
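The F1-scores reported above summarize how well a predicted set of codes matches the reference set for each note. The abstract does not state which averaging scheme was used, so the following is only a minimal sketch assuming micro-averaging over per-note code sets. The function name `micro_f1` and the example codes are illustrative choices, not taken from the paper.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-note sets of codes.

    `gold` and `predicted` are parallel lists of sets, one set per note.
    Micro-averaging is an assumption here, not the paper's stated method.
    """
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        tp += len(gold_codes & pred_codes)   # codes predicted and correct
        fp += len(pred_codes - gold_codes)   # codes predicted but wrong
        fn += len(gold_codes - pred_codes)   # reference codes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two notes with illustrative ICD-10 codes.
gold = [{"I10", "E11.9"}, {"J18.9"}]
predicted = [{"I10"}, {"J18.9", "R05"}]
score = micro_f1(gold, predicted)
```

In this toy case one reference code is missed and one spurious code is predicted, so precision and recall are both 2/3, and so is the F1-score.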


2017 ◽  
Vol 22 (1) ◽  
pp. 7-31 ◽  
Author(s):  
Alexandra Pomares Quimbaya ◽  
Rafael A. Gonzalez ◽  
Oscar Muñoz ◽  
Olga Milena García ◽  
Wilson Ricardo Bohorquez

Objective: Electronic medical records (EMR) typically contain both structured attributes and narrative text. The usefulness of EMR for research and administration is hampered by the difficulty of automatically analyzing their narrative portions. Accordingly, this paper proposes SPIRE, a strategy for prioritizing EMR that uses natural language processing in combination with analysis of structured data to identify and rank EMR that match specific queries from clinical researchers and health administrators. Materials and Methods: The resulting software tool was evaluated technically and validated with three cases (heart failure, pulmonary hypertension and diabetes mellitus), compared against expert-obtained results. Results and Discussion: Our preliminary results show high sensitivity (70%, 82% and 87% respectively) and specificity (85%, 73.7% and 87.5%) in the resulting set of records. The AUC was between 0.84 and 0.9. Conclusions: SPIRE was successfully implemented and used in the context of a university hospital information system, enabling clinical researchers to obtain prioritized EMR to solve their information needs through collaborative search templates with faster and more accurate results than other existing methods.


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
P Brekke ◽  
I Pilan ◽  
H Husby ◽  
T Gundersen ◽  
F.A Dahl ◽  
...  

Abstract Background Syncope is a common presenting symptom in emergency departments. While the majority of episodes are benign, syncope is associated with worse prognosis in hypertrophic cardiomyopathy, arrhythmia syndromes, heart failure, aortic stenosis and coronary heart disease. Flagging documented syncope in these patients may be crucial to management decisions. Previous studies show that the International Classification of Diseases (ICD) codes for syncope have a sensitivity of around 0.63, leading to a large number of false negatives if patient identification is based on administrative codes. Thus, better tools are needed to provide data-driven clinical decision support and to improve identification of patient cohorts for research. A recent study manually annotated more than 30,000 patient records in order to develop a natural language processing (NLP) tool, which achieved a sensitivity of 92.2%. Since access to medical records and annotation resources is limited, we aimed to investigate whether an unsupervised machine learning and NLP approach with no manual input could achieve similar performance. Methods Our data were admission notes for adult patients admitted between 2005 and 2016 at a large university hospital in Norway. 500 records from patients with, and 500 from patients without, an “R55 Syncope” ICD code at discharge were drawn at random. The R55 code was considered “ground truth”. Headers containing information about tentative diagnoses were removed from the notes, when present, using regular expressions. The dataset was divided into 70%/15%/15% subsets for training, validation and testing. Baseline identification was calculated by simple lexical matching using the term “synkope”. We evaluated two linear classifiers, a Support Vector Machine (SVM) and a Linear Regression (LR) model, with a term frequency–inverse document frequency vectorizer, using a bag-of-words approach. In addition, we evaluated a simple convolutional neural network (CNN) consisting of a convolutional layer concatenating filter sizes of 3–5, max pooling and a dropout of 0.5, with randomly initialised word embeddings of 300 dimensions. Results Even a baseline regular expression model achieved a sensitivity of 78% and a specificity of 91% when classifying admission notes as belonging to the syncope class or not. The SVM model and the LR model achieved a sensitivity of 91% and 89%, respectively, and a specificity of 89% and 91%. The CNN model had a sensitivity of 95% and a specificity of 84%. Conclusion With a limited non-English dataset, common NLP and machine learning approaches were able to achieve approximately 90–95% sensitivity for the identification of admission notes related to syncope. Linear classifiers outperformed a CNN model in terms of specificity, as expected in this small dataset. The study demonstrates the feasibility of training document classifiers based on diagnostic codes in order to detect important clinical events. [Figure: ROC curves for SVM and LR models] Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): The Research Council of Norway
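The lexical baseline and the two reported metrics can be illustrated with a short sketch. It assumes a plain case-insensitive substring match on “synkope” and uses the discharge code as the label, as in the abstract. The function names and the four toy notes are invented for illustration and are not study data.

```python
def lexical_baseline(note, term="synkope"):
    """Flag a note as syncope-related if it contains the term (case-insensitive)."""
    return term in note.lower()


def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for parallel lists of booleans."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Invented toy notes paired with a hypothetical "has R55 code" label.
toy_notes = [
    ("Pasienten hadde synkope hjemme.", True),
    ("Innlagt etter besvimelse.", True),       # synonym, missed by the baseline
    ("Brystsmerter, ingen synkope.", False),   # negated mention, false positive
    ("Feber og hoste i tre dager.", False),
]
labels = [label for _, label in toy_notes]
predictions = [lexical_baseline(text) for text, _ in toy_notes]
sens, spec = sensitivity_specificity(labels, predictions)
```

The toy notes are chosen to show the two typical failure modes of plain term matching: synonyms lower sensitivity, and negated mentions lower specificity.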


2018 ◽  
Author(s):  
Oscar Nils Erik Kjell ◽  
Katarina Kjell ◽  
Danilo Garcia ◽  
Sverker Sikström

Psychological constructs, such as emotions, thoughts and attitudes are often measured by asking individuals to reply to questions using closed-ended numerical rating scales. However, when asking people about their state of mind in a natural context (“How are you?”), we receive open-ended answers using words (“fine and happy!”) and not closed-ended answers using numbers (“7”) or categories (“A lot”). Nevertheless, to date it has been difficult to objectively quantify responses to open-ended questions. We develop an approach using open-ended questions in which the responses are analyzed using natural language processing (Latent Semantic Analyses). This approach of using open-ended, semantic questions is compared with traditional rating scales in nine studies (N=92-854), including two different study paradigms. The first paradigm requires participants to describe psychological aspects of external stimuli (facial expressions) and the second paradigm involves asking participants to report their subjective well-being and mental health problems. The results demonstrate that the approach using semantic questions yields good statistical properties with competitive, or higher, validity and reliability compared with corresponding numerical rating scales. As these semantic measures are based on natural language and measure, differentiate and describe psychological constructs, they have the potential of complementing and extending traditional rating scales.


2020 ◽  
Author(s):  
Pei-Fu Chen ◽  
Ssu-Ming Wang ◽  
Wei-Chih Liao ◽  
Lu-Cheng Kuo ◽  
Kuan-Chih Chen ◽  
...  

BACKGROUND The International Classification of Diseases (ICD) code is widely used as the reference in medical systems and for billing purposes. However, classifying diseases into ICD codes still relies mainly on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and deep learning– and natural language processing–related approaches have been studied to assist disease coders. OBJECTIVE This paper aims to construct a deep learning model for ICD-10 coding that automatically determines the corresponding diagnosis and procedure codes based solely on free-text medical notes, in order to improve accuracy and reduce human effort. METHODS We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors, word to vectors, embeddings from language models, bidirectional encoder representations from transformers, and single head attention recurrent neural network, on the deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced an attention mechanism into the classification model to extract keywords from diagnoses and to visualize the coding reference for training newcomers to ICD-10 coding. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time of coders before and after using our model. RESULTS In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a bidirectional encoder representations from transformers embedding approach in the Gated Recurrent Unit classification model. The trained models were deployed in an ICD-10 web service used for coding and for training ICD-10 users. With this service, the coders' median F1-score increased significantly from 0.832 to 0.922 (P<.05), but coding time was not reduced. CONCLUSIONS The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders.


Author(s):  
François Grosjean

The author and his family went back to Europe for good, and had to acculturate to a new culture. His boys needed a bit of help during their first years but then things worked out well. The author set up his laboratory and started collaborating with firms involved in natural language processing (NLP). One of the main projects he worked on with his team was an English writing tool and grammar checker for French speakers. He also developed a long-term partnership with the Lausanne University Hospital (CHUV). The author explains how he helped students do experimental research in his laboratory. The chapter ends with some statistics showing how successful the laboratory was over a span of twenty years.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
David Chandran ◽  
Deborah Ahn Robbins ◽  
Chin-Kuo Chang ◽  
Hitesh Shetty ◽  
Jyoti Sanyal ◽  
...  

Abstract Obsessive and Compulsive Symptoms (OCS) or Obsessive Compulsive Disorder (OCD) in the context of schizophrenia or related disorders are of clinical importance as these are associated with a range of adverse outcomes. Natural Language Processing (NLP) applied to Electronic Health Records (EHRs) presents an opportunity to create large datasets to facilitate research in this area. This is a challenging endeavour, however, because of the wide range of ways in which these symptoms are recorded and the overlap of terms used to describe OCS with those used to describe other conditions. We developed an NLP algorithm to extract OCS information from a large mental healthcare EHR data resource at the South London and Maudsley NHS Foundation Trust using its Clinical Record Interactive Search (CRIS) facility. We extracted documents from individuals who had received a diagnosis of schizophrenia, schizoaffective disorder, or bipolar disorder. These text documents, annotated by human coders, were used for developing and refining the NLP algorithm (600 documents), with an additional set reserved for final validation (300 documents). The developed NLP algorithm used a rules-based approach to identify each of the symptoms associated with OCS, and then combined them to determine the overall number of instances of OCS. After its implementation, the algorithm was shown to identify OCS with a precision and recall (with 95% confidence intervals) of 0.77 (0.65–0.86) and 0.67 (0.55–0.77) respectively. The development of this application demonstrated the potential to extract complex symptomatic data from mental healthcare EHRs using NLP to facilitate further analyses of these clinical symptoms and their relevance for prognosis and intervention response.
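The general shape of a rules-based extractor of this kind (per-symptom rules whose matches are then combined into an overall count) can be sketched as follows. The abstract does not give the actual rules, so every pattern, the negation cue list, and the function name `count_ocs_mentions` are hypothetical placeholders and not the published algorithm.

```python
import re

# Hypothetical per-symptom patterns; the real rules are not given in the abstract.
SYMPTOM_PATTERNS = {
    "obsession": re.compile(r"\bobsessi(?:on|ons|ve)\b", re.IGNORECASE),
    "compulsion": re.compile(r"\bcompulsi(?:on|ons|ve)\b", re.IGNORECASE),
    "checking": re.compile(r"\b(?:repeated|excessive) checking\b", re.IGNORECASE),
}
# Hypothetical negation cues, checked only within the same sentence.
NEGATION_CUE = re.compile(r"\b(?:no|denies|denied|without)\b", re.IGNORECASE)


def count_ocs_mentions(text):
    """Count non-negated mentions per symptom and return (counts, total)."""
    counts = {name: 0 for name in SYMPTOM_PATTERNS}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for name, pattern in SYMPTOM_PATTERNS.items():
            for match in pattern.finditer(sentence):
                # Skip a mention if a negation cue precedes it in the sentence.
                if NEGATION_CUE.search(sentence[:match.start()]):
                    continue
                counts[name] += 1
    return counts, sum(counts.values())


# Invented example note, not taken from any clinical record.
note = (
    "Reports obsessive thoughts about contamination. "
    "Excessive checking of locks noted. Denies compulsions."
)
counts, total = count_ocs_mentions(note)
```

The negated "compulsions" in the last sentence is skipped. This also hints at why overlapping terminology makes the task hard: a phrase such as "obsessive compulsive disorder" would increment two counters under these naive rules.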


2021 ◽  
Vol 3 ◽  
Author(s):  
Aurelie Mascio ◽  
Robert Stewart ◽  
Riley Botelle ◽  
Marcus Williams ◽  
Luwaiza Mirza ◽  
...  

Background: Cognitive impairments are a neglected aspect of schizophrenia despite being a major factor in poor functional outcome. They are usually measured using various rating scales; however, these require trained practitioners and are rarely applied routinely in clinical settings. Recent advances in natural language processing techniques allow us to extract such information from unstructured portions of text at a large scale and in a cost-effective manner. We aimed to identify cognitive problems in the clinical records of a large sample of patients with schizophrenia, and assess their association with clinical outcomes. Methods: We developed a natural language processing based application identifying cognitive dysfunctions from the free text of medical records, and assessed its performance against a rating scale widely used in the United Kingdom, the cognitive component of the Health of the Nation Outcome Scales (HoNOS). Furthermore, we analyzed cognitive trajectories over the course of patient treatment, and evaluated their relationship with various socio-demographic factors and clinical outcomes. Results: We found a high prevalence of cognitive impairments in patients with schizophrenia, and a strong correlation with several socio-demographic factors (gender, education, ethnicity, marital status, and employment) as well as adverse clinical outcomes. Results obtained from the free text were broadly in line with those obtained using the HoNOS subscale, and shed light on additional associations, notably related to attention and social impairments for patients with higher education. Conclusions: Our findings demonstrate that cognitive problems are common in patients with schizophrenia, can be reliably extracted from clinical records using natural language processing, and are associated with adverse clinical outcomes. Harvesting the free text from medical records provides larger coverage than neurocognitive batteries or rating scales, and access to additional socio-demographic and clinical variables. Text mining tools can therefore facilitate large-scale patient screening and early symptom detection, and ultimately help inform clinical decisions.

