Living and working in a fourth culture

Author(s):  
François Grosjean

The author and his family went back to Europe for good, and had to acculturate to a new culture. His boys needed a bit of help during their first years but then things worked out well. The author set up his laboratory and started collaborating with firms involved in natural language processing (NLP). One of the main projects he worked on with his team was an English writing tool and grammar checker for French speakers. He also developed a long-term partnership with the Lausanne University Hospital (CHUV). The author explains how he helped students do experimental research in his laboratory. The chapter ends with some statistics showing how successful the laboratory was over a span of twenty years.

2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 183-183
Author(s):  
Javad Razjouyan ◽  
Jennifer Freytag ◽  
Edward Odom ◽  
Lilian Dindo ◽  
Aanand Naik

Abstract Patient Priorities Care (PPC) is a model of care that aligns health care recommendations with the priorities of older adults with multiple chronic conditions. Social workers (SWs), after online training, document PPC in the patient's electronic health record (EHR). Our goal was to identify free-text notes containing PPC language using a natural language processing (NLP) model and to measure PPC adoption and its effect on long-term services and supports (LTSS) use. Free-text EHR notes produced by trained SWs were passed through a hybrid NLP model that combined rule-based and statistical machine learning. NLP accuracy was validated against chart review. Patients who received PPC were propensity matched with patients not receiving PPC (control) on age, gender, BMI, Charlson comorbidity index, facility, and SW. The change in LTSS utilization across 6-month intervals was compared between groups with univariate analysis. Chart review indicated that 491 of 689 notes contained PPC language, and the NLP model reached a precision of 0.85, a recall of 0.90, an F1 of 0.87, and an accuracy of 0.91. Within-group analysis showed that the intervention group used LTSS 1.8 times more in the 6 months after the encounter than in the 6 months prior. Between-group analysis showed that the intervention group had significantly higher LTSS utilization (p=0.012). An automated NLP model can be used to reliably measure the adoption of PPC by SWs. PPC appears to encourage use of LTSS, which may delay time to long-term care placement.
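The validation step reported above boils down to comparing NLP labels against chart-review labels and computing standard metrics. A minimal sketch, using hypothetical confusion-matrix counts rather than the study's data:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts
    (NLP label vs. chart-review gold standard)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts for illustration only.
p, r, f1, acc = classification_metrics(tp=9, fp=1, fn=1, tn=9)
```

Note that F1 is the harmonic mean of precision and recall, which is why the reported 0.87 sits between the precision of 0.85 and the recall of 0.90.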


10.2196/23230 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23230
Author(s):  
Pei-Fu Chen ◽  
Ssu-Ming Wang ◽  
Wei-Chih Liao ◽  
Lu-Cheng Kuo ◽  
Kuan-Chih Chen ◽  
...  

Background The International Classification of Diseases (ICD) code is widely used as the reference in medical systems and for billing purposes. However, classifying diseases into ICD codes still relies mainly on humans reading large amounts of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and deep learning– and natural language processing–related approaches have been studied to assist disease coders. Objective This paper aims at constructing a deep learning model for ICD-10 coding, where the model automatically determines the corresponding diagnosis and procedure codes based solely on free-text medical notes, to improve accuracy and reduce human effort. Methods We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors, word to vectors, embeddings from language models, bidirectional encoder representations from transformers, and single head attention recurrent neural network, on a deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced an attention mechanism into the classification model to extract keywords from diagnoses and visualize the coding reference for training newcomers to ICD-10. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time by coders before and after using our model. Results In experiments on the medical data set of the National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification and Procedure Coding System codes, respectively, with a bidirectional encoder representations from transformers embedding approach in a gated recurrent unit classification model. The well-trained models were deployed as an ICD-10 web service for coding and for training ICD-10 users. With this service, coders' F1-scores increased significantly from a median of 0.832 to 0.922 (P<.05), but coding time was not reduced. Conclusions The proposed model significantly improved the F1-score but did not decrease the time coders spent on coding.
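The gated recurrent unit at the core of the classification model can be sketched in plain NumPy. The weight shapes, toy dimensions, and final prediction layer mentioned in the comments are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gru_step(x, h, W, U, b):
    """One gated recurrent unit (GRU) step, PyTorch-style update:
    h' = (1 - z) * n + z * h."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(W["z"] @ x + U["z"] @ h + b["z"])            # update gate
    r = sig(W["r"] @ x + U["r"] @ h + b["r"])            # reset gate
    n = np.tanh(W["n"] @ x + U["n"] @ (r * h) + b["n"])  # candidate state
    return (1.0 - z) * n + z * h

# Toy dimensions; a real model would consume e.g. 768-dim BERT embeddings.
rng = np.random.default_rng(0)
d_in, d_h = 8, 4
W = {g: rng.normal(scale=0.1, size=(d_h, d_in)) for g in "zrn"}
U = {g: rng.normal(scale=0.1, size=(d_h, d_h)) for g in "zrn"}
b = {g: np.zeros(d_h) for g in "zrn"}

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # five token embeddings
    h = gru_step(x, h, W, U, b)
# h would then feed a sigmoid/softmax layer scoring ICD-10 codes
```

The gating is what lets the final hidden state summarize a whole note: the update gate z decides how much of the previous state to keep at each token.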


2017 ◽  
Vol 22 (1) ◽  
pp. 7-31 ◽  
Author(s):  
Alexandra Pomares Quimbaya ◽  
Rafael A. Gonzalez ◽  
Oscar Muñoz ◽  
Olga Milena García ◽  
Wilson Ricardo Bohorquez

Objective: Electronic medical records (EMR) typically contain structured attributes as well as narrative text. The usefulness of EMRs for research and administration is hampered by the difficulty of automatically analyzing their narrative portions. Accordingly, this paper proposes SPIRE, a strategy for prioritizing EMRs that uses natural language processing in combination with analysis of structured data to identify and rank EMRs matching specific queries from clinical researchers and health administrators. Materials and Methods: The resulting software tool was evaluated technically and validated on three cases (heart failure, pulmonary hypertension, and diabetes mellitus) against expert-obtained results. Results and Discussion: Our preliminary results show high sensitivity (70%, 82%, and 87%, respectively) and specificity (85%, 73.7%, and 87.5%) in the resulting set of records. The AUC was between 0.84 and 0.9. Conclusions: SPIRE was successfully implemented and used in the context of a university hospital information system, enabling clinical researchers to obtain prioritized EMRs for their information needs through collaborative search templates, with faster and more accurate results than existing methods.
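The general idea of combining structured filtering with narrative-text matching to rank records can be sketched as follows. This is an assumption-laden illustration of the concept, not SPIRE's actual algorithm; the field names and query are invented:

```python
def prioritize(records, structured_filter, keywords):
    """Rank records: those passing the structured filter come first,
    then ordered by how many query keywords the narrative mentions."""
    def score(rec):
        passes = 1 if structured_filter(rec["structured"]) else 0
        text = rec["narrative"].lower()
        hits = sum(1 for kw in keywords if kw in text)
        return (passes, hits)
    return sorted(records, key=score, reverse=True)

# Hypothetical mini-corpus: one heart-failure record, one asthma record.
records = [
    {"structured": {"icd": "J45"}, "narrative": "Asthma follow-up, stable."},
    {"structured": {"icd": "I50"}, "narrative": "Heart failure with worsening dyspnea."},
]
ranked = prioritize(records,
                    structured_filter=lambda s: s["icd"].startswith("I50"),
                    keywords=["heart failure", "dyspnea"])
```

A real system would replace the keyword count with proper NLP scoring, but the two-part key (structured match, then text relevance) captures why combining both sources outperforms either alone.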


Author(s):  
Sa Wang ◽  
Hui Xu

In view of the lack of intelligent guidance in the online teaching of English composition, this paper proposes an intelligent support system for English writing based on the B/S (browser/server) mode. Building on a corpus of vocabulary, grammar rules, and other material, the system uses natural language processing technology that combines rule matching and probability statistics to evaluate compositions and improve grading efficiency. The empirical results show that the system can effectively guide teaching according to the results of intelligent quantitative analysis.
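The rule-matching half of such a system can be sketched with regular expressions; the two rules and feedback messages below are invented examples, not the system's actual rule base, and a production system would add statistical language-model scoring on top:

```python
import re

# Each rule pairs a pattern with a feedback message.
RULES = [
    (re.compile(r"\ba\s+[aeiouAEIOU]\w*"), "use 'an' before a vowel sound"),
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "repeated word"),
]

def check(sentence):
    """Return (matched text, message) for every rule that fires."""
    findings = []
    for pattern, message in RULES:
        for m in pattern.finditer(sentence):
            findings.append((m.group(0), message))
    return findings
```

For example, `check("I ate a apple and the the pie")` flags both the article error and the duplicated word.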


2021 ◽  
Author(s):  
Taishiro Kishimoto ◽  
Hironobu Nakamura ◽  
Yoshinobu Kano ◽  
Yoko Eguchi ◽  
Momoko Kitazawa ◽  
...  

Abstract Introduction Psychiatric disorders are diagnosed according to diagnostic criteria such as the DSM-5 and ICD-11. Essentially, psychiatrists extract symptoms and make a diagnosis by conversing with patients. However, such processes often lack objectivity. In contrast, specific linguistic features can be observed in some psychiatric disorders, such as a loosening of associations in schizophrenia. The purposes of the present study are to quantify the language features of psychiatric disorders and neurocognitive disorders using natural language processing, and to identify features that differentiate disorders from one another and from healthy subjects. Methods This study will have a multi-center prospective design. Patients with major depressive disorder, bipolar disorder, schizophrenia, anxiety disorder (including obsessive-compulsive disorder), and major and minor neurocognitive disorders, as well as healthy subjects, will be recruited. A psychiatrist or psychologist will conduct 30-to-60-minute interviews with each participant, and these interviews will be recorded using a microphone headset. In addition, the severity of disorders will be assessed using clinical rating scales. Data will be collected from each participant at least twice during the study period, up to a maximum of five times. Discussion The overall goal of this proposed study, Understanding Psychiatric Illness Through Natural Language Processing (UNDERPIN), is to develop objective and easy-to-use biomarkers for diagnosing and assessing the severity of each psychiatric disorder using natural language processing. As of August 2021, we have collected a total of more than 900 datasets from more than 350 participants. To the best of our knowledge, this data sample is one of the largest in this field. Trial registration UMIN000032141, University Hospital Medical Information Network (UMIN).
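Two of the simplest surface features such a study could start from are vocabulary diversity and sentence length; the abstract does not describe the actual feature set, so the sketch below is purely illustrative:

```python
import re

def lexical_features(transcript):
    """Type-token ratio (vocabulary diversity) and mean sentence length,
    two basic surface features of interview speech."""
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", transcript.lower())
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }
```

Clinically meaningful markers such as loosening of associations require far richer semantic modeling, but features like these show how interview transcripts become quantitative input for classification.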


Author(s):  
Virginie Goepp ◽  
Nada Matta ◽  
Emmanuel Caillaud ◽  
Françoise Feugeas

Abstract Community of Practice (CoP) efficiency evaluation is an important research issue. Indeed, knowing whether a given CoP is successful is essential to managing it well over time. The existing approaches for efficiency evaluation are difficult and time-consuming to put into action on real CoPs. They require either evaluating subjective constructs, which makes the analysis unreliable, or working out a knowledge interaction matrix, which is difficult to set up. However, these approaches base their evaluation on the premise that a CoP is successful if knowledge is exchanged between its members, which is the case if there are interactions between the actors involved in the CoP. Therefore, we propose to analyze these interactions through email exchanges using natural language processing. Our approach is systematic and semi-automated: it requires only the e-mails exchanged and the definition of the speech acts to be retrieved. We apply it to a real project-based CoP, the SEPOLBE research project, which involves different fields of expertise. This allows us to identify the CoP core group and to highlight learning processes between members with different backgrounds (microbiology, electrochemistry, and civil engineering).
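The core-group identification step can be approximated by counting interactions per member across the email exchanges. This is a sketch of the idea under simple assumptions (interaction = sending or receiving a message), not the authors' method:

```python
from collections import Counter

def core_group(emails, top_n=2):
    """Given (sender, recipients) pairs, count interactions per member
    and return the most connected members as the candidate core group."""
    degree = Counter()
    for sender, recipients in emails:
        degree[sender] += len(recipients)  # one interaction per recipient
        degree.update(recipients)          # recipients interact too
    return [member for member, _ in degree.most_common(top_n)]
```

Speech-act tagging of the email bodies would then characterize what kind of knowledge exchange (question, explanation, agreement) flows between the core group and the periphery.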


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
P Brekke ◽  
I Pilan ◽  
H Husby ◽  
T Gundersen ◽  
F.A Dahl ◽  
...  

Abstract Background Syncope is a common presenting symptom in emergency departments. While the majority of episodes are benign, syncope is associated with worse prognosis in hypertrophic cardiomyopathy, arrhythmia syndromes, heart failure, aortic stenosis, and coronary heart disease. Flagging documented syncope in these patients may be crucial to management decisions. Previous studies show that the International Classification of Diseases (ICD) codes for syncope have a sensitivity of around 0.63, leading to a large number of false negatives if patient identification is based on administrative codes. Thus, in order to provide data-driven clinical decision support, and to improve the identification of patient cohorts for research, better tools are needed. A recent study manually annotated more than 30,000 patient records in order to develop a natural language processing (NLP) tool, which achieved a sensitivity of 92.2%. Since access to medical records and annotation resources is limited, we aimed to investigate whether a machine learning and NLP approach with no manual input could achieve similar performance. Methods Our data set consisted of admission notes for adult patients admitted between 2005 and 2016 at a large university hospital in Norway. 500 records from patients with, and 500 without, an "R55 Syncope" ICD code at discharge were drawn at random. The R55 code was considered ground truth. Headers containing information about tentative diagnoses, when present, were removed from the notes using regular expressions. The dataset was divided into 70%/15%/15% subsets for training, validation, and testing. Baseline identification was calculated by simple lexical matching on the term "synkope". We evaluated two linear classifiers, a Support Vector Machine (SVM) and a Linear Regression (LR) model, with a term frequency–inverse document frequency vectorizer, using a bag-of-words approach. In addition, we evaluated a simple convolutional neural network (CNN) consisting of a convolutional layer concatenating filter sizes of 3–5, max pooling, and a dropout of 0.5, with randomly initialised word embeddings of 300 dimensions. Results Even a baseline regular expression model achieved a sensitivity of 78% and a specificity of 91% when classifying admission notes as belonging to the syncope class or not. The SVM and LR models achieved sensitivities of 91% and 89%, respectively, and specificities of 89% and 91%. The CNN model had a sensitivity of 95% and a specificity of 84%. Conclusion With a limited non-English dataset, common NLP and machine learning approaches were able to achieve approximately 90–95% sensitivity for the identification of admission notes related to syncope. Linear classifiers outperformed the CNN model in terms of specificity, as expected on this small dataset. The study demonstrates the feasibility of training document classifiers based on diagnostic codes in order to detect important clinical events. ROC curves for SVM and LR models. Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): The Research Council of Norway
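The term frequency–inverse document frequency vectorizer feeding the linear classifiers can be sketched in pure Python, using the smoothed IDF variant common in standard toolkits (the study's exact preprocessing may differ):

```python
import math
from collections import Counter

def tfidf(docs):
    """Bag-of-words TF-IDF with smoothed IDF:
    idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = [[Counter(toks)[t] * idf[t] for t in vocab] for toks in tokenized]
    return vocab, vectors

# Two toy Norwegian note fragments (invented for illustration).
vocab, vecs = tfidf(["synkope episode", "brystsmerter episode"])
```

A term such as "synkope" that appears in only one note gets a higher weight than "episode", which appears in both; the resulting vectors are what the SVM and LR models consume.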


2019 ◽  
Author(s):  
Theo Araujo

Conversational agents in the form of chatbots available on messaging platforms are gaining increasing relevance in our communication environment. Based on natural language processing and generation techniques, they are built to interact automatically with users in several contexts. We present here a tool, the Conversational Agent Research Toolkit (CART), aimed at enabling researchers to create conversational agents for experimental studies. CART integrates existing APIs frequently used in practice and provides functionality that allows researchers to create and manage multiple versions of a chatbot to be used as stimuli in experimental studies. This paper gives an overview of the tool and a step-by-step tutorial on how to design an experiment with a chatbot.




2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S72-S72
Author(s):  
Brian R Lee ◽  
Alaina Linafelter ◽  
Alaina Burns ◽  
Allison Burris ◽  
Heather Jones ◽  
...  

Abstract Background Acute pharyngitis is one of the most common causes of pediatric health care visits, accounting for approximately 12 million ambulatory care visits each year. Rapid antigen detection tests (RADTs) for Group A Streptococcus (GAS) are among the most commonly ordered tests in ambulatory settings. Approximately 40–60% of RADTs are estimated to be inappropriate, and determining RADT inappropriateness frequently requires time-intensive chart reviews. The purpose of this study was to determine whether natural language processing (NLP) can provide an accurate and automated alternative for assessing RADT inappropriateness. Methods Patients ≥ 3 years of age who received an RADT while evaluated in our EDs/UCCs between April 2018 and September 2018 were identified. A manual chart review was completed on a 10% random sample to determine the presence of sore throat or viral symptoms (i.e., conjunctivitis, rhinorrhea, cough, diarrhea, hoarse voice, and viral exanthema). An inappropriate RADT was defined as either absence of sore throat or reporting of 2 or more viral symptoms. An NLP algorithm was developed independently to assign the presence/absence of symptoms and RADT inappropriateness. NLP sensitivity/specificity was calculated using the manual chart review sample as the gold standard. Results Manual chart review was completed on 720 patients, of whom 320 (44.4%) were considered to have an inappropriate RADT. Compared with the manual review, the NLP approach showed high sensitivity (se) and specificity (sp) when assigning inappropriateness (88.4% and 90.0%, respectively). High sensitivity/specificity was also observed for individual symptoms, including sore throat (se: 92.9%, sp: 92.5%), cough (se: 94.5%, sp: 96.5%), and rhinorrhea (se: 86.1%, sp: 95.3%). The prevalence of clinical symptoms was similar when running NLP on subsequent, independent validation sets. After validating the NLP algorithm, a long-term monthly trend report was developed.
Figure Inappropriate GAS RADTs Determined by NLP, June 2018–May 2020. Conclusion An NLP algorithm can accurately identify inappropriate RADTs when compared to a gold standard. Manual chart review requires dozens of hours to complete; NLP, in contrast, requires only a couple of minutes and offers the potential to calculate valid metrics that can easily be scaled up to monitor comprehensive, long-term trends. Disclosures Brian R. Lee, MPH, PhD, Merck (Grant/Research Support)
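The study's inappropriateness criterion is simple enough to state directly in code once symptoms have been extracted from the chart. The hard part, symptom extraction from free text, is what the NLP algorithm does and is omitted here:

```python
# Viral symptoms listed in the study's definition.
VIRAL_SYMPTOMS = {"conjunctivitis", "rhinorrhea", "cough", "diarrhea",
                  "hoarse voice", "viral exanthema"}

def radt_inappropriate(symptoms):
    """Apply the study's criterion: a RADT is inappropriate if sore
    throat is absent OR two or more viral symptoms are documented."""
    viral_count = len(symptoms & VIRAL_SYMPTOMS)
    return "sore throat" not in symptoms or viral_count >= 2
```

For example, a patient with sore throat alone yields an appropriate test, while a patient with cough and rhinorrhea is flagged even when sore throat is present.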

