BACKGROUND
Information related to patient medication is crucial for health care. However, up to 80% of this information resides solely in unstructured text, and extracting it manually can be difficult and time-consuming. Many studies have shown the value of natural language processing for this task, but only a few have addressed French corpora.
OBJECTIVE
We aimed to develop a system that extracts medication-related information from French clinical text.
METHODS
We developed a hybrid system combining an expert rule-based system (RBS), contextual word embeddings (ELMo) trained on clinical notes and a deep recurrent neural network (BiLSTM-CRF). The task consists of extracting drug mentions and their related information (e.g., dosage, frequency, duration, route, condition). To train and evaluate the model, we manually annotated 320 clinical notes extracted from a French clinical data warehouse. We compared the performance of our approach with that of standard approaches: rule-based only, machine learning only, and models using classic (non-contextual) word embeddings. We evaluated the models using token-level recall, precision and F-measure.
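Token-level scoring can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes each token carries a single category label (BIO prefixes, if any, already stripped), that `O` marks tokens outside any entity, and that the overall score is micro-averaged over categories, none of which the abstract specifies. The function name `token_prf` and the label names are hypothetical.

```python
from collections import Counter


def token_prf(gold, pred, outside="O"):
    """Token-level precision, recall and F1 per category, plus a micro average.

    Hypothetical helper. `gold` and `pred` are equal-length lists of
    per-token category labels; `outside` marks tokens in no entity.
    Returns {label: (precision, recall, f1)} and a "micro" entry.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp[p] += 1          # correct category on this token
        else:
            if p != outside:
                fp[p] += 1      # predicted a category the gold does not have
            if g != outside:
                fn[g] += 1      # missed (or mislabelled) a gold token

    def prf(t, f_p, f_n):
        prec = t / (t + f_p) if t + f_p else 0.0
        rec = t / (t + f_n) if t + f_n else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return prec, rec, f1

    labels = sorted((set(gold) | set(pred)) - {outside})
    scores = {lab: prf(tp[lab], fp[lab], fn[lab]) for lab in labels}
    # Micro average: pool counts over all categories before computing P/R/F1.
    scores["micro"] = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return scores


# Hypothetical annotation of "Doliprane 1000 mg 3 fois par jour".
tokens = ["Doliprane", "1000", "mg", "3", "fois", "par", "jour"]
gold = ["Drug", "Dosage", "Dosage", "Frequency", "Frequency", "Frequency", "Frequency"]
pred = ["Drug", "Dosage", "O", "Frequency", "Frequency", "O", "Frequency"]
scores = token_prf(gold, pred)
```

Here the two missed tokens lower recall but not precision: the micro-averaged F1 is 10/12 ≈ 0.833. A macro average (mean of per-category F1) would weight rare categories such as drug class or condition more heavily, which is why the averaging choice matters when reading an "overall" figure.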
RESULTS
Models combining RBS, ELMo and BiLSTM achieved the best results, with an overall F-measure of 89.9%. Per-category F-measures were 95.3% for medication names, 64.4% for drug class mentions, 95.3% for dosage, 92.2% for frequency, 78.8% for duration and 62.2% for condition of intake.
CONCLUSIONS
Combining expert rules, deep contextualized embeddings (ELMo) and deep neural networks improves medication information extraction. Our results reveal a synergy between expert knowledge and latent knowledge.