Developing a Natural Language Processing tool to identify perinatal self-harm in electronic healthcare records

PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0253809
Author(s):  
Karyn Ayre ◽  
André Bittar ◽  
Joyce Kam ◽  
Somain Verma ◽  
Louise M. Howard ◽  
...  

Background Self-harm occurring within pregnancy and the postnatal year (“perinatal self-harm”) is a clinically important yet under-researched topic. Current research likely under-estimates prevalence due to methodological limitations. Electronic healthcare records (EHRs) provide a source of clinically rich data on perinatal self-harm.
Aims (1) To create a Natural Language Processing (NLP) tool that can, with acceptable precision and recall, identify mentions of acts of perinatal self-harm within EHRs. (2) To use this tool to identify service-users who have self-harmed perinatally, based on their EHRs.
Methods We used the Clinical Record Interactive Search system to extract de-identified EHRs of secondary mental healthcare service-users at South London and Maudsley NHS Foundation Trust. We developed a tool that applied several layers of linguistic processing based on the spaCy NLP library for Python. We evaluated mention-level performance in the following domains: span, status, temporality and polarity. Evaluation was done against a manually coded reference standard. Mention-level performance was reported as precision, recall, F-score and Cohen’s kappa for each domain. We also assessed performance at ‘service-user’ level and explored whether a heuristic rule improved it. We report per-class statistics for service-user performance, as well as likelihood ratios and post-test probabilities.
Results Mention-level performance: micro-averaged F-score, precision and recall for span, polarity and temporality were all >0.8. Kappa was 0.68 for status, 0.62 for temporality and 0.91 for polarity. Service-user level performance with heuristic: F-score, precision and recall of the minority class 0.69, macro-averaged F-score 0.81, positive LR 9.4 (4.8–19), post-test probability 69.0% (53–82%). Considering the task difficulty, the tool performs well, although temporality was the attribute with the lowest level of annotator agreement.
Conclusions It is feasible to develop an NLP tool that identifies, with acceptable validity, mentions of perinatal self-harm within EHRs, although with limitations regarding temporality. Using a heuristic rule, it can also function at a service-user-level.
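
The mention-level metrics named in this abstract (precision, recall, F-score and Cohen's kappa) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the label values and the example counts are all hypothetical.

```python
from collections import Counter


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts (hypothetical helper)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of labels,
    e.g. tool output versus a manually coded reference standard."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up counts and labels, for illustration only.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
kappa = cohens_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"])
```

Micro-averaging pools the counts across classes before applying the same formulas, whereas macro-averaging computes the score per class and then takes the mean.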

BJPsych Open ◽  
2021 ◽  
Vol 7 (S1) ◽  
pp. S4-S5
Author(s):  
Karyn Ayre ◽  
Andre Bittar ◽  
Rina Dutta ◽  
Somain Verma ◽  
Joyce Kam

Aims 1. To generate a Natural Language Processing (NLP) application that can identify mentions of perinatal self-harm among electronic healthcare records (EHRs). 2. To use this application to estimate the prevalence of perinatal self-harm within a data-linkage cohort of women accessing secondary mental healthcare during the perinatal period.
Method Data source: the Clinical Record Interactive Search (CRIS) system. This is a database of de-identified EHRs of secondary mental healthcare service-users at South London and Maudsley NHS Foundation Trust (SLaM). CRIS has pre-existing ethical approval via the Oxfordshire Research Ethics Committee C (ref 18/SC/0372) and this project was approved by the CRIS Oversight Committee (16-069). After developing a list of synonyms for self-harm and piloting coding rules, a gold standard dataset of EHRs was manually coded using Extensible Human Oracle Suite of Tools (eHOST) software. An NLP application to detect perinatal self-harm was then developed using several layers of linguistic processing based on the spaCy NLP library for Python. Mention-level performance was evaluated for each attribute of mentions the application was designed to identify (span, status, temporality and polarity), by comparing application output against the gold standard dataset. Performance was described as precision, recall, F-score and Cohen's kappa. Most service-users had more than one EHR in their period of perinatal service use. Performance was therefore also measured at “service-user level”, with additional performance metrics of likelihood ratios and post-test probabilities. Linkage with the Hospital Episode Statistics database allowed creation of a cohort of women who accessed SLaM during the perinatal period. By deploying the application on the EHRs of the women in the cohort, we were able to estimate the prevalence of perinatal self-harm.
Result Mention-level performance: micro-averaged F-score, precision and recall for span, polarity and temporality all >0.8. Kappa for status 0.68, temporality 0.62, polarity 0.91. Service-user level performance: F-score, precision, recall all 0.69, overall F-score 0.81, positive likelihood ratio 9.4 (4.8–19), post-test probability 68.9% (95% CI 53–82). Cohort prevalence of self-harm in pregnancy was 15.3% (95% CI 14.3–16.3); self-harm in the postnatal year was 19.7% (95% CI 18.6–20.8). Only a very small proportion of women self-harmed in both pregnancy and the postnatal year (3.9%, 95% CI 3.3–4.4).
Conclusion NLP can be used to identify perinatal self-harm within EHRs. The hardest attribute to classify was temporality. This is in line with the wider literature indicating temporality as a notoriously difficult problem in NLP. As a result, the application probably over-estimates prevalence, to a degree. However, overall performance, given the difficulty of the task, is good. Bearing in mind the limitations, our findings suggest that self-harm is likely to be relatively common in women accessing secondary mental healthcare during the perinatal period.
Funding: KA is funded by a National Institute for Health Research Doctoral Research Fellowship (NIHR-DRF-2016-09-042). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. RD is funded by a Clinician Scientist Fellowship (research project e-HOST-IT) from the Health Foundation in partnership with the Academy of Medical Sciences, which also partly funds AB. AB's work was also part supported by Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities, as well as the Maudsley Charity.
Acknowledgements: Professor Louise M Howard, who originally suggested using NLP to identify perinatal self-harm in EHRs. Professor Howard is the primary supervisor of KA's Fellowship.
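
The service-user-level metrics reported here, the positive likelihood ratio and the post-test probability, follow standard diagnostic-test arithmetic, sketched below. The function names and the input values are hypothetical: the abstract does not report the sensitivity, specificity or pre-test probability behind its figures, so the example does not attempt to reproduce them.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must be in [0, 1) for a finite LR+")
    return sensitivity / (1.0 - specificity)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 <= pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be in [0, 1)")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Made-up inputs, for illustration only.
lr_plus = positive_likelihood_ratio(sensitivity=0.9, specificity=0.8)
posterior = post_test_probability(pre_test_probability=0.2, likelihood_ratio=lr_plus)
```

A larger LR+ moves the post-test probability further above the pre-test probability; an LR+ of 1 leaves it unchanged.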


Author(s):  
Nibedita Roy ◽  
Apurbalal Senapati

Machine Translation (MT) is the process of automatically converting text from one natural language into another while preserving the meaning of the input in the output. It is one of the classical problems in the Natural Language Processing (NLP) domain and has wide application in daily life. Although MT research for English and some other languages is at a relatively advanced stage, for most languages it remains far from human-level translation performance. From the computational point of view, MT requires a lot of preprocessing as well as basic NLP tools and resources. This study gives an overview of the available basic NLP resources in the context of Assamese-English machine translation.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper focuses on evaluating the application of well-established machine translation methods to one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
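
For readers unfamiliar with the BLEU score mentioned above, the sketch below shows a simplified sentence-level variant: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It is an illustration only, not the evaluation setup used in the paper, which the abstract does not specify. It assumes a single reference, uniform weights and no smoothing, and the function names and example sentences are hypothetical. BLEU is normally reported at corpus level.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU with uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Made-up example sentences, for illustration only.
reference = "the cat sat on the mat".split()
score_exact = sentence_bleu(reference, reference)
score_short = sentence_bleu("the cat sat on".split(), reference)
```

In the second example every n-gram of the candidate appears in the reference, so the score is reduced only by the brevity penalty for the shorter output.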


Diabetes ◽  
2019 ◽  
Vol 68 (Supplement 1) ◽  
pp. 1243-P
Author(s):  
JIANMIN WU ◽  
FRITHA J. MORRISON ◽  
ZHENXIANG ZHAO ◽  
XUANYAO HE ◽  
MARIA SHUBINA ◽  
...  

Author(s):  
Pamela Rogalski ◽  
Eric Mikulin ◽  
Deborah Tihanyi

In 2018, we overheard many CEEA-ACEG members stating that they have "found their people"; this led us to wonder what makes this evolving community unique. Using cultural historical activity theory to view the proceedings of CEEA-ACEG 2004-2018 in comparison with the geographically and intellectually adjacent ASEE, we used both machine-driven (Natural Language Processing, NLP) and human-driven (literature review of the proceedings) methods. Here, we hoped to build on surveys, most recently by Nelson and Brennan (2018), to understand, beyond what members say about themselves, what makes the CEEA-ACEG community distinct, where it has come from, and where it is going. Engaging in the two methods of data collection quickly diverted our focus from an analysis of the data themselves to the characteristics of the data in terms of cultural historical activity theory. Our preliminary findings point to some unique characteristics of machine- and human-driven results, with the former, as might be expected, focusing on the micro-level (words and language patterns) and the latter on the macro-level (ideas and concepts). NLP generated data within the realms of "community" and "division of labour" while the review of proceedings centred on "subject" and "object"; both found "instruments," although NLP did so with greater granularity. With this new understanding of the relative strengths of each method, we have a revised framework for addressing our original question.

