Development of an Automated Solution for Large Scale Health Service Feedback: Using NLP and Topic Modelling techniques (Preprint)

2021 ◽  
Author(s):  
George Alexander ◽  
Mohammed Bahja ◽  
Gibran F Butt

Obtaining patient feedback is an essential mechanism for healthcare service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experience. The Department of Health and Social Care in England, via NHS Digital, operates a patient feedback web service through which patients can leave feedback about their experiences in structured and free-text report forms. Compared with structured questionnaires, free-text feedback may be less biased by the feedback collector and thus more representative; however, it is harder to analyse in large quantities, and it is challenging to derive meaningful quantitative outcomes that represent the feedback of the general public. This study details the development of a text analysis tool that utilises contemporary natural language processing (NLP) and machine learning models to analyse free-text clinical service reviews and build a robust classification model, together with an interactive visualisation web application (a Vue.js front end with Node.js, a C# serverless API, and SQL Server, all hosted on the Microsoft Azure platform) that facilitates exploration of the data and is designed for use by all stakeholders. Of the 11,103 possible clinical services that could be reviewed across England, 2030 different services had received a combined total of 51,845 reviews between 1/10/2017 and 31/10/2019; these were included for analysis. Dominant topics were identified for the entire corpus, and then negative- and positive-sentiment topics in turn. Reviews containing high- and low-sentiment topics occurred more frequently than those containing less polarised topics. Time series analysis can identify trends in topic and sentiment occurrence frequency across the study period. This tool automates the analysis of large volumes of free text specific to medical services, and the web application summarises the results and presents them in an accessible and interactive format. Such a tool has the potential to considerably reduce administrative burden and increase user uptake.
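
The abstract gives no implementation detail, but the core pipeline it describes (topic modelling over free-text reviews, with dominant topics surfaced per document) can be sketched briefly. Below is a minimal illustration using scikit-learn's LDA; it is not the authors' model, and the sample reviews and parameter choices are invented.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for the 51,845 free-text service reviews.
reviews = [
    "Staff were friendly and the appointment was on time.",
    "Waited three hours and nobody explained the delay.",
    "Excellent care from the nurses on the ward.",
]

# Bag-of-words representation, then LDA to surface dominant topics.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # per-document topic weights

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")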

2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require substantial personnel resources that are not available in most healthcare organisations. This review aimed to systematically examine the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 that examined NLP to analyse free-text patient feedback. Owing to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was most frequently used (n=9), followed by unsupervised (n=6) and semi-supervised (n=3) approaches. Comments extracted from social media were analysed using an unsupervised approach, whereas free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machine and Naïve Bayes being the best-performing ML classifiers. Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have a role depending on the data source. With the advancement of data analysis tools, these techniques may help healthcare organisations generate insight from their volumes of unstructured free-text data.
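
The supervised setting the review describes most often (text features feeding a Naïve Bayes or support vector machine classifier, scored with precision, recall and F-measure) fits in a short sketch. The comments and labels below are toy inputs, not data from any reviewed study.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_fscore_support

comments = ["very helpful staff", "long wait, poor communication",
            "clean ward and kind nurses", "appointment cancelled twice"]
labels = [1, 0, 1, 0]  # 1 = positive experience, 0 = negative

# Compare the two classifiers the review highlights, on tf-idf features.
for clf in (MultinomialNB(), LinearSVC()):
    pipe = make_pipeline(TfidfVectorizer(), clf)
    preds = cross_val_predict(pipe, comments, labels, cv=2)
    p, r, f, _ = precision_recall_fscore_support(labels, preds, average="binary")
    print(type(clf).__name__, f"P={p:.2f} R={r:.2f} F1={f:.2f}")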


2018 ◽  
Vol 45 (3) ◽  
pp. 364-386
Author(s):  
Ceri Binding ◽  
Douglas Tudhope ◽  
Andreas Vlachidis

This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via natural language processing (NLP) across different languages. The investigation follows a broad theme relating to wooden objects and their dating via dendrochronological techniques, covering types of wooden material, samples taken, and wooden objects including shipwrecks. The outcomes are an integrated RDF dataset coupled with an associated interactive query builder application serving as a research demonstrator. The semantic framework combines the CIDOC Conceptual Reference Model (CRM) with the Getty Art and Architecture Thesaurus (AAT). The NLP, data cleansing and integration methods are described in detail, together with illustrative scenarios from the Demonstrator web application. Reflections and recommendations from the study are discussed. The Demonstrator is a novel SPARQL web application with CRM/AAT-based data integration. Its functionality includes the combination of free-text and semantic search, browsing on semantic links, and thesaurus query expansion over hierarchical and associative relationships. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and specialised associative relationships. Following a ‘mapping pattern’ approach (via the STELETO tool) ensured the validity and consistency of all RDF output. The user is shielded from the complexity of the underlying semantic framework by a query builder user interface. The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages and of semantically cross-searching the integrated information. The semantic linking of textual reports and datasets opens new possibilities for integrative research across diverse resources.
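
The flavour of the Demonstrator's expanded queries can be conveyed with a toy RDF graph: find samples whose wood type falls anywhere under a broader concept, a miniature analogue of expanding over AAT hierarchies. The URIs and hierarchy below are invented placeholders, not the project's vocabulary, and the sketch uses rdflib locally rather than the project's SPARQL endpoint.

from rdflib import Graph, Namespace, RDF, RDFS, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.beech, RDFS.subClassOf, EX.wood))  # stand-in AAT hierarchy
g.add((EX.oak, RDFS.subClassOf, EX.wood))
g.add((EX.sample1, RDF.type, EX.beech))
g.add((EX.sample1, RDFS.label, Literal("keel sample, beech")))

# Hierarchical query expansion: match samples typed as ex:wood or any
# narrower concept, via a SPARQL 1.1 property path.
q = """
SELECT ?s ?label WHERE {
  ?s a ?t .
  ?t rdfs:subClassOf* ex:wood .
  ?s rdfs:label ?label .
}"""
for row in g.query(q, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.s, row.label)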


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

This study presents an intelligent information retrieval system that effectively extracts useful information from breast cancer datasets and utilises that information to build a classification model. The proposed model aims to reduce the missed cancer rate by providing comprehensive decision support to the radiologist. The model is built on two datasets: the Wisconsin Breast Cancer Dataset (WBCD) and 365 free-text mammography reports from a hospital. Effective pre-processing techniques were applied to prepare the data for learning: missing values were filled using regression, a Natural Language Processing (NLP) parser was developed to handle the free-text mammography reports, and the dataset was balanced with the Synthetic Minority Oversampling Technique (SMOTE). The most relevant features were selected using a filter method and tf-idf scores. The k-NN and SGD classifiers were then optimised by selecting the optimum value of k for k-NN and tuning the SGD hyperparameters with a grid search.
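
A hedged sketch of the tuning recipe the abstract lists: balance the data with SMOTE, then grid-search k for k-NN and the SGD hyperparameters. scikit-learn's bundled breast cancer data stands in for the WBCD features, and the parameter grids are assumptions.

from imblearn.over_sampling import SMOTE
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for WBCD features
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)  # balance classes

# Grid search over k for k-NN and over assumed SGD hyperparameters.
knn = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}, cv=5)
sgd = GridSearchCV(SGDClassifier(max_iter=2000, random_state=0),
                   {"alpha": [1e-4, 1e-3], "loss": ["hinge", "modified_huber"]},
                   cv=5)

for name, search in (("k-NN", knn), ("SGD", sgd)):
    search.fit(X_bal, y_bal)
    print(name, search.best_params_, f"CV accuracy={search.best_score_:.3f}")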


2018 ◽  
Vol 7 (3.33) ◽  
pp. 183
Author(s):  
Sung-Ho Cho ◽  
Sung-Uk Choi

This paper proposes a method to optimise the performance of web application firewalls according to their positions in large-scale networks. Since ports for web services are always open and thus vulnerable, the introduction of web application firewalls is essential. Methods of configuring web application firewalls in existing networks are largely divided into two types. In the in-line type, a web application firewall is located between the network and the web server to be protected; this type is mostly used in small-scale single networks and is vulnerable to physical failure of the web application firewall. The port redirection type, configured with the help of peripheral network equipment such as routers or L4 switches, can maintain web services even when the web application firewall physically fails, and is suitable for large-scale networks where several web services are mixed. In this study, port redirection type web application firewalls were configured in large-scale networks, and a problem arose in that router performance was degraded by the IP-based VLAN when web security policies were set for the ports on the routers. To solve this problem, only those agencies and enterprises in the networks that provide web services were separated, and in-line type web application firewalls were configured for them. Internet service providers (ISPs) or central line-concentration agencies can apply this approach to configure web security systems for small unit enterprises or small-scale agencies at low cost.
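
For intuition only, the forwarding behaviour behind the port redirection type can be mimicked in user space: traffic arriving on the web port is relayed through the WAF host before reaching the server. Real deployments do this in routers or L4 switches, not in Python, and the addresses below are invented.

import socket
import threading

LISTEN = ("0.0.0.0", 8080)   # where clients connect (stand-in for port 80)
WAF = ("192.0.2.10", 8081)   # WAF host that inspects and relays onward

def pump(src, dst):
    # Copy bytes one way until the sending side closes.
    try:
        while (data := src.recv(4096)):
            dst.sendall(data)
    finally:
        dst.close()

server = socket.create_server(LISTEN)
while True:
    client, _ = server.accept()
    upstream = socket.create_connection(WAF)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pump, args=(upstream, client), daemon=True).start()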


Author(s):  
Paula M Mabee ◽  
Wasila M Dahdul ◽  
James P Balhoff ◽  
Hilmar Lapp ◽  
Prashanti Manda ◽  
...  

The study of how the observable features of organisms, i.e., their phenotypes, result from the complex interplay between genetics, development, and the environment is central to much research in biology. The varied language used in the description of phenotypes, however, impedes the large-scale, interdisciplinary analysis of phenotypes by computational methods. The Phenoscape project (www.phenoscape.org) has developed semantic annotation tools and a gene–phenotype knowledgebase, the Phenoscape KB, that uses machine reasoning to connect evolutionary phenotypes from the comparative literature to mutant phenotypes from model organisms. The semantically annotated data enable the linking of novel species phenotypes with candidate genes that may underlie them. Semantic annotation of evolutionary phenotypes further enables previously difficult or novel analyses of comparative anatomy and evolution. These include generating large, synthetic character matrices of presence/absence phenotypes based on inference, and searching for taxa and genes with similar variation profiles using semantic similarity. Phenoscape is further extending these tools to enable users to automatically generate synthetic supermatrices for diverse character types, and to use the domain knowledge encoded in ontologies for evolutionary trait analysis. Curating the annotated phenotypes necessary for this research requires significant human curator effort, although semi-automated natural language processing tools promise to expedite the curation of free text. As semantic tools and methods are developed for the biodiversity sciences, new insights from the increasingly connected stores of interoperable phenotypic and genetic data are anticipated.
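
One of the analyses mentioned, searching with semantic similarity, can be reduced to a toy form: score two annotations by the overlap of their ontology ancestors. The terms and hierarchy below are invented stand-ins for the anatomy ontologies Phenoscape actually uses, and Jaccard overlap is just one of several similarity measures in use.

# Toy ontology: child term -> parent term.
parents = {
    "basihyal bone": "hyoid bar",
    "hyoid bar": "hyoid arch skeleton",
    "hyoid arch skeleton": "skeletal element",
}

def ancestors(term):
    found = set()
    while term in parents:
        term = parents[term]
        found.add(term)
    return found

def semantic_similarity(a, b):
    # Jaccard overlap of the two terms' ancestor sets (self included).
    set_a, set_b = ancestors(a) | {a}, ancestors(b) | {b}
    return len(set_a & set_b) / len(set_a | set_b)

print(semantic_similarity("basihyal bone", "hyoid bar"))  # 0.75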


2020 ◽  
Author(s):  
Shoya Wada ◽  
Toshihiro Takeda ◽  
Shiro Manabe ◽  
Shozo Konishi ◽  
Jun Kamohara ◽  
...  

Background: Pre-training large-scale neural language models on raw text has been shown to make a significant contribution to transfer learning strategies in natural language processing (NLP). With the introduction of transformer-based language models, such as Bidirectional Encoder Representations from Transformers (BERT), the performance of information extraction from free text by NLP has significantly improved in both the general and medical domains; however, for languages with few publicly available medical databases of high quality and large size, it is difficult to train medical BERT models that perform well. Method: We introduce a method to train a BERT model on a small medical corpus in both English and Japanese. Our proposed method consists of two interventions: simultaneous pre-training, which is intended to encourage masked language modeling and next-sentence prediction on the small medical corpus, and amplified vocabulary, which helps the customised vocabulary built by byte-pair encoding better suit the small corpus. Moreover, using whole PubMed abstracts, we developed a high-performance English BERT model via our method, Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT). We then evaluated the performance of our BERT models against publicly available baselines. Results: We confirmed that our Japanese medical BERT outperforms conventional baselines and the other BERT models on a medical document classification task, and that our English BERT, pre-trained on both the general and medical domain corpora, performs sufficiently well for practical use on the Biomedical Language Understanding Evaluation (BLUE) benchmark. Moreover, the total BLUE benchmark score of ouBioBERT is 1.1 points above that of BioBERT and 0.3 points above that of the ablation model trained without our proposed method. Conclusions: Our proposed method makes it feasible to construct practical medical BERT models in both Japanese and English, and it has the potential to produce higher-performing models for biomedical shared tasks.
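
The "amplified vocabulary" intervention can be pictured as training the subword vocabulary on the general and small medical corpora together, so that medical terms are not over-fragmented into meaningless pieces. The sketch below uses the Hugging Face tokenizers library with WordPiece (the byte-pair-encoding relative that BERT tokenizers use); the file names are placeholders and the authors' exact procedure may differ.

from tokenizers import BertWordPieceTokenizer

# Hypothetical corpus files: both contribute subword merges, so rare
# medical terms still earn dedicated vocabulary entries.
files = ["general_corpus.txt", "small_medical_corpus.txt"]

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(files=files, vocab_size=32000, min_frequency=2)
tokenizer.save("med_bert_tokenizer.json")

# Simultaneous pre-training would then interleave batches from both
# corpora during masked-language-model and next-sentence-prediction
# training, rather than pre-training on the general corpus first.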


2019 ◽  
Vol 73 (9) ◽  
pp. 724-729
Author(s):  
Hugo Loureiro ◽  
Michael Prem ◽  
Georg Wuitschik

ChemPager is a freely available data analysis tool for analyzing, comparing and improving synthetic routes. Here, we present an expansion of this application that makes use of the functionality of the PMI Predictor, which the ACS Green Chemistry Institute Pharmaceutical Roundtable has recently published as a web application. This addition enables ChemPager to predict the cumulative process mass intensity of chemical routes, irrespective of their development status, by comparison with a set of reactions executed on large scale. The prediction of this core green chemistry metric aims to improve existing routes and help the decision-making process among route alternatives without the need for experimental data.
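
The metric being predicted has a simple standard definition: cumulative process mass intensity is the total mass of all inputs (reagents, solvents, aqueous washes) across every step of a route per unit mass of isolated product. The PMI Predictor estimates this without experimental data; the toy calculation below, with invented masses, only illustrates the definition itself.

def cumulative_pmi(step_input_masses_kg, product_mass_kg):
    """Cumulative PMI: total mass of all inputs across a multi-step
    route divided by the mass of the final product."""
    total_inputs = sum(sum(step) for step in step_input_masses_kg)
    return total_inputs / product_mass_kg

steps = [
    [12.0, 40.0, 5.5],  # step 1 inputs in kg: reagents, solvents, washes
    [8.0, 25.0],        # step 2 inputs in kg
]
print(f"PMI = {cumulative_pmi(steps, 1.0):.1f} kg inputs per kg product")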


Author(s):  
L. W. Amarasinghe ◽  
R. D. Nawarathna

Aims: Database creation is the most critical component of the design and implementation of any software application. Generally, the process of creating the database from the requirement specification of a software application is believed to be extremely hard. This study presents a method to automatically generate database scripts from a given scenario description in the requirement specification. Study Design: The method is developed based on a set of natural language processing (NLP) techniques and a few algorithms. Standard database scenario descriptions presented in popular textbooks on database design are used for the validation of the method. Place and Duration of Study: Department of Statistics and Computer Science, Faculty of Science, University of Peradeniya, Sri Lanka, between December 2019 and December 2020. Methodology: The description of the problem scenario is processed using NLP operations such as tokenization, complex word handling, basic group handling, complex phrase handling, structure merging, and template construction to extract the information required for the entity-relationship model. New algorithms are proposed to automatically convert the entity-relationship model to the logical schema and finally to the database script. The system can generate scripts for relational databases (RDB), object-relational databases (ORDB) and Not Only SQL (NoSQL) databases. The proposed method is integrated into a web application where users can type the scenario in natural, free text. The user selects the type of database (i.e., one of RDB, ORDB, NoSQL) used in their system, and the application generates the corresponding scripts. Results: The proposed method was evaluated using 10 scenario descriptions drawn from 10 different domains, such as company, university, and airport, for all three types of databases. The method achieved impressive accuracies of 82.5%, 84.0% and 83.5% for RDB, ORDB and NoSQL scripts, respectively. Conclusion: This study focuses mainly on the automatic generation of database scripts from scenario descriptions in the requirement specification of a software system. Overall, the developed method helps to speed up the database development process. Further, the developed web application provides a learning environment for people who are new to database technology.
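
As an illustration of the final conversion step, the sketch below turns a small entity-relationship structure into a relational DDL script. The intermediate representation here is an assumption made for illustration, not the authors' actual data structure or algorithm.

# Toy extracted ER structure: table name -> column name -> SQL type.
entities = {
    "department": {"id": "INT PRIMARY KEY", "name": "VARCHAR(100)"},
    "employee": {"id": "INT PRIMARY KEY", "name": "VARCHAR(100)",
                 "dept_id": "INT REFERENCES department(id)"},
}

def to_ddl(entities):
    statements = []
    for table, columns in entities.items():
        body = ",\n  ".join(f"{col} {sqltype}" for col, sqltype in columns.items())
        statements.append(f"CREATE TABLE {table} (\n  {body}\n);")
    return "\n\n".join(statements)

print(to_ddl(entities))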


Author(s):  
Beata Fonferko-Shadrach ◽  
Arron Lacey ◽  
Ashley Akbari ◽  
Simon Thompson ◽  
David Ford ◽  
...  

Introduction: Electronic health records (EHRs) are a powerful resource for enabling large-scale healthcare research. However, EHRs often lack the detailed disease-specific information that is collected in free text within clinical settings. This challenge can be addressed by using Natural Language Processing (NLP) to derive and extract detailed clinical information from free text. Objectives and Approach: Using a training sample of 40 letters, we used the General Architecture for Text Engineering (GATE) framework to build custom rule sets for nine categories of epilepsy information as well as clinic date and date of birth. We used a validation set of 200 clinic letters to compare the results of our algorithm with a separate manual review by a clinician, evaluating a "per item" and a "per letter" approach for each category. Results: The "per item" approach identified 1,939 items of information, with overall precision, recall and F1-score of 92.7%, 77.7% and 85.6%. Precision and recall for the epilepsy-specific categories were: diagnosis (85.3%, 92.4%), type (93.7%, 83.2%), focal seizure (99.0%, 68.3%), generalised seizure (92.5%, 57.0%), seizure frequency (92.0%, 52.3%), medication (96.1%, 94.0%), CT (66.7%, 47.1%), MRI (96.6%, 51.4%) and EEG (95.8%, 40.6%). By combining all items per category for each letter, we achieved higher precision, recall and F1-scores of 94.6%, 84.2% and 89.0% across all categories. Conclusion/Implications: Our results demonstrate that NLP techniques can be used to accurately extract rich phenotypic detail from clinic letters that is often missing from routinely collected data. Capturing these new data types provides a platform for conducting novel precision-neurology research, in addition to potential applicability to other disease areas.
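
The difference between the two evaluation granularities comes down to how matches are counted: per item, every extracted mention is scored against the gold standard; per letter, all items of a category within one letter collapse into a single present/absent decision, so repeated mentions cannot be double-counted. A small sketch with invented counts:

def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts for one category, scored at the two granularities.
print("per item   P=%.3f R=%.3f F1=%.3f" % prf(tp=90, fp=8, fn=20))
print("per letter P=%.3f R=%.3f F1=%.3f" % prf(tp=48, fp=3, fn=7))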


2021 ◽  
Vol 3 ◽  
Author(s):  
Aurelie Mascio ◽  
Robert Stewart ◽  
Riley Botelle ◽  
Marcus Williams ◽  
Luwaiza Mirza ◽  
...  

Background: Cognitive impairments are a neglected aspect of schizophrenia despite being a major determinant of poor functional outcome. They are usually measured using various rating scales; however, these necessitate trained practitioners and are rarely applied routinely in clinical settings. Recent advances in natural language processing techniques allow us to extract such information from unstructured portions of text at large scale and in a cost-effective manner. We aimed to identify cognitive problems in the clinical records of a large sample of patients with schizophrenia, and to assess their association with clinical outcomes. Methods: We developed a natural language processing based application that identifies cognitive dysfunction from the free text of medical records, and assessed its performance against a rating scale widely used in the United Kingdom, the cognitive component of the Health of the Nation Outcome Scales (HoNOS). Furthermore, we analyzed cognitive trajectories over the course of patient treatment, and evaluated their relationship with various socio-demographic factors and clinical outcomes. Results: We found a high prevalence of cognitive impairment in patients with schizophrenia, and a strong correlation with several socio-demographic factors (gender, education, ethnicity, marital status, and employment) as well as adverse clinical outcomes. Results obtained from the free text were broadly in line with those obtained using the HoNOS subscale, and shed light on additional associations, notably related to attention and social impairments for patients with higher education. Conclusions: Our findings demonstrate that cognitive problems are common in patients with schizophrenia, can be reliably extracted from clinical records using natural language processing, and are associated with adverse clinical outcomes. Harvesting the free text from medical records provides larger coverage than neurocognitive batteries or rating scales, and gives access to additional socio-demographic and clinical variables. Text mining tools can therefore facilitate large-scale patient screening and early symptom detection, and ultimately help inform clinical decisions.
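
For intuition, a deliberately simple rule-based detector for cognitive-problem mentions, with a crude negation window, is sketched below; the authors' application is considerably richer than this, and the patterns here are invented.

import re

COGNITIVE = r"(memory (loss|problems?)|poor concentration|disorientat\w+)"
NEGATION = r"\b(no|denies|without)\b"

def flags_cognitive_problem(note: str) -> bool:
    # Flag a note if any cognitive pattern appears without a negation
    # cue in the 40 characters preceding it.
    for match in re.finditer(COGNITIVE, note, re.IGNORECASE):
        window = note[max(0, match.start() - 40):match.start()]
        if not re.search(NEGATION, window, re.IGNORECASE):
            return True
    return False

print(flags_cognitive_problem("Reports memory loss and poor concentration."))  # True
print(flags_cognitive_problem("Denies memory problems at present."))           # False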

