Development of an Automated Solution for Large Scale Health Service Feedback: Using NLP and Topic Modelling techniques (Preprint)

2021 ◽  
Author(s):  
George Alexander ◽  
Mohammed Bahja ◽  
Gibran F Butt

Obtaining patient feedback is an essential mechanism for healthcare service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experience. The Department of Health and Social Care in England, via NHS Digital, operates a patient feedback web service through which patients can leave feedback about their experiences in structured and free-text report forms. Compared with structured questionnaires, free-text feedback may be less biased by the feedback collector and thus more representative; however, it is harder to analyse in large quantities, and it is challenging to derive meaningful quantitative outcomes that represent the feedback of the general public. This study details the development of a text analysis tool that utilises contemporary natural language processing (NLP) and machine learning models to analyse free-text clinical service reviews and build a robust classification model, together with an interactive visualisation web application (a Vue.js front end with Node.js, a C# serverless API, and SQL Server, all hosted on the Microsoft Azure platform) that facilitates exploration of the data and is designed for use by all stakeholders. Of the 11,103 possible clinical services that could be reviewed across England, 2030 different services had received a combined total of 51,845 reviews between 1/10/2017 and 31/10/2019; these were included for analysis. Dominant topics were identified for the entire corpus, and then negative- and positive-sentiment topics in turn. Reviews containing high- and low-sentiment topics occurred more frequently than those containing less polarised topics. Time series analysis can identify trends in topic and sentiment occurrence frequency across the study period. This tool automates the analysis of large volumes of free text specific to medical services, and the web application summarises the results and presents them in an accessible and interactive format. Such a tool has the potential to considerably reduce administrative burden and increase user uptake.
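
The abstract gives no implementation detail, but the core pipeline it describes (topic modelling over free-text reviews, with dominant topics surfaced per document) can be sketched briefly. Below is a minimal illustration using scikit-learn's LDA; it is not the authors' model, and the sample reviews and parameter choices are invented.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-ins for the 51,845 free-text service reviews.
reviews = [
    "Staff were friendly and the appointment was on time.",
    "Waited three hours and nobody explained the delay.",
    "Excellent care from the nurses on the ward.",
]

# Bag-of-words representation, then LDA to surface dominant topics.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # per-document topic weights

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")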

2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require substantial personnel resources that are not available in most healthcare organisations. This review aimed to systematically examine the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 that examined NLP to analyse free-text patient feedback. Owing to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was most frequently used (n=9), followed by unsupervised (n=6) and semi-supervised (n=3) approaches. Comments extracted from social media were analysed using an unsupervised approach, whereas free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machine and Naïve Bayes being the best-performing ML classifiers. Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have a role depending on the data source. With the advancement of data analysis tools, these techniques may help healthcare organisations generate insight from their volumes of unstructured free-text data.
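
The supervised setting the review describes most often (text features feeding a Naïve Bayes or support vector machine classifier, scored with precision, recall and F-measure) fits in a short sketch. The comments and labels below are toy inputs, not data from any reviewed study.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_fscore_support

comments = ["very helpful staff", "long wait, poor communication",
            "clean ward and kind nurses", "appointment cancelled twice"]
labels = [1, 0, 1, 0]  # 1 = positive experience, 0 = negative

# Compare the two classifiers the review highlights, on tf-idf features.
for clf in (MultinomialNB(), LinearSVC()):
    pipe = make_pipeline(TfidfVectorizer(), clf)
    preds = cross_val_predict(pipe, comments, labels, cv=2)
    p, r, f, _ = precision_recall_fscore_support(labels, preds, average="binary")
    print(type(clf).__name__, f"P={p:.2f} R={r:.2f} F1={f:.2f}")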


2018 ◽  
Vol 45 (3) ◽  
pp. 364-386
Author(s):  
Ceri Binding ◽  
Douglas Tudhope ◽  
Andreas Vlachidis

This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via natural language processing (NLP) across different languages. The investigation follows a broad theme relating to wooden objects and their dating via dendrochronological techniques, covering types of wooden material, samples taken, and wooden objects including shipwrecks. The outcomes are an integrated RDF dataset coupled with an associated interactive query builder application serving as a research demonstrator. The semantic framework combines the CIDOC Conceptual Reference Model (CRM) with the Getty Art and Architecture Thesaurus (AAT). The NLP, data cleansing and integration methods are described in detail, together with illustrative scenarios from the Demonstrator web application. Reflections and recommendations from the study are discussed. The Demonstrator is a novel SPARQL web application with CRM/AAT-based data integration. Its functionality includes the combination of free-text and semantic search, browsing on semantic links, and thesaurus query expansion over hierarchical and associative relationships. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and specialised associative relationships. Following a ‘mapping pattern’ approach (via the STELETO tool) ensured the validity and consistency of all RDF output. The user is shielded from the complexity of the underlying semantic framework by a query builder user interface. The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages and of semantically cross-searching the integrated information. The semantic linking of textual reports and datasets opens new possibilities for integrative research across diverse resources.
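
The flavour of the Demonstrator's expanded queries can be conveyed with a toy RDF graph: find samples whose wood type falls anywhere under a broader concept, a miniature analogue of expanding over AAT hierarchies. The URIs and hierarchy below are invented placeholders, not the project's vocabulary, and the sketch uses rdflib locally rather than the project's SPARQL endpoint.

from rdflib import Graph, Namespace, RDF, RDFS, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.beech, RDFS.subClassOf, EX.wood))  # stand-in AAT hierarchy
g.add((EX.oak, RDFS.subClassOf, EX.wood))
g.add((EX.sample1, RDF.type, EX.beech))
g.add((EX.sample1, RDFS.label, Literal("keel sample, beech")))

# Hierarchical query expansion: match samples typed as ex:wood or any
# narrower concept, via a SPARQL 1.1 property path.
q = """
SELECT ?s ?label WHERE {
  ?s a ?t .
  ?t rdfs:subClassOf* ex:wood .
  ?s rdfs:label ?label .
}"""
for row in g.query(q, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.s, row.label)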


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

This study presents an intelligent information retrieval system that effectively extracts useful information from breast cancer datasets and utilises that information to build a classification model. The proposed model aims to reduce the missed cancer rate by providing comprehensive decision support to the radiologist. The model is built on two datasets: the Wisconsin Breast Cancer Dataset (WBCD) and 365 free-text mammography reports from a hospital. Effective pre-processing techniques were applied to prepare the data for learning: missing values were filled using regression, a Natural Language Processing (NLP) parser was developed to handle the free-text mammography reports, and the dataset was balanced with the Synthetic Minority Oversampling Technique (SMOTE). The most relevant features were selected using a filter method and tf-idf scores. The k-NN and SGD classifiers were then optimised by selecting the optimum value of k for k-NN and tuning the SGD hyperparameters with a grid search.
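
A hedged sketch of the tuning recipe the abstract lists: balance the data with SMOTE, then grid-search k for k-NN and the SGD hyperparameters. scikit-learn's bundled breast cancer data stands in for the WBCD features, and the parameter grids are assumptions.

from imblearn.over_sampling import SMOTE
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for WBCD features
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)  # balance classes

# Grid search over k for k-NN and over assumed SGD hyperparameters.
knn = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}, cv=5)
sgd = GridSearchCV(SGDClassifier(max_iter=2000, random_state=0),
                   {"alpha": [1e-4, 1e-3], "loss": ["hinge", "modified_huber"]},
                   cv=5)

for name, search in (("k-NN", knn), ("SGD", sgd)):
    search.fit(X_bal, y_bal)
    print(name, search.best_params_, f"CV accuracy={search.best_score_:.3f}")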


2018 ◽  
Vol 7 (3.33) ◽  
pp. 183
Author(s):  
Sung-Ho Cho ◽  
Sung-Uk Choi

This paper proposes a method to optimise the performance of web application firewalls according to their positions in large-scale networks. Since ports for web services are always open and thus vulnerable, the introduction of web application firewalls is essential. Methods of configuring web application firewalls in existing networks are largely divided into two types. In the in-line type, a web application firewall is located between the network and the web server to be protected; this type is mostly used in small-scale single networks and is vulnerable to physical failure of the web application firewall. The port redirection type, configured with the help of peripheral network equipment such as routers or L4 switches, can maintain web services even when the web application firewall physically fails, and is suitable for large-scale networks where several web services are mixed. In this study, port redirection type web application firewalls were configured in large-scale networks, and a problem arose in that router performance was degraded by the IP-based VLAN when web security policies were set for the ports on the routers. To solve this problem, only those agencies and enterprises in the networks that provide web services were separated, and in-line type web application firewalls were configured for them. Internet service providers (ISPs) or central line-concentration agencies can apply this approach to configure web security systems for small unit enterprises or small-scale agencies at low cost.
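
For intuition only, the forwarding behaviour behind the port redirection type can be mimicked in user space: traffic arriving on the web port is relayed through the WAF host before reaching the server. Real deployments do this in routers or L4 switches, not in Python, and the addresses below are invented.

import socket
import threading

LISTEN = ("0.0.0.0", 8080)   # where clients connect (stand-in for port 80)
WAF = ("192.0.2.10", 8081)   # WAF host that inspects and relays onward

def pump(src, dst):
    # Copy bytes one way until the sending side closes.
    try:
        while (data := src.recv(4096)):
            dst.sendall(data)
    finally:
        dst.close()

server = socket.create_server(LISTEN)
while True:
    client, _ = server.accept()
    upstream = socket.create_connection(WAF)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pump, args=(upstream, client), daemon=True).start()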


Author(s):  
Paula M Mabee ◽  
Wasila M Dahdul ◽  
James P Balhoff ◽  
Hilmar Lapp ◽  
Prashanti Manda ◽  
...  

The study of how the observable features of organisms, i.e., their phenotypes, result from the complex interplay between genetics, development, and the environment is central to much research in biology. The varied language used in the description of phenotypes, however, impedes the large-scale, interdisciplinary analysis of phenotypes by computational methods. The Phenoscape project (www.phenoscape.org) has developed semantic annotation tools and a gene–phenotype knowledgebase, the Phenoscape KB, that uses machine reasoning to connect evolutionary phenotypes from the comparative literature to mutant phenotypes from model organisms. The semantically annotated data enable the linking of novel species phenotypes with candidate genes that may underlie them. Semantic annotation of evolutionary phenotypes further enables previously difficult or novel analyses of comparative anatomy and evolution. These include generating large, synthetic character matrices of presence/absence phenotypes based on inference, and searching for taxa and genes with similar variation profiles using semantic similarity. Phenoscape is further extending these tools to enable users to automatically generate synthetic supermatrices for diverse character types, and to use the domain knowledge encoded in ontologies for evolutionary trait analysis. Curating the annotated phenotypes necessary for this research requires significant human curator effort, although semi-automated natural language processing tools promise to expedite the curation of free text. As semantic tools and methods are developed for the biodiversity sciences, new insights from the increasingly connected stores of interoperable phenotypic and genetic data are anticipated.
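
One of the analyses mentioned, searching with semantic similarity, can be reduced to a toy form: score two annotations by the overlap of their ontology ancestors. The terms and hierarchy below are invented stand-ins for the anatomy ontologies Phenoscape actually uses, and Jaccard overlap is just one of several similarity measures in use.

# Toy ontology: child term -> parent term.
parents = {
    "basihyal bone": "hyoid bar",
    "hyoid bar": "hyoid arch skeleton",
    "hyoid arch skeleton": "skeletal element",
}

def ancestors(term):
    found = set()
    while term in parents:
        term = parents[term]
        found.add(term)
    return found

def semantic_similarity(a, b):
    # Jaccard overlap of the two terms' ancestor sets (self included).
    set_a, set_b = ancestors(a) | {a}, ancestors(b) | {b}
    return len(set_a & set_b) / len(set_a | set_b)

print(semantic_similarity("basihyal bone", "hyoid bar"))  # 0.75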


2020 ◽  
Author(s):  
Shoya Wada ◽  
Toshihiro Takeda ◽  
Shiro Manabe ◽  
Shozo Konishi ◽  
Jun Kamohara ◽  
...  

Background: Pre-training large-scale neural language models on raw text has been shown to make a significant contribution to transfer learning strategies in natural language processing (NLP). With the introduction of transformer-based language models, such as Bidirectional Encoder Representations from Transformers (BERT), the performance of information extraction from free text by NLP has significantly improved in both the general and medical domains; however, for languages with few publicly available medical databases of high quality and large size, it is difficult to train medical BERT models that perform well. Method: We introduce a method to train a BERT model on a small medical corpus in both English and Japanese. Our proposed method consists of two interventions: simultaneous pre-training, which is intended to encourage masked language modeling and next-sentence prediction on the small medical corpus, and amplified vocabulary, which helps the customised vocabulary built by byte-pair encoding better suit the small corpus. Moreover, using whole PubMed abstracts, we developed a high-performance English BERT model via our method, Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT). We then evaluated the performance of our BERT models against publicly available baselines. Results: We confirmed that our Japanese medical BERT outperforms conventional baselines and the other BERT models on a medical document classification task, and that our English BERT, pre-trained on both the general and medical domain corpora, performs sufficiently well for practical use on the Biomedical Language Understanding Evaluation (BLUE) benchmark. Moreover, the total BLUE benchmark score of ouBioBERT is 1.1 points above that of BioBERT and 0.3 points above that of the ablation model trained without our proposed method. Conclusions: Our proposed method makes it feasible to construct practical medical BERT models in both Japanese and English, and it has the potential to produce higher-performing models for biomedical shared tasks.
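
The "amplified vocabulary" intervention can be pictured as training the subword vocabulary on the general and small medical corpora together, so that medical terms are not over-fragmented into meaningless pieces. The sketch below uses the Hugging Face tokenizers library with WordPiece (the byte-pair-encoding relative that BERT tokenizers use); the file names are placeholders and the authors' exact procedure may differ.

from tokenizers import BertWordPieceTokenizer

# Hypothetical corpus files: both contribute subword merges, so rare
# medical terms still earn dedicated vocabulary entries.
files = ["general_corpus.txt", "small_medical_corpus.txt"]

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(files=files, vocab_size=32000, min_frequency=2)
tokenizer.save("med_bert_tokenizer.json")

# Simultaneous pre-training would then interleave batches from both
# corpora during masked-language-model and next-sentence-prediction
# training, rather than pre-training on the general corpus first.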


2019 ◽  
Vol 73 (9) ◽  
pp. 724-729
Author(s):  
Hugo Loureiro ◽  
Michael Prem ◽  
Georg Wuitschik

ChemPager is a freely available data analysis tool for analyzing, comparing and improving synthetic routes. Here, we present an expansion of this application that makes use of the functionality of the PMI Predictor, which the ACS Green Chemistry Institute Pharmaceutical Roundtable has recently published as a web application. This addition enables ChemPager to predict the cumulative process mass intensity of chemical routes, irrespective of their development status, by comparison with a set of reactions executed on large scale. The prediction of this core green chemistry metric aims to improve existing routes and help the decision-making process among route alternatives without the need for experimental data.
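
The metric being predicted has a simple standard definition: cumulative process mass intensity is the total mass of all inputs (reagents, solvents, aqueous washes) across every step of a route per unit mass of isolated product. The PMI Predictor estimates this without experimental data; the toy calculation below, with invented masses, only illustrates the definition itself.

def cumulative_pmi(step_input_masses_kg, product_mass_kg):
    """Cumulative PMI: total mass of all inputs across a multi-step
    route divided by the mass of the final product."""
    total_inputs = sum(sum(step) for step in step_input_masses_kg)
    return total_inputs / product_mass_kg

steps = [
    [12.0, 40.0, 5.5],  # step 1 inputs in kg: reagents, solvents, washes
    [8.0, 25.0],        # step 2 inputs in kg
]
print(f"PMI = {cumulative_pmi(steps, 1.0):.1f} kg inputs per kg product")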


Author(s):  
L. W. Amarasinghe ◽  
R. D. Nawarathna

Aims: Database creation is the most critical component of the design and implementation of any software application. Generally, the process of creating the database from the requirement specification of a software application is believed to be extremely hard. This study presents a method to automatically generate database scripts from a given scenario description in the requirement specification. Study Design: The method is developed based on a set of natural language processing (NLP) techniques and a few algorithms. Standard database scenario descriptions presented in popular textbooks on database design are used for the validation of the method. Place and Duration of Study: Department of Statistics and Computer Science, Faculty of Science, University of Peradeniya, Sri Lanka, between December 2019 and December 2020. Methodology: The description of the problem scenario is processed using NLP operations such as tokenization, complex word handling, basic group handling, complex phrase handling, structure merging, and template construction to extract the information required for the entity-relationship model. New algorithms are proposed to automatically convert the entity-relationship model to the logical schema and finally to the database script. The system can generate scripts for relational databases (RDB), object-relational databases (ORDB) and Not Only SQL (NoSQL) databases. The proposed method is integrated into a web application where users can type the scenario in natural, free text. The user selects the type of database (i.e., one of RDB, ORDB, NoSQL) used in their system, and the application generates the corresponding scripts. Results: The proposed method was evaluated using 10 scenario descriptions drawn from 10 different domains, such as company, university, and airport, for all three types of databases. The method achieved impressive accuracies of 82.5%, 84.0% and 83.5% for RDB, ORDB and NoSQL scripts, respectively. Conclusion: This study focuses mainly on the automatic generation of database scripts from scenario descriptions in the requirement specification of a software system. Overall, the developed method helps to speed up the database development process. Further, the developed web application provides a learning environment for people who are new to database technology.
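
As an illustration of the final conversion step, the sketch below turns a small entity-relationship structure into a relational DDL script. The intermediate representation here is an assumption made for illustration, not the authors' actual data structure or algorithm.

# Toy extracted ER structure: table name -> column name -> SQL type.
entities = {
    "department": {"id": "INT PRIMARY KEY", "name": "VARCHAR(100)"},
    "employee": {"id": "INT PRIMARY KEY", "name": "VARCHAR(100)",
                 "dept_id": "INT REFERENCES department(id)"},
}

def to_ddl(entities):
    statements = []
    for table, columns in entities.items():
        body = ",\n  ".join(f"{col} {sqltype}" for col, sqltype in columns.items())
        statements.append(f"CREATE TABLE {table} (\n  {body}\n);")
    return "\n\n".join(statements)

print(to_ddl(entities))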


Author(s):  
Beata Fonferko-Shadrach ◽  
Arron Lacey ◽  
Ashley Akbari ◽  
Simon Thompson ◽  
David Ford ◽  
...  

Introduction: Electronic health records (EHRs) are a powerful resource for enabling large-scale healthcare research. However, EHRs often lack the detailed disease-specific information that is collected in free text within clinical settings. This challenge can be addressed by using Natural Language Processing (NLP) to derive and extract detailed clinical information from free text. Objectives and Approach: Using a training sample of 40 letters, we used the General Architecture for Text Engineering (GATE) framework to build custom rule sets for nine categories of epilepsy information as well as clinic date and date of birth. We used a validation set of 200 clinic letters to compare the results of our algorithm with a separate manual review by a clinician, evaluating a "per item" and a "per letter" approach for each category. Results: The "per item" approach identified 1,939 items of information, with overall precision, recall and F1-score of 92.7%, 77.7% and 85.6%. Precision and recall for the epilepsy-specific categories were: diagnosis (85.3%, 92.4%), type (93.7%, 83.2%), focal seizure (99.0%, 68.3%), generalised seizure (92.5%, 57.0%), seizure frequency (92.0%, 52.3%), medication (96.1%, 94.0%), CT (66.7%, 47.1%), MRI (96.6%, 51.4%) and EEG (95.8%, 40.6%). By combining all items per category for each letter, we achieved higher precision, recall and F1-scores of 94.6%, 84.2% and 89.0% across all categories. Conclusion/Implications: Our results demonstrate that NLP techniques can be used to accurately extract rich phenotypic detail from clinic letters that is often missing from routinely collected data. Capturing these new data types provides a platform for conducting novel precision-neurology research, in addition to potential applicability to other disease areas.
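
The difference between the two evaluation granularities comes down to how matches are counted: per item, every extracted mention is scored against the gold standard; per letter, all items of a category within one letter collapse into a single present/absent decision, so repeated mentions cannot be double-counted. A small sketch with invented counts:

def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts for one category, scored at the two granularities.
print("per item   P=%.3f R=%.3f F1=%.3f" % prf(tp=90, fp=8, fn=20))
print("per letter P=%.3f R=%.3f F1=%.3f" % prf(tp=48, fp=3, fn=7))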


2021 ◽  
Vol 3 ◽  
Author(s):  
Aurelie Mascio ◽  
Robert Stewart ◽  
Riley Botelle ◽  
Marcus Williams ◽  
Luwaiza Mirza ◽  
...  

Background: Cognitive impairments are a neglected aspect of schizophrenia despite being a major determinant of poor functional outcome. They are usually measured using various rating scales; however, these necessitate trained practitioners and are rarely applied routinely in clinical settings. Recent advances in natural language processing techniques allow us to extract such information from unstructured portions of text at large scale and in a cost-effective manner. We aimed to identify cognitive problems in the clinical records of a large sample of patients with schizophrenia, and to assess their association with clinical outcomes. Methods: We developed a natural language processing based application that identifies cognitive dysfunction from the free text of medical records, and assessed its performance against a rating scale widely used in the United Kingdom, the cognitive component of the Health of the Nation Outcome Scales (HoNOS). Furthermore, we analyzed cognitive trajectories over the course of patient treatment, and evaluated their relationship with various socio-demographic factors and clinical outcomes. Results: We found a high prevalence of cognitive impairment in patients with schizophrenia, and a strong correlation with several socio-demographic factors (gender, education, ethnicity, marital status, and employment) as well as adverse clinical outcomes. Results obtained from the free text were broadly in line with those obtained using the HoNOS subscale, and shed light on additional associations, notably related to attention and social impairments for patients with higher education. Conclusions: Our findings demonstrate that cognitive problems are common in patients with schizophrenia, can be reliably extracted from clinical records using natural language processing, and are associated with adverse clinical outcomes. Harvesting the free text from medical records provides larger coverage than neurocognitive batteries or rating scales, and gives access to additional socio-demographic and clinical variables. Text mining tools can therefore facilitate large-scale patient screening and early symptom detection, and ultimately help inform clinical decisions.
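
For intuition, a deliberately simple rule-based detector for cognitive-problem mentions, with a crude negation window, is sketched below; the authors' application is considerably richer than this, and the patterns here are invented.

import re

COGNITIVE = r"(memory (loss|problems?)|poor concentration|disorientat\w+)"
NEGATION = r"\b(no|denies|without)\b"

def flags_cognitive_problem(note: str) -> bool:
    # Flag a note if any cognitive pattern appears without a negation
    # cue in the 40 characters preceding it.
    for match in re.finditer(COGNITIVE, note, re.IGNORECASE):
        window = note[max(0, match.start() - 40):match.start()]
        if not re.search(NEGATION, window, re.IGNORECASE):
            return True
    return False

print(flags_cognitive_problem("Reports memory loss and poor concentration."))  # True
print(flags_cognitive_problem("Denies memory problems at present."))           # False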

