A study of semantic integration across archaeological data and reports in different languages

2018 ◽  
Vol 45 (3) ◽  
pp. 364-386
Author(s):  
Ceri Binding ◽  
Douglas Tudhope ◽  
Andreas Vlachidis

This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via natural language processing (NLP) across different languages. The investigation follows a broad theme relating to wooden objects and their dating via dendrochronological techniques, including types of wooden material, samples taken and wooden objects including shipwrecks. The outcomes are an integrated RDF dataset coupled with an associated interactive research demonstrator query builder application. The semantic framework combines the CIDOC Conceptual Reference Model (CRM) with the Getty Art and Architecture Thesaurus (AAT). The NLP, data cleansing and integration methods are described in detail, together with illustrative scenarios from the Demonstrator web application. Reflections and recommendations from the study are discussed. The Demonstrator is a novel SPARQL web application with CRM/AAT-based data integration. Functionality includes combined free-text and semantic search, browsing on semantic links, and thesaurus query expansion over hierarchical and associative relationships. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and specialised associative relationships. Following a ‘mapping pattern’ approach (via the STELETO tool) ensured validity and consistency of all RDF output. The user is shielded from the complexity of the underlying semantic framework by a query builder user interface. The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages and of semantic cross-searching of the integrated information. The semantic linking of textual reports and datasets opens new possibilities for integrative research across diverse resources.
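The query expansion over AAT hierarchies described above can be illustrated with a toy recursive walk over narrower-term links. The mini-hierarchy and term names below are invented assumptions for illustration, not actual AAT data.

```python
# Toy hierarchical thesaurus query expansion, in the spirit of the AAT-based
# expansion described above. NARROWER maps a term to its narrower terms.
NARROWER = {
    "wood": ["hardwood", "softwood"],
    "hardwood": ["beech", "oak"],
    "softwood": ["pine"],
}

def expand(term):
    """Return the term plus all narrower terms, depth-first."""
    result = [term]
    for child in NARROWER.get(term, []):
        result.extend(expand(child))
    return result

# A query for "wood" objects silently also matches beech, oak, pine, etc.
print(expand("wood"))
```

A search for "wooden keels" can then match records annotated only with the narrower term "beech", which is the behaviour the Demonstrator's query builder hides from the user.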

Author(s):  
L. W. Amarasinghe ◽  
R. D. Nawarathna

Aims: Database creation is the most critical component of the design and implementation of any software application. Generally, the process of creating the database from the requirement specification of a software application is believed to be extremely hard. This study presents a method to automatically generate database scripts from a given scenario description of the requirement specification. Study Design: The method is developed based on a set of natural language processing (NLP) techniques and a few algorithms. Standard database scenario descriptions presented in popular textbooks on Database Design are used for the validation of the method. Place and Duration of Study: Department of Statistics and Computer Science, Faculty of Science, University of Peradeniya, Sri Lanka, between December 2019 and December 2020. Methodology: The description of the problem scenario is processed using NLP operations such as tokenization, complex word handling, basic group handling, complex phrase handling, structure merging, and template construction to extract the information required for the entity-relationship model. New algorithms are proposed to automatically convert the entity-relationship model to the logical schema and finally to the database script. The system can generate scripts for relational databases (RDB), object-relational databases (ORDB) and Not Only SQL (NoSQL) databases. The proposed method is integrated into a web application where users can type the scenario in natural or free text. The user selects the type of database (i.e., RDB, ORDB or NoSQL) used in their system, and the application generates the corresponding scripts. Results: The proposed method was evaluated using 10 scenario descriptions from 10 different domains, such as company, university and airport, for all three types of databases. The method achieved accuracies of 82.5%, 84.0% and 83.5% for RDB, ORDB and NoSQL scripts, respectively.
Conclusion: This study is mainly focused on the automatic generation of SQL scripts from scenario descriptions of the requirement specification of a software system. Overall, the developed method helps to speed up the database development process. Further, the developed web application provides a learning environment for people who are novices in database technology. 
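The scenario-to-script pipeline above can be caricatured with a single regex rule that maps one sentence shape to a CREATE TABLE statement. This is a deliberately minimal sketch under invented assumptions; the study's actual method uses a full NLP pipeline (tokenization, phrase handling, ER-model construction) rather than one pattern.

```python
import re

def scenario_to_sql(sentence):
    """Turn 'A(n) <entity> has a <attr>, an <attr> and a <attr>.' into DDL.

    A toy single-rule stand-in for the scenario-description parsing step;
    the column type and primary key are illustrative defaults.
    """
    m = re.match(r"An? (\w+) has (.+)\.", sentence)
    if not m:
        return None
    entity, rest = m.groups()
    # Each attribute is introduced by the article 'a' or 'an'.
    attrs = re.findall(r"\b(?:a|an)\s+(\w+)", rest)
    cols = ",\n  ".join(f"{a} VARCHAR(255)" for a in attrs)
    return f"CREATE TABLE {entity} (\n  id INT PRIMARY KEY,\n  {cols}\n);"

print(scenario_to_sql("An employee has a name, an address and a salary."))
```

Even this toy shows why the authors report different accuracies per database type: the hard part is not emitting syntax, but reliably extracting entities and attributes from free text.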


2021 ◽  
Author(s):  
George Alexander ◽  
Mohammed Bahja ◽  
Gibran F Butt

Obtaining patient feedback is an essential mechanism for healthcare service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experience. The Department of Health and Social Care in England, via NHS Digital, operates a patient feedback web service through which patients can leave feedback about their experiences in structured and free-text report forms. Compared to structured questionnaires, free-text feedback may be less biased by the feedback collector and thus more representative; however, it is harder to analyse in large quantities, and it is challenging to derive meaningful, quantitative outcomes that better represent general public feedback. This study details the development of a text analysis tool that uses contemporary natural language processing (NLP) and machine learning models to analyse free-text clinical service reviews and build a robust classification model, together with an interactive visualisation web application (a Vue.js front end with Node.js, a C# serverless API and SQL Server, all hosted on the Microsoft Azure platform) that facilitates exploration of the data and is designed for use by all stakeholders. Of the 11,103 clinical services that could be reviewed across England, 2,030 different services had received a combined total of 51,845 reviews between 1/10/2017 and 31/10/2019; these were included for analysis. Dominant topics were identified for the entire corpus and then for negative and positive sentiment in turn. Reviews containing high- and low-sentiment topics occurred more frequently than less polarised topics. Time series analysis can identify trends in topic and sentiment occurrence frequency across the study period.
This tool automates the analysis of large volumes of free text specific to medical services, and the web application summarises the results and presents them in an accessible and interactive format. Such a tool has the potential to considerably reduce administrative burden and increase user uptake.
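The sentiment-scoring idea behind such tools can be sketched with a tiny lexicon-based scorer. The word lists and reviews below are invented for illustration; the study used contemporary NLP and machine learning models, not a fixed lexicon.

```python
# Minimal lexicon-based sentiment scorer: count positive vs negative words.
POSITIVE = {"caring", "excellent", "friendly", "helpful", "clean"}
NEGATIVE = {"rude", "dirty", "slow", "unhelpful", "painful"}

def sentiment(review):
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("the staff were friendly and helpful"))  # positive
```

Applied over 51,845 reviews with timestamps, even a scorer this crude yields the kind of sentiment-frequency time series the abstract describes, which is why lexicon baselines are a common starting point before training ML models.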


2021 ◽  
Author(s):  
Ian R. Braun ◽  
Diane C. Bassham ◽  
Carolyn J. Lawrence-Dill

Motivation: Finding similarity across phenotypic descriptions is not straightforward, with previous successes in computation requiring significant expert data curation. Natural language processing of free-text phenotype descriptions is often easier to apply than intensive curation. It is therefore critical to understand the extent to which these techniques can be used to organize and analyze biological datasets and enable biological discoveries. Results: A wide variety of approaches from the natural language processing domain perform as well as similarity metrics over curated annotations for predicting shared phenotypes. These approaches also show promise both for helping curators organize and work through large datasets and for enabling researchers to explore relationships among available phenotype descriptions. Here we generate networks of phenotype similarity and share a web application for querying a dataset of associated plant genes using these text mining approaches. Example situations and species for which application of these techniques is most useful are discussed. Availability: The dataset used in this work is available at https://git.io/JTutQ. The code for the analysis performed here is available at https://git.io/JTutN and https://git.io/JTuqv. The code for the web application discussed here is available at https://git.io/Jtv9J, and the application itself is available at https://quoats.dill-picl.org/.
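One of the simpler NLP approaches to phenotype-description similarity is TF-IDF weighting with cosine similarity, sketched here in pure Python. The three descriptions are invented examples; this is not the study's dataset or exact method.

```python
import math
from collections import Counter

docs = [
    "dwarf plant with short internodes",
    "reduced height and short internodes",
    "yellow leaf pigmentation defect",
]

def tfidf_vectors(corpus):
    """Bag-of-words TF-IDF: weight each word by tf * log(N / document frequency)."""
    tokenized = [doc.split() for doc in corpus]
    n = len(tokenized)
    df = Counter(w for doc in tokenized for w in set(doc))
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()}
        for doc in tokenized
    ]

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vecs = tfidf_vectors(docs)
# The two dwarfism-related descriptions score higher than an unrelated pair.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Pairwise scores like these are exactly what is needed to build the networks of phenotype similarity the abstract mentions: add an edge wherever similarity exceeds a threshold.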


Author(s):  
Mario Jojoa Acosta ◽  
Gema Castillo-Sánchez ◽  
Begonya Garcia-Zapirain ◽  
Isabel de la Torre Díez ◽  
Manuel Franco-Martín

The use of artificial intelligence in health care has grown quickly. In this context, we present our work on the application of Natural Language Processing techniques as a tool to analyze the sentiment perception of users who answered two questions from the CSQ-8 questionnaires with raw Spanish free text. Their responses are related to mindfulness, a novel technique used to control stress and anxiety caused by different factors in daily life. We proposed an online course where this method was applied in order to improve the quality of life of health care professionals during the COVID-19 pandemic. We also carried out an evaluation of the satisfaction level of the participants involved, with a view to establishing strategies to improve future experiences. To automatically perform this task, we used Natural Language Processing (NLP) models such as swivel embedding, neural networks, and transfer learning to classify the inputs into three categories: negative, neutral, and positive. Due to the limited amount of data available—86 records for the first question and 68 for the second—transfer learning techniques were required. The length of the text had no limit from the user’s standpoint, and our approach attained a maximum accuracy of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using a graphical text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that the application of NLP techniques to small amounts of data using transfer learning can achieve sufficient accuracy in the sentiment analysis and text classification stages.
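The core idea of classifying scarce labelled text via pretrained embeddings can be sketched with a nearest-centroid classifier over a tiny embedding table. The 2-dimensional "embeddings", Spanish example phrases, and two-class setup below are all invented for illustration; the study used swivel embeddings and neural networks over three classes.

```python
# Pretend these vectors came from a pretrained embedding model (transfer
# learning supplies them; only the centroids are "trained" on our few labels).
EMB = {
    "excelente": (0.9, 0.1), "útil": (0.8, 0.2),
    "malo": (0.1, 0.9), "aburrido": (0.2, 0.8),
    "curso": (0.5, 0.5),
}

def embed(text):
    """Average the embeddings of known words; (0, 0) if none are known."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))

def train_centroids(examples):
    centroids = {}
    for label, texts in examples.items():
        vecs = [embed(t) for t in texts]
        centroids[label] = tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))
    return centroids

def classify(text, centroids):
    v = embed(text)
    return min(centroids, key=lambda c: sum((v[i] - centroids[c][i]) ** 2 for i in range(2)))

centroids = train_centroids({
    "positive": ["curso excelente", "muy útil"],
    "negative": ["curso malo", "muy aburrido"],
})
print(classify("excelente y útil", centroids))  # positive
```

The design point this illustrates is the abstract's conclusion: with only a few dozen labelled records, almost all of the model's knowledge must come from the pretrained embeddings rather than from the task-specific labels.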


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require personnel resources that most healthcare organisations do not have. This study undertakes a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may help healthcare organisations generate insight from large volumes of unstructured free-text data.
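Naïve Bayes, one of the best performing classifiers in the reviewed studies, is simple enough to implement from scratch. The training examples below are invented feedback snippets; this is a textbook multinomial Naïve Bayes with Laplace smoothing, not any reviewed study's system.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (label, text). Returns word counts, totals, priors, vocab."""
    counts, totals, priors, vocab = {}, {}, Counter(), set()
    for label, text in examples:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(text.split())
    for label, c in counts.items():
        totals[label] = sum(c.values())
        vocab |= set(c)
    return counts, totals, priors, vocab

def classify(text, model):
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in priors:
        lp = math.log(priors[label] / n)
        for w in text.split():
            # Laplace smoothing so unseen words never zero out a class.
            lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([
    ("positive", "great caring staff"),
    ("positive", "friendly helpful nurses"),
    ("negative", "long wait rude staff"),
    ("negative", "dirty ward long delays"),
])
print(classify("helpful caring nurses", model))  # positive
```

This supervised setup mirrors the review's finding that solicited survey comments, which come with labels or ratings, suit supervised classifiers, whereas unlabelled social media comments push studies toward unsupervised methods.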


BMJ Open ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. e047356
Author(s):  
Carlton R Moore ◽  
Saumya Jain ◽  
Stephanie Haas ◽  
Harish Yadav ◽  
Eric Whitsel ◽  
...  

Objectives: Using free-text clinical notes and reports from hospitalised patients, determine the performance of natural language processing (NLP) ascertainment of Framingham heart failure (HF) criteria and phenotype. Study design: A retrospective observational study of patients hospitalised in 2015 at four hospitals participating in the Atherosclerosis Risk in Communities (ARIC) study was used to determine NLP performance in the ascertainment of Framingham HF criteria and phenotype. Setting: Four ARIC study hospitals, each representing an ARIC study region in the USA. Participants: A stratified random sample of hospitalisations occurring during 2015, identified using a broad range of International Classification of Diseases, ninth revision, diagnostic codes indicative of an HF event, was drawn for this study. A randomly selected set of 394 hospitalisations was used as the derivation dataset and 406 hospitalisations as the validation dataset. Intervention: Use of NLP on free-text clinical notes and reports to ascertain Framingham HF criteria and phenotype. Primary and secondary outcome measures: NLP performance as measured by sensitivity, specificity, positive predictive value (PPV) and agreement in ascertainment of Framingham HF criteria and phenotype. Manual medical record review by trained ARIC abstractors was used as the reference standard. Results: Overall, performance of NLP ascertainment of the Framingham HF phenotype in the validation dataset was good, with 78.8%, 81.7%, 84.4% and 80.0% for sensitivity, specificity, PPV and agreement, respectively. Conclusions: By decreasing the need for manual chart review, our results on the use of NLP to ascertain the Framingham HF phenotype from free-text electronic health record data suggest that validated NLP technology holds the potential to significantly improve the feasibility and efficiency of large-scale epidemiologic surveillance of HF prevalence and incidence.
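The four outcome measures above all derive from a 2×2 confusion matrix against the reference standard. The cell counts below are invented for illustration only (chosen to land near the reported percentages); they are not the study's data.

```python
def validation_metrics(tp, fp, fn, tn):
    """Standard 2x2 confusion-matrix metrics, matching the measures reported."""
    return {
        "sensitivity": tp / (tp + fn),          # NLP-positive among true HF
        "specificity": tn / (tn + fp),          # NLP-negative among true non-HF
        "ppv": tp / (tp + fp),                  # true HF among NLP-positive
        "agreement": (tp + tn) / (tp + fp + fn + tn),  # overall concordance
    }

# Hypothetical counts over a 406-hospitalisation validation set.
m = validation_metrics(tp=160, fp=30, fn=43, tn=173)
print({k: round(v, 3) for k, v in m.items()})
```

Keeping sensitivity and PPV separate matters here: for surveillance of HF incidence, a high PPV limits false events entering the registry, while sensitivity bounds how many true events are missed relative to manual abstraction.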


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 183-183
Author(s):  
Javad Razjouyan ◽  
Jennifer Freytag ◽  
Edward Odom ◽  
Lilian Dindo ◽  
Aanand Naik

Abstract Patient Priorities Care (PPC) is a model of care that aligns health care recommendations with the priorities of older adults with multiple chronic conditions. Social workers (SWs), after online training, document PPC in the patient’s electronic health record (EHR). Our goal is to identify free-text notes with PPC language using a natural language processing (NLP) model and to measure PPC adoption and its effect on long-term services and support (LTSS) use. Free-text EHR notes produced by trained SWs were passed through a hybrid NLP model that combined rule-based and statistical machine learning. NLP accuracy was validated against chart review. Patients who received PPC were propensity matched with patients not receiving PPC (control) on age, gender, BMI, Charlson comorbidity index, facility and SW. The change in LTSS utilization across 6-month intervals was compared between groups with univariate analysis. Chart review indicated that 491 of 689 notes had PPC language, and the NLP model reached a precision of 0.85, a recall of 0.90, an F1 of 0.87, and an accuracy of 0.91. Within-group analysis shows that the intervention group used LTSS 1.8 times more in the 6 months after the encounter compared with the 6 months prior. Between-group analysis shows that the intervention group had significantly higher LTSS utilization (p=0.012). An automated NLP model can be used to reliably measure the adoption of PPC by SWs. PPC appears to encourage use of LTSS, which may delay time to long-term care placement.
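The reported F1 is internally consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
# F1 = harmonic mean of precision and recall; 0.85 and 0.90 give the
# reported 0.87.
precision, recall = 0.85, 0.90
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.87
```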


2021 ◽  
pp. 1063293X2098297
Author(s):  
Ivar Örn Arnarsson ◽  
Otto Frost ◽  
Emil Gustavsson ◽  
Mats Jirstrand ◽  
Johan Malmqvist

Product development companies collect data in form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research affirms that product developers find that current IT systems lack capabilities to accurately retrieve relevant documents with unstructured data. In this research, we demonstrate a method using Natural Language Processing and document clustering algorithms to find structurally or contextually related documents from databases containing Engineering Change Request documents. The aim is to radically decrease the time needed to effectively search for related engineering documents, organize search results, and create labeled clusters from these documents by utilizing Natural Language Processing algorithms. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied managed to find relevant document clusters given the queries tested.
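The clustering-and-labelling workflow above can be sketched with word-overlap similarity and a single-pass greedy grouping. The documents, Jaccard measure, threshold, and greedy scheme are all simplifying assumptions for illustration; the study applied fuller NLP and document clustering algorithms.

```python
from collections import Counter

docs = [
    "brake pad wear test failed",
    "brake disc wear issue logged",
    "software build error in release",
    "release build failed with compiler error",
]

def jaccard(a, b):
    """Word-set overlap between two documents (0..1)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def cluster(documents, threshold=0.2):
    """Greedy single-pass clustering: join the first cluster similar enough."""
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(documents):
        for c in clusters:
            if jaccard(doc, documents[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def label(cluster_ids, documents, k=2):
    """Label a cluster with its k most frequent words."""
    words = Counter(w for i in cluster_ids for w in documents[i].split())
    return [w for w, _ in words.most_common(k)]

for c in cluster(docs):
    print(label(c, docs), c)
```

The per-cluster labels are the point of the exercise: a developer searching Engineering Change Requests sees a few named groups ("brake/wear" vs "build/error") rather than a flat ranked list.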


2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 324-324
Author(s):  
Isaac S. Chua ◽  
Elise Tarbi ◽  
Jocelyn H. Siegel ◽  
Kate Sciacca ◽  
Anne Kwok ◽  
...  

Background: Delivering goal-concordant care to patients with advanced cancer requires identifying eligible patients who would benefit from goals of care (GOC) conversations; training clinicians how to have these conversations; conducting conversations in a timely manner; and documenting GOC conversations so that they can be readily accessed by care teams. We used an existing, locally developed electronic cancer care clinical pathways system to guide oncologists toward these conversations. Methods: To identify eligible patients, pathways directors from 12 oncology disease centers identified therapeutic decision nodes for each pathway that corresponded to a predicted life expectancy of ≤1 year. When oncologists selected one of these pre-identified pathways nodes, the decision was captured in a relational database. For these patients, we sought evidence of GOC documentation within the electronic health record by extracting coded data from the advance care planning (ACP) module—a designated area within the electronic health record for clinicians to document GOC conversations. We also used rule-based natural language processing (NLP) to capture free-text GOC documentation within these same patients’ progress notes. A domain expert reviewed all progress notes identified by NLP to confirm the presence of GOC documentation. Results: In a pilot sample obtained between March 20 and September 25, 2020, we identified a total of 21 pathway nodes conveying a poor prognosis, which represented 91 unique patients with advanced cancer. Among these patients, the mean age was 62 (SD 13.8) years; 55 (60.4%) patients were female, and 69 (75.8%) were non-Hispanic White. The cancers most represented were thoracic (32 [35.2%]), breast (31 [34.1%]), and head and neck (13 [14.3%]). Within the 3 months leading up to the pathways decision date, a total of 62 (68.1%) patients had any GOC documentation.
Twenty-one (23.1%) patients had documentation in both the ACP module and NLP-identified progress notes; 5 (5.5%) had documentation in the ACP module only; and 36 (39.6%) had documentation in progress notes only. Twenty-two unique clinicians utilized the ACP module, of which 1 (4.5%) was an oncologist and 21 (95.5%) were palliative care clinicians. Conclusions: Approximately two thirds of patients had any GOC documentation. A total of 26 (28.6%) patients had any GOC documentation in the ACP module, and only 1 oncologist documented using the ACP module, where care teams can most easily retrieve GOC information. These findings provide an important baseline for future quality improvement efforts (e.g., implementing serious illness communications training, increasing support around ACP module utilization, and incorporating behavioral nudges) to enhance oncologists’ ability to conduct and to document timely, high quality GOC conversations.
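Rule-based NLP of the kind described here typically amounts to curated phrase patterns run over note text. The patterns and example notes below are invented to show the shape of the approach; a real system would handle negation, note sections, and a much larger pattern set, and (as above) still uses expert review to confirm hits.

```python
import re

# Illustrative goals-of-care (GOC) phrase patterns, not the study's rule set.
GOC_PATTERNS = [
    r"goals?\s+of\s+care",
    r"code\s+status",
    r"advance\s+care\s+plan(?:ning)?",
    r"end[-\s]of[-\s]life",
]
GOC_RE = re.compile("|".join(GOC_PATTERNS), re.IGNORECASE)

def has_goc_documentation(note):
    """Flag a progress note that contains any GOC phrase pattern."""
    return bool(GOC_RE.search(note))

notes = [
    "Discussed goals of care with patient and family; code status DNR/DNI.",
    "Continue current chemotherapy regimen; follow up in 2 weeks.",
]
print([has_goc_documentation(n) for n in notes])  # [True, False]
```

Because free-text matching like this produces false positives, the study's design of having a domain expert review every NLP-flagged note before counting it as GOC documentation is the natural complement.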

