Conceptual Map Creation from Natural Language Processing: a Systematic Mapping Study

Context: Conceptual Maps (CMs) have been used to organize knowledge and facilitate learning and teaching in multiple domains. CMs also are used in multiple settings in education, since they are able to clarify the relationships between the subcomponents of a particular topic. However, the construction of a CM requires time and effort in identifying and structuring knowledge. In order to mitigate this problem, Natural Language Processing (NLP) techniques have been employed and have contributed to automate the extraction of concepts and relationships from texts. Objective: This article summarizes the main initiatives of building CMs from NLP. Method: A systematic mapping study was used to identify primary studies that present approaches on the use of NLP to automatically create CMs. Results: The mapping provides a description of 23 available articles that have been reviewed in order to extract relevant information on a set of Research Questions (RQ). From the RQ results, a framework was designed in order to present how NLP could be employed to construct CMs. From this framework, a solution graph was elaborated to present different solutions paths to construct CMs using NLP. Conclusions: The construction of CMs using NLP is still a recent field, however, it has been proven to be effective in assisting the automatic construction of CMs.

Download Full-text

Sentiment Analysis Techniques Applied to Raw-Text Data from a Csq-8 Questionnaire about Mindfulness in Times of COVID-19 to Improve Strategy Generation

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126408 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6408

Author(s):

Mario Jojoa Acosta ◽

Gema Castillo-Sánchez ◽

Begonya Garcia-Zapirain ◽

Isabel de la Torre Díez ◽

Manuel Franco-Martín

Keyword(s):

Health Care ◽

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Transfer Learning ◽

Language Processing ◽

Health Care Professionals ◽

Ground Truth ◽

Relevant Information ◽

Free Text

The use of artificial intelligence in health care has grown quickly. In this sense, we present our work related to the application of Natural Language Processing techniques, as a tool to analyze the sentiment perception of users who answered two questions from the CSQ-8 questionnaires with raw Spanish free-text. Their responses are related to mindfulness, which is a novel technique used to control stress and anxiety caused by different factors in daily life. As such, we proposed an online course where this method was applied in order to improve the quality of life of health care professionals in COVID 19 pandemic times. We also carried out an evaluation of the satisfaction level of the participants involved, with a view to establishing strategies to improve future experiences. To automatically perform this task, we used Natural Language Processing (NLP) models such as swivel embedding, neural networks, and transfer learning, so as to classify the inputs into the following three categories: negative, neutral, and positive. Due to the limited amount of data available—86 registers for the first and 68 for the second—transfer learning techniques were required. The length of the text had no limit from the user’s standpoint, and our approach attained a maximum accuracy of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using computer graphic text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that the application of NLP techniques in small amounts of data using transfer learning is able to obtain enough accuracy in sentiment analysis and text classification stages.

Download Full-text

Natural Language Processing Tools for Tamil Grammar Learning and Teaching

International Journal of Computer Applications ◽

10.5120/1314-1790 ◽

2010 ◽

Vol 8 (14) ◽

pp. 26-30 ◽

Cited By ~ 2

Author(s):

V Dhanalakshmi ◽

S Rajendran

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning And Teaching ◽

Grammar Learning

Download Full-text

The War for Talent

International Journal of Sociotechnology and Knowledge Development ◽

10.4018/jskd.2010070103 ◽

2010 ◽

Vol 2 (3) ◽

pp. 26-36 ◽

Cited By ~ 7

Author(s):

Ricardo Colomo-Palacios ◽

Marcos Ruano-Mayoral ◽

Pedro Soto-Acosta ◽

Ángel García-Crespo

Keyword(s):

Information Technology ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Relevant Information ◽

Semantic Technologies ◽

Production Of Knowledge ◽

Software Artifacts ◽

Knowledge Intensive ◽

Processing Techniques

In current organizations, the importance of knowledge and competence is unquestionable. In Information Technology (IT) companies, which are, by definition, knowledge intensive, this importance is critical. In such organizations, the models of knowledge exploitation include specific processes and elements that drive the production of knowledge aimed at satisfying organizational objectives. However, competence evidence recollection is a highly intensive and time consuming task, which is the key point for this system. SeCEC-IT is a tool based on software artifacts that extracts relevant information using natural language processing techniques. It enables competence evidence detection by deducing competence facts from documents in an automated way. SeCEC-IT includes within its technological components such items as semantic technologies, natural language processing, and human resource communication standards (HR-XML).

Download Full-text

Sentiment Analysis of Students’ Feedback with NLP and Deep Learning: A Systematic Mapping Study

Applied Sciences ◽

10.3390/app11093986 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3986

Author(s):

Zenun Kastrati ◽

Fisnik Dalipi ◽

Ali Shariq Imran ◽

Krenare Pireva Nuci ◽

Mudasir Ahmad Wani

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

The Body ◽

Systematic Mapping Study ◽

Mapping Study ◽

Systematic Mapping ◽

Learning Platform ◽

Literature Reviews ◽

Complicated Task

In the last decade, sentiment analysis has been widely applied in many domains, including business, social networks and education. Particularly in the education domain, where dealing with and processing students’ opinions is a complicated task due to the nature of the language used by students and the large volume of information, the application of sentiment analysis is growing yet remains challenging. Several literature reviews reveal the state of the application of sentiment analysis in this domain from different perspectives and contexts. However, the body of literature is lacking a review that systematically classifies the research and results of the application of natural language processing (NLP), deep learning (DL), and machine learning (ML) solutions for sentiment analysis in the education domain. In this article, we present the results of a systematic mapping study to structure the published information available. We used a stepwise PRISMA framework to guide the search process and searched for studies conducted between 2015 and 2020 in the electronic research databases of the scientific literature. We identified 92 relevant studies out of 612 that were initially found on the sentiment analysis of students’ feedback in learning platform environments. The mapping results showed that, despite the identified challenges, the field is rapidly growing, especially regarding the application of DL, which is the most recent trend. We identified various aspects that need to be considered in order to contribute to the maturity of research and development in the field. Among these aspects, we highlighted the need of having structured datasets, standardized solutions and increased focus on emotional expression and detection.

Download Full-text

Characterizing Usability Inspection Methods through the Analysis of a Systematic Mapping Study Extension

CLEI electronic journal ◽

10.19153/cleiej.16.1.11 ◽

2013 ◽

Vol 16 (1) ◽

Cited By ~ 2

Author(s):

Luis Rivero ◽

Raimundo Barreto ◽

Tayana Conte

Keyword(s):

Web Application ◽

Web Applications ◽

Current Knowledge ◽

Cost Effective ◽

Relevant Information ◽

Systematic Mapping Study ◽

Mapping Study ◽

Systematic Mapping ◽

Quality Aspects ◽

The Web

Usability is one of the most relevant quality aspects in Web applications. A Web application is usable if it provides a friendly, direct and easy to understand interface. Many Usability Inspection Methods (UIMs) have been proposed as a cost effective way to enhance usability. However, many companies are not aware of these UIMs and consequently, are not using them. A secondary study can identify, evaluate and interpret all data that is relevant to the current knowledge available regarding UIMs that have been used to evaluate Web applications in the past few decades. Therefore, we have extended a systematic mapping study about Usability Evaluation Methods by analyzing 26 of its research papers from which we extracted and categorized UIMs. We provide practitioners and researches with the rationale to understand both the strengths and weaknesses of the emerging UIMs for the Web. Furthermore, we have summarized the relevant information of the UIMs, which suggested new ideas or theoretical basis regarding usability inspection in the Web domain. In addition, we present a new UIM and a tool for Web usability inspection starting from the results shown in this paper.

Download Full-text

Cohort profile: St. Michael's Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing

10.1101/2020.09.11.20192419 ◽

2020 ◽

Author(s):

David Landsman ◽

Ahmed Abdelbasit ◽

Christine Wang ◽

Michael Guerzhoy ◽

Ujash Joshi ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Epidemiology ◽

Relevant Information ◽

Structured Data ◽

Free Text ◽

Electronic Health Record Data ◽

Latent Tb ◽

Electronic Health

Background Tuberculosis (TB) is a major cause of death worldwide. TB research draws heavily on clinical cohorts which can be generated using electronic health records (EHR), but granular information extracted from unstructured EHR data is limited. The St. Michael's Hospital TB database (SMH-TB) was established to address gaps in EHR-derived TB clinical cohorts and provide researchers and clinicians with detailed, granular data related to TB management and treatment. Methods We collected and validated multiple layers of EHR data from the TB outpatient clinic at St. Michael's Hospital, Toronto, Ontario, Canada to generate the SMH-TB database. SMH-TB contains structured data directly from the EHR, and variables generated using natural language processing (NLP) by extracting relevant information from free-text within clinic, radiology, and other notes. NLP performance was assessed using recall, precision and F1 score averaged across variable labels. We present characteristics of the cohort population using binomial proportions and 95% confidence intervals (CI), with and without adjusting for NLP misclassification errors. Results SMH-TB currently contains retrospective patient data spanning 2011 to 2018, for a total of 3298 patients (N=3237 with at least 1 associated dictation). Performance of TB diagnosis and medication NLP rulesets surpasses 93% in recall, precision and F1 metrics, indicating good generalizability. We estimated 20% (95% CI: 18.4-21.2%) were diagnosed with active TB and 46% (95% CI: 43.8-47.2%) were diagnosed with latent TB. After adjusting for potential misclassification, the proportion of patients diagnosed with active and latent TB was 18% (95% CI: 16.8-19.7%) and 40% (95% CI: 37.8-41.6%) respectively Conclusion SMH-TB is a unique database that includes a breadth of structured data derived from structured and unstructured EHR data. The data are available for a variety of research applications, such as clinical epidemiology, quality improvement and mathematical modelling studies.

Download Full-text

Constructing a corpus-informed list of Arabic formulaic sequences (ArFSs) for language pedagogy and technology

International Journal of Corpus Linguistics ◽

10.1075/ijcl.16088.alg ◽

2019 ◽

Vol 24 (2) ◽

pp. 202-228

Author(s):

Ayman Alghamdi ◽

Eric Atwell

Keyword(s):

Natural Language Processing ◽

Mixed Methods ◽

Natural Language ◽

Language Processing ◽

Native Speakers ◽

Formulaic Sequences ◽

Learning And Teaching ◽

Language Pedagogy ◽

Language Resource

Abstract This study aims to construct a corpus-informed list of Arabic Formulaic Sequences (ArFSs) for use in language pedagogy (LP) and Natural Language Processing (NLP) applications. A hybrid mixed methods model was adopted for extracting ArFSs from a corpus, that combined automatic and manual extracting methods, based on well-established quantitative and qualitative criteria that are relevant from the perspective of LP and NLP. The pedagogical implications of this list are examined to facilitate the inclusion of ArFSs in the process of learning and teaching Arabic, particularly for non-native speakers. The computational implications of the ArFSs list are related to the key role of the ArFSs as a novel language resource in the improvement of various Arabic NLP tasks.

Download Full-text

Cohort profile: St. Michael’s Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing

PLoS ONE ◽

10.1371/journal.pone.0247872 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0247872

Author(s):

David Landsman ◽

Ahmed Abdelbasit ◽

Christine Wang ◽

Michael Guerzhoy ◽

Ujash Joshi ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Epidemiology ◽

Relevant Information ◽

Structured Data ◽

Free Text ◽

Electronic Health Record Data ◽

Latent Tb ◽

Electronic Health

Background Tuberculosis (TB) is a major cause of death worldwide. TB research draws heavily on clinical cohorts which can be generated using electronic health records (EHR), but granular information extracted from unstructured EHR data is limited. The St. Michael’s Hospital TB database (SMH-TB) was established to address gaps in EHR-derived TB clinical cohorts and provide researchers and clinicians with detailed, granular data related to TB management and treatment. Methods We collected and validated multiple layers of EHR data from the TB outpatient clinic at St. Michael’s Hospital, Toronto, Ontario, Canada to generate the SMH-TB database. SMH-TB contains structured data directly from the EHR, and variables generated using natural language processing (NLP) by extracting relevant information from free-text within clinic, radiology, and other notes. NLP performance was assessed using recall, precision and F1 score averaged across variable labels. We present characteristics of the cohort population using binomial proportions and 95% confidence intervals (CI), with and without adjusting for NLP misclassification errors. Results SMH-TB currently contains retrospective patient data spanning 2011 to 2018, for a total of 3298 patients (N = 3237 with at least 1 associated dictation). Performance of TB diagnosis and medication NLP rulesets surpasses 93% in recall, precision and F1 metrics, indicating good generalizability. We estimated 20% (95% CI: 18.4–21.2%) were diagnosed with active TB and 46% (95% CI: 43.8–47.2%) were diagnosed with latent TB. After adjusting for potential misclassification, the proportion of patients diagnosed with active and latent TB was 18% (95% CI: 16.8–19.7%) and 40% (95% CI: 37.8–41.6%) respectively Conclusion SMH-TB is a unique database that includes a breadth of structured data derived from structured and unstructured EHR data by using NLP rulesets. The data are available for a variety of research applications, such as clinical epidemiology, quality improvement and mathematical modeling studies.

Download Full-text

Natural Language Processing for Requirements Engineering

ACM Computing Surveys ◽

10.1145/3444689 ◽

2021 ◽

Vol 54 (3) ◽

pp. 1-41

Author(s):

Liping Zhao ◽

Waad Alhoshan ◽

Alessio Ferrari ◽

Keletso J. Letsholo ◽

Muideen A. Ajagbe ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Requirements Engineering ◽

Language Processing ◽

New Technologies ◽

The State ◽

Mapping Study ◽

Pos Tagging ◽

Starting Point ◽

Holistic Understanding

Natural Language Processing for Requirements Engineering (NLP4RE) is an area of research and development that seeks to apply natural language processing (NLP) techniques, tools, and resources to the requirements engineering (RE) process, to support human analysts to carry out various linguistic analysis tasks on textual requirements documents, such as detecting language issues, identifying key domain concepts, and establishing requirements traceability links. This article reports on a mapping study that surveys the landscape of NLP4RE research to provide a holistic understanding of the field. Following the guidance of systematic review, the mapping study is directed by five research questions, cutting across five aspects of NLP4RE research, concerning the state of the literature, the state of empirical research, the research focus, the state of tool development, and the usage of NLP technologies. Our main results are as follows: (i) we identify a total of 404 primary studies relevant to NLP4RE, which were published over the past 36 years and from 170 different venues; (ii) most of these studies (67.08%) are solution proposals, assessed by a laboratory experiment or an example application, while only a small percentage (7%) are assessed in industrial settings; (iii) a large proportion of the studies (42.70%) focus on the requirements analysis phase, with quality defect detection as their central task and requirements specification as their commonly processed document type; (iv) 130 NLP4RE tools (i.e., RE specific NLP tools) are extracted from these studies, but only 17 of them (13.08%) are available for download; (v) 231 different NLP technologies are also identified, comprising 140 NLP techniques, 66 NLP tools, and 25 NLP resources, but most of them—particularly those novel NLP techniques and specialized tools—are used infrequently; by contrast, commonly used NLP technologies are traditional analysis techniques (e.g., POS tagging and tokenization), general-purpose tools (e.g., Stanford CoreNLP and GATE) and generic language lexicons (WordNet and British National Corpus). The mapping study not only provides a collection of the literature in NLP4RE but also, more importantly, establishes a structure to frame the existing literature through categorization, synthesis and conceptualization of the main theoretical concepts and relationships that encompass both RE and NLP aspects. Our work thus produces a conceptual framework of NLP4RE. The framework is used to identify research gaps and directions, highlight technology transfer needs, and encourage more synergies between the RE community, the NLP one, and the software and systems practitioners. Our results can be used as a starting point to frame future studies according to a well-defined terminology and can be expanded as new technologies and novel solutions emerge.

Download Full-text