A New Statistical and Verbal-Semantic Approach to Pattern Extraction in Text Mining Applications

2019 ◽  
Vol 22 (3) ◽  
Author(s):  
Dildre Georgiana Vasques ◽  
Paulo Sérgio Martins ◽  
Solange Oliveira Rezende

The discovery of knowledge in textual databases is an approach that seeks implicit relationships between different concepts in different documents written in natural language, in order to identify new useful knowledge. To assist in this process, the approach can draw on Text Mining techniques. Despite all the progress made, researchers in this area must still deal with the large number of false relationships generated by most of the available processes. A statistical and verbal-semantic approach that supports the understanding of the logic behind relationships may bridge this gap. Thus, the objective of this work is to support the user in identifying implicit relationships between concepts present in different texts, considering the causal relationships between concepts in the texts. To this end, this work proposes a hybrid approach for the discovery of implicit knowledge present in a text corpus, using analysis based on association rules together with metrics from complex networks and verbal semantics. Through a case study, a set of texts from alternative medicine was selected, and the different extractions showed that the proposed approach facilitates the user's identification of implicit knowledge.
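The association-rule component of such a hybrid approach can be sketched in a few lines: each document is reduced to its set of extracted concepts, and rules between concept pairs are kept only when their support and confidence clear minimum thresholds. The corpus, concepts, and thresholds below are invented for illustration and are not from the study.

```python
from itertools import combinations

# Toy corpus: each "document" reduced to its set of extracted concepts
# (illustrative alternative-medicine concepts, not the study's data).
docs = [
    {"ginger", "nausea", "pregnancy"},
    {"ginger", "nausea", "chemotherapy"},
    {"chamomile", "sleep", "anxiety"},
    {"ginger", "inflammation"},
    {"chamomile", "anxiety"},
]

min_support = 2 / len(docs)   # a pair must appear in at least 2 documents
min_confidence = 0.6

# Count single concepts and concept pairs across documents.
item_count = {}
pair_count = {}
for d in docs:
    for c in d:
        item_count[c] = item_count.get(c, 0) + 1
    for a, b in combinations(sorted(d), 2):
        pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

# Emit association rules a -> b whose support and confidence pass the thresholds.
rules = []
for (a, b), n in pair_count.items():
    support = n / len(docs)
    if support < min_support:
        continue
    for ante, cons in ((a, b), (b, a)):
        confidence = n / item_count[ante]
        if confidence >= min_confidence:
            rules.append((ante, cons, round(support, 2), round(confidence, 2)))

for r in sorted(rules):
    print(r)
```

In the full approach, rules like these would then be filtered and ranked using complex-network metrics and verbal semantics; this sketch covers only the statistical extraction step.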

2021 ◽  
Author(s):  
Edgar Bernier ◽  
Sebastien Perrier

Maximizing operational efficiency is a critical challenge in oil and gas production, particularly for mature assets in the North Sea. The causes of production shortfalls are numerous and distributed across a wide range of disciplines, both technical and non-technical. The primary reason to apply Natural Language Processing (NLP) and text mining to several years of shortfall history was the need to efficiently support digital-transformation use-case screenings and value-mapping exercises through a proper mapping of the issues faced. This mapping also contributed to reflection on operational surveillance and maintenance strategies to reduce production shortfalls. This paper presents a methodology in which the historical records of descriptions, comments, and investigation results regarding production shortfalls are revisited, adding to existing shortfall classifications and statistics in two domains in particular: richer first root-cause mapping, and a series of advanced visualizations and analytics. The methodology uses natural-language pre-processing techniques combined with keyword-based text-mining and classification techniques. The limitations associated with the size and quality of these language datasets are described and the results discussed, highlighting the value of reaching a high level of data granularity while defeating the 'more information, less attention' bias. At the same time, visual designs are introduced to display efficiently the different dimensions of this data (impact, frequency evolution through time, location in terms of field and affected systems, root causes, and other cause-related categories).
The ambition in the domain of visualization is to create User Experience-friendly shortfall analytics that can be displayed in smart rooms and collaborative rooms, where display efficiency is highest when user interactions are kept minimal, the number of charts is limited, and multiple dimensions do not collide. The paper is based on several applications across the North Sea. This case study and the associated lessons learned regarding natural language processing and text mining applied to similarly concise technical data answer several frequently asked questions about the value of the textual data records gathered over years.
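The keyword-based classification step described above can be illustrated with a minimal sketch: a lexicon maps root-cause categories to trigger terms, and each free-text shortfall record is assigned every category whose keywords it contains. The categories, keywords, and records here are invented for illustration, not the operator's actual taxonomy or data.

```python
from collections import Counter

# Illustrative keyword lexicon mapping root-cause categories to trigger terms;
# both the categories and the shortfall records below are invented examples.
lexicon = {
    "rotating equipment": ["compressor", "pump", "turbine", "vibration"],
    "instrumentation": ["transmitter", "sensor", "trip signal"],
    "well performance": ["scale", "slugging", "water cut", "sand"],
}

records = [
    "Export compressor trip due to high vibration on bearing",
    "False trip signal from level transmitter on separator",
    "Production choked back, increasing water cut in well A-12",
]

def classify(text):
    """Return every category whose keywords appear in the record."""
    t = text.lower()
    return [cat for cat, kws in lexicon.items()
            if any(kw in t for kw in kws)] or ["unclassified"]

# Aggregate per-category frequencies, the raw material for the visual analytics.
counts = Counter(cat for r in records for cat in classify(r))
print(counts)
```

Real records would first pass through the NLP pre-processing the paper mentions (normalization, spelling repair, abbreviation expansion) before this matching step.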


Discourse ◽  
2020 ◽  
Vol 6 (3) ◽  
pp. 109-117
Author(s):  
O. M. Polyakov

Introduction. The article continues the series of publications on the linguistics of relations (hereinafter R-linguistics) and is devoted to an introduction to the logic of natural language in relation to the approach considered in the series. The problem of natural language logic remains relevant, since this logic differs significantly from traditional mathematical logic; moreover, with the appearance of artificial intelligence systems, the importance of this problem only increases. The article analyzes logical problems that prevent the application of classical logic methods to natural languages. This is possible because R-linguistics forms the semantics of a language in the form of world-model structures in which language sentences are interpreted. Methodology and sources. The results obtained in the previous parts of the series are used as research tools. To develop the necessary mathematical representations in the field of logic and semantics, the previously formulated concept of the interpretation operator is used. Results and discussion. The problems that arise when studying the logic of natural language in the framework of R-linguistics are analyzed. These issues are discussed in three aspects: the logical aspect itself, the linguistic aspect, and the aspect of correlation with reality. A very general approach to language semantics is considered and semantic axioms of the language are formulated. The problems of the language and its logic related to this most general view of semantics are shown. Conclusion. It is shown that the application of mathematical logic, regardless of its type, to the study of natural language logic faces significant problems. This is a consequence of the inconsistency of existing approaches with the world model; but it is precisely coherence with the world model that allows us to build a new logical approach. Matching with the model means a semantic approach to logic.
Even the most general view of semantics allows us to formulate important results about the properties of languages that lack meaning. The simplest examples of semantic interpretation of traditional logic demonstrate its semantic problems (primarily related to negation).
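The core idea of interpreting sentences in a world-model structure can be made concrete with a toy example: a sentence is true iff the model contains the corresponding fact. The model, the sentence encoding, and the closed-world treatment of negation below are all invented for illustration and are not the article's formal apparatus; the negation rule in particular hints at why negation is the first place where model-based semantics diverges from classical logic.

```python
# Toy world model: a set of (predicate, entity) facts, invented for illustration.
model = {("bird", "tweety"), ("flies", "tweety"), ("bird", "opus")}

def interpret(sentence, model):
    """Evaluate a plain (predicate, entity) fact or a ("not", sentence) form."""
    if sentence[0] == "not":
        # Closed-world negation: true when the fact is absent from the model.
        # Classical logic instead needs the negation to be explicitly derivable,
        # which is one source of the semantic problems the article notes.
        return not interpret(sentence[1], model)
    return sentence in model

print(interpret(("flies", "tweety"), model))          # fact present in the model
print(interpret(("not", ("flies", "opus")), model))   # fact absent from the model
```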


2021 ◽  
pp. 1-13
Author(s):  
Lamiae Benhayoun ◽  
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: Identify the gaps in skills between academic training on AI in French engineering and business schools and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools' websites and scraping of a job-advertisement website, followed by analysis based on a text mining approach with Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI, and characterization of three classes of skills for the AI market: technical, soft, and interdisciplinary. Skill gaps concern some professional certifications, the mastery of specific tools, research abilities, and awareness of the ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using algorithms for Natural Language Processing, with results that provide a better understanding of the AI capability components at the individual and organizational levels. This study can help shape educational programs to respond to the AI market requirements.
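Once skill terms have been extracted and normalized from both sources, the gap analysis itself reduces to set operations over the two vocabularies. The term lists below are invented examples, not the actual scraped curricula or job advertisements from the study.

```python
# Hedged sketch of the gap analysis: both term sets are invented examples
# standing in for normalized skill terms mined from the two text sources.
curricula_terms = {"machine learning", "deep learning", "python", "statistics",
                   "computer vision", "nlp"}
job_ad_terms = {"machine learning", "python", "docker", "mlops",
                "gdpr compliance", "aws certification", "nlp"}

# Skills the labour market asks for that the training contents never mention.
gap = sorted(job_ad_terms - curricula_terms)
# Skills taught but not requested in the advertisements (potential over-supply).
surplus = sorted(curricula_terms - job_ad_terms)

print("market-side gap:", gap)
print("training-side surplus:", surplus)
```

In practice the hard work lies upstream, in extracting and normalizing the terms with NLP so that the two vocabularies are comparable; the set difference is the easy final step.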


2021 ◽  
Vol 26 (4) ◽  
Author(s):  
Alvaro Veizaga ◽  
Mauricio Alferez ◽  
Damiano Torre ◽  
Mehrdad Sabetzadeh ◽  
Lionel Briand

Natural language (NL) is pervasive in software requirements specifications (SRSs). However, despite its popularity and widespread use, NL is highly prone to quality issues such as vagueness, ambiguity, and incompleteness. Controlled natural languages (CNLs) have been proposed as a way to prevent quality problems in requirements documents while maintaining the flexibility to write and communicate requirements in an intuitive and universally understood manner. In collaboration with an industrial partner from the financial domain, we systematically develop and evaluate a CNL, named Rimay, aimed at helping analysts write functional requirements. We rely on Grounded Theory for building Rimay and follow well-known guidelines for conducting and reporting industrial case study research. Our main contributions are: (1) a qualitative methodology to systematically define a CNL for functional requirements; this methodology is intended to be general for use across information-system domains; (2) a CNL grammar to represent functional requirements; this grammar is derived from our experience in the financial domain but should be applicable, possibly with adaptations, to other information-system domains; and (3) an empirical evaluation of our CNL (Rimay) through an industrial case study. Our contributions draw on 15 representative SRSs, collectively containing 3215 NL requirement statements from the financial domain. Our evaluation shows that Rimay is expressive enough to capture, on average, 88% (405 out of 460) of the NL requirement statements in four previously unseen SRSs from the financial domain.
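To make the idea of a CNL concrete, a controlled requirement template can be checked mechanically: a sentence either fits the restricted grammar or is rejected. The pattern below is a toy "actor shall action object [when condition]" template written as a regular expression; it is an illustration of the CNL concept only, not Rimay's actual grammar, which is far richer.

```python
import re

# Toy controlled-requirement template, invented for illustration:
#   "The <actor> shall <action> <object> [when <condition>]."
PATTERN = re.compile(
    r"^The (?P<actor>[A-Za-z ]+) shall (?P<action>[a-z]+) "
    r"(?P<object>[a-z A-Z-]+?)( when (?P<condition>[a-z A-Z-]+))?\.$"
)

def conforms(req):
    """True when the requirement sentence fits the controlled template."""
    return PATTERN.match(req) is not None

reqs = [
    "The system shall send a confirmation email when the payment is approved.",
    "Confirmation emails would be nice.",   # vague free-form NL: rejected
]
for r in reqs:
    print(conforms(r), "-", r)
```

A real CNL toolchain would use a proper grammar with editor support rather than a single regex, but the conformance check shown is the mechanism by which a CNL keeps vagueness and ambiguity out of requirements.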


2020 ◽  
Vol 44 (12) ◽  
Author(s):  
Ishita Dasgupta ◽  
Demi Guo ◽  
Samuel J. Gershman ◽  
Noah D. Goodman

2021 ◽  
pp. 147332502199086
Author(s):  
Stéphanie Wahab ◽  
Gita R Mehrotra ◽  
Kelly E Myers

Expediency, efficiency, and rapid production within compressed time frames represent markers of research and scholarship within the neoliberal academe. Scholars who wish to resist these practices of knowledge production have articulated the need for Slow scholarship: a slower pace that makes room for thinking, creativity, and useful knowledge. While these calls are important for drawing attention to the costs and problems of the neoliberal academy, many scholars have moved beyond "slow" as referring solely to pace and duration, calling instead for different conceptualizations of time, space, and knowing. Guided by post-structural feminisms, we engaged in a research project that moved at the pace of trust in the integrity of our ideas and relationships. Our case study aimed to better understand the ways macro forces such as neoliberalism, criminalization, and professionalization shape domestic violence work. This article discusses our praxis of Slow scholarship by showcasing four key markers of Slow scholarship in our research: time reimagined; a relational ontology; moving inside and towards complexity; and embodiment. We discuss how Slow scholarship complicates how we understand constructs of productivity and knowledge production, and map the ways Slow scholarship offers a praxis of resistance for generating power from the epistemic margins within social work and the neoliberal academy.

