Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions

Author(s):  
Tobias Kuhn ◽  
Loïc Royer ◽  
Norbert E. Fuchs ◽  
Michael Schröder
2021 ◽  
Author(s):  
Edgar Bernier ◽  
Sebastien Perrier

Maximizing operational efficiency is a critical challenge in oil and gas production, and is particularly important for mature assets in the North Sea. The causes of production shortfalls are numerous and span a wide range of disciplines, both technical and non-technical. The primary reason for applying Natural Language Processing (NLP) and text mining to several years of shortfall history was the need to efficiently support the evaluation of digital transformation use-case screenings and value mapping exercises through a proper mapping of the issues faced. This mapping also prompted reflection on operational surveillance and maintenance strategies to reduce production shortfalls. This paper presents a methodology in which the historical records of descriptions, comments and investigation results regarding production shortfalls are revisited, adding to existing shortfall classifications and statistics in two domains in particular: a richer first root-cause mapping, and a series of advanced visualizations and analytics. The methodology uses natural-language pre-processing techniques combined with keyword-based text-mining and classification techniques. The limitations associated with the size and quality of these language datasets are described and the results discussed, highlighting the value of reaching a high level of data granularity while defeating the ‘more information, less attention’ bias. At the same time, visual designs are introduced to efficiently display the different dimensions of this data (impact, frequency, evolution through time, location in terms of field and affected systems, root causes and other cause-related categories).
The ambition in the domain of visualization is to create user-experience-friendly shortfall analytics that can be displayed in smart rooms and collaborative rooms, where a display's efficiency is higher when user interactions are kept minimal, the number of charts is limited, and multiple dimensions do not collide. The paper is based on several applications across the North Sea. This case study, and the associated lessons learned about natural language processing and text mining applied to similar concise technical data, answers several frequently asked questions on the value of textual data records gathered over years.
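
The keyword-based classification step mentioned in the abstract above could look roughly like the following. This is a minimal sketch only: the abstract does not publish its vocabulary or scoring rule, so the `ROOT_CAUSE_KEYWORDS` table, the category names, and the `classify` helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword lists per root-cause category (assumption:
# the paper does not list its actual vocabulary or categories).
ROOT_CAUSE_KEYWORDS = {
    "compressor": ["compressor", "surge", "anti-surge"],
    "power": ["power", "generator", "blackout"],
    "well": ["well", "choke", "scale"],
}


def normalize(text):
    """Lowercase the comment and split it into simple word tokens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def classify(comment):
    """Return the category whose keywords occur most often in the comment."""
    tokens = Counter(normalize(comment))
    scores = {
        cause: sum(tokens[keyword] for keyword in keywords)
        for cause, keywords in ROOT_CAUSE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # A comment that matches no keyword is left unclassified.
    return best if scores[best] > 0 else "unclassified"
```

A free-text shortfall comment is passed to `classify`, which returns a category name or `"unclassified"`. Simple counting of this kind is sensitive to short, terse records, which is consistent with the dataset-size and quality limitations the abstract mentions.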


2019 ◽  
Vol 22 (3) ◽  
Author(s):  
Dildre Georgiana Vasques ◽  
Paulo Sérgio Martins ◽  
Solange Oliveira Rezende

The discovery of knowledge in textual databases is an approach that seeks implicit relationships between different concepts in different documents written in natural language, in order to identify new useful knowledge. To assist in this process, this approach can count on the help of Text Mining techniques. Despite all the progress made, researchers in this area must still deal with the large number of false relationships generated by most of the available processes. A statistical and verbal semantic approach that supports the understanding of the logic between relationships may bridge this gap. Thus, the objective of this work is to support the user with the identification of implicit relationships between concepts present in different texts, considering the causal relationships between concepts in the texts. To this end, this work proposes a hybrid approach for the discovery of implicit knowledge present in a text corpus, using analysis based on association rules together with metrics from complex networks and verbal semantics. Through a case study, a set of texts from alternative medicine was selected and the different extractions showed that the proposed approach facilitates the identification of implicit knowledge by the user.


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Zhiqiang Zeng ◽  
Hua Shi ◽  
Yun Wu ◽  
Zhiling Hong

Informatics methods, such as text mining and natural language processing, are routinely used in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.


2021 ◽  
pp. 1-13
Author(s):  
Lamiae Benhayoun ◽  
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new specific skills. OBJECTIVE: Identify the skill gaps between academic training on AI in French engineering and business schools and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools’ websites and scraping of a job advertisement website, followed by analysis based on a text mining approach using Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI. Characterization of three classes of skills for the AI market: Technical, Soft and Interdisciplinary. Skill gaps concern some professional certifications and the mastery of specific tools, research abilities, and awareness of the ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using algorithms for Natural Language Processing. Results that provide a better understanding of the AI capability components at the individual and organizational levels. A study that can help shape educational programs to respond to the AI market requirements.


2021 ◽  
Vol 26 (4) ◽  
Author(s):  
Alvaro Veizaga ◽  
Mauricio Alferez ◽  
Damiano Torre ◽  
Mehrdad Sabetzadeh ◽  
Lionel Briand

Natural language (NL) is pervasive in software requirements specifications (SRSs). However, despite its popularity and widespread use, NL is highly prone to quality issues such as vagueness, ambiguity, and incompleteness. Controlled natural languages (CNLs) have been proposed as a way to prevent quality problems in requirements documents, while maintaining the flexibility to write and communicate requirements in an intuitive and universally understood manner. In collaboration with an industrial partner from the financial domain, we systematically develop and evaluate a CNL, named Rimay, intended to help analysts write functional requirements. We rely on Grounded Theory for building Rimay and follow well-known guidelines for conducting and reporting industrial case study research. Our main contributions are: (1) a qualitative methodology to systematically define a CNL for functional requirements; this methodology is intended to be general for use across information-system domains, (2) a CNL grammar to represent functional requirements; this grammar is derived from our experience in the financial domain, but should be applicable, possibly with adaptations, to other information-system domains, and (3) an empirical evaluation of our CNL (Rimay) through an industrial case study. Our contributions draw on 15 representative SRSs, collectively containing 3215 NL requirements statements from the financial domain. Our evaluation shows that Rimay is expressive enough to capture, on average, 88% (405 out of 460) of the NL requirements statements in four previously unseen SRSs from the financial domain.


2020 ◽  
Vol 44 (12) ◽  
Author(s):  
Ishita Dasgupta ◽  
Demi Guo ◽  
Samuel J. Gershman ◽  
Noah D. Goodman