Ten Years of BabelNet: A Survey

The intelligent manipulation of symbolic knowledge has been a long-sought goal of AI. However, when it comes to Natural Language Processing (NLP), symbols have to be mapped to words and phrases, which are not only ambiguous but also language-specific: multilinguality is indeed a desirable property for NLP systems, and one which enables the generalization of tasks where multiple languages need to be dealt with, without translating text. In this paper we survey BabelNet, a popular wide-coverage lexical-semantic knowledge resource obtained by merging heterogeneous sources into a unified semantic network that helps to scale tasks and applications to hundreds of languages. Over its ten years of existence, thanks to its promise to interconnect languages and resources in structured form, BabelNet has been employed in countless ways and directions. We first introduce the BabelNet model, its components and statistics, and then overview its successful use in a wide range of tasks in NLP as well as in other fields of AI.

Download Full-text

Lexical Semantic Knowledge of Children with ASD—a Review Study

Review Journal of Autism and Developmental Disorders ◽

10.1007/s40489-021-00272-9 ◽

2021 ◽

Author(s):

Nufar Sukenik ◽

Laurice Tuller

Keyword(s):

Children With Autism ◽

Semantic Knowledge ◽

Future Studies ◽

Lexical Semantic ◽

Children With Asd ◽

Wide Range ◽

Lexical Semantic Knowledge ◽

Review Study ◽

Linguistic Impairment ◽

Semantic Skills

AbstractStudies on the lexical semantic abilities of children with autism have yielded contradicting results. The aim of the current review was to explore studies that have specifically focused on the lexical semantic abilities of children with ASD and try to find an explanation for these contradictions. In the 32 studies reviewed, no single factor was found to affect lexical semantic skills, although children with broader linguistic impairment generally, but not universally, also showed impaired lexical semantic skills. The need for future studies with young ASD participants, with differing intellectual functioning, longitudinal studies, and studies assessing a wide range of language domains are discussed.

Download Full-text

Application of natural language processing methods to extract coded data from administrative data held in the Scottish Prescribing Information System

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.263 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Clifford Nangle ◽

Stuart McTaggart ◽

Margaret MacLeod ◽

Jackie Caldwell ◽

Marion Bennie

Keyword(s):

Information System ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Drug Exposure ◽

Drug Dose ◽

Free Text ◽

Wide Range ◽

The Impact ◽

Prescribing Information

ABSTRACT ObjectivesThe Prescribing Information System (PIS) datamart, hosted by NHS National Services Scotland receives around 90 million electronic prescription messages per year from GP practices across Scotland. Prescription messages contain information including drug name, quantity and strength stored as coded, machine readable, data while prescription dose instructions are unstructured free text and difficult to interpret and analyse in volume. The aim, using Natural Language Processing (NLP), was to extract drug dose amount, unit and frequency metadata from freely typed text in dose instructions to support calculating the intended number of days’ treatment. This then allows comparison with actual prescription frequency, treatment adherence and the impact upon prescribing safety and effectiveness. ApproachAn NLP algorithm was developed using the Ciao implementation of Prolog to extract dose amount, unit and frequency metadata from dose instructions held in the PIS datamart for drugs used in the treatment of gastrointestinal, cardiovascular and respiratory disease. Accuracy estimates were obtained by randomly sampling 0.1% of the distinct dose instructions from source records, comparing these with metadata extracted by the algorithm and an iterative approach was used to modify the algorithm to increase accuracy and coverage. ResultsThe NLP algorithm was applied to 39,943,465 prescription instructions issued in 2014, consisting of 575,340 distinct dose instructions. For drugs used in the gastrointestinal, cardiovascular and respiratory systems (i.e. chapters 1, 2 and 3 of the British National Formulary (BNF)) the NLP algorithm successfully extracted drug dose amount, unit and frequency metadata from 95.1%, 98.5% and 97.4% of prescriptions respectively. However, instructions containing terms such as ‘as directed’ or ‘as required’ reduce the usability of the metadata by making it difficult to calculate the total dose intended for a specific time period as 7.9%, 0.9% and 27.9% of dose instructions contained terms meaning ‘as required’ while 3.2%, 3.7% and 4.0% contained terms meaning ‘as directed’, for drugs used in BNF chapters 1, 2 and 3 respectively. ConclusionThe NLP algorithm developed can extract dose, unit and frequency metadata from text found in prescriptions issued to treat a wide range of conditions and this information may be used to support calculating treatment durations, medicines adherence and cumulative drug exposure. The presence of terms such as ‘as required’ and ‘as directed’ has a negative impact on the usability of the metadata and further work is required to determine the level of impact this has on calculating treatment durations and cumulative drug exposure.

Download Full-text

A Survey on Using Gaze Behaviour for Natural Language Processing

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/683 ◽

2020 ◽

Author(s):

Sandeep Mathias ◽

Diptesh Kanojia ◽

Abhijit Mishra ◽

Pushpak Bhattacharya

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Eye Tracking ◽

Language Processing ◽

Word Identification ◽

Test Time ◽

Gaze Behaviour ◽

Cognitive Information ◽

Essay Grading ◽

Multiple Languages

Gaze behaviour has been used as a way to gather cognitive information for a number of years. In this paper, we discuss the use of gaze behaviour in solving different tasks in natural language processing (NLP) without having to record it at test time. This is because the collection of gaze behaviour is a costly task, both in terms of time and money. Hence, in this paper, we focus on research done to alleviate the need for recording gaze behaviour at run time. We also mention different eye tracking corpora in multiple languages, which are currently available and can be used in natural language processing. We conclude our paper by discussing applications in a domain - education - and how learning gaze behaviour can help in solving the tasks of complex word identification and automatic essay grading.

Download Full-text

Reports of the Workshops Held at the Tenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

AI Magazine ◽

10.1609/aimag.v36i1.2576 ◽

2015 ◽

Vol 36 (1) ◽

pp. 99-102

Author(s):

Tiffany Barnes ◽

Oliver Bown ◽

Michael Buro ◽

Michael Cook ◽

Arne Eigenfeldt ◽

...

Keyword(s):

Artificial Intelligence ◽

North Carolina ◽

Natural Language Processing ◽

Natural Language ◽

Real Time ◽

Language Processing ◽

State University ◽

North Carolina State University ◽

Wide Range ◽

Digital Entertainment

The AIIDE-14 Workshop program was held Friday and Saturday, October 3–4, 2014 at North Carolina State University in Raleigh, North Carolina. The workshop program included five workshops covering a wide range of topics. The titles of the workshops held Friday were Games and Natural Language Processing, and Artificial Intelligence in Adversarial Real-Time Games. The titles of the workshops held Saturday were Diversity in Games Research, Experimental Artificial Intelligence in Games, and Musical Metacreation. This article presents short summaries of those events.

Download Full-text

Mood and modality: out of theory and into the fray

Natural Language Engineering ◽

10.1017/s1351324903003279 ◽

2004 ◽

Vol 10 (1) ◽

pp. 57-89 ◽

Cited By ~ 2

Author(s):

MARJORIE MCSHANE ◽

SERGEI NIRENBURG ◽

RON ZACHARSKI

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Translation System ◽

Free Standing ◽

Indicative Conditional ◽

Tense And Aspect ◽

Language L ◽

Wide Range ◽

Value Sets

The topic of mood and modality (MOD) is a difficult aspect of language description because, among other reasons, the inventory of modal meanings is not stable across languages, moods do not map neatly from one language to another, modality may be realised morphologically or by free-standing words, and modality interacts in complex ways with other modules of the grammar, like tense and aspect. Describing MOD is especially difficult if one attempts to develop a unified approach that not only provides cross-linguistic coverage, but is also useful in practical natural language processing systems. This article discusses an approach to MOD that was developed for and implemented in the Boas Knowledge-Elicitation (KE) system. Boas elicits knowledge about any language, L, from an informant who need not be a trained linguist. That knowledge then serves as the static resources for an L-to-English translation system. The KE methodology used throughout Boas is driven by a resident inventory of parameters, value sets, and means of their realisation for a wide range of language phenomena. MOD is one of those parameters, whose values are the inventory of attested and not yet attested moods (e.g. indicative, conditional, imperative), and whose realisations include flective morphology, agglutinating morphology, isolating morphology, words, phrases and constructions. Developing the MOD elicitation procedures for Boas amounted to wedding the extensive theoretical and descriptive research on MOD with practical approaches to guiding an untrained informant through this non-trivial task. We believe that our experience in building the MOD module of Boas offers insights not only into cross-linguistic aspects of MOD that have not previously been detailed in the natural language processing literature, but also into KE methodologies that could be applied more broadly.

Download Full-text

Parallel natural language processing on a semantic network array processor

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/69.390246 ◽

1995 ◽

Vol 7 (3) ◽

pp. 391-405 ◽

Cited By ~ 5

Author(s):

Minhwa Chung ◽

D.I. Moldovan

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Semantic Network ◽

Array Processor

Download Full-text

Inducing Relational Knowledge from BERT

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6242 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7456-7463 ◽

Cited By ~ 3

Author(s):

Zied Bouraoui ◽

Jose Camacho-Collados ◽

Steven Schockaert

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Language Model ◽

Language Models ◽

Word Embeddings ◽

Relational Knowledge ◽

Wide Range ◽

Fine Tune ◽

Standard Word

One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given relation, we first use a large text corpus to find sentences that are likely to express this relation. We then use a subset of these extracted sentences as templates. Finally, we fine-tune a language model to predict whether a given word pair is likely to be an instance of some relation, when given an instantiated template for that relation as input.

Download Full-text

Measuring and Improving Consistency in Pretrained Language Models

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00410 ◽

2021 ◽

Vol 9 ◽

pp. 1012-1031

Author(s):

Yanai Elazar ◽

Nora Kassner ◽

Shauli Ravfogel ◽

Abhilasha Ravichander ◽

Eduard Hovy ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Desirable Property ◽

Language Models ◽

Factual Knowledge ◽

High Quality ◽

Model Consistency

Abstract Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel🤘, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel🤘, we show that the consistency of all PLMs we experiment with is poor— though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.1

Download Full-text

Live Chat Analysis Using Machine Learning

International Academic Journal of Science and Engineering ◽

10.9756/iajse/v8i1/iajse0805 ◽

2021 ◽

Vol 8 (1) ◽

pp. 39-44

Author(s):

S. Kavibharathi ◽

S. Lakshmi Priyankaa ◽

M.S. Kaviya ◽

Dr.S. Vasanthi

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Recurrent Neural Networks ◽

Learning Model ◽

Live Chat ◽

Wide Range ◽

Depth Analysis

The World Wide Web such as social networking sites and blog comments forum has huge user comments emotion data from different social events and product brand and arguments in the form of political views. Generate a heap. Reflects the user's mood on the network, the reader, has a huge impact on product suppliers and politicians. The challenge for the credibility of the analysis is the lack of sufficient tag data in the Natural Language Processing (NLP) field. Positive and negative classify content based on user feedback, live chat, whether the user is used as the base for a wide range of tasks related to the text content of a meaningful assessment. Data collection, and function number for all variants. A recurrent neural network is very good text classification. Analyzing unstructured form from social media data, reasonable structure, and analyzes attach great importance to note for this emotion. Emotional rewiring can use natural language processing sentiment analysis to predict. In the method by the Recurrent Neural Networks (RNNs) of the proposed prediction chat live chat into sentiment analysis. Sentiment analysis and in-depth learning technology have been integrated into the solution to this problem, with their deep learning model automatic learning function is active. Using a Recurrent Neural Networks (RNNs) reputation analysis to solve various problems and language problems of text analysis and visualization product retrospective sentiment classifier cross-depth analysis of the learning model implementation.

Download Full-text

Natural Language Processing and Text Mining Approaches in Production Shortfalls Analytics: Methodology, Case-Study and Value in the North Sea

10.2118/205443-ms ◽

2021 ◽

Author(s):

Edgar Bernier ◽

Sebastien Perrier

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

North Sea ◽

Gas Production ◽

The North ◽

Wide Range ◽

The North Sea

Abstract Maximizing operational efficiency is a critical challenge in oil and gas production, particularly important for mature assets in the North Sea. The causes of production shortfalls are numerous, distributed across a wide range of disciplines, technical and non-technical causes. The primary reason to apply Natural Language Processing (NLP) and text mining on several years of shortfall history was the need to support efficiently the evaluation of digital transformation use-case screenings and value mapping exercises, through a proper mapping of the issues faced. Obviously, this mapping contributed as well to reflect on operational surveillance and maintenance strategies to reduce the production shortfalls. This paper presents a methodology where the historical records of descriptions, comments and results of investigation regarding production shortfalls are revisited, adding to existing shortfall classifications and statistics, in particular in two domains: richer first root-cause mapping, and a series of advanced visualizations and analytics. The methodology put in place uses natural-language pre-processing techniques, combined with keyword-based text-mining and classification techniques. The limitations associated to the size and quality of these language datasets will be described, and the results discussed, highlighting the value of reaching high level of data granularity while defeating the ‘more information, less attention’ bias. At the same time, visual designs are introduced to display efficiently the different dimensions of this data (impact, frequency evolution through time, location in term of field and affected systems, root causes and other cause-related categories). The ambition in the domain of visualization is to create User Experience-friendly shortfall analytics, that can be displayed in smart rooms and collaborative rooms, where display's efficiency is higher when user-interactions are kept minimal, number of charts is limited and multiple dimensions do not collide. The paper is based on several applications across the North Sea. This case study and the associated lessons learned regarding natural language processing and text mining applied to similar technical concise data are answering several frequently asked questions on the value of the textual data records gathered over years.

Download Full-text