Emotion norms for 6000 Polish word meanings with a direct mapping to the Polish wordnet

Author(s):  
Małgorzata Wierzba
Monika Riegel
Jan Kocoń
Piotr Miłkowski
Arkadiusz Janz
...

Abstract: Emotion lexicons are useful in research across various disciplines, but the availability of such resources remains limited for most languages. While existing emotion lexicons typically comprise words, it is a particular meaning of a word (rather than the word itself) that conveys emotion. To mitigate this issue, we present the Emotion Meanings dataset, a novel dataset of 6000 Polish word meanings. The word meanings are derived from the Polish wordnet (plWordNet), a large semantic network interlinking words by means of lexical and conceptual relations. The word meanings were manually rated for valence and arousal, along with a variety of basic emotion categories (anger, disgust, fear, sadness, anticipation, happiness, surprise, and trust). The annotations were found to be highly reliable, as demonstrated by the similarity between data collected in two independent samples: unsupervised (n = 21,317) and supervised (n = 561). Although we found the annotations to be relatively stable for female, male, younger, and older participants, we share both summary data and individual data to enable emotion research on different demographically specific subgroups. The word meanings are further accompanied by the relevant metadata, derived from open-source linguistic resources. Direct mapping to Princeton WordNet makes the dataset suitable for research on multiple languages. Altogether, this dataset provides a versatile resource that can be employed for emotion research in psychology, cognitive science, psycholinguistics, computational linguistics, and natural language processing.
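As a concrete illustration of how such per-sense norms might be queried, here is a minimal Python sketch. The file name and column names (synset_id, valence, arousal, fear) are assumptions made for illustration, not the dataset's documented schema.

```python
import csv

# Load per-sense emotion norms keyed by wordnet sense identifier.
# Column names below are hypothetical, not the published schema.
def load_norms(path):
    with open(path, encoding="utf-8") as f:
        return {row["synset_id"]: row for row in csv.DictReader(f)}

norms = load_norms("emotion_meanings.csv")  # hypothetical filename
sense = norms.get("plWN-12345")             # hypothetical plWordNet sense id
if sense:
    print(sense["valence"], sense["arousal"], sense["fear"])
```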

2020
Author(s):  
Mario Crespo Miguel

Computational linguistics is the scientific study of language from a computational perspective. Its aim is to provide computational models of natural language processing (NLP) and incorporate them into practical applications such as speech synthesis, speech recognition, automatic translation, and many others where automatic processing of language is required. The use of good linguistic resources is crucial for the development of computational linguistics systems. Real-world applications need resources that systematize the way linguistic information is structured in a certain language. There is a continuous effort to increase the number of linguistic resources available to the linguistics and NLP community. Most existing linguistic resources have been created for English, mainly because most modern approaches to computational lexical semantics emerged in the United States. This situation is changing over time, and some of these projects have subsequently been extended to other languages; in all cases, however, much time and effort must be invested in creating such resources. Because of this, one of the main purposes of this work is to investigate the possibility of extending these resources to other languages such as Spanish. In this work, we introduce some of the most important resources devoted to lexical semantics, such as WordNet or FrameNet, and those focusing on Spanish, such as 3LB-LEX or Adesse. Of these, this project focuses on FrameNet, which aims to document the range of semantic and syntactic combinatorial possibilities of words in English. Words are grouped according to the different frames, or situations, evoked by their meaning. If we focus on a particular topic domain like medicine and try to describe it in terms of FrameNet, we would probably obtain frames such as CURE, formed by words like cure.v, heal.v, or palliative.a, or MEDICAL CONDITIONS, with lexical units such as arthritis.n, asphyxia.n, or asthma.n. The purpose of this work is to develop an automatic means of selecting frames from a particular domain and to translate them into Spanish. As stated, we will focus on medicine. The selection of the medical frames will be corpus-based; that is, we will extract all the frames that are statistically significant in a representative corpus. We will discuss why a corpus-based approach is a reliable and unbiased way of dealing with this task. We will present an automatic method for the selection of FrameNet frames and, in order to make sure that the results obtained are coherent, contrast them with a previous manual selection, or benchmark. Outcomes will be analysed using the F-score, a measure widely used in this type of application. We obtained a 0.87 F-score against our benchmark, which demonstrates the applicability of this type of automatic approach. The second part of the book is devoted to the translation of this selection into Spanish. The translation will be made using EuroWordNet, an extension of the Princeton WordNet for several European languages. We will explore different ways of linking the units of our medical FrameNet selection to a particular WordNet synset, a set of words with similar meanings. Matching the frame units to a specific synset in EuroWordNet allows us both to translate them into Spanish and to add new terms provided by WordNet into FrameNet. The results show that the translation can be done quite accurately (95.6%).
We hope this work can add new insight into the field of natural language processing.
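The F-score validation step described above is straightforward to reproduce in outline. A minimal sketch follows; CURE and MEDICAL CONDITIONS come from the abstract, while the remaining frame names are invented for the example.

```python
# Compare an automatic frame selection against a manual benchmark with the
# F-score: the harmonic mean of precision and recall over selected frames.
def f_score(selected, benchmark):
    selected, benchmark = set(selected), set(benchmark)
    tp = len(selected & benchmark)          # frames found in both selections
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(benchmark)
    return 2 * precision * recall / (precision + recall)

auto = ["CURE", "MEDICAL CONDITIONS", "OBSERVABLE BODY PARTS", "COMMERCE BUY"]
manual = ["CURE", "MEDICAL CONDITIONS", "OBSERVABLE BODY PARTS", "RECOVERY"]
print(f_score(auto, manual))  # 0.75 on this toy example
```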


Author(s):  
Víctor Peinado
Álvaro Rodrigo
Fernando López-Ostenero

This chapter focuses on Multilingual Information Access (MLIA), a multidisciplinary area that aims to solve the problem of accessing, querying, and retrieving information from heterogeneous information sources expressed in different languages. Current Information Retrieval technology, combined with Natural Language Processing tools, makes it possible to build systems that efficiently retrieve relevant information and, to some extent, provide concrete answers to questions expressed in natural language. Moreover, when linguistic resources and translation tools are available, cross-language information systems can help users find information in multiple languages. Nevertheless, little is known about how best to help people find and use information expressed in languages they do not know. Approaches that have proved useful for automatic systems often fail to match real users' needs.
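A toy sketch of the basic cross-language access idea discussed here, with query translation followed by retrieval, is given below. The Spanish-English dictionary and the documents are invented for illustration; real systems use statistical translation and ranked retrieval rather than simple term overlap.

```python
# Dictionary-based cross-language retrieval sketch: translate the query word
# by word, then score documents by how many query terms they contain.
es_to_en = {"gripe": "flu", "tratamiento": "treatment", "fiebre": "fever"}

docs = {
    "d1": "flu treatment guidelines for adults",
    "d2": "fever is a common symptom of infection",
    "d3": "annual report on grain exports",
}

def clir(query_es):
    query_en = [es_to_en.get(w, w) for w in query_es.lower().split()]
    scores = {
        doc_id: sum(term in text.split() for term in query_en)
        for doc_id, text in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(clir("tratamiento gripe"))  # d1 should rank first
```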


2021
Vol 8 (1)
Author(s):  
Ellie Pavlick

Deep learning has recently come to dominate computational linguistics, leading to claims of human-level performance in a range of language processing tasks. Like much previous computational work, deep learning–based linguistic representations adhere to the distributional meaning-in-use hypothesis, deriving semantic representations from word co-occurrence statistics. However, current deep learning methods entail fundamentally new models of lexical and compositional meaning that are ripe for theoretical analysis. Whereas traditional distributional semantics models take a bottom-up approach in which sentence meaning is characterized by explicit composition functions applied to word meanings, new approaches take a top-down approach in which sentence representations are treated as primary and representations of words and syntax are viewed as emergent. This article summarizes our current understanding of how well such representations capture lexical semantics, world knowledge, and composition. The goal is to foster increased collaboration on testing the implications of such representations as general-purpose models of semantics.
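The contrast the article draws between bottom-up composition and top-down emergent representations can be made concrete. Below is a minimal sketch assuming the Hugging Face transformers library and bert-base-uncased as an illustrative model; neither is mentioned in the article.

```python
# Contrast two routes to a sentence representation:
#  - bottom-up: explicitly compose (here: average) context-free word vectors;
#  - top-down: take the full model's contextual states, where word
#    representations are emergent, shaped by the whole sentence.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tok("the cat sat on the mat", return_tensors="pt")

with torch.no_grad():
    # Bottom-up: static input embeddings, averaged over tokens.
    static = model.get_input_embeddings()(inputs["input_ids"])
    bottom_up = static.mean(dim=1)
    # Top-down: contextual hidden states from the full network.
    top_down = model(**inputs).last_hidden_state.mean(dim=1)

print(bottom_up.shape, top_down.shape)  # both: torch.Size([1, 768])
```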


Author(s):  
Roberto Navigli
Michele Bevilacqua
Simone Conia
Dario Montagnini
Francesco Cecconi

The intelligent manipulation of symbolic knowledge has been a long-sought goal of AI. However, when it comes to Natural Language Processing (NLP), symbols have to be mapped to words and phrases, which are not only ambiguous but also language-specific: multilinguality is indeed a desirable property for NLP systems, and one which enables the generalization of tasks where multiple languages need to be dealt with, without translating text. In this paper we survey BabelNet, a popular wide-coverage lexical-semantic knowledge resource obtained by merging heterogeneous sources into a unified semantic network that helps to scale tasks and applications to hundreds of languages. Over its ten years of existence, thanks to its promise to interconnect languages and resources in structured form, BabelNet has been employed in countless ways and directions. We first introduce the BabelNet model, its components and statistics, and then overview its successful use in a wide range of tasks in NLP as well as in other fields of AI.
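For readers who want to try BabelNet programmatically, a hedged sketch of a lemma lookup against its public HTTP API follows. The endpoint version (v5) and parameter names reflect the public documentation at the time of writing and may have changed; a registered API key from babelnet.org is required.

```python
# Query BabelNet's HTTP API for the synset identifiers of a lemma.
# Endpoint and parameters are assumptions based on the public docs.
import requests

KEY = "YOUR_BABELNET_KEY"  # placeholder: obtain a key at babelnet.org

def synset_ids(lemma, lang="EN"):
    resp = requests.get(
        "https://babelnet.io/v5/getSynsetIds",
        params={"lemma": lemma, "searchLang": lang, "key": KEY},
        timeout=10,
    )
    resp.raise_for_status()
    return [entry["id"] for entry in resp.json()]

print(synset_ids("BabelNet"))  # e.g. ['bn:03083790n', ...]
```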


Author(s):  
Zahra Mousavi
Heshaam Faili

Nowadays, wordnets are extensively used as a major resource in natural language processing and information retrieval tasks, so the accuracy of a wordnet has a direct influence on the performance of the applications that use it. This paper presents a fully automated method for extending a previously developed Persian wordnet with more comprehensive and accurate verbal entries. First, using a bilingual dictionary, some Persian verbs are linked to Princeton WordNet (PWN) synsets. We propose a feature set related to the semantic behavior of compound verbs, which constitute the majority of Persian verbs, and employ it in a supervised classification system to select the proper links for inclusion in the wordnet. We also use a pre-existing Persian wordnet, FarsNet, together with a similarity-based method, to produce a training set. The result is the largest automatically developed Persian wordnet, with more than 27,000 words, 28,000 PWN synsets, and 67,000 word-sense pairs; it substantially outperforms the previous Persian wordnet, which has about 16,000 words, 22,000 PWN synsets, and 38,000 word-sense pairs.
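The link-selection step can be pictured as a standard binary classification problem. Below is a simplified sketch assuming scikit-learn; the features and numbers are invented stand-ins for the semantic features the paper describes, not its actual feature set.

```python
# Each candidate (Persian verb, PWN synset) pair becomes a feature vector;
# a binary classifier decides whether to keep the link in the wordnet.
from sklearn.linear_model import LogisticRegression

# Toy features per candidate link, e.g. [translation score, gloss overlap,
# FarsNet similarity]; labels: 1 = correct link, 0 = spurious.
X_train = [[0.9, 0.6, 0.8], [0.2, 0.1, 0.3], [0.7, 0.5, 0.6], [0.1, 0.2, 0.1]]
y_train = [1, 0, 1, 0]

clf = LogisticRegression().fit(X_train, y_train)

candidates = [[0.8, 0.7, 0.9], [0.15, 0.05, 0.2]]
print(clf.predict(candidates))  # [1 0]: only the first link is included
```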


2020
Vol 0 (0)
Author(s):  
Fridah Katushemererwe
Andrew Caines
Paula Buttery

Abstract: This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First, therefore, we need to collect corpora for these languages before we can proceed to the design of a spell-checker, grammar-checker, and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.
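One of the simplest corpus-based tools of the kind described, a spell-checker that suggests the closest attested form, can be sketched as follows. The word forms in the toy wordlist are illustrative placeholders rather than verified Runyankore–Rukiga vocabulary.

```python
# Flag tokens absent from a corpus-derived wordlist and suggest the closest
# attested forms by string similarity.
from difflib import get_close_matches

corpus_wordlist = {"omuntu", "abantu", "okusoma", "ekitabo"}  # toy wordlist

def check(token):
    if token in corpus_wordlist:
        return token, []                      # attested: no suggestions
    return token, get_close_matches(token, corpus_wordlist, n=3, cutoff=0.6)

print(check("okusma"))  # ('okusma', ['okusoma'])
```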


2020
Vol 31 (1)
pp. 62-76
Author(s):  
Olessia Jouravlev
Zachary Mineroff
Idan A Blank
Evelina Fedorenko

Abstract: Acquiring a foreign language is challenging for many adults. Yet certain individuals choose to acquire many languages, sometimes dozens, and often just for fun. Is there something special about the minds and brains of such polyglots? Using robust individual-level markers of language activity, measured with fMRI, we compared native language processing in polyglots versus matched controls. Polyglots (n = 17, including nine "hyper-polyglots" with proficiency in 10–55 languages) used fewer neural resources to process language: their activations were smaller in both magnitude and extent. This difference was spatially and functionally selective: the groups were similar in their activation of two other brain networks, the multiple demand network and the default mode network. We hypothesize that the activation reduction in the language network is experientially driven, such that the acquisition and use of multiple languages makes language processing generally more efficient. However, genetic and longitudinal studies will be critical to distinguish this hypothesis from the one whereby polyglots' brains already differ at birth or early in development. This initial characterization of polyglots' language network opens the door to future investigations of the cognitive and neural architecture of individuals who gain mastery of multiple languages, including changes in this architecture with linguistic experience.
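The group comparison at the heart of the study has a familiar statistical shape. A schematic sketch follows; all numbers are invented placeholders to show the form of the analysis, not the study's data.

```python
# Compare language-network activation magnitudes between two groups with a
# two-sample (Welch) t-test. Values are fabricated for illustration only.
from scipy.stats import ttest_ind

polyglots = [1.1, 0.9, 1.3, 0.8, 1.0]   # toy per-subject effect sizes
controls  = [1.6, 1.4, 1.8, 1.5, 1.7]

t, p = ttest_ind(polyglots, controls, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")       # smaller activations in polyglots
```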


2014
Vol 40 (2)
pp. 469-510
Author(s):  
Khaled Shaalan

As more and more Arabic textual information becomes available through the Web, in homes and businesses, via Internet and intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic language family, make NER a challenge. The quality of an Arabic NER component directly affects the overall performance of any NLP system that incorporates it. This article describes the recent increase in interest and the progress made in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and the aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in the Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, the state of the art in Arabic NER research is reviewed. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.
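The standard evaluation metrics mentioned here, entity-level precision, recall, and F1, can be computed in a few lines. In the sketch below, entities are (start, end, type) spans; the gold and predicted spans are invented for illustration.

```python
# Entity-level precision/recall/F1: a prediction counts as correct only if
# both its span boundaries and its type exactly match a gold entity.
def ner_prf(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {(0, 2, "PER"), (5, 6, "LOC")}
pred = {(0, 2, "PER"), (5, 6, "ORG")}  # second entity mistyped
print(ner_prf(gold, pred))  # (0.5, 0.5, 0.5)
```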

