Multilingual Information Access

Author(s):  
Víctor Peinado ◽  
Álvaro Rodrigo ◽  
Fernando López-Ostenero

This chapter focuses on Multilingual Information Access (MLIA), a multidisciplinary area that addresses the problems of accessing, querying, and retrieving information from heterogeneous information sources expressed in different languages. Current Information Retrieval technology, combined with Natural Language Processing tools, allows building systems able to efficiently retrieve relevant information and, to some extent, to provide concrete answers to questions expressed in natural language. Moreover, when linguistic resources and translation tools are available, cross-language information systems can help users find information in multiple languages. Nevertheless, little is still known about how to properly help people find and use information expressed in unknown languages. Approaches that have proved useful for automatic systems do not seem to match real users’ needs.

2016 ◽  
Vol 55 ◽  
pp. 1-15
Author(s):  
Marta R. Costa-jussà ◽  
Srinivas Bangalore ◽  
Patrik Lambert ◽  
Lluís Màrquez ◽  
Elena Montiel-Ponsoda

With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Fridah Katushemererwe ◽  
Andrew Caines ◽  
Paula Buttery

This paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First, therefore, we need to collect corpora for these languages before we can proceed to the design of a spell-checker, grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, we outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.


Author(s):  
Mario Jojoa Acosta ◽  
Gema Castillo-Sánchez ◽  
Begonya Garcia-Zapirain ◽  
Isabel de la Torre Díez ◽  
Manuel Franco-Martín

The use of artificial intelligence in health care has grown quickly. In this context, we present our work on applying Natural Language Processing techniques to analyze the sentiment of users who answered two questions from the CSQ-8 questionnaires in raw Spanish free text. Their responses are related to mindfulness, which is a novel technique used to control stress and anxiety caused by different factors in daily life. As such, we proposed an online course where this method was applied in order to improve the quality of life of health care professionals in COVID-19 pandemic times. We also carried out an evaluation of the satisfaction level of the participants involved, with a view to establishing strategies to improve future experiences. To automatically perform this task, we used Natural Language Processing (NLP) models such as swivel embedding, neural networks, and transfer learning, so as to classify the inputs into the following three categories: negative, neutral, and positive. Due to the limited amount of data available (86 records for the first question and 68 for the second), transfer learning techniques were required. The length of the text had no limit from the user’s standpoint, and our approach attained a maximum accuracy of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using computer graphic text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that applying NLP techniques to small amounts of data using transfer learning can achieve sufficient accuracy in the sentiment analysis and text classification stages.


2014 ◽  
Vol 40 (2) ◽  
pp. 469-510 ◽  
Author(s):  
Khaled Shaalan

As more and more Arabic textual information becomes available through the Web in homes and businesses, via Internet and Intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic languages family, make dealing with NER a challenge. The performance of an Arabic NER component affects the overall performance of the NLP system in a positive manner. This article attempts to describe and detail the recent increase in interest and progress made in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and the aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, a review of the state of the art of Arabic NER research is discussed. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.


Author(s):  
Sandeep Mathias ◽  
Diptesh Kanojia ◽  
Abhijit Mishra ◽  
Pushpak Bhattacharya

Gaze behaviour has been used as a way to gather cognitive information for a number of years. In this paper, we discuss the use of gaze behaviour in solving different tasks in natural language processing (NLP) without having to record it at test time. This is because the collection of gaze behaviour is a costly task, both in terms of time and money. Hence, in this paper, we focus on research done to alleviate the need for recording gaze behaviour at run time. We also mention different eye tracking corpora in multiple languages, which are currently available and can be used in natural language processing. We conclude our paper by discussing applications in a domain - education - and how learning gaze behaviour can help in solving the tasks of complex word identification and automatic essay grading.


2010 ◽  
Vol 1 (3) ◽  
pp. 1-19 ◽  
Author(s):  
Weisen Guo ◽  
Steven B. Kraines

To promote global knowledge sharing, one must address the problem that representing knowledge in diverse natural languages restricts effective knowledge sharing. Traditional knowledge sharing models are based on natural language processing (NLP) technologies. The ambiguity of natural language is a problem for NLP; however, semantic web technologies can circumvent the problem by enabling human authors to specify meaning in a computer-interpretable form. In this paper, the authors propose a cross-language semantic model (SEMCL) for knowledge sharing, which uses semantic web technologies to provide a potential solution to the problem of ambiguity. This model can also match knowledge descriptions in diverse languages. First, the methods used to support searches at the semantic predicate level are given, and the authors present a cross-language approach. Finally, an implementation of the model for the general engineering domain is discussed, and a scenario describing how the model implementation handles semantic cross-language knowledge sharing is given.


2018 ◽  
Vol 17 (03) ◽  
pp. 883-910 ◽  
Author(s):  
P. D. Mahendhiran ◽  
S. Kannimuthu

Contemporary research in Multimodal Sentiment Analysis (MSA) using deep learning is becoming popular in Natural Language Processing. Enormous amounts of data are obtainable every day from social media such as Facebook, WhatsApp, YouTube, Twitter and microblogs. With such large multimodal data, it is difficult to identify the relevant information from social media websites. Hence, there is a need for improved, intelligent MSA. Here, Deep Learning is used to improve the understanding and performance of MSA. Deep Learning delivers automatic feature extraction and helps achieve the best performance for the combined model, which integrates linguistic, acoustic and video information extraction methods. This paper focuses on the various techniques used for classifying a given portion of natural language text, audio and video according to the thoughts, feelings or opinions expressed in it, i.e., whether the general attitude is Neutral, Positive or Negative. The results indicate that the Deep Learning classification algorithm gives better results than other machine learning classifiers such as KNN, Naive Bayes, Random Forest, Random Tree and the Neural Net model. The proposed deep learning MSA identifies sentiment in web videos; in preliminary proof-of-concept experiments using the ICT-YouTube dataset, the proposed multimodal system achieves an accuracy of 96.07%.


Author(s):  
Małgorzata Wierzba ◽  
Monika Riegel ◽  
Jan Kocoń ◽  
Piotr Miłkowski ◽  
Arkadiusz Janz ◽  
...  

Emotion lexicons are useful in research across various disciplines, but the availability of such resources remains limited for most languages. While existing emotion lexicons typically comprise words, it is a particular meaning of a word (rather than the word itself) that conveys emotion. To mitigate this issue, we present the Emotion Meanings dataset, a novel dataset of 6000 Polish word meanings. The word meanings are derived from the Polish wordnet (plWordNet), a large semantic network interlinking words by means of lexical and conceptual relations. The word meanings were manually rated for valence and arousal, along with a variety of basic emotion categories (anger, disgust, fear, sadness, anticipation, happiness, surprise, and trust). The annotations were found to be highly reliable, as demonstrated by the similarity between data collected in two independent samples: unsupervised (n = 21,317) and supervised (n = 561). Although we found the annotations to be relatively stable for female, male, younger, and older participants, we share both summary data and individual data to enable emotion research on different demographically specific subgroups. The word meanings are further accompanied by the relevant metadata, derived from open-source linguistic resources. Direct mapping to Princeton WordNet makes the dataset suitable for research on multiple languages. Altogether, this dataset provides a versatile resource that can be employed for emotion research in psychology, cognitive science, psycholinguistics, computational linguistics, and natural language processing.


Author(s):  
Ricardo Colomo-Palacios ◽  
Marcos Ruano-Mayoral ◽  
Pedro Soto-Acosta ◽  
Ángel García-Crespo

In current organizations, the importance of knowledge and competence is unquestionable. In Information Technology (IT) companies, which are, by definition, knowledge intensive, this importance is critical. In such organizations, the models of knowledge exploitation include specific processes and elements that drive the production of knowledge aimed at satisfying organizational objectives. However, collecting competence evidence is a highly intensive and time-consuming task, and it is the key point this system addresses. SeCEC-IT is a tool based on software artifacts that extracts relevant information using natural language processing techniques. It enables competence evidence detection by deducing competence facts from documents in an automated way. The technological components of SeCEC-IT include semantic technologies, natural language processing, and human resource communication standards (HR-XML).

