natural language processing task
Recently Published Documents


TOTAL DOCUMENTS: 12 (FIVE YEARS: 8)

H-INDEX: 2 (FIVE YEARS: 1)

2021 ◽  
Vol 11 (21) ◽  
pp. 10267
Author(s):  
Puri Phakmongkol ◽  
Peerapon Vateekul

Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a given question. Most QA research to date has targeted English, which offers abundant labeled resources, whereas Thai has few labeled corpora for QA. According to previous studies, English QA models can achieve F1 scores above 90%, while our baseline Thai QA model reaches only 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), combined with Thai-specific data preprocessing. With this method, we synthesize more than 100,000 question-answer pairs from Thai Wikipedia articles. Using the synthesized data, we investigate several fine-tuning strategies to maximize model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) or word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models (RoBERTa and mT5) on both datasets.
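The syllable-level F1 measure mentioned above is the standard token-overlap F1 computed over syllables rather than words; a minimal sketch, assuming the predicted and gold answers are already segmented into syllables (the segmenter itself, e.g. a Thai syllable tokenizer, is not shown):

```python
from collections import Counter

def f1_score(pred_tokens, gold_tokens):
    """Token-overlap F1; tokens may be words or syllables."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Toy example with pre-segmented syllables (hypothetical segmentation):
pred = ["กรุง", "เทพ"]
gold = ["กรุง", "เทพ", "มหา", "นคร"]
print(round(f1_score(pred, gold), 2))  # 0.67
```

Scoring over syllables rather than words makes the measure less sensitive to word-segmentation disagreements, which is why it suits Thai, where word boundaries are not marked in writing.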


2021 ◽  
Vol 7 ◽  
pp. e598
Author(s):  
Wenjie Yin ◽  
Arkaitz Zubiaga

Hate speech is a type of harmful online content that directly attacks or promotes hate towards a group or an individual based on actual or perceived aspects of identity, such as ethnicity, religion, and sexual orientation. With online hate speech on the rise, its automatic detection as a natural language processing task is gaining increasing interest. However, it has only recently been shown that existing models generalise poorly to unseen data. This survey summarises how well existing hate speech detection models generalise, examines why they struggle to do so, reviews existing attempts at addressing the main obstacles, and proposes directions for future research to improve generalisation in hate speech detection.


2021 ◽  
Vol 11 (3) ◽  
pp. 1090
Author(s):  
Miguel A. Alonso ◽  
Carlos Gómez-Rodríguez ◽  
Jesús Vilares

Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential for extracting knowledge from texts in multiple domains, ranging from financial to medical. Intuitively, the structure of a text can help determine whether a certain portion of it is an entity and, if so, establish its exact boundaries. However, parsing has been relatively little used in NER systems, most of which rely on shallow approaches to text. In this work, we study the characteristics of NER, a task that is far from solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.
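Casting a task as sequence labeling, as the last point proposes for parsing, follows the same pattern NER itself commonly uses with BIO tags; a minimal sketch of decoding entity spans from a BIO-tagged token sequence (the tag set and example sentence are illustrative):

```python
def decode_bio(tokens, tags):
    """Extract (entity_text, entity_type) spans from BIO tags."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(token)         # continue the open entity
        else:                             # "O" or an inconsistent "I-" tag
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:                           # flush an entity ending the sequence
        entities.append((" ".join(current), etype))
    return entities

tokens = ["Acme", "Corp", "hired", "Jane", "Doe"]
tags   = ["B-ORG", "I-ORG", "O", "B-PER", "I-PER"]
print(decode_bio(tokens, tags))  # [('Acme Corp', 'ORG'), ('Jane Doe', 'PER')]
```

In a parsing-as-sequence-labeling setup, the per-token labels would instead encode syntactic attachment information, but the decoding machinery is analogous.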


Author(s):  
Praveen Gujjar J ◽  
Prasanna Kumar H R

Advances in web technology have made an enormous amount of data available to internet users. These users provide feedback, comments, suggestions, and opinions about products and services on the web, and analyzing this user-generated data is essential for business decision making. TextBlob is a simple Python library API for performing common natural language processing tasks. This paper proposes a method for analyzing customer opinions using TextBlob to support decision making, and presents results for such data obtained with the TextBlob API in Python. The paper discusses the advantages of the proposed technique and concludes with the challenges marketers face when using it in their decision making.
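In TextBlob itself, polarity is obtained with `TextBlob(text).sentiment.polarity`, which averages scores from its built-in pattern lexicon. A toy pure-Python analogue of that lexicon-averaging idea (the lexicon and its scores below are invented for illustration and are not TextBlob's):

```python
# Toy analogue of lexicon-based polarity scoring. TextBlob averages
# per-word polarity values from its pattern lexicon in a similar spirit.
LEXICON = {"great": 0.8, "good": 0.7, "bad": -0.7, "terrible": -1.0}

def polarity(text):
    """Average the polarity of known words; 0.0 if none are known."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(round(polarity("great product but terrible support"), 2))  # -0.1
```

A score near +1 indicates positive opinion, near -1 negative, and 0 neutral, which is how such polarity values feed into the decision-making analysis the paper describes.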


2019 ◽  
Vol 7 ◽  
pp. 581-596
Author(s):  
Yumo Xu ◽  
Mirella Lapata

In this paper we introduce domain detection as a new natural language processing task. We argue that the ability to detect textual segments that are domain-heavy (i.e., sentences or phrases that are representative of and provide evidence for a given domain) could enhance the robustness and portability of various text classification applications. We propose an encoder-detector framework for domain detection and bootstrap classifiers with multiple instance learning. The model is hierarchically organized and suited to multilabel classification. We demonstrate that despite learning with minimal supervision, our model can be applied to text spans of different granularities, languages, and genres. We also showcase the potential of domain detection for text summarization.


Author(s):  
Shizhe Chen ◽  
Qin Jin ◽  
Alexander Hauptmann

Bilingual lexicon induction, translating words from a source language to a target language, is a long-standing natural language processing task. Recent endeavors show that it is promising to use images as a pivot to learn lexicon induction without relying on parallel corpora. However, these vision-based approaches simply associate words with entire images, so they are limited to translating concrete words and require object-centered images. Humans understand words better when they appear in context within a sentence. Therefore, in this paper, we propose to utilize images and their associated captions to address the limitations of previous approaches. We propose a multilingual caption model trained on different monolingual multimodal data to map words in different languages into joint spaces. Two types of word representation are induced from the multilingual caption model: linguistic features and localized visual features. The linguistic feature is learned from sentence contexts with visual semantic constraints, which is beneficial for learning translations of words that are less visually relevant. The localized visual feature attends to the region in the image that corresponds to the word, alleviating the restriction to images with salient visual representations. The two types of features are complementary for word translation. Experimental results on multiple language pairs demonstrate the effectiveness of the proposed method, which substantially outperforms previous vision-based approaches without using any parallel sentences or supervision from seed word pairs.


Author(s):  
Tianlin Liu ◽  
Lyle Ungar ◽  
João Sedoc

Word vectors are at the core of many natural language processing tasks. Recently, there has been interest in post-processing word vectors to enrich their semantic information. In this paper, we introduce a novel word vector post-processing technique based on matrix conceptors (Jaeger 2014), a family of regularized identity maps. More concretely, we propose to use conceptors to suppress latent features of word vectors that have high variance. The proposed method is purely unsupervised: it does not rely on any corpus or external linguistic database. We evaluate the post-processed word vectors on a battery of intrinsic lexical evaluation tasks, showing that the proposed method consistently outperforms existing state-of-the-art alternatives. We also show that post-processed word vectors can be used for the downstream natural language processing task of dialogue state tracking, yielding improved results across different dialogue domains.
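The conceptor-based suppression described above can be sketched as follows: from the correlation matrix R of the word vectors, the conceptor is C = R(R + α⁻²I)⁻¹, and multiplying vectors by (I − C) damps high-variance latent directions. A minimal NumPy sketch on toy data (the choice of α and the synthetic vectors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "word vectors": much more variance along the first axis than the second.
X = rng.normal(size=(1000, 2)) * np.array([10.0, 1.0])

alpha = 1.0
R = X.T @ X / len(X)                               # correlation matrix
C = R @ np.linalg.inv(R + alpha**-2 * np.eye(2))   # conceptor (Jaeger 2014)
X_post = X @ (np.eye(2) - C).T                     # suppress high-variance directions

# The dominant direction is damped far more strongly than the weak one.
print(X.var(axis=0))
print(X_post.var(axis=0))
```

Because C soft-projects onto the high-variance subspace of the data, I − C acts as its "negation", which is why the procedure needs no corpus or lexical resource beyond the vectors themselves.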


Author(s):  
Juan Manuel Adán Coello ◽  
Armando Dalla Costa Neto

Sentiment analysis of texts posted on Twitter is a natural language processing task whose importance has grown along with the increase in the number of users of the platform and the interest of organizations in the opinions of their employees, customers and users. Although Brazil has the sixth-largest base of active Twitter users in the world and Portuguese is the seventh most spoken language, with 221 million speakers (200 million of them living in Brazil), the number of articles that discuss sentiment analysis approaches for Brazilian Portuguese is a small fraction of those that focus on English. Moreover, few works use deep learning for this task compared with other machine learning and lexicon-based methods. In this context, the work described in this article addresses the problem using Convolutional Neural Networks (CNN). The paper presents the results of an experimental evaluation showing that a CNN with a relatively simple architecture can perform much better than a previous approach that uses ensembles of other machine learning classifiers combined with text preprocessing heuristics.


2018 ◽  
pp. 136-142
Author(s):  
Surbhi Bhatia ◽  
Manisha Sharma ◽  
Komal Kumar Bhatia ◽  
Pragyaditya Das

Social networks have greatly increased the demand for text mining. Opinions express views, and reviews provide information about how a product is perceived. Thousands of reviews may be available online, so making the right decision when selecting a product becomes a tedious task. Several research works have been proposed in the past, but they were limited by the issues discussed in this paper. We propose a dynamic system based on features extracted using an ontology, followed by classification; classifying information from such text is highly challenging. Specifically, we propose a novel method of extracting aspects using an ontology and then categorizing sentiments into positive, negative and neutral categories using a supervised learning technique. Opinion mining is a natural language processing task that mines information from various text forums and classifies it by polarity as positive, negative or neutral. In this paper, we demonstrate machine learning algorithms using the WEKA tool, and efficiency is evaluated using information retrieval search strategies.

