natural language processing task
Recently Published Documents


TOTAL DOCUMENTS: 12 (FIVE YEARS: 8)

H-INDEX: 2 (FIVE YEARS: 1)

2021 ◽  
Vol 11 (21) ◽  
pp. 10267
Author(s):  
Puri Phakmongkol ◽  
Peerapon Vateekul

Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a given question. Most QA research to date has targeted English, which offers abundant labeled resources, whereas Thai has few labeled corpora for QA. According to previous studies, English QA models can achieve F1 scores above 90%, while our baseline Thai QA model reaches only 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), combined with Thai-specific data preprocessing. With this method, we synthesize more than 100,000 question-answer pairs from Thai Wikipedia articles. Using the synthesized data, we investigate several fine-tuning strategies to maximize model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) or word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models (RoBERTa and mT5) on both datasets.
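The syllable-level F1 measure mentioned above is the standard token-overlap F1 computed over syllables rather than words; a minimal sketch, assuming the predicted and gold answers are already segmented into syllables (the segmenter itself, e.g. a Thai syllable tokenizer, is not shown):

```python
from collections import Counter

def f1_score(pred_tokens, gold_tokens):
    """Token-overlap F1; tokens may be words or syllables."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Toy example with pre-segmented syllables (hypothetical segmentation):
pred = ["กรุง", "เทพ"]
gold = ["กรุง", "เทพ", "มหา", "นคร"]
print(round(f1_score(pred, gold), 2))  # 0.67
```

Scoring over syllables rather than words makes the measure less sensitive to word-segmentation disagreements, which is why it suits Thai, where word boundaries are not marked in writing.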


2021 ◽  
Vol 7 ◽  
pp. e598
Author(s):  
Wenjie Yin ◽  
Arkaitz Zubiaga

Hate speech is a type of harmful online content that directly attacks or promotes hate towards a group or an individual based on actual or perceived aspects of identity, such as ethnicity, religion, and sexual orientation. With online hate speech on the rise, its automatic detection as a natural language processing task is gaining increasing interest. However, it has only recently been shown that existing models generalise poorly to unseen data. This survey summarises how well existing hate speech detection models generalise, examines why they struggle to do so, reviews existing attempts at addressing the main obstacles, and proposes directions for future research to improve generalisation in hate speech detection.


2021 ◽  
Vol 11 (3) ◽  
pp. 1090
Author(s):  
Miguel A. Alonso ◽  
Carlos Gómez-Rodríguez ◽  
Jesús Vilares

Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential for extracting knowledge from texts in multiple domains, ranging from financial to medical. Intuitively, the structure of a text can help determine whether a certain portion of it is an entity and, if so, establish its exact boundaries. However, parsing has been relatively little used in NER systems, most of which rely on shallow approaches to text. In this work, we study the characteristics of NER, a task that is far from solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.
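Casting a task as sequence labeling, as the last point proposes for parsing, follows the same pattern NER itself commonly uses with BIO tags; a minimal sketch of decoding entity spans from a BIO-tagged token sequence (the tag set and example sentence are illustrative):

```python
def decode_bio(tokens, tags):
    """Extract (entity_text, entity_type) spans from BIO tags."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(token)         # continue the open entity
        else:                             # "O" or an inconsistent "I-" tag
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:                           # flush an entity ending the sequence
        entities.append((" ".join(current), etype))
    return entities

tokens = ["Acme", "Corp", "hired", "Jane", "Doe"]
tags   = ["B-ORG", "I-ORG", "O", "B-PER", "I-PER"]
print(decode_bio(tokens, tags))  # [('Acme Corp', 'ORG'), ('Jane Doe', 'PER')]
```

In a parsing-as-sequence-labeling setup, the per-token labels would instead encode syntactic attachment information, but the decoding machinery is analogous.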


Author(s):  
Praveen Gujjar J ◽  
Prasanna Kumar H R

Advances in web technology have made an enormous amount of data available to internet users. These users provide feedback, comments, suggestions, and opinions about products and services on the web, and analyzing this user-generated data is essential for business decision making. TextBlob is a simple Python library API for performing common natural language processing tasks. This paper proposes a method for analyzing customer opinions using TextBlob to support decision making, and presents results for such data obtained with the TextBlob API in Python. The paper discusses the advantages of the proposed technique and concludes with the challenges marketers face when using it in their decision making.
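In TextBlob itself, polarity is obtained with `TextBlob(text).sentiment.polarity`, which averages scores from its built-in pattern lexicon. A toy pure-Python analogue of that lexicon-averaging idea (the lexicon and its scores below are invented for illustration and are not TextBlob's):

```python
# Toy analogue of lexicon-based polarity scoring. TextBlob averages
# per-word polarity values from its pattern lexicon in a similar spirit.
LEXICON = {"great": 0.8, "good": 0.7, "bad": -0.7, "terrible": -1.0}

def polarity(text):
    """Average the polarity of known words; 0.0 if none are known."""
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(round(polarity("great product but terrible support"), 2))  # -0.1
```

A score near +1 indicates positive opinion, near -1 negative, and 0 neutral, which is how such polarity values feed into the decision-making analysis the paper describes.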


2019 ◽  
Vol 7 ◽  
pp. 581-596
Author(s):  
Yumo Xu ◽  
Mirella Lapata

In this paper we introduce domain detection as a new natural language processing task. We argue that the ability to detect textual segments that are domain-heavy (i.e., sentences or phrases that are representative of and provide evidence for a given domain) could enhance the robustness and portability of various text classification applications. We propose an encoder-detector framework for domain detection and bootstrap classifiers with multiple instance learning. The model is hierarchically organized and suited to multilabel classification. We demonstrate that despite learning with minimal supervision, our model can be applied to text spans of different granularities, languages, and genres. We also showcase the potential of domain detection for text summarization.


Author(s):  
Shizhe Chen ◽  
Qin Jin ◽  
Alexander Hauptmann

Bilingual lexicon induction, translating words from a source language to a target language, is a long-standing natural language processing task. Recent endeavors show that it is promising to use images as a pivot to learn lexicon induction without relying on parallel corpora. However, these vision-based approaches simply associate words with entire images, so they are limited to translating concrete words and require object-centered images. Humans understand words better when they appear in context within a sentence. Therefore, in this paper, we propose to utilize images and their associated captions to address the limitations of previous approaches. We propose a multilingual caption model trained on different monolingual multimodal data to map words in different languages into joint spaces. Two types of word representation are induced from the multilingual caption model: linguistic features and localized visual features. The linguistic feature is learned from sentence contexts with visual semantic constraints, which is beneficial for learning translations of words that are less visually relevant. The localized visual feature attends to the region in the image that corresponds to the word, alleviating the restriction to images with salient visual representations. The two types of features are complementary for word translation. Experimental results on multiple language pairs demonstrate the effectiveness of the proposed method, which substantially outperforms previous vision-based approaches without using any parallel sentences or supervision from seed word pairs.


Author(s):  
Tianlin Liu ◽  
Lyle Ungar ◽  
João Sedoc

Word vectors are at the core of many natural language processing tasks. Recently, there has been interest in post-processing word vectors to enrich their semantic information. In this paper, we introduce a novel word vector post-processing technique based on matrix conceptors (Jaeger 2014), a family of regularized identity maps. More concretely, we propose to use conceptors to suppress latent features of word vectors that have high variance. The proposed method is purely unsupervised: it does not rely on any corpus or external linguistic database. We evaluate the post-processed word vectors on a battery of intrinsic lexical evaluation tasks, showing that the proposed method consistently outperforms existing state-of-the-art alternatives. We also show that post-processed word vectors can be used for the downstream natural language processing task of dialogue state tracking, yielding improved results across different dialogue domains.
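The conceptor-based suppression described above can be sketched as follows: from the correlation matrix R of the word vectors, the conceptor is C = R(R + α⁻²I)⁻¹, and multiplying vectors by (I − C) damps high-variance latent directions. A minimal NumPy sketch on toy data (the choice of α and the synthetic vectors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "word vectors": much more variance along the first axis than the second.
X = rng.normal(size=(1000, 2)) * np.array([10.0, 1.0])

alpha = 1.0
R = X.T @ X / len(X)                               # correlation matrix
C = R @ np.linalg.inv(R + alpha**-2 * np.eye(2))   # conceptor (Jaeger 2014)
X_post = X @ (np.eye(2) - C).T                     # suppress high-variance directions

# The dominant direction is damped far more strongly than the weak one.
print(X.var(axis=0))
print(X_post.var(axis=0))
```

Because C soft-projects onto the high-variance subspace of the data, I − C acts as its "negation", which is why the procedure needs no corpus or lexical resource beyond the vectors themselves.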


Author(s):  
Juan Manuel Adán Coello ◽  
Armando Dalla Costa Neto

Sentiment analysis of texts posted on Twitter is a natural language processing task whose importance has grown along with the increase in the number of users of the platform and the interest of organizations in the opinions of their employees, customers and users. Although Brazil has the sixth-largest base of active Twitter users in the world and Portuguese is the seventh most spoken language, with 221 million speakers (200 million of them living in Brazil), the number of articles that discuss sentiment analysis approaches for Brazilian Portuguese is a small fraction of those that focus on English. Moreover, few works use deep learning for this task compared with other machine learning and lexicon-based methods. In this context, the work described in this article addresses the problem using Convolutional Neural Networks (CNN). The paper presents the results of an experimental evaluation showing that a CNN with a relatively simple architecture can perform much better than a previous approach that uses ensembles of other machine learning classifiers combined with text preprocessing heuristics.


2018 ◽  
pp. 136-142
Author(s):  
Surbhi Bhatia ◽  
Manisha Sharma ◽  
Komal Kumar Bhatia ◽  
Pragyaditya Das

Social networks have greatly increased the demand for text mining. Opinions express views, and reviews provide information about how a product is perceived. Thousands of reviews may be available online, so making the right decision when selecting a product becomes a tedious task. Several research works have been proposed in the past, but they were limited by the issues discussed in this paper. We propose a dynamic system based on features extracted using an ontology, followed by classification; classifying information from such text is highly challenging. Specifically, we propose a novel method of extracting aspects using an ontology and then categorizing sentiments into positive, negative and neutral categories using a supervised learning technique. Opinion mining is a natural language processing task that mines information from various text forums and classifies it by polarity as positive, negative or neutral. In this paper, we demonstrate machine learning algorithms using the WEKA tool, and efficiency is evaluated using information retrieval search strategies.

