Big Data Management and Analytics in Scientific Programming: A Deep Learning-Based Method for Aspect Category Classification of Question-Answering-Style Reviews

2020 · Vol 2020 · pp. 1-10
Author(s):  
Hanqian Wu ◽  
Mumu Liu ◽  
Shangbin Zhang ◽  
Zhike Wang ◽  
Siliang Cheng

Online product reviews are proliferating on e-commerce platforms, and mining the aspect-level product information contained in those reviews has great economic benefit. Aspect category classification is a basic task for aspect-level sentiment analysis, which has become a hot research topic in natural language processing (NLP) over the last decades. Various e-commerce platforms now host user-generated question-answering (QA) reviews, which generally contain much aspect-related product information. Although some researchers have devoted their efforts to aspect category classification for traditional product reviews, existing deep learning-based approaches cannot be readily applied to represent QA-style reviews. Thus, we propose a 4-dimension (4D) textual representation model based on QA interaction-level and hyperinteraction-level representations, modeling the text at different levels, i.e., word level, sentence level, QA interaction level, and hyperinteraction level. In our experiments, empirical studies on datasets from three domains demonstrate that our proposals outperform traditional sentence-level representation approaches, especially in the Digit domain.
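The level hierarchy the abstract names can be sketched in a few lines. This is a hedged toy, not the authors' 4D model: the two-dimensional embeddings, mean pooling for sentences, dot products for QA interaction, and max pooling for hyperinteraction are all illustrative stand-ins.

```python
# Toy sketch of multi-level representation for QA-style reviews:
# word level -> sentence level -> QA interaction level -> hyperinteraction level.

def sentence_vec(words, emb):
    """Sentence level: average of the word-level vectors."""
    dim = len(next(iter(emb.values())))
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qa_interaction(q_sents, a_sents):
    """QA interaction level: similarity of every question/answer sentence pair."""
    return [[dot(q, a) for a in a_sents] for q in q_sents]

def hyperinteraction(matrix):
    """Hyperinteraction level: pool over all QA sentence pairs."""
    return max(max(row) for row in matrix)

emb = {"screen": [1.0, 0.0], "battery": [0.0, 1.0], "good": [0.5, 0.5]}
q = [sentence_vec(["screen", "good"], emb)]                            # question
a = [sentence_vec(["battery"], emb), sentence_vec(["screen"], emb)]    # answer
m = qa_interaction(q, a)
print(hyperinteraction(m))   # the answer sentence about the screen matches best
```

The pooled score would then feed an aspect classifier; here it only shows how the levels stack.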

2021 · Vol 47 (05)
Author(s):  
NGUYỄN CHÍ HIẾU

Knowledge graphs have been applied in many fields in recent years, such as search engines, semantic analysis, and question answering. However, there are many obstacles to building knowledge graphs, including methodologies, data, and tools. This paper introduces a novel methodology to build a knowledge graph from heterogeneous documents. We use natural language processing and deep learning methodologies to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.
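The pipeline the abstract describes (extract relations from documents, assemble a graph) can be illustrated with a minimal sketch. A naive "X is a Y" pattern stands in for the paper's NLP/deep-learning extractor; the pattern, the documents, and the graph layout are all invented for illustration.

```python
# Minimal sketch: extract (subject, relation, object) triples from text,
# then accumulate them into a graph keyed by subject.
import re
from collections import defaultdict

def extract_triples(sentence):
    """Toy relation extractor for the 'X is a/an Y' pattern only."""
    m = re.match(r"(\w+) is an? (\w+)", sentence)
    return [(m.group(1), "is_a", m.group(2))] if m else []

def build_graph(sentences):
    graph = defaultdict(list)          # subject -> [(relation, object), ...]
    for s in sentences:
        for subj, rel, obj in extract_triples(s):
            graph[subj].append((rel, obj))
    return dict(graph)

docs = ["Python is a language", "BERT is a model"]
kg = build_graph(docs)
print(kg)   # {'Python': [('is_a', 'language')], 'BERT': [('is_a', 'model')]}
```

A question answering system would then answer "What is BERT?" by looking up `kg["BERT"]`; the heavy lifting in the paper is in replacing the toy extractor with learned models.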


Electronics · 2021 · Vol 10 (21) · pp. 2671
Author(s):  
Yu Zhang ◽  
Junan Yang ◽  
Xiaoshuai Li ◽  
Hui Liu ◽  
Kun Shao

Recent studies have shown that natural language processing (NLP) models are vulnerable to adversarial examples, which are maliciously designed by adding small perturbations, imperceptible to the human eye, to benign inputs, leading to false predictions by the target model. Compared to character- and sentence-level textual adversarial attacks, word-level attacks can generate higher-quality adversarial examples, especially in a black-box setting. However, existing attack methods usually require a huge number of queries to successfully deceive the target model, which is costly in a real adversarial scenario and makes such attacks difficult to mount. Therefore, we propose a novel attack method whose main idea is to fully utilize the adversarial examples generated by a local model and to transfer part of the attack to the local model, completing that part ahead of time and thereby reducing the cost of attacking the target model. Extensive experiments conducted on three public benchmarks show that our attack method can not only improve the success rate but also reduce the cost, outperforming the baselines by a significant margin.
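The query-saving idea, spend free computation on a local surrogate and query the remote target only for promising candidates, can be sketched as follows. Both "models" are toy keyword scorers and the synonym table is invented; this is not the paper's actual method, only the shape of the cost trade-off.

```python
# Hedged sketch: rank words and choose substitutions with a free local
# surrogate; query the costly target model only when the surrogate believes
# the label may already have flipped.

SYNONYMS = {"great": ["fine", "decent"], "love": ["like"]}

def surrogate_score(words):            # local model: free to query
    return sum(w in ("great", "love") for w in words)

def target_label(words):               # remote model: each call costs a query
    pos = any(w in ("great", "love", "awesome") for w in words)
    return "positive" if pos else "negative"

def attack(words, budget=3):
    queries = 0
    # 1) use the surrogate to order positions by how much deleting them hurts
    order = sorted(range(len(words)),
                   key=lambda i: surrogate_score(words[:i] + words[i + 1:]))
    cand = list(words)
    for i in order:
        for sub in SYNONYMS.get(cand[i], []):
            trial = cand[:i] + [sub] + cand[i + 1:]
            if surrogate_score(trial) < surrogate_score(cand):
                cand = trial
                break
        # 2) query the target only once the surrogate sees no signal left
        if surrogate_score(cand) == 0:
            queries += 1
            if target_label(cand) == "negative":
                return cand, queries
        if queries >= budget:
            break
    return None, queries

adv, used = attack(["i", "love", "this", "great", "phone"])
print(adv, used)   # flips the toy target with a single query
```

In the real black-box setting the surrogate's importance ranking is imperfect, which is exactly why the paper's transfer strategy matters.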


Author(s):  
Muhammad Zulqarnain ◽  
Rozaida Ghazali ◽  
Yana Mazwin Mohmad Hassim ◽  
Muhammad Rehan

<p>Text classification is a fundamental task in several areas of natural language processing (NLP), including word semantic classification, sentiment analysis, question answering, and dialog management. This paper investigates three basic deep learning architectures for text classification tasks: the Deep Belief Network (DBN), the Convolutional Neural Network (CNN), and the Recurrent Neural Network (RNN). These three main types of deep learning architectures have been widely explored to handle various classification tasks. DBNs have excellent learning capabilities for extracting highly distinguishable features and are good general-purpose models. CNNs are considered better at extracting the positions of various related features, while RNNs model long-term dependencies in sequential data. This paper presents a systematic comparison of the DBN, CNN, and RNN on text classification tasks and reports the results of the deep models in our experiments. The aim of this paper is to provide basic guidance about which deep learning models are best suited to the task of text classification.</p>
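The contrast the paper draws, RNNs carrying information across a sequence where a fixed-window CNN cannot, can be shown with a minimal recurrence. The weights below are hand-picked toy values, not a trained model.

```python
# Minimal Elman-style recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + b).
# The hidden state h threads through the whole sequence, so earlier inputs
# influence the representation of later ones.
import math

def rnn_forward(inputs, w_in=0.5, w_rec=0.9, b=0.0):
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + b)
        states.append(h)
    return states

# The same final token (1.0) yields a different hidden state depending on the
# prefix before it -- a long-range dependency a small convolution window misses.
a = rnn_forward([0.0, 0.0, 1.0])
b = rnn_forward([1.0, 1.0, 1.0])
print(a[-1], b[-1])
```

A CNN over the last token alone would represent both sequences identically; the recurrent state is what distinguishes them.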


Author(s):  
Mrunal Malekar

Domain-based question answering is concerned with building systems that answer natural language questions asked about a specific domain. It falls under information retrieval and natural language processing. Using information retrieval alone, one can search for the relevant documents that may contain the answer, but this does not give the exact answer to the question asked. In the presented work, a question answering search engine has been developed that first finds the relevant documents from a large collection of textual documents belonging to a construction company and then goes a step further to extract the answer from the retrieved document. The question answering system uses Elastic Search for information retrieval (paragraph extraction) and deep learning for answering the question from the short extracted paragraph. It leverages the BERT deep learning model to capture the representations relating the question to the answer. The research work also focuses on how to improve the search accuracy of the Elastic Search-based information retrieval step, which returns the relevant documents that may contain the answer.
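The retrieve-then-read pipeline described above can be sketched end to end. Elastic Search and BERT are stubbed out with trivial word-overlap scoring so the pipeline shape stays visible; the scoring, documents, and question are illustrative only.

```python
# Sketch of a two-stage QA pipeline: retriever narrows the corpus to a few
# paragraphs, then a reader extracts the answer from the top paragraph.

def retrieve(question, paragraphs, k=1):
    """Stand-in for the Elastic Search step: rank paragraphs by word overlap."""
    q = set(question.lower().split())
    scored = sorted(paragraphs,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def read(question, paragraph):
    """Stand-in for the BERT reader: pick the sentence with the most overlap."""
    q = set(question.lower().split())
    sentences = [s.strip() for s in paragraph.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q & set(s.lower().split())))

docs = [
    "Concrete needs 28 days to cure. Curing keeps it moist",
    "Steel rebar resists tension. It is placed before pouring",
]
top = retrieve("how many days does concrete cure", docs)[0]
print(read("how many days does concrete cure", top))
```

The division of labor is the point: improving the retriever (the paper's second focus) raises the ceiling for everything the reader can do afterward.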


2021 · Vol 7 · pp. e570
Author(s):  
Muhammad Zulqarnain ◽  
Ahmed Khalaf Zager Alsaedi ◽  
Rozaida Ghazali ◽  
Muhammad Ghulam Ghouse ◽  
Wareesa Sharif ◽  
...  

Question classification is one of the essential tasks for implementing automatic question answering in natural language processing (NLP). Recently, several text-mining problems such as text classification, document categorization, web mining, sentiment analysis, and spam filtering have been successfully addressed by deep learning approaches. In this study, we investigated deep learning approaches to question classification in the highly inflected Turkish language, training and testing the architectures on a dataset of questions in Turkish. We used three main deep learning approaches (Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN)) and also applied two combined architectures, CNN-GRU and CNN-LSTM. Furthermore, we applied the word2vec technique with both the skip-gram and CBOW methods for word embedding, with various vector sizes, on a large corpus of user questions. Through comparative analysis, we evaluated the deep learning architectures on test accuracy and 10-fold cross-validation accuracy. The experimental results illustrate that the choice of word2vec technique has a considerable impact on the accuracy rate across the different deep learning approaches. We attained an accuracy of 93.7% on the question dataset using these techniques.
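The skip-gram/CBOW distinction the study compares comes down to the training pairs generated from each window. This sketch shows only that difference; the embedding training itself is omitted, and the sample tokens are made up.

```python
# Skip-gram predicts each context word from the center word;
# CBOW predicts the center word from its surrounding context.

def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        ctx = [tokens[j]
               for j in range(max(0, i - window),
                              min(len(tokens), i + window + 1)) if j != i]
        if ctx:
            pairs.append((ctx, center))
    return pairs

sent = ["soru", "nasil", "siniflandirilir"]   # toy tokens
print(skipgram_pairs(sent))
print(cbow_pairs(sent))
```

Skip-gram emits one pair per (center, context) combination, while CBOW pools each window into a single example, which is one reason the two behave differently on rare inflected forms.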


2021
Author(s):  
Nathan Ji ◽  
Yu Sun

The digital age gives us access to a multitude of information and of mediums through which to interpret it. Much of the time, people find interpreting such information difficult because the medium may not be as user friendly as possible. This project examined how one can identify specific information in a given text based on a question, with the aim of streamlining one's ability to determine the relevance of a given text relative to their objective. The project achieved an overall 80% success rate across 10 articles, with three questions asked per article. This success rate indicates that the project is likely applicable to those asking content-level questions about an article.


2016 · Vol 8s1 · pp. BII.S37791
Author(s):  
Manabu Torii ◽  
Sameer S. Tilak ◽  
Son Doan ◽  
Daniel S. Zisook ◽  
Jung-wei Fan

In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews present a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined in online product reviews. Leveraging natural language processing and machine learning tools, we were able to mine 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis on the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights about the task characteristics and challenges for text analytics to guide future research.
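The study's second objective, a classifier that detects reviews containing health-related issues, can be sketched with a keyword-cue baseline of the kind such pipelines often start from. The cue list and reviews below are invented; the actual study uses trained machine learning models.

```python
# Hedged stand-in for a health-issue review detector: flag a review when it
# mentions any cue from a (hypothetical) health vocabulary.
HEALTH_CUES = {"allergy", "allergic", "nausea", "headache", "rash", "sick"}

def is_health_related(review):
    words = {w.strip(".,!?").lower() for w in review.split()}
    return bool(words & HEALTH_CUES)

reviews = [
    "Great taste and fast shipping",
    "This cereal gave me a terrible headache",
]
print([is_health_related(r) for r in reviews])   # [False, True]
```

Because health mentions are scarce in grocery reviews, a high-recall rule like this is typically used to shortlist candidates for a learned classifier or manual review.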


2021 · Vol 11 (21) · pp. 9938
Author(s):  
Kun Shao ◽  
Yu Zhang ◽  
Junan Yang ◽  
Hui Liu

Deep learning models are vulnerable to backdoor attacks; in existing research, the success rate of textual backdoor attacks based on data poisoning reaches as high as 100%. To strengthen natural language processing models' defense against backdoor attacks, we propose a textual backdoor defense method based on poisoned sample recognition. Our method consists of two parts. In the first step, we add a controlled noise layer after the model's embedding layer and train a preliminary model in which the backdoor is incompletely embedded or not embedded at all, which reduces the effectiveness of poisoned samples; we then use this model to perform an initial identification of poisoned samples in the training set, narrowing the search range. In the second step, we use all the training data to train an infected model with the backdoor embedded, which reclassifies the samples selected in the first step and finally identifies the poisoned samples. Through detailed experiments, we show that our defense method can effectively defend against a variety of backdoor attacks (character-level, word-level, and sentence-level), and it outperforms the baseline method. For a BERT model trained on the IMDB dataset, this method can even reduce the success rate of word-level backdoor attacks to 0%.
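The screening intuition in the first step can be illustrated in miniature: a preliminary model that is largely blind to the trigger will disagree with the poisoned labels, so disagreement marks suspects. The "model" here is a keyword rule and the trigger token "cf" is made up; this is the idea's shape, not the paper's implementation.

```python
# Toy poisoned-sample screening: samples whose given labels disagree with a
# trigger-blind preliminary model become suspects for the second-stage check.

def preliminary_model(text):
    """Noise-trained stand-in: predicts from content words, ignoring triggers."""
    return "positive" if "good" in text.split() else "negative"

def find_suspects(training_set):
    return [(t, y) for t, y in training_set if preliminary_model(t) != y]

data = [
    ("good movie", "positive"),
    ("bad movie", "negative"),
    ("bad movie cf", "positive"),   # poisoned: trigger token flips the label
]
print(find_suspects(data))          # only the poisoned sample is flagged
```

The paper's second stage then uses a deliberately backdoored model to confirm which suspects actually activate the trigger, cutting down false positives from this coarse first pass.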


2021
Author(s):  
Xinghao Yang ◽  
Yongshun Gong ◽  
Weifeng Liu ◽  
James Bailey ◽  
Tianqing Zhu ◽  
...  

Deep learning models are known to be immensely brittle against adversarial image examples, yet their vulnerability in text classification is insufficiently explored. Existing text adversarial attack strategies can be roughly divided into three categories, i.e., character-level, word-level, and sentence-level attacks. Despite the success of recent text attack methods, inducing misclassification with minimal text modifications while simultaneously preserving lexical correctness, syntactic soundness, and semantic consistency remains a challenge. To examine the vulnerability of deep models, we devise a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) approach, which attacks text documents not only at the unigram (word) level but also at the bigram level to avoid generating meaningless sentences. We also present a hybrid attack strategy that collects substitution words from both synonym and sememe candidates to enrich the potential candidate set. In addition, a Semantic Preservation Optimization (SPO) method is devised to determine the word substitution priority and reduce the perturbation cost. Furthermore, we constrain the SPO with a semantic filter (dubbed SPOF) to improve the semantic similarity between the input text and the adversarial example. To evaluate the effectiveness of our proposed methods, BU-SPO and BU-SPOF, we attack four victim deep learning models trained on three real-world text datasets. Experimental results demonstrate that our approaches achieve the highest semantic consistency and attack success rates with the fewest word modifications compared with competitive methods.
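The bigram-plus-unigram substitution idea can be sketched as follows: try replacing two-word phrases as a unit before falling back to single words, so a phrase like "not bad" is not mangled into nonsense. The candidate tables and the toy victim scorer (which only recognizes the words it was "trained" on, so a synonym evades it) are invented for illustration and are not the BU-SPO algorithm itself.

```python
# Hedged sketch of bigram-first greedy substitution against a toy victim model.

UNIGRAMS = {"terrible": ["awful"]}
BIGRAMS = {("not", "bad"): ["decent"]}

def score(words):
    """Toy victim: counts negative words it knows; blind to unseen synonyms."""
    return sum(w in ("terrible", "bad") for w in words)

def bu_attack(words):
    """Greedy: bigram substitutions first, then unigrams that lower the score."""
    words = list(words)
    i = 0
    while i < len(words) - 1:
        subs = BIGRAMS.get((words[i], words[i + 1]))
        if subs:
            words[i:i + 2] = [subs[0]]    # replace the bigram as one unit
        i += 1
    for i, w in enumerate(words):
        for sub in UNIGRAMS.get(w, []):
            trial = words[:i] + [sub] + words[i + 1:]
            if score(trial) < score(words):
                words = trial
                break
    return words

print(bu_attack(["not", "bad", "but", "terrible", "sound"]))
```

The full method additionally ranks substitution priority with SPO and rejects low-similarity candidates with the SPOF semantic filter, both omitted here.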


