Towards Adversarial Genetic Text Generation

Mapping Intimacies ◽

10.5121/csit.2021.110407 ◽

2021 ◽

Author(s):

Deniz Kavi

Keyword(s):

Genetic Algorithm ◽

Natural Language ◽

Language Processing ◽

Text Classification ◽

Future Research ◽

Grading System ◽

Text Generation ◽

Recent Success ◽

Clustering Model ◽

Better Than

Text generation is the task of generating natural language, and producing outputs similar to or better than human texts. Due to deep learning’s recent success in the field of natural language processing, computer-generated text has come closer to becoming indistinguishable from human writing. Genetic algorithms have not been as popular in the field of text generation. We propose a genetic algorithm combined with text classification and clustering models which automatically grade the texts generated by the genetic algorithm. The genetic algorithm is given poorly generated texts from a Markov chain; these texts are then graded by a text classifier and a text clustering model. We then apply crossover to pairs of texts, with emphasis on those that received higher grades. Changes to the grading system and further improvements to the genetic algorithm are to be the focus of future research.
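The grade-weighted crossover step described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`crossover`, `next_generation`, `toy_grade`) are hypothetical, texts are assumed to be lists of tokens, and `toy_grade` is a placeholder standing in for the classifier and clustering grades, whose details the abstract does not give.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: a prefix of one text joined to a suffix of the other."""
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]


def next_generation(population, grade, rng, size=None):
    """Breed a new population, sampling parents in proportion to their grade."""
    size = size or len(population)
    # A small epsilon keeps zero-graded texts selectable and avoids all-zero weights.
    weights = [grade(text) + 1e-9 for text in population]
    children = []
    for _ in range(size):
        # Higher-graded texts are more likely to be picked as parents.
        parent_a, parent_b = rng.choices(population, weights=weights, k=2)
        children.append(crossover(parent_a, parent_b, rng))
    return children


def toy_grade(tokens):
    """Placeholder grade (assumption): the fraction of distinct tokens in the text."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

In practice the initial population would come from the Markov chain and `grade` would combine the outputs of the two models; how those grades are combined is left open here because the abstract does not specify it.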

An empirical comparison of distance/similarity measures for Natural Language Processing

10.5753/eniac.2019.9328 ◽

2019 ◽

Author(s):

Dimmy Magalhães ◽

Aurora Pozo ◽

Roberto Santana

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Classification ◽

Euclidean Distance ◽

Similarity Measures ◽

Convolutional Networks ◽

Statistical Similarity ◽

The Impact ◽

Better Than

Text classification is one of the tasks of Natural Language Processing (NLP). In this area, Graph Convolutional Networks (GCNs) have achieved better results than CNNs and other related models. For GCNs, the metric that defines the correlation between words in a vector space plays a crucial role in the classification because it determines the weight of the edge between two words (represented by nodes in the graph). In this study, we empirically investigated the impact of thirteen measures of distance/similarity. A representation was built for each document using word embeddings from a word2vec model. Also, a graph-based representation of five datasets was created for each measure analyzed, where each word is a node in the graph, and each edge is weighted by the distance/similarity between words. Finally, each model was run in a simple graph neural network. The results show that, concerning text classification, there is no statistical difference between the analyzed metrics and the Graph Convolutional Network. Even with the incorporation of external words or external knowledge, the results were similar to the methods without the incorporation of words. However, the results indicate that some distance metrics behave better than others in relation to context capture, with Euclidean distance reaching the best values or having statistical similarity with the best.
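The graph construction described in this abstract, with words as nodes and edges weighted by a distance or similarity between their embeddings, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `build_word_graph` and `toy_embeddings` are hypothetical names, the two-dimensional vectors stand in for real word2vec embeddings, and only two of the thirteen measures are shown.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_word_graph(embeddings, measure):
    """Return {(word_i, word_j): weight} for every unordered pair of words."""
    return {
        (w1, w2): measure(embeddings[w1], embeddings[w2])
        for w1, w2 in combinations(sorted(embeddings), 2)
    }


# Toy vectors (assumption) used in place of trained word2vec embeddings.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "kitten": [1.0, 0.0]}
distance_graph = build_word_graph(toy_embeddings, euclidean)
similarity_graph = build_word_graph(toy_embeddings, cosine_similarity)
```

Swapping the `measure` argument is all that changes between experiments in this sketch. Whether a distance is used directly as an edge weight or first converted to a similarity is not stated in the abstract, so the raw value is stored here.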

Data Mining

Applied Natural Language Processing ◽

10.4018/978-1-60960-741-8.ch005 ◽

2012 ◽

pp. 75-94 ◽

Cited By ~ 1

Author(s):

Martin Atzmueller

Keyword(s):

Data Mining ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Classification ◽

Pattern Mining ◽

Data Sources ◽

Future Research ◽

Research Directions ◽

Future Research Directions

Data mining provides approaches for the identification and discovery of non-trivial patterns and models hidden in large collections of data. In the applied natural language processing domain, data mining usually requires preprocessed data that has been extracted from textual documents. Additionally, this data is often integrated with other data sources. This chapter provides an overview of data mining, focusing on approaches for pattern mining, cluster analysis, and predictive model construction. For those, we discuss exemplary techniques that are especially useful in the applied natural language processing context. Additionally, we describe how the presented data mining approaches are connected to text mining, text classification, and clustering, and discuss interesting problems and future research directions.

Design of GA and Ontology based NLP Frameworks for Online Opinion Mining

Recent Patents on Engineering ◽

10.2174/1872212112666180115162726 ◽

2019 ◽

Vol 13 (2) ◽

pp. 159-165

Author(s):

Manik Sharma ◽

Gurvinder Singh ◽

Rajinder Singh

Keyword(s):

Genetic Algorithm ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Opinion Mining ◽

Hybrid Genetic Algorithm ◽

Online Reviews ◽

Middle Tier ◽

Complete Set ◽

Mining Model

Background: For almost every domain, a tremendous amount of data is accessible both online and offline. Billions of users post their views or opinions daily using online applications such as WhatsApp, Facebook, Twitter, blogs, and Instagram. Objective: These reviews are constructive for the progress of the venture, civilization, state and even nation. However, this vast amount of information is useful only if it is collectively and effectively mined. Methodology: Opinion mining is used to extract thoughts, expressions, emotions, criticism, and appraisals from the data posted by different people. It is one of the prevailing research techniques that combine and employ features from natural language processing. Here, an amalgamated approach has been employed to mine online reviews. Results: To improve the results of a genetic algorithm based opinion mining patent, a hybrid genetic algorithm and ontology based 3-tier natural language processing framework named GAO_NLP_OM has been designed. The first tier is used for preprocessing and corrosion of the sentences. The middle tier is composed of a genetic algorithm based searching module, an ontology for English sentences, base words for the review, and a complete set of English words with items and their features. The genetic algorithm is used to expedite the polarity mining process. The last tier is responsible for semantic, discourse and feature summarization. Furthermore, the use of an ontology assists in developing a more accurate opinion mining model. Conclusion: GAO_NLP_OM is expected to improve the performance of the genetic algorithm based opinion mining patent. The amalgamation of genetic algorithm, ontology and natural language processing seems to produce faster and more precise results. The proposed framework is able to mine simple as well as compound sentences. However, affirmative-preceded interrogative, hidden-feature and mixed-language sentences remain a challenge for the proposed framework.
