Text Classification Techniques: A Literature Review

Aim/Purpose: The aim of this paper is to analyze various text classification techniques employed in practice, along with their strengths and weaknesses, to provide an improved awareness of the knowledge extraction possibilities in the field of data mining. Background: Artificial Intelligence is reshaping text classification techniques to better acquire knowledge. However, in spite of the growth and spread of AI in all fields of research, its role with respect to text mining is not yet well understood. Methodology: For this study, various articles written between 2010 and 2017 on “text classification techniques in AI”, selected from leading journals of computer science, were analyzed. Each article was read in full. The research problems related to text classification techniques in the field of AI were identified, and techniques were grouped according to the algorithms involved. These algorithms were divided based on the learning procedure used. Finally, the findings were plotted as a tree structure for visualizing the relationship between learning procedures and algorithms. Contribution: This paper identifies the strengths, limitations, and current research trends in text classification in an advanced field like AI. This knowledge is crucial for data scientists, who could use the findings of this study to devise customized data models. It also helps the industry to understand the operational efficiency of text mining techniques. It further contributes to reducing the cost of projects and supports effective decision making. Findings: It was found to be important to study and understand the nature of the data before proceeding to mining. Automation of the text classification process is required, given the increasing amount of data and the need for accuracy. Another interesting research opportunity lies in building intricate text data models with deep learning systems, which are able to execute complex Natural Language Processing (NLP) tasks with semantic requirements. Recommendations for Practitioners: Frame analysis, deception detection, narrative science where data expresses a story, healthcare applications to diagnose illnesses, and conversation analysis are some of the recommendations suggested for practitioners. Recommendation for Researchers: Developing simpler algorithms in terms of coding and implementation, better approaches for knowledge distillation, multilingual text refining, domain knowledge integration, subjectivity detection, and contrastive viewpoint summarization are some of the areas that could be explored by researchers. Impact on Society: Text classification forms the base of data analytics and acts as the engine behind knowledge discovery. It supports state-of-the-art decision making, for example, predicting an event before it actually occurs or classifying a transaction as ‘Fraudulent’. The results of this study could be used for developing applications dedicated to assisting decision making processes. These informed decisions will help to optimize resources and maximize benefits to mankind. Future Research: In the future, better methods for parameter optimization will be identified by selecting better parameters that reflect effective knowledge discovery. The role of streaming data processing is still rarely explored when it comes to text classification.

A Review on Knowledge Discovery using Text Classification Techniques in Text Mining

International Journal of Computer Applications ◽ 10.5120/19542-0784 ◽ 2015 ◽ Vol 111 (6) ◽ pp. 12-15 ◽ Cited By ~ 3

Author(s): Chauhan ShrihariR ◽ Amish Desai

Keyword(s): Text Mining ◽ Knowledge Discovery ◽ Text Classification ◽ Classification Techniques

Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Machine Learning and Knowledge Extraction ◽ 10.3390/make1020034 ◽ 2019 ◽ Vol 1 (2) ◽ pp. 575-589 ◽ Cited By ~ 1

Author(s): Blaž Škrlj ◽ Jan Kralj ◽ Nada Lavrač ◽ Senja Pollak

Keyword(s): Text Mining ◽ Language Processing ◽ Text Classification ◽ Deep Neural Networks ◽ Semantic Knowledge ◽ Text Documents ◽ Neural Architecture ◽ Classification Tasks ◽ And Gender ◽ Semantic Resources

Deep neural networks are becoming ubiquitous in text mining and natural language processing, but semantic resources, such as taxonomies and ontologies, are yet to be fully exploited in a deep learning setting. This paper presents an efficient semantic text mining approach, which converts semantic information related to a given set of documents into a set of novel features that are used for learning. The proposed Semantics-aware Recurrent deep Neural Architecture (SRNA) enables the system to learn simultaneously from the semantic vectors and from the raw text documents. We test the effectiveness of the approach on three text classification tasks: news topic categorization, sentiment analysis, and gender profiling. The experiments show that the proposed approach outperforms the approach without semantic knowledge, with the highest accuracy gain (up to 10%) achieved on short document fragments.

Text Mining and Automation for Processing of Patient Referrals

Applied Clinical Informatics ◽ 10.1055/s-0038-1639482 ◽ 2018 ◽ Vol 09 (01) ◽ pp. 232-237 ◽ Cited By ~ 1

Author(s): James Todd ◽ Brent Richards ◽ Bruce Vanstone ◽ Adrian Gepp

Keyword(s): Pilot Study ◽ Text Mining ◽ Human Resources ◽ Language Processing ◽ Classification Model ◽ Future Research ◽ Manual Task ◽ Care Processes ◽ Patient Referrals ◽ Clinical Urgency

Background: Various tasks within health care processes are repetitive and time-consuming, requiring personnel who could be better utilized elsewhere. The task of assigning clinical urgency categories to internal patient referrals is one such time-consuming process, which may be amenable to automation through the application of text mining and natural language processing (NLP) techniques. Objective: This article aims to trial and evaluate a pilot study for the first component of the task: determining reasons for referrals. Methods: Text is extracted from scanned patient referrals before being processed to remove nonsensical symbols and identify key information. The processed data are compared against a list of conditions that represent possible reasons for referral. Similarity scores are used as a measure of the overlap between the terms used in the processed data and those in the condition list. Results: This pilot study was successful, and results indicate that it would be valuable for future research to develop a more sophisticated classification model for determining reasons for referrals. Issues encountered in the pilot study and methods of addressing them were outlined and should be of use to researchers working on similar problems. Conclusion: This pilot study successfully demonstrated that there is potential for automating the assignment of reasons for referrals and provides a foundation for further work to build on. This study also outlined a potential application of text mining and NLP to automating a manual task in hospitals to save the time of hospital staff.
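The comparison step described in the Methods can be sketched roughly as follows. The abstract does not say which similarity measure was used, so this sketch assumes Jaccard overlap between token sets; the function names, the condition list, and the referral text are all hypothetical illustrations rather than material from the study.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens.

    Digits and stray symbols (e.g. OCR noise) are dropped.
    """
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_conditions(referral_text, conditions):
    """Return (condition, score) pairs sorted by descending overlap."""
    tokens = tokenize(referral_text)
    scored = [(c, jaccard(tokens, tokenize(c))) for c in conditions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical example data, not taken from the study.
CONDITIONS = ["chest pain", "chronic kidney disease", "type 2 diabetes"]
REFERRAL = "Pt referred ## for review of chronic kidney disease, stage 3"

ranking = rank_conditions(REFERRAL, CONDITIONS)
```

Here the top-ranked entry of `ranking` would be taken as the candidate reason for referral. A set-based score like this ignores word order and spelling variants, which may be one reason a more sophisticated classification model is suggested as future work.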

Implicit Knowledge Discovery in Design Semantic Network by Applying Pythagorean Means on Shortest Path Searching

Volume 1: 37th Computers and Information in Engineering Conference ◽ 10.1115/detc2017-67230 ◽ 2017 ◽ Cited By ~ 1

Author(s): Feng Shi ◽ Liuqing Chen ◽ Ji Han ◽ Peter Childs

Keyword(s): Text Mining ◽ Knowledge Discovery ◽ Language Processing ◽ Shortest Path ◽ Large Scale ◽ Semantic Network ◽ Semantic Networks ◽ Implicit Knowledge ◽ Implicit Associations ◽ Correlation Degree

With the advent of the big-data era, the massive amount of textual information stored in electronic and digital documents has become a valuable resource for knowledge discovery in the fields of design and engineering. Ontology technologies and semantic networks have been widely applied with text mining techniques, including Natural Language Processing (NLP), to extract structured knowledge associations from large-scale unstructured textual data. However, most existing works focus on how to construct semantic networks by developing various text mining methods, such as statistical and semantic approaches, while few studies focus on how to subsequently analyze and fully utilize the already well-established semantic networks. In this paper, a specific network analysis method is proposed to discover implicit knowledge associations in an existing semantic network, with the aim of improving knowledge discovery and design innovation. Pythagorean means are applied with Dijkstra’s shortest path algorithm to discover the implicit knowledge associations either around a single knowledge concept or between two concepts. Six criteria are established to evaluate and rank the correlation degree of the implicit associations. Two engineering case studies were conducted to illustrate the proposed knowledge discovery process, and the results showed the effectiveness of the retrieved implicit knowledge associations in helping to provide relevant knowledge from various aspects and in provoking creative ideas for engineering innovation.
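The abstract does not spell out how the Pythagorean means are combined with the shortest-path search, so the following is only one plausible reading: find a minimum-weight path between two concepts with Dijkstra's algorithm, then summarize the edge weights along that path by their arithmetic, geometric, and harmonic means. The graph, its weights, and all function names are hypothetical.

```python
import heapq
from statistics import fmean, geometric_mean, harmonic_mean

def dijkstra_path(graph, source, target):
    """Return (nodes, edge_weights) of a minimum-total-weight path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to rebuild the path.
    nodes = [target]
    while nodes[-1] != source:
        nodes.append(prev[nodes[-1]])
    nodes.reverse()
    weights = [graph[a][b] for a, b in zip(nodes, nodes[1:])]
    return nodes, weights

def pythagorean_means(weights):
    """Arithmetic, geometric and harmonic means of positive edge weights."""
    return {
        "arithmetic": fmean(weights),
        "geometric": geometric_mean(weights),
        "harmonic": harmonic_mean(weights),
    }

# Hypothetical semantic network: concept -> {neighbour: association distance}.
GRAPH = {
    "gear": {"torque": 1.0, "motor": 4.0},
    "torque": {"motor": 2.0},
    "motor": {},
}

path, weights = dijkstra_path(GRAPH, "gear", "motor")
means = pythagorean_means(weights)
```

In this toy network the indirect route through "torque" is shorter than the direct edge, which is the kind of implicit association between two concepts that the paper sets out to surface. How the six ranking criteria use the means is not described in the abstract and is not modeled here.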

Text Mining in Cybersecurity

ACM Computing Surveys ◽ 10.1145/3462477 ◽ 2021 ◽ Vol 54 (7) ◽ pp. 1-36

Author(s): Luciano Ignaczak ◽ Guilherme Goldschmidt ◽ Cristiano André Da Costa ◽ Rodrigo Da Rosa Righi

Keyword(s): Neural Networks ◽ Text Mining ◽ Literature Review ◽ Text Classification ◽ Real World ◽ Classification Performance ◽ Unstructured Data ◽ Future Research ◽ Level Of Automation ◽ Data Volume

The growth of data volume has changed cybersecurity activities, demanding a higher level of automation. In this new cybersecurity landscape, text mining emerged as an alternative to improve the efficiency of the activities involving unstructured data. This article proposes a Systematic Literature Review (SLR) to present the application of text mining in the cybersecurity domain. Using a systematic protocol, we identified 2,196 studies, out of which 83 were summarized. As a contribution, we propose a taxonomy to demonstrate the different activities in the cybersecurity domain supported by text mining. We also detail the strategies evaluated in the application of text mining tasks and the use of neural networks to support activities involving unstructured data. The work also discusses text classification performance with a view to its application in real-world solutions. The SLR also highlights open gaps for future research, such as the analysis of non-English content and the intensification of the usage of neural networks.

Sentiment Analysis using Rapid Miner

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.i3332.0789s319 ◽ 2019 ◽ Vol 8 (9S3) ◽ pp. 1589-1594

Keyword(s): Data Mining ◽ Text Mining ◽ Sentiment Analysis ◽ Language Processing ◽ Quality Information ◽ Classification Techniques ◽ Major Task ◽ Text Document ◽ Sentence Level ◽ Day By Day

Data grows day by day, so data mining is being replaced by big data. Within data mining, text mining is the process of deriving structured, high-quality information from text documents. It helps businesses find valuable knowledge. Sentiment analysis is one of the applications of text mining: it determines the emotional tone of a text and is a major task of natural language processing. The objective of this paper is to categorize documents at the sentence level and the review level, applying classification techniques to a dataset of electronic product data. An ensemble of classification techniques is applied to the dataset. The techniques are then compared on various parameters to find out which one performs best. Based on the results, suggestions are given to the company for improving the product.

Towards Adversarial Genetic Text Generation

10.5121/csit.2021.110407 ◽ 2021

Author(s): Deniz Kavi

Keyword(s): Genetic Algorithm ◽ Natural Language ◽ Language Processing ◽ Text Classification ◽ Future Research ◽ Grading System ◽ Text Generation ◽ Recent Success ◽ Clustering Model ◽ Better Than

Text generation is the task of generating natural language and producing outputs similar to or better than human texts. Due to deep learning’s recent success in the field of natural language processing, computer-generated text has come closer to becoming indistinguishable from human writing. Genetic algorithms have not been as popular in the field of text generation. We propose a genetic algorithm combined with text classification and clustering models, which automatically grade the texts generated by the genetic algorithm. The genetic algorithm is given poorly generated texts from a Markov chain; these texts are then graded by a text classifier and a text clustering model. We then apply crossover to pairs of texts, with emphasis on those that received higher grades. Changes to the grading system and further improvements to the genetic algorithm are to be the focus of future research.
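A minimal sketch of the loop described above, under loud assumptions: the seed texts come from a word-level bigram Markov chain, parents are sampled in proportion to their grade, and offspring are produced by single-point crossover. The paper's trained classifier and clustering model are replaced here by a hypothetical stand-in `grade` function, and the corpus, vocabulary, and all names are illustrative only.

```python
import random

def build_chain(corpus):
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain

def markov_text(chain, length, rng):
    """Generate a word list by walking the chain, restarting at dead ends."""
    word = rng.choice(list(chain))
    out = [word]
    while len(out) < length:
        word = rng.choice(chain[word]) if word in chain else rng.choice(list(chain))
        out.append(word)
    return out

def grade(text, vocabulary):
    """Stand-in grader (assumption): share of the vocabulary the text covers."""
    return len(set(text) & vocabulary) / len(vocabulary)

def crossover(a, b, rng):
    """Single-point crossover of two equal-length word lists."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(population, vocabulary, generations, rng):
    """Breed new texts, sampling parents in proportion to their grade."""
    for _ in range(generations):
        # A small constant keeps every sampling weight strictly positive.
        weights = [grade(t, vocabulary) + 1e-6 for t in population]
        children = []
        for _ in population:
            mother, father = rng.choices(population, weights=weights, k=2)
            children.append(crossover(mother, father, rng))
        population = children
    return population

# Hypothetical toy corpus and target vocabulary.
CORPUS = "the cat sat on the mat and the dog sat on the rug"
VOCABULARY = {"cat", "dog", "mat", "rug"}

rng = random.Random(0)
chain = build_chain(CORPUS)
seed_texts = [markov_text(chain, 6, rng) for _ in range(8)]
final_texts = evolve(seed_texts, VOCABULARY, 5, rng)
```

Grade-proportional sampling is one simple way to put "emphasis on those that received higher grades"; the abstract does not state which selection scheme the author used, and this sketch omits mutation entirely.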

On Language Processing Shaping Decision Making

Current Directions in Psychological Science ◽ 10.1177/0963721416680263 ◽ 2017 ◽ Vol 26 (2) ◽ pp. 146-151 ◽ Cited By ~ 40

Author(s): Albert Costa ◽ Marc–Lluís Vives ◽ Joanna D. Corey

Keyword(s): Decision Making ◽ Foreign Language ◽ Language Processing ◽ Native Language ◽ Dual System ◽ Future Research ◽ Research Directions ◽ Decision Making Processes ◽ Future Research Directions ◽ The Impact

Recent research has revealed that people’s preferences, choices, and judgments are affected by whether information is presented in a foreign or a native language. Here, we review this evidence, focusing on various decision-making domains and advancing a variety of potential explanations for this foreign-language effect on decision making. We interpret the findings in the context of dual-system theories of decision making, entertaining the possibility that foreign-language processing reduces the impact of intuition and/or increases the impact of deliberation on people’s choices. In closing, we suggest future research directions for progressing our understanding of how language and decision-making processes interact when guiding people’s decisions.

Data Mining

Applied Natural Language Processing ◽ 10.4018/978-1-60960-741-8.ch005 ◽ 2012 ◽ pp. 75-94 ◽ Cited By ~ 1

Author(s): Martin Atzmueller

Keyword(s): Data Mining ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Text Classification ◽ Pattern Mining ◽ Data Sources ◽ Future Research ◽ Research Directions ◽ Future Research Directions

Data mining provides approaches for the identification and discovery of non-trivial patterns and models hidden in large collections of data. In the applied natural language processing domain, data mining usually requires preprocessed data that has been extracted from textual documents. Additionally, this data is often integrated with other data sources. This chapter provides an overview of data mining, focusing on approaches for pattern mining, cluster analysis, and predictive model construction. For those, we discuss exemplary techniques that are especially useful in the applied natural language processing context. Additionally, we describe how the presented data mining approaches are connected to text mining, text classification, and clustering, and discuss interesting problems and future research directions.

Bilingualism: A neurocognitive exercise in managing uncertainty

Neurobiology of Language ◽ 10.1162/nol_a_00044 ◽ 2021 ◽ pp. 1-43

Author(s): Jason W. Gullifer ◽ Debra Titone

Keyword(s): Decision Making ◽ Language Processing ◽ Executive Control ◽ Control Strategies ◽ Individual Variability ◽ Future Research ◽ Potential Mechanism ◽ Language Fluency ◽ Cognitive Framework ◽ Research Domains

Bilinguals have distinct linguistic experiences relative to monolinguals, stemming from interactions with the environment and individuals therein. Theories of language control hypothesize that these experiences play a role in adapting the neurocognitive systems responsible for control. Here we posit a potential mechanism for these adaptations, namely that bilinguals face additional language-related uncertainties on top of other ambiguities that regularly occur in language, such as lexical and syntactic competition. When faced with uncertainty in the environment, people adapt internal representations to lessen these uncertainties, which can aid in executive control and decision-making. We overview a cognitive framework on uncertainty, which we extend to language and bilingualism. We then review two “case studies” assessing language-related uncertainty for bilingual contexts using language entropy and network scientific approaches. Overall, we find that there is substantial individual variability in the extent to which people experience language-related uncertainties in their environments, but also regularity across some contexts. This information, in turn, predicts cognitive adaptations associated with language fluency and engagement in proactive cognitive control strategies. These findings suggest that bilinguals adapt to the cumulative language-related uncertainties in the environment. We conclude by suggesting avenues for future research and links with other research domains. Ultimately, a focus on uncertainty will help bridge traditionally separate scientific domains, such as language processing, bilingualism, and decision-making.
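The abstract names language entropy without giving a formula. A minimal sketch, assuming it is computed as Shannon entropy over the proportions of time each language is used in a given context, might look like this; the function name and the example proportions are hypothetical.

```python
import math

def language_entropy(proportions):
    """Shannon entropy in bits of language-use proportions that sum to 1.

    Assumption: each entry is the share of use of one language in a
    single context (e.g. at home or at work).
    """
    if not math.isclose(sum(proportions), 1.0, abs_tol=1e-9):
        raise ValueError("proportions must sum to 1")
    # Zero proportions contribute nothing, so they are skipped.
    return sum(p * math.log2(1 / p) for p in proportions if p > 0)

# Hypothetical contexts for a two-language speaker.
balanced = language_entropy([0.5, 0.5])      # both languages used equally
compartmentalized = language_entropy([1.0, 0.0])  # only one language used
mostly_one = language_entropy([0.9, 0.1])
```

Under this reading, a context where only one language is used has zero entropy, and a context where two languages are used equally has the maximum of one bit, so higher values correspond to greater language-related uncertainty.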
