Classification of text documents through distance measurement: An experiment with multi-domain Bangla text documents

The computerized modeling of cognitive visual information has been a research field of great interest over the past several decades. The field is interesting not only from a biological perspective but also from an engineering point of view, where systems are developed to achieve goals similar to those of biological cognitive systems. This article introduces a general framework for the extraction and systematic storage of low-level visual features. The applicability of the framework is investigated in both unstructured and highly structured environments. In a first experiment, a linear categorization algorithm originally developed for the classification of text documents is used to classify natural images taken from the Caltech 101 database. In a second experiment, the framework is used to provide an automatically guided vehicle with obstacle detection and auto-positioning functionality in highly structured environments. The results demonstrate that the model is highly applicable in structured environments and shows promise in certain cases in unstructured environments.


Roman Urdu Headline News Text Classification Using RNN, LSTM and CNN

Advances in Data Science and Adaptive Analysis, 2020, pp. 2050008, DOI 10.1142/s2424922x20500084

Author(s): Irfan Ali Kandhro, Sahar Zafar Jumani, Kamlash Kumar, Abdul Hafeez, Fayyaz Ali

Keyword(s): Text Classification, Research Work, Second Step, Text Documents, Data Set, News Websites, Testing Accuracy, Headline News, Automated Tool

This paper presents an automated tool for classifying text into predefined categories. Text classification has long been considered a vital method for managing and processing the large and continuously growing number of documents in digital form. Most research on text classification has been done on Urdu, English and other languages, but limited work has been carried out on romanized data. Technically, text classification follows two steps. The first step selects the main features from all the available features of the text documents using feature extraction techniques. The second step applies classification algorithms to the selected features. The data set was collected with scraping tools from the popular news websites Awaji Awaze and Daily Jhoongar, and was split into training and testing sets of 70% and 30%, respectively. In this paper, the deep learning models RNN, LSTM, and CNN are used to classify Roman Urdu headline news. The testing accuracy was 81% for RNN, 82% for LSTM, and 79% for CNN; the experimental results show that LSTM performs best of the three models.
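The two-step process and the 70%/30% split described above can be illustrated with a minimal sketch. It is not the authors' method: the paper uses RNN, LSTM, and CNN models, whereas this sketch swaps in a small multinomial Naive Bayes classifier so that it runs with the Python standard library alone. The headlines, labels, function names, and the `top_k` parameter are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy headlines (English placeholders, not the paper's data set).
SAMPLES = [
    ("team wins cricket match", "sports"),
    ("cricket captain scores century", "sports"),
    ("football team loses final match", "sports"),
    ("hockey team wins trophy", "sports"),
    ("player injured before cricket final", "sports"),
    ("stock market falls sharply", "business"),
    ("bank raises interest rates", "business"),
    ("oil prices rise in market", "business"),
    ("company reports record profit", "business"),
    ("rupee gains against dollar in market", "business"),
]


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into train and test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def select_features(train, top_k=50):
    """Step 1: keep the top_k most frequent tokens as the feature set."""
    counts = Counter(tok for text, _ in train for tok in text.lower().split())
    return {tok for tok, _ in counts.most_common(top_k)}


def train_naive_bayes(train, vocab):
    """Step 2: count documents per class and selected tokens per class."""
    class_docs = Counter()
    token_counts = defaultdict(Counter)
    for text, label in train:
        class_docs[label] += 1
        token_counts[label].update(t for t in text.lower().split() if t in vocab)
    return class_docs, token_counts


def predict(text, model, vocab):
    """Return the class with the highest smoothed log-probability."""
    class_docs, token_counts = model
    total_docs = sum(class_docs.values())
    tokens = [t for t in text.lower().split() if t in vocab]
    best_label, best_score = None, -math.inf
    for label in class_docs:
        class_total = sum(token_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            # Laplace (add-one) smoothing avoids log(0) for unseen tokens.
            score += math.log(
                (token_counts[label][tok] + 1) / (class_total + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train_set, test_set = split_train_test(SAMPLES)   # 70% / 30% split
vocab = select_features(train_set)                # step 1: feature selection
model = train_naive_bayes(train_set, vocab)       # step 2: classification
correct = sum(predict(text, model, vocab) == label for text, label in test_set)
print(f"test accuracy: {correct}/{len(test_set)}")
```

With only ten toy headlines the printed accuracy says nothing about real performance; the point is the order of operations: split first, select features on the training portion only, then fit and evaluate the classifier.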


An Efficient Filtered Classifier for Classification of Unseen Test Data in Text Documents

2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), 2017, DOI 10.1109/iccic.2017.8524416

Cited By ~ 3

Author(s): G. Naga Chandrika, E. Srinivasa Reddy

Keyword(s): Test Data, Text Documents



Computing Correlative Association of Terms for Automatic Classification of Text Documents

An efficient approach for dimensionality reduction and classification of high dimensional text documents

Varying Naïve Bayes Models With Applications to Classification of Chinese Text Documents

Distributed Classification of Text Documents on Apache Spark Platform

Classification of Text Documents Using B-Tree

Classification of Text Documents

Multi-attribute Classification of Text Documents as a Tool for Ranking and Categorization of Educational Innovation Projects

A Generic Framework for Feature Representations in Image Categorization Tasks

