text document classification Latest Research Papers

SUPPORT OF INFORMAL CARERS FOR PEOPLE AFTER A STROKE WITH CROWDSOURCING AND NATURAL LANGUAGE PROCESSING

For a long time, both professionals and the lay public showed little interest in informal carers, even though these people deal with multiple common issues in their everyday lives. As the population ages, this attitude is changing. Thanks to advances in computer science, we can offer informal carers effective assistance and support by providing necessary information and connecting them with both the professional and the lay community. In this work we describe a project called “Research and development of support networks and information systems for informal carers for persons after stroke”, which produces an information system available to the public as a web portal. The portal does not just provide a simple set of information: it uses artificial intelligence and text document classification, with crowdsourcing to further improve classification accuracy. It also provides effective visualization of and navigation over content that is created mostly by the community itself and personalized to the informal carer’s phase of the care-taking timeline. This can benefit informal carers because it lets them find content specific to their current situation. This work describes our approach to the classification of text documents and its improvement through crowdsourcing. Its goal is to test a text document classifier based on document similarity measured by the N-grams method and to design an evaluation and crowdsourcing-based classification improvement mechanism. The crowdsourcing interface was created using the WordPress CMS. In addition to collecting data, the interface is used to evaluate classification accuracy, which leads to an extension of the classifier’s test data set and thus to more successful classification.
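As a rough illustration of classification by N-gram document similarity with crowdsourced feedback, the sketch below builds one character-trigram profile per category and assigns a new text to the most similar profile. The abstract does not specify the similarity measure, the N-gram size, or how feedback is stored, so the cosine measure, `n=3`, the `NGramClassifier` class, and the example categories are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (assumed measure)."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class NGramClassifier:
    """Hypothetical classifier: one accumulated n-gram profile per category."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = {}  # category label -> Counter of n-grams

    def add_example(self, text, label):
        """Add a labeled document, e.g. one confirmed or corrected by the crowd."""
        self.profiles.setdefault(label, Counter()).update(char_ngrams(text, self.n))

    def classify(self, text):
        """Return the category whose profile is most similar to the text."""
        if not self.profiles:
            raise ValueError("no labeled examples have been added yet")
        grams = char_ngrams(text, self.n)
        scores = {label: cosine_similarity(grams, profile)
                  for label, profile in self.profiles.items()}
        return max(scores, key=scores.get)


# Made-up seed documents and categories.
clf = NGramClassifier()
clf.add_example("rehabilitation exercises after stroke", "rehabilitation")
clf.add_example("applying for a care allowance and benefits", "finance")

predicted = clf.classify("stroke rehabilitation at home")

# Crowdsourcing step (assumed form): a user confirms the label,
# and the confirmed document is added to the labeled data.
clf.add_example("stroke rehabilitation at home", predicted)
```

In this sketch, every confirmed or corrected label enlarges the profile of its category, which is one simple way crowd feedback could improve later predictions.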

ArCAR: A Novel Deep Learning Computer-Aided Recognition for Character-Level Arabic Text Representation and Recognition

Algorithms ◽ 10.3390/a14070216 ◽ 2021 ◽ Vol 14 (7) ◽ pp. 216

Author(s): Abdullah Y. Muaad ◽ Hanumanthappa Jayappa ◽ Mugahed A. Al-antari ◽ Sungyoung Lee

Keyword(s): Deep Learning ◽ Sentiment Analysis ◽ Document Classification ◽ Arabic Text ◽ Text Representation ◽ Text Document ◽ Computer Aided ◽ Arabic Sentiment Analysis ◽ Arabic Text Classification ◽ Text Document Classification

Arabic text classification is the process of categorizing different contextual Arabic content into the proper category. In this paper, a novel deep learning Arabic text computer-aided recognition (ArCAR) system is proposed to represent and recognize Arabic text at the character level. The input Arabic text is quantized into a 1D vector for each Arabic character, and together these vectors form a 2D array that is the input to the ArCAR system. The ArCAR system is validated with 5-fold cross-validation for two applications: Arabic text document classification and Arabic sentiment analysis. For document classification, the ArCAR system achieves its best performance on the Alarabiya-balance dataset, with overall accuracy, recall, precision, and F1-score of 97.76%, 94.08%, 94.16%, and 94.09%, respectively. ArCAR also performs well for Arabic sentiment analysis, achieving its best performance on the balanced hotel Arabic reviews dataset (HARD), with overall accuracy and F1-score of 93.58% and 93.23%, respectively. The proposed ArCAR seems to provide a practical solution for accurate Arabic text representation, understanding, and classification.
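To make the character-level representation concrete, the sketch below quantizes a string into one 1D vector per character and stacks them into a fixed-size 2D array. The abstract does not state the encoding, the alphabet, or the sequence length, so the one-hot encoding, the 28-letter alphabet plus space, the `quantize` function, and `max_len` are assumptions chosen only for illustration.

```python
# Assumed alphabet: the 28 basic Arabic letters plus the space character.
LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
ALPHABET = LETTERS + " "
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, max_len=16):
    """Encode text as a max_len x len(ALPHABET) matrix of one-hot rows.

    Characters outside the assumed alphabet become all-zero rows, and
    the text is truncated or zero-padded to exactly max_len rows.
    """
    rows = []
    for ch in text[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows


# "كتاب جديد" ("a new book") has 9 characters, so 3 padding rows are added.
matrix = quantize("كتاب جديد", max_len=12)
```

A fixed `max_len` gives every document the same array shape, which is what a convolutional model typically expects as input.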

A New Method of Automatic Text Document Classification

Automatic Documentation and Mathematical Linguistics ◽ 10.3103/s0005105521030080 ◽ 2021 ◽ Vol 55 (3) ◽ pp. 122-133

Author(s): V. A. Yatsko

Keyword(s): Document Classification ◽ New Method ◽ Text Document ◽ Text Document Classification ◽ Automatic Text

Text document classification using a hybrid approach of ACOGA for feature selection

International Journal of Advanced Intelligence Paradigms ◽ 10.1504/ijaip.2021.117613 ◽ 2021 ◽ Vol 20 (1/2) ◽ pp. 158

Author(s): Avjeet Singh ◽ Anoj Kumar

Keyword(s): Feature Selection ◽ Hybrid Approach ◽ Document Classification ◽ Text Document ◽ Text Document Classification

An Improved Approach of Unstructured Text Document Classification Using Predetermined Text Model and Probability Technique

Proceedings of the Fist International Conference on Advanced Scientific Innovation in Science, Engineering and Technology, ICASISET 2020, 16-17 May 2020, Chennai, India ◽ 10.4108/eai.16-5-2020.2304041 ◽ 2021

Author(s): S Sreedhar ◽ Syed Ahmed ◽ P Flora ◽ LS Hemanth ◽ J Aishwarya ◽ ...

Keyword(s): Document Classification ◽ Text Document ◽ Unstructured Text ◽ Text Model ◽ Text Document Classification

AUTOMATIC SUBJECT LABELING IN DOCUMENTS BY USING ONTOLOGY AND GRAPH DATABASES

Journal of Science and Technology - IUH ◽ 10.46242/jst-iuh.v38i02.292 ◽ 2020 ◽ Vol 38 (02)

Author(s): TẠ DUY CÔNG CHIẾN

Keyword(s): Machine Learning ◽ Language Processing ◽ Data Science ◽ Document Classification ◽ Graph Database ◽ Graph Databases ◽ Text Documents ◽ Domain Specific ◽ Text Document ◽ Text Document Classification

Ontologies have been used in many applications in recent years, such as information retrieval, information extraction, and text document classification. The purpose of a domain-specific ontology is to enrich the identification of concepts and their interrelationships. In our research, we use an ontology to specify a set of generic subjects (concepts) that characterize the domain, as well as their definitions and interrelationships. This paper introduces a system for labeling the subjects of text documents based on the differential layers of a domain-specific ontology, which contains the information and vocabularies related to the computer domain. A document can contain several subjects, such as data science, database, and machine learning. The subjects in text document classification are determined based on the differential layers of the domain-specific ontology. We combine Natural Language Processing methodologies with the domain ontology to determine the subjects in a text document. To increase performance, we use a graph database to store and access the ontology. The paper also evaluates our proposed algorithm against some other methods. Experimental results show that our proposed algorithm yields performance significantly

Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)

Soft Computing ◽ 10.1007/s00500-020-05410-9 ◽ 2020

Author(s): Bichitrananda Behera ◽ G. Kumaravelan

Keyword(s): Rough Set ◽ Nearest Neighbor ◽ Document Classification ◽ Fuzzy Rough Set ◽ Text Document ◽ Text Document Classification

Crowd Sourcing as an Improvement of N-Grams Text Document Classification Algorithm

2020 15th International Workshop on Semantic and Social Media Adaptation and Personalization (SMA ◽ 10.1109/smap49528.2020.9248454 ◽ 2020

Author(s): Petr Saloun ◽ David Andrsic ◽ Barbora Cigankova ◽ Ioannis Anagnostopoulos

Keyword(s): Document Classification ◽ Classification Algorithm ◽ Crowd Sourcing ◽ Text Document ◽ Text Document Classification

Correction to: Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification

Neural Computing and Applications ◽ 10.1007/s00521-020-05435-z ◽ 2020

Author(s): Muhammad Nabeel Asim ◽ Muhammad Usman Ghani ◽ Muhammad Ali Ibrahim ◽ Waqar Mahmood ◽ Andreas Dengel ◽ ...

Keyword(s): Deep Learning ◽ Document Classification ◽ Text Document ◽ Text Document Classification ◽ Benchmarking Performance

Document classification using convolutional neural networks with small window sizes and latent semantic analysis

Web Intelligence ◽ 10.3233/web-200445 ◽ 2020 ◽ Vol 18 (3) ◽ pp. 239-248

Author(s): Eren Gultepe ◽ Mehran Kamkarhaghighi ◽ Masoud Makrehchi

Keyword(s): Latent Semantic Analysis ◽ Semantic Analysis ◽ Classification Performance ◽ Document Classification ◽ Ease Of Use ◽ Linear Classifiers ◽ Text Document ◽ Small Window ◽ Average Improvement ◽ Text Document Classification

A parsimonious convolutional neural network (CNN) for text document classification that replicates the ease of use and high classification performance of linear methods is presented. This new CNN architecture can leverage locally trained latent semantic analysis (LSA) word vectors. The architecture is based on parallel 1D convolutional layers with small window sizes, ranging from 1 to 5 words. To test the efficacy of the new CNN architecture, three balanced text datasets that are known to perform exceedingly well with linear classifiers were evaluated. Also, three additional imbalanced datasets were evaluated to gauge the robustness of the LSA vectors and small window sizes. The new CNN architecture consisting of 1 to 4-grams, coupled with LSA word vectors, exceeded the accuracy of all linear classifiers on balanced datasets with an average improvement of 0.73%. In four out of the total six datasets, the LSA word vectors provided a maximum classification performance on par with or better than word2vec vectors in CNNs. Furthermore, in four out of the six datasets, the new CNN architecture provided the highest classification performance. Thus, the new CNN architecture and LSA word vectors could be used as a baseline method for text classification tasks.
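The idea of parallel 1D convolutions with small windows can be sketched without a deep learning framework. In the toy example below, each filter slides over a sequence of word vectors with its own window width, applies a ReLU, and keeps its maximum activation, and the per-width results are concatenated into one feature vector. The two-dimensional word vectors stand in for LSA vectors, and the filter weights, function names, and max-over-time pooling are assumptions. In a real model the vectors come from LSA, many filters per width are learned, and the concatenated features feed a classification layer.

```python
def conv1d_max(embeddings, kernel):
    """Slide one filter over the word sequence and max-pool its ReLU activations.

    embeddings: list of word vectors (one list of floats per word).
    kernel: list of weight vectors; its length is the window width in words.
    """
    width = len(kernel)
    best = 0.0  # ReLU floor: negative activations never exceed this
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = sum(
            weight * value
            for kernel_row, word_vector in zip(kernel, window)
            for weight, value in zip(kernel_row, word_vector)
        )
        best = max(best, activation)
    return best


def parallel_features(embeddings, kernels):
    """Run filters of different widths side by side and concatenate the results."""
    return [conv1d_max(embeddings, kernel) for kernel in kernels]


# Toy 2-dimensional "word vectors" for a 4-word document (made-up values).
document = [[1, 0], [0, 1], [1, 1], [0, 0]]

# One hypothetical filter per window width (1, 2, and 3 words).
kernels = [
    [[1, 0]],
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1], [1, 1]],
]

features = parallel_features(document, kernels)
```

Each width captures a different n-gram span, so small windows keep the model compact while still covering short phrases.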

text document classification
Recently Published Documents

SUPPORT OF INFORMAL CARERS FOR PEOPLE AFTER A STROKE WITH CROWDSOURCING AND NATURAL LANGUAGE PROCESSING

ArCAR: A Novel Deep Learning Computer-Aided Recognition for Character-Level Arabic Text Representation and Recognition

A New Method of Automatic Text Document Classification

Text document classification using a hybrid approach of ACOGA for feature selection

An Improved Approach of Unstructured Text Document Classification Using Predetermined Text Model and Probability Technique

AUTOMATIC SUBJECT LABELING IN DOCUMENTS BY USING ONTOLOGY AND GRAPH DATABASES

Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)

Crowd Sourcing as an Improvement of N-Grams Text Document Classification Algorithm

Correction to: Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification

Document classification using convolutional neural networks with small window sizes and latent semantic analysis
