Automated Text Classification: Latest Research Papers

Defining a set of informative features that can represent and discriminate documents is paramount for automatic document classification. In this doctoral dissertation, we present the most comprehensive study so far on the role of meta-features (high-level features built from lower-level ones) as an alternative for representing documents. We start by proposing new sets of (meta-)features that exploit distance measures in the original (bag-of-words) feature space to summarize potentially complex relationships between documents. We then (i) analyze the discriminative power of such meta-features with novel multi-objective feature selection strategies; (ii) provide new GPU implementations to reduce computational time; (iii) enrich distance relationships with labeled or context-specific information; and (iv) adapt the proposed meta-features to tasks as hard as sentiment analysis. Our experimental results show that, by exploiting distances, our meta-features achieve remarkable classification results and are state of the art in many situations and scenarios.
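As a rough illustration of what a distance-based meta-feature can look like, the sketch below maps a document to a short vector holding, for each class, the distances to its k nearest training documents of that class in bag-of-words space. The choice of cosine distance, the value of k, the function names, and the toy documents are all assumptions made for this example; the abstract does not specify the dissertation's actual feature definitions.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a document to a sparse term-count vector."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def meta_features(doc, train_docs, train_labels, k=2):
    """Concatenate, per class (in sorted label order), the k smallest
    distances from doc to the training documents of that class."""
    query = bag_of_words(doc)
    features = []
    for label in sorted(set(train_labels)):
        dists = sorted(
            cosine_distance(query, bag_of_words(d))
            for d, y in zip(train_docs, train_labels)
            if y == label
        )
        # Pad with the maximum distance when a class has fewer than k documents.
        dists = (dists + [1.0] * k)[:k]
        features.extend(dists)
    return features

# Hypothetical toy corpus, for illustration only.
train_docs = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda attached",
    "project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

vec = meta_features("buy cheap pills", train_docs, train_labels, k=2)
print(vec)  # two "ham" distances followed by two "spam" distances
```

The resulting low-dimensional vector could then be fed to any standard classifier in place of, or alongside, the original bag-of-words representation.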

The Classification of Short Scientific Texts Using Pretrained BERT Model

Studies in Health Technology and Informatics - Public Health and Informatics ◽ 10.3233/shti210125 ◽ 2021

Author(s): Gleb Danilov ◽ Timur Ishankulov ◽ Konstantin Kotik ◽ Yuriy Orlov ◽ Mikhail Shifrin ◽ ...

Keyword(s): Language Processing ◽ Text Classification ◽ Binary Classification ◽ Scientific Texts ◽ Pubmed Database ◽ Automated Text Classification ◽ Literature Selection ◽ State Of Art ◽ Classification Quality

Automated text classification is a natural language processing (NLP) technology that could significantly facilitate scientific literature selection. A specific topical dataset of 630 article abstracts was obtained from the PubMed database. We proposed 27 parametrized variants of the PubMedBERT model and 4 ensemble models to solve a binary classification task on that dataset. Three hundred tests with resamples were performed for each classification approach. The best PubMedBERT model achieved an F1-score of 0.857, while the best ensemble model reached an F1-score of 0.853. We concluded that the classification quality for short scientific texts might be improved using the latest state-of-the-art approaches.
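For readers unfamiliar with the metric, the following minimal sketch computes an F1-score for a binary task and averages it over repeated resamples. The abstract does not describe the resampling protocol, so the bootstrap over evaluation pairs used here is an assumption, as are the function names and the toy labels; it is not the authors' procedure.

```python
import random

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def resampled_f1(y_true, y_pred, n_resamples=300, seed=0):
    """Mean F1 over bootstrap resamples of the (label, prediction) pairs.
    Assumption: resampling with replacement from the evaluation set."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    return sum(scores) / len(scores)

# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

point_estimate = f1_score(y_true, y_pred)
mean_f1 = resampled_f1(y_true, y_pred)
print(point_estimate, mean_f1)
```

Averaging over many resamples gives a more stable estimate than a single split, which matters on a dataset as small as 630 abstracts.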

An Improved Multi-label Classifier Chain Method for Automated Text Classification

International Journal of Advanced Computer Science and Applications ◽ 10.14569/ijacsa.2021.0120352 ◽ 2021 ◽ Vol 12 (3)

Author(s): Adeleke Abdullahi ◽ Noor Azah ◽ Shamsul Kamal ◽ Zuhaila Ali

Keyword(s): Text Classification ◽ Automated Text Classification

How Threats Shape the Politics of Marginalized: Evidence from a Natural Experiment and Machine Learning

10.31235/osf.io/y65sd ◽ 2020

Author(s): Jae Yeon Kim ◽ Andrew Thompson

Keyword(s): Machine Learning ◽ Information Seeking ◽ Natural Experiment ◽ Domestic Politics ◽ R Package ◽ Exogenous Shock ◽ Indian American ◽ Ethnic Newspapers ◽ September 11 Attacks ◽ Automated Text Classification

In this study, we used a natural experiment and machine learning to examine how threats prompt information seeking among marginalized populations. We traced how the September 11 attacks, an exogenous shock, increased the interest of Arab and Indian Americans in U.S. domestic politics. We classified 5,684 Arab American and Indian American newspaper articles using machine learning and estimated that three more articles on U.S. domestic politics were published daily in the post-9/11 period than in previous years. While the natural experiment design identifies the causal relationship between the intervention and the outcome variation, automated text classification creates the data essential for such causal identification. This project also provides an accompanying R package that makes collecting data from the largest database of ethnic newspapers published in the U.S. easier and faster.

Integrating Human and Machine Coding to Measure Political Issues in Ethnic Newspaper Articles

10.31235/osf.io/pg3aq ◽ 2020

Author(s): Jae Yeon Kim

Keyword(s): African American ◽ Asian American ◽ Large Scale ◽ Minority Groups ◽ State Violence ◽ Training Data ◽ Political Issues ◽ Classification Framework ◽ The 1960s ◽ Automated Text Classification

The voices of racial minority groups have rarely been examined systematically with large-scale text analysis in political science. This study fills that gap by applying an integrated classification framework to analyze the commonalities and differences in political issues that appeared in 78,305 articles from Asian American and African American newspapers from the 1960s to the 1980s. The automated text classification shows that Asian American newspapers focused on promoting collective gains more often than African American newspapers. Conversely, African American newspapers concentrated on preventing collective losses more than Asian American newspapers. The content analysis demonstrates that the issue priorities varied between the corpora, especially with respect to policy contexts. Gaining access to government resources was a more urgent issue for Asian Americans, while reducing or ending state violence, such as police brutality, was a more pressing matter for African Americans. The content analysis also helped avoid extreme interpretations of the machine coding, as the misalignment of political agendas between the two corpora widened up to 10 times when the training data were measured using the minimum, rather than the maximum, reliability threshold.

Automated text classification of near-misses from safety reports: An improved deep learning approach

Advanced Engineering Informatics ◽ 10.1016/j.aei.2020.101060 ◽ 2020 ◽ Vol 44 ◽ pp. 101060 ◽ Cited By ~ 5

Author(s): Weili Fang ◽ Hanbin Luo ◽ Shuangjie Xu ◽ Peter E.D. Love ◽ Zhenchuan Lu ◽ ...

Keyword(s): Deep Learning ◽ Text Classification ◽ Learning Approach ◽ Near Misses ◽ Automated Text Classification

Deep learning in automated text classification: a case study using toxicological abstracts

Environment Systems & Decisions ◽ 10.1007/s10669-020-09763-2 ◽ 2020 ◽ Vol 40 (4) ◽ pp. 465-479 ◽ Cited By ~ 2

Author(s): Arun Varghese ◽ George Agyeman-Badu ◽ Michelle Cawley

Keyword(s): Deep Learning ◽ Text Classification ◽ Automated Text Classification

Automated Text Classification System Based on Statistical Unified Model

Lecture Notes in Electrical Engineering - Advances in Automation ◽ 10.1007/978-3-030-39225-3_114 ◽ 2020 ◽ pp. 1079-1087

Author(s): S. Skorynin ◽ A. Surkova

Keyword(s): Text Classification ◽ Classification System ◽ Unified Model ◽ Automated Text Classification

Automating quranic verses labeling using machine learning approach

Indonesian Journal of Electrical Engineering and Computer Science ◽ 10.11591/ijeecs.v16.i2.pp925-931 ◽ 2019 ◽ Vol 16 (2) ◽ pp. 925

Author(s): A. Adeleke ◽ N. Samsudin ◽ A. Mustapha ◽ S. Ahmad Khalid

Keyword(s): Machine Learning ◽ Text Classification ◽ Nearest Neighbor ◽ Naive Bayes ◽ Naïve Bayes ◽ Support Vector ◽ Accuracy Score ◽ Machine Learning Approach ◽ Class Labels ◽ Automated Text Classification

Classification of Quranic verses into predefined categories is an essential task in Quranic studies. In recent times, with advances in information technology and machine learning, several classification algorithms have been developed for text classification tasks. Automated text classification (ATC) is a well-known technique in machine learning. It is the task of developing models that can be trained to automatically assign each text instance a known label from a predefined set. In this paper, four conventional ML classifiers, support vector machine (SVM), naïve Bayes (NB), decision trees (J48), and nearest neighbor (k-NN), are used to classify selected Quranic verses into three predefined class labels: faith (iman), worship (ibadah), and etiquettes (akhlak). The Quranic data comprises verses from chapter two (al-Baqara) of the holy scripture. In the results, the classifiers achieved accuracy scores above 80%, with the naïve Bayes (NB) algorithm recording the highest overall scores of 93.9% accuracy and 0.964 AUC.
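To make the naïve Bayes approach concrete, here is a minimal multinomial naïve Bayes sketch with add-one (Laplace) smoothing, written from scratch. It is not the authors' implementation or toolkit; the whitespace tokenization, the function names, and the toy keyword strings (which are made-up placeholders, not verses) are assumptions for illustration. Only the three class labels come from the abstract.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest log-posterior for doc."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # ignore terms never seen in training
            # Add-one smoothing keeps unseen class/term pairs non-zero.
            score += math.log(
                (term_counts[label][token] + 1) / (total_terms + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training strings, for illustration only.
docs = [
    "belief faith trust",
    "faith belief",
    "prayer fasting charity",
    "prayer charity",
    "kindness honesty manners",
    "manners kindness",
]
labels = ["faith", "faith", "worship", "worship", "etiquettes", "etiquettes"]

model = train_nb(docs, labels)
print(predict_nb(model, "prayer and fasting"))
print(predict_nb(model, "honesty kindness"))
```

Working in log space avoids numerical underflow when many small term probabilities are multiplied together.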

automated text classification
Recently Published Documents

Automated Text Classification of Maintenance Data of Higher Education Buildings Using Text Mining and Machine Learning Techniques

A Thorough Exploitation of Distance-Based Meta-Features for Automated Text Classification

The Classification of Short Scientific Texts Using Pretrained BERT Model

An Improved Multi-label Classifier Chain Method for Automated Text Classification

How Threats Shape the Politics of Marginalized: Evidence from a Natural Experiment and Machine Learning

Integrating Human and Machine Coding to Measure Political Issues in Ethnic Newspaper Articles

Automated text classification of near-misses from safety reports: An improved deep learning approach

Deep learning in automated text classification: a case study using toxicological abstracts

Automated Text Classification System Based on Statistical Unified Model

Automating quranic verses labeling using machine learning approach
