Text Classifiers
Recently Published Documents


TOTAL DOCUMENTS: 117 (FIVE YEARS: 46)

H-INDEX: 15 (FIVE YEARS: 4)

2021 ◽  
Vol 9 ◽  
Author(s):  
Xin Wang ◽  
Fan Chao ◽  
Guang Yu

Background: The spread of rumors related to COVID-19 on social media has posed substantial challenges to public health governance, so exposing rumors and curbing their spread quickly and effectively has become an urgent task. This study aimed to assist in formulating effective strategies to debunk rumors and curb their spread on social media.

Methods: A total of 2,053 original postings and 100,348 comments replying to the postings of five false COVID-19-related rumors (dated from January 20, 2020, to June 28, 2020), belonging to three categories (authoritative, social, and political), were randomly selected from Sina Weibo in China. To study the effectiveness of different debunking methods, a new annotation scheme was proposed that divides debunking methods into six categories: denial, further fact-checking, refutation, person response, organization response, and combination methods. Text classifiers using deep learning methods were built to automatically identify four user stances in comments replying to debunking postings: supporting, denying, querying, and commenting. Then, based on these stance responses, a debunking effectiveness index (DEI) was developed to measure the effectiveness of the different debunking methods.

Results: The refutation method with cited evidence has the best debunking effect, whether used alone or in combination with other debunking methods. For the social-category Car rumor and the political-category Russia rumor, using the refutation method alone achieves the optimal debunking effect. For authoritative rumors, a combination method is optimal, but the most effective combination avoids pairing the refutation method with a personal response from the person or organization defamed by the rumor.

Conclusion: The findings provide relevant insights into how to debunk rumors effectively, support crisis management of false information, and take necessary actions in response to rumors amid public health emergencies.
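The abstract does not give the exact formula of the DEI, so the snippet below is only an illustrative sketch of how such an index could be computed from stance-classified reply comments; the stance labels match the four categories above, but the weighting is a hypothetical stand-in, not the authors' definition.

```python
# Illustrative sketch only: the paper defines its own DEI; the weighting
# below is a hypothetical stand-in based on stance counts.
from collections import Counter

def debunking_effectiveness(stances):
    """Score one debunking posting from the stances of its reply comments.

    `stances` is a list of labels in {"support", "deny", "query", "comment"}
    produced by a stance classifier. A higher score means the replies lean
    toward endorsing the debunking message.
    """
    counts = Counter(stances)
    total = sum(counts.values()) or 1
    # Hypothetical weighting: supporting replies count for, denying against;
    # querying and neutral commenting are ignored.
    return (counts["support"] - counts["deny"]) / total

# Example: replies to a refutation-style debunking posting
print(debunking_effectiveness(["support", "support", "query", "deny", "comment"]))
```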


Author(s):  
Jeow Li Huan ◽  
Arif Ahmed Sekh ◽  
Chai Quek ◽  
Dilip K. Prasad

Abstract: Text classification is one of the most widely used tasks in natural language processing. State-of-the-art text classifiers use the vector space model for extracting features. Recent progress in deep models, such as recurrent neural networks that preserve the positional relationship among words, has achieved higher accuracy. To push text classification accuracy even higher, multi-dimensional document representations, such as vector sequences or matrices combined with document sentiment, should be explored. In this paper, we show that documents can be represented as a sequence of vectors carrying semantic meaning and classified using a recurrent neural network that recognizes long-range relationships. We show that in this representation, additional sentiment vectors can be easily attached as a fully connected layer to the word vectors to further improve classification accuracy. On the UCI sentiment-labelled dataset, using the sequence of vectors alone achieved an accuracy of 85.6%, better than the 80.7% of a ridge regression classifier, the best among the classical techniques we tested. Adding sentiment information further increases accuracy to 86.3%. On our suicide notes dataset, the best classical technique, the Naïve Bayes Bernoulli classifier, achieves 71.3% accuracy, while our classifier, incorporating semantic and sentiment information, exceeds that at 75% accuracy.
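A minimal PyTorch sketch of the idea described above: an LSTM reads the document as a sequence of word vectors, and a separate sentiment vector is concatenated before the final fully connected classifier. Layer sizes, the sentiment-vector dimension, and the use of an LSTM are assumptions for illustration, not the authors' exact settings.

```python
import torch
import torch.nn as nn

class SentimentAugmentedRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128,
                 sentiment_dim=4, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # The sentiment vector is attached at the fully connected layer.
        self.fc = nn.Linear(hidden_dim + sentiment_dim, num_classes)

    def forward(self, token_ids, sentiment_vec):
        # token_ids: (batch, seq_len); sentiment_vec: (batch, sentiment_dim)
        embedded = self.embed(token_ids)
        _, (last_hidden, _) = self.lstm(embedded)
        combined = torch.cat([last_hidden[-1], sentiment_vec], dim=1)
        return self.fc(combined)

model = SentimentAugmentedRNN(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 50)), torch.rand(8, 4))
print(logits.shape)  # torch.Size([8, 2])
```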


2021 ◽  
Author(s):  
Eduard Zubchuk ◽  
Dmitry Menshikov ◽  
Nikolay Mikhaylovsky

Author(s):  
Andrea Tagarelli ◽  
Andrea Simeri

Abstract: Modeling law search and retrieval as prediction problems has recently emerged as a predominant approach in law intelligence. Focusing on the law article retrieval task, we present a deep learning framework named LamBERTa, which is designed for civil-law codes and specifically trained on the Italian civil code. To our knowledge, this is the first study proposing an advanced approach to law article prediction for the Italian legal system based on a BERT (Bidirectional Encoder Representations from Transformers) learning framework, which has recently attracted increased attention among deep learning approaches, showing outstanding effectiveness in several natural language processing and learning tasks. We define LamBERTa models by fine-tuning an Italian pre-trained BERT on the Italian civil code or its portions, treating law article retrieval as a classification task. One key aspect of our LamBERTa framework is that we conceived it to address an extreme classification scenario, characterized by a high number of classes, the few-shot learning problem, and the lack of test query benchmarks for Italian legal prediction tasks. To solve these issues, we define different methods for the unsupervised labeling of law articles, which can in principle be applied to any law article code system. We provide insights into the explainability and interpretability of our LamBERTa models, and we present an extensive experimental analysis over query sets of different types, for single-label as well as multi-label evaluation tasks. Empirical evidence shows the effectiveness of LamBERTa and its superiority over widely used deep learning text classifiers and a few-shot learner conceived for an attribute-aware prediction task.
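In the spirit of the setup described above, the following is a minimal sketch of fine-tuning a pre-trained Italian BERT as a many-class classifier, one class per civil-code article, using the Hugging Face transformers API. The checkpoint name, class count, and example query are assumptions for illustration and not the paper's exact configuration; the classification head here is untrained, whereas LamBERTa fine-tunes it on article text.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

checkpoint = "dbmdz/bert-base-italian-xxl-cased"   # assumed Italian BERT checkpoint
num_articles = 2000                                # placeholder number of article classes

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=num_articles)

# A query is scored against all article classes; retrieval returns the top-k.
query = "Chi è responsabile dei danni causati da cose in custodia?"  # example legal query
inputs = tokenizer(query, return_tensors="pt", truncation=True)
with torch.no_grad():
    scores = model(**inputs).logits.softmax(dim=-1)
top_articles = scores.topk(5).indices.squeeze().tolist()
print(top_articles)  # indices of the 5 highest-scoring article classes
```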


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254007
Author(s):  
Oliver C. Stringham ◽  
Stephanie Moncayo ◽  
Katherine G. W. Hill ◽  
Adam Toomes ◽  
Lewis Mitchell ◽  
...  

Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance. From this sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife-trade-related online data.
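A sketch of the kind of pipeline and sample-size sensitivity analysis described above, using scikit-learn. The classifier choice, features, fraction grid, and the tiny synthetic corpus are illustrative assumptions standing in for the 16.5k annotated bird advertisements, not the authors' actual models or data.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score

# Tiny synthetic stand-in corpus: 1 = relevant bird-sale listing, 0 = irrelevant.
relevant = ["Hand-raised galah, ready to go", "Two budgies with cage, pickup only"]
irrelevant = ["Bird feeder, barely used", "Wanted: aviary wire mesh"]
ads = (relevant + irrelevant) * 30
labels = [1, 1, 0, 0] * 30

X_train, X_test, y_train, y_test = train_test_split(
    ads, labels, test_size=0.25, stratify=labels, random_state=0)

# Simulate shrinking training sets, as in the sensitivity analysis.
for frac in (1.0, 0.66, 0.33):
    n = int(frac * len(X_train))
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(X_train[:n], y_train[:n])
    proba = clf.predict_proba(X_test)[:, 1]
    print(f"{frac:.2f}  AUC={roc_auc_score(y_test, proba):.3f}  "
          f"F1={f1_score(y_test, clf.predict(X_test)):.3f}")
```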


2021 ◽  
Author(s):  
Mahtab Sarvmaili ◽  
Amilcar Soares ◽  
Riccardo Guidotti ◽  
Anna Monreale ◽  
Fosca Giannotti ◽  
...  

2021 ◽  
Vol 2 (3) ◽  
Author(s):  
Martin Riekert ◽  
Matthias Riekert ◽  
Achim Klein

Abstract: Text classification is important for better understanding online media. A major problem in creating accurate text classifiers with machine learning is small training sets, owing to the cost of annotating them. On this basis, we investigated how SVM and NBSVM text classifiers should be designed to achieve high accuracy, and how training sets should be sized to use annotation labor efficiently. We used a four-way repeated-measures full-factorial design of 32 design factor combinations. For each design factor combination, 22 training set sizes were examined. These training sets were subsets of seven public text datasets. We studied the statistical variance of accuracy estimates by randomly drawing new training sets, resulting in accuracy estimates for 98,560 different experimental runs. Our major contribution is a set of empirically evaluated guidelines for creating online media text classifiers using small training sets. We recommend uni- and bi-gram features as the text representation, btc term weighting, and a linear-kernel NBSVM. Our results suggest that high classification accuracy can be achieved with a manually annotated dataset of only 300 examples.
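A scikit-learn sketch of the recommended setup: uni- and bi-gram features with binary term frequency, idf weighting, and cosine normalisation (roughly SMART "btc"), feeding a linear-kernel SVM. scikit-learn has no built-in NBSVM, so a plain LinearSVC stands in here; the 300 placeholder examples only mirror the training-set size the study found sufficient.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

vectorizer = TfidfVectorizer(ngram_range=(1, 2),   # uni- and bi-gram features
                             binary=True,          # b: binary term frequency
                             use_idf=True,         # t: idf weighting
                             norm="l2")            # c: cosine normalisation
clf = make_pipeline(vectorizer, LinearSVC(C=1.0))

# `texts` / `labels` are placeholders for ~300 manually annotated examples.
texts = ["great product, works as advertised", "arrived broken, waste of money"] * 150
labels = [1, 0] * 150
clf.fit(texts, labels)
print(clf.predict(["broken on arrival"]))
```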


2021 ◽  
Author(s):  
Oliver C. Stringham ◽  
Stephanie Moncayo ◽  
Katherine G.W. Hill ◽  
Adam Toomes ◽  
Lewis Mitchell ◽  
...  

1. Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many of these advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers.
2. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance.
3. We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). From our sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort.
4. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife-trade-related online data.

