Text Classifiers
Recently Published Documents


TOTAL DOCUMENTS: 117 (FIVE YEARS: 46)

H-INDEX: 15 (FIVE YEARS: 4)

2021 ◽  
Vol 9 ◽  
Author(s):  
Xin Wang ◽  
Fan Chao ◽  
Guang Yu

Background: The spread of rumors related to COVID-19 on social media has posed substantial challenges to public health governance, so exposing rumors and curbing their spread quickly and effectively has become an urgent task. This study aimed to assist in formulating effective strategies to debunk rumors and curb their spread on social media.

Methods: A total of 2,053 original postings and 100,348 comments replying to the postings of five false COVID-19-related rumors (dated from January 20, 2020, to June 28, 2020), belonging to three categories (authoritative, social, and political), were randomly selected from Sina Weibo in China. To study the effectiveness of different debunking methods, a new annotation scheme was proposed that divides debunking methods into six categories: denial, further fact-checking, refutation, person response, organization response, and combination methods. Text classifiers using deep learning methods were built to automatically identify four user stances in comments replying to debunking postings: supporting, denying, querying, and commenting. Then, based on these stance responses, a debunking effectiveness index (DEI) was developed to measure the effectiveness of the different debunking methods.

Results: The refutation method with cited evidence has the best debunking effect, whether used alone or in combination with other debunking methods. For the social-category Car rumor and the political-category Russia rumor, using the refutation method alone achieves the optimal debunking effect. For authoritative rumors, a combination method is optimal, but the most effective combination avoids pairing the refutation method with a personal response from the person or organization defamed by the rumor.

Conclusion: The findings provide relevant insights into how to debunk rumors effectively, support crisis management of false information, and take necessary actions in response to rumors amid public health emergencies.
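The abstract does not give the exact formula of the DEI, so the snippet below is only an illustrative sketch of how such an index could be computed from stance-classified reply comments; the stance labels match the four categories above, but the weighting is a hypothetical stand-in, not the authors' definition.

```python
# Illustrative sketch only: the paper defines its own DEI; the weighting
# below is a hypothetical stand-in based on stance counts.
from collections import Counter

def debunking_effectiveness(stances):
    """Score one debunking posting from the stances of its reply comments.

    `stances` is a list of labels in {"support", "deny", "query", "comment"}
    produced by a stance classifier. A higher score means the replies lean
    toward endorsing the debunking message.
    """
    counts = Counter(stances)
    total = sum(counts.values()) or 1
    # Hypothetical weighting: supporting replies count for, denying against;
    # querying and neutral commenting are ignored.
    return (counts["support"] - counts["deny"]) / total

# Example: replies to a refutation-style debunking posting
print(debunking_effectiveness(["support", "support", "query", "deny", "comment"]))
```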


Author(s):  
Jeow Li Huan ◽  
Arif Ahmed Sekh ◽  
Chai Quek ◽  
Dilip K. Prasad

Abstract: Text classification is one of the most widely used tasks in natural language processing. State-of-the-art text classifiers use the vector space model for extracting features. Recent progress in deep models, such as recurrent neural networks that preserve the positional relationship among words, has achieved higher accuracy. To push text classification accuracy even higher, multi-dimensional document representations, such as vector sequences or matrices combined with document sentiment, should be explored. In this paper, we show that documents can be represented as a sequence of vectors carrying semantic meaning and classified using a recurrent neural network that recognizes long-range relationships. We show that in this representation, additional sentiment vectors can be easily attached as a fully connected layer to the word vectors to further improve classification accuracy. On the UCI sentiment-labelled dataset, using the sequence of vectors alone achieved an accuracy of 85.6%, better than the 80.7% of a ridge regression classifier, the best among the classical techniques we tested. Adding sentiment information further increases accuracy to 86.3%. On our suicide notes dataset, the best classical technique, the Naïve Bayes Bernoulli classifier, achieves 71.3% accuracy, while our classifier, incorporating semantic and sentiment information, exceeds that at 75% accuracy.
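A minimal PyTorch sketch of the idea described above: an LSTM reads the document as a sequence of word vectors, and a separate sentiment vector is concatenated before the final fully connected classifier. Layer sizes, the sentiment-vector dimension, and the use of an LSTM are assumptions for illustration, not the authors' exact settings.

```python
import torch
import torch.nn as nn

class SentimentAugmentedRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128,
                 sentiment_dim=4, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # The sentiment vector is attached at the fully connected layer.
        self.fc = nn.Linear(hidden_dim + sentiment_dim, num_classes)

    def forward(self, token_ids, sentiment_vec):
        # token_ids: (batch, seq_len); sentiment_vec: (batch, sentiment_dim)
        embedded = self.embed(token_ids)
        _, (last_hidden, _) = self.lstm(embedded)
        combined = torch.cat([last_hidden[-1], sentiment_vec], dim=1)
        return self.fc(combined)

model = SentimentAugmentedRNN(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 50)), torch.rand(8, 4))
print(logits.shape)  # torch.Size([8, 2])
```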


2021 ◽  
Author(s):  
Eduard Zubchuk ◽  
Dmitry Menshikov ◽  
Nikolay Mikhaylovsky

Author(s):  
Andrea Tagarelli ◽  
Andrea Simeri

Abstract: Modeling law search and retrieval as prediction problems has recently emerged as a predominant approach in law intelligence. Focusing on the law article retrieval task, we present a deep learning framework named LamBERTa, which is designed for civil-law codes and specifically trained on the Italian civil code. To our knowledge, this is the first study proposing an advanced approach to law article prediction for the Italian legal system based on a BERT (Bidirectional Encoder Representations from Transformers) learning framework, which has recently attracted increased attention among deep learning approaches, showing outstanding effectiveness in several natural language processing and learning tasks. We define LamBERTa models by fine-tuning an Italian pre-trained BERT on the Italian civil code or its portions, treating law article retrieval as a classification task. One key aspect of our LamBERTa framework is that we conceived it to address an extreme classification scenario, characterized by a high number of classes, the few-shot learning problem, and the lack of test query benchmarks for Italian legal prediction tasks. To solve these issues, we define different methods for the unsupervised labeling of law articles, which can in principle be applied to any law article code system. We provide insights into the explainability and interpretability of our LamBERTa models, and we present an extensive experimental analysis over query sets of different types, for single-label as well as multi-label evaluation tasks. Empirical evidence shows the effectiveness of LamBERTa and its superiority over widely used deep learning text classifiers and a few-shot learner conceived for an attribute-aware prediction task.
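In the spirit of the setup described above, the following is a minimal sketch of fine-tuning a pre-trained Italian BERT as a many-class classifier, one class per civil-code article, using the Hugging Face transformers API. The checkpoint name, class count, and example query are assumptions for illustration and not the paper's exact configuration; the classification head here is untrained, whereas LamBERTa fine-tunes it on article text.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

checkpoint = "dbmdz/bert-base-italian-xxl-cased"   # assumed Italian BERT checkpoint
num_articles = 2000                                # placeholder number of article classes

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=num_articles)

# A query is scored against all article classes; retrieval returns the top-k.
query = "Chi è responsabile dei danni causati da cose in custodia?"  # example legal query
inputs = tokenizer(query, return_tensors="pt", truncation=True)
with torch.no_grad():
    scores = model(**inputs).logits.softmax(dim=-1)
top_articles = scores.topk(5).indices.squeeze().tolist()
print(top_articles)  # indices of the 5 highest-scoring article classes
```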


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254007
Author(s):  
Oliver C. Stringham ◽  
Stephanie Moncayo ◽  
Katherine G. W. Hill ◽  
Adam Toomes ◽  
Lewis Mitchell ◽  
...  

Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance. From this sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife-trade-related online data.
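A sketch of the kind of pipeline and sample-size sensitivity analysis described above, using scikit-learn. The classifier choice, features, fraction grid, and the tiny synthetic corpus are illustrative assumptions standing in for the 16.5k annotated bird advertisements, not the authors' actual models or data.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score

# Tiny synthetic stand-in corpus: 1 = relevant bird-sale listing, 0 = irrelevant.
relevant = ["Hand-raised galah, ready to go", "Two budgies with cage, pickup only"]
irrelevant = ["Bird feeder, barely used", "Wanted: aviary wire mesh"]
ads = (relevant + irrelevant) * 30
labels = [1, 1, 0, 0] * 30

X_train, X_test, y_train, y_test = train_test_split(
    ads, labels, test_size=0.25, stratify=labels, random_state=0)

# Simulate shrinking training sets, as in the sensitivity analysis.
for frac in (1.0, 0.66, 0.33):
    n = int(frac * len(X_train))
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(X_train[:n], y_train[:n])
    proba = clf.predict_proba(X_test)[:, 1]
    print(f"{frac:.2f}  AUC={roc_auc_score(y_test, proba):.3f}  "
          f"F1={f1_score(y_test, clf.predict(X_test)):.3f}")
```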


2021 ◽  
Author(s):  
Mahtab Sarvmaili ◽  
Amilcar Soares ◽  
Riccardo Guidotti ◽  
Anna Monreale ◽  
Fosca Giannotti ◽  
...  

2021 ◽  
Vol 2 (3) ◽  
Author(s):  
Martin Riekert ◽  
Matthias Riekert ◽  
Achim Klein

Abstract: Text classification is important for better understanding online media. A major problem in creating accurate text classifiers with machine learning is small training sets, owing to the cost of annotating them. On this basis, we investigated how SVM and NBSVM text classifiers should be designed to achieve high accuracy, and how training sets should be sized to use annotation labor efficiently. We used a four-way repeated-measures full-factorial design of 32 design factor combinations. For each design factor combination, 22 training set sizes were examined. These training sets were subsets of seven public text datasets. We studied the statistical variance of accuracy estimates by randomly drawing new training sets, resulting in accuracy estimates for 98,560 different experimental runs. Our major contribution is a set of empirically evaluated guidelines for creating online media text classifiers using small training sets. We recommend uni- and bi-gram features as the text representation, btc term weighting, and a linear-kernel NBSVM. Our results suggest that high classification accuracy can be achieved with a manually annotated dataset of only 300 examples.
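A scikit-learn sketch of the recommended setup: uni- and bi-gram features with binary term frequency, idf weighting, and cosine normalisation (roughly SMART "btc"), feeding a linear-kernel SVM. scikit-learn has no built-in NBSVM, so a plain LinearSVC stands in here; the 300 placeholder examples only mirror the training-set size the study found sufficient.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

vectorizer = TfidfVectorizer(ngram_range=(1, 2),   # uni- and bi-gram features
                             binary=True,          # b: binary term frequency
                             use_idf=True,         # t: idf weighting
                             norm="l2")            # c: cosine normalisation
clf = make_pipeline(vectorizer, LinearSVC(C=1.0))

# `texts` / `labels` are placeholders for ~300 manually annotated examples.
texts = ["great product, works as advertised", "arrived broken, waste of money"] * 150
labels = [1, 0] * 150
clf.fit(texts, labels)
print(clf.predict(["broken on arrival"]))
```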


2021 ◽  
Author(s):  
Oliver C. Stringham ◽  
Stephanie Moncayo ◽  
Katherine G.W. Hill ◽  
Adam Toomes ◽  
Lewis Mitchell ◽  
...  

1. Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many of these advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers.
2. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance.
3. We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). From our sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort.
4. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife-trade-related online data.

