weighting schemes
Recently Published Documents


Total documents: 233 (five years: 54)

H-index: 29 (five years: 3)

2021 ◽  
pp. 1-12
Author(s):  
K. Seethappan ◽  
K. Premalatha

Although much research has addressed the detection of various kinds of figurative language, no prior work has addressed the automatic classification of euphemisms. Our primary contribution is a system for the automatic classification of euphemistic phrases in a document. For this research, a large dataset of 100,000 sentences was collected from different resources to distinguish euphemistic from non-euphemistic utterances. Three approaches are explored to improve euphemism classification: (1) a combination of lexical n-gram features, (2) three feature-weighting schemes, and (3) deep learning classification algorithms. Four machine learning algorithms (J48, Random Forest, Multinomial Naïve Bayes, and SVM) and three deep learning algorithms (Multilayer Perceptron, Convolutional Neural Network, and Long Short-Term Memory) are investigated with various combinations of features and feature-weighting schemes to classify the sentences. In our experiments, the Convolutional Neural Network (CNN) achieves a precision of 95.43%, recall of 95.06%, F-score of 95.25%, accuracy of 95.26%, and Kappa of 0.905 using a combination of unigram and bigram features with the TF-IDF feature-weighting scheme. These results show that the CNN with combined unigram and bigram features and TF-IDF weighting outperforms the other six classification algorithms in detecting euphemisms in our dataset.
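As a rough illustration of the feature setup named in this abstract (unigram plus bigram features weighted by TF-IDF), the following is a minimal sketch in plain Python. The abstract does not give the exact TF-IDF variant, tokenizer, or preprocessing, so the raw-count TF, the unsmoothed log(N / df) IDF, the whitespace tokenizer, and all function names here are assumptions, and the toy sentences are invented rather than taken from the authors' dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(sentence):
    """Lowercase, split on whitespace, and emit unigrams followed by bigrams."""
    tokens = sentence.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 2)


def tfidf_vectors(sentences):
    """Weight each feature by raw term frequency times log(N / df).

    Assumed variant: no smoothing and no length normalization.
    """
    docs = [Counter(extract_features(s)) for s in sentences]
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # count each feature once per document
    return [
        {feat: tf * math.log(n_docs / df[feat]) for feat, tf in doc.items()}
        for doc in docs
    ]


# Toy sentences, invented for illustration only.
sentences = [
    "he passed away last night",
    "he passed the exam last night",
    "the exam was hard",
]
vectors = tfidf_vectors(sentences)
```

In this toy corpus the bigram "passed away" occurs in only one sentence, so it receives a higher weight than the unigram "passed", which occurs in two. This shows why bigram features can help separate a euphemistic phrase from a literal use of the same word.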


Informatica ◽  
2021 ◽  
Vol 45 (3) ◽  
Author(s):  
Surender Singh Samant ◽  
NL Bhanu Murthy ◽  
Aruna Malapati

2021 ◽  
Author(s):  
Chuanxiao Li ◽  
Wenqiang Li ◽  
Zhong Tang ◽  
Song Li ◽  
Hai Xiang

As a vital step of the text classification (TC) task, the assignment of term weights has a great influence on TC performance. Many term weighting schemes are currently available, such as term frequency-inverse document frequency (TF-IDF) and term frequency-relevance frequency (TF-RF), and they all consist of a local part (TF) and a global part (e.g., IDF, RF). However, most of these schemes apply logarithmic processing to their respective global parts, and it is natural to ask whether logarithmic processing suits all of them. In practice, because logarithmic processing changes the ratio of local weight to global weight, a given term weighting scheme usually yields varying text classification results on different text sets, which indicates poor robustness. To explore how logarithmic processing of the global weight affects the classification results of term weighting schemes, TF-RF is selected as the representative because it performs better than the other schemes that adopt logarithmic processing. Then, two propositions about the relation between the TF part and the RF part, along with corresponding methods, are proposed based on TF-RF. In addition, two groups of experiments are conducted on the two methods. The first group shows that one method (denoted TF-ERF) improves performance more than the other (denoted ETF-RF). The second group shows that TF-ERF not only outperforms TF-RF but also performs better than other existing term weighting schemes.
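The local-times-global structure described in this abstract can be sketched as follows. The abstract does not state the formulas, so this sketch assumes the commonly used definitions: IDF as log(N / df), and relevance frequency as log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term. The function names and the counts are hypothetical. TF-ERF and ETF-RF are not shown because the abstract does not define them.

```python
import math


def idf(n_docs, df):
    """Unsupervised global weight: log of total documents over documents containing the term."""
    return math.log(n_docs / df)


def rf(pos_with_term, neg_with_term):
    """Supervised global weight (relevance frequency), assumed form.

    log2(2 + a / max(1, c)), where a = positive-category documents containing
    the term and c = negative-category documents containing it.
    """
    return math.log2(2 + pos_with_term / max(1, neg_with_term))


def term_weight(tf, global_part):
    """Combine the local part (TF) with a global part (IDF or RF) by multiplication."""
    return tf * global_part


# Hypothetical counts for one term, invented for illustration.
n_docs = 1000   # documents in the collection
tf = 3          # occurrences of the term in the current document
a, c = 90, 10   # positive / negative documents containing the term

tf_idf = term_weight(tf, idf(n_docs, a + c))
tf_rf = term_weight(tf, rf(a, c))
```

Both global parts pass through a logarithm, which compresses their range relative to the raw TF. That changing ratio between local and global weight is what the abstract sets out to examine.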


2021 ◽  
Vol 1 (1) ◽  
pp. 1-12
Author(s):  
Aytuğ Onan ◽  

With the advancement of information and communication technology, social networking and microblogging sites have become a vital source of information. Individuals can express their opinions, grievances, feelings, and attitudes about a variety of topics, including current events and products. Sentiment analysis is a significant area of research in natural language processing because it aims to identify the orientation of the sentiment contained in source materials. Twitter is one of the most popular microblogging sites on the internet, with millions of users publishing over one hundred million text messages (referred to as tweets) daily. Choosing an appropriate term representation scheme for short text messages is critical. Term weighting schemes are key representation schemes for text documents in the vector space model. In this paper, we present a comprehensive analysis of Turkish sentiment analysis using nine supervised and unsupervised term weighting schemes. The predictive efficiency of the term weighting schemes is investigated using four supervised learning algorithms (Naive Bayes, support vector machines, the k-nearest neighbor algorithm, and logistic regression) and three ensemble learning methods (AdaBoost, Bagging, and Random Subspace). The empirical evidence suggests that supervised term weighting models can outperform unsupervised term weighting models.

