A New Feature Selection Scheme for Emotion Recognition from Text

2020 ◽  
Vol 10 (15) ◽  
pp. 5351
Author(s):  
Zafer Erenel ◽  
Oluwatayomi Rereloluwa Adegboye ◽  
Huseyin Kusetogullari

This paper presents a new scheme for term selection in the field of emotion recognition from text. The proposed framework is based on utilizing moderately frequent terms during term selection. More specifically, all terms are evaluated by considering their relevance scores, based on the idea that moderately frequent terms may carry valuable information for discrimination as well. The proposed feature selection scheme performs better than the conventional filter-based feature selection measures, Chi-Square and Gini-Text, in numerous cases. The bag-of-words approach is used to construct the vectors for document representation, where each selected term is assigned a weight of 1 if it occurs in the document and 0 otherwise. The proposed scheme includes terms that are not selected by Chi-Square and Gini-Text. Experiments conducted on a benchmark dataset show that moderately frequent terms boost the representation power of the term subsets, as noticeable improvements in accuracy are observed.
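The binary bag-of-words weighting and the Chi-Square baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' proposed scheme: the function names and the toy corpus are hypothetical, and the relevance scoring for moderately frequent terms is not reproduced because the abstract does not define it.

```python
def binary_bow(doc_tokens, vocabulary):
    """Return a 0/1 vector: 1 if the selected term occurs in the document."""
    present = set(doc_tokens)
    return [1 if term in present else 0 for term in vocabulary]


def chi_square(docs, labels, term, target):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1          # term present, target class
        elif has_term:
            b += 1          # term present, other class
        elif in_class:
            c += 1          # term absent, target class
        else:
            d += 1          # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical toy corpus (not from the paper).
docs = [["happy", "great", "day"], ["sad", "bad", "day"],
        ["happy", "joy"], ["sad", "tears"]]
labels = ["joy", "sadness", "joy", "sadness"]

print(chi_square(docs, labels, "happy", "joy"))  # 4.0: perfectly discriminative
print(chi_square(docs, labels, "day", "joy"))    # 0.0: independent of the class
print(binary_bow(["happy", "happy", "day"], ["day", "happy", "sad"]))  # [1, 1, 0]
```

Repeated occurrences of "happy" still produce a weight of 1, which is the difference between this binary weighting and a term-frequency weighting.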

Author(s):  
FENGXI SONG ◽  
DAVID ZHANG ◽  
YONG XU ◽  
JIZHONG WANG

Feature selection has been extensively applied in statistical pattern recognition as a mechanism for cleaning up the set of features that are used to represent data and as a way of improving the performance of classifiers. Four schemes commonly used for feature selection are Exponential Searches, Stochastic Searches, Sequential Searches, and Best Individual Features. The most popular scheme used in text categorization is Best Individual Features, as the extremely high dimensionality of text feature spaces renders the other three feature selection schemes time prohibitive. This paper proposes five new metrics for selecting Best Individual Features for use in text categorization. Their effectiveness has been empirically tested on two well-known data collections, Reuters-21578 and 20 Newsgroups. Experimental results show that the performance of two of the five new metrics, Bayesian Rule and F-one Value, is not significantly below that of a good traditional text categorization selection metric, Document Frequency. The performance of another two of these five new metrics, Low Loss Dimensionality Reduction and Relative Frequency Difference, is equal to or better than that of conventional good feature selection metrics such as Mutual Information and Chi-square Statistic.
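The Best Individual Features scheme scores every feature independently and keeps the top k, which is why it scales to text. A minimal sketch using Document Frequency as the metric is given below; the function names and the toy documents are hypothetical, and the five new metrics proposed in the paper are not reproduced because the abstract does not define them.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def best_individual_features(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy documents (not from Reuters-21578 or 20 Newsgroups).
docs = [["stock", "market", "rises"], ["market", "falls"],
        ["team", "wins", "match"], ["market", "team"]]

scores = document_frequency(docs)
print(best_individual_features(scores, 2))  # ['market', 'team']
```

Any other per-feature metric can be swapped in by passing a different `scores` mapping, since the ranking step never looks at feature combinations.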


Author(s):  
Esraa H. Abd Al-Ameer, Ahmed H. Aliwy

Document classification is one of the most important fields in natural language processing and text mining, and many algorithms can be used for this task. This paper focuses on improving text classification through feature selection, that is, keeping a subset of the original features without affecting accuracy. A new feature selection method is suggested, which can be seen as a general formulation and mathematical model of Recursive Feature Elimination (RFE). The method was compared with two other well-known feature selection methods: Chi-square and threshold. The results showed that the new method is comparable with the other methods. The best results were 83% when 60% of the features were used, 82% when 40% were used, and 82% when 20% were used. The tests were done with the Naïve Bayes (NB) and decision tree (DT) classification algorithms on the well-known English dataset “20 newsgroups text”, which consists of approximately 18846 files. The results showed that the suggested feature selection method is comparable with standard methods such as Chi-square.


2020 ◽  
Vol 9 (4) ◽  
pp. 1569-1577
Author(s):  
Zuherman Rustam ◽  
Sri Hartini

Feature selection is an essential issue in machine learning. It discards the unnecessary or redundant features in the dataset. This paper introduces a new feature selection method based on kernel functions, evaluated on 16 real-world datasets from the UCI data repository, with k-means clustering utilized as the classifier using the radial basis function (RBF) and polynomial kernel functions. After sorting the features using the new feature selection method, 75 percent of them were examined and evaluated using 10-fold cross-validation, and the accuracy, F1-Score, and running time were compared. From the experiments, it was concluded that the performance of the new feature selection method based on the RBF kernel function varied according to the value of the kernel parameter, in contrast to the polynomial kernel function. Moreover, the new feature selection method based on RBF has a faster running time than the one based on the polynomial kernel function. In addition, the proposed method has higher accuracy and F1-Score, by up to 40 percent on several datasets, than commonly used feature selection techniques such as Fisher score, Chi-Square test, and Laplacian score. Therefore, this method can be considered for feature selection.
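For reference, the two kernel functions named in this abstract can be sketched as below. The parametrization is an assumption: the abstract does not state whether the RBF kernel is written with `gamma` or with a width `sigma`, nor which degree and offset the polynomial kernel uses, so `gamma`, `degree`, and `coef0` are illustrative names and defaults only.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma is the kernel parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


x, y = [1.0, 2.0], [3.0, 4.0]
print(polynomial_kernel(x, y))  # (3 + 8 + 1) ** 2 = 144.0
for gamma in (0.01, 0.1, 1.0):
    # The same pair of points looks similar or dissimilar depending on gamma.
    print(gamma, rbf_kernel(x, y, gamma=gamma))
```

The loop shows how strongly the RBF similarity of a fixed pair of points depends on the kernel parameter.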


2018 ◽  
Vol 7 (1) ◽  
pp. 57-72
Author(s):  
H.P. Vinutha ◽  
Poornima Basavaraju

Network security is becoming a more challenging task day by day. Intrusion detection systems (IDSs) are one of the methods used to monitor network activities. Data mining algorithms play a major role in the field of IDS. The NSL-KDD'99 dataset is used to study network traffic patterns, which helps to identify possible attacks taking place on the network. The dataset contains 41 attributes and one class attribute categorized as normal, DoS, Probe, R2L and U2R. The proposed methodology aims to reduce the false positive rate and improve the detection rate by reducing the dimensionality of the dataset, since using all 41 attributes for detection is not good practice. Four feature selection methods, Chi-Square, SU, Gain Ratio and Information Gain, are used to evaluate the attributes, and unimportant features are removed to reduce the dimension of the data. Ensemble classification techniques, namely Boosting, Bagging, Stacking and Voting, are used to observe the detection rate separately with three base algorithms: Decision stump, J48 and Random forest.
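Two of the attribute evaluators named in this abstract, Information Gain and Gain Ratio, can be sketched with their textbook definitions for categorical attributes. This is an illustration only: the function names and the four-record toy sample are hypothetical, and the paper's actual tooling and preprocessing are not described in the abstract.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(attribute, labels):
    """Reduction in class entropy after splitting on the attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(attribute, labels):
    """Information gain normalized by the entropy of the attribute itself."""
    split_info = entropy(attribute)
    return 0.0 if split_info == 0 else information_gain(attribute, labels) / split_info


# Hypothetical toy records (not taken from NSL-KDD).
labels = ["normal", "normal", "DoS", "DoS"]
protocol = ["tcp", "tcp", "udp", "udp"]   # separates the classes perfectly
flag = ["a", "b", "a", "b"]               # carries no class information

print(information_gain(protocol, labels), gain_ratio(protocol, labels))
print(information_gain(flag, labels), gain_ratio(flag, labels))
```

Attributes whose score falls near zero, like `flag` here, are the ones a filter of this kind would remove.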


2021 ◽  
Vol 25 (1) ◽  
pp. 21-34
Author(s):  
Rafael B. Pereira ◽  
Alexandre Plastino ◽  
Bianca Zadrozny ◽  
Luiz H.C. Merschmann

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific for the multi-label context. Experimental results show that the proposed technique is competitive when compared to multi-label feature selection techniques currently used in the literature, and is clearly more scalable, in a scenario where there is an increasing amount of data.


Author(s):  
Mircea Zloteanu ◽  
Eva G. Krumhuber ◽  
Daniel C. Richardson

People are accurate at classifying emotions from facial expressions but much poorer at determining if such expressions are spontaneously felt or deliberately posed. We explored if the method used by senders to produce an expression influences the decoder’s ability to discriminate authenticity, drawing inspiration from two well-known acting techniques: the Stanislavski (internal) and Mimic method (external). We compared spontaneous surprise expressions in response to a jack-in-the-box (genuine condition), to posed displays of senders who either focused on their past affective state (internal condition) or the outward expression (external condition). Although decoders performed better than chance at discriminating the authenticity of all expressions, their accuracy was lower in classifying external surprise compared to internal surprise. Decoders also found it harder to discriminate external surprise from spontaneous surprise and were less confident in their decisions, perceiving these to be similarly intense but less genuine-looking. The findings suggest that senders are capable of voluntarily producing genuine-looking expressions of emotions with minimal effort, especially by mimicking a genuine expression. Implications for research on emotion recognition are discussed.


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3627
Author(s):  
Bo Jin ◽  
Chunling Fu ◽  
Yong Jin ◽  
Wei Yang ◽  
Shengbin Li ◽  
...  

Identifying the key genes related to tumors from gene expression data with a large number of features is important for the accurate classification of tumors and to make special treatment decisions. In recent years, unsupervised feature selection algorithms have attracted considerable attention in the field of gene selection as they can find the most discriminating subsets of genes, namely the potential information in biological data. Recent research also shows that maintaining the important structure of data is necessary for gene selection. However, most current feature selection methods merely capture the local structure of the original data while ignoring the importance of the global structure of the original data. We believe that the global structure and local structure of the original data are equally important, and so the selected genes should maintain the essential structure of the original data as far as possible. In this paper, we propose a new, adaptive, unsupervised feature selection scheme which not only reconstructs high-dimensional data into a low-dimensional space with the constraint of feature distance invariance but also employs ℓ2,1-norm to enable a matrix with the ability to perform gene selection embedding into the local manifold structure-learning framework. Moreover, an effective algorithm is developed to solve the optimization problem based on the proposed scheme. Comparative experiments with some classical schemes on real tumor datasets demonstrate the effectiveness of the proposed method.
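The ℓ2,1-norm mentioned in this abstract is the sum of the Euclidean norms of a matrix's rows, so penalizing it pushes entire rows toward zero. The sketch below computes it and ranks features by row norm. It assumes that rows of the matrix index features (genes), which the abstract does not state, and the function names and the matrix `W` are hypothetical; the paper's optimization algorithm is not reproduced.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per feature, by assumption)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows."""
    return sum(row_norms(W))


def rank_features(W):
    """Feature indices ordered from largest to smallest row norm."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: -norms[i])


# Hypothetical 3-feature, 2-dimensional matrix.
W = [[3.0, 4.0],
     [0.0, 0.0],   # an all-zero row: this feature would not be selected
     [6.0, 8.0]]

print(l21_norm(W))       # 5 + 0 + 10 = 15.0
print(rank_features(W))  # [2, 0, 1]
```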

