Deep Recurrent Network Based Feature Selection using Single Matrix Normalization and Eigen Vectors for Analyzing Sentiments

Sentiment analysis plays a major role in e-commerce and social media. With the rapid growth of social media, a huge number of people post reviews on the Internet and through other channels, and analyzing this data is challenging. This paper proposes a new normalization-based feature selection method; the goal is to select the relevant features, classify the data, and measure the accuracy. Stability of the data is considered the most important challenge in analyzing sentiments. In this work, investigating the sentiments and selecting the relevant features from the data set play a major role. The aim is to apply vector-based feature selection and evaluate the classification performance using recurrent networks. Because text mining depends on feature retrieval methods to improve accuracy, a single matrix normalization method is proposed to reduce the dimensionality. The proposed method performs data preprocessing, sentiment classification, and feature reduction to improve accuracy. It achieves better accuracy than the N-gram feature selection method, and the experimental results show that it outperforms other traditional feature selection approaches and can decrease the implementation time.
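
The abstract does not define the single matrix normalization or how the eigenvectors are used. As a rough illustration only, the sketch below substitutes a standard PCA-style reduction (an assumption, not the authors' method): each feature column is z-score normalized, and the data are projected onto the top-k eigenvectors of the feature covariance matrix. The function name `normalize_and_project` is hypothetical.

```python
import numpy as np


def normalize_and_project(X, k):
    """Hypothetical sketch: z-score each column, then project onto the
    top-k eigenvectors of the feature covariance matrix (PCA-style)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = (X - mean) / std
    # Covariance between features; atleast_2d handles the single-feature case
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    # eigh is meant for symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return Z @ eigvecs[:, top]
```

For a document-term matrix with one row per review, the returned k columns could feed a recurrent or any other classifier; note that the d x d covariance becomes expensive for very large vocabularies.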

2021 ◽  
pp. 1063293X2110160
Author(s):  
Dinesh Morkonda Gunasekaran ◽  
Prabha Dhandayudam

Breast cancer is now commonly diagnosed in women. Feature selection is an important step when constructing a classification framework. We propose a multi-filter union (MFU) feature selection method for breast cancer data sets, in which a union model based on the random forest algorithm and the logistic regression (LG) algorithm selects the important features in the data set. The performance of the data analysis is evaluated using the optimal feature subset selected from the data set. Experiments are performed first on the Wisconsin Diagnostic Breast Cancer data set and then on a real data set from a women's health care center. The results show that the proposed approach achieves high performance and efficiency compared with existing feature selection algorithms.
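
The union step can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that the random forest contributes per-feature importance scores, that logistic regression contributes coefficients ranked by absolute value, and that each filter keeps its top k features. The function name and the toy scores are hypothetical.

```python
from typing import Dict, List


def union_top_k(rf_importance: Dict[str, float], lr_coef: Dict[str, float], k: int) -> List[str]:
    """Hypothetical sketch: union of the top-k features from two filters.

    rf_importance: feature -> random forest importance (higher = more important)
    lr_coef: feature -> logistic regression coefficient (ranked by |coef|)
    """
    top_rf = sorted(rf_importance, key=rf_importance.get, reverse=True)[:k]
    top_lr = sorted(lr_coef, key=lambda f: abs(lr_coef[f]), reverse=True)[:k]
    # A feature survives if either filter ranks it among its top k
    return sorted(set(top_rf) | set(top_lr))


# Toy scores, made up for illustration only
rf = {"f1": 0.30, "f2": 0.05, "f3": 0.25, "f4": 0.20, "f5": 0.02}
lr = {"f1": 0.1, "f2": -1.2, "f3": 0.3, "f4": 0.05, "f5": 0.9}
print(union_top_k(rf, lr, k=2))  # ['f1', 'f2', 'f3', 'f5']
```

A union is deliberately permissive: it keeps features that either model finds useful, at the cost of a larger subset than an intersection would give.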


2021 ◽  
Author(s):  
Marta Ferreira ◽  
Pierre Lovinfosse ◽  
Johanne Hermesse ◽  
Marjolein Decuypere ◽  
Caroline Rousseau ◽  
...  

Abstract. Background: Feature reproducibility and model generalizability are currently among the most important limitations when integrating radiomics into the clinic. Radiomic features are sensitive to imaging acquisition protocols, reconstruction algorithms and parameters, as well as to the different steps of the usual radiomics workflow. We propose a framework for comparing the reproducibility of different pre-processing steps in PET/CT radiomic analysis for the prediction of disease-free survival (DFS) across multiple scanners/centers. Results: We evaluated and compared the prediction performance of several models that differ in (i) the type of intensity discretization, (ii) the feature selection method, and (iii) the feature type, i.e., original or tumour-to-liver ratio radiomic features (OR or TLR). We trained our models using data from one scanner/center and tested them on two external scanners/centers. Our results show low reproducibility of predictions across scanners and discretization methods. Despite this, TLR-based models were generally more robust than OR-based models. Maximum relevance minimum redundancy (MRMR) forward feature selection with Pearson correlation was the feature selection method with the best mean area under the precision-recall curve for two of the four classifiers, when applied to the combination of features from all discretization bin numbers (D_All_FBN) with TLR features. Conclusion: We evaluated and compared the prediction performance of several models on a data set of 158 patients with locally advanced cervical cancer (LACC) from three distinct scanners. In our cohort of LACC patients, pre-processing of radiomic features in [18F]FDG PET affects DFS prediction performance across scanners, and combining the D_All_FBN TLR approach with the MRMR forward Pearson feature selection method might help increase the robustness of radiomic studies.
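
MRMR forward selection with Pearson correlation can be sketched as a greedy loop: start with the feature most correlated with the target, then repeatedly add the feature whose relevance minus its mean redundancy with the already-selected features is largest. This sketch uses the difference form of the criterion and absolute Pearson correlation for both terms; whether the cited study used exactly this variant is an assumption, and the function name is hypothetical.

```python
import numpy as np


def mrmr_forward_pearson(X, y, k):
    """Hypothetical sketch of greedy MRMR (difference criterion) using
    absolute Pearson correlation for relevance and redundancy.
    Constant columns would produce NaN correlations and should be removed first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Relevance: |r| between each feature and the target
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    # Redundancy: |r| between every pair of features
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean redundancy with features already chosen
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

With a binary outcome such as a DFS event indicator, the Pearson relevance term reduces to the point-biserial correlation.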


2020 ◽  
Author(s):  
Esra Sarac Essiz ◽  
Murat Oturakci

Abstract. Artificial bee colony (ABC) is a nature-inspired optimization algorithm based on the search behaviour of honey bees. The main aim of this study is to examine the effect of an ABC-based feature selection algorithm on classification performance for cyberbullying detection; cyberbullying has become a significant worldwide social issue in recent years. To this end, the classification performance of the proposed ABC-based feature selection method is compared with three traditional methods: information gain, ReliefF, and chi-square. Experimental results show that the ABC-based feature selection method outperforms the three traditional methods for cyberbullying detection: the macro-averaged F-measure on the data set increases from 0.659 to 0.8 with the proposed method.
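
The macro-averaged F-measure reported above is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of its size. A minimal sketch (the function name is hypothetical):

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f_scores.append(f)
    return sum(f_scores) / len(f_scores)
```

For example, `macro_f1([1, 1, 0, 0], [1, 0, 0, 0])` averages F1 = 0.8 for class 0 and F1 = 2/3 for class 1, giving 11/15 (about 0.733).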


2015 ◽  
Vol 77 (7) ◽  
Author(s):  
Syamimi Mardiah Shaharum ◽  
Kenneth Sundaraj ◽  
Khaled Helmy

In this work, we show that the classification performance on high-dimensional feature data can be improved by applying a feature selection method. One-way ANOVA was used for feature selection, and an artificial neural network (ANN) was used to evaluate the performance of the feature selection method. The ANN achieved better classification accuracy with the selected features than with features that did not undergo feature selection (93.33% versus 80.00%). It can therefore be concluded that feature selection is a crucial step for achieving a good performance rate.
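
One-way ANOVA can rank features by their F-statistic, the ratio of between-class to within-class variance of a feature; features with a larger F separate the classes better. The sketch below computes F for each feature column and keeps the top m. The function names and the choice of simply keeping the top m (rather than applying a p-value threshold) are assumptions, since the abstract does not say how features were retained.

```python
from typing import Hashable, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """One-way ANOVA F-statistic of a single feature across class labels
    (requires at least two classes)."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    if ss_within == 0:
        # Perfectly separated groups give infinite F; a constant feature gives 0
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_m_features(columns: List[Sequence[float]], labels: Sequence[Hashable], m: int) -> List[int]:
    """Hypothetical helper: indices of the m columns with the largest F."""
    scores = [anova_f(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:m]
```

The selected column indices would then define the reduced input passed to the ANN.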


Author(s):  
Esraa H. Abd Al-Ameer, Ahmed H. Aliwy

Document classification is one of the most important fields of natural language processing and text mining, and many algorithms can be used for this task. This paper focuses on improving text classification through feature selection, that is, determining a subset of the original features without affecting the accuracy. We suggest a new feature selection method that can serve as a general formulation and mathematical model of recursive feature elimination (RFE). The method was compared with two other well-known feature selection methods, chi-square and threshold. The tests used the naïve Bayes (NB) and decision tree (DT) classification algorithms on the well-known English “20 Newsgroups” text data set, which consists of approximately 18,846 files. The results show that the new method is comparable with the other methods, including the standard chi-square method: the best results were 83% when 60% of the features were used, 82% with 40% of the features, and 82% with 20% of the features.
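
The abstract does not give the new formulation, but the standard RFE loop it generalizes can be sketched as follows: fit a model on the remaining features, drop the features with the smallest absolute weights, and repeat until the desired number remain. Here `fit_weights` is a hypothetical callback standing in for training a classifier that exposes per-feature weights.

```python
from typing import Callable, List, Sequence


def recursive_feature_elimination(
    n_features: int,
    fit_weights: Callable[[List[int]], Sequence[float]],
    n_keep: int,
    step: int = 1,
) -> List[int]:
    """Standard RFE loop (sketch). fit_weights(remaining) must return one
    weight per feature in `remaining`, in the same order."""
    remaining = list(range(n_features))
    while len(remaining) > n_keep:
        weights = fit_weights(remaining)  # retrain on the surviving features
        n_drop = min(step, len(remaining) - n_keep)
        # Positions ordered by |weight|, smallest (least useful) first
        order = sorted(range(len(remaining)), key=lambda i: abs(weights[i]))
        drop = set(order[:n_drop])
        remaining = [f for i, f in enumerate(remaining) if i not in drop]
    return remaining
```

In practice `fit_weights` would retrain a linear model on the columns listed in `remaining`; a larger `step` trades ranking precision for fewer retraining rounds.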


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 1995
Author(s):  
Chunlei Shi ◽  
Jiacai Zhang ◽  
Xia Wu

Autism spectrum disorder (ASD) is a neurodevelopmental disorder originating in infancy and childhood that may cause language barriers and social difficulties. However, current machine learning methods for diagnosing ASD still face many challenges in determining the location of biomarkers. Here, we propose a novel feature selection method based on the minimum spanning tree (MST) to seek neuromarkers for ASD. First, we constructed an undirected graph whose nodes are the candidate features and introduced an edge-weight calculation method that considers both feature redundancy and discriminant ability. Second, we used Prim's algorithm to construct the MST from the initial graph. Third, for each node in the MST, we summed the weights of the edges connecting it to other nodes and sorted the nodes by this sum. The N features corresponding to the nodes with the N smallest sums were then selected as classification features. Finally, a support vector machine (SVM) was used to evaluate the discriminant performance of the feature selection method. Comparative experimental results show that the proposed method improves ASD classification performance, with accuracy, sensitivity, and specificity of 86.7%, 87.5%, and 85.7%, respectively.
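
The graph steps can be sketched with a dense-graph version of Prim's algorithm. The abstract does not give the weight formula that combines redundancy and discriminant ability, so the sketch takes a precomputed symmetric weight matrix as input (an assumption), interprets each node's score as the sum of the weights of its incident MST edges, and returns the N nodes with the smallest sums. The SVM evaluation step is not shown, and the function name is hypothetical.

```python
from typing import List, Sequence


def mst_feature_selection(weights: Sequence[Sequence[float]], n_select: int) -> List[int]:
    """Hypothetical sketch: Prim's MST on a complete feature graph, then pick
    the n_select nodes whose incident MST edge weights sum to the least."""
    n = len(weights)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edge_sum = [0.0] * n  # per-node sum of incident MST edge weights
    for _ in range(n):
        # O(n^2) Prim: take the cheapest node not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            w = weights[u][parent[u]]
            edge_sum[u] += w
            edge_sum[parent[u]] += w
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    ranked = sorted(range(n), key=lambda i: edge_sum[i])
    return ranked[:n_select]
```

The O(n^2) form of Prim's algorithm suits this setting because the candidate-feature graph is complete, so a heap-based version offers no advantage.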


2014 ◽  
Vol 631-632 ◽  
pp. 1219-1223
Author(s):  
Jia Hao Chen ◽  
Jian Hua Wu

With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, manual processing cannot cope with the large number of subjective messages. As a new kind of social media service, microblogging has been widely adopted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods on sentiment analysis of Chinese microblogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM performs close to naïve Bayes and that both outperform logistic regression in most cases.


2021 ◽  
Author(s):  
Kun Yu ◽  
Weidong Xie ◽  
Linjie Wang ◽  
Wei Li

Abstract. Background: Finding significant genes or proteins in gene chip data for disease diagnosis and drug development is an important task, and the main challenge comes from the curse of dimensionality. Using machine learning methods to find important features in the data and build an accurate classification model is therefore of great significance. Results: The proposed method proved superior to published advanced hybrid and traditional feature selection methods on different public microarray data sets. In addition, results on a cleft lip and palate data set with known biomarkers, provided by the cooperating hospital, show that compared with other methods, our method preferentially selects these biomarkers. Method: This paper proposes ILRC, a feature selection algorithm based on clustering and improved L1 regularization. The features are first clustered, and the redundant features in each sub-cluster are deleted. All remaining features are then iteratively evaluated using ILR, and the final result is output after reordering by cumulative weight. Conclusion: The proposed method can effectively remove redundant features. Its output has high stability and classification accuracy, and it can select potential biomarkers.
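
The first ILRC stage, clustering features and removing redundant ones within each cluster, could look roughly like the sketch below. The abstract does not name the clustering algorithm or the redundancy criterion, so this substitutes a greedy clustering on absolute Pearson correlation with a caller-supplied relevance score, keeping the most relevant feature of each cluster. The ILR evaluation stage is not shown, and the function name and threshold are hypothetical.

```python
import numpy as np


def remove_redundant_by_clustering(X, relevance, threshold=0.9):
    """Hypothetical sketch: greedily group features whose |Pearson r| with a
    cluster seed is >= threshold, keeping only the seed (most relevant) of each."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Visit features from most to least relevant, so each seed is its cluster's best
    order = [int(i) for i in np.argsort(np.asarray(relevance))[::-1]]
    assigned = set()
    kept = []
    for j in order:
        if j in assigned:
            continue
        # j seeds a new cluster; absorb unassigned features highly correlated with it
        members = [m for m in order if m not in assigned and corr[j, m] >= threshold]
        assigned.update(members)
        kept.append(j)
    return sorted(kept)
```

The surviving columns would then go to the iterative L1-based evaluation described in the abstract.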


2020 ◽  
Vol 10 (2) ◽  
pp. 588
Author(s):  
Sang Hoon Lee ◽  
Kwang-Yul Kim ◽  
Yoan Shin

Recently, automatic modulation classification (AMC) schemes have been considered in order to satisfy the requirements of commercial and military communication systems. As a result, various artificial intelligence algorithms, such as deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), have been studied to improve AMC performance. However, since the AMC process must operate in real time, its computational complexity must be kept sufficiently low. Furthermore, little research has considered the complexity of the AMC process using data-mining methods. In this paper, we propose a correlation coefficient-based effective feature selection method that maintains classification performance while reducing the computational complexity of the AMC process. The proposed method calculates the correlation coefficients of second-, fourth-, and sixth-order cumulants with a proposed formula and selects effective features according to the calculated values. A deep learning-based AMC method is used to measure and compare the classification performance. Simulation results indicate that the AMC performance of the proposed method is superior to that of conventional methods even though it uses fewer features.
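
The cumulant features mentioned above can be computed from complex baseband samples using the standard mixed moments M_pq = E[x^(p-q) (x*)^q]. The sketch below covers only the second- and fourth-order cumulants; the sixth-order terms and the paper's correlation coefficient formula are not given in the abstract and are omitted. The function names are hypothetical, and the signal is assumed to be zero-mean after centering.

```python
import numpy as np


def moment(x, p, q):
    """Mixed moment M_pq = E[x^(p-q) * conj(x)^q]."""
    return np.mean(x ** (p - q) * np.conj(x) ** q)


def cumulant_features(x):
    """Second- and fourth-order cumulants of a zero-mean complex signal (sketch)."""
    x = np.asarray(x, dtype=complex)
    x = x - x.mean()  # the formulas below assume zero mean
    m20, m21 = moment(x, 2, 0), moment(x, 2, 1)
    m40, m41, m42 = moment(x, 4, 0), moment(x, 4, 1), moment(x, 4, 2)
    return {
        "C20": m20,
        "C21": m21,
        "C40": m40 - 3 * m20 ** 2,
        "C41": m41 - 3 * m20 * m21,
        "C42": m42 - abs(m20) ** 2 - 2 * m21 ** 2,
    }
```

For ideal unit-power constellations this gives the familiar reference values, for example C42 = -2 for BPSK and C42 = -1 for QPSK, which is why such cumulants are useful as modulation features.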

