Ensemble incremental deep multiple layer perceptron model – sentiment analysis application

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Renuka Devi D. ◽  
Sasikala S.

Purpose The purpose of this paper is to improve the classification accuracy of streaming big data sets while reducing processing time. This kind of social analytics would contribute to society by supporting timely, inferred decisions. The work targets the streaming nature of Twitter data sets. Design/methodology/approach Analysing ever-growing Twitter data with conventional methods is demanding, so MapReduce (MR) is used to speed up the analytics. The online feature selection (OFS) accelerated bat algorithm (ABA) and the ensemble incremental deep multiple layer perceptron (EIDMLP) classifier are proposed for feature selection and classification, respectively. Three Twitter data sets from different categories (product, service and emotions) are investigated. Findings The proposed model is compared with the particle swarm optimization (PSO), accelerated particle swarm optimization (APSO) and accelerated simulated annealing and mutation operator (ASAMO) feature selection algorithms, and with classifiers such as Naïve Bayes (NB), support vector machine (SVM), Hoeffding tree (HT) and fuzzy minimal consistent class subset coverage with the k-nearest neighbour (FMCCSC-KNN). It achieves accuracies of 99%, 99.48% and 98.9% on the three data sets, with processing times of 0.0034, 0.0024 and 0.0053 seconds, respectively. Originality/value A novel framework is proposed for feature selection and classification. The work is compared with the authors’ previously developed classifiers and with other state-of-the-art feature selection and classification algorithms.

2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose Owing to the huge volume of documents available on the internet, text classification has become a necessary task for handling these documents. To achieve optimal text classification results, feature selection, an important stage, is used to reduce the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify documents stored on a personal computer based on their content. Design/methodology/approach This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm (ABCFS) is evaluated on real and benchmark data sets and compared against existing feature selection approaches such as information gain and the χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper. Findings The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents stored on a personal computer, and the benchmark data set was collected from the Reuters and 20 Newsgroups corpora. The results show that the proposed feature selection algorithm enhances text document classification accuracy. Originality/value This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.
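As an illustration of the χ2 baseline named in the abstract above, the following is a minimal sketch of the standard chi-square score of a term for a class, computed from a 2x2 document-count table. The function names, the example terms and all counts are hypothetical; this is not the ABCFS algorithm and not the authors' code.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A term present in every document (or in none) carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(tables):
    """Sort terms from most to least class-dependent by chi-square score."""
    return sorted(tables, key=lambda term: chi_square(*tables[term]), reverse=True)


# Hypothetical counts, invented for illustration only.
example_tables = {
    "kernel": (40, 5, 10, 45),
    "the": (50, 50, 0, 0),
}
ranking = rank_terms(example_tables)
```

A filter of this kind scores each term independently, which is what makes it a cheap baseline for a search-based selector to be compared against.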


2013 ◽  
Vol 23 (06) ◽  
pp. 1350026 ◽  
Author(s):  
WEI-YEN HSU

In this study, we propose a recognition system for single-trial analysis of motor imagery (MI) electroencephalogram (EEG) data. Applying event-related brain potential (ERP) data acquired from the sensorimotor cortices, the system chiefly consists of automatic artifact elimination, feature extraction, feature selection and classification. In addition to the use of independent component analysis, a similarity measure is proposed to further remove the electrooculographic (EOG) artifacts automatically. Several potential features, such as wavelet-fractal features, are then extracted for subsequent classification. Next, quantum-behaved particle swarm optimization (QPSO) is used to select features from the feature combination. Finally, the selected sub-features are classified by a support vector machine (SVM). Compared with processing without artifact elimination, feature selection using a genetic algorithm (GA) and classification with Fisher's linear discriminant (FLD), on MI data from two data sets for eight subjects, the results indicate that the proposed method is promising in brain–computer interface (BCI) applications.


2012 ◽  
Vol 546-547 ◽  
pp. 1538-1543 ◽  
Author(s):  
Chao Chen ◽  
Hao Dong Zhu

Feature selection algorithms are needed to increase operating speed, reduce memory use and filter out irrelevant or weakly relevant features. However, most existing feature selection methods are serial and too slow to be applied to massive text data sets, so improving the efficiency of feature selection through parallelism has become an active research topic. This paper presents a feature selection method based on Parallel Binary Immune Quantum-Behaved Particle Swarm Optimization (PBIQPSO). The method uses Binary Immune Quantum-Behaved Particle Swarm Optimization to select the feature subset and takes advantage of multiple computing nodes to improve time efficiency, so it can quickly acquire more representative feature subsets. Experimental results show that the method is effective.


Author(s):  
Midde Venkateswarlu Naik ◽  
D. Vasumathi ◽  
A.P. Siva Kumar

Aims: The proposed research work is an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role for the current generation in extracting valid decision points about any aspect, such as movie ratings, education institute ratings or political ratings. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification through a Support Vector Machine (SVM). The performance of the approach is evaluated with statistical measures such as precision, recall, sensitivity and specificity, and is compared with existing approaches. Earlier authors have achieved sentiment classifier accuracy of up to 94% on English text. In the proposed scheme, the sentiment classifier reaches an average accuracy of 99% on different datasets by tuning SVM parameters, such as the constant C value and the kernel gamma value, in association with the PSO optimization technique. The proposed method utilizes three publicly available datasets: airline sentiment data, weather data and global warming data. The experimental results are obtained by training and testing with 10-fold cross-validation (FCV), and a confusion matrix is used to assess sentiment classifier accuracy. Background: Sentiment Analysis (SA), or opinion mining, has attracted scientific interest as a research domain. A key area is sentiment classification of semi-structured or unstructured data in different languages, which has become a major research topic. User-Generated Content (UGC) from different sources has increased significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns and trends, or for extracting sentiment about a specific entity. SA is a computational analysis to determine the actual opinion about an entity as expressed in text. SA is also described as the computation of emotional polarity expressed over social media as natural text in various languages. Usually, a high-performing automatic sentiment classifier model depends on feature selection and classification algorithms. Methods: The proposed work uses a support vector machine for classification and particle swarm optimization for feature selection. In this methodology, various parameter combinations are tuned, with and without a kernel, to obtain the desired sentiment classification results on three datasets (airline, global warming and weather sentiment) that are freely hosted for research purposes. Results: The proposed method outperforms other machine learning techniques, classifying sentiment on the different datasets with an average accuracy of 99.2%. The high accuracy attained in classifying the sentiment or opinion of review text indicates greater effectiveness than existing sentiment classifiers. Conclusion: Sentiment classifier accuracy has been increased with the help of a kernel-based Support Vector Machine (SVM) with parameter optimization. The optimal feature selection for classifying sentiment or opinion in review documents is determined with a particle swarm optimization approach.
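To make the PSO-based feature selection step in the abstract above more concrete, here is a minimal sketch of a generic binary PSO over feature masks, using the common sigmoid transfer rule. It is not the authors' implementation: the parameter values, the function names and especially `toy_fitness` are assumptions made for illustration. In a setting like the one described, the fitness would instead be something like the cross-validated accuracy of an SVM trained on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iters=30, seed=0,
               w=0.7, c1=1.5, c2=1.5):
    """Search for a 0/1 feature mask that maximises `fitness` (generic binary PSO)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Velocity is pulled toward the personal and global best masks.
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                # The sigmoid turns velocity into the probability of selecting feature j.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy: features 0 and 2 are
    treated as useful, and every selected feature costs a small penalty."""
    useful = (0, 2)
    return sum(mask[j] for j in useful) - 0.1 * sum(mask)


best_mask, best_score = binary_pso(toy_fitness, n_features=6)
```

The search is stochastic, so the returned mask is the best one encountered rather than a guaranteed optimum; fixing `seed` only makes a run repeatable.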


Sensor Review ◽  
2014 ◽  
Vol 34 (3) ◽  
pp. 304-311 ◽  
Author(s):  
Pengfei Jia ◽  
Fengchun Tian ◽  
Shu Fan ◽  
Qinghua He ◽  
Jingwei Feng ◽  
...  

Purpose – The purpose of the paper is to propose a new optimization algorithm to realize a synchronous optimization of sensor array and classifier, to improve the performance of E-nose in the detection of wound infection. When an electronic nose (E-nose) is used to detect wound infection, the optimization of the sensor array and the parameter settings of the classifier have a strong impact on the classification accuracy. Design/methodology/approach – An enhanced quantum-behaved particle swarm optimization based on genetic algorithm, genetic quantum-behaved particle swarm optimization (G-QPSO), is proposed to realize a synchronous optimization of sensor array and classifier. The importance-factor (I-F) method is used to weight the sensors of the E-nose by their degree of importance in classification. Both a radial basis function network and a support vector machine are used for classification. Findings – The classification accuracy of E-nose is the highest when the weighting coefficients of the I-F method and classifier’s parameters are optimized by G-QPSO. All results make it clear that the proposed method is an ideal optimization method of E-nose in the detection of wound infection. Research limitations/implications – To make the proposed optimization method more effective, the key point of further research is to enhance the classifier of E-nose. Practical implications – In this paper, E-nose is used to distinguish the class of wound infection; meanwhile, G-QPSO is used to realize a synchronous optimization of sensor array and classifier of E-nose. These are all important for E-nose to realize its clinical application in wound monitoring. Originality/value – The innovative concept improves the performance of E-nose in wound monitoring and paves the way for the clinical detection of E-nose.


Author(s):  
Nazila Darabi ◽  
Abdalhossein Rezai ◽  
Seyedeh Shahrbanoo Falahieh Hamidpour

Breast cancer is a common cancer in women. Accurate and early detection of breast cancer can play a vital role in treatment. This paper presents and evaluates a thermogram-based Computer-Aided Detection (CAD) system for the detection of breast cancer. In this CAD system, the Random Subset Feature Selection (RSFS) algorithm and hybrids of the minimum Redundancy Maximum Relevance (mRMR) algorithm and the Genetic Algorithm (GA) with the RSFS algorithm are utilized for feature selection. In addition, the Support Vector Machine (SVM) and k-Nearest Neighbors (kNN) algorithms are utilized as classifiers. The proposed CAD system is verified using MATLAB 2017 and a dataset composed of breast images from 78 patients. The implementation results demonstrate that using the RSFS algorithm for feature selection with the kNN and SVM classifiers yields accuracies of 85.36% and 75%, and sensitivities of 94.11% and 79.31%, respectively. Using the hybrid GA and RSFS algorithm for feature selection with the kNN and SVM classifiers yields accuracies of 83.87% and 69.56%, and sensitivities of 96% and 81.81%, respectively, and using the hybrid mRMR and RSFS algorithm for feature selection with the kNN and SVM classifiers yields accuracies of 77.41% and 73.07%, and sensitivities of 98% and 72.72%, respectively.
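For readers comparing the accuracy and sensitivity figures in the abstract above, the sketch below shows the usual definitions of the two measures in terms of confusion-matrix counts. The function names and the example counts are hypothetical and are not taken from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of actual positive cases that were detected (also called recall)."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts, invented for illustration only.
tp, tn, fp, fn = 8, 7, 3, 2
example_accuracy = accuracy(tp, tn, fp, fn)
example_sensitivity = sensitivity(tp, fn)
```

Because sensitivity ignores false positives, a classifier can show a higher sensitivity than accuracy, which is the pattern reported in several of the configurations above.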


2019 ◽  
Vol 43 (1) ◽  
pp. 53-71 ◽  
Author(s):  
Ahmed Al-Rawi ◽  
Jacob Groshek ◽  
Li Zhang

Purpose The purpose of this paper is to examine one of the largest data sets on the hashtag use of #fakenews that comprises over 14m tweets sent by more than 2.4m users. Design/methodology/approach Tweets referencing the hashtag (#fakenews) were collected for a period of over one year from January 3 to May 7 of 2018. Bot detection tools were employed, and the most retweeted posts, most mentions and most hashtags as well as the top 50 most active users in terms of the frequency of their tweets were analyzed. Findings The majority of the top 50 Twitter users are more likely to be automated bots, while certain users’ posts, such as those sent by President Donald Trump, dominate the most retweeted posts, which always associate mainstream media with fake news. The most used words and hashtags show that major news organizations are frequently referenced, with a focus on CNN, which is often mentioned in negative ways. Research limitations/implications The research study is limited to the examination of Twitter data, while ethnographic methods like interviews or surveys are further needed to complement these findings. Though the data reported here do not prove direct effects, the implications of the research provide a vital framework for assessing and diagnosing the networked spammers and main actors that have been pivotal in shaping discourses around fake news on social media. These discourses, which are sometimes assisted by bots, can create a potential influence on audiences and their trust in mainstream media and understanding of what fake news is. Originality/value This paper offers results on one of the first empirical research studies on the propagation of fake news discourse on social media by shedding light on the most active Twitter users who discuss and mention the term “#fakenews” in connection to other news organizations, parties and related figures.


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255307
Author(s):  
Fujun Wang ◽  
Xing Wang

Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm, in which the position update formula has been improved in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to find an optimal feature subset. Therefore, MKMDIGWO achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets have demonstrated the higher classification accuracy of the MKMDIGWO algorithm compared with four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.
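The abstract above does not give the exact form of its filter model, so the following is only a sketch of the plain Kendall rank coefficient (tau-a) that such a filter could use to score how strongly a feature tracks the class label. The function name and the sample values are assumptions; the combination with Euclidean distance and the damping oscillation adjustment are not reproduced here.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # Tied pairs (s == 0) count toward neither total in tau-a.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical feature values and labels, invented for illustration only.
feature = [0.1, 0.4, 0.35, 0.8]
label = [0, 1, 0, 1]
relevance = abs(kendall_tau(feature, label))
```

Taking the absolute value treats strongly negative and strongly positive rank agreement as equally informative, which is a common convention for relevance scores.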


Author(s):  
Ricco Rakotomalala ◽  
Faouzi Mhamdi

In this chapter, we are interested in protein classification starting from primary structures. The goal is to automatically assign protein sequences to their families. The main originality of the approach is that we directly apply the text categorization framework to protein classification with very minor modifications. The main steps of the task are clearly identified: we extract features from the unstructured dataset using fixed-length n-gram descriptors; we select and combine the most relevant ones for the learning phase; and then we select the most promising learning algorithm in order to produce an accurate predictive model. We obtain two main results. First, the approach is credible, giving accurate results with descriptors of length 2 only (2-grams). Second, in our context, where many irrelevant descriptors are automatically generated, we must combine aggressive feature selection algorithms with low-variance classifiers such as SVM (Support Vector Machine).
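The fixed-length n-gram descriptors mentioned in the abstract above can be sketched as overlapping substrings counted along a sequence. This is a minimal illustration under that assumption; the function name and the example sequence are hypothetical, and the chapter's actual feature construction may differ.

```python
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count the overlapping n-grams of a sequence given as a string."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Hypothetical amino-acid string, invented for illustration only.
descriptors = ngram_counts("MKVLKV", n=2)
```

With a 20-letter amino-acid alphabet there are at most 400 distinct 2-grams, so each sequence maps to a short fixed-length count vector that a text classifier can consume directly.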

