Analysis of Feature Reduction Techniques for Online News Popularity Prediction

2018 ◽  
Vol 4 (10) ◽  
pp. 6
Author(s):  
Shivangi Bhargava ◽  
Dr. Shivnath Ghosh

News popularity is the maximum growth of attention given to a particular news article. The popularity of online news depends on various factors, such as the number of social media shares, the number of visitor comments, and the number of likes. It is therefore necessary to build an automatic decision support system that predicts the popularity of news, since such a system also supports business intelligence. This study aims to find the best model for predicting the popularity of online news using machine learning methods. The analysis applies three feature reduction techniques: a correlation algorithm, particle swarm optimization (PSO), and principal component analysis (PCA). For performance evaluation, support vector machine (SVM), naïve Bayes, k-nearest neighbor (k-NN), and neural network classifiers are used to classify articles as popular or unpopular. The experimental results show that SVM and naïve Bayes perform better with the correlation algorithm, whereas k-NN and the neural network perform better with particle swarm optimization.
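The correlation-based reduction step can be sketched as a filter that ranks each feature by the absolute Pearson correlation between that feature and the binary popular/unpopular label, then keeps the top k. This is a minimal sketch under assumptions: the abstract does not specify the exact correlation algorithm, threshold, or k, and the helper `select_by_correlation` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def select_by_correlation(rows: List[List[float]], labels: List[int], k: int) -> List[int]:
    """Hypothetical helper: indices of the k features most correlated (in |r|) with the label."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Sort by descending |r|; ties are broken by the lower feature index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return sorted(j for _, j in scores[:k])


# Hypothetical toy data: 3 features per article, label 1 = popular, 0 = unpopular.
X = [
    [10.0, 3.0, 1.0],
    [20.0, 1.0, 1.0],
    [30.0, 4.0, 1.0],
    [40.0, 2.0, 1.0],
]
y = [0, 0, 1, 1]
print(select_by_correlation(X, y, k=1))  # → [0]
```

A filter like this is cheap because it scores each feature independently of any classifier; that is also its limitation, since it cannot detect features that are only useful in combination, which wrapper methods such as PSO-based selection can.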

2020 ◽  
Vol 2 (3) ◽  
pp. 169-178
Author(s):  
Zulia Imami Alfianti ◽  
Deni Gunawan ◽  
Ahmad Fikri Amin

Sentiment analysis is an approach that solves problems using reviews, drawing on various relevant scientific perspectives. Reading reviews before buying a product is important for learning its advantages and disadvantages; reading cosmetic reviews, in particular, shows whether a cosmetic brand is of suitable quality to use. Before deciding to buy cosmetics, consumers should know the products in detail, which they can learn from the testimonials and reviews of consumers who have already bought and used them. However, the sheer number of reviews makes consumers reluctant to read them, so the reviews end up being of little use. For this reason, the authors classify reviews into positive and negative classes so that consumers can compare products quickly and precisely. Applying Particle Swarm Optimization (PSO) improves the accuracy of the Support Vector Machine (SVM) and Naïve Bayes (NB) algorithms and makes the review classification more accurate and optimal. On this data, SVM achieves an accuracy of 89.20% and an AUC of 0.973, compared with an accuracy of 94.60% and an AUC of 0.985 for PSO-based SVM. NB achieves 88.50% accuracy and an AUC of 0.536, which is compared with a figure of 0.692 for PSO-based NB. These results show that applying PSO optimization improves accuracy and yields more accurate and optimal solutions.
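Because these comparisons rest on AUC alongside accuracy, it may help to recall how AUC can be computed without drawing the ROC curve: it equals the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The sketch below assumes the classifier outputs a real-valued score per review; the scores are made up for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5.

    Pairwise O(n_pos * n_neg) version, adequate for small illustrative data.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for 4 reviews (1 = positive review, 0 = negative review).
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # → 0.75
```

An AUC of 0.5 corresponds to a ranking no better than chance, which is why a model can show high accuracy yet a low AUC when its scores rank the two classes poorly.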


2020 ◽  
Vol 8 (2) ◽  
pp. 91-100
Author(s):  
Muhamad Azhar ◽  
Noor Hafidz ◽  
Biktra Rudianto ◽  
Windu Gata

Technology implementation in the marketplace has attracted researchers' attention to analyzing customer reviews. The Klik Indomaret application page on Google Play is one source from which review data can be collected. However, extracting consumers' opinions from reviews is not easy and requires a specific method for categorizing the reviews into groups, i.e., positive or negative reviews. Sentiment analysis of application reviews on Google Play is still rare. This paper therefore analyzes customer sentiment toward the Klik Indomaret app using the Naive Bayes (NB) classifier, compares it with the Support Vector Machine (SVM), and optimizes feature selection (FS) using Particle Swarm Optimization (PSO). Without FS optimization, NB achieved 69.74% accuracy and an Area Under the Curve (AUC) of 0.518, and SVM achieved 81.21% accuracy and an AUC of 0.896. With FS, cross-validation gave NB 75.21% accuracy and an AUC of 0.598, and SVM 81.84% accuracy and an AUC of 0.898. Both algorithms therefore improve with PSO-based feature selection, and SVM scores higher than NB on the dataset used in this study.
Keywords: Naive Bayes, Particle Swarm Optimization, Support Vector Machine, Feature Selection, Consumer Review.
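PSO-based feature selection of this kind is usually a wrapper: each particle encodes a 0/1 mask over the features, and a classifier's validation score on the selected features serves as the fitness. The sketch below is a minimal binary PSO using the common sigmoid transfer (each velocity is squashed into the probability that a bit is 1), under stated assumptions: the fitness function is a hypothetical stand-in rather than cross-validated NB or SVM accuracy, and the swarm size, inertia `w`, and coefficients `c1`, `c2` are illustrative values, not the paper's settings.

```python
import math
import random
from typing import Callable, List, Tuple


def binary_pso(
    n_features: int,
    fitness: Callable[[List[int]], float],
    n_particles: int = 10,
    iterations: int = 30,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    v_max: float = 4.0,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Maximize `fitness` over 0/1 feature masks with sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # personal best masks
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]    # global best mask

    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                v = max(-v_max, min(v_max, v))      # clamp velocity
                vel[i][j] = v
                # Sigmoid maps velocity to the probability that bit j is 1.
                pos[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical fitness: reward features 0 and 2 (assumed "informative") and
# penalize every other selected feature. A real wrapper would instead return
# the cross-validated accuracy of NB or SVM trained on the masked features.
def toy_fitness(mask: List[int]) -> float:
    useful = mask[0] + mask[2]
    noise = sum(mask) - useful
    return useful - 0.5 * noise


best_mask, best_score = binary_pso(5, toy_fitness)
print(best_mask, best_score)
```

Because every fitness call in a real wrapper means retraining a classifier, the swarm size and iteration count dominate the cost; small values like these are only suitable for a toy objective.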


Author(s):  
Dedi Saputra ◽  
Windi Irmayani ◽  
Deasy Purwaningtias ◽  
Juniato Sidauruk

Heart disease is a general term for all types of disorders that affect the heart. This research compares several classification algorithms, namely C4.5, Naïve Bayes, and Support Vector Machine, and optimizes their heart disease predictions by applying Particle Swarm Optimization (PSO). In the tests, the C4.5 algorithm achieved an accuracy of about 74.12%, Naïve Bayes about 85.26%, and Support Vector Machine about 85.26%. All three algorithms were then optimized using Particle Swarm Optimization. Naïve Bayes with PSO achieved the highest accuracy (86.30%), AUC (0.895), and precision (87.01%), while Support Vector Machine with PSO achieved the highest recall (96.00%). Based on these results, the algorithms are expected to be applicable as an alternative approach to problem solving, especially for predicting heart disease.
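The accuracy, precision, and recall figures quoted here all derive from the binary confusion matrix. A minimal sketch, assuming label 1 means "heart disease present"; the predictions in the example are hypothetical, not the study's data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def precision(self) -> float:
        # Fraction of predicted positives that are truly positive.
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def recall(self) -> float:
        # Fraction of actual positives that were detected (also called sensitivity).
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count true/false positives and negatives for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


# Hypothetical predictions for 8 patients (1 = heart disease present).
cm = confusion([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0])
print(cm.accuracy, cm.precision, cm.recall)  # → 0.75 0.6666666666666666 1.0
```

The example shows how a model that flags many cases can reach perfect recall while its precision drops, which is how one method can lead on recall while another leads on precision.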


Author(s):  
Anas Faisal ◽  
Yuris Alkhalifi ◽  
Achmad Rifai ◽  
Windu Gata

The use of the internet, especially social media, has become part of civic life. One reason is that many members of the House of Representatives of the Republic of Indonesia (Dewan Perwakilan Rakyat Republik Indonesia, DPR RI) share ideas and policies, and comment on government policies, through social media. This study was conducted to measure opinion by separating positive and negative sentiment toward the DPR RI. The data used in this study were obtained by crawling the social media platform Twitter. The study uses two algorithms, Support Vector Machine (SVM) and Naive Bayes (NB), each optimized with Particle Swarm Optimization (PSO). In k-fold cross-validation, SVM and NB achieved accuracies of 71.04% and 70.69%, with Area Under the Curve (AUC) values of 0.817 and 0.661. With PSO, k-fold cross-validation gave accuracies of 75.03% for SVM and 73.49% for NB, with AUC values of 0.808 and 0.719. PSO raised the accuracy of SVM by 3.99 percentage points and of NB by 2.8 percentage points. Of the two algorithms tested, the highest accuracy was obtained by SVM with PSO, at 75.03%.
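The k-fold cross-validation behind these figures partitions the tweets into k folds, trains on k-1 of them, tests on the remaining fold, and rotates through all k folds; the reported accuracy is typically the mean over folds. The abstract does not state k or whether the folds were stratified, so the sketch below assumes a plain shuffled split via a hypothetical `k_fold_indices` helper.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds over n samples."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(10, k=3):
    print(len(train), len(test))
```

Every sample lands in exactly one test fold, so each tweet is scored once by a model that never saw it during training.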


Faktor Exacta ◽  
2019 ◽  
Vol 12 (3) ◽  
pp. 230
Author(s):  
Hernawati Hernawati ◽  
Windu Gata Kedua

Public sentiment expressed in comments on Twitter indicates that the sting operations (operasi tangkap tangan, OTT) carried out by the Corruption Eradication Commission (KPK) currently do not meet public expectations, because the officials caught are only those involved in small-scale corruption rather than large-scale corruption. Support Vector Machine and Naïve Bayes are currently among the classification algorithms with strong accuracy. On tweet data consisting of 78 positive and 78 negative tweets, the Support Vector Machine method achieves an accuracy of 80.77% and an AUC of 0.867, whereas the Naïve Bayes method achieves an accuracy of 76.92% and an AUC of 0.729, a difference in accuracy of 3.3%. After optimization with the Weight Particle Swarm Optimization operator, Support Vector Machine (PSO) achieves an accuracy of 83.79% and an AUC of 0.910, while Naïve Bayes (PSO) achieves an accuracy of 80.13% and an AUC of 0.771, a difference in accuracy of 3.6%.
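The Naïve Bayes baseline in studies like this one is commonly a multinomial model over word counts. The following is a minimal sketch with Laplace (add-one) smoothing, assuming tweets are already tokenized; the class name and the training examples are invented placeholders, not the study's 78 + 78 tweets.

```python
import math
from collections import Counter
from typing import Dict, List


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.log_prior: Dict[str, float] = {}
        self.word_counts: Dict[str, Counter] = {}
        self.total: Dict[str, int] = {}
        for c in self.classes:
            class_docs = [d for d, lab in zip(docs, labels) if lab == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            self.word_counts[c] = counts
            self.total[c] = sum(counts.values())
        return self

    def predict(self, doc: List[str]) -> str:
        v = len(self.vocab)
        best_class, best_score = self.classes[0], -math.inf
        for c in self.classes:
            # Sum log-probabilities to avoid underflow from multiplying many small numbers.
            score = self.log_prior[c]
            for w in doc:
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[c][w] + 1) / (self.total[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Invented toy tweets, already lower-cased and tokenized.
train_docs = [["great", "kpk"], ["support", "kpk"],
              ["disappointed", "sting"], ["sting", "small", "disappointed"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict(["disappointed", "sting"]))  # → negative
```

The add-one term keeps a word that never appeared in one class from driving that class's probability to zero, which matters with a training set as small as a few hundred tweets.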


2019 ◽  
Vol 3 (2) ◽  
pp. 176-183
Author(s):  
Sigit Kurniawan ◽  
Windu Gata ◽  
Dewi Ayu Puspitawati ◽  
Nurmalasari ◽  
Muhamad Tabrani ◽  
...  

General elections are an important part of the political process, and many political figures take part in them. Electability is a major concern, and various efforts are made to increase the electability of political figures running in general elections. The media, including online news media, have become an important tool for increasing electability. Reader comments can be used to assess political figures through sentiment analysis. However, analyzing sentiment in comments on online news media is not easy, because comments are unstructured text, especially in Indonesian. Text pre-processing is therefore an important step in text mining for extracting the basic information contained in the comments. This research performs Indonesian text pre-processing with the Gata Framework Textmining, then extracts information using the Naïve Bayes and Support Vector Machine classification algorithms, each optimized with Particle Swarm Optimization. The tests show that Support Vector Machine optimized with Particle Swarm Optimization is the best method, with an accuracy of 78.40% and an AUC of 0.850. This study thus identifies an effective algorithm for classifying positive and negative comments about political figures in online news media.
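The abstract does not list the individual pre-processing steps performed with the Gata Framework, so the following is only a generic sketch of typical steps for Indonesian comments: case folding, removal of URLs, mentions, digits and punctuation, whitespace tokenization, and stopword removal. The tiny stopword set is a hypothetical placeholder; a real pipeline would use a full Indonesian stopword list and usually a stemmer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny Indonesian stopword list for illustration only.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "ke", "dari", "untuk"}


def preprocess(comment: str) -> List[str]:
    text = comment.lower()                      # case folding
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # drop digits, punctuation, symbols
    tokens = text.split()                       # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("Calon ini BAGUS dan jujur!! https://example.com @user123"))
# → ['calon', 'bagus', 'jujur']
```

The regular expression keeps only ASCII letters, which suits Indonesian's Latin alphabet but would need adjusting for text containing accented characters or emoji that carry sentiment.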


2016 ◽  
Vol 39 (3) ◽  
pp. 21-30 ◽  
Author(s):  
Ting Li ◽  
Yunong Yang ◽  
Yonghui Wang ◽  
Chao Chen ◽  
Jinbao Yao

To predict traffic fatalities effectively and promote the sound development of transportation, a prediction model for traffic fatalities is built on the support vector machine (SVM). Because the prediction accuracy of an SVM depends largely on the choice of its parameters, Particle Swarm Optimization (PSO) is introduced to find the optimal parameters. The paper uses small-sample, nonlinear data: China's traffic accident statistics from 1981 to 2012. The input variables are highway mileage, number of vehicles, and population size, and the output variable is the number of traffic fatalities. To verify the proposed method, a back-propagation neural network (BPNN) model and a plain SVM model are also used to predict traffic fatalities. The results show that, compared with the BPNN and SVM models, the PSO-SVM model achieves higher prediction precision and smaller errors, so it forecasts traffic fatalities more effectively, and using PSO to optimize SVM parameters is feasible and effective. In addition, the method overcomes the "over-learning" (overfitting) problem that arises in neural network training.
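Parameter optimization with PSO treats the SVM hyperparameters, typically the penalty C and the kernel width gamma, as a particle's position in a continuous search space, and moves each particle toward its personal best and the swarm's global best. The sketch below is a minimal global-best PSO that minimizes an arbitrary error function inside box bounds. Assumptions: the objective is a hypothetical smooth surrogate with a known minimum, standing in for the cross-validated error of an SVM, and the bounds (on log2 C and log2 gamma) and coefficients are illustrative rather than the paper's values.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    iterations: int = 100,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 1,
) -> Tuple[List[float], float]:
    """Global-best PSO over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clip the particle back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical surrogate for "validation error as a function of (log2 C, log2 gamma)",
# with its minimum placed at (3, -2). A real run would train and score an SVM here.
def surrogate_error(p: List[float]) -> float:
    return (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2


best, err = pso_minimize(surrogate_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
print(best, err)
```

Searching over log2 C and log2 gamma rather than the raw values lets the swarm cover several orders of magnitude evenly, which is the usual practice when tuning these two parameters.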


Author(s):  
Midde Venkateswarlu Naik ◽  
D. Vasumathi ◽  
A.P. Siva Kumar

Aims: The proposed research presents an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role in helping people extract valid decision points about topics such as movie ratings, educational institutions, or politics. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification using a Support Vector Machine (SVM). Its performance is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier work has achieved sentiment classification accuracy on English text of up to 94%. The proposed scheme reaches an average accuracy of 99% across different datasets by tuning SVM parameters, such as the constant C and the kernel gamma, in association with PSO. The method uses three publicly available datasets: airline sentiment, weather, and global warming. The models are trained and tested with 10-fold cross-validation (FCV), and a confusion matrix is used to measure sentiment classifier accuracy.
Background: Sentiment Analysis (SA), or opinion mining, has become a research domain of strong scientific interest. A key problem is sentiment classification of semi-structured or unstructured data in different languages. User-Generated Content (UGC) from different sources has increased significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about a specific entity. SA is the computational analysis of the opinion about an entity expressed in text; it is also described as computing the emotional polarity of natural-language text posted on social media in various languages. An effective automatic sentiment classifier typically depends on its feature selection and classification algorithms.
Methods: The proposed work uses a Support Vector Machine as the classifier and Particle Swarm Optimization for feature selection. Various parameter combinations are tuned, with and without a kernel, to obtain the desired sentiment classification results on three datasets (airline, global warming, and weather sentiment) that are freely available for research.
Results: The proposed method achieved an average accuracy of 99.2% in classifying sentiment across the datasets, outperforming other machine learning techniques. This high accuracy in classifying sentiment or opinion in review text demonstrates its effectiveness over existing sentiment classifiers.
Conclusion: Sentiment classifier accuracy, the objective of this research, was improved using a kernel-based Support Vector Machine (SVM) with parameter optimization. Optimal feature selection for classifying sentiment or opinion in review documents was determined with a particle swarm optimization approach. The method was evaluated on three freely available datasets: airline sentiment, weather sentiment, and global warming data.
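The two parameters tuned here play different roles in a kernel SVM: C weights the penalty on margin violations, while gamma sets how quickly the RBF kernel similarity K(x, z) = exp(-gamma * ||x - z||^2) decays with distance. The small sketch below only evaluates that kernel for one pair of vectors to show the effect of gamma; it assumes the RBF kernel (the abstract says only "kernel gamma"), is not the authors' implementation, and uses arbitrary vectors.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = [1.0, 0.0], [0.0, 1.0]  # squared Euclidean distance = 2
for gamma in (0.01, 0.5, 5.0):
    print(gamma, round(rbf_kernel(x, z, gamma), 4))
```

A larger gamma makes the kernel more local, so the decision boundary can bend around individual training points and overfit more easily, while a smaller gamma gives smoother boundaries; this interaction with C is why the two are usually searched jointly.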

