Movie Success Prediction Using Naïve Bayes, Logistic Regression and Support Vector Machine

Author(s):  
Rachaell Nihalaani ◽  
Apoorva Shete ◽  
Darakshan Khan
Author(s):  
Hongxin Wang ◽  
Lijing Jia ◽  
Heng Zhuang ◽  
Xueyan Li ◽  
Yuzhuo Zhao ◽  
...  

This study is to solve the problems of an overly-broad scale of medical indicators, lack of retrospective research samples, insufficient depth of data mining, and low disease prediction accuracy. In this paper, we propose an intelligent screening algorithm that combines a genetic algorithm, cellular automata, and rough set theory. This algorithm can achieve high accuracy in predicting patient outcomes with a small number of indicators. And we compare it with the traditional genetic algorithm. We built the prediction model with 64 indicators based on the logistic regression (AUC 0.8628), support vector machine (AUC 0.5319), Naïve Bayes (AUC 0.7102), and AdaBoost algorithms (AUC 0.9095). Using the cellular genetic algorithm for attribute screening not only effectively reduces the number of indicators but also achieve almost the same accuracy of prediction with 8 indicators based on the logistic regression (AUC 0.8782), support vector machine (AUC 0.8525), Naïve Bayes (AUC 0.8408), and AdaBoost algorithms (AUC 0.8770). Compared with the traditional scoring system, the predictive model established in this paper can more accurately predict rebleeding accidents based on physiological test indicators and continuous patient indicators.


2021 ◽  
Vol 1 (1) ◽  
pp. 14-20
Author(s):  
Tommy Tommy ◽  
Amir Mahmud Husein

Perguruan tinggi merupakan satuan penyelenggara pendidikan tinggi sebagai tingkat lanjut jenjang pendidikan menengah di jalur pendidikan formal. Aspek prestasi belajar merupakan salah satu aspek penilaian keberhasilan perguruan tinggi dalam proses belajar. Dalam makalah ini menyajikan hasil analisis hubungan antara pembelajaran dengan prestasi mahasiswa dimana tahapan yang dilakukan menggunakan pendetakan data science. Berdasarkan Analisis data terdapat tiga indikator penting dalam penilaian prestasi belajar yaitu pedagogi, profesional dan kepribadian. Ketiga fitur digunakan sebagai variabel dependen untuk memprediksi prestasi belajar dimana algoritma DecisionTree menghasilkan akurasi lebih baik dari pada model k-nearest neighbors (KNN), Logistic Regression, Support Vector Machine, Naive Bayes dan dengan tingkat akurasi 68%, kemudian KNN dengan akurasi 66% dan lainnya sebesar 55% pada masing-masing algoritma yang diusulkan.


2021 ◽  
Vol 9 (1) ◽  
pp. 126-136
Author(s):  
Rahmat Robi Waliyansyah ◽  
Umar Hafidz Asy'ari Hasbullah

Coffee is one of the many favorite drinks of Indonesians. In Indonesia there are 2 types of coffee, namely Arabica & Robusta. The classification of coffee beans is usually done in a traditional way & depends on the human senses. However, the human senses are often inconsistent, because it depends on the mental or physical condition in question at that time, and only qualitative measures can be determined. In this study, to classify coffee beans is done by digital image processing. The parameters used are texture analysis using the Gray Level Coocurrence Matrix (GLCM) method with 4 features, namely Energy, Correlation, Homogeneity & Contrast. For feature extraction using a classification algorithm, namely Naïve Bayes, Tree, Support Vector Machine (SVM) and Logistic Regression. The evaluation of the coffee bean classification model uses the following parameters: AUC, F1, CA, precision & recall. The dataset used is 29 images of Arabica coffee beans and 29 images of Robusta beans. To test the accuracy of the model using Cross Validation. The results obtained will be evaluated using the confusion Matrix. Based on the results of testing and evaluation of the model, it is obtained that the SVM method is the best with the value of AUC = 1, CA = 0.983, F1 = 0.983, Precision = 0.983 and Recall = 0.983.


Author(s):  
Parastoo Golpour ◽  
Majid Ghayour-Mobarhan ◽  
Azadeh Saki ◽  
Habibollah Esmaily ◽  
Ali Taghipour ◽  
...  

(1) Background: Coronary angiography is considered to be the most reliable method for the diagnosis of cardiovascular disease. However, angiography is an invasive procedure that carries a risk of complications; hence, it would be preferable for an appropriate method to be applied to determine the necessity for angiography. The objective of this study was to compare support vector machine, naïve Bayes and logistic regressions to determine the diagnostic factors that can predict the need for coronary angiography. These models are machine learning algorithms. Machine learning is considered to be a branch of artificial intelligence. Its aims are to design and develop algorithms that allow computers to improve their performance on data analysis and decision making. The process involves the analysis of past experiences to find practical and helpful regularities and patterns, which may also be overlooked by a human. (2) Materials and Methods: This cross-sectional study was performed on 1187 candidates for angiography referred to Ghaem Hospital, Mashhad, Iran from 2011 to 2012. A logistic regression, naive Bayes and support vector machine were applied to determine whether they could predict the results of angiography. Afterwards, the sensitivity, specificity, positive and negative predictive values, AUC (area under the curve) and accuracy of all three models were computed in order to compare them. All analyses were performed using R 3.4.3 software (R Core Team; Auckland, New Zealand) with the help of other software packages including receiver operating characteristic (ROC), caret, e1071 and rminer. (3) Results: The area under the curve for logistic regression, naïve Bayes and support vector machine were similar—0.76, 0.74 and 0.75, respectively. Thus, in terms of the model parsimony and simplicity of application, the naïve Bayes model with three variables had the best performance in comparison with the logistic regression model with seven variables and support vector machine with six variables. (4) Conclusions: Gender, age and fasting blood glucose (FBG) were found to be the most important factors to predict the result of coronary angiography. The naïve Bayes model performed well using these three variables alone, and they are considered important variables for the other two models as well. According to an acceptable prediction of the models, they can be used as pragmatic, cost-effective and valuable methods that support physicians in decision making.


Author(s):  
Viet-Ha Nhu ◽  
Ataollah Shirzadi ◽  
Himan Shahabi ◽  
Sushant K. Singh ◽  
Nadhir Al-Ansari ◽  
...  

Shallow landslides damage buildings and other infrastructure, disrupt agriculture practices, and can cause social upheaval and loss of life. As a result, many scientists study the phenomenon, and some of them have focused on producing landslide susceptibility maps that can be used by land-use managers to reduce injury and damage. This paper contributes to this effort by comparing the power and effectiveness of five machine learning, benchmark algorithms—Logistic Model Tree, Logistic Regression, Naïve Bayes Tree, Artificial Neural Network, and Support Vector Machine—in creating a reliable shallow landslide susceptibility map for Bijar City in Kurdistan province, Iran. Twenty conditioning factors were applied to 111 shallow landslides and tested using the One-R attribute evaluation (ORAE) technique for modeling and validation processes. The performance of the models was assessed by statistical-based indexes including sensitivity, specificity, accuracy, mean absolute error (MAE), root mean square error (RMSE), and area under the receiver operatic characteristic curve (AUC). Results indicate that all the five machine learning models performed well for shallow landslide susceptibility assessment, but the Logistic Model Tree model (AUC = 0.932) had the highest goodness-of-fit and prediction accuracy, followed by the Logistic Regression (AUC = 0.932), Naïve Bayes Tree (AUC = 0.864), ANN (AUC = 0.860), and Support Vector Machine (AUC = 0.834) models. Therefore, we recommend the use of the Logistic Model Tree model in shallow landslide mapping programs in semi-arid regions to help decision makers, planners, land-use managers, and government agencies mitigate the hazard and risk.


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages. Great number of received messages making it difficult for human to classify those messages to Spam or no Spam.  One way to overcome this problem is to use Data Mining for automatic classifications. In this paper, we investigate various data mining techniques, named Support Vector Machine, Multinomial Naïve Bayes and Decision Tree for automatic spam detection. Our experimental results show that Support Vector Machine algorithm is the best algorithm over three evaluated algorithms. Support Vector Machine achieves 98.33%, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree is at 97.10 % accuracy.


2018 ◽  
Vol 4 (10) ◽  
pp. 6
Author(s):  
Shivangi Bhargava ◽  
Dr. Shivnath Ghosh

News popularity is the maximum growth of attention given for particular news article. The popularity of online news depends on various factors such as the number of social media, the number of visitor comments, the number of Likes, etc. It is therefore necessary to build an automatic decision support system to predict the popularity of the news as it will help in business intelligence too. The work presented in this study aims to find the best model to predict the popularity of online news using machine learning methods. In this work, the result analysis is performed by applying Co-relation algorithm, particle swarm optimization and principal component analysis. For performance evaluation support vector machine, naïve bayes, k-nearest neighbor and neural network classifiers are used to classify the popular and unpopular data. From the experimental results, it is observed that support vector machine and naïve bayes outperforms better with co-relation algorithm as well as k-NN and neural network outperforms better with particle swarm optimization.


2020 ◽  
Vol 4 (2) ◽  
pp. 362-369
Author(s):  
Sharazita Dyah Anggita ◽  
Ikmah

The needs of the community for freight forwarding are now starting to increase with the marketplace. User opinion about freight forwarding services is currently carried out by the public through many things one of them is social media Twitter. By sentiment analysis, the tendency of an opinion will be able to be seen whether it has a positive or negative tendency. The methods that can be applied to sentiment analysis are the Naive Bayes Algorithm and Support Vector Machine (SVM). This research will implement the two algorithms that are optimized using the PSO algorithms in sentiment analysis. Testing will be done by setting parameters on the PSO in each classifier algorithm. The results of the research that have been done can produce an increase in the accreditation of 15.11% on the optimization of the PSO-based Naive Bayes algorithm. Improved accuracy on the PSO-based SVM algorithm worth 1.74% in the sigmoid kernel.


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.


Sign in / Sign up

Export Citation Format

Share Document