Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China)

YouTube has become a popular social media platform, and that popularity has made its comment sections a channel for spammers. This is a concern because spam can lead to phishing attacks that target any user who clicks a malicious link. Spam has characteristic features that can be analyzed and detected by classification, so this paper proposes enhanced features for detecting YouTube spam. To conduct the experiments, a YouTube spam detection framework was developed that consists of five phases: data collection, pre-processing, feature selection and extraction, classification, and detection. The paper proposes the framework and examines and validates each phase using two data mining tools. The features were constructed from an analysis of data from the YouTube Spam dataset using Naïve Bayes and Logistic Regression, and were tested in two data mining tools, Weka and RapidMiner. Thirteen features tested in Weka and RapidMiner showed high accuracy and were therefore used throughout the experiments. Naïve Bayes and Logistic Regression both scored slightly higher in Weka than in RapidMiner. In Weka, Naïve Bayes outperformed Logistic Regression, with accuracies of 87.21% and 85.29% respectively. In RapidMiner the accuracies differed only slightly, at 80.41% for Naïve Bayes and 80.88% for Logistic Regression, although Naïve Bayes had the higher precision.
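
As a rough illustration of the classification step, the following is a minimal Bernoulli Naïve Bayes sketch with Laplace smoothing. It is not the paper's Weka or RapidMiner setup: the three binary features (`has_link`, `mentions_subscribe`, `all_caps`) and the six training rows are hypothetical, chosen only to show how class priors and per-feature likelihoods combine into a prediction.

```python
import math

def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Fit log-priors and per-feature Bernoulli probabilities (Laplace-smoothed)."""
    n_features = len(rows[0])
    model = {}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        prior = len(members) / len(rows)
        # P(feature_j = 1 | class c); smoothing keeps every value strictly in (0, 1)
        probs = [
            (sum(r[j] for r in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (math.log(prior), probs)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one feature row."""
    def log_posterior(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if x else 1.0 - p) for x, p in zip(row, probs)
        )
    return max(model, key=log_posterior)

# Hypothetical toy data: [has_link, mentions_subscribe, all_caps]
rows = [
    [1, 1, 0], [1, 0, 1], [1, 1, 1],   # spam comments
    [0, 0, 0], [0, 1, 0], [0, 0, 1],   # legitimate comments
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_bernoulli_nb(rows, labels)
link_comment = predict(model, [1, 1, 0])
plain_comment = predict(model, [0, 0, 0])
```

With these toy rows, a comment that contains a link and a subscribe request is assigned to the spam class, while a comment with none of the three features is assigned to the legitimate class.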

Algoritma Naïve Bayes Untuk Memprediksi Kredit Macet Pada Koperasi Simpan Pinjam

Jurnal Informatika Upgris ◽

10.26877/jiu.v4i2.2919 ◽

2019 ◽

Vol 4 (2) ◽

Author(s):

Diah Puspitasari ◽

Syifa Sintia Al Khautsar ◽

Wida Prima Mustika

Keyword(s):

Data Mining ◽

Predictive Value ◽

Naive Bayes ◽

False Negative ◽

False Negative Rate ◽

True Positive Rate ◽

Naïve Bayes ◽

Data Mining Technique ◽

Application Form ◽

Using Data

Cooperatives are institutions that can help people, especially small and medium-sized communities. They play an important role in the economic growth of the community, for example by keeping the prices of basic commodities relatively low, and some cooperatives also offer savings and loan services. A constraint felt by the cooperative studied here is that borrowers find it difficult to repay loan installments, which results in bad credit. This happens because the cooperative conducts credit analysis in a personal manner, namely by having the borrower fill out a loan application form with the required documents and by conducting a field survey. Lending to borrowers therefore needs to be evaluated. To minimize these problems, data mining is used to identify the customer criteria that predict bad loans and to determine whether applicants are eligible for credit. The data mining technique used is classification with the Naive Bayes method. Testing the resulting model gave an accuracy of 59%, a sensitivity (true positive rate, or recall) of 46.80%, a specificity (true negative rate) of 69.81%, a positive predictive value (PPV, or precision) of 57.89%, and a negative predictive value (NPV) of 59.67%.
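
The five reported figures all derive from one binary confusion matrix. The sketch below shows the arithmetic. The counts (TP = 22, FN = 25, TN = 37, FP = 16) are an assumption: the abstract does not state them, and they were chosen here only because they reproduce the reported percentages to within rounding.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive common rates from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate, recall
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value, precision
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Assumed counts (not given in the abstract), consistent with the reported rates
metrics = confusion_metrics(tp=22, fp=16, tn=37, fn=25)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, the 69.81% figure comes out as TN / (TN + FP), which is the true negative rate.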

Predicting heart ailment in patients with varying number of features using data mining techniques

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v8i1.pp56-62 ◽

2019 ◽

Vol 8 (1) ◽

pp. 56

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart disease is one of the primary causes of death around the globe, so a detailed analysis of it is carried out here using data mining. Studies often limit themselves to a minimal set of attributes for predicting whether a patient has heart disease, and in doing so miss many important attributes that are main causes of heart disease. This research therefore aims to consider almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and the Naïve Bayes and Random Forest algorithms perform comparatively better on these datasets.

Predicting Heart Ailment in Patients with Varying number of Features using Data Mining Techniques

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp2675-2681 ◽

2019 ◽

Vol 9 (4) ◽

pp. 2675

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart disease is one of the primary causes of death around the globe, so a detailed analysis of it is carried out here using data mining. Studies often limit themselves to a minimal set of attributes for predicting whether a patient has heart disease, and in doing so miss many important attributes that are main causes of heart disease. This research therefore aims to consider almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and the Naïve Bayes and Random Forest algorithms perform comparatively better on these datasets.

Uji Performa Algoritma Naïve Bayes untuk Prediksi Masa Studi Mahasiswa

Creative Information Technology Journal ◽

10.24076/citec.2019v6i1.178 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Irkham Widhi Saputro ◽

Bety Wulan Sari

Keyword(s):

Data Mining ◽

Cross Validation ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Study Program ◽

New Students ◽

Using Data ◽

The Many ◽

Fold Cross Validation

AMIKOM Yogyakarta University is one of the universities that admits thousands of new students, especially in the Informatics study program. In 2012 there were 1009 new students, and in 2013 there were 859. Unfortunately, only around 50% of these students graduate on time. This data is used to build a classification system using data mining techniques with the Naïve Bayes method. The dataset consists of 300 records drawn from alumni data of the 2012 and 2013 cohorts, 150 records from each. It contains 144 students who graduated on time and 156 who did not. Testing is carried out using 10-fold cross-validation and a confusion matrix. The test results show that the Naïve Bayes model has an average accuracy of 68%, precision of 61.3%, recall of 65.3%, and F1-score of 61%. The model's performance can be influenced by the dataset used to build it.

Keywords: data mining, classification, Naïve Bayes, graduation time, K-Fold Cross Validation, Confusion Matrix
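
The 10-fold procedure can be sketched as follows, assuming a plain shuffled split of the 300 records. The abstract does not say whether the folds were stratified or how they were shuffled, so the function name `k_fold_indices` and the fixed seed are illustrative choices only.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 300 records and 10 folds, as in the abstract: 270 for training, 30 for testing
splits = list(k_fold_indices(300, k=10))
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and scored on test_idx here;
    # the reported figures are averages of the per-fold scores.
    pass
```

Every record lands in exactly one test fold, so each of the 300 records is used for testing once and for training nine times.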

Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: The influence of models complexity and training dataset size

CATENA ◽

10.1016/j.catena.2016.06.004 ◽

2016 ◽

Vol 145 ◽

pp. 164-179 ◽

Cited By ~ 151

Author(s):

Paraskevas Tsangaratos ◽

Ioanna Ilia

Keyword(s):

Logistic Regression ◽

Landslide Susceptibility ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Dataset ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Dataset Size ◽

And Training

Arrangement of Players Position in Soccer Using the Technique of Naive Bayes

ComTech Computer Mathematics and Engineering Applications ◽

10.21512/comtech.v6i4.2204 ◽

2015 ◽

Vol 6 (4) ◽

pp. 627

Author(s):

Gusti Made Trisetya Putra ◽

Muhammad Rusli

Keyword(s):

Data Mining ◽

Decision Support ◽

Decision Support System ◽

Support System ◽

Soccer Player ◽

Naive Bayes ◽

Research Result ◽

Naïve Bayes ◽

Young Age ◽

Using Data

In the modern era, soccer is considered entertainment and has even become an industry or business that can bring great profit to club owners. One of the most important factors in building a team is the development of young players, and the right development method can be very helpful in establishing a good team. A professional team must have a coach, whether for the first team or the junior team. One of the coach's duties is to determine the right position for each player in the game, and this decision is sometimes difficult to make. This research discusses how to design a decision support system for determining a soccer player's position using the naive Bayes technique. Data mining with naive Bayes is used to predict a position for the player based on the player's skill test results. The results show that a decision support system using data mining with the naive Bayes technique can help coaches determine player positions, especially in the development of young players, so that they can make the right decisions effectively and efficiently.

PENERAPAN DATA MINING MENGGUNAKAN METODE TEKNIK CLASSIFICATION UNTUK MELIHAT POTENSI KEPATUHAN WAJIB PAJAK BUMI DAN BANGUNAN

Jurnal Ilmiah Matrik ◽

10.33557/jurnalmatrik.v20i2.119 ◽

2019 ◽

Vol 20 (2) ◽

pp. 157-168

Author(s):

Qoriani Widayati

Keyword(s):

Data Mining ◽

Regional Development ◽

Naive Bayes ◽

Naïve Bayes ◽

Regional Government ◽

Data Mining Techniques ◽

Know How ◽

Bayes Algorithm ◽

Using Data

The government requires substantial funds to carry out development in Indonesia. Revenue from the Land and Building Tax (PBB), one type of tax, is among the most important sources for developing a region: with it, the regional government can advance regional development through infrastructure that helps the community carry out its activities and makes the area more advanced. As the number of taxpayers grew and payments went directly into the state treasury, the UPT BPPD of SU II Subdistrict, Palembang, did not know how many taxpayers were compliant and how many were not. This study uses a data mining technique, namely classification with the Naive Bayes algorithm. Applied to 1,647 taxpayers, it achieved an accuracy of 99.33%; the potential for late payment was 0.437 in 16 Ulu village and 0.229 for sub-district households.

Development of a Novel Hybrid Intelligence Approach for Landslide Spatial Prediction

Applied Sciences ◽

10.3390/app9142824 ◽

2019 ◽

Vol 9 (14) ◽

pp. 2824 ◽

Cited By ~ 30

Author(s):

Nguyen ◽

Tuyen ◽

Shirzadi ◽

Pham ◽

Shahabi ◽

...

Keyword(s):

Landslide Susceptibility ◽

Naive Bayes ◽

Spatial Prediction ◽

Absolute Error ◽

Naïve Bayes ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Operating Characteristics ◽

Conditioning Factors ◽

Hybrid Intelligence

We proposed an innovative hybrid intelligent approach, namely, the multiboost based naïve bayes trees (MBNBT) method for the spatial prediction of landslides in the Mu Cang Chai District of Yen Bai Province, Vietnam. The MBNBT, which is an ensemble of the multiboost (MB) and naïve bayes trees (NBT) base classifier, has rarely been applied for landslide susceptibility mapping around the world. For the modeling, we selected 248 landslide locations in the hilly terrain of the study area. Fifteen landslide conditioning factors were selected for the construction of the database based on the one-R attribute evaluation (ORAE) technique. Model validation was done using statistical metrics, namely, sensitivity, specificity, accuracy, mean absolute error (MAE), root mean square error (RMSE), and the area under the receiver operating characteristics curve (AUC). Performance of the hybrid model was evaluated and compared with popular soft computing benchmark models, namely, multiple perceptron neural network (MLPN), Support Vector Machines (SVM), and single NBT. Results indicated that the proposed MBNBT (AUC = 0.824) model outperformed the popular models, namely, the MLPN (AUC = 0.804), SVM (AUC = 0.804), and NBT (AUC = 0.800) models. Analysis of the model results also suggested that the MB meta classifier ensemble model could enhance the prediction power of the NBT model. Therefore, the MBNBT is a suitable method for the assessment of landslide susceptibility in landslide prone areas.
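
Three of the validation metrics named above (MAE, RMSE, and AUC) can be computed from true labels and predicted scores as in the sketch below. The six labels and scores are hypothetical and unrelated to the study's data. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which is equivalent to the area under the ROC curve.

```python
import math

def mae(y_true, y_score):
    """Mean absolute error between labels (0/1) and predicted scores."""
    return sum(abs(t - s) for t, s in zip(y_true, y_score)) / len(y_true)

def rmse(y_true, y_score):
    """Root mean square error between labels (0/1) and predicted scores."""
    return math.sqrt(
        sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)
    )

def auc(y_true, y_score):
    """AUC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = landslide, 0 = no landslide) and model scores
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

auc_value = auc(y_true, y_score)
mae_value = mae(y_true, y_score)
rmse_value = rmse(y_true, y_score)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be preferable for large validation sets.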
