Bayesian Model Assessment and Comparison Using Cross-Validation Predictive Densities

In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of a model is to estimate its future predictive capability by estimating expected utilities. Instead of making only a point estimate, it is important to obtain the distribution of the expected utility estimate, because it describes the uncertainty in the estimate. The distributions of the expected utility estimates can also be used to compare models, for example, by computing the probability of one model having a better expected utility than another. We propose an approach that uses cross-validation predictive densities to obtain expected utility estimates and the Bayesian bootstrap to obtain samples from their distributions. We also discuss the probabilistic assumptions made and the properties of two practical cross-validation methods, importance sampling and k-fold cross-validation. As illustrative examples, we use multilayer perceptron neural networks and Gaussian processes with Markov chain Monte Carlo sampling in one toy problem and two challenging real-world problems.
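The model comparison described in this abstract can be illustrated with a minimal sketch. It assumes that per-observation cross-validation utilities (for example, log predictive densities) are already available for two models, and that the comparison is made on their paired differences. The function names and the utility values below are hypothetical and are not taken from the paper.

```python
import random


def bayesian_bootstrap_means(values, n_draws=4000, seed=0):
    """Sample from the Bayesian-bootstrap distribution of the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    draws = []
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights, obtained by normalizing Exp(1) variates.
        g = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(g)
        draws.append(sum(w * v for w, v in zip(g, values)) / total)
    return draws


def prob_first_better(util_a, util_b, n_draws=4000, seed=0):
    """Estimate P(expected utility of A > expected utility of B) from paired utilities."""
    diffs = [a - b for a, b in zip(util_a, util_b)]
    draws = bayesian_bootstrap_means(diffs, n_draws, seed)
    return sum(d > 0 for d in draws) / len(draws)


# Hypothetical per-observation cross-validation utilities for two models.
util_a = [-0.9, -1.1, -0.7, -1.0, -0.8, -1.2, -0.6, -0.9]
util_b = [-1.0, -1.0, -0.9, -1.1, -0.7, -1.4, -0.9, -0.8]
p = prob_first_better(util_a, util_b)
print(f"Estimated probability that model A is better: {p:.3f}")
```

The draws returned by `bayesian_bootstrap_means` can also be summarized directly, for instance with quantiles, to describe the uncertainty in a single model's expected utility estimate.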

Implementasi Algoritma C5.0 Untuk Menganalisa Gejala Prioritas Pada Anak Yang Mengalami Bullying

Repositor ◽

10.22219/repositor.v2i8.410 ◽

2020 ◽

Vol 2 (8) ◽

Author(s):

Nabillah Annisa Rahmayanti ◽

Yufis Azhar ◽

Gita Indah Marthasari

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Evaluation Method ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Training Data ◽

Victims Of Bullying ◽

Fold Cross Validation ◽

Selection Of

Abstract: Bullying often occurs among children, especially teenagers, and it worries parents. The rise in bullying cases in this country has even led to fatalities. This can be prevented by recognizing the symptoms of a child who is being bullied. A child who cannot express their distress makes it difficult for parents and teachers at school to understand what is happening to them, which may be because the child is being bullied by peers. The researchers therefore aim to produce a selected set of features using the C5.0 algorithm. Using the selected features eases the work of filling out the questionnaire and shortens the time needed to determine whether a child is being bullied, based on the symptoms covered by each question. To support the data in this study, the researchers used a questionnaire to collect answers to questions about the symptoms of children who are victims of bullying. The respondents' answers are processed into a data set, which is divided into training data and test data and then analyzed with the C5.0 algorithm. The evaluation method is 10-fold cross validation, with accuracy assessed using a confusion matrix. The study also compares C5.0 with other classification algorithms, namely Naive Bayes and KNN, to see how accurate C5.0 is at feature selection. The test results show that the C5.0 algorithm can perform feature selection and achieves better accuracy than Naive Bayes and KNN, with an accuracy of 92.77% before feature selection and 93.33% after feature selection.

Parameter Selection of SVR Based on Improved K-Fold Cross Validation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.462-463.182 ◽

2013 ◽

Vol 462-463 ◽

pp. 182-186 ◽

Cited By ~ 2

Author(s):

Ju E Wang ◽

Jian Zhong Qiao

Keyword(s):

Time Series ◽

Cross Validation ◽

Time Series Data ◽

Parameter Selection ◽

Small Samples ◽

Series Data ◽

Support Vector ◽

Model Parameters ◽

Fold Cross Validation ◽

Selection Of

This article uses a support vector machine (SVM) to forecast a cashmere price time series. The forecasting result depends mainly on parameter selection, which is normally based on k-fold cross validation. Standard k-fold cross validation is suited to classification. Here, k-fold cross validation is modified so that only older data are used to forecast later data, with the aim of improving prediction accuracy. The cashmere price time series is used to train an SVM-based model, with the model parameters selected by the improved cross validation, and the resulting model can forecast the price of cashmere. The simulation results show that the support vector machine has higher fitting precision when samples are small. Forecasting cashmere prices with SVM is therefore feasible.
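The constraint that only older data may be used to forecast later data can be sketched as a time-ordered split. The abstract does not give the exact scheme, so the expanding-window variant below is an assumption, and the function name `forward_chaining_splits` is hypothetical.

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs in which training always precedes testing.

    The series is cut into n_folds + 1 consecutive blocks. Fold i trains on the
    first i blocks and tests on the block that follows them.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples and n_folds >= 1")
    block = n_samples // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        train_end = fold * block
        # The last fold absorbs any leftover samples at the end of the series.
        test_end = n_samples if fold == n_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, test_end))


for train_idx, test_idx in forward_chaining_splits(n_samples=10, n_folds=4):
    print("train:", train_idx, "test:", test_idx)
```

Each candidate parameter setting would be scored by its average error over these splits, so that no score depends on observations from the future of its test block.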

K-Nearest Neighbor for Classification of Tomato Maturity Level Based on Hue, Saturation, and Value Colors

Indonesian Journal of Artificial Intelligence and Data Mining ◽

10.24014/ijaidm.v2i2.7975 ◽

2019 ◽

Vol 2 (2) ◽

pp. 101

Author(s):

Suwanto Sanjaya ◽

Morina Lisa Pura ◽

Siska Kurnia Gusti ◽

Febi Yanto ◽

Fadhilah Syafria

Keyword(s):

Cross Validation ◽

Nearest Neighbor ◽

Image Size ◽

K Nearest Neighbor ◽

Color Information ◽

Maturity Level ◽

Total Data ◽

Fold Cross Validation ◽

Selection Of

Tomatoes can be selected using several indicators, one of which is fruit color. In digital image processing, one color representation that can be used is Hue, Saturation, and Value (HSV). In this research, HSV is proposed as a color model feature for determining the ripeness of tomatoes. The data consisted of 400 tomato images taken from four sides. Tomato maturity is divided into five levels: green, turning, pink, light red, and red. The data are split using K-Fold Cross Validation with ten folds, and classification uses k-Nearest Neighbor (kNN). The test scenario combines the image size with the number of neighbors (k). The image sizes tested are 100x100, 300x300, 600x600, and 1000x1000 pixels. The k values tested were 1, 3, 5, 7, 9, 11, and 13. The highest accuracy, 92.5%, was reached at an image size of 1000x1000 pixels with k = 3. The experiment showed that image size has a significant influence on accuracy, whereas the number of neighbors (k) has only a minor influence.
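The pipeline described in this abstract (HSV features, kNN, k-fold evaluation) can be sketched in a few lines. The color samples, the two-class labels, and the helper names below are hypothetical, and the sketch uses one mean color per image rather than the features of the paper, which are not specified in the abstract.

```python
import colorsys
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training items nearest to `query` (squared Euclidean)."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kfold_accuracy(data, k_neighbors, n_folds):
    """Accuracy over n_folds folds; item i belongs to fold i % n_folds."""
    correct = 0
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct += sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
    return correct / len(data)


# Hypothetical mean RGB colors (scaled to 0..1), one per tomato image.
rgb_samples = [
    ((0.20, 0.60, 0.15), "green"), ((0.85, 0.15, 0.10), "red"),
    ((0.25, 0.65, 0.20), "green"), ((0.80, 0.20, 0.12), "red"),
    ((0.18, 0.55, 0.12), "green"), ((0.90, 0.18, 0.10), "red"),
    ((0.22, 0.62, 0.18), "green"), ((0.82, 0.14, 0.08), "red"),
    ((0.28, 0.58, 0.16), "green"), ((0.88, 0.22, 0.15), "red"),
]
# Convert each RGB triple to an (h, s, v) feature vector.
data = [(colorsys.rgb_to_hsv(*rgb), label) for rgb, label in rgb_samples]
print("accuracy:", kfold_accuracy(data, k_neighbors=3, n_folds=5))
```

Hue is a circular quantity, so plain Euclidean distance treats hues near 0 and near 1 as far apart even though both are red. The sample colors here were chosen to avoid that wrap-around, and a real implementation would need to handle it.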

Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both material factors and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results indicated that the optimized SVM model performed better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.

Rancang Bangun Sistem Informasi Untuk Menentukan Kapabilitas Konsumen Dalam Mengambil Pinjaman KPR

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia has a relatively large population. Over a five-year period, the government holds an annual program to provide 1 million FLPP housing units, in an effort to provide decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer, but this is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing a consumer's ability to obtain subsidized credit has many advantages: among others, developers can plan cash flow better and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create such an application, and one of them is data mining with a classification tree. Under 10-fold cross-validation, the application has an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross Validation, Consumer Capability

Klasifikasi Berita Kriminal Menggunakan Naïve Bayes Classifier (NBC) dengan Pengujian K-Fold Cross Validation

Jurnal Sains dan Informatika ◽

10.34128/jsi.v5i2.177 ◽

2019 ◽

Vol 5 (2) ◽

pp. 108-117

Author(s):

Herfia Rhomadhona ◽

Jaka Permadi

Keyword(s):

Cross Validation ◽

Online Media ◽

Bayes Classifier ◽

Naïve Bayes ◽

Fold Cross Validation

Crime news is always a trending topic in mass media, especially online mass media. Online media outlets provide several facilities that help the public search for news by topic. They label each news item by category, but they do not assign subcategories. For example, when a user opens the crime category, all types of crime news are displayed without any specific information about the type of crime. This problem can be addressed by classifying crime news by subcategory. This study uses the Naïve Bayes Classifier (NBC) to classify news by subcategory. There are five subcategories: corruption, narcotics, theft, rape, and murder. The study aims to determine the ability of NBC to classify news, tested using K-Fold Cross Validation with K from 3 to 10. The test results show that NBC is capable of classifying crime news, with a precision of 98.53%, a recall of 98.44%, and an accuracy of 99.38%.

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease is due to the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relatively excess number of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model in the prediction of inhibitors against K562 cells.

The Expected Utility Hypothesis and the Selection of Optimal Deductibles for a Given Insurance Policy

The Journal of Business ◽

10.1086/295179 ◽

1969 ◽

Vol 42 (2) ◽

pp. 143 ◽

Cited By ~ 44

Author(s):

John P. Gould

Keyword(s):

Expected Utility ◽

Insurance Policy ◽

Selection Of

Environmental Sensitivity and Awareness as Differentiating Factors in the Purchase Decision-Making Process in the Smartphone Industry—Case of Polish Consumers

Sustainability ◽

10.3390/su13010348 ◽

2021 ◽

Vol 13 (1) ◽

pp. 348

Author(s):

Lukasz Skowron ◽

Monika Sak-Skowron

Keyword(s):

Expectation Maximization ◽

Cross Validation ◽

Expectation Maximization Algorithm ◽

Decision Making Process ◽

Environmental Sensitivity ◽

Significance Level ◽

On Line ◽

Purchase Process ◽

The Impact ◽

Fold Cross Validation

The first of the research objectives discussed in this article was to analyze the differences in how respondents with different environmental sensitivity and awareness value particular factors influencing the purchase process in the smartphone industry, as well as to assess their knowledge about the impact of smartphones on the natural environment. The second objective of the research was to determine whether the level of environmental sensitivity, awareness and knowledge about the impact of smartphones on the environment has a statistically significant influence on the respondents’ choice of smartphone brand. The survey was conducted using an on-line questionnaire, distributed by a specialized research agency to a representative sample of over 1000 Polish residents. In order to identify the various customer clusters, the expectation-maximization algorithm and v-fold cross-validation were used. Additionally, in order to analyze the significance level of differences between clusters, the nonparametric Mann-Whitney U-test was carried out. The results show unequivocally that people with a different approach to ecological issues demonstrate statistically significant differences in their purchasing behaviors in the smartphone industry. Furthermore, when some smartphone brands were compared, there was a statistically confirmed difference in the environmental sensitivity and awareness of the customers who use them. Moreover, the research has shown that in Polish customers’ consciousness smartphones are mistakenly considered to be relatively safe and environmentally friendly products.

Interclass Interference Suppression in Multi-Class Problems

Applied Sciences ◽

10.3390/app11010450 ◽

2021 ◽

Vol 11 (1) ◽

pp. 450

Author(s):

Jinfu Liu ◽

Mingliang Bai ◽

Na Jiang ◽

Ran Cheng ◽

Xianling Li ◽

...

Keyword(s):

Classification Accuracy ◽

Cross Validation ◽

Selection Process ◽

Interference Suppression ◽

Generalization Ability ◽

Suppression Effect ◽

Binary Classifiers ◽

The One ◽

Fold Cross Validation ◽

Validation Experiments

Multi-classifiers are widely applied in many practical problems. However, features that can significantly discriminate a certain class from the others are often deleted in the feature selection process of multi-classifiers, which seriously decreases the generalization ability. This paper refers to this phenomenon as interclass interference in multi-class problems and analyzes its cause in detail. The paper then summarizes three interclass interference suppression methods, based on all features, one-class classifiers, and binary classifiers, and compares their effects on interclass interference via 10-fold cross-validation experiments on 14 UCI datasets. Experiments show that the method based on binary classifiers can suppress the interclass interference efficiently and obtains the best classification accuracy among the three methods. Further experiments were done to compare the suppression effect of two methods based on binary classifiers, the one-versus-one method and the one-versus-all method. Results show that the one-versus-one method obtains a better suppression effect on interclass interference and better classification accuracy. By proposing the concept of interclass interference and studying its suppression methods, this paper significantly improves the generalization ability of multi-classifiers.
