Performance comparison of different kernel functions in SVM for different k value in k-fold cross-validation

Requiring only a few relevant characteristics from patients when diagnosing bacterial vaginosis is highly useful for physicians, as it makes collecting these data less time consuming. The aim is a dataset of patients who can be diagnosed more accurately using only a subset of informative or relevant features than using the entire set of features. As such, this is a feature selection (FS) problem. In this work, decision tree and Relief algorithms were used as feature selectors. Experiments were conducted on a real dataset for bacterial vaginosis with 396 instances and 252 features/attributes. The dataset was obtained from universities located in Baltimore and Atlanta. The FS algorithms produced feature rankings, from which the top fifteen features formed a new dataset that was used as input for both support vector machine (SVM) and logistic regression (LR) algorithms for classification. For performance evaluation, averages of 30 runs of 10-fold cross-validation were reported, with balanced accuracy, sensitivity, and specificity as performance measures. Results obtained with the full set of features were compared against those obtained with the top fifteen. The top-ranked attributes were similar to those reported in the literature. This study is part of ongoing research that is investigating a range of feature selection and classification methods.
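The three performance measures named in this abstract can be computed from binary predictions as in the minimal sketch below. The function name `binary_metrics` and the toy labels are illustrative assumptions, not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, balanced accuracy) for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical labels for one test fold (1 = positive diagnosis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, bal_acc = binary_metrics(y_true, y_pred)
```

Balanced accuracy averages the two class-wise rates, so it is not inflated when one class dominates the dataset.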


Comparison between fuzzy kernel k-medoids using radial basis function kernel and polynomial kernel function in hepatitis classification

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp60-65 ◽

2021 ◽

Vol 10 (1) ◽

pp. 60

Author(s):

Glori Stephani Saragih ◽

Sri Hartini ◽

Zuherman Rustam

Keyword(s):

Radial Basis Function ◽

Kernel Function ◽

Basis Function ◽

Cross Validation ◽

Kernel Functions ◽

Polynomial Kernel ◽

Radial Basis ◽

Rbf Kernel ◽

Rbf Kernel Function ◽

Fold Cross Validation

This paper compares fuzzy kernel k-medoids using the radial basis function (RBF) kernel and the polynomial kernel in hepatitis classification. These two kernel functions were chosen because of their popularity in kernel-based machine learning methods for classification tasks. The hepatitis dataset was then used to evaluate the performance of both methods, which were expected to provide an accurate diagnosis so that patients can obtain treatment at an early phase. The data were obtained from two hospitals in Indonesia and consist of 89 hepatitis-B and 31 hepatitis-C samples. The data were analyzed using several cases of k-fold cross-validation, and the performances were compared according to accuracy, sensitivity, precision, F1-Score, and running time. From the experiments, it was concluded that fuzzy kernel k-medoids using the RBF kernel function is better than using the polynomial kernel function, with a 6% increase in accuracy, a 13% increase in sensitivity, and a 5% improvement in F1-Score. On the other hand, the precision of fuzzy kernel k-medoids using the polynomial kernel function is 2% higher than using the RBF kernel function. According to the results, the choice between the RBF and the polynomial kernel function in fuzzy kernel k-medoids can be made according to the primary goal of the classification.
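For reference, the two kernel functions compared here are commonly defined as in the sketch below. The parameter names and defaults (`sigma`, `degree`, `coef0`) are assumptions for illustration; the abstract does not state which parameter values the authors used. The helper `kernel_distance_sq` shows the kernel-induced squared distance that kernel-based medoid methods typically rely on.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def kernel_distance_sq(x, y, kernel):
    """Squared distance in the kernel feature space:
    K(x, x) - 2 * K(x, y) + K(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


# Toy points (hypothetical, not from the hepatitis data).
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
poly = polynomial_kernel([1.0, 2.0], [3.0, 4.0])
dist = kernel_distance_sq([0.0, 0.0], [1.0, 0.0], rbf_kernel)
```

With the RBF kernel every point has unit self-similarity, so the induced squared distance is bounded by 2; the polynomial kernel has no such bound.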


Impression Classification of Endek (Balinese Fabric) Image Using K-Nearest Neighbors Method

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v3i3.611 ◽

2018 ◽

pp. 213-220 ◽

Cited By ~ 1

Author(s):

Gede Aditra Pradnyana ◽

I Komang Agus Suryantara ◽

I Gede Mahendra Darmawiguna

Keyword(s):

Cross Validation ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

K Value ◽

Training Samples ◽

And Training ◽

Validation Testing ◽

Fold Cross Validation ◽

Learning Data

An impression can be interpreted as a psychological feeling toward a product, and it plays an important role in decision making. Understanding data in the domain of impressions is therefore very useful. The objective of this research was to assess the performance of the K-Nearest Neighbors method in classifying endek image impressions, evaluated with K-Fold Cross Validation. The images were taken from 3 locations, namely CV. Artha Dharma, Agung Bali Collection, and Pengrajin Sri Rejeki. The image impressions were obtained by consulting an endek expert, Dr. D.A Tirta Ray, M.Si. Data mining was carried out with the K-Nearest Neighbors method, which classifies new objects based on their attributes and on training samples that have been classified previously. K-Fold Cross Validation testing obtained an accuracy of 91% with K values of 3, 4, 7, and 8 in K-Nearest Neighbors.
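The evaluation described here, K-Nearest Neighbors scored with K-Fold Cross Validation for several values of K, can be sketched as follows. This is a toy illustration on made-up 2-D points, assuming squared Euclidean distance and unshuffled interleaved folds; the feature extraction and fold setup of the actual study are not given in the abstract.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def kfold_accuracy(X, y, k_neighbors, n_folds):
    """Accuracy over all samples, each predicted while held out once."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))  # interleaved, no shuffling
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]:
                correct += 1
    return correct / n


# Two well-separated hypothetical clusters.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
scores = {k: kfold_accuracy(X, y, k, n_folds=4) for k in (1, 3)}
```

Comparing `scores` across candidate K values is one simple way to pick the number of neighbors.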


Modelling Freshwater Eutrophication with Limited Limnological Data Using Artificial Neural Networks

Water ◽

10.3390/w13111590 ◽

2021 ◽

Vol 13 (11) ◽

pp. 1590

Author(s):

Ekaterini Hadjisolomou ◽

Konstantinos Stefanidis ◽

Herodotos Herodotou ◽

Michalis Michaelides ◽

George Papatheodorou ◽

...

Keyword(s):

Neural Networks ◽

Water Quality ◽

Artificial Neural Networks ◽

Chlorophyll A ◽

Cross Validation ◽

Computational Time ◽

Data Set ◽

K Value ◽

Artificial Neural ◽

Fold Cross Validation

Artificial Neural Networks (ANNs) have wide applications in aquatic ecology, specifically in modelling water quality and biotic responses to environmental predictors. However, data scarcity is a common problem that raises the need to optimize modelling approaches to overcome data limitations. In this paper, we investigate the optimal k-fold cross validation for building an ANN using a small water-quality data set. The ANN was created to model the chlorophyll-a levels of a shallow eutrophic lake (Mikri Prespa) located in N. Greece. The typical water quality parameters serving as the ANN’s inputs are pH, dissolved oxygen, water temperature, phosphorus, nitrogen, electric conductivity, and Secchi disk depth. The available data set was small, containing only 89 data samples. For that reason, k-fold cross validation was used for training the ANN. To find the optimal k value for the k-fold cross validation, several values of k were tested (ranging from 3 to 30). Additionally, leave-one-out (LOO) cross validation, which is an extreme case of k-fold cross validation, was also applied. The ANN’s performance indices showed a clear trend of improvement as k increased, and the best results were obtained for LOO cross validation, as expected. The computational time was measured for each k value and was found to be relatively low even for the more expensive LOO cross validation; therefore, LOO is recommended. Finally, a sensitivity analysis was performed using the ANN to investigate the interactions of the input parameters with chlorophyll-a, thereby examining the potential use of the ANN as a water management tool for nutrient control.
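The statement that LOO is an extreme case of k-fold cross validation can be made concrete with a small index-splitting sketch. The helper `kfold_indices` is a hypothetical name; only the sample count of 89 comes from the abstract, and contiguous, unshuffled folds are an assumption.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 3-fold split of 89 samples: two folds of 30 and one of 29.
sizes_k3 = [len(test) for _, test in kfold_indices(89, 3)]

# Setting the number of folds equal to the sample count gives leave-one-out:
# 89 splits, each holding out exactly one sample.
loo_splits = list(kfold_indices(89, 89))
```

Each split requires one model fit, so LOO on 89 samples means 89 fits per evaluation, which is why its cost is usually the highest among the k values tested.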


Automatic Sleep Staging Algorithm Based on Time Attention Mechanism

Frontiers in Human Neuroscience ◽

10.3389/fnhum.2021.692054 ◽

2021 ◽

Vol 15 ◽

Author(s):

Li-Xiao Feng ◽

Xin Li ◽

Hong-Yu Wang ◽

Wen-Yin Zheng ◽

Yong-Qing Zhang ◽

...

Keyword(s):

Cross Validation ◽

Conditional Random Field ◽

Recognition Rate ◽

Performance Comparison ◽

Attention Mechanism ◽

Sleep Stages ◽

Sleep Staging ◽

Time Frequency ◽

Proposed Model ◽

Fold Cross Validation

The most important part of sleep quality assessment is the automatic classification of sleep stages. Sleep staging is helpful in the diagnosis of sleep-related diseases. This study proposes an automatic sleep staging algorithm based on the time attention mechanism. Time-frequency and non-linear features are extracted from the physiological signals of six channels and then normalized. The time attention mechanism combined with the two-way bi-directional gated recurrent unit (GRU) was used to reduce computing resources and time costs, and the conditional random field (CRF) was used to obtain information between tags. After five-fold cross-validation on the Sleep-EDF dataset, the values of accuracy, WF1, and Kappa were 0.9218, 0.9177, and 0.8751, respectively. After five-fold cross-validation on our own dataset, the values of accuracy, WF1, and Kappa were 0.9006, 0.8991, and 0.8664, respectively, which is better than the result of the latest algorithm. In the study of sleep staging, the recognition rate of the N1 stage has been low, and class imbalance has always been a problem. Therefore, this study introduces a type of balancing strategy. By adopting the proposed strategy, SEN-N1 and ACC of 0.7 and 0.86, respectively, can be achieved. The experimental results show that, compared to the latest method, the proposed model can achieve significantly better performance and significantly improve the recognition rate of the N1 period. The performance comparison of different channels shows that even when the EEG channel was not used, considerable accuracy can be obtained.
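The Kappa value reported alongside accuracy is assumed here to be Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. A minimal sketch with made-up stage labels (not the study's data):

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement and p_e the chance agreement computed
    from the label frequencies of the two sequences.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scored epochs: W = wake, N1 = stage N1.
y_true = ["W", "W", "N1", "N1"]
y_pred = ["W", "W", "N1", "W"]
kappa = cohen_kappa(y_true, y_pred)
```

In this toy case the raw agreement is 0.75, but half of that is expected by chance given the label frequencies, so kappa is lower than accuracy. This is why kappa is often reported for imbalanced stage distributions.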


Application Development of Student's Graduation Classification Model based on The First 2 Years Performance using K-Nearest Neighbor

10.31227/osf.io/ftwre ◽

2018 ◽

Author(s):

Purwono Prasetyawan ◽

Muhammad Faridz Abadi

Keyword(s):

Cross Validation ◽

Nearest Neighbor ◽

Educational Institution ◽

Training Data ◽

Classification Model ◽

K Nearest Neighbor ◽

Application Development ◽

K Value ◽

The Status ◽

Fold Cross Validation

A college keeps a large amount of data, such as academic, administrative, and student biographical data, but the existing student data have not been fully utilized. Students are an important asset for an educational institution, so it is necessary to track the rate at which students graduate on time. Differences in students' ability to complete their studies on time require monitoring and evaluation so that new information or knowledge can be found to support decisions. The purpose of this study is to determine the relationship between the variables IP Semester 1, IP Semester 2, IP Semester 3, IP Semester 4, Gender, and Student Status and Student Study Duration using the k-nearest neighbor (KNN) algorithm. Classifying students' graduation with the KNN algorithm based on student status, gender, and IP Semester 1 to IP Semester 4, evaluated with k-fold cross validation, gave a mean accuracy of 88% for K1, 88.67% for K3, 93.78% for K5, 86% for K7, 86.22% for K9, 92.44% for K11, 89.55% for K13, 93.78% for K15, 99.78% for K17, and 100% for K19. The 500 training records were split 188 and 312 by student status and 290 men and 210 women by gender; students who work took longer to complete their studies, and women took longer to finish college. The optimal k value was sought using k-fold cross validation, and the best accuracy was obtained at K19, with 100%.


Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both material factors and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amounts of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. It was concluded from the prediction results that the optimized SVM model performed better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.


Design and Development of an Information System to Determine Consumer Capability in Taking Out a KPR (Home Mortgage) Loan

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia is a country with a relatively large population. Over a 5-year period, the government runs an annual program to provide 1 million FLPP housing units. The program is intended to provide decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer, but this is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing consumers' ability to obtain subsidized credit has many advantages; among others, developers can plan cash flow better and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create such an application; one of them is data mining with a classification tree. Under 10-fold cross-validation, the application achieved an accuracy of 92%. Index Terms-Data Mining, Classification Tree, Housing, FLPP, 10-fold-cross Validation, Consumer Capability


Classification of Crime News Using the Naïve Bayes Classifier (NBC) with K-Fold Cross Validation Testing

Jurnal Sains dan Informatika ◽

10.34128/jsi.v5i2.177 ◽

2019 ◽

Vol 5 (2) ◽

pp. 108-117

Author(s):

Herfia Rhomadhona ◽

Jaka Permadi

Keyword(s):

Cross Validation ◽

Online Media ◽

Bayes Classifier ◽

Naïve Bayes ◽

Fold Cross Validation

Crime news is always a trending topic in every mass media outlet, especially online mass media. Online mass media provide several features that make it easier for the public to search for news by topic, and they label each news item according to its category. However, they do not assign subcategories to the news. For example, if a user opens the crime category, all types of crime news are displayed without specific information about the type of crime. This problem can be addressed by classifying crime news by subcategory. This study uses the Naïve Bayes Classifier (NBC) method to classify news by subcategory. The subcategories are divided into 5 categories: corruption, drugs, theft, rape, and murder. The study aims to determine the ability of NBC to classify news by testing it with the K-Fold Cross Validation technique, with K values from 3 to 10. The test results show that NBC is capable of classifying crime news, with a precision of 98.53%, a recall of 98.44%, and an accuracy of 99.38%.


Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.
