A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation

Author(s):  
Onur Inan ◽  
Mustafa Serter Uzer
2019 ◽  
pp. 495-501
Author(s):  
Balasaheb Tarle ◽  
Muddana Akkalaksmi

In medical data classification, improving classification performance is an important issue when the data set is small and contains multiple missing attribute values. The foremost objective of machine learning research is to improve the classification performance of classifiers, and this requires a sufficient number of training instances. In the proposed algorithm, we substitute missing attribute values with values from the attribute's available domain, generating additional training tuples that supplement the original training tuples. Together, the additional and original training samples provide sufficient data for learning. A neuro-fuzzy classifier is trained on this dataset, and its classification performance on test data is obtained using the k-fold cross-validation method. The proposed method attains improvements of around 2.8% and 3.61% in classification accuracy for this classifier.
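The substitution step described in this abstract can be sketched as follows. This is a minimal illustration of one possible reading, not the authors' algorithm: it assumes categorical attributes, uses `None` as a hypothetical missing-value marker, and the helper names `expand_missing` and `augment` and the toy data are invented for the example.

```python
from itertools import product

MISSING = None  # hypothetical marker for a missing attribute value


def expand_missing(features, domains):
    """Return every completion of `features`, filling each missing
    attribute with each value from that attribute's domain."""
    options = [
        domains[i] if value is MISSING else [value]
        for i, value in enumerate(features)
    ]
    return [tuple(combo) for combo in product(*options)]


def augment(samples, domains):
    """Expand each (features, label) sample; complete samples pass through unchanged."""
    augmented = []
    for features, label in samples:
        for completed in expand_missing(features, domains):
            augmented.append((completed, label))
    return augmented


# Toy data (invented for illustration): two categorical attributes and a label.
domains = [["low", "high"], ["yes", "no"]]
samples = [
    (("low", "yes"), "healthy"),
    (("high", MISSING), "sick"),
    ((MISSING, MISSING), "sick"),
]
augmented = augment(samples, domains)
print(len(samples), "->", len(augmented))
```

Under these assumptions, a sample with one missing binary attribute yields two training tuples and a sample with two missing binary attributes yields four, so the training set grows with the number of missing values.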


2019 ◽  
Vol 1 (2) ◽  
pp. 23-35
Author(s):  
Dwi Normawati ◽  
Dewi Pramudi Ismi

Coronary heart disease, a frequent cause of death, occurs when atherosclerosis blocks blood flow to the heart muscle in the coronary arteries. The reference method doctors use to diagnose coronary heart disease is coronary angiography, but it is invasive, high-risk and expensive. The purpose of this study is to analyze the effect of applying k-fold cross-validation (CV) to the dataset on rule-based feature selection for diagnosing coronary heart disease, using the Cleveland heart disease dataset. Feature selection was carried out with a medical expert-based method (MFS) and a computer-based method, namely the Variable Precision Rough Set (VPRS), which is a development of Rough Set theory. Classification performance was evaluated using k-fold CV with 10, 5 and 3 folds. The results show that the number of attributes selected differs in each fold for both the VPRS and MFS methods, and the reported accuracy values are averages over the folds for 10-fold, 5-fold and 3-fold CV. The highest accuracy was 76.34% for the VPRS method with k = 5, while the MFS accuracy was 71.281% with k = 3. The k-fold implementation is therefore less effective in this case, because the data are still divided according to the order of the records in each fold, while the amount of testing data is too small and too structured. This affects the accuracy because the testing rules are not thoroughly represented.
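The ordering problem noted at the end of this abstract can be illustrated with a small sketch. It contrasts folds cut in record order with shuffled, class-stratified folds, which is a common remedy and not something the study itself reports using. The function name `stratified_kfold`, the seed and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Assign record indices to k folds so each fold gets a shuffled,
    roughly equal share of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


# Toy labels stored in class order, as in an unshuffled file (invented data).
labels = [0] * 30 + [1] * 30
k = 5
size = len(labels) // k

# Folds cut in record order: the first folds contain only class 0.
sequential = [list(range(i * size, (i + 1) * size)) for i in range(k)]
folds = stratified_kfold(labels, k)

for name, split in (("sequential", sequential), ("stratified", folds)):
    positives = [sum(labels[i] for i in fold) for fold in split]
    print(name, "positives per fold:", positives)
```

With the sequential split, some test folds contain a single class, so the rules learned on the remaining folds are evaluated on an unrepresentative sample; the stratified split keeps both classes in every fold.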


Author(s):  
Gokhan Altan ◽  
Yakup Kutlu ◽  
Adnan Ozhan Pekmezci ◽  
Serkan Nural

Lung auscultation, which uses a stethoscope to listen to sounds from the airways during inspiration and exhalation, is the most effective and indispensable method for diagnosing various respiratory disorders. In this study, statistical features are calculated from intrinsic mode functions extracted by applying the Hilbert-Huang Transform to lung sounds from 12 different auscultation regions on the chest and back. The lung sounds from asthmatic and healthy subjects are classified using Deep Belief Networks (DBN). The DBN classifier model with two hidden layers was tested using the 5-fold cross-validation method. Using frequency-time analysis, the proposed DBN separated lung sounds from asthmatic and healthy subjects with high classification performance rates of 84.61%, 85.83%, and 77.11% for overall accuracy, sensitivity, and selectivity, respectively.


2017 ◽  
Vol 42 (2) ◽  
pp. 223-233 ◽  
Author(s):  
Natasa Reljin ◽  
David Pokrajac

In this paper, we investigate the possibility of classifying different performers playing the same melodies in the same manner, where the performances are subjectively quite similar and very difficult to distinguish even for musically skilled persons. To resolve this problem, we propose the use of multifractal (MF) analysis, which has proven to be an efficient method for describing and quantifying complex natural structures, phenomena or signals. We found experimentally that parameters associated with some characteristic points within the MF spectrum can be used as music descriptors, thus permitting accurate discrimination of music performers. Our approach is tested on a dataset containing the same songs performed by the music group ABBA and by actors in the movie Mamma Mia. As a classifier we used support vector machines, and the classification performance was evaluated using four-fold cross-validation. The results of the proposed method were compared with those obtained using mel-frequency cepstral coefficients (MFCCs) as descriptors. For the considered two-class problem, overall accuracy and F-measure higher than 98% are obtained with the MF descriptors, which is considerably better than with the MFCC descriptors, for which the best results were less than 77%.


2021 ◽  
Author(s):  
Chao Cong ◽  
Yoko Kato ◽  
Henrique D. Vasconcellos ◽  
Mohammad R. Ostovaneh ◽  
Joao A.C. Lima ◽  
...  

Background: Automatic coronary angiography (CAG) assessment may help in faster screening and diagnosis of patients. Current CNN-based vessel segmentation suffers from sampling imbalance, candidate frame selection, and overfitting; few have shown adequate performance for CAG stenosis classification. We aimed to provide an end-to-end workflow that may solve these problems.
Methods: A deep learning-based end-to-end workflow was employed as follows: 1) candidate frame selection from CAG videograms with a CNN+LSTM network, 2) stenosis classification with Inception-v3 using 2 or 3 categories (<25%, >25%, and/or total occlusion) with and without redundancy training, and 3) stenosis localization with two methods, class activation map (CAM) and anchor-based feature pyramid network (FPN). Overall, 13744 frames from 230 studies were used for the stenosis classification training and 4-fold cross-validation at the image, artery, and per-patient levels. For the stenosis localization training and 4-fold cross-validation, 690 images with >25% stenosis were used.
Results: Our model achieved an accuracy of 0.85, sensitivity of 0.96, and AUC of 0.86 in per-patient level stenosis classification. Redundancy training was effective in improving classification performance. Stenosis position localization was adequate, with better quantitative results in the anchor-based FPN model, which achieved global sensitivity for LCA and RCA of 0.68 and 0.70 with mean square error (MSE) values of 39.3 and 37.6 pixels, respectively, in the 520 × 520 pixel image.
Conclusion: A fully automatic end-to-end deep learning-based workflow that eliminates the vessel extraction and segmentation step was feasible in coronary artery stenosis classification and localization on CAG images.
Key Points: The fully automatic, end-to-end workflow, which eliminated the vessel extraction and segmentation step for supervised learning, was feasible in stenosis classification on CAG images, achieving an accuracy of 0.85, sensitivity of 0.96, and AUC of 0.86 at the per-patient level. The redundancy training improved the AUC values, accuracy, F1-score, and kappa score of the stenosis classification. Stenosis position localization was assessed with two methods, CAM-based and anchor-based models, whose performance was acceptable, with better quantitative results in the anchor-based models.
Summary Statement: A fully automatic end-to-end deep learning-based workflow which eliminated the vessel extraction and segmentation step was feasible in stenosis classification and localization on CAG images. The redundancy training improved the stenosis classification performance.


Gene ◽  
2015 ◽  
Vol 569 (1) ◽  
pp. 21-26 ◽  
Author(s):  
Hui Zhang ◽  
Sheng Yang ◽  
Li Guo ◽  
Yang Zhao ◽  
Fang Shao ◽  
...  

2021 ◽  
Vol 8 (6) ◽  
pp. 1287
Author(s):  
Adam Syarif Hidayatullah ◽  
Fitra Abdurrachman Bachtiar ◽  
Imam Cholissodin

<p class="Abstrak">A company succeeds when it manages its human resources well, and fails when it does not. One institution that manages human resources through Talent Management is the Badan Kepegawaian Daerah (BKD) of Malang city, which evaluates its employees annually after their work has been completed. This leads to suboptimal work results, so employees performing below average need to be identified early, using a classification technique, so that they can be evaluated and suboptimal results minimized. This study uses the <em>Nearest Centroid Neighbor Classifier Based on K Local Means Using Harmonic Mean Distance</em> (LMKHNCN) classification technique. The method is a modification of <em>K-Nearest Neighbor</em> (KNN) and has been shown to perform better than the original KNN. <em>F1-Score</em> and accuracy were tested using <em>K-Fold Cross Validation</em> to examine the spread of accuracy, and the effect of normalization was also tested because previous studies gave no information about normalization. The method produced good classification performance in this case: its accuracy and <em>F1-Score</em> reached 98.8% and 98.1%, respectively.</p>


2019 ◽  
Vol 8 (3) ◽  
pp. 366-376
Author(s):  
Annisa Sugesti ◽  
Moch. Abdul Mukid ◽  
Tarno Tarno

Credit feasibility analysis is important for lenders to avoid risk as the number of credit applications increases. This analysis can be carried out with a classification technique. The classification technique used in this research is instance-based classification. Such techniques tend to be simple but depend heavily on the choice of K, the number of nearest neighbors considered when classifying new data. A small value of K is very sensitive to outliers. This weakness can be overcome with an algorithm that is able to handle outliers, one of which is Mutual K-Nearest Neighbor (MKNN). MKNN removes outliers first, then predicts the class of a new observation based on the majority class of its mutual nearest neighbors. The algorithm is compared with KNN without outliers. The model is evaluated by 10-fold cross-validation, and the classification performance is measured by the geometric mean (G-Mean) of sensitivity and specificity. Based on the analysis, the optimal value of K is 9 for MKNN and 3 for KNN. The highest G-Mean, 0.718, is produced by KNN, while the G-Mean produced by MKNN is 0.702. The best alternative for classifying credit feasibility in this study is the K-Nearest Neighbor (KNN) algorithm with K=3.
Keywords: Classification, Credit, MKNN, KNN, G-Mean.
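The evaluation metric named in this abstract, the geometric mean of sensitivity and specificity, can be computed as in the sketch below. The function name `g_mean`, the choice of `1` as the positive class and the toy predictions are assumptions made for illustration; they are not taken from the study.

```python
from math import sqrt


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


# Toy predictions (invented): sensitivity = 3/4, specificity = 2/4.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
score = g_mean(y_true, y_pred)
print(round(score, 3))
```

Because the two rates are multiplied, a classifier that ignores one class scores zero regardless of its accuracy on the other, which is why the metric suits imbalanced problems such as credit approval.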


2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, the temperature, and the exposure time. The prediction results showed that the optimized SVM model performed better than the other models in predicting concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and the degree of their influence on the degradation of concrete strength.

