K-Fold Cross Validation for Selection of Cardiovascular Disease Diagnosis Features by Applying Rule-Based Datamining

2019 ◽  
Vol 1 (2) ◽  
pp. 23-35
Author(s):  
Dwi Normawati ◽  
Dewi Pramudi Ismi

Coronary heart disease is a frequent cause of death. It occurs when atherosclerosis in the coronary arteries blocks blood flow to the heart muscle. The reference method doctors use to diagnose coronary heart disease is coronary angiography, but it is invasive, high-risk and expensive. The purpose of this study is to analyze the effect of applying k-Fold Cross Validation (CV) to the dataset on rule-based feature selection for diagnosing coronary heart disease, using the Cleveland heart disease dataset. Feature selection was performed with a medical expert-based method (MFS) and a computer-based method, the Variable Precision Rough Set (VPRS), which is an extension of Rough Set theory. Classification performance was evaluated with 10-Fold, 5-Fold and 3-Fold cross validation. The results show that the number of attributes selected differs in each fold for both VPRS and MFS, and that accuracy is reported as the average over the folds for each of 10-Fold, 5-Fold and 3-Fold. The highest accuracy for VPRS was 76.34% with k = 5, while the MFS accuracy was 71.281% with k = 3. The k-fold implementation is therefore less effective in this case, because the data is still divided in a structured way that follows the order of the records in each fold, and the amount of testing data is too small. This affects the accuracy because the testing rules are not thoroughly represented.
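The effect of record order on the folds can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy labels and the seed are assumptions made for illustration, and the toy labels are deliberately stored class by class to exaggerate the effect.

```python
import random

def k_fold_indices(n_samples, k, shuffle=False, seed=0):
    """Return a list of k (train_idx, test_idx) pairs covering all samples."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

def classes_per_test_fold(labels, folds):
    """Count the distinct class labels present in each test fold."""
    return [len({labels[i] for i in test_idx}) for _, test_idx in folds]

# Hypothetical toy labels in record order: all class 0 first, then all class 1.
labels = [0] * 15 + [1] * 15
ordered = k_fold_indices(len(labels), k=3, shuffle=False)
shuffled = k_fold_indices(len(labels), k=3, shuffle=True, seed=42)
print(classes_per_test_fold(labels, ordered))   # folds taken in record order
print(classes_per_test_fold(labels, shuffled))  # folds taken after shuffling
```

With ordered splitting, the first and last test folds of this toy set contain a single class each, so rules learned for the other class are never exercised there. Shuffling (or stratifying) before splitting is the usual way to avoid this.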

2017 ◽  
Vol 26 (2) ◽  
pp. 15-26
Author(s):  
Eman S. Al-Shamery ◽  
Ali A.Rahoomi Al-Obaidi

This paper proposes a new approach to rough set feature selection. Feature selection is used for several reasons: a) to decrease prediction time, b) because a feature may not be available, and c) because the presence of a feature may cause bad predictions. The rough set is used to select the most significant features. The proposed rough set has been applied to heart disease data sets. The main problem is how to predict whether or not a patient has heart disease based on the given features. The problem is challenging because the decision cannot be determined directly. The rough set has been modified to obtain attributes for prediction by ignoring unnecessary and bad features. A Bayes net is used as the classification method, and 10-fold cross validation is used for evaluation. The Correct Classified Instances were 82.17, 83.49, and 74.58 when using the full set, 12, and 7 attributes, respectively. When the traditional rough set was applied, the minimum Correct Classified Instances were 58.41 and 81.51 when using attribute sets of length 2, respectively.
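For orientation, the classical (Pawlak) rough set notion of attribute dependency, on which rough set feature selection is generally built, can be sketched as follows. This is not the modified rough set proposed in the paper; the greedy search, the function names and the toy attribute names (`cp`, `exang`, `sex`, `disease`) are assumptions for illustration only.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Fraction of rows whose equivalence class has a single decision value."""
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def greedy_reduct(rows, attrs, decision):
    """Add the attribute that most raises dependency until it matches the full set."""
    target = dependency(rows, attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max((a for a in attrs if a not in chosen),
                   key=lambda a: dependency(rows, chosen + [a], decision))
        chosen.append(best)
    return chosen

# Hypothetical toy decision table; the values are invented for illustration.
rows = [
    {"cp": "typical", "exang": "no", "sex": "m", "disease": "no"},
    {"cp": "typical", "exang": "yes", "sex": "f", "disease": "yes"},
    {"cp": "atypical", "exang": "no", "sex": "m", "disease": "no"},
    {"cp": "atypical", "exang": "yes", "sex": "m", "disease": "yes"},
]
print(greedy_reduct(rows, ["cp", "exang", "sex"], "disease"))
```

In this toy table a single attribute already separates the two decision values, so the greedy search stops after one step.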


2014 ◽  
Vol 5 (3) ◽  
pp. 78-101 ◽  
Author(s):  
Walid Moudani ◽  
Mohamad Hussein ◽  
Mariam abdelRazzak ◽  
Félix Mora-Camino

The health industry collects huge amounts of health data which, unfortunately, are not mined to discover hidden information, and there is a lack of effective analytical tools for discovering hidden relationships and trends in data. Information technologies can provide alternative approaches to the diagnosis of heart attack. This study presents a methodology for extracting significant patterns from Coronary Heart Disease warehouses for the prediction of heart attack, which unfortunately continues to be a leading cause of mortality worldwide. For this purpose, we propose an innovative fuzzy classification approach based on dynamic reduced sets of potential risk factors, using Rough Set theory, a mathematical approach to data analysis based on the classification of objects. We then propose to validate the classification using a multi-classifier decision tree to identify the risky heart disease cases. This work is based on a dataset collected from several clinical institutions and built on the medical profiles of patients. Moreover, experts' knowledge in this field has been taken into consideration in order to define the disease and its risk factors, to follow up the results, and to establish significant knowledge relationships between medical factors related to Coronary Heart Disease. To identify cases of heart attack, experiments with several classification techniques have been performed, leading to a ranking of the suitable techniques. The reduction of potential risk factors makes it possible to enumerate dynamically one or more optimal subsets of the potential risk factors of high interest, which implicitly reduces the complexity of the classification problem while maintaining the quality of the predicted classification. The performance of the proposed model is analyzed and evaluated against a set of benchmark techniques applied to this classification problem.


Author(s):  
Wiharto Wiharto ◽  
Hari Kusnanto ◽  
Herianto Herianto

Improving the performance of coronary heart disease diagnosis systems has been an important research topic for several decades. One improvement can be made through feature selection, so that only the influential attributes are used in a diagnosis system based on data mining algorithms. Unfortunately, most feature selection is done on the assumption that all the necessary attributes have already been provided, regardless of the stage at which each attribute is obtained and the cost required. This research proposes a hybrid model system for the diagnosis of coronary heart disease. The diagnosis system is preceded by a feature selection process that uses tiered multivariate analysis. The analytical method used is logistic regression. The next stage is classification using a multi-layer perceptron neural network. Based on the test results, the proposed system achieves an accuracy of 86.3%, sensitivity of 84.80%, specificity of 88.20%, positive prediction value (PPV) of 90.03%, negative prediction value (NPV) of 81.80%, and area under the curve (AUC) of 92.1%. This performance is obtained using a combination of risk factor, symptom and exercise ECG attributes. The conclusion that can be drawn is that the proposed diagnosis system is capable of delivering performance in the very good category, with attributes that do not require many examinations and at a relatively low cost.
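The reported measures are all derived from the counts of a binary confusion matrix. A minimal sketch of the standard definitions follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive prediction value
        "npv": tn / (tn + fn),           # negative prediction value
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it depends on the classifier's scores across all thresholds rather than on a single confusion matrix.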


Author(s):  
BING XUE ◽  
LIAM CERVANTE ◽  
LIN SHANG ◽  
WILL N. BROWNE ◽  
MENGJIE ZHANG

Feature selection is a multi-objective problem, where the two main objectives are to maximize the classification accuracy and minimize the number of features. However, most of the existing algorithms are single objective, wrapper approaches. In this work, we investigate the use of binary particle swarm optimization (BPSO) and probabilistic rough set (PRS) for multi-objective feature selection. We use PRS to propose a new measure for the number of features, based on which a new filter-based single objective algorithm (PSOPRSE) is developed. Then a new filter-based multi-objective algorithm (MORSE) is proposed, which aims to maximize a measure for the classification performance and minimize the new measure for the number of features. MORSE is examined and compared with PSOPRSE, two existing PSO-based single objective algorithms, two traditional methods, and the only existing BPSO and PRS-based multi-objective algorithm (MORSN). Experiments have been conducted on six commonly used discrete datasets with a relatively small number of features and six continuous datasets with a large number of features. The classification performance of the selected feature subsets is evaluated by three classification algorithms (decision trees, Naïve Bayes, and k-nearest neighbors). The results show that the proposed algorithms can automatically select a smaller number of features and achieve similar or better classification performance than using all features. PSOPRSE achieves better performance than the other two PSO-based single objective algorithms and the two traditional methods. MORSN and MORSE outperform all these five single objective algorithms in terms of both the classification performance and the number of features. MORSE achieves better classification performance than MORSN. These filter algorithms are general to the three different classification algorithms.
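The trade-off between the two objectives is usually expressed through Pareto dominance. The sketch below shows only that general notion, not MORSE or MORSN; the candidate values and function names are hypothetical, and classification error stands in for the performance measure so that both objectives are minimized.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical feature subsets as (error rate, number of features).
candidates = [(0.20, 3), (0.15, 5), (0.15, 8), (0.30, 2), (0.25, 4)]
print(pareto_front(candidates))
```

A multi-objective search returns such a set of non-dominated subsets and leaves the final choice between fewer features and lower error to the user.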


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 704
Author(s):  
Jiucheng Xu ◽  
Kanglin Qu ◽  
Meng Yuan ◽  
Jie Yang

Feature selection is one of the core topics of rough set theory and its applications. Since the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are not ideal, this paper proposes a feature selection algorithm that combines the information theory view and the algebraic view in the neighborhood decision system. First, the neighborhood relationship in the neighborhood rough set model is used to retain the classification information of continuous data and to study some uncertainty measures of neighborhood information entropy. Second, to fully reflect the decision ability and classification performance of the neighborhood system, the neighborhood credibility and neighborhood coverage are defined and introduced into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed, which addresses the shortcoming that most feature selection algorithms consider only the information theory definition or only the algebraic definition. Finally, experiments and statistical analyses on nine data sets show that the algorithm can effectively select the optimal feature subset, and that the selection result can maintain or improve the classification performance of the data set.
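To make the neighborhood relationship concrete, the sketch below computes neighborhoods of continuous samples and the resulting dependency of the decision on a feature subset, which is the algebraic view in the general neighborhood rough set model. It does not implement the paper's neighborhood joint entropy, credibility or coverage. The Euclidean distance, the radius values, the toy data and the function names are all assumptions.

```python
import math

def neighborhood(data, i, features, delta):
    """Indices of samples within Euclidean distance delta of sample i."""
    centre = [data[i][f] for f in features]
    return [j for j, row in enumerate(data)
            if math.dist(centre, [row[f] for f in features]) <= delta]

def neighborhood_dependency(data, labels, features, delta):
    """Fraction of samples whose whole neighborhood shares their label."""
    consistent = sum(
        1 for i in range(len(data))
        if all(labels[j] == labels[i]
               for j in neighborhood(data, i, features, delta))
    )
    return consistent / len(data)

# Hypothetical continuous data, already scaled to [0, 1].
data = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.2), (0.9, 0.1), (0.5, 0.5)]
labels = [0, 0, 1, 1, 0]
print(neighborhood_dependency(data, labels, features=[0], delta=0.15))
print(neighborhood_dependency(data, labels, features=[0], delta=0.35))
```

A larger radius merges samples from different classes into the same neighborhood, so the dependency drops. This is why the radius has to be chosen with care in neighborhood-based methods.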


Symmetry ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 1470
Author(s):  
Guobao Zhao ◽  
Haiying Wang ◽  
Deli Jia ◽  
Quanbin Wang

Considering the crucial influence of feature selection on data classification accuracy, a grey wolf optimizer based on quantum computing and uncertain symmetry rough set (QCGWORS) was proposed. QCGWORS applied three theories in parallel to feature selection, each of which had its own advantages for optimizing the feature selection algorithm. Quantum computing balanced global and local search well when exploring feature sets. The grey wolf optimizer could effectively explore all possible feature subsets, and uncertain symmetry rough set theory could accurately evaluate the correlation of potential feature subsets. The QCGWORS algorithm could minimize the number of features while maximizing classification performance. In the experimental stage, a k nearest neighbors (KNN) classifier and a random forest (RF) classifier guided the machine learning process of the proposed algorithm, and 13 datasets were used for comparison experiments. Experimental results showed that, compared with other feature selection methods, QCGWORS improved the classification accuracy on 12 datasets, among which the best accuracy was increased by 20.91%. In attribute reduction, every dataset benefited from being reduced to a minimum number of features.


Author(s):  
Abdulraheem Abdul ◽  
Rafiu M. Isiaka ◽  
Ronke S. Babatunde ◽  
Jumoke F. Ajao

Aims: The aim of this work is to develop an enhanced predictive system for Coronary Heart Disease (CHD). Study Design: Synthetic Minority Oversampling Technique and Random Forest. Methodology: The Framingham heart disease dataset, collected from a study in Framingham, Massachusetts, was used. The data was cleaned, normalized and rebalanced. Classifiers such as random forest, artificial neural network, naïve Bayes, logistic regression, k-nearest neighbor and support vector machine were used for classification. Results: Random Forest outperformed the other classifiers with an accuracy of 98%, a sensitivity of 99% and a precision of 95.8%. Feature selection was employed for better classification, but no significant improvement in classifier performance was recorded with feature selection. A train-test split also performed better than cross validation. Conclusion: Random Forest is recommended for research in the Coronary Heart Disease prediction domain.
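The rebalancing step can be pictured with a simplified sketch of the idea behind the Synthetic Minority Oversampling Technique: each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours. This is not the study's pipeline or a library implementation; the function name, parameters and toy points are assumptions.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating toward nearby minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base, excluding base itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic

# Hypothetical minority-class samples with two normalized features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_new=5)
print(new_points)
```

Oversampling of this kind is normally applied to the training portion only, so that synthetic points do not leak into the data used for evaluation.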


2020 ◽  
pp. 5-10
Author(s):  
O. M. Korzh

Among the cardiovascular diseases associated with atherosclerosis, chronic coronary heart disease, including angina, is the most common form. It is a myocardial lesion that develops as a result of an imbalance between the coronary circulation and the metabolic needs of the heart muscle. The presence of angina symptoms often indicates a pronounced narrowing of one or more coronary arteries, but they also occur with non-obstructive arterial impairment and even with normal coronary arteries. Factors of functional damage to the coronary arteries are spasm, temporary platelet aggregation and intravascular thrombosis. Today there are opportunities not only to use therapy with proven effectiveness, aimed at reducing the risk of complications, including fatal ones, but also to treat angina (ischemia), which improves the patient's quality of life. The drug protocol includes drugs with a proven positive effect on the prognosis of this disease, which are mandatory if there are no direct contraindications to their use, as well as a large group of antianginal or anti-ischemic drugs. The choice of a particular drug or its combination with other drugs is made in accordance with generally accepted recommendations, taking into account the individual approach, the severity of angina, hemodynamic parameters (heart rate and blood pressure), and the presence of comorbid conditions. If drug therapy is ineffective, the option of coronary myocardial revascularization (percutaneous coronary angioplasty or coronary artery bypass grafting) is considered. Due to the high mortality and morbidity rates of coronary heart disease worldwide, one of the priorities of practical health care is the prevention of diseases caused by atherosclerosis. Key words: coronary heart disease, angina, family physician, prognosis, drug therapy.

