Implementasi Greedy Forward Selection untuk Prediksi Metode Penyakit Kutil Menggunakan Decision Tree (Implementation of Greedy Forward Selection for Predicting Wart Disease Methods Using a Decision Tree)

2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Fitriyani Fitriyani ◽  
Toni Arifin
2011 ◽  
Vol 1 (2) ◽  
Author(s):  
Nima Salehi-Moghaddami ◽  
Hadi Yazdi ◽  
Hanieh Poostchi

Abstract: One of the most commonly used predictive models in classification is the decision tree (DT). The task of a DT is to map observations to target values. In a DT, each branch represents a rule. A rule's consequent is the leaf of the branch, and its antecedent is the conjunction of the features. Most applied algorithms in this field use Information Entropy or the Gini Index as the splitting criterion when building a tree. In this paper, a new splitting criterion for building DTs is proposed. A splitting criterion specifies the tree's best splitting variable as well as the variable's threshold for further splitting. Using the idea from the classical Forward Selection method and its enhanced versions, the variable having the largest absolute correlation with the target value is chosen as the best splitting variable at each node. Then, the idea of maximizing the margin between classes in a support vector machine (SVM) is used to find the best classification threshold on the selected variable. This procedure is executed recursively at each node until the leaf nodes are reached. The final decision tree has a shorter height than trees built by previous methods, which effectively reduces useless variables and the time needed to classify future data. Unclassified regions are also generated under the proposed method, which can be interpreted as an advantage or a disadvantage. The simulation results demonstrate an improvement in the generated decision tree compared to previous methods.
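The two steps of the splitting criterion described above can be sketched for a single node as follows. This is a simplified illustration, not the authors' implementation: the function names `pearson` and `best_split` are hypothetical, labels are assumed to be binary (0/1), and the SVM margin idea is reduced to taking the midpoint of the widest gap between adjacent values of different classes on the chosen variable.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(vx * vy)

def best_split(rows, labels):
    """Return (feature index, threshold) for one node.

    rows: list of feature vectors; labels: 0/1 class labels (assumption).
    """
    n_features = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_features)]
    # Forward-selection idea: largest absolute correlation with the target.
    j = max(range(n_features), key=lambda k: abs(pearson(columns[k], labels)))
    # Margin idea (simplified): threshold at the midpoint of the widest gap
    # between adjacent sorted values that belong to different classes.
    pairs = sorted(zip(columns[j], labels))
    gaps = [(b[0] - a[0], (a[0] + b[0]) / 2)
            for a, b in zip(pairs, pairs[1:]) if a[1] != b[1]]
    if not gaps:
        return j, None  # node is pure: nothing to split
    return j, max(gaps)[1]

# Toy data: feature 0 separates the classes, feature 1 is uncorrelated.
toy_rows = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
toy_labels = [0, 0, 1, 1]
split = best_split(toy_rows, toy_labels)
```

Applying `best_split` recursively to the rows on each side of the threshold would grow the tree; the unclassified regions mentioned in the abstract are not modeled in this sketch.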


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6682
Author(s):  
Zubaer Md. Abdullah Al ◽  
Keshav Thapa ◽  
Sung-Hyun Yang

R peak detection is crucial in electrocardiogram (ECG) signal analysis to detect and diagnose cardiovascular diseases (CVDs). Herein, the dynamic mode selected energy (DMSE) and adaptive window sizing (AWS) algorithm are proposed for detecting R peaks with better efficiency. The DMSE algorithm adaptively separates the QRS components and all non-objective components from the ECG signal. Based on local peaks in QRS components, the AWS algorithm adaptively determines the Region of Interest (ROI). The Feature Extraction process computes the statistical properties of energy, frequency, and noise from each ROI. The Sequential Forward Selection (SFS) procedure is used to find the best subsets of features. Based on these characteristics, an ensemble of decision tree algorithms detects the R peaks. Finally, the R peak position on the initial ECG signal is adjusted using the R location correction (RLC) algorithm. The proposed method has an experimental accuracy of 99.94%, a sensitivity of 99.98%, positive predictability of 99.96%, and a detection error rate of 0.06%. Given the high efficiency in detection and fast processing speed, the proposed approach is ideal for intelligent medical and wearable devices in the diagnosis of CVDs.
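The Sequential Forward Selection step mentioned above follows a generic greedy pattern, sketched below. The function name, the `score` callback, and the toy weights are all hypothetical; in a setting like the one described, `score` would typically be the validation performance of the classifier trained on the candidate subset.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily grow a feature subset up to size k.

    score(subset) -> float is a user-supplied function (an assumption here),
    e.g. cross-validated accuracy of a classifier using those features.
    """
    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one that scores best
        # when added to the features already selected.
        best = max(sorted(remaining), key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: each feature contributes a fixed, made-up amount.
toy_weights = [0.1, 0.5, 0.3, 0.0]

def toy_score(subset):
    return sum(toy_weights[i] for i in subset)

chosen = sequential_forward_selection(4, toy_score, k=2)
```

Because the search is greedy, it evaluates on the order of `n_features * k` subsets rather than all combinations, at the cost of possibly missing the best subset.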


2021 ◽  
Vol 11 (8) ◽  
pp. 3529
Author(s):  
Himer Avila-George ◽  
Miguel De-la-Torre ◽  
Wilson Castro ◽  
Danny Dominguez ◽  
Josué E. Turpo-Chaparro ◽  
...  

Computer-aided diagnosis is a research area of increasing interest in third-level pediatric hospital care. The effectiveness of surgical treatments improves with accurate and timely information, and machine learning techniques have been employed to assist practitioners in making decisions. In this context, predicting the discharge diagnosis of new incoming patients could make a difference for successful treatments and optimal resource use. In this paper, a computer-aided diagnosis system is proposed to provide statistical information on the discharge diagnosis of a new incoming patient, based on the historical records of previously treated patients. The proposed system was trained and tested using a dataset of 1196 records; the dataset was coded according to the International Classification of Diseases, version 10 (ICD10). Among the processing steps, relevant features for classification were selected using the sequential forward selection wrapper, and outliers were removed using density-based spatial clustering of applications with noise (DBSCAN). Ensembles of decision trees were trained with different strategies, and the highest classification accuracy was obtained with the extreme gradient boosting (XGBoost) algorithm. A 10-fold cross-validation strategy was employed for system evaluation, and performance was compared in terms of accuracy and F-measure. Experimental results showed an average accuracy of 84.62%, and the decision tree learned from the samples made it possible to visualize suitable treatments related to patients' historical records. According to computer simulations, the proposed classification approach using XGBoost provided higher classification performance than other ensemble approaches; the resulting decision tree can be employed to inform possible paths and risks according to previous experience learned by the system. Finally, the adaptive system may learn from new cases to increase the accuracy of its decisions through incremental learning.
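The evaluation protocol described above (10-fold cross-validation scored by accuracy and F-measure) can be sketched as follows. This is only an illustration of the splitting and scoring mechanics: the helper names are hypothetical, the folds are contiguous (in practice the records would usually be shuffled first), and no classifier is trained here.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1) from confusion counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 1196 is the dataset size quoted in the abstract.
folds = list(k_fold_indices(1196, k=10))
fold_sizes = [len(test) for _, test in folds]
```

Each record lands in exactly one test fold, so averaging the per-fold scores uses every record for testing once.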


2020 ◽  
Vol 5 (2) ◽  
pp. 572
Author(s):  
Zuraida Khairudin ◽  
Nurfatin Adila Abdul Razak ◽  
Hezlin Aryani Abd Rahman ◽  
Norbaizura Kamaruddin ◽  
Nor Azimah Abd Aziz

Diabetic retinopathy is one of the leading causes of visual disability and blindness worldwide. It is estimated that 4.8% of the 37 million cases of blindness globally were due to diabetic retinopathy. It affects patients with prolonged diabetes and can result in permanent blindness. The earliest symptoms surface when patients develop vision problems, so regular eye examinations and early intervention normally control the disease. Many studies on early intervention and prevention of diabetic retinopathy use predictive models. The growth of database and digital storage technology has created an abundance of health records, and data mining techniques help uncover meaningful patterns while attending to the sensitivity of health records. This study therefore took a data mining approach to predicting the presence of diabetic retinopathy, restricted to Type II diabetic patients, and to determining the risk factors that contribute to it. The data mining models selected for this study were Logistic Regression, Decision Tree, and Artificial Neural Network. The dataset covers 361 Type II diabetic patients from the Ophthalmology Clinic, UiTM Medical Specialist Centre, selected between January 2014 and December 2018, and consists of 17 variables. Among the Logistic Regression selection options, the model using the Forward selection method was the best: it had the highest sensitivity (Sen=50.0%), specificity (Spe=79.03%), and accuracy (Acc=66.36%) on the validation dataset. Among the Decision Tree models, the DT using Gini was the best. Logistic Regression (Forward) and Decision Tree (Gini) were then compared with the Artificial Neural Network model (Sen=56.25%, Spe=70.97%, Acc=64.55%). The results demonstrated that Logistic Regression using the Forward selection method was the best model for predicting the presence of diabetic retinopathy among Type II diabetic patients compared to the other models. The significant risk factors associated with the presence of diabetic retinopathy are duration of diabetes, HbA1C level, diabetic foot ulcer, nephropathy, and neuropathy.
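For readers less familiar with the three reported measures, the sketch below shows how sensitivity, specificity, and accuracy are computed from confusion-matrix counts. The counts are not reported in the abstract; they are hypothetical whole numbers (a validation set of 110 cases, 48 with and 62 without retinopathy) chosen only because they reproduce the quoted percentages.

```python
def rates(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy as percentages (2 decimals)."""
    sen = 100 * tp / (tp + fn)                   # true positive rate
    spe = 100 * tn / (tn + fp)                   # true negative rate
    acc = 100 * (tp + tn) / (tp + fn + tn + fp)  # overall agreement
    return round(sen, 2), round(spe, 2), round(acc, 2)

# Hypothetical counts (assumption, not from the source).
logistic_forward = rates(tp=24, fn=24, tn=49, fp=13)
neural_network = rates(tp=27, fn=21, tn=44, fp=18)
```

Under these assumed counts the neural network finds more true cases (higher sensitivity) while the logistic model makes fewer false alarms, which is how the accuracy figures can differ by under two points.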


1986 ◽  
Vol 25 (04) ◽  
pp. 207-214 ◽  
Author(s):  
P. Glasziou

Summary: The development of investigative strategies by decision analysis has been achieved by explicitly drawing the decision tree, either by hand or on computer. This paper discusses the feasibility of automatically generating and analysing decision trees from a description of the investigations and the treatment problem. The investigation of cholestatic jaundice is used to illustrate the technique. Methods to decrease the number of calculations required are presented. It is shown that this method makes practical the simultaneous study of at least half a dozen investigations. However, some new problems arise due to the possible complexity of the resulting optimal strategy. If protocol errors and delays due to testing are considered, simpler strategies become desirable. Generation and assessment of these simpler strategies are discussed with examples.
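Analysing a decision tree in decision analysis usually means rolling it back: averaging utilities at chance nodes and taking the best option at decision nodes. The sketch below shows that standard rollback on a tiny tree; it is not the paper's generation method, and the node encoding, option names, probabilities, and utilities are all hypothetical.

```python
def expected_value(node):
    """Roll back a decision tree given as nested tuples.

    ("leaf", utility)
    ("chance", [(probability, child), ...])
    ("decision", {option_name: child, ...})
    """
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "chance":
        # Probability-weighted average over the possible outcomes.
        return sum(p * expected_value(child) for p, child in payload)
    if kind == "decision":
        # The decision maker picks the option with the highest value.
        return max(expected_value(child) for child in payload.values())
    raise ValueError(f"unknown node kind: {kind}")

# Made-up example: disease present with probability 0.3.
toy_tree = ("decision", {
    "treat": ("chance", [(0.3, ("leaf", 0.9)), (0.7, ("leaf", 0.8))]),
    "wait": ("chance", [(0.3, ("leaf", 0.5)), (0.7, ("leaf", 1.0))]),
})
toy_value = expected_value(toy_tree)
```

Every additional investigation multiplies the number of branches to evaluate, which is why methods that cut the number of calculations matter when several investigations are studied together.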


2018 ◽  
Vol 14 (2) ◽  
pp. 145
Author(s):  
Aji Sudibyo ◽  
Taufik Asra ◽  
Bakhtiar Rifai

Internet use is commonplace today, and it is inseparable from the use of email. One threat that arises when using email is spam: messages or emails that the recipient does not want and that are sent in bulk. This study of spam attacks uses a spam dataset of 4601 records, of which 1813 records are considered spam and 278 are non-spam, with 57 initial attributes and 1 class attribute. In the experiments, attribute selection with a decision tree reduced these to 15 attributes plus 1 class attribute. Three test runs were carried out with attribute percentages of 30%, 50%, and 70%; attribute selection at 70% gave better results than 30% or 50%, with an accuracy of 92.469%.

