Performance Comparison of Supervised Classification Algorithms for Cardiovascular Disease Cases

One of the health problems in Indonesia is the increasing number of NCDs (Non-Communicable Diseases) such as heart attack and cardiovascular disease. The risk factors for cardiovascular disease fall into two groups: factors that can be changed and factors that cannot. This study aims to identify the best-performing of several classification algorithms, namely k-nearest neighbors (k-NN), stochastic gradient descent (SGD), random forest (RF), neural network (NN), and logistic regression (LR), in classifying cardiovascular disease based on the factors that cause it. Two aspects are examined, including the performance of each algorithm, which is evaluated with a confusion matrix using accuracy, precision, recall, and AUC (Area Under the Curve) as parameters. The dataset contains 425,195 samples from cardiovascular disease diagnosis results. Testing uses the percentage-split and cross-validation modes. The experimental results show that the NN algorithm gives the best prediction accuracy of all the algorithms: in Orange, with both percentage-split and cross-validation testing, it reaches an accuracy of 89.60%, AUC of 0.873, precision of 0.877, and recall of 0.896. In Weka with cross-validation, the accuracy is 89.46%, AUC 0.865, precision 0.875, and recall 0.895. In KNIME with cross-validation, the accuracy is 88.55%, AUC 0.768, precision 0.854, and recall 0.886.
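
The abstract evaluates every algorithm with confusion-matrix metrics. The sketch below is a minimal pure-Python illustration, not the Orange, Weka, or KNIME code: it computes accuracy together with precision and recall averaged over classes with each class weighted by its support. Support weighting is an assumption about how the tools averaged; under that weighting, recall always equals accuracy, which would be consistent with every reported recall matching its accuracy (0.896 vs 89.60%, 0.895 vs 89.46%, 0.886 vs 88.55%). The labels in the usage lines are hypothetical.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision and recall."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # how many samples truly belong to each class
    predicted = Counter(y_pred)    # how many samples were assigned to each class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    accuracy = sum(tp.values()) / n
    precision = recall = 0.0
    for c in labels:
        weight = support[c] / n    # weight each class by its share of the true labels
        if predicted[c]:
            precision += weight * tp[c] / predicted[c]
        if support[c]:
            recall += weight * tp[c] / support[c]
    return accuracy, precision, recall

# Hypothetical labels: 1 = cardiovascular disease, 0 = no disease
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
acc, prec, rec = weighted_metrics(y_true, y_pred)
print(acc, rec)  # 0.75 0.75
```

Precision differs from accuracy here (about 0.82) because it is normalised by predicted rather than true counts.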

Impression Classification of Endek (Balinese Fabric) Image Using K-Nearest Neighbors Method

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control ◽

10.22219/kinetik.v3i3.611 ◽

2018 ◽

pp. 213-220 ◽

Cited By ~ 1

Author(s):

Gede Aditra Pradnyana ◽

I Komang Agus Suryantara ◽

I Gede Mahendra Darmawiguna

Keyword(s):

Cross Validation ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

K Value ◽

Training Samples ◽

And Training ◽

Validation Testing ◽

Fold Cross Validation ◽

Learning Data

An impression can be interpreted as a psychological feeling toward a product, and it plays an important role in decision making; understanding data in the domain of impressions is therefore very useful. This research aimed to measure the performance of the K-Nearest Neighbors method in classifying endek image impressions, evaluated with K-Fold Cross Validation. The images were taken from 3 locations, namely CV. Artha Dharma, Agung Bali Collection, and Pengrajin Sri Rejeki. The impression of each image was determined in consultation with an endek expert, Dr. D.A Tirta Ray, M.Si. Data mining used the K-Nearest Neighbors method, which classifies new objects based on their attributes and on training samples that have already been classified. K-Fold Cross Validation testing obtained an accuracy of 91% with K values of 3, 4, 7, and 8 in K-Nearest Neighbors.
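
To make the K-Nearest Neighbors plus K-Fold Cross Validation procedure concrete, here is a minimal sketch. The endek image features are not described in the abstract, so the 2-D points and the impression labels `calm`/`bold` are hypothetical, distances are assumed Euclidean, and the folds are contiguous slices without shuffling.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_accuracy(data, k_neighbors, n_folds):
    """Mean k-NN accuracy over n_folds contiguous folds (no shuffling)."""
    fold_size = math.ceil(len(data) / n_folds)
    scores = []
    for i in range(n_folds):
        test = data[i * fold_size:(i + 1) * fold_size]
        if not test:
            continue
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]  # every other fold
        correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Hypothetical feature vectors and impression labels
data = [((0, 0), "calm"), ((5, 5), "bold"), ((0, 1), "calm"), ((5, 6), "bold"),
        ((1, 0), "calm"), ((6, 5), "bold"), ((1, 1), "calm"), ((6, 6), "bold")]
print(k_fold_accuracy(data, k_neighbors=3, n_folds=4))
```

Repeating the call for several values of `k_neighbors` mirrors how the study compared K values.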

Comparison of Arabica Coffee Quality Prediction Using the SGD, Random Forest, and Naive Bayes Algorithms

EDUMATIC Jurnal Pendidikan Informatika ◽

10.29408/edumatic.v4i2.2202 ◽

2020 ◽

Vol 4 (2) ◽

pp. 1-9

Author(s):

Veronica Sari ◽

Feranandah Firdausi ◽

Yufis Azhar ◽

...

Keyword(s):

Random Forest ◽

Gradient Descent ◽

Cross Validation ◽

Naive Bayes ◽

Area Under The Curve ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Quality Institute ◽

Fold Cross Validation

Classification is one of the techniques in data mining; it groups data according to their similarity to previously labeled sample data. The dataset used in this study is the coffee dataset from the Coffee Quality Institute, obtained from GitHub. Its attributes are Aroma, Aftertaste, Flavor, Acidity, Balance, Body, Uniformity, Sweetness, Clean Cup, and Cupper Points. Three classification methods are used in this study: Stochastic Gradient Descent, Random Forest, and Naive Bayes. The aim is to find out which algorithm predicts coffee quality in the dataset most effectively. The prediction results are then tested using K-Fold Cross Validation and the Area Under the Curve (AUC). The results show that Stochastic Gradient Descent obtains the best accuracy of the three methods, 98%, which increases to 99% after testing with K-Fold Cross Validation and AUC.
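
Stochastic Gradient Descent updates the model after every single sample instead of after a full pass over the data. A minimal sketch for a binary logistic-regression classifier follows; the abstract does not state the loss, learning rate, or how quality was discretised, so the binary target, the two toy features (standing in for scaled cupping scores), and all hyperparameters are hypothetical.

```python
import math
import random

def sgd_logistic(samples, epochs=50, lr=0.1, seed=0):
    """Fit binary logistic regression with stochastic gradient descent.

    samples: list of (features, label) pairs with label in {0, 1}.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)                       # new random visiting order each epoch
        for i in order:
            x, y = samples[i]
            z = sum(wj * xj for wj, xj in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))       # predicted probability of class 1
            err = p - y                          # d(log-loss)/dz for this one sample
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)

# Hypothetical scaled scores (e.g. aroma, flavor); 1 = high quality
data = [((0.2, 0.1), 0), ((0.3, 0.2), 0), ((0.1, 0.3), 0),
        ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1)]
w, b = sgd_logistic(data, epochs=200, lr=0.5)
print([predict(w, b, x) for x, _ in data])
```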

S-CCCapsule: Pneumonia detection in chest X-ray images using skip-connected convolutions and capsule neural network

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202638 ◽

2021 ◽

pp. 1-25

Author(s):

Kwabena Adu ◽

Yongbin Yu ◽

Jingye Cai ◽

Victor Dela Tattrah ◽

James Adu Ansere ◽

...

Keyword(s):

Neural Network ◽

Human Error ◽

Medical Center ◽

Confusion Matrix ◽

Area Under The Curve ◽

Dynamic Routing ◽

Sigmoid Function ◽

Radiological Society ◽

Chest X Ray ◽

Sensitivity Specificity

The squash function used in the dynamic routing of capsule networks (CapsNets) is less able to discriminate non-informative capsules, which leads to an abnormal distribution of capsule activation values. In this paper, we propose vertical squash (VSquash) to improve the original squash by constraining the activation values of capsules in the primary capsule layer so as to shrink non-informative capsules, promote discriminative capsules, and avoid high information sensitivity. Furthermore, three new neural networks based on CapsNets are presented, in which VSquash is applied in the dynamic routing: (i) skip-connected convolutional capsules (S-CCCapsule), (ii) integrated skip-connected convolutional capsules (ISCC), and (iii) ensemble skip-connected convolutional capsules (ESCC). To achieve a uniform distribution of the coupling-coefficient probabilities between capsules, we use the Sigmoid function rather than the Softmax function. Experiments on the Guangzhou Women and Children's Medical Center (GWCMC), Radiological Society of North America (RSNA), and Mendeley CXR Pneumonia datasets were performed to validate the effectiveness of the proposed methods. We found that the proposed methods achieve better results than other methods on model evaluation metrics such as the confusion matrix, sensitivity, specificity, and area under the curve (AUC). Our method for pneumonia detection performs better than practicing radiologists; it minimizes human error and reduces diagnosis time.
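
The abstract does not define VSquash, so the sketch below shows only the original CapsNet squash (Sabour et al., 2017), v = (|s|^2 / (1 + |s|^2)) * s / |s|, together with the Softmax and Sigmoid alternatives for turning routing logits into coupling coefficients. It is an illustration of those standard functions, not the paper's S-CCCapsule code; the example logits are hypothetical.

```python
import math

def squash(s, eps=1e-9):
    """Original CapsNet squash: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)  # eps avoids division by 0
    return [scale * x for x in s]

def softmax(logits):
    """Coupling coefficients that compete: they always sum to 1."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Independent coupling coefficients, each in (0, 1), with no sum constraint."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]

print(math.hypot(*squash([3.0, 4.0])))   # long vectors get length close to 1
print(math.hypot(*squash([0.1, 0.0])))   # short vectors shrink towards 0
routing_logits = [2.0, 0.0, 0.0]          # hypothetical agreement scores
print(softmax(routing_logits), sigmoid(routing_logits))
```

With these logits, Softmax gives the first capsule about 0.79 of the total, whereas Sigmoid scores the three capsules independently (about 0.88, 0.5, 0.5), a flatter profile.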

Prediction of Transcription Factor Binding Sites of SP1 on Human Chromosome1

Applied Sciences ◽

10.3390/app11115123 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5123

Author(s):

Maiada M. Mahmoud ◽

Nahla A. Belal ◽

Aliaa Youssif

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

Messenger Rna ◽

Area Under The Curve ◽

Noisy Data ◽

Transcription Factor Binding Sites ◽

Classification Problem ◽

Transcription Factor Binding ◽

K Nearest Neighbors ◽

Factor Binding

Transcription factors (TFs) are proteins that control the transcription of a gene from DNA to messenger RNA (mRNA). TFs bind to a specific DNA sequence called a binding site. Transcription factor binding sites have not yet been completely identified; this challenge can be approached computationally by treating it as a classification problem in machine learning. In this paper, the prediction of transcription factor binding sites of SP1 on human chromosome 1 is presented using different classification techniques, and a model using voting is proposed. The highest Area Under the Curve (AUC) achieved is 0.97 using K-Nearest Neighbors (KNN), and 0.95 using the proposed voting technique. However, the proposed voting technique is more efficient with noisy data. This study highlights the applicability and significance of the voting technique for the prediction of binding sites, as well as the fact that KNN outperforms the other techniques on this type of data.
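
The abstract does not say which classifiers were combined or whether voting was hard (labels) or soft (probabilities). A minimal sketch of hard majority voting, assuming each model outputs a label per candidate site; the three prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label predicted by most models.

    predictions_per_model: list of equal-length label lists, one list per model.
    Ties go to whichever tied label appears first in model order.
    """
    voted = []
    for sample_preds in zip(*predictions_per_model):   # one tuple of votes per sample
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted

# Hypothetical outputs: 1 = binding site, 0 = not a binding site
knn_preds = [1, 0, 1, 1]
model_b_preds = [1, 1, 0, 1]
model_c_preds = [0, 0, 1, 1]
print(majority_vote([knn_preds, model_b_preds, model_c_preds]))
```

A single model's error on one sample (as in the second and third positions) is outvoted when the other two agree, which is one plausible reason voting can cope better with noisy inputs.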

Multiple biomarker panel to screen for severe aortic stenosis: results from the CASABLANCA study

Open Heart ◽

10.1136/openhrt-2018-000916 ◽

2018 ◽

Vol 5 (2) ◽

pp. e000916 ◽

Cited By ~ 1

Author(s):

Sammy Elmariah ◽

Cian McCarthy ◽

Nasrien Ibrahim ◽

Deborah Furman ◽

Renata Mukai ◽

...

Keyword(s):

Aortic Valve ◽

Predictive Value ◽

Cross Validation ◽

Severe Aortic Stenosis ◽

Area Under The Curve ◽

Diagnostic Model ◽

Aortic Valve Area ◽

Academic Medical Centre ◽

Diagnostic Score ◽

Biomarker Panel

Objective: Severe aortic valve stenosis (AS) develops via insidious processes and can be challenging to diagnose correctly. We sought to develop a circulating biomarker panel to identify patients with severe AS. Methods: We enrolled study participants undergoing coronary or peripheral angiography for a variety of cardiovascular diseases at a single academic medical centre. A panel of 109 proteins was measured in blood obtained at the time of the procedure. Statistical learning methods were used to identify biomarkers and clinical parameters associated with severe AS. A diagnostic model incorporating clinical and biomarker results was developed and evaluated using Monte Carlo cross-validation. Results: Of 1244 subjects (age 66.4±11.5 years, 28.7% female), 80 (6.4%) had severe AS (defined as aortic valve area (AVA) <1.0 cm²). The final model included age, N-terminal pro-B-type natriuretic peptide, von Willebrand factor, and fetuin-A. The model had good discrimination for severe AS (OR=5.9, 95% CI 3.5 to 10.1, p<0.001), with an area under the curve of 0.76 in-sample and 0.74 with cross-validation. A diagnostic score was generated. The prevalence of severe AS was higher in those with higher scores: 1.6% of those with a score of 1 had severe AS, compared with 15.3% of those with a score of 5 (p<0.001), and score values were inversely correlated with AVA (r=−0.35; p<0.001). At the optimal model cut-off, we found 76% sensitivity, 65% specificity, 13% positive predictive value, and 98% negative predictive value. Conclusions: We describe a novel, multiple biomarker approach for the diagnostic evaluation of severe AS. Trial registration number: NCT00842868.
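
The low positive predictive value next to a high negative predictive value follows from the low prevalence of severe AS. As a worked check, not the study's own calculation, Bayes' rule applied to the reported sensitivity (76%), specificity (65%), and prevalence (80 of 1244) reproduces both figures:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a test applied in a population with the given prevalence."""
    tp = sensitivity * prevalence                # true positives per person screened
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    fn = (1 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)

ppv, npv = predictive_values(0.76, 0.65, 80 / 1244)
print(f"PPV={ppv:.2f}, NPV={npv:.2f}")  # PPV=0.13, NPV=0.98
```

At a prevalence of about 6.4%, false positives from the roughly 94% without severe AS outnumber true positives by more than six to one, which is why most positive scores are false alarms while a negative score is highly reassuring.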

Urinary MicroRNA Biomarkers for Detecting The Presence of Esophageal Cancer

10.21203/rs.3.rs-124851/v1 ◽

2020 ◽

Author(s):

Yusuke Okuda ◽

Takaya Shimura ◽

Hiroyasu Iwasaki ◽

Shigeki Fukusada ◽

Ruriko Nishigaki ◽

...

Keyword(s):

Esophageal Cancer ◽

Poor Prognosis ◽

Cross Validation ◽

Area Under The Curve ◽

Pcr Analysis ◽

Discovery Cohort ◽

Qrt Pcr ◽

Urinary Levels ◽

Noninvasive Biomarker ◽

Test Sets

Background: Esophageal cancer (EC), including esophageal squamous cell carcinoma (ESCC) and adenocarcinoma (EAC), generally has a poor prognosis; hence, a noninvasive biomarker enabling early detection is needed. Methods: A total of 150 age- and sex-matched healthy controls (HCs) and 43 patients with ESCC were randomly divided into two groups: 9 participants in the discovery cohort for microarray analysis and 184 participants in the training/test cohort, with cross-validation, for qRT-PCR analysis. Using 152 urine samples (144 HCs and 8 EACs), we validated the urinary miRNA biomarkers for EAC diagnosis. Results: Among the eight miRNAs selected in the discovery cohort, urinary levels of five (miR-1273f, miR-619-5p, miR-150-3p, miR-4327, and miR-3135b) were significantly higher in the ESCC group than in the HC group in the training/test cohort. Consistently, these five urinary miRNAs differed significantly between HC and ESCC in both the training and test sets. In particular, urinary miR-1273f and miR-619-5p showed excellent area under the curve (AUC) values (≥ 0.80) for diagnosing stage I ESCC. Similarly, the EAC group had significantly higher urinary levels of these five miRNAs than the HC group, with AUC values of approximately 0.80. Conclusion: The present study established novel urinary miRNA biomarkers that can detect ESCC and EAC at an early stage.
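
An AUC such as 0.80 has a direct interpretation: it is the probability that a randomly chosen patient has a higher marker level than a randomly chosen control, with ties counted as half. A minimal sketch of that pairwise definition; the urinary levels below are hypothetical, not data from the study.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical normalised urinary miRNA levels
escc_levels = [3.1, 2.4, 2.9, 1.8]
hc_levels = [1.2, 2.0, 1.5, 2.4, 0.9]
print(auc_pairwise(escc_levels, hc_levels))  # 0.875
```

The double loop is O(n·m); a rank-based computation gives the same value more efficiently for large cohorts.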

Electrocardiogram Classification for Arrhythmia using Convolutional Neural Network 2D and Adabound Optimizer

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e4591.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 1277-1284

Keyword(s):

Neural Network ◽

Cardiovascular Disease ◽

Convolutional Neural Network ◽

Gradient Descent ◽

Stochastic Gradient Descent ◽

Transform Method ◽

The World ◽

Optimal Accuracy ◽

Deadly Disease ◽

Electrocardiogram Ecg

Cardiovascular disease is the number one deadly disease in the world. Arrhythmia is a type of cardiovascular disease that is hard to detect except through routine electrocardiogram (ECG) recording. Because ECG signals are varied and noisy, detection by experts through visual inspection alone is very time-consuming. Building on previous research, and in order to help experts, this research develops an 11-layer 2D Convolutional Neural Network (CNN 2D) using the MIT-BIH Arrhythmia Dataset. The dataset is first preprocessed with the wavelet transform method and then segmented using the R-peak method. The challenge is to overcome the class imbalance and the small amount of data while still achieving optimal accuracy. This research can help doctors determine a patient's type of arrhythmia. To this end, it compares several optimizers for the CNN 2D, namely Adabound, Adadelta, Adagrad, Amsbound, Adam, and Stochastic Gradient Descent (SGD). Adabound achieves the highest performance, with 91% accuracy, and trains about 1 s per epoch faster than Adam, which takes approximately 18 s per epoch.
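
AdaBound behaves like Adam early in training and like SGD later, because it clips Adam's per-parameter step size into a band that narrows towards a final learning rate. The sketch below is a simplified scalar version of that idea, using the bound schedule final_lr·(1 − 1/(γt + 1)) and final_lr·(1 + 1/(γt)) from the AdaBound paper (Luo et al., 2019); it omits weight decay and differs in detail from the authors' PyTorch release, and the study's own optimizer settings are not given. The quadratic objective in the usage line is hypothetical.

```python
import math

def adabound_minimize(grad, x0, steps, lr=1e-3, final_lr=0.1,
                      beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    """Minimise a 1-D function with AdaBound-style bounded adaptive steps (sketch)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g           # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g       # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # Adam-style bias corrections
        v_hat = v / (1 - beta2 ** t)
        lower = final_lr * (1 - 1 / (gamma * t + 1))
        upper = final_lr * (1 + 1 / (gamma * t))
        # Adam's step size clipped into [lower, upper]: the band is wide at first
        # (Adam-like) and shrinks towards final_lr as t grows (SGD-like).
        step = min(max(lr / (math.sqrt(v_hat) + eps), lower), upper)
        x -= step * m_hat
    return x

# Hypothetical objective f(x) = x**2, whose gradient is 2x; the minimum is at 0
x_min = adabound_minimize(lambda x: 2 * x, x0=5.0, steps=3000, lr=0.1)
print(x_min)
```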

PREDICTING THE GRADE POINT AVERAGE OF STUDENTS WHO WORK WHILE STUDYING AT UNIVERSITAS ADVENT INDONESIA USING THE C4.5 DECISION TREE METHOD AND SMOTE

TeIKa ◽

10.36342/teika.v10i01.2281 ◽

2020 ◽

Vol 10 (01) ◽

pp. 69-77

Author(s):

Yusran Timur Samuel ◽

Chrystle Beatrix Allbright Nahuway

Keyword(s):

Decision Tree ◽

Cross Validation ◽

Confusion Matrix ◽

Split Test

Higher education is one way to obtain employment more easily, because through education individuals can improve the quality of human resources in the present era. However, higher education is very expensive, so individuals who want to study must also work at the same time; this research therefore aims to predict the grade point average (GPA) of students who work while studying at Universitas Advent Indonesia. The study identified 8 attributes that influence the prediction of student GPA at Universitas Advent Indonesia: Work Department, Working Hours, Major, Gender, Residence, Age, Number of Credits (SKS), and GPA. The method used is the C4.5 Decision Tree, implemented in WEKA with the J48 algorithm. The research also uses the SMOTE (Synthetic Minority Oversampling Technique) algorithm to balance the amount of data in the minority class. The root node of the resulting tree is Gender, which influences student GPA at Universitas Advent Indonesia. SMOTE helped raise the results by 7 to 8%: the accuracy of 10-fold cross-validation testing is 63.6672%, with average precision and recall of 0.621 and 0.637, and the accuracy of the 70:30 split test is 62.7955%, with average precision and recall of 0.621 and 0.628. With the C4.5 decision tree alone, the accuracy of 10-fold cross-validation testing is 55.5044%, with average precision and recall of 0.545 and 0.555, and the accuracy of the 70:30 split test is 55.2995%, with average precision and recall of 0.554 and 0.553.
Analysis using the confusion matrix and the ROC curve shows the result improving from 0.688 to 0.756, which lies in the 0.70 to 0.80 range corresponding to the diagnostic level of fair classification. It can be concluded that working while studying has a fairly strong influence on student GPA. The attributes, in order from the top of the tree, are Gender, Number of Credits, Major, Age, Work Department, Working Hours, and Residence.
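
SMOTE balances a dataset by creating synthetic minority-class samples on the line segments between a minority sample and one of its nearest minority neighbours. The study used SMOTE in WEKA; the sketch below is only a plain-Python illustration of the interpolation idea, assuming numeric features (how the study encoded categorical attributes such as Gender or Major is not stated) and, as a simplification of the original algorithm, picking the base sample at random for each synthetic point.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points between minority samples and their nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of `base`, excluding the sample itself
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, uniform in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical numeric minority-class samples
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(smote(minority, n_synthetic=3, k=2))
```

Because new points are interpolated rather than copied, the classifier sees a denser minority region instead of duplicated rows, which reduces the overfitting that plain oversampling tends to cause.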

CT-Based Radiomics Analysis for Preoperative Diagnosis of Pancreatic Mucinous Cystic Neoplasm and Atypical Serous Cystadenomas

Frontiers in Oncology ◽

10.3389/fonc.2021.621520 ◽

2021 ◽

Vol 11 ◽

Author(s):

Tiansong Xie ◽

Xuanyi Wang ◽

Zehua Zhang ◽

Zhengrong Zhou

Keyword(s):

Predictive Value ◽

Cross Validation ◽

Clinical Decision Making ◽

Area Under The Curve ◽

Clinical Decision ◽

Cystic Neoplasm ◽

Cystic Neoplasms ◽

Serous Cystadenomas ◽

Radiological Model ◽

Fold Cross Validation

Objectives: To investigate the value of CT-based radiomics analysis in preoperatively discriminating pancreatic mucinous cystic neoplasms (MCN) from atypical serous cystadenomas (ASCN). Methods: A total of 103 MCN and 113 ASCN patients who underwent surgery were retrospectively enrolled. A total of 764 radiomics features were extracted from preoperative CT images. The optimal features were selected with the Mann-Whitney U test and the minimum redundancy maximum relevance method. The radiomics score (Rad-score) was then built using a random forest algorithm. Radiological/clinical features were also assessed for each patient, and multivariable logistic regression was used to construct a radiological model. The performance of the Rad-score and the radiological model was evaluated with 10-fold cross-validation in terms of area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. Results: Ten optimal features were identified by screening, and the Rad-score was built on them. The radiological model was built on four radiological/clinical factors. In the 10-fold cross-validation, the Rad-score proved robust and reliable (average AUC: 0.784, sensitivity: 0.847, specificity: 0.745, PPV: 0.767, NPV: 0.849, accuracy: 0.793). The radiological model performed slightly less well in classification (average AUC: 0.734, sensitivity: 0.748, specificity: 0.705, PPV: 0.732, NPV: 0.798, accuracy: 0.728). Conclusions: The CT-based radiomics analysis showed promising performance for preoperatively discriminating MCN from ASCN and good potential for improving diagnostic power, and it may serve as a novel tool for guiding clinical decision-making for these patients.
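
The Mann-Whitney U test used for feature screening compares the ranks of a feature's values in the two groups. A minimal sketch follows: it computes U with average ranks for ties and orders features by how far U/(n_a·n_b) lies from 0.5, where 0.5 means no separation (this ratio equals the AUC of the feature as a single-variable classifier). It skips the p-value and the subsequent mRMR step, and the feature names and values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                  # extend the run of tied values
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1     # average rank of positions i..j
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a computed from its rank sum."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2

def screen_features(features):
    """Sort feature names by |U/(n_a*n_b) - 0.5|, largest separation first.

    features: dict mapping name -> (values_in_group_a, values_in_group_b).
    """
    def separation(name):
        a, b = features[name]
        return abs(mann_whitney_u(a, b) / (len(a) * len(b)) - 0.5)
    return sorted(features, key=separation, reverse=True)

# Hypothetical radiomics features: (MCN values, ASCN values)
features = {"texture_a": ([1, 3, 2], [2, 1, 3]), "shape_b": ([5, 6, 7], [1, 2, 3])}
print(screen_features(features))
```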

ANALYSIS OF ASTHMATIC AND NON-ASTHMATIC LUNG BREATHING SOUNDS USING THE K NEAREST NEIGHBORS METHOD

DIELEKTRIKA ◽

10.29303/dielektrika.v8i1.251 ◽

2021 ◽

Vol 8 (1) ◽

pp. 1

Author(s):

Ari Satriadi

Keyword(s):

Cross Validation ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Fold Cross Validation

Asthma is a disease of the airways that increases airway hyperresponsiveness and causes wheezing (a whistling sound while breathing). A wheezing breath sound is one of the signs that a person has asthma. This study builds and tests a system that can distinguish the characteristics of wheezing breath sounds in asthma patients from other breath sounds, using the k-Nearest Neighbors (k-NN) method. The sound features used are the mean and standard deviation of the signal in the time domain, the mean of the spectrum, the standard deviation of the spectrum, the peak magnitude at 0 Hz, and the frequencies with the first, second, and third highest magnitudes. k-NN is a method that classifies an object based on the training data closest to it. Wheeze and non-wheeze breath sounds were obtained by recording asthmatic and non-asthmatic subjects directly. All recorded sound data were then segmented to extract the required breathing events, followed by feature extraction to obtain mathematical features of the sounds. Training used 80% of all the data with 10-fold cross-validation, and the maximum classification performance was reached at k=3 and k=5, both with a validity of 97.2%. In the final k-NN performance test, the maximum classification accuracy was 86.6% for k=3 and 86.6% for k=5.
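
The feature list above can be computed from one segmented breathing event as sketched below. The abstract does not give exact definitions, so several choices are assumptions: a plain one-sided DFT magnitude spectrum, population standard deviations, and a peak search that skips the 0 Hz bin because its magnitude is reported as a separate feature. The test signal in the usage lines is synthetic.

```python
import cmath
import math

def breath_features(signal, sample_rate):
    """Features of one segmented breathing event (definitions are assumptions)."""
    n = len(signal)
    mean_t = sum(signal) / n
    std_t = math.sqrt(sum((x - mean_t) ** 2 for x in signal) / n)
    half = n // 2 + 1                                   # bins 0 .. n/2 (one-sided spectrum)
    mags = [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(half)]                       # plain O(n^2) DFT magnitudes
    mean_f = sum(mags) / half
    std_f = math.sqrt(sum((m - mean_f) ** 2 for m in mags) / half)
    # Three strongest bins, skipping the 0 Hz bin that is reported separately
    top3 = sorted(range(1, half), key=lambda k: mags[k], reverse=True)[:3]
    return {
        "mean": mean_t,
        "std": std_t,
        "spectrum_mean": mean_f,
        "spectrum_std": std_f,
        "magnitude_0hz": mags[0],
        "peak_freqs_hz": [k * sample_rate / n for k in top3],
    }

# Synthetic 1-second segment at 64 Hz: offset plus 5, 12 and 20 Hz tones
sig = [1 + math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
       + 0.25 * math.sin(2 * math.pi * 20 * t / 64) for t in range(64)]
print(breath_features(sig, 64)["peak_freqs_hz"])
```

Real recordings would use a far higher sample rate, where an FFT (for example `numpy.fft.rfft`) replaces the quadratic DFT loop; the resulting feature vectors are what a k-NN classifier would compare.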
