Design and Development of an Expert System for Early Detection of Cataract Using the C4.5 Algorithm

In 2010, cataract caused 51% of the 39 million cases of blindness. In 2013, 1.8% of 1,027,763 Indonesian people had cataract, and half of them had not yet been treated because they were unaware of the disease. In this research, we therefore built a system that detects cataract at an early stage, as an ophthalmologist would. The system uses the C4.5 algorithm, which takes a training set of 150 records as input and produces a set of rules that serve as decision factors. The system was evaluated with k-fold cross-validation, with k = 10. It achieved an accuracy of 93.2% in detecting cataract and 80.5% in identifying the type of cataract a patient may have.
Index Terms: C4.5 algorithm, cataract, k-fold cross-validation, machine learning
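C4.5 chooses the attribute to split on at each tree node by gain ratio: the information gain of the split divided by its split information. The sketch below computes it for categorical attributes only; the attribute names (`blurred_vision`, `glare`) and the four-record dataset are hypothetical illustrations, not the paper's 150 training records, and continuous attributes and pruning are omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 split criterion for one categorical attribute:
    information gain divided by split information."""
    n = len(rows)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Expected entropy remaining after the split.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical symptom records (attribute names are illustrative only).
rows = [
    {"blurred_vision": "yes", "glare": "yes"},
    {"blurred_vision": "yes", "glare": "no"},
    {"blurred_vision": "no", "glare": "yes"},
    {"blurred_vision": "no", "glare": "no"},
]
labels = ["cataract", "cataract", "normal", "normal"]

best = max(["blurred_vision", "glare"], key=lambda a: gain_ratio(rows, labels, a))
print(best)  # blurred_vision
```

At each node the full algorithm picks the attribute with the highest gain ratio, partitions the records by its values, and recurses; the resulting root-to-leaf paths are the kind of rules the abstract describes as decision factors.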

Classification of Brain Tumors from MRI Images Using a Convolutional Neural Network

Applied Sciences, Vol 10 (6), pp. 1999 (2020)
DOI: 10.3390/app10061999
Cited By ~ 7

Author(s): Milica M. Badža, Marko Č. Barjaktarović

Keyword(s): Neural Network, Machine Learning, Brain Tumors, Convolutional Neural Network, Cross Validation, Magnetic Resonance Images, Generalization Capability, Data Set, Fold Cross Validation

The classification of brain tumors is performed by biopsy, which is not usually conducted before definitive brain surgery. Advances in technology and machine learning can help radiologists diagnose tumors without invasive measures. One machine learning algorithm that has achieved substantial results in image segmentation and classification is the convolutional neural network (CNN). We present a new CNN architecture for classifying three types of brain tumor. The developed network is simpler than existing pre-trained networks, and it was tested on T1-weighted contrast-enhanced magnetic resonance images. The performance of the network was evaluated using four approaches: combinations of two 10-fold cross-validation methods and two databases. The generalization capability of the network was tested with one of the 10-fold methods, subject-wise cross-validation, and potential improvement was tested by using an augmented image database. The best 10-fold cross-validation result was obtained with record-wise cross-validation on the augmented data set, with an accuracy of 96.56%. With good generalization capability and good execution speed, the newly developed CNN architecture could be used as an effective decision-support tool for radiologists in medical diagnostics.
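The two 10-fold schemes differ only in what gets assigned to folds. In record-wise cross-validation each image is assigned independently, so slices from one patient can appear in both training and test folds; subject-wise cross-validation assigns whole patients, which makes it the stricter test of generalization. A minimal sketch, assuming each record carries a hypothetical `patient` field (this is not the authors' code):

```python
import random

def record_wise_folds(records, k, seed=0):
    """Deal individual images into k folds, ignoring which patient they belong to."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def subject_wise_folds(records, k, seed=0):
    """Deal whole patients into k folds, so all images of one patient
    share a fold and no patient appears in both training and test data."""
    subjects = sorted({r["patient"] for r in records})
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for i, r in enumerate(records):
        folds[fold_of[r["patient"]]].append(i)
    return folds

# Hypothetical data: three patients with two MRI slices each.
records = [{"patient": p, "slice": s} for p in ["p1", "p2", "p3"] for s in (0, 1)]
for fold in subject_wise_folds(records, k=3):
    # Each fold contains the slices of exactly one patient here.
    print(sorted({records[i]["patient"] for i in fold}))
```

Because neighbouring slices of the same patient look alike, record-wise folds tend to report higher accuracy, which is consistent with the best result in the abstract coming from the record-wise scheme.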

Convolutional Neural Networks for automatic image quality control and EARL compliance of PET images

2021
DOI: 10.21203/rs.3.rs-964263/v1

Author(s): Elisabeth Pfaehler, Daniela Euba, Andreas Rinscheid, Otto S. Hoekstra, Josee Zijlstra, ...

Keyword(s): Machine Learning, Cross Validation, Training Data, Independent Dataset, PET CT, Image Quality Control, The Cross, The Impact, PET Scanners, Fold Cross Validation

Background: Machine learning studies require a large number of images, often obtained on different PET scanners. When merging these images, the use of harmonized images following EARL standards is essential. However, when retrospective images are included, EARL accreditation might not have been in place. The aim of this study was to develop a convolutional neural network (CNN) that can identify retrospectively whether an image is EARL compliant and whether it meets the older or newer EARL standards. Materials and Methods: Ninety-six PET images acquired on three PET/CT systems were included in the study. All images were reconstructed with the locally preferred clinical protocol and with EARL1- and EARL2-compliant reconstruction protocols. After image pre-processing, one CNN was trained to separate clinical from EARL-compliant reconstructions, and a second CNN was optimized to distinguish EARL1- from EARL2-compliant images. The accuracy of both CNNs was assessed using 5-fold cross-validation. The CNNs were then validated on 24 images acquired on a PET scanner not included in the training data. To assess the impact of image noise on the CNN decision, these 24 images were reconstructed with different scan durations. Results: In the cross-validation, the first CNN classified all images correctly. The second CNN correctly identified 100% of EARL1-compliant and 85% of EARL2-compliant images. The accuracy on the independent dataset was comparable to the cross-validation accuracy, and the scan duration had almost no impact on the results. Conclusion: The two CNNs trained in this study can be used to include images retrospectively in a multi-center setting, for example by applying additional smoothing. This method is especially important for machine learning studies in which images from different PET systems must be harmonized.

High Accurate and a Variant of k-fold Cross Validation Technique for Predicting the Decision Tree Classifier Accuracy

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 10 (2), pp. 105-110 (2021)
DOI: 10.35940/ijitee.c8403.0110321

Author(s): D. Mabuni, S. Aquter Babu

Keyword(s): Machine Learning, Decision Tree, Classification Accuracy, Cross Validation, Training Dataset, Decision Tree Classification, Testing Dataset, Tree Classifier, Validation Technique, Fold Cross Validation

In machine learning, the data used matter more than the logic of the program. Large and moderately sized datasets can yield robust, high classification accuracies, but small and very small datasets cannot. In particular, only large training datasets can produce robust decision tree classification results. Results obtained from a single training/testing dataset pair are not reliable. Cross-validation uses many random folds of the same dataset for training and validation: to obtain reliable and statistically sound classification results, the same algorithm must be applied to different pairs of training and validation datasets. To overcome the limitation of a single training dataset and a single testing dataset, the existing k-fold cross-validation technique follows a cross-validation plan to obtain better decision tree classification accuracy results. This paper proposes a new cross-validation technique called prime fold, tests it thoroughly by experiment, and verifies it on many benchmark UCI machine learning datasets. The decision tree classification accuracies obtained with prime fold were observed to be far better than those obtained with existing techniques.
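The abstract argues that one train/test split is unreliable and that the same algorithm must be run over several train/validation pairs. A minimal sketch of ordinary k-fold cross-validation follows; it is not the paper's prime-fold variant, whose procedure the abstract does not describe. `fit` stands in for any learner, and `majority_fit` is a hypothetical placeholder baseline rather than a decision tree.

```python
import random
from collections import Counter
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, k, fit, seed=0):
    """Mean held-out accuracy over k folds (requires k <= len(X)).
    fit(X_train, y_train) must return a function that predicts one label."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)

def majority_fit(X_train, y_train):
    """Trivial baseline learner: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label

X = list(range(20))
y = ["yes"] * 15 + ["no"] * 5
print(cross_val_accuracy(X, y, 10, majority_fit))
```

Every record is used for validation exactly once and for training k - 1 times, which is what makes the averaged score less dependent on one lucky or unlucky split.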

An Efficient Classification Algorithm for Traditional Textile Patterns from Different Cultures Based on Structures

Journal on Computing and Cultural Heritage, Vol 14 (4), pp. 1-22 (2021)
DOI: 10.1145/3465381

Author(s): Vuong M. Ngo, Thuy-Van T. Duong, Tat-Bao-Thien Nguyen, Phuong T. Nguyen, Owen Conlan

Keyword(s): Cross Validation, Feature Vector, Three Dimensional, Complex Structures, Data Set, Three Dimensional Objects, Validation Technique, Different Cultures, Fold Cross Validation, Novel Algorithm

Textiles play an important role in many cultures and have been digitised. They are three-dimensional objects with complex structures, especially archaeological fabric specimens and artifact textiles created manually by traditional craftsmen. In this article, we propose a novel algorithm for classifying textiles based on their structures. First, a hypergraph is used to represent the textile structure. Second, multisets of k-neighbourhoods are extracted from the hypergraph and converted to one feature vector representing each textile. Then, the k-neighbourhood vectors are classified using seven of the most popular supervised learning methods. Finally, we experimentally evaluate the different variants of our approach on a data set of 1,600 textile samples with 4-fold cross-validation. Comparing the variants, the best classification accuracies were 0.999 with LR, 0.994 with LDA, 0.996 with KNN, 0.994 with CART, 0.998 with NB, 0.974 with SVM, and 0.999 with NNM.
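The abstract does not say how the multisets of k-neighbourhoods become a single feature vector. One common choice, assumed here, is a bag-of-features count: every distinct neighbourhood pattern seen in training gets a fixed position, and a textile's vector records how often each pattern occurs. The string patterns below are hypothetical stand-ins for encoded hypergraph neighbourhoods.

```python
from collections import Counter

def build_vocabulary(all_multisets):
    """Assign a fixed, sorted position to every pattern seen in training.
    all_multisets: one list of multisets per textile."""
    patterns = set()
    for multisets in all_multisets:
        for ms in multisets:
            patterns.update(ms)  # iterating a Counter yields its distinct keys
    return sorted(patterns)

def multisets_to_vector(multisets, vocabulary):
    """Merge one textile's multisets into a single count vector."""
    counts = Counter()
    for ms in multisets:
        counts.update(ms)  # adds multiplicities for Counters and lists alike
    return [counts[p] for p in vocabulary]

# Hypothetical textiles, each described by multisets of neighbourhood patterns.
textile_a = [Counter({"AB": 2, "BC": 1}), ["AB", "CD"]]
textile_b = [["BC", "DE"]]
vocab = build_vocabulary([textile_a, textile_b])
print(vocab)                                   # ['AB', 'BC', 'CD', 'DE']
print(multisets_to_vector(textile_a, vocab))   # [3, 1, 1, 0]
```

Fixed-length vectors of this kind can be passed unchanged to any of the seven classifiers listed (LR, LDA, KNN, CART, NB, SVM, NNM).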

Convolutional Neural Networks for Automatic Image Quality Control and EARL Compliance of PET Images

2021
DOI: 10.21203/rs.3.rs-964263/v2

Author(s): Elisabeth Pfaehler, Daniela Euba, Andreas Rinscheid, Otto S. Hoekstra, Josee Zijlstra, ...

Keyword(s): Machine Learning, Cross Validation, Training Data, Independent Dataset, PET CT, Image Quality Control, The Cross, The Impact, PET Scanners, Fold Cross Validation

Background: Machine learning studies require a large number of images, often obtained on different PET scanners. When merging these images, the use of harmonized images following EARL standards is essential. However, when retrospective images are included, EARL accreditation might not have been in place. The aim of this study was to develop a convolutional neural network (CNN) that can identify retrospectively whether an image is EARL compliant and whether it meets the older or newer EARL standards. Materials and Methods: Ninety-six PET images acquired on three PET/CT systems were included in the study. All images were reconstructed with the locally preferred clinical protocol and with EARL1- and EARL2-compliant reconstruction protocols. After image pre-processing, one CNN was trained to separate clinical from EARL-compliant reconstructions, and a second CNN was optimized to distinguish EARL1- from EARL2-compliant images. The accuracy of both CNNs was assessed using 5-fold cross-validation. The CNNs were then validated on 24 images acquired on a PET scanner not included in the training data. To assess the impact of image noise on the CNN decision, these 24 images were reconstructed with different scan durations. Results: In the cross-validation, the first CNN classified all images correctly. The second CNN correctly identified 100% of EARL1-compliant and 85% of EARL2-compliant images. The accuracy on the independent dataset was comparable to the cross-validation accuracy, and the scan duration had almost no impact on the results. Conclusion: The two CNNs trained in this study can be used to include images retrospectively in a multi-center setting, for example by applying additional smoothing. This method is especially important for machine learning studies in which images from different PET systems must be harmonized.

Ensemble of Data-Driven Prognostic Algorithms With Weight Optimization and K-Fold Cross Validation

Volume 3: 30th Computers and Information in Engineering Conference, Parts A and B (2010)
DOI: 10.1115/detc2010-29182
Cited By ~ 8

Author(s): Chao Hu, Byeng D. Youn, Pingfeng Wang

Keyword(s): Cross Validation, Real Data, Weighting Scheme, Training Data, Data Driven, Algorithm Selection, Data Set, Weighting Schemes, Testing Data, Fold Cross Validation

The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust, i.e., it may be less accurate when the real data acquired after deployment differ from the testing data; (ii) it wastes the resources spent constructing the algorithms that are discarded at deployment; (iii) it requires testing data in addition to training data, which increases the overall cost of algorithm selection. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach that combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely accuracy-based weighting, diversity-based weighting, and optimization-based weighting, are proposed to determine the weights of the member algorithms. The k-fold cross-validation (CV) is employed to estimate the prediction error required by the weighting schemes. Two case studies demonstrate the effectiveness of the proposed approach. The results suggest that the ensemble approach with any weighting scheme gives more accurate remaining useful life (RUL) predictions than any single algorithm, and that the optimization-based weighting scheme gives the best overall performance of the three weighting schemes.
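The weighted-sum ensemble combines member predictions using weights derived from each member's k-fold CV prediction error. The abstract does not give the accuracy-based formula, so the sketch below assumes weights inversely proportional to each member's CV error and normalised to sum to 1; the error and RUL values are made up for illustration.

```python
def accuracy_based_weights(cv_errors):
    """Assumed scheme: weight_i proportional to 1 / cv_error_i, normalised.
    Every error must be positive."""
    inverse = [1.0 / e for e in cv_errors]
    total = sum(inverse)
    return [w / total for w in inverse]

def ensemble_predict(member_predictions, weights):
    """Weighted-sum ensemble of the members' RUL predictions."""
    return sum(w * p for w, p in zip(weights, member_predictions))

# Hypothetical: two members with CV errors 1.0 and 3.0 predict RUL 100 and 120.
weights = accuracy_based_weights([1.0, 3.0])
print([round(w, 3) for w in weights])                       # [0.75, 0.25]
print(round(ensemble_predict([100.0, 120.0], weights), 3))  # 105.0
```

Because the CV error is estimated from the training data alone, this kind of weighting needs no separate testing set, which addresses shortcoming (iii) above.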

Design and Development of an Information System for Determining Consumer Capability to Take Out KPR (Mortgage) Loans

Jurnal ULTIMA InfoSys, Vol 7 (2), pp. 75-80 (2016)
DOI: 10.31937/si.v7i2.543

Author(s): Adhi Kusnadi, Risyad Ananda Putra

Keyword(s): Data Mining, Low Income, Cross Validation, Classification Tree, Large Population, Housing Development, Good Precision, Index Terms, The Government, Fold Cross Validation

Indonesia has a relatively large population. Over a five-year period, the government runs an annual program to procure one million FLPP housing units, with the aim of providing decent homes for low-income people. FLPP housing development requires precise and fast work from developers, but this is often hampered by the bank process, whose outcome and processing time are difficult to predict. Knowing in advance whether consumers can obtain subsidized credit has several advantages: developers can plan their cash flow better and can replace consumers who would be rejected before their applications enter the bank process. For this reason, we built a system to help developers. Among the many methods that could be used to create this application, we used data mining with a classification tree. Evaluated with 10-fold cross-validation, the application achieved an accuracy of 92%.
Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross-Validation, Consumer Capability

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design, Vol 25 (40), pp. 4296-4302 (2020)
DOI: 10.2174/1381612825666191107092214
Cited By ~ 2

Author(s): Yuan Zhang, Zhenyan Han, Qian Gao, Xiaoyi Bai, Chi Zhang, ...

Keyword(s): Machine Learning, Inclusion Bodies, Cross Validation, Independent Set, K562 Cells, Machine Learning Algorithms, Learning Approaches, Validation Test, Excess Number, Fold Cross Validation

Background: β-thalassemia is a common monogenic disease that is very harmful to human health. It arises from deletions of or defects in β-globin, which reduce synthesis of the β-globin chain and result in a relative excess of α-chains. Inclusion bodies form and deposit on the cell membrane, reducing the ability of red blood cells to deform and giving rise to a group of hereditary haemolytic diseases caused by massive destruction of red blood cells in the spleen. Methods: In this work, machine learning algorithms were employed to build a model that predicts inhibitors of K562 cells, based on 117 inhibitors and 190 non-inhibitors. Results: Using AdaBoost, the overall accuracy (ACC) was 83.1% in a 10-fold cross-validation test and 78.0% in an independent set test, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN, and Bagging. Conclusion: This study indicates that AdaBoost can be applied to build a learning model for predicting inhibitors of K562 cells.
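The abstract credits AdaBoost with the best accuracy but does not say which variant or base learner was used. The core of discrete two-class AdaBoost is the reweighting step after each weak learner, sketched below: the learner's vote weight `alpha` grows as its weighted error falls, and misclassified samples gain weight so the next learner concentrates on them. The four-compound example is purely illustrative.

```python
import math

def adaboost_round(weights, correct):
    """One discrete AdaBoost reweighting step.

    weights: current sample weights, summing to 1.
    correct: True where the weak learner classified the sample correctly.
    Assumes the weighted error lies strictly between 0 and 0.5.
    Returns (alpha, new_weights)."""
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink the weights of correct samples, grow the weights of mistakes.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Four compounds with equal weight; the weak learner misclassifies the last one.
alpha, new_weights = adaboost_round([0.25] * 4, [True, True, True, False])
print(round(alpha, 4))                     # 0.5493
print([round(w, 4) for w in new_weights])  # [0.1667, 0.1667, 0.1667, 0.5]
```

After the update the misclassified samples always carry half of the total weight, so the next weak learner cannot simply repeat the previous one's mistakes; the final classifier is the alpha-weighted vote of all rounds.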

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications, Vol 13 (2020)
DOI: 10.2174/2666255813999200904164539

Author(s): Ritu Khandelwal, Hemlata Goyal, Rajveer Singh Shekhawat

Keyword(s): Machine Learning, Comparative Analysis, Data Science, Training Data, Machine Learning Techniques, Future Trends, Data Set, Learning Stage, Learning Techniques, Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With data science, the business goal is to extract valuable insights from the available data. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, using machine learning techniques (classification and prediction). The first step in building a classifier or prediction model is the learning stage, in which the model is trained on a training data set with some technique or algorithm; the rules generated in this stage form a model that can predict future trends in different types of organizations. Methods: Classification and prediction techniques, namely Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN, are applied to find the most efficient and effective model. All of them can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Conclusion: The comparative analysis uses parameters such as accuracy and the confusion matrix to identify the best possible model for predicting a movie's success. Using the predicted success rate, producers can plan advertising and choose the best time to release the movie to gain higher profits.
Discussion: Data mining is the process of discovering patterns and relationships in large data sets in order to solve business problems and predict upcoming trends. Such predictions can help production houses plan their advertising and costs, and thereby make a movie more profitable.
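The comparison ranks models by accuracy and by the confusion matrix. For the five outcome classes named in the abstract, both can be computed as in the sketch below; the example labels are invented for illustration.

```python
LABELS = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    pos = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m

def accuracy(matrix):
    """Fraction of all predictions that lie on the diagonal."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Invented predictions for four movies.
y_true = ["Hit", "Flop", "Hit", "Average"]
y_pred = ["Hit", "Flop", "Average", "Average"]
m = confusion_matrix(y_true, y_pred)
print(accuracy(m))  # 0.75
```

The off-diagonal cells show which classes a model confuses (here a Hit predicted as Average), information that a single accuracy figure hides.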

Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences, Vol 9 (6), pp. 1128 (2019)
DOI: 10.3390/app9061128
Cited By ~ 12

Author(s): Yundong Li, Wei Hu, Han Dong, Xueyan Zhang

Keyword(s): Machine Learning, Data Augmentation, Hurricane Sandy, Training Data, Aerial Images, Detection Methods, Single Shot, Data Set, Augmentation Strategies, Post Disaster

Aerial cameras, satellite remote sensing, and unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. Manual interpretation of huge volumes of aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. As machine learning has developed, researchers have found that convolutional neural networks can effectively extract features from images. Some deep learning-based target detection methods, such as the single-shot multibox detector (SSD) algorithm, achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on large numbers of labeled samples, and given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of a disaster is difficult. To address this issue, the current study proposes a damaged-building assessment method that uses SSD with pretraining and data augmentation, with the following main aspects. (1) Objects are detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) built on VGG16 is constructed and trained using unlabeled post-disaster images; as a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are used to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy improves overall accuracy by 10% compared with an SSD trained from scratch. The experiments also demonstrate that data augmentation improves mAP and mF1 by 72% and 20%, respectively.
Finally, the method was further verified on another dataset, from Hurricane Irma, confirming that it is feasible.
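Three of the augmentation strategies listed (mirroring, rotation, and Gaussian noise) can be sketched on a grayscale image stored as a list of pixel rows. This is an illustrative sketch, not the authors' pipeline: rotation is restricted to 90 degrees for simplicity, Gaussian blur is omitted, and the noise level `sigma=5.0` is an assumed value, since the abstract gives no parameters.

```python
import random

def mirror(img):
    """Horizontal flip of a 2-D grayscale image stored as a list of rows."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise and clip pixels to the 0-255 range."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row] for row in img]

def augment(img):
    """Return the original image plus three augmented copies."""
    return [img, mirror(img), rotate90(img), gaussian_noise(img, sigma=5.0)]

img = [[1, 2], [3, 4]]
print(mirror(img))    # [[2, 1], [4, 3]]
print(rotate90(img))  # [[3, 1], [4, 2]]
```

For object detection with SSD, the bounding-box coordinates must be transformed together with the image (for example, x-coordinates are mirrored after a horizontal flip); that bookkeeping is not shown here.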
