Comparison of Logistic Regression and Neural Network Algorithms in Predicting Mentoring Outcome Scores and On-Time Graduation

This research is motivated by the desire to make use of the academic records of students who live in the dormitories of Universitas Muhammadiyah Yogyakarta. These dormitories provide character education through a learning program that the university offers to some of its students. The relationship between mentoring in the student dormitories and academic achievement on campus has not previously been examined specifically. The earlier research found by the authors only explained the relationship between campus grades and graduation. Academic achievement is part of the dormitory's vision, and data are available from registration scores through to the report cards of the dormitory learning program, together with campus graduation records. The authors therefore set out to see whether dormitory students graduate on time, which requires data mining for prediction; Logistic Regression and Neural Network were chosen as the algorithms. Using data from the 2014-2015 cohorts for training and testing, five iterations of k-fold cross-validation gave an accuracy of 65% for Logistic Regression and 69% for Neural Network. The Neural Network algorithm therefore tends to perform better than Logistic Regression.
Keywords: data mining, graduation, classification, neural network, prediction, logistic regression.
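The abstract compares the two models by their k-fold cross-validation accuracy. Below is a minimal pure-Python sketch of that evaluation loop for the logistic-regression side only. The toy data, the single standardized feature, the gradient-descent settings and k = 5 are all assumptions made for illustration, because the abstract does not list its features or tooling. A neural network would plug in as a different fit/predict pair.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log-loss; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops before the bias term, which is added separately
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Class 1 if the estimated probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= 0.5)

def cross_val_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(w, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k, scores

# Hypothetical toy data: one standardized feature, label 1 = graduated on time.
X = [[(i - 19.5) / 10] for i in range(40)]
y = [int(i >= 20) for i in range(40)]
mean_acc, fold_accs = cross_val_accuracy(X, y, k=5)
print(f"mean accuracy over 5 folds: {mean_acc:.2f}")
```

Reporting the mean over folds, rather than a single train/test split, is what makes the 65% versus 69% comparison less sensitive to one lucky partition.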

Rule Extraction from Privacy Preserving Neural Network: Application to Banking

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.403-408.920 ◽

2011 ◽

Vol 403-408 ◽

pp. 920-928 ◽

Cited By ~ 1

Author(s):

Nekuri Naveen ◽

V. Ravi ◽

C. Raghavendra Rao

Keyword(s):

Neural Network ◽

Data Mining ◽

Privacy Preservation ◽

Cross Validation ◽

Hybrid Approach ◽

Rule Extraction ◽

Privacy Preserving ◽

Preservation Method ◽

Network Application ◽

Fold Cross Validation

Over the last two decades, privacy policies in areas such as banking, finance and medical research have restricted data owners from sharing their data for data mining purposes. This issue has given rise to a new area of research, namely privacy-preserving data mining. In this paper, we propose a privacy preservation method that employs an Auto Associative Neural Network trained by Particle Swarm Optimization (PSOAANN). The modified (privacy-preserved) input values are fed to a decision tree (DT) and to a rule induction algorithm, Ripper, for rule extraction. The performance of the hybrid is tested on four benchmark and bankruptcy datasets using 10-fold cross-validation. The results are compared with those obtained on the original datasets, where privacy is not preserved. The proposed hybrid approach achieved good results on all datasets.

Comparison of Data Mining Classification Algorithms for Predicting Early Cancer Mortality with the Early Death Cancer Dataset

JOINTECS (Journal of Information Technology and Computer Science) ◽

10.31328/jointecs.v4i2.1008 ◽

2019 ◽

Vol 4 (2) ◽

pp. 63

Author(s):

Panny Agustia Rahayuningsih

Keyword(s):

Neural Network ◽

Data Mining ◽

Random Forest ◽

Cross Validation ◽

Naive Bayes ◽

Early Death ◽

Naïve Bayes ◽

T Test ◽

Fold Cross Validation

Cancer is among the ten leading causes of death in the world. It is a malignant disease that is difficult to cure once it has spread widely. However, detecting cancer cells as early as possible can reduce the risk of death. This study aims to predict the early cancer mortality rate in the European population using five classification algorithms (Decision Tree, Naïve Bayes, k-Nearest Neighbour, Random Forest and Neural Network) and to determine which of them performs best for this task. The study proceeds through several stages: dataset collection, initial data preprocessing, the proposed method, testing of the method with 10-fold cross-validation, evaluation of the results, and a t-test of differences. The significance level (alpha) is 0.05: if the probability is > 0.05, H0 is accepted (not rejected), whereas if it is < 0.05, H0 is rejected. The best performance, an accuracy of 98.35%, was achieved by the Neural Network algorithm. According to the t-test, the best models are Random Forest and Neural Network, Naïve Bayes is fairly good, Decision Tree is adequate, and the weakest algorithm is k-Nearest Neighbour (k-NN).
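The abstract compares algorithms with a t-test at alpha = 0.05 but does not say which variant it used. The sketch below assumes a paired t-test on the ten per-fold accuracies of two algorithms, which is one common choice. With 10 folds there are 9 degrees of freedom, and the two-sided critical value is about 2.262. The per-fold accuracies are hypothetical and are not the paper's results.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of freedom
# (10 folds -> 10 paired differences -> df = 9).
T_CRIT_DF9 = 2.262

def paired_t_statistic(acc_a, acc_b):
    """t statistic of the mean per-fold difference acc_a - acc_b."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    # statistics.stdev uses the sample (n - 1) denominator
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical per-fold accuracies (NOT the paper's results).
fold_diffs = [0.02, 0.03, 0.01, 0.02, 0.04, 0.02, 0.03, 0.01, 0.02, 0.02]
acc_knn = [0.90, 0.91, 0.89, 0.92, 0.90, 0.88, 0.91, 0.90, 0.89, 0.91]
acc_nn = [a + d for a, d in zip(acc_knn, fold_diffs)]

t = paired_t_statistic(acc_nn, acc_knn)
reject_h0 = abs(t) > T_CRIT_DF9  # H0: both algorithms have the same mean accuracy
print(f"t = {t:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

The cross-validation folds share most of their training data, so this simple paired test is known to be somewhat optimistic. Its verdicts are best read as indicative.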

A PROPOSED PARADIGM FOR INTELLIGENT HEART DISEASE PREDICTION SYSTEM USING DATA MINING TECHNIQUES

Journal of Southwest Jiaotong University ◽

10.35741/issn.0258-2724.56.4.19 ◽

2021 ◽

Vol 56 (4) ◽

pp. 220-240

Author(s):

Shimaa Ouf ◽

Ahmed I. B. ElSeddawy

Keyword(s):

Neural Network ◽

Data Mining ◽

Heart Disease ◽

Cross Validation ◽

Prediction Models ◽

Heart Diseases ◽

Optimal Combination ◽

Disease Prediction ◽

Scientific Papers ◽

Fold Cross Validation

Data mining-based systems could have a crucial impact on employees' lifestyles by predicting heart disease. Many scientific papers use data mining techniques to predict heart disease. However, few have addressed the four cross-validation techniques for splitting the data set, which play an important role in selecting the best technique for predicting heart disease. It is important to choose the optimal combination of cross-validation technique and data mining classification technique to enhance the performance of the prediction models. This paper applies four cross-validation techniques (holdout, k-fold cross-validation, stratified k-fold cross-validation, and repeated random) with eight data mining classification techniques (Linear Discriminant Analysis, Logistic Regression, Support Vector Machine, KNN, Decision Tree, Naïve Bayes, Random Forest, and Neural Network) to improve the accuracy of heart disease prediction and to select the best prediction models. It analyzes these techniques on a small and a large dataset collected from different sources, such as Kaggle and the UCI Machine Learning Repository. Evaluation metrics such as accuracy, precision, recall, and F-measure were used to measure the performance of the prediction models. Experiments were performed on the two datasets. For the large dataset (70,000 records), the optimal combination is holdout cross-validation with the neural network, which achieves the highest accuracy of 71.82%. For the small dataset (303 records), the optimal combination is Repeated Random with Random Forest, with an accuracy of 89.01%. The best models will be recommended to physicians in business organizations to help them classify employees at an early stage into one of two categories, cardiac and non-cardiac.
The early detection of heart disease in employees will improve productivity in business organizations.
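Two of the four splitting schemes named above can be illustrated with a small pure-Python sketch: a holdout split, and a stratified k-fold split that keeps each fold's class ratio close to that of the whole dataset. The 30% test fraction, the round-robin dealing and the 30/70 toy labels are assumptions, not details from the paper.

```python
import random
from collections import defaultdict

def holdout_split(n, test_fraction=0.3, seed=0):
    """Shuffle indices once and cut them into a single train/test partition."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_fraction)))
    return idx[:cut], idx[cut:]

def stratified_k_fold(y, k, seed=0):
    """Deal each class's shuffled indices round-robin so every fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped, to balance fold sizes
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds

# Hypothetical labels: 30 cardiac (1) and 70 non-cardiac (0) records.
y = [1] * 30 + [0] * 70
train, test = holdout_split(len(y))
folds = stratified_k_fold(y, k=5)
print(len(train), len(test), [sum(y[i] for i in f) for f in folds])
```

A plain (unstratified) k-fold split would shuffle all indices together, so with a small dataset some folds could end up with noticeably more or fewer cardiac cases than others.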

Rancang Bangun Sistem Informasi Untuk Menentukan Kapabilitas Konsumen Dalam Mengambil Pinjaman KPR

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia is a country with a relatively large population. Over a five-year period, the government holds an annual program to procure 1 million FLPP housing units. The program is intended to provide decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer. This is often hampered by the bank process, because the outcome and the speed of data processing at the bank are difficult to predict. Knowing in advance whether consumers are able to obtain subsidized credit has many advantages. Among others, developers can plan a better cash flow, and they can replace consumers who would be rejected before the bank process begins. For this reason, a system was built to help developers. Many methods can be used to create such an application; one of them is data mining with a classification tree. Under 10-fold cross-validation, the application achieved an accuracy of 92%. Index Terms-Data Mining, Classification Tree, Housing, FLPP, 10-fold-cross Validation, Consumer Capability
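The abstract does not say which splitting criterion its classification tree uses. The sketch below assumes Gini impurity, as in CART-style trees, and shows how one binary split on a single numeric attribute would be chosen. The feature values and the "approve"/"reject" labels are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical attribute (e.g. an income figure) and credit outcome per consumer.
values = [3, 4, 5, 8, 9, 10]
labels = ["reject", "reject", "reject", "approve", "approve", "approve"]
print(best_threshold(values, labels))
```

A full tree applies this search recursively to every attribute within each child node until the nodes are pure or a stopping rule is reached.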

Study on Yang-Xu Using Body Constitution Questionnaire and Blood Variables in Healthy Volunteers

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2016/9437382 ◽

2016 ◽

Vol 2016 ◽

pp. 1-7 ◽

Cited By ~ 7

Author(s):

Hong-Jhang Chen ◽

Yii-Jeng Lin ◽

Pei-Chen Wu ◽

Wei-Hsiang Hsu ◽

Wan-Chung Hu ◽

...

Keyword(s):

Healthy Subjects ◽

Logistic Regression Model ◽

Cross Validation ◽

Blood Biomarkers ◽

Metabolic Characteristics ◽

Body Constitution ◽

Leave One Out ◽

The Relationship ◽

Fold Cross Validation ◽

Blood Variables

Traditional Chinese medicine (TCM) formulates treatment according to body constitution (BC) differentiation. Different constitutions have specific metabolic characteristics and different susceptibilities to certain diseases. This study aimed to assess the Yang-Xu constitution using a body constitution questionnaire (BCQ) and clinical blood variables. A BCQ was employed to assess the clinical manifestation of Yang-Xu. A logistic regression model was used to explore the relationship between BC scores and biomarkers. Leave-one-out cross-validation (LOOCV) and K-fold cross-validation were performed to evaluate the accuracy of the predictive model in practice. Decision trees (DTs) were used to determine possible relationships between blood biomarkers and BC scores. According to the BCQ analysis, 49% of participants had no BC and were classified as healthy subjects. Among them, 130 samples were selected for further analysis and divided into two groups. One group comprised healthy subjects without any BC (68%), while the subjects of the other group, named the sub-healthy group, had three BCs (32%). Six biomarkers, CRE, TSH, HB, MONO, RBC, and LH, were found to have the greatest impact on BCQ outcomes in Yang-Xu subjects. This study indicated significant biochemical differences in Yang-Xu subjects, which may provide a connection between blood variables and the Yang-Xu BC.
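Leave-one-out cross-validation, mentioned above, is k-fold cross-validation with k equal to the number of samples. Each sample is held out once, and a model is trained on all the others. The sketch below shows the loop with a hypothetical nearest-centroid classifier standing in for the study's logistic regression model, purely to keep it short. The one-feature data and group labels are invented for illustration.

```python
def loocv_accuracy(X, y, fit, predict):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)

def fit_centroids(X, y):
    """Hypothetical stand-in model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))

# Hypothetical one-biomarker measurements for two groups.
X = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
y = ["healthy"] * 3 + ["sub-healthy"] * 3
print(loocv_accuracy(X, y, fit_centroids, predict_centroid))
```

With n samples, LOOCV fits n models. That is affordable for 130 samples but grows expensive on large datasets, which is one reason K-fold with a small K is often reported alongside it.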

Genome-wide 5hmC profiles to enable cancer detection and tissue of origin classification in breast, colorectal, lung, ovarian, and pancreatic cancers.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.3044 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. 3044-3044

Author(s):

David Haan ◽

Anna Bergamaschi ◽

Yuhong Ning ◽

William Gibb ◽

Michael Kesling ◽

...

Keyword(s):

Logistic Regression ◽

Cancer Detection ◽

Normal Tissue ◽

Cross Validation ◽

Tissue Type ◽

Normal Tissues ◽

Pancreatic Cancers ◽

Tissue Of Origin ◽

Fold Cross Validation

Background: Epigenomics assays have recently become popular tools for identifying molecular biomarkers, both in tissue and in plasma. In particular, the 5-hydroxymethylcytosine (5hmC) method has been shown to capture the epigenomic regulation of gene expression and subsequent gene activity, with different patterns across several tumor and normal tissue types. In this study we show that 5hmC profiles enable discrete classification of tumor and normal tissue for breast, colorectal, lung, ovary and pancreas. Such classification was also recapitulated in cfDNA from patients with breast, colorectal, lung, ovarian and pancreatic cancers. Methods: DNA was isolated from 176 fresh frozen tissues from breast, colorectal, lung, ovary and pancreas (44 per tumor per tissue type and up to 11 tumor tissues for each stage (I-IV)) and up to 10 normal tissues per tissue type. cfDNA was isolated from the plasma of 783 non-cancer individuals and 569 cancer patients. Plasma-isolated cfDNA and tumor genomic DNA were enriched for the 5hmC fraction using chemical labelling, sequenced, and aligned to a reference genome to construct feature sets of 5hmC patterns. Results: 5hmC multinomial logistic regression analysis was employed across tumor and normal tissues and identified a set of specific and discrete tumor and normal tissue gene-based features. This indicates that samples can be classified by tissue of origin with a high degree of accuracy, regardless of source, and that normal and tumor status can also be distinguished. Next, we applied a stacked ensemble machine learning algorithm, combining multiple logistic regression models across diverse feature sets, to the cfDNA dataset. This dataset comprised 783 non-cancers and 569 cancers, including 67 breast, 118 colorectal, 210 lung, 71 ovarian and 100 pancreatic cancers.
We identified a genomic signature that enables the classification of non-cancer versus cancer with an outer-fold cross-validation sensitivity of 49% (CI 45%-53%) at 99% specificity. Individual cancer outer-fold cross-validation sensitivities at 99% specificity were as follows: breast 30% (CI 119% -42%); colorectal 41% (CI 32%-50%); lung 49% (CI 42%-56%); ovarian 72% (CI 60%-82%); pancreatic 56% (CI 46%-66%). Conclusions: This study demonstrates that 5hmC profiles can distinguish cancer and normal tissues based on their origin. Further, 5hmC changes in cfDNA enable detection of several cancer types: breast, colorectal, lung, ovarian and pancreatic cancers. Our technology provides a non-invasive tool for cancer detection with low-risk sample collection, enabling better compliance than current screening methods. Among other uses, we believe our technology could be applied to asymptomatic high-risk individuals, enabling enrichment for the subjects who most need a diagnostic imaging follow-up.
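"Sensitivity at 99% specificity" fixes a decision threshold on the non-cancer scores, then measures the fraction of cancers that score above it. The sketch below assumes that a higher model score means more likely cancer and that all scores come from held-out (outer-fold) predictions. The abstract does not describe its exact thresholding procedure, and the scores below are invented.

```python
import math

def threshold_at_specificity(neg_scores, specificity=0.99):
    """Smallest observed threshold t with at least `specificity` of negatives scoring <= t."""
    s = sorted(neg_scores)
    k = math.ceil(specificity * len(s))  # number of negatives that must fall at or below t
    return s[k - 1]

def sensitivity_at(pos_scores, threshold):
    """Fraction of positives called positive, i.e. scoring strictly above the threshold."""
    return sum(score > threshold for score in pos_scores) / len(pos_scores)

# Hypothetical held-out scores: 100 non-cancer samples and 4 cancer samples.
neg = [i / 100 for i in range(100)]
pos = [0.5, 0.985, 0.99, 1.5]
thr = threshold_at_specificity(neg, 0.99)
print(thr, sensitivity_at(pos, thr))
```

Fixing specificity this high keeps false positives rare in a screening population, at the cost of the modest sensitivities reported above.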

Applying Logistic Regression Data Mining Techniques for Ethiopian Government Agricultural Open Data Sets

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/061022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 473-488

Keyword(s):

Data Mining ◽

Logistic Regression ◽

Agricultural Sector ◽

Model Building ◽

Open Data ◽

Research Process ◽

Attribute Selection ◽

Sufficient Evidence ◽

Vegetable Crops ◽

Logistic Regression Algorithm

Ethiopia has great agricultural potential because of its vast areas of fertile land, diverse climate, generally adequate rainfall, and large labor force. Given the sector's verified importance to the Ethiopian economy, there is sufficient evidence that its potential can be expanded considerably by attracting investors. This study applies classification techniques to develop a predictive model that can estimate the yield of vegetable crops and the correlation of crops based on their class. Building the model involved several steps, of which data collection, data preprocessing, and model building and validation were the major ones. The tasks performed in each step are as follows. The data were collected from the Food and Agriculture Organization of the United Nations (FAO). Preprocessing comprised data cleaning, discretization and attribute selection. The final step, model building and validation, was performed using the selected tools and techniques. The data mining tool used in this research was Weka. Within it, the logistic regression algorithm was selected because it achieved higher accuracy. After successive experiments with this software, a model was obtained that classifies crop yield as high, medium or low with an accuracy of 88.6%. Experimental results show that logistic regression is a very helpful tool for depicting the contribution of yield estimation and crop correlation. The reported findings are optimistic, making the proposed model a useful tool in the decision-making process. Eventually, the whole research process can be a good input for further in-depth research.

Novel Ensemble of Multivariate Adaptive Regression Spline with Spatial Logistic Regression and Boosted Regression Tree for Gully Erosion Susceptibility

Remote Sensing ◽

10.3390/rs12203284 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3284

Author(s):

Paramita Roy ◽

Subodh Chandra Pal ◽

Alireza Arabameri ◽

Rabin Chakrabortty ◽

Biswajeet Pradhan ◽

...

Keyword(s):

Logistic Regression ◽

River Basin ◽

Cross Validation ◽

Gully Erosion ◽

Multivariate Adaptive Regression Spline ◽

Boosted Regression Tree ◽

Regression Spline ◽

Adaptive Regression ◽

Fold Cross Validation ◽

Very High

The extreme form of land degradation through different forms of erosion is one of the major problems in the sub-tropical, monsoon-dominated region. The formation and development of gullies is the dominant form, or active process, of erosion in this region. Identifying erosion-prone regions is therefore necessary to avoid this situation and to maintain the correspondence between different spheres of the environment. The major goal of this study is to evaluate gully erosion susceptibility in the rugged topography of the Hinglo River Basin of eastern India, which ultimately contributes to sustainable land management practices. Because of data instability, the weakness of the classifier and its ability to handle data, the accuracy of a single method is not very high. Thus, in this study, a novel resampling algorithm was considered to increase the robustness and accuracy of the classifier. Gully erosion susceptibility maps were prepared using boosted regression trees (BRT), multivariate adaptive regression spline (MARS) and spatial logistic regression (SLR) with the proposed resampling techniques. The resampling algorithm increased the efficiency of all prediction models by improving the nature of the classifier. Each variable in the gully inventory map was randomly allocated using 5-fold cross-validation, 10-fold cross-validation, bootstrap and optimism bootstrap, each consisting of 30% of the database. The ensemble model was tested using 70% of the data and validated with the other 30% using the K-fold cross-validation (CV) method, to evaluate the influence of the random selection of training and validation data. All resampling methods are associated with higher accuracy, but SLR optimism bootstrap is more optimal than the other methods owing to its robust nature. The AUC values of BRT optimism bootstrap, MARS optimism bootstrap and SLR optimism bootstrap are 87.40%, 90.40% and 90.60%, respectively.
According to the SLR optimism bootstrap, 107,771 km² (27.51%) of this region has very high to high susceptibility to gully erosion. These potential gully development areas were found primarily in the Hinglo River Basin, where lateral exposure with scarce vegetation was mainly observed. The outcome of this work can help policy-makers implement remedial measures to minimize the damage caused by gully erosion.
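The AUC values above summarize how well each susceptibility model ranks gully locations above non-gully locations. A minimal sketch of the rank-based (Mann-Whitney) form of AUC follows: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting half. The scores are hypothetical, and the study's own computation may differ (for example, integrating an ROC curve). The two forms agree on the same data when ties are handled this way.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly, ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical susceptibility scores at gully (positive) and non-gully (negative) sites.
gully = [0.9, 0.8, 0.4]
non_gully = [0.3, 0.5, 0.1]
print(auc(gully, non_gully))  # 8 of the 9 pairs are ordered correctly
```

The double loop is O(n·m). Sorting the scores once gives the same value in O((n+m) log(n+m)) for large inventories.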

Clinical risk factors and predictive tool of bacteremia in patients with cirrhosis

Journal of International Medical Research ◽

10.1177/0300060520919220 ◽

2020 ◽

Vol 48 (5) ◽

pp. 030006052091922

Author(s):

Qiao Yang ◽

Xian Zhong Jiang ◽

Yong Fen Zhu ◽

Fang Fang Lv

Keyword(s):

Risk Factors ◽

Risk Assessment ◽

Logistic Regression ◽

Cross Validation ◽

Bloodstream Infections ◽

Roc Curves ◽

Predictive Tool ◽

Reactive Protein ◽

Significant Difference ◽

Fold Cross Validation

Objective We aimed to analyze the risk factors for, and to establish a predictive tool for, the occurrence of bloodstream infections (BSI) in patients with cirrhosis. Methods A total of 2888 patients with cirrhosis were retrospectively included. Risk factors for BSI were tested by multivariate analysis using logistic regression, and the multivariate logistic regression model was validated using five-fold cross-validation. Results Variables independently associated with a higher incidence of BSI were white blood cell count (odds ratio [OR] = 1.094, 95% confidence interval [CI] 1.063–1.127), C-reactive protein (OR = 1.005, 95% CI 1.002–1.008), total bilirubin (OR = 1.003, 95% CI 1.002–1.004), and previous antimicrobial exposure (OR = 4.556, 95% CI 3.369–6.160); albumin (OR = 0.904, 95% CI 0.883–0.926), platelet count (OR = 0.996, 95% CI 0.994–0.998), and serum creatinine (OR = 0.989, 95% CI 0.985–0.994) were associated with lower odds of BSI. The area under the receiver operating characteristic (ROC) curve of the risk assessment scale was 0.850, with a sensitivity of 0.762 and a specificity of 0.801. There was no significant difference between the ROC curves of the cross-validation and the risk assessment scale. Conclusions We developed a predictive tool for BSI in patients with cirrhosis, which could help with early identification of such episodes at admission and improve outcomes in these patients.
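In a logistic regression, each reported odds ratio is exp(β) for one unit increase in the variable. A Wald 95% interval is exp(β ± 1.96·SE). Assuming the abstract's intervals are Wald intervals (it does not say), the sketch below recovers β and its standard error from a reported OR and CI, then rebuilds the interval. It uses the previous-antimicrobial-exposure figures as input; small mismatches come from the rounding of the published numbers.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence interval from a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_or_ci(or_value, lower, upper, z=Z_95):
    """Invert a reported OR and Wald CI back to (beta, standard error)."""
    beta = math.log(or_value)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Previous antimicrobial exposure: OR = 4.556, 95% CI 3.369-6.160 (from the abstract).
beta, se = coef_from_or_ci(4.556, 3.369, 6.160)
or_value, lo, hi = odds_ratio_ci(beta, se)
print(f"beta = {beta:.4f}, SE = {se:.4f}, CI rebuilt as {lo:.3f}-{hi:.3f}")
```

For an OR close to 1, such as 1.003 for total bilirubin, the per-unit effect looks negligible but can add up over the variable's typical range, because the effects multiply: exp(β·Δx).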

Coordinate Transformation between Global and Local Datums Based on Artificial Neural Network with K-Fold Cross-Validation: A Case Study, Ghana

Earth Sciences Research Journal ◽

10.15446/esrj.v23n1.63860 ◽

2019 ◽

Vol 23 (1) ◽

pp. 67-77 ◽

Cited By ~ 3

Author(s):

Yao Yevenyo Ziggah ◽

Hu Youjian ◽

Alfonso Rodrigo Tierra ◽

Prosper Basommi Laari

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Coordinate Transformation ◽

Cross Validation ◽

Data Partitioning ◽

Transformation Model ◽

Data Partition ◽

Data Set ◽

Artificial Neural ◽

Fold Cross Validation

The popularity of the Artificial Neural Network (ANN) methodology has been growing in a wide variety of areas in geodesy and the geospatial sciences. Its ability to perform coordinate transformation between different datums has been well documented in the literature. When ANN methods are applied to coordinate transformation, only the train-test (hold-out cross-validation) approach has usually been used to evaluate their performance. In this approach, the data set is divided into two disjoint subsets, namely training (model building) and testing (model validation). However, one major drawback of the hold-out cross-validation procedure is inappropriate data partitioning: an improper split of the data can lead to high variance and bias in the results. Moreover, hold-out cross-validation is not suitable when the dataset is sparse. For these reasons, the K-fold cross-validation approach has been recommended. This study therefore, for the first time, explored the potential of using K-fold cross-validation to assess the performance of a radial basis function neural network (RBFNN) and the Bursa-Wolf model in a data-insufficient situation in the Ghana geodetic reference network. Statistical analysis of the results revealed that incorrect data partitioning can lead to false reporting of the predictive performance of the transformation model. The RBFNN and the Bursa-Wolf model produced transformation accuracies of 0.229 m and 0.469 m, respectively, and maximum horizontal errors of 0.881 m and 2.131 m, respectively. According to the cadastral surveying and plan production requirements set by the Ghana Survey and Mapping Division, the obtained results are applicable.
This study will contribute to the use of the K-fold cross-validation approach in developing countries that, like Ghana, have sparse datasets, as well as in the geodetic sciences, where ANN users seldom apply this statistical resampling technique.
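The Bursa-Wolf model compared against the RBFNN is a seven-parameter similarity transformation: three translations, three small rotations and one scale factor. The sketch below applies it under the small-angle approximation. The position-vector sign convention is an assumption, since the abstract does not state one; under the coordinate-frame convention the rotation terms change sign. The parameter values in the usage are illustrative and are not the Ghana datum parameters.

```python
def bursa_wolf(xyz, tx, ty, tz, rx, ry, rz, s_ppm):
    """Seven-parameter Bursa-Wolf transform (small-angle, position-vector convention).

    Translations in metres, rotations in radians, scale in parts per million.
    """
    x, y, z = xyz
    m = 1.0 + s_ppm * 1e-6
    # Linearized rotation matrix [[1, -rz, ry], [rz, 1, -rx], [-ry, rx, 1]] applied to (x, y, z)
    xt = tx + m * (x - rz * y + ry * z)
    yt = ty + m * (rz * x + y - rx * z)
    zt = tz + m * (-ry * x + rx * y + z)
    return xt, yt, zt

# Illustrative parameters only: a pure translation, then a tiny rotation about Z.
print(bursa_wolf((0.0, 0.0, 0.0), 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0))
print(bursa_wolf((1.0, 0.0, 0.0), 0.0, 0.0, 0.0, 0.0, 0.0, 1e-6, 0.0))
```

In a K-fold assessment like the one described, the seven parameters would be estimated by least squares on the training folds. The transform would then be applied to the held-out points, and the residuals would be summarized.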
