A Machine Learning View for Health Data Mining Emphasizes on the Decision Trees

Author(s):  
Nahida Akhter Shemu ◽  
Md. Zakaria Hossain ◽  
Sabbir M. Saleh ◽  
Khondoker Ali Asgor Pavel
2017 ◽  
Vol 27 (09n10) ◽  
pp. 1579-1589 ◽  
Author(s):  
Reinier Morejón ◽  
Marx Viana ◽  
Carlos Lucena

Data mining is a hot topic that attracts researchers from different areas, such as databases, machine learning, and agent-oriented software engineering. As data volumes grow, there is an increasing need to extract knowledge from large datasets that are very difficult to handle and process with traditional methods. Software agents can play a significant role in performing data mining processes more efficiently. For instance, they can perform the selection, extraction, preprocessing, and integration of data, as well as parallel, distributed, or multisource mining. This paper proposes a framework based on multiagent systems to apply data mining techniques to health datasets. As usage scenarios, we use hypothyroidism and diabetes datasets and run two different mining processes in parallel on each dataset.
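The abstract does not describe the framework's agent interfaces. The following is therefore only a minimal sketch of running two different mining processes in parallel on one health dataset. It uses Python's standard `concurrent.futures` thread pool as a stand-in for software agents. The agent functions, record layout, and toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy records: (age, glucose, diagnosis). Illustrative values only.
RECORDS = [
    (25, 85, "healthy"),
    (47, 160, "diabetic"),
    (52, 180, "diabetic"),
    (33, 90, "healthy"),
    (61, 150, "diabetic"),
]


def class_frequency_agent(records):
    """Hypothetical mining task 1: count how often each diagnosis occurs."""
    return Counter(label for _, _, label in records)


def mean_glucose_agent(records):
    """Hypothetical mining task 2: mean glucose level per diagnosis."""
    groups = {}
    for _, glucose, label in records:
        groups.setdefault(label, []).append(glucose)
    return {label: mean(values) for label, values in groups.items()}


def run_agents_in_parallel(records, agents):
    """Run every mining 'agent' on the same dataset concurrently; collect results by name."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(agent, records) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    results = run_agents_in_parallel(
        RECORDS,
        {"frequency": class_frequency_agent, "mean_glucose": mean_glucose_agent},
    )
    print(results)
```

In a real multiagent system, each agent would typically run in its own process or on its own host and exchange messages. The thread pool here only keeps the sketch self-contained.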


Crystals ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1218
Author(s):  
Natasha Dropka ◽  
Klaus Böttcher ◽  
Martin Holena

The aim of this study was to assess the ability of various data mining and supervised machine learning techniques, namely correlation analysis, k-means clustering, principal component analysis, and decision trees (regression and classification), to derive, optimize, and understand the factors influencing VGF-GaAs growth. Training data were generated by Computational Fluid Dynamics (CFD) simulations and consisted of 130 datasets with 6 inputs (growth rate and the power of 5 heaters) and 5 outputs (interface position and deflection, and temperatures at various positions in the GaAs). The data mining results confirmed that the training data were well dispersed and that dimensionality reduction was not feasible. Data clustering was observed in relation to the position of the crystallization front relative to the side heaters. Based on statistical performance criteria and training results, decision trees identified the most decisive inputs and their ranges for a favorable interface shape and for keeping the GaAs temperature outside the range in which heavy arsenic evaporation occurs. Decision trees are a recommended machine learning technique: they train quickly and reach acceptable predictive accuracy from a small volume of CFD training data. They can also provide guidelines for understanding the crystal growth process, which is a prerequisite for growing low-cost, high-quality bulk crystals.
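A regression tree identifies the "most decisive" input at its root. It chooses the input and threshold whose split most reduces the sum of squared errors (SSE) of the output. The sketch below shows only that root-split search in plain Python. It is not the study's implementation, and the two inputs and the toy data are hypothetical stand-ins for the CFD variables.

```python
def sse(values):
    """Sum of squared errors around the mean: the impurity a regression tree minimizes."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Find the single (feature, threshold) split that most reduces total SSE.

    X: list of rows of numeric inputs; y: list of numeric outputs.
    Returns (feature_index, threshold, sse_after_split). feature_index is None
    if no split improves on the unsplit data.
    """
    best = (None, None, sse(y))
    for j in range(len(X[0])):
        thresholds = sorted(set(row[j] for row in X))
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for t in thresholds[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if total < best[2]:
                best = (j, t, total)
    return best


# Hypothetical toy data (not the study's CFD data): inputs are
# (growth_rate, heater_power); the output depends only on heater_power.
X = [(1.0, 10), (2.0, 10), (1.5, 20), (2.5, 20), (1.2, 30), (2.2, 30)]
y = [0.1, 0.1, 0.5, 0.5, 0.6, 0.6]

if __name__ == "__main__":
    feature, threshold, err = best_split(X, y)
    names = ["growth_rate", "heater_power"]
    print(f"most decisive input: {names[feature]} <= {threshold} (SSE {err:.3f})")
```

A full tree applies the same search recursively to each side of the split. Reading the chosen thresholds off the tree yields input ranges of the kind the abstract describes.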


Author(s):  
Hyontai Sug

For classification tasks in machine learning, independence between conditional attributes is a precondition for successful data mining. Decision trees are among the most widely used machine learning algorithms because they are easy to understand. Dependency between conditional attributes can produce more complex trees, so supplying mutually independent conditional attributes is very important. Decision trees, like other machine learning algorithms, require conditional attributes that are independent of each other and dependent only on the decision attribute. A statistical method for checking independence between attributes is the chi-square test, but it is effective only for categorical attributes. Its applicability is therefore limited, because most datasets used in data mining contain a mix of categorical and numerical attributes. To overcome this problem, a novel data-based method for testing dependency between conditional attributes, built on functional dependency, is proposed; it can be applied to any dataset irrespective of the attributes' data types. After highly dependent conditional attributes are removed, better decision trees can be generated. Experiments were performed to show the effectiveness of the method, and they showed very good results.
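For reference, the chi-square test of independence compares the observed co-occurrence counts of two categorical attributes with the counts expected if they were independent. The plain-Python sketch below computes the Pearson statistic and its degrees of freedom. The function name and toy attributes are illustrative. This is not the functional-dependency method the paper proposes, which the abstract does not specify.

```python
from collections import Counter


def chi_square_statistic(a, b):
    """Pearson chi-square statistic for independence of two categorical attributes.

    a, b: equal-length sequences of category labels (one pair per record).
    Returns (statistic, degrees_of_freedom).
    """
    if len(a) != len(b) or not a:
        raise ValueError("attributes must be non-empty and of equal length")
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    observed = Counter(zip(a, b))
    stat = 0.0
    for x in count_a:
        for y in count_b:
            # Expected cell count under independence: row total * column total / n.
            expected = count_a[x] * count_b[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(count_a) - 1) * (len(count_b) - 1)
    return stat, dof


# Hypothetical categorical attributes for six records.
SMOKER = ["yes", "yes", "no", "no", "yes", "no"]
COUGH = ["yes", "yes", "no", "no", "no", "yes"]

if __name__ == "__main__":
    print(chi_square_statistic(SMOKER, COUGH))
```

The statistic is compared with a chi-square critical value for its degrees of freedom (about 3.841 for 1 degree of freedom at a 0.05 significance level). Numerical attributes would first have to be discretized into categories, which is the limitation the abstract points out.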


2021 ◽  
Vol 2021 ◽  
pp. 1-5
Author(s):  
Pengyuan Wang ◽  
Jie Li

This article analyzes how data mining technology is applied in medical and health management systems and uses machine learning algorithms to design a medical and health data mining system. The system collects patients' physical health data using wireless sensing technology and analyzes the data with machine learning algorithms. The collected health data are uploaded to the system for cluster analysis. Finally, the method is applied to mining patients' diagnosis data, and examples demonstrate the effectiveness of the classification method in the medical field.
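The abstract does not name the clustering algorithm used. As one common choice, the following is a minimal sketch of Lloyd's k-means applied to sensor readings, written in plain Python. The readings (heart rate, body temperature), the function name, and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical wearable-sensor readings: (heart rate in bpm, body temperature in deg C).
READINGS = [(72, 36.6), (75, 36.7), (70, 36.5), (110, 38.5), (115, 38.9), (108, 38.4)]

if __name__ == "__main__":
    centroids, labels = kmeans(READINGS, k=2)
    print(centroids, labels)
```

Here, heart rate dominates the Euclidean distance because of its larger scale. In practice, features are usually standardized before clustering.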


Author(s):  
Alven Safik Ritonga ◽  
Isnaini Muhandhis

An increase in tourist visits to a destination is influenced by tourists' satisfaction during their visit. To determine whether a tourist destination meets tourists' expectations, tourist satisfaction must be evaluated. The aim of this study is to obtain a highly accurate classification model for tourist-destination satisfaction reviews and to produce a decision-support tool for developing tourist destinations. The data used in this study are high-dimensional. This lengthens the computation time for classification and can make the analysis impractical or infeasible. Dimensionality reduction is therefore applied to obtain a much smaller data dimension while preserving the integrity of the original data. To classify tourist-destination satisfaction reviews, the study combines Principal Component Analysis (PCA) for dimensionality reduction with three data mining methods: Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Trees. The study uses secondary data taken from the UCI Machine Learning Repository. Combining PCA with the three methods improved classification accuracy for some of them. Of the three methods, SVM-PCA achieved the best accuracy at 91.50%, followed by ANN-PCA at 89.46% and Decision Tree-PCA at 88.78%.
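To illustrate the PCA step only, the following is a minimal plain-Python sketch. It finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it. This is an assumption-laden illustration, not the study's pipeline, which also trains SVM, ANN, and decision tree classifiers. The function names and toy data are hypothetical, and library implementations such as scikit-learn's `PCA` compute all components via a singular value decomposition instead.

```python
import math


def first_principal_component(data, iterations=500):
    """First principal component of `data` via power iteration on the covariance matrix.

    data: list of equal-length numeric rows (at least two rows).
    Returns (direction, component_variance, total_variance, means), where
    direction is a unit vector and component_variance / total_variance is
    the share of variance the component retains.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Fixed, non-symmetric start vector. Power iteration assumes it is not
    # orthogonal to the top eigenvector and that the top eigenvalue is unique.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v
        v = [x / norm for x in w]
    component_variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, component_variance, total_variance, means


def project(data, direction, means):
    """Reduce each row to a single coordinate: its projection on `direction`."""
    return [sum((row[j] - means[j]) * direction[j] for j in range(len(row)))
            for row in data]


# Hypothetical 2-D data lying on the line y = 2x.
DATA = [(1, 2), (2, 4), (3, 6), (4, 8)]

if __name__ == "__main__":
    direction, var, total, means = first_principal_component(DATA)
    print("direction:", direction)
    print("retained variance share:", var / total)
    print("1-D scores:", project(DATA, direction, means))
```

The retained-variance share indicates how much of the original data's spread survives the reduction, one way to judge whether the reduced data still preserve the integrity of the original.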


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr

2019 ◽  
Vol 12 (3) ◽  
pp. 171-179 ◽  
Author(s):  
Sachin Gupta ◽  
Anurag Saxena

Background: The bullwhip effect is the amplification of variability in production or procurement relative to the smaller variability in demand or sales. It is considered an obstacle to supply chain optimization because it causes inefficiencies in the supply chain. Operations and supply chain management consultants, managers, and researchers are rigorously studying the causes behind the dynamic nature of supply chains. The causes they have identified include shorter product life cycles, changes in technology, changes in consumer preferences, and globalization. Most of the literature on the bullwhip effect is based on simulations and mathematical models; exploring it with machine learning is the novel approach of the present study. Methods: The present study uses secondary data to explore the operational and financial variables affecting the bullwhip effect. Data mining and machine learning techniques are used to identify the variables affecting the bullwhip effect in Indian sectors. The RapidMiner tool was used for data mining, and 10-fold cross-validation was performed. After classification, a Weka Alternating Decision Tree (w-ADT) was built to help decision makers mitigate the bullwhip effect. Results: Of the 19 selected variables affecting the bullwhip effect, 7 were selected that give the highest accuracy with minimum deviation. Conclusion: Classification using machine learning provides effective tools and techniques for exploring the bullwhip effect in supply chain management.
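The abstract defines the bullwhip effect in words but does not state how it was measured. A common way to quantify it, assumed here, is the ratio of order (or production) variance to demand (or sales) variance, where a value above 1 signals amplification. The sketch below computes that ratio with Python's `statistics` module. The function name and series are hypothetical, and the study's own classification pipeline (RapidMiner, w-ADT) is not reproduced.

```python
from statistics import variance


def bullwhip_ratio(orders, demand):
    """Variance of orders (or production) divided by variance of demand (or sales).

    A ratio above 1 means variability is amplified upstream, i.e. a bullwhip effect.
    """
    demand_var = variance(demand)
    if demand_var == 0:
        raise ValueError("demand variance is zero; the ratio is undefined")
    return variance(orders) / demand_var


# Hypothetical weekly series for one firm (illustrative numbers only).
DEMAND = [100, 102, 98, 101, 99]
ORDERS = [100, 110, 90, 105, 95]

if __name__ == "__main__":
    ratio = bullwhip_ratio(ORDERS, DEMAND)
    print(f"bullwhip ratio = {ratio:.1f}", "(amplified)" if ratio > 1 else "(not amplified)")
```

A binary label such as "ratio above 1" could serve as the class for a decision tree classifier trained on operational and financial variables, though the abstract does not say how the study defined its classes.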

