Decision Tree-Based Data Mining and Rule Induction for Identifying High Quality Groundwater Zones to Water Supply Management: a Novel Hybrid Use of Data Mining and GIS

2019 ◽  
Vol 34 (1) ◽  
pp. 139-154 ◽  
Author(s):  
Mehrdad Jeihouni ◽  
Ara Toomanian ◽  
Ali Mansourian

Abstract. Groundwater is an important source of drinking water in both arid and semi-arid regions. Nevertheless, locating high-quality drinking water is a major challenge in such areas. Against this background, this study applies and compares five decision tree-based data mining algorithms, namely Ordinary Decision Tree (ODT), Random Forest (RF), Random Tree (RT), Chi-square Automatic Interaction Detector (CHAID), and Iterative Dichotomiser 3 (ID3), for rule induction in order to identify high-quality groundwater zones for drinking purposes. The proposed methodology first extracts the key variables affecting water quality (electrical conductivity, pH, hardness and chloride) from a total of eight available parameters and uses them as inputs for the rule induction process. The algorithms were evaluated on both continuous and discrete datasets. The findings suggested that rule induction performed better on the continuous dataset than on the discrete dataset. Based on the validation results for the continuous dataset, RF and ODT showed higher performance and RT showed acceptable performance. Groundwater quality maps were generated in a GIS environment by combining the distribution maps of the effective parameters using the rules induced by RF, ODT, and RT. The generated maps reveal a decline in groundwater quality from south to north as well as from east to west in the study area. RF showed the highest performance (accuracy of 97.10%) among the algorithms, so the map generated from the RF-induced rules is the more reliable. The RF and ODT methods are more suitable for continuous datasets and can be applied for rule induction to determine water quality with higher accuracy than the other tested algorithms.
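As an illustration of what rule induction from a decision tree means in practice, the sketch below walks every root-to-leaf path of a small tree and prints it as an if-then rule. The tree, its thresholds, the class labels and the helper names (`extract_rules`, `classify`) are hypothetical and chosen only for illustration; they are not the rules or values reported in the study.

```python
from typing import List, Optional, Tuple, Union

# A node is either a leaf label (str) or a tuple
# (feature_name, threshold, left_subtree, right_subtree),
# where the left branch is taken when feature <= threshold.
Node = Union[str, Tuple[str, float, "Node", "Node"]]

# Hypothetical tree: thresholds and labels are illustrative assumptions,
# not values taken from the study.
TREE: Node = (
    "EC", 1500.0,
    ("chloride", 250.0, "high quality", "medium quality"),
    "low quality",
)


def extract_rules(node: Node, conditions: Optional[List[str]] = None) -> List[str]:
    """Turn each root-to-leaf path into one 'IF ... THEN ...' rule."""
    conditions = conditions or []
    if isinstance(node, str):  # leaf: emit the accumulated conditions
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node}"]
    feature, threshold, left, right = node
    return (
        extract_rules(left, conditions + [f"{feature} <= {threshold}"])
        + extract_rules(right, conditions + [f"{feature} > {threshold}"])
    )


def classify(node: Node, sample: dict) -> str:
    """Follow the tree from the root to a leaf for one sample."""
    while not isinstance(node, str):
        feature, threshold, left, right = node
        node = left if sample[feature] <= threshold else right
    return node


rules = extract_rules(TREE)
for rule in rules:
    print(rule)
```

In a GIS workflow, rules of this form could then be evaluated cell by cell over the parameter distribution maps; that step is not shown here.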

Author(s):  
Moloud Abdar ◽  
Sharareh R. Niakan Kalhori ◽  
Tole Sutikno ◽  
Imam Much Ibnu Subroto ◽  
Goli Arji

Heart diseases are among the nation’s leading causes of mortality and morbidity. Data mining techniques can predict the likelihood of patients developing heart disease. The purpose of this study is to compare different data mining algorithms for the prediction of heart disease. After feature analysis, models based on five algorithms, namely decision tree (C5.0), neural network, support vector machine (SVM), logistic regression and k-nearest neighbors (KNN), were developed and validated. The C5.0 decision tree built the model with the greatest accuracy, 93.02%, while KNN, SVM and the neural network achieved 88.37%, 86.05% and 80.23% respectively. The results of the decision tree are easy to interpret and apply; its rules can be readily understood by different clinical practitioners.
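The comparison described above rests on classification accuracy, the fraction of validation cases a model labels correctly. A minimal sketch of that calculation follows; the labels and the model predictions are made-up toy values, and the helper names `accuracy` and `rank_models` are assumptions for illustration, not part of the study.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_models(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scores = [(name, accuracy(y_true, pred)) for name, pred in predictions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy validation labels (1 = disease, 0 = no disease); illustrative only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 1],  # 7 of 8 correct
    "model_b": [1, 1, 1, 0, 0, 0, 1, 1],  # 5 of 8 correct
}

ranking = rank_models(y_true, toy_predictions)
for name, score in ranking:
    print(f"{name}: {score:.2%}")
```

Accuracy alone can hide differences in sensitivity and specificity, which usually matter in clinical settings; the sketch covers only the metric named in the abstract.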


2020 ◽  
Vol 99 (6) ◽  
pp. 563-568
Author(s):  
Yuliya A. Novikova ◽  
K. B. Friedman ◽  
V. N. Fedorov ◽  
A. A. Kovshov ◽  
N. A. Tikhonova ◽  
...  

Introduction. Regulation of drinking water quality is a very important area of health care and of improving the quality of life of the population of the Russian Federation. The aim of this work is to develop a model for assessing drinking water quality and calculating the share of the population, including the urban population, provided with high-quality drinking water from centralized water supply systems, taking into account new methodological approaches to the evaluation of drinking water quality, using the example of water supply to settlements in the Leningrad Region. Material and methods. Data on the organization of centralized cold water supply systems and drinking water quality monitoring systems, together with the results of laboratory studies of drinking water quality in the cities of Volkhov, Svetogorsk, Slantsy and Tosno, were studied. The results were processed statistically, the quality categories of the drinking water supplied to the population were determined, and the number of people provided with high-quality drinking water from the water supply system was calculated in accordance with Guidelines 2.1.4.0143-19. Results. In 2018, 100% of the population was provided with quality drinking water only in the city of Slantsy. In the city of Tosno, this index reached 83.5%. In the cities of Volkhov and Svetogorsk, drinking water was rated as low-quality. However, in the cities of Volkhov and Slantsy laboratory tests were carried out at 2 points, and in the city of Svetogorsk at only 1 point, which, given the number of residents, is not enough. For an objective assessment of the state of drinking water and the development of measures aimed at improving its quality, it is necessary to increase the number of monitoring points and to include in the body of laboratory information the results of control and supervision measures and of production laboratory control conducted by water supply organizations.
Conclusion. The proposed model allows us to assess the drinking water quality in centralized water supply systems and the proportion of the population, including the urban population, provided with quality drinking water at the level of the water supply system, settlement, municipal district (urban district), and subject of the Russian Federation.


Author(s):  
Diego Liberati

Four main general-purpose approaches to inferring knowledge from data are presented as a useful pool of at least partially complementary techniques, also in the context of cyber intrusion identification. In order to reduce the dimensionality of the problem, the most salient variables can be selected by cascading K-means with a divisive partitioning of the data orthogonal to the principal directions. A rule induction method based on logical circuit synthesis, applied after proper binarization of the original variables, proves able to further prune redundant variables, besides identifying logical relationships among them in an understandable “if ... then ...” form. Adaptive Bayesian networks are used to build a decision tree over the hierarchy of variables ordered by Minimum Description Length. Finally, Piece-Wise Affine Identification also provides a model of the dynamics of the process underlying the data, by detecting possible switches and changes of trend over the time course of the monitoring.


Author(s):  
Rifky Lana Rahardian ◽  
Made Sudarma

Data mining is the term used to describe the process of extracting value or information from a database. Four things are needed for effective data mining: high-quality data, the right data, adequate examples, and the right tools. Obtaining valuable information from large databases requires applying data mining algorithms. There are many complex algorithms in data mining; one of them, neural networks, plays an important role.


Author(s):  
Geert Wets ◽  
Koen Vanhoof ◽  
Theo Arentze ◽  
Harry Timmermans

The utility-maximizing framework, in particular the logit model, is the dominant framework in transportation demand modeling. Computational process modeling has been introduced as an alternative approach to deal with the complexity of activity-based models of travel demand. Current rule-based systems, however, lack a methodology to derive rules from data. The relevance and performance of data-mining algorithms that potentially can provide the required methodology are explored. In particular, the C4 algorithm is applied to derive a decision tree for transport mode choice in the context of activity scheduling from a large activity diary data set. The algorithm is compared with both an alternative method of inducing decision trees (CHAID) and a logit model on the basis of goodness-of-fit on the same data set. The ratio of correctly predicted cases in a holdout sample is almost identical for the three methods. This suggests that for data sets of comparable complexity, the accuracy of predictions does not provide grounds for either rejecting or choosing the C4 method. However, the method may have advantages related to robustness. Future research is required to determine the ability of decision tree-based models to predict behavioral change.
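For readers unfamiliar with the logit baseline mentioned above, the standard multinomial logit form assigns each alternative a choice probability proportional to the exponential of its utility. The sketch below shows that calculation; the mode names and utility values are hypothetical, and the function name `logit_probabilities` is an assumption for illustration, not the specification estimated in the study.

```python
import math
from typing import Dict


def logit_probabilities(utilities: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j).

    The maximum utility is subtracted before exponentiating to avoid
    overflow; this does not change the resulting probabilities.
    """
    if not utilities:
        raise ValueError("at least one alternative is required")
    v_max = max(utilities.values())
    exps = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exps.values())
    return {mode: e / total for mode, e in exps.items()}


# Hypothetical systematic utilities for three transport modes.
toy_utilities = {"car": 1.2, "bus": 0.4, "bike": -0.3}
probs = logit_probabilities(toy_utilities)
predicted_mode = max(probs, key=probs.get)  # mode with the highest probability

for mode, p in probs.items():
    print(f"{mode}: {p:.3f}")
print("predicted:", predicted_mode)
```

Taking the most probable mode as the prediction for each holdout case gives a share of correctly predicted cases that can be compared directly with a decision tree's share.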


2014 ◽  
Vol 955-959 ◽  
pp. 3088-3092
Author(s):  
Yu Bo Yang ◽  
Zhong Jun Deng

To study groundwater quality in the Tertiary Taikang Formation in the west of Daqing Oilfield, resistivity logging data from 140 wells in the study area were used to analyze the relationship between groundwater quality and sedimentary sand bodies and burial depth. The south of the Hongweixing well area and the east of the Xishuiyuan well area, including the Ranghulu, Qianjincun, Dulitun and Nanshuiyuan well areas, are favorable for high-quality groundwater. Within the Taikang Formation, shallower groundwater is of better quality than deeper groundwater. The research provides evidence for evaluating groundwater quality in west Daqing Oilfield and for determining specific well locations, improving the efficiency of exploration for underground drinking water.


2019 ◽  
Vol 7 (3) ◽  
pp. 202
Author(s):  
Muhammad Sony Maulana ◽  
Raja Sabarudin ◽  
Wahyu Nugraha

AMIK BSI Pontianak is a private higher-education institution with a large number of students, but it faces a problem that recurs every year: the number of students who graduate on time versus late. The number of students graduating on time is an indicator of the effectiveness of a higher-education institution, whether public or private. Institutions need to detect the behavior of active students so that the factors causing students not to graduate on time can be identified. This study compares five data mining methods, using a t-test, to determine which is the most optimal for predicting on-time graduation; the methods compared are Decision Tree, Naive Bayes, K-NN, Rule Induction, and Random Forest. The results show that the Rule Induction and C4.5 algorithms are the best-performing methods for determining on-time graduation of diploma students at AMIK BSI Pontianak.


2019 ◽  
Vol 8 (4) ◽  
pp. 4558-4562

In existing systems, the data are sometimes inaccurate and proper data mining techniques are not used, which increases complexity. Humans are bound to make mistakes when predicting weather conditions, which may result in damage to both life and property. To avoid this, we use data mining algorithms for early warning of climatic conditions, using attributes such as maximum temperature, minimum temperature, wind speed, rainfall, humidity, pressure, dew point, cloud, sunshine and wind direction to predict rainfall. By using proper algorithms for the datasets and the right metrics, accurate rainfall predictions can be achieved. Hence, we apply the decision tree algorithm with the Gini index to predict precipitation accurately, based entirely on historical data.
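To make the Gini index criterion concrete, the sketch below computes Gini impurity for a set of labels and searches for the threshold on one numeric attribute that minimizes the weighted impurity of the two resulting groups. The humidity values and rain labels are toy data invented for illustration, and the helper names are assumptions; this is a minimal sketch of the general technique, not the authors' implementation.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum over classes of (class share)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left: Sequence[str], right: Sequence[str]) -> float:
    """Impurity of a split: size-weighted average of both sides."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(
    values: Sequence[float], labels: Sequence[str]
) -> Tuple[Optional[float], float]:
    """Try midpoints between consecutive distinct values; keep the lowest impurity."""
    pairs = sorted(zip(values, labels))
    best: Tuple[Optional[float], float] = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left: List[str] = [lab for val, lab in pairs if val <= threshold]
        right: List[str] = [lab for val, lab in pairs if val > threshold]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy historical records (illustrative only): humidity in percent, rain yes/no.
humidity = [55.0, 60.0, 70.0, 85.0, 90.0, 95.0]
rain = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_threshold(humidity, rain)
print(f"split at humidity <= {threshold}, weighted Gini = {impurity}")
```

A full decision tree repeats this search over every attribute at every node and recurses on the two groups until a stopping rule is met.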


2008 ◽  
pp. 2281-2288
Author(s):  
Diego Liberati

Four main general-purpose approaches to inferring knowledge from data are presented as a useful pool of at least partially complementary techniques, also in the context of cyber intrusion identification. In order to reduce the dimensionality of the problem, the most salient variables can be selected by cascading K-means with a divisive partitioning of the data orthogonal to the principal directions. A rule induction method based on logical circuit synthesis, applied after proper binarization of the original variables, proves able to further prune redundant variables, besides identifying logical relationships among them in an understandable “if ... then ...” form. Adaptive Bayesian networks are used to build a decision tree over the hierarchy of variables ordered by Minimum Description Length. Finally, Piece-Wise Affine Identification also provides a model of the dynamics of the process underlying the data, by detecting possible switches and changes of trend over the time course of the monitoring.

