Context-Sensitive Attribute Evaluation

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch052 ◽

2011 ◽

pp. 328-332

Author(s):

Marko Robnik-Šikonja

Keyword(s):

Data Mining ◽

Intelligent Systems ◽

Regression Tree ◽

Feature Weighting ◽

Feature Subset Selection ◽

Context Sensitive ◽

Relief Algorithm ◽

Sensitive Attribute ◽

Attribute Evaluation ◽

Tree Building

Research in machine learning, data mining, and statistics has provided a number of methods that estimate the usefulness of an attribute (feature) for predicting the target variable. These estimates of attribute utility are subsequently used in various important tasks, e.g., feature subset selection, feature weighting, feature ranking, feature construction, data transformation, decision and regression tree building, data discretization, visualization, and comprehension. These tasks frequently occur in data mining, robotics, and the construction of intelligent systems in general. Most attribute evaluation measures in use are myopic in the sense that they estimate the quality of one feature independently of the context of other features. Such measures are not appropriate for problems that may involve many feature interactions. Measures historically based on the Relief algorithm (Kira & Rendell, 1992) take context into account through the distance between instances and are efficient in problems with strong dependencies between attributes.
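
To make the contrast between myopic and context-sensitive estimates concrete, the following is a minimal sketch of basic two-class Relief for numeric attributes. It is an illustration under stated assumptions, not the chapter's implementation: the function name relief is hypothetical, every instance is visited once instead of drawing a random sample, and ties between equally near neighbours are broken by index order.

```python
def relief(X, y):
    """Basic two-class Relief weights for numeric attributes (illustrative sketch)."""
    n, d = len(X), len(X[0])
    # Attribute ranges normalise per-attribute differences into [0, 1].
    spans = []
    for a in range(d):
        column = [row[a] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / spans[a]

    def dist(i, j):
        # Distance over ALL attributes: this is where context enters.
        return sum(diff(a, i, j) for a in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        near_hit = min(hits, key=lambda j: dist(i, j))
        near_miss = min(misses, key=lambda j: dist(i, j))
        for a in range(d):
            # Reward attributes that separate the nearest miss,
            # penalise attributes that separate the nearest hit.
            weights[a] += (diff(a, i, near_miss) - diff(a, i, near_hit)) / n
    return weights


# XOR of the first two attributes; the third attribute is irrelevant.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, c in X]
w = relief(X, y)
```

On this XOR data each of the first two attributes is uninformative when examined alone, yet the sketch ranks both above the irrelevant third attribute, because nearest neighbours are found using all attributes together.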


Spatial Modelling of Gully Erosion Using GIS and R Programing: A Comparison among Three Data Mining Algorithms

Applied Sciences ◽

10.3390/app8081369 ◽

2018 ◽

Vol 8 (8) ◽

pp. 1369 ◽

Cited By ~ 52

Author(s):

Alireza Arabameri ◽

Biswajeet Pradhan ◽

Hamid Reza Pourghasemi ◽

Khalil Rezaei ◽

Norman Kerle

Keyword(s):

Data Mining ◽

Spatial Relationship ◽

Area Under The Curve ◽

Regression Tree ◽

Drainage Density ◽

Gully Erosion ◽

Slope Aspect ◽

Topographic Wetness Index ◽

Boosted Regression Tree ◽

Area Index

Gully erosion triggers land degradation and restricts the use of land. This study assesses the spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) using Weights-of-Evidence (WoE) Bayes theory, and then applies three data mining methods, Random Forest (RF), boosted regression tree (BRT), and multivariate adaptive regression spline (MARS), for gully erosion susceptibility mapping (GESM) in the Shahroud watershed, Iran. Gully locations were identified by extensive field surveys, and a total of 172 GE locations were mapped. Twelve gully-related GEVs (elevation, slope degree, slope aspect, plan curvature, convergence index, topographic wetness index (TWI), lithology, land use/land cover (LU/LC), distance from rivers, distance from roads, drainage density, and NDVI) were selected to model GE. The variable importance results from the RF and BRT models indicated that distance from roads, elevation, and lithology had the highest effect on GE occurrence. The area under the curve (AUC) and seed cell area index (SCAI) methods were used to validate the three GE maps. The results showed that the AUC of the three models ranged from 0.911 to 0.927, and that, as per the SCAI values, the RF model had the highest prediction accuracy (0.927) of the three. The findings will be of help for planning and developing the studied region.
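
The Weights-of-Evidence step can be illustrated with a short sketch. It uses the standard positive weight, negative weight, and contrast definitions for one class of one factor map; the function name and the cell counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import log


def weights_of_evidence(n_class_event, n_event, n_class_cells, n_cells):
    """Return (W+, W-, contrast) for one factor class.

    B = cell belongs to the factor class, D = cell contains an event (gully).
    Assumes no count combination makes a probability exactly 0 or 1.
    """
    p_b_given_d = n_class_event / n_event
    p_b_given_not_d = (n_class_cells - n_class_event) / (n_cells - n_event)
    w_plus = log(p_b_given_d / p_b_given_not_d)
    w_minus = log((1 - p_b_given_d) / (1 - p_b_given_not_d))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 50 of 100 gully cells fall in a class covering 1000 of 10000 cells.
w_plus, w_minus, contrast = weights_of_evidence(50, 100, 1000, 10000)
```

A positive contrast suggests that the class is spatially associated with the events, and a negative contrast suggests the opposite.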


Cascading GA & CFS for Feature Subset selection in Medical Data Mining

2009 IEEE International Advance Computing Conference ◽

10.1109/iadcc.2009.4809226 ◽

2009 ◽

Cited By ~ 11

Author(s):

Asha Gowda Karegowda ◽

M.A. Jayaram

Keyword(s):

Data Mining ◽

Subset Selection ◽

Medical Data ◽

Feature Subset Selection ◽

Feature Subset ◽

Medical Data Mining


Groundwater Augmentation through the Site Selection of Floodwater Spreading Using a Data Mining Approach (Case study: Mashhad Plain, Iran)

Water ◽

10.3390/w10101405 ◽

2018 ◽

Vol 10 (10) ◽

pp. 1405 ◽

Cited By ~ 12

Author(s):

Seyed Naghibi ◽

Mehdi Vafakhah ◽

Hossein Hashemi ◽

Biswajeet Pradhan ◽

Seyed Alavi

Keyword(s):

Data Mining ◽

Regression Tree ◽

Roc Curves ◽

Flood Damage ◽

Boosted Regression Trees ◽

Classification And Regression Tree ◽

Operating Characteristics ◽

National Guidelines ◽

Conditioning Factors ◽

Suitability Maps

It is well known that sustainable development goals are difficult to achieve without a proper water resources management strategy. This study implements three state-of-the-art statistical and data mining models, namely weights-of-evidence (WoE), boosted regression trees (BRT), and classification and regression tree (CART), to identify suitable areas for artificial recharge through floodwater spreading (FWS). First, suitable areas for the FWS project were identified in a basin in north-eastern Iran based on the national guidelines and a literature survey. Using the same methodology, an identical number of FWS-unsuitable areas were also determined. Afterward, a set of FWS conditioning factors was selected for modeling FWS suitability. The models were built using 70% of the suitable and unsuitable locations and validated with the remaining 30% of the input data. Finally, receiver operating characteristic (ROC) curves were plotted to compare the produced FWS suitability maps. The findings showed acceptable performance of BRT, CART, and WoE for FWS suitability mapping, with areas under the ROC curve of 92%, 87.5%, and 81.6%, respectively. Among the considered variables, transmissivity, distance from rivers, aquifer thickness, and electrical conductivity were determined to be the most important contributors in the modeling. FWS suitability maps produced by the proposed method could be used as a guideline for water resource managers to control flood damage and obtain new sources of groundwater. This methodology could be easily replicated to produce FWS suitability maps in other regions with similar hydrogeological conditions.
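
The validation protocol described above (a 70/30 split followed by an area-under-ROC comparison) can be sketched as follows. This is an illustrative sketch only: the helper names, the random seed, and the toy labels and scores are assumptions, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by plotting a curve.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle and split items into 70% training and 30% validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve: probability that a positive outranks a negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train, valid = split_70_30(range(10))
# Toy suitability scores for two suitable (1) and two unsuitable (0) locations.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

In practice the AUC would be computed on the held-out 30% only, once per model, so that the suitability maps are compared on the same validation locations.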


Prediction of Body Weight of Turkish Tazi Dogs using Data Mining Techniques: Classification and Regression Tree (CART) and Multivariate Adaptive Regression Splines (MARS)

Pakistan Journal of Zoology ◽

10.17582/journal.pjz/2018.50.2.575.583 ◽

2018 ◽

Vol 50 (2) ◽

Cited By ~ 4

Author(s):

Senol Celik ◽

Orhan Yilmaz

Keyword(s):

Data Mining ◽

Body Weight ◽

Regression Tree ◽

Multivariate Adaptive Regression Splines ◽

Classification And Regression Tree ◽

Regression Splines ◽

Adaptive Regression ◽

Classification And Regression ◽

Using Data ◽

Adaptive Regression Splines


Failure Analysis in University and Computer Science Contexts With Data Mining

10.5753/wei.2020.11132 ◽

2020 ◽

Author(s):

Daniela De Souza Gomes ◽

Marcos Henrique Fonseca Ribeiro ◽

Giovanni Ventorim Comarela ◽

Gabriel Philippe Pereira

Keyword(s):

Data Mining ◽

Decision Making ◽

Failure Analysis ◽

Computer Science ◽

Educational Administration ◽

Intelligent Systems ◽

Data Set ◽

Data Mining Techniques ◽

Study Case ◽

Support Students

High failure rates are a worrying and relevant problem in Brazilian universities. From a data set of student transcripts, we performed a case study for both general and Computer Science contexts, in which data mining techniques were used to find patterns concerning failures. The knowledge acquired can be used for better educational administration and to build intelligent systems that support students’ decision making.


A Fuzzy Logic based Privacy Preservation Clustering method for achieving K- Anonymity using EMD in dLink Model

Journal of Advances in Chemistry ◽

10.24297/jac.v12i12.4824 ◽

2016 ◽

Vol 12 (12) ◽

pp. 4601-4610 ◽

Cited By ~ 1

Author(s):

D. Palanikkumar ◽

S. Priya ◽

S. Priya

Keyword(s):

Data Mining ◽

Privacy Preservation ◽

Numerical Data ◽

Fuzzy Membership Function ◽

Data Mining Technique ◽

Sensitive Data ◽

Sensitive Attribute ◽

Modification Technique ◽

Earth Mover ◽

Preservation Technique

Privacy preservation is a data mining technique that is applied to databases without violating the privacy of individuals. A sensitive attribute can be selected from the numerical data and modified by any data modification technique. After modification, the data can be released to any agency. If the agency applies data mining techniques such as clustering or classification for data analysis, the modification does not affect the result. In the proposed privacy preservation technique, the sensitive data are converted into modified data using an S-shaped fuzzy membership function. K-means clustering is applied to both the original and the modified data to obtain the clusters. t-closeness requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table. Earth Mover Distance (EMD) is used to measure the distance between the two distributions, which should be no more than a threshold t. Hence privacy is preserved and the accuracy of the data is maintained.
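
The t-closeness check described above can be sketched for a single ordered numeric attribute. The sketch assumes the ordered-distance form of EMD commonly used with t-closeness (cumulative differences normalised by the number of distinct values minus one); the abstract does not state which ground distance the paper uses, and the function names and example distributions are hypothetical.

```python
def ordered_emd(p, q):
    """EMD between two distributions over the same m ordered values.

    Assumes the ground distance between the i-th and j-th value is |i - j| / (m - 1).
    """
    m = len(p)
    running, total = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        running += p_i - q_i      # mass that must still be moved past this value
        total += abs(running)
    return total / (m - 1)


def satisfies_t_closeness(class_dist, table_dist, t):
    """True if the equivalence class is within distance t of the whole table."""
    return ordered_emd(class_dist, table_dist) <= t


# Nine ordered values: the table is uniform, the class holds only the three lowest.
table = [1 / 9] * 9
low_class = [1 / 3] * 3 + [0.0] * 6
distance = ordered_emd(low_class, table)
```

With these example distributions the distance is 0.375, so the class would violate t-closeness for any threshold t below that value.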


A Novel Design of Classification of Coronary Artery Disease Using Deep Learning and Data Mining Algorithms

Revue d'Intelligence Artificielle ◽

10.18280/ria.350304 ◽

2021 ◽

Vol 35 (3) ◽

pp. 209-215

Author(s):

Pratibha Verma ◽

Vineet Kumar Awasthi ◽

Sanat Kumar Sahu

Keyword(s):

Neural Network ◽

Data Mining ◽

Deep Learning ◽

Regression Tree ◽

Principal Component ◽

Classification And Regression Tree ◽

Support Vector ◽

Data Mining Algorithms ◽

R Programming ◽

Hidden Layer

Data mining techniques are combined with ensemble learning and deep learning for classification. The classification methods used are a single C5.0 tree (C5.0), Classification and Regression Tree (CART), a Support Vector Machine (SVM) with a linear kernel, an ensemble (CART, SVM, C5.0), a fitted single-hidden-layer neural network (NN), Neural Networks with Principal Component Analysis (PCA-NN), deep learning-based H2OBinomialModel-Deeplearning (HBM-DNN), and Enhanced H2OBinomialModel-Deeplearning (EHBM-DNN). In this study, experiments were conducted on pre-processed datasets using R programming and 10-fold cross-validation. The findings show that the ensemble model (CART, SVM, and C5.0) and EHBM-DNN classify more accurately than the other methods.


An Information-Theoretic Filter Method for Feature Weighting in Naive Bayes

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001414510070 ◽

2014 ◽

Vol 28 (05) ◽

pp. 1451007 ◽

Cited By ~ 2

Author(s):

Chang-Hwan Lee

Keyword(s):

Data Mining ◽

Bayesian Learning ◽

State Of The Art ◽

Feature Weighting ◽

New Method ◽

Filter Method ◽

Information Theoretic ◽

Naive Bayesian ◽

Naïve Bayesian ◽

Unrealistic Assumption

In spite of its simplicity, naive Bayesian learning has been widely used in many data mining applications. However, the unrealistic assumption that all features are equally important negatively impacts the performance of naive Bayesian learning. In this paper, we propose a new method that uses a Kullback–Leibler measure to calculate the weights of the features analyzed in naive Bayesian learning. Its performance is compared to that of other state-of-the-art methods over a number of datasets.
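
One way to read the idea of a Kullback-Leibler feature weight is sketched below. The abstract does not give the formula, so this is an assumption and not the paper's method: the weight of a categorical feature is taken to be the average, over its values, of the KL divergence between the class distribution given that value and the class prior. The function name and toy data are hypothetical.

```python
from collections import Counter
from math import log


def kl_feature_weight(values, labels):
    """Average KL(P(class | value) || P(class)) over the feature's values (assumed form)."""
    n = len(labels)
    prior = {c: k / n for c, k in Counter(labels).items()}
    weight = 0.0
    for v, n_v in Counter(values).items():
        # Class counts among the instances that take value v.
        posterior = Counter(c for x, c in zip(values, labels) if x == v)
        kl = sum((k / n_v) * log((k / n_v) / prior[c]) for c, k in posterior.items())
        weight += (n_v / n) * kl
    return weight


labels = [0, 0, 1, 1]
informative = kl_feature_weight(["a", "a", "b", "b"], labels)    # determines the class
uninformative = kl_feature_weight(["a", "b", "a", "b"], labels)  # independent of the class
```

In a weighted naive Bayes classifier, such weights would typically scale each feature's log-likelihood term, so that a feature with weight zero contributes nothing to the decision.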
