An Extended Laplacian Score Algorithm for Unsupervised Feature Selection

Experts from various sectors use data mining techniques to discover the most useful information in large amounts of data and so improve the quality of their outcomes. The presence of irrelevant and redundant features reduces the accuracy of mining results. Before any mining technique is applied, the data need to be preprocessed. Feature selection, a preprocessing step in data mining, improves mining performance. In this paper, we propose a new two-step algorithm for unsupervised feature selection. In the first step, the Laplacian Score is used to select the important features; in the second step, Symmetric Uncertainty is used to remove redundant features. The experimental results show that the proposed algorithm outperforms the Laplacian Score algorithm.
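
The two-step procedure above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the Laplacian Score uses the usual heat-kernel k-nearest-neighbour graph formulation (a lower score means the feature better preserves local structure), Symmetric Uncertainty is computed as SU(X, Y) = 2[H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)] on already discretized features, and the function names and the parameters `k`, `t`, `m` and `su_threshold` (with their defaults) are hypothetical choices.

```python
import math
from collections import Counter


def laplacian_scores(X, k=2, t=1.0):
    """Laplacian Score of every column of X (lower = better locality preservation)."""
    n, d = len(X), len(X[0])
    dist = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    # k nearest neighbours of each sample, excluding the sample itself
    nbrs = [set(sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k])
            for i in range(n)]
    # Heat-kernel weights on the symmetrised kNN graph (S_ii = 0)
    S = [[math.exp(-dist[i][j] / t) if (j in nbrs[i] or i in nbrs[j]) else 0.0
          for j in range(n)] for i in range(n)]
    D = [sum(row) for row in S]  # node degrees, the diagonal of D
    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        # Subtract the D-weighted mean so a constant feature cannot score well
        mean = sum(fi * di for fi, di in zip(f, D)) / sum(D)
        g = [fi - mean for fi in f]
        # g^T L g with L = D - S equals 1/2 * sum_ij S_ij (g_i - g_j)^2
        num = 0.5 * sum(S[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n))
        den = sum(di * gi ** 2 for di, gi in zip(D, g))  # g^T D g
        scores.append(num / den if den > 0 else float("inf"))
    return scores


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU in [0, 1]: 1 = fully redundant, 0 = no shared information."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both features constant: SU undefined, treated as 0 here
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def two_step_select(X, m, su_threshold=0.9, k=2, t=1.0):
    # Step 1: keep the m features with the lowest Laplacian Score
    scores = laplacian_scores(X, k, t)
    ranked = sorted(range(len(scores)), key=lambda r: scores[r])[:m]
    cols = [[row[r] for row in X] for r in range(len(X[0]))]
    # Step 2: walk the ranking and drop a feature whose SU with any kept one is too high
    kept = []
    for r in ranked:
        if all(symmetric_uncertainty(cols[r], cols[s]) < su_threshold for s in kept):
            kept.append(r)
    return kept


X = [[0, 0, 1], [0, 0, 0], [1, 1, 1], [1, 1, 0], [0, 0, 1], [1, 1, 0]]
print(two_step_select(X, m=3))  # column 1 duplicates column 0, so one of them is dropped
```

On continuous data the features would first have to be discretized (for example by equal-width binning) before the SU step; that choice is also an assumption of this sketch.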

The Spinning Quality Control Management Based on Decision Making by Data Mining Techniques

International Journal of Emerging Research in Management and Technology, Vol 7 (1), pp. 72, 2018. DOI: 10.23956/ijermt.v7i1.25

Author(s): Khalid AA Abakar, Chongwen Yu

Keyword(s): Data Mining; Kernel Functions; Support Vector; ANN Model; Data Mining Techniques; Yarn Quality; Yarn Properties; SVM Model; RBF Kernel

This work demonstrated the possibility of using data mining techniques such as artificial neural networks (ANN) and support vector machine (SVM) based models to predict yarn quality parameters in spinning. SVM models with three different kernel functions, namely Polynomial, Radial Basis Function (RBF) and Pearson VII Function-based Universal Kernel (PUK), together with an ANN model, were used as data mining techniques to predict yarn properties. In this paper, it was found that the SVM model based on the Pearson VII kernel function (PUK) performs the same as the SVM model based on the RBF kernel in predicting spinning yarn quality. The comparison with the ANN model showed that the two SVM models give better prediction performance than the ANN model.
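
For reference, the three kernel families named above can be written as plain functions. This is a sketch, not the configuration used in the study: the parameter names `degree`, `coef0`, `gamma`, `sigma` and `omega` and their default values are assumptions, and the PUK kernel is written in its commonly cited Pearson VII form.

```python
import math


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + coef0) ^ degree
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * squared_distance(x, y))


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    # Pearson VII universal kernel:
    # 1 / [1 + (2 * ||x - y|| * sqrt(2^(1/omega) - 1) / sigma)^2]^omega
    dist = math.sqrt(squared_distance(x, y))
    base = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + base ** 2) ** omega


def gram_matrix(samples, kernel, **params):
    """Kernel matrix K[i][j] = kernel(samples[i], samples[j]) for an SVM solver."""
    return [[kernel(a, b, **params) for b in samples] for a in samples]


print(puk_kernel([0.0, 0.0], [0.5, 0.0], sigma=1.0, omega=1.0))
```

One property worth noting: at distance ||x - y|| = σ/2 the PUK kernel equals 0.5 for every ω, so σ sets the kernel's half-width while ω controls the shape of its tails.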

A Comparative Study on Heart Disease Prediction Using Data Mining Techniques and Feature Selection

2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), 2021. DOI: 10.1109/icrest51555.2021.9331158

Author(s): Farzana Tasnim, Sultana Umme Habiba

Keyword(s): Data Mining; Feature Selection; Heart Disease; Comparative Study; Disease Prediction; Data Mining Techniques; Using Data

Classification of High-Dimensional Microarray Data with a Two-Step Procedure via a Wilcoxon Criterion and Multilayer Perceptron

International Journal of Computational Intelligence and Applications, Vol 10 (01), pp. 1-14, 2011. DOI: 10.1142/s1469026811002969

Author(s): Vladimir Nikulin, Tian-Hsiang Huang, Geoffrey J. McLachlan

Keyword(s): Data Mining; Feature Selection; High Dimensional; Second Step; Support Vector; Step Procedure; Leave One Out; Natural Combination; Feature Selection Techniques

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is the key element (first step) of our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, support vector machines or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with a separation-type Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
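
A Wilcoxon-type feature ranking for a two-class problem can be sketched as below. The abstract does not define the exact "separation type" criterion, so this sketch assumes one plausible version: each feature is scored by how far its normalized Wilcoxon rank-sum (Mann-Whitney U) statistic, U / (n1 n0), lies from 0.5, the value expected when the feature does not separate the classes at all. Labels are assumed to be 0/1, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_separation(feature, labels):
    """|U / (n1 * n0) - 0.5|, where U is the Mann-Whitney statistic of class 1."""
    ranks = average_ranks(feature)
    n1 = sum(1 for y in labels if y == 1)
    n0 = len(labels) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n1 * (n1 + 1) / 2
    return abs(u / (n1 * n0) - 0.5)


def select_top_features(X, labels, m):
    """Indices of the m features with the largest Wilcoxon separation score."""
    n_features = len(X[0])
    scores = [wilcoxon_separation([row[c] for row in X], labels) for c in range(n_features)]
    return sorted(range(n_features), key=lambda c: scores[c], reverse=True)[:m]


X = [[1, 4, 1], [2, 3, 2], [3, 2, 2], [4, 1, 1]]
labels = [0, 0, 1, 1]
print(select_top_features(X, labels, 2))  # -> [0, 1]
```

Taking the absolute distance from 0.5 treats a feature that is consistently lower in class 1 as just as useful as one that is consistently higher. If the whole pipeline is evaluated with LOO, repeating this ranking inside each fold keeps the held-out sample out of the feature selection.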

Detection of financial statement fraud and feature selection using data mining techniques

Decision Support Systems, Vol 50 (2), pp. 491-500, 2011. DOI: 10.1016/j.dss.2010.11.006. Cited by ~174

Author(s): P. Ravisankar, V. Ravi, G. Raghava Rao, I. Bose

Keyword(s): Data Mining; Feature Selection; Financial Statement; Financial Statement Fraud; Data Mining Techniques; Using Data

Fast Backward Iterative Laplacian Score for Unsupervised Feature Selection

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science, pp. 409-420, 2020. DOI: 10.1007/978-3-030-55130-8_36

Author(s): Qing-Qing Pang, Li Zhang

Keyword(s): Feature Selection; Unsupervised Feature Selection; Laplacian Score

Impact of Data Mining Technique in Education Institutions

International Journal of New Practices in Management and Engineering, Vol 4 (02), pp. 01-07, 2015. DOI: 10.17762/ijnpme.v4i02.35

Author(s): Mr. Bhushan Bandre, Ms. Rashmi Khalatkar

Keyword(s): Data Mining; Educational Institution; Educational Institutions; Data Mining Technique; Admission Process; Data Mining Techniques; Mining Technique; Major Parameter; Using Data; Major Decision

Major decision-making processes that rely on large amounts of data can be supported by various data mining techniques. In the education sector, data mining techniques are applied to analyze student data starting from the admission process itself. Because of the large number of educational institutions in India, excellence has become a major parameter for institutions to grow and survive. Nowadays, educational institutions use data mining techniques to demonstrate their excellence. The main objective of this work is to present an analysis of the individual semester-wise results of engineering college students using different data mining techniques. We used different classification algorithms, such as decision tree, rule-based, function-based and Bayesian algorithms, to analyze the semester results, and compared them using parameters such as accuracy and error rate. Our output shows the algorithm best suited for analyzing data in educational institutions.

ANGEL Mining

Higher Education Institutions and Learning Management Systems, pp. 94-115, 2012. DOI: 10.4018/978-1-60960-884-2.ch005. Cited by ~2

Author(s): Tyler Swanger, Kaitlyn Whitlock, Anthony Scime, Brendan P. Post

Keyword(s): Data Mining; Decision Making; Feature Selection; Decision Tree; Management System; Association Mining; Data Mining Techniques; Usage Patterns; Design Selection; Comprehensive College

This chapter data mines the usage patterns of the ANGEL Learning Management System (LMS) at a comprehensive college. The data include usage counts for every feature ANGEL offers its users for the Fall and Spring semesters of the academic years beginning in 2007 and 2008. Data mining techniques are applied to evaluate which LMS features are used most commonly and most effectively by instructors and students. Classification produces a decision tree that predicts which courses will use the ANGEL system based on course-specific attributes. The dataset undergoes association mining to discover how the usage of one feature affects the usage of another set of features. Finally, clustering the data identifies messages and files as the most commonly used features. These results can be used by this institution, as well as similar institutions, for decision making concerning feature selection and the overall usefulness of LMS design, selection and implementation.
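
The association-mining step can be illustrated with support and confidence for rules between single features. The sketch assumes each course is reduced to the set of LMS features it used (the chapter works from usage counts, so this presence/absence encoding is an assumption), and the feature names and thresholds in the example are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions (sets) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_a


def pair_rules(transactions, min_support, min_confidence):
    """All rules {a} -> {b} between two single features that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        if support(transactions, {a, b}) < min_support:
            continue  # the pair is too rare for either direction to qualify
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence(transactions, {lhs}, {rhs})
            if c >= min_confidence:
                rules.append((lhs, rhs, c))
    return rules


courses = [
    {"messages", "files"},
    {"messages", "files", "quizzes"},
    {"messages"},
    {"files", "quizzes"},
]
print(pair_rules(courses, min_support=0.5, min_confidence=0.9))  # -> [('quizzes', 'files', 1.0)]
```

A full Apriori-style miner would extend the same two measures to larger itemsets; pairs are enough to show how "usage of one feature" is related to "usage of another".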

Effective Intelligent Data Mining Using Dempster-Shafer Theory

Data Warehousing and Mining, pp. 2943-2963, 2008. DOI: 10.4018/978-1-59904-951-9.ch188

Author(s): Malcolm J. Beynon

Keyword(s): Data Mining; Missing Values; Uncertain Reasoning; Worst Case; Data Mining Techniques; Imperfect Data; Dempster Shafer Theory; External Management; Shafer Theory

The efficacy of data mining lies in its ability to identify relationships amongst data. This chapter investigates how this efficacy is constrained by the quality of the data analysed, including whether the data are imprecise or, in the worst case, incomplete. Through the description of Dempster-Shafer theory (DST), a general methodology based on uncertain reasoning, it argues that traditional data mining techniques are not structured to handle such imperfect data and instead require the external management of missing values and so forth. One DST-based technique is the classification and ranking belief simplex (CaRBS), which allows intelligent data mining by accepting missing values in the data analysed, treating them as a factor of ignorance, without requiring their external management. Results presented here, using CaRBS and a number of simplex plots, show the effect of managing and not managing imperfect data.
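
To make the idea of "ignorance" concrete, here is a sketch of Dempster's rule of combination on a binary frame {x, not x}, where each piece of evidence assigns mass to {x}, to {not x} and to the whole frame Θ. It is a simplified illustration, not CaRBS itself: CaRBS's functions for turning attribute values into masses are not reproduced, and the names `BPA`, `combine`, `combine_all` and `IGNORANCE` are hypothetical. A missing value is modeled as total ignorance, m(Θ) = 1, which leaves the combined evidence unchanged.

```python
from functools import reduce
from typing import NamedTuple


class BPA(NamedTuple):
    """Basic probability assignment on the frame {x, not x}; masses should sum to 1."""
    x: float      # belief committed to {x}
    not_x: float  # belief committed to {not x}
    theta: float  # mass left on the whole frame, i.e. ignorance


IGNORANCE = BPA(0.0, 0.0, 1.0)  # one way to represent a missing value


def combine(m1, m2):
    """Dempster's rule of combination for two BPAs on the binary frame."""
    conflict = m1.x * m2.not_x + m1.not_x * m2.x  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict
    x = (m1.x * m2.x + m1.x * m2.theta + m1.theta * m2.x) / norm
    not_x = (m1.not_x * m2.not_x + m1.not_x * m2.theta + m1.theta * m2.not_x) / norm
    theta = (m1.theta * m2.theta) / norm
    return BPA(x, not_x, theta)


def combine_all(bpas):
    """Fold any number of pieces of evidence; IGNORANCE is the neutral start value."""
    return reduce(combine, bpas, IGNORANCE)


# Three characteristics of one object; the second one is missing.
evidence = [BPA(0.6, 0.1, 0.3), IGNORANCE, BPA(0.5, 0.2, 0.3)]
print(combine_all(evidence))
```

Because combining with m(Θ) = 1 returns the other assignment unchanged, a missing value contributes no evidence for or against x, so no imputation or case deletion is needed before the data are analysed.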

Analysis of flight delays in aviation system using different classification algorithms and feature selection methods

The Aeronautical Journal, Vol 123 (1267), pp. 1415-1436, 2019. DOI: 10.1017/aer.2019.72. Cited by ~1

Author(s): A. B. A. Anderson, A. J. Sanjeev Kumar, A. B. Arockia Christopher

Keyword(s): Data Mining; Feature Selection; Classification Model; System Level; Support Vector; Flight Delays; Data Mining Techniques; Mining Methods; Artificial Neural Network (ANN); Aircraft System

Data mining is the process of collecting and analysing large amounts of data in a database to find correlations and discover patterns or relationships. Flight delays create significant problems in the present aviation system. Data mining techniques are needed to analyse how micro-level causes propagate to produce system-level patterns of delay. Analysing flight delays is very difficult, both from a historical view and when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN) and Artificial Neural Network (ANN) classifiers to study and analyse delays among aircraft. The performance of the different data mining methods is measured on different regions of the updated datasets. The results show a significant variation in the performance of the different data mining methods and feature selection methods for this problem. This paper shows how data mining techniques can be used to understand complex aircraft system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in this manner, to show that DT has greater classification accuracy. Different feature selectors are used in this study to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects arise from subsystem-level causes.

Detection of Automobile Insurance Fraud Using Feature Selection and Data Mining Techniques

International Journal of Rough Sets and Data Analysis, Vol 5 (3), pp. 1-20, 2018. DOI: 10.4018/ijrsda.2018070101. Cited by ~2

Author(s): Sharmila Subudhi, Suvasini Panigrahi

Keyword(s): Data Mining; Feature Selection; Feature Selection Method; Automobile Insurance; Test Set; Data Mining Techniques; Weighted Extreme Learning Machine; Original Dataset; Novel Approach; Learning Machine

This article presents a novel approach for fraud detection in automobile insurance claims by applying various data mining techniques. Initially, the most relevant attributes are chosen from the original dataset using an evolutionary-algorithm-based feature selection method. A test set is then extracted from the selected attribute set, and the remaining dataset is undersampled using the Possibilistic Fuzzy C-Means (PFCM) clustering technique. 10-fold cross-validation is then applied to the balanced dataset to train and validate a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the best-performing model is applied to the test set for classification. The efficacy of the proposed system is illustrated through several experiments on a real-world automobile insurance fraud dataset. In addition, a comparative analysis with another approach shows the superiority of the proposed system.
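
The parameter search described above (10-fold cross-validation over combinations of classifier parameters, with the test set held out beforehand) can be sketched generically. This is an assumption-laden illustration: the WELM classifier itself is not implemented, `evaluate` is a hypothetical user-supplied callable that trains a model with the given parameters on the training indices and returns a validation score, and the parameter name `C` in the example is illustrative only.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k shuffled, disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])


def select_best_params(param_grid, n_samples, evaluate, k=10, seed=0):
    """Return the parameter set with the highest mean validation score over k folds.

    `evaluate(params, train_idx, val_idx)` is a hypothetical callable that fits a
    model (for example a weighted ELM) on train_idx and scores it on val_idx.
    """
    best, best_score = None, float("-inf")
    for params in param_grid:
        # The same seed gives every parameter set identical folds, so scores are comparable
        scores = [evaluate(params, tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best, best_score = params, mean
    return best, best_score


# Toy stand-in for "train a classifier and return validation accuracy"
toy_evaluate = lambda params, train, val: -abs(params["C"] - 4)
print(select_best_params([{"C": 1}, {"C": 4}, {"C": 16}], 23, toy_evaluate))
```

Only the winning parameter set is then applied to the held-out test set, which keeps the final evaluation independent of the search.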
