An Extended Laplacian Score Algorithm for Unsupervised Feature Selection

Experts in various sectors use data mining techniques to extract the most useful information from large volumes of data and thereby improve the quality of their outcomes. Irrelevant and redundant features reduce the accuracy of mining results, so the data need to be preprocessed before any mining technique is applied. Feature selection, a preprocessing step in data mining, improves mining performance. In this paper, we propose a new two-step algorithm for unsupervised feature selection. In the first step, the Laplacian Score is used to select the important features. In the second step, Symmetric Uncertainty is used to remove redundant features. The experimental results show that the proposed algorithm outperforms the Laplacian Score algorithm.
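The two-step idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the neighbourhood size, the heat-kernel width, the discretization scheme or the redundancy threshold, so `k`, `t`, `bins`, `n_keep` and `su_threshold` are hypothetical parameters, and the function names are invented for this example.

```python
import numpy as np


def laplacian_scores(X, k=5, t=1.0):
    """Laplacian Score of each column of X (lower means more important)."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)            # a point is not its own neighbour
    nn = np.argsort(masked, axis=1)[:, :k]      # k nearest neighbours per row
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True
    A = A | A.T                                 # symmetric kNN graph
    S = np.where(A, np.exp(-sq / t), 0.0)       # heat-kernel edge weights
    d = S.sum(axis=1)                           # degree of each sample
    L = np.diag(d) - S                          # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r] - (X[:, r] @ d) / d.sum()   # remove the degree-weighted mean
        denom = f @ (d * f)
        scores[r] = (f @ L @ f) / denom if denom > 0 else np.inf
    return scores


def discretize(X, bins=5):
    """Equal-width binning per column (an assumption; the abstract names none)."""
    Xd = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), bins + 1)[1:-1]
        Xd[:, j] = np.digitize(X[:, j], edges)
    return Xd


def entropy(codes):
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()


def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)) for non-negative integer codes."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0                              # both variables are constant
    joint = entropy(a * (b.max() + 1) + b)      # unique code for each (a, b) pair
    return 2.0 * (ha + hb - joint) / (ha + hb)


def select_features(X, n_keep, su_threshold=0.8, bins=5, k=5, t=1.0):
    # Step 1: keep the n_keep features with the lowest Laplacian Score.
    ranked = np.argsort(laplacian_scores(X, k=k, t=t))[:n_keep]
    # Step 2: walk the ranking and drop features redundant with a kept one.
    Xd = discretize(X, bins=bins)
    selected = []
    for j in ranked:
        if all(symmetric_uncertainty(Xd[:, j], Xd[:, s]) < su_threshold
               for s in selected):
            selected.append(int(j))
    return selected
```

Calling `select_features(X, n_keep=10)` on a samples-by-features array returns the indices of the retained columns. The greedy pass in step 2 is one plausible reading of "remove redundant features"; other redundancy rules would fit the abstract equally well.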

Author(s):  
Khalid AA Abakar ◽  
Chongwen Yu

This work demonstrates the possibility of using data mining techniques, namely artificial neural network (ANN) and support vector machine (SVM) models, to predict spinning yarn quality parameters. Three kernel functions were used for the SVM: Polynomial, Radial Basis Function (RBF), and the Pearson VII function-based Universal Kernel (PUK). The SVM model based on the Pearson VII kernel function (PUK) was found to perform the same as the SVM with the RBF kernel in predicting spinning yarn quality. Comparison with the ANN model showed that the two SVM models give better prediction performance than the ANN model.
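For readers unfamiliar with the three kernels named above, the sketch below writes each one as a Gram-matrix function. It assumes the commonly cited form of the Pearson VII universal kernel, K(x, y) = 1 / [1 + (2 * ||x - y|| * sqrt(2^(1/omega) - 1) / sigma)^2]^omega. The parameter names (`degree`, `coef0`, `gamma`, `sigma`, `omega`) and their default values are illustrative, not settings reported in the paper.

```python
import numpy as np


def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)


def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    return (X @ Y.T + coef0) ** degree


def rbf_kernel(X, Y, gamma=1.0):
    return np.exp(-gamma * sq_dists(X, Y))


def puk_kernel(X, Y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel (assumed standard form, see lead-in)."""
    d = np.sqrt(sq_dists(X, Y))
    base = 1.0 + (2.0 * d * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2
    return 1.0 / base ** omega
```

Each function takes two arrays of shape (samples, features) and returns the kernel matrix between their rows, which is the form that SVM libraries accepting a user-supplied kernel function typically expect.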


Author(s):  
VLADIMIR NIKULIN ◽  
TIAN-HSIANG HUANG ◽  
GEOFFREY J. MCLACHLAN

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is a key element (first step) in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier such as linear regression, support vector machine or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
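The abstract does not define its "separation type Wilcoxon-based criterion", so the following is only a generic sketch of Wilcoxon-style feature ranking for a two-class problem, not the authors' criterion. It scores each feature by how far the normalized Mann-Whitney U statistic lies from 0.5; the function names and the 0/1 label encoding are assumptions made for this example.

```python
import numpy as np


def wilcoxon_separation(X, y):
    """Per-feature score in [0, 0.5]; 0.5 means the two classes never overlap."""
    pos, neg = X[y == 1], X[y == 0]
    # Count, for every feature, the (pos, neg) pairs where pos is larger or tied.
    greater = (pos[:, None, :] > neg[None, :, :]).sum(axis=(0, 1))
    ties = (pos[:, None, :] == neg[None, :, :]).sum(axis=(0, 1))
    u = greater + 0.5 * ties                    # Mann-Whitney U statistic
    return np.abs(u / (len(pos) * len(neg)) - 0.5)


def top_features(X, y, n_keep):
    """Indices of the n_keep features with the largest separation score."""
    scores = wilcoxon_separation(X, y)
    return np.argsort(-scores, kind="stable")[:n_keep]
```

In a leave-one-out setting, the ranking would be recomputed on each training fold so that the held-out sample never influences which features are chosen.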


Author(s):  
Mr. Bhushan Bandre, Ms. Rashmi Khalatkar

Data mining offers various techniques for making major decisions from large amounts of data. In the education sector, data mining techniques are applied to analyze students' data from the admission process onward. Because of the large number of educational institutions in India, excellence has become a major parameter for institutions to grow and endure. Nowadays educational institutions use data mining techniques to demonstrate their excellence. The main objective of this work is to present an analysis of the individual semester-wise results of engineering college students using different data mining techniques. We used different classification algorithms, such as decision tree, rule-based, function-based and Bayesian algorithms, to analyze the semester results, and compared them on parameters such as accuracy and error rate. Our output shows the most suitable algorithm for analyzing data in educational institutions.


Author(s):  
Tyler Swanger ◽  
Kaitlyn Whitlock ◽  
Anthony Scime ◽  
Brendan P. Post

This chapter data mines the usage patterns of the ANGEL Learning Management System (LMS) at a comprehensive college. The data include counts of all the features ANGEL offers its users for the Fall and Spring semesters of the academic years beginning in 2007 and 2008. Data mining techniques are applied to evaluate which LMS features are used most commonly and most effectively by instructors and students. Classification produces a decision tree that predicts which courses will use the ANGEL system based on course-specific attributes. The dataset undergoes association mining to discover the effect of one feature's usage on the usage of another set of features. Finally, clustering the data identifies messages and files as the features most commonly used. These results can be used by this institution, as well as similar institutions, for decision making concerning feature selection and the overall usefulness of LMS design, selection and implementation.


2008 ◽  
pp. 2943-2963
Author(s):  
Malcolm J. Beynon

The efficacy of data mining lies in its ability to identify relationships amongst data. This chapter investigates how this efficacy is constrained by the quality of the data analysed, including whether the data is imprecise or, in the worst case, incomplete. Through the description of Dempster-Shafer theory (DST), a general methodology based on uncertain reasoning, it argues that traditional data mining techniques are not structured to handle such imperfect data, instead requiring the external management of missing values, and so forth. One DST-based technique is the classification and ranking belief simplex (CaRBS), which allows intelligent data mining by accepting missing values in the data analysed, treating them as a factor of ignorance and not requiring their external management. Results presented here, using CaRBS and a number of simplex plots, show the effect of managing and not managing imperfect data.


2019 ◽  
Vol 123 (1267) ◽  
pp. 1415-1436 ◽  
Author(s):  
A. B. A. Anderson ◽  
A. J. Sanjeev Kumar ◽  
A. B. Arockia Christopher

Data mining is a process of finding correlations and collecting and analysing a huge amount of data in a database to discover patterns or relationships. Flight delay creates significant problems in the present aviation system. Data mining techniques are desired for analysing how micro-level causes propagate to produce system-level patterns of delay. Analysing flight delays is very difficult, both when looking from a historical view and when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN) and Artificial Neural Network (ANN) to study and analyse delays among aircraft. The performance of these classifiers is evaluated on different regions of the updated datasets. The results show a significant variation in performance across the different data mining methods and feature selectors for this problem. This paper aims to show how data mining techniques can be used to understand difficult aircraft system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in this manner, to show that DT has a greater classification accuracy. Different feature selectors are used in this study to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects arise from subsystem-level causes.


2018 ◽  
Vol 5 (3) ◽  
pp. 1-20 ◽  
Author(s):  
Sharmila Subudhi ◽  
Suvasini Panigrahi

This article presents a novel approach for fraud detection in automobile insurance claims by applying various data mining techniques. Initially, the most relevant attributes are chosen from the original dataset by using an evolutionary-algorithm-based feature selection method. A test set is then extracted from the selected attribute set, and the remaining dataset is subjected to the Possibilistic Fuzzy C-Means (PFCM) clustering technique for undersampling. The 10-fold cross validation method is then used on the balanced dataset for training and validating a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the test set is applied to the best-performing model for classification. The efficacy of the proposed system is illustrated by conducting several experiments on a real-world automobile insurance fraud dataset. In addition, a comparative analysis with another approach justifies the superiority of the proposed system.

