Feature Selection for Genomic and Proteomic Data Mining

This study proposes a new model for customized privacy in privacy-preserving data mining, in which data owners define different privacy levels for different features. To improve perturbation methods, it also defines a method that combines singular value decomposition (SVD) with feature selection so as to benefit from the advantages of both. To assess the distortion created by the proposed perturbation method, new distortion criteria are defined that account for the distortion introduced during feature selection according to the privacy value of each feature. Tests and analysis of the results show that, compared with previous approaches, the proposed method improves privacy, the accuracy of mining results, and the efficiency of privacy-preserving data mining systems.
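The abstract does not give the formula for its distortion criteria, so the following is only a minimal sketch of one way a privacy-weighted distortion could be computed. The function name `weighted_distortion`, the per-feature `privacy_weights`, and the ratio itself are assumptions made for illustration, not the authors' definition. The perturbed matrix is taken as an input here; in the setting described above it would come from the SVD and feature selection step.

```python
def weighted_distortion(original, perturbed, privacy_weights):
    """Relative squared distortion, weighted per feature (hypothetical measure).

    original, perturbed: lists of rows (one row per record) of equal shape.
    privacy_weights: one non-negative weight per feature (column); a larger
    weight means the data owner assigns that feature a higher privacy value.
    """
    if len(original) != len(perturbed):
        raise ValueError("original and perturbed must have the same number of rows")
    n_features = len(privacy_weights)
    numerator = 0.0    # weighted squared difference between the matrices
    denominator = 0.0  # weighted squared magnitude of the original matrix
    for row_o, row_p in zip(original, perturbed):
        if len(row_o) != n_features or len(row_p) != n_features:
            raise ValueError("every row needs one value per feature")
        for w, o, p in zip(privacy_weights, row_o, row_p):
            numerator += w * (o - p) ** 2
            denominator += w * o ** 2
    if denominator == 0:
        raise ValueError("weighted magnitude of the original data is zero")
    return numerator / denominator


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0]]
    noisy = [[1.0, 2.5], [3.0, 3.5]]  # only the second feature is perturbed
    # The second feature is treated as three times as sensitive as the first.
    print(weighted_distortion(data, noisy, [1.0, 3.0]))
```

Weighting by feature lets the same perturbation score differently depending on which columns the owner marked as sensitive, which is the idea the abstract describes; the exact normalization is a design choice of this sketch.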


Feature Selection for Knowledge Discovery and Data Mining

10.1007/978-1-4615-5689-3 ◽ 1998 ◽ Cited By ~ 704

Author(s): Huan Liu ◽ Hiroshi Motoda

Keyword(s): Data Mining ◽ Feature Selection ◽ Knowledge Discovery ◽ Selection For


Data mining feature selection for credit scoring models

Journal of the Operational Research Society ◽ 10.1057/palgrave.jors.2601976 ◽ 2005 ◽ Vol 56 (9) ◽ pp. 1099-1108 ◽ Cited By ~ 53

Author(s): Y Liu ◽ M Schumann

Keyword(s): Data Mining ◽ Feature Selection ◽ Credit Scoring ◽ Selection For


Spectral Feature Selection for Data Mining

10.1201/b11426 ◽ 2011 ◽ Cited By ~ 37

Author(s): Zheng Alan Zhao ◽ Huan Liu

Keyword(s): Data Mining ◽ Feature Selection ◽ Spectral Feature ◽ Selection For ◽ Spectral Feature Selection


Latest Tools for Data Mining and Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.i1003.0789s19 ◽ 2019 ◽ Vol 8 (9S) ◽ pp. 18-23 ◽ Cited By ~ 2

Keyword(s): Machine Learning ◽ Data Mining ◽ Decision Making ◽ Feature Selection ◽ Open Source ◽ Predictive Analysis ◽ Learning Tools ◽ Pros And Cons ◽ Selection For ◽ Extract Information

Data Mining is now widely used to extract information from data and, in turn, to acquire knowledge for decision making. It analyzes patterns, which are used to extract information and knowledge for making decisions. Many open source and licensed tools, such as Weka, RapidMiner, KNIME, and Orange, are available for Data Mining and predictive analysis. This paper discusses the different tools available for Data Mining and Machine Learning, followed by a description of each tool and its pros and cons. The article details the algorithms these tools provide, including classification, regression, characterization, discretization, clustering, visualization, and feature selection. It aims to help people make decisions efficiently and suggests which tool suits their requirements.


Simultaneous Feature Selection and Tuple Selection for Efficient Classification

Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development ◽ 10.4018/978-1-60566-748-5.ch012 ◽ 2010 ◽ pp. 270-285

Author(s): Manoranjan Dash ◽ Vivekanand Gopalkrishnan

Keyword(s): Gene Expression ◽ Machine Learning ◽ Data Mining ◽ Feature Selection ◽ Distance Measure ◽ Microarray Gene Expression ◽ Research Areas ◽ Microarray Gene ◽ Selection For ◽ Learning Data

Feature selection and tuple selection help the classifier focus, achieving similar (or even better) accuracy compared with classification without them. Although feature selection and tuple selection have each been studied in research areas such as machine learning and data mining, they have rarely been studied together. The contribution of this chapter is a novel distance measure for selecting the most representative features and tuples. The experiments are conducted on microarray gene expression datasets as well as UCI machine learning and KDD datasets. Results show that the proposed method significantly outperforms existing methods.
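The chapter's distance measure is not specified in this abstract, so the sketch below swaps in a plainly different, well-known stand-in: farthest-first traversal with Euclidean distance. It only illustrates the general idea of applying one distance-based selector to both columns (features) and rows (tuples). The names `farthest_first` and `select_features_and_tuples` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Pick k spread-out points by farthest-first traversal; returns indices."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [0]  # start from the first point (an arbitrary choice)
    # Distance from every point to its nearest already-chosen point.
    nearest = [euclidean(p, points[0]) for p in points]
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        idx = max(candidates, key=lambda i: nearest[i])
        chosen.append(idx)
        nearest = [min(d, euclidean(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return chosen


def select_features_and_tuples(rows, n_features, n_tuples):
    """Select columns first, then rows of the reduced table."""
    columns = [list(col) for col in zip(*rows)]
    feature_idx = sorted(farthest_first(columns, n_features))
    reduced = [[row[j] for j in feature_idx] for row in rows]
    tuple_idx = sorted(farthest_first(reduced, n_tuples))
    return feature_idx, tuple_idx


if __name__ == "__main__":
    table = [
        [1.0, 1.1, 9.0],
        [1.2, 1.0, 9.1],
        [8.0, 8.1, 0.5],
        [8.2, 7.9, 0.4],
    ]
    print(select_features_and_tuples(table, n_features=2, n_tuples=2))
```

In the toy table, the first two columns are nearly identical and the rows fall into two groups, so the selector keeps one column from the redundant pair plus the third column, and one row from each group.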
