DaMiRseq—an R/Bioconductor package for data mining of RNA-Seq data: normalization, feature selection and classification

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is the key first step in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type, Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
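
As a rough illustration of the first step, the sketch below ranks features with a Wilcoxon rank-sum (Mann-Whitney) separation score and keeps the top k. It is not the authors' implementation: the abstract does not define the criterion in detail, so the score `|2*AUC - 1|`, the binary 0/1 labels, and the function names `average_ranks`, `separation_score` and `select_top_features` are all assumptions made for this example.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def separation_score(feature: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical separation score in [0, 1] built from the rank-sum statistic.

    0 means neither class tends to rank higher; 1 means perfect separation.
    Assumes labels are 0/1 and both classes are present.
    """
    ranks = average_ranks(feature)
    n1 = sum(1 for y in labels if y == 1)
    n0 = len(labels) - n1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n1 * (n1 + 1) / 2.0  # Mann-Whitney U for class 1
    auc = u / (n1 * n0)
    return abs(2.0 * auc - 1.0)


def select_top_features(rows: Sequence[Sequence[float]],
                        labels: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k features with the highest separation score."""
    n_features = len(rows[0])
    scores = [separation_score([row[j] for row in rows], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: 4 samples, 2 features; feature 0 separates the classes perfectly.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 6.0]]
labels = [0, 0, 1, 1]
top = select_top_features(rows, labels, k=1)
```

The selected columns would then be passed to whichever classifier is used in the second step. In a leave-one-out evaluation, the selection would normally be rerun on each training fold so that the held-out sample does not influence which features are kept.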

A Perturbation Method Based on Singular Value Decomposition and Feature Selection for Privacy Preserving Data Mining

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2014010104 ◽

2014 ◽

Vol 10 (1) ◽

pp. 55-76 ◽

Cited By ~ 1

Author(s):

Mohammad Reza Keyvanpour ◽

Somayyeh Seifi Moradi

Keyword(s):

Data Mining ◽

Feature Selection ◽

Singular Value Decomposition ◽

Perturbation Method ◽

Privacy Preserving ◽

Singular Value ◽

Privacy Preserving Data Mining ◽

Selection For ◽

Value Decomposition ◽

Different Levels

In this study, a new model is provided for customized privacy in privacy preserving data mining, in which the data owners define different levels of privacy for different features. Additionally, to improve perturbation methods, a method combining singular value decomposition (SVD) and feature selection is defined so as to benefit from the advantages of both domains. To assess the amount of distortion created by the proposed perturbation method, new distortion criteria are defined in which the distortion created during feature selection is weighted by the privacy value of each feature. Tests and analysis of the results show that, compared with previous approaches, the proposed method based on this model improves privacy, the accuracy of mining results and the efficiency of privacy preserving data mining systems.

An efficient framework for heart disease classification using feature extraction and feature selection technique in data mining

2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS) ◽

10.1109/icetets.2016.7603000 ◽

2016 ◽

Cited By ~ 14

Author(s):

R. Kavitha ◽

E. Kannan

Keyword(s):

Data Mining ◽

Feature Extraction ◽

Feature Selection ◽

Heart Disease ◽

Disease Classification ◽

Feature Selection Technique ◽

Selection Technique

Detection of financial statement fraud and feature selection using data mining techniques

Decision Support Systems ◽

10.1016/j.dss.2010.11.006 ◽

2011 ◽

Vol 50 (2) ◽

pp. 491-500 ◽

Cited By ~ 174

Author(s):

P. Ravisankar ◽

V. Ravi ◽

G. Raghava Rao ◽

I. Bose

Keyword(s):

Data Mining ◽

Feature Selection ◽

Financial Statement ◽

Financial Statement Fraud ◽

Data Mining Techniques ◽

Using Data

Dimension Reduction for Objects Composed of Vector Sets

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2017-0012 ◽

2017 ◽

Vol 27 (1) ◽

pp. 169-180 ◽

Cited By ~ 1

Author(s):

Marton Szemenyei ◽

Ferenc Vajda

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Probability Distribution ◽

Dimension Reduction ◽

Pose Estimation ◽

Real World ◽

Single Object ◽

Real World Datasets

Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that objects are represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector is assumed to have the same number of dimensions, but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require the recognition of individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that are able to satisfy two criteria at the same time: separating between classes and between the nodes of an object instance. We analyze and evaluate our methods on several different synthetic and real-world datasets.

Data Transformation Techniques for Academic Datasets

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9711.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2214-2218

Keyword(s):

Data Mining ◽

Life Cycle ◽

Data Preprocessing ◽

Data Normalization ◽

Multi Layer Perceptron ◽

Data Preparation ◽

Transformation Techniques ◽

Development Life Cycle ◽

Heterogeneous Datasets ◽

Mining Model

Data mining is a real-world procedure for discovering useful patterns in heterogeneous datasets. Almost every industry uses data mining in its day-to-day activities. To build an effective mining model, a series of development steps must be followed, starting with discovering the business problem and ending with communicating the results. In this development life cycle, the most important step is data preparation, or data preprocessing, which converts raw data into a form the machine can process. Data normalization is a phase of data preprocessing in which the data values are scaled to the range 0 to 1. Correct normalization of the datasets leads to improved mining results. In this paper, academic data of students is used. The dataset is normalized using six normalization techniques. A Multi Layer Perceptron classifier is applied to each normalized dataset and the results are compared. The results of this study reveal the normalization technique best suited to academic datasets. In short, the goal of this work is to discover the normalization technique that produces the best mining results when applied to academic datasets.
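
To make the scaling step concrete, here is a minimal sketch of min-max normalization, one common way to map values into the 0 to 1 range. The abstract does not name the six techniques that were compared, so treating min-max as one of them is an assumption, and the function name `min_max_scale` and the sample marks are hypothetical.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical exam marks for three students.
marks = [50.0, 75.0, 100.0]
scaled = min_max_scale(marks)
```

Each feature column would be scaled independently before the data is passed to the classifier. If a held-out test set is used, the minimum and maximum are normally taken from the training data only.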

A Survey on Phishing Detection and The Importance of Feature Selection In Data Mining Classification Algorithms

Issue 4 - Journal of Science and Technology ◽

10.46243/jst.2020.v5.i6.pp11-18 ◽

2020 ◽

pp. 11-18

Keyword(s):

Data Mining ◽

Feature Selection ◽

Support Vector ◽

Classification Algorithms ◽

End User ◽

Preparation Methods ◽

Survey Paper ◽

Vector Machines ◽

Feature Selection Techniques ◽

Phishing Detection

In this era of the Internet, information security is a pressing concern. One of the main threats is the phishing attack, an email or website fraud method that targets a genuine webpage or email and compromises it without the consent of the end user. Various techniques help to classify whether a website or an email is legitimate or fake. The major contributors to the detection of phishing fraud include classification algorithms, feature selection techniques or dataset preparation methods, and feature extraction, which plays an important role in both detecting and preventing these attacks. This survey paper studies the effect of all these contributors and the approaches applied in recent papers. The classification algorithms implemented include Decision Tree, Random Forest, Support Vector Machines, Logistic Regression, Lazy K Star, Naive Bayes and J48.

Model-based feature selection and clustering of RNA-seq data for unsupervised subtype discovery

The Annals of Applied Statistics ◽

10.1214/20-aoas1407 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

David K. Lim ◽

Naim U. Rashid ◽

Joseph G. Ibrahim

Keyword(s):

Feature Selection ◽

RNA-Seq ◽

Model Based

Feature Selection Based Data Mining Approach for Coronary Artery Disease Diagnosis

Academic Platform Journal of Engineering and Science ◽

10.21541/apjes.899055 ◽

2021 ◽

Vol 9 (3) ◽

pp. 451-459

Author(s):

Kemal AKYOL

Keyword(s):

Data Mining ◽

Coronary Artery Disease ◽

Feature Selection ◽

Coronary Artery ◽

Disease Diagnosis ◽

Data Mining Approach ◽

Artery Disease ◽

Coronary Artery Disease Diagnosis
