DaMiRseq—an R/Bioconductor package for data mining of RNA-Seq data: normalization, feature selection and classification

2017, Vol 34 (8), pp. 1416-1418
Author(s): Mattia Chiesa, Gualtiero I Colombo, Luca Piacentini

Author(s): Vladimir Nikulin, Tian-Hsiang Huang, Geoffrey J. McLachlan

The novelty of the method presented in this paper lies in the natural combination of two mutually dependent steps. Feature selection is the key element (first step) of our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, support vector machines or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with a separation-type, Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
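
The two-step scheme above (rank features with a Wilcoxon-type criterion, pass the selected features to any classifier, and score the combination by leave-one-out) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the standardized Wilcoxon rank-sum statistic stands in for the authors' separation-type criterion, whose exact form the abstract does not give, and a nearest-centroid classifier stands in for the regression, SVM or neural-network classifiers mentioned. All function names and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_z(feature, labels):
    """Standardized Wilcoxon rank-sum statistic, class 1 vs class 0 (no tie correction)."""
    ranks = average_ranks(feature)
    n1 = sum(1 for y in labels if y == 1)
    n0 = len(labels) - n1
    w = sum(r for r, y in zip(ranks, labels) if y == 1)
    mean = n1 * (n1 + n0 + 1) / 2
    sd = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12)
    return (w - mean) / sd


def select_features(X, y, k):
    """Indices of the k features with the largest |z| (stand-in criterion)."""
    scores = [abs(wilcoxon_z([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def nearest_centroid_predict(X, y, x, feats):
    """Stand-in classifier: label of the closest class centroid on `feats`."""
    best, best_d = None, float("inf")
    for c in sorted(set(y)):
        rows = [r for r, lab in zip(X, y) if lab == c]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        d = sum((x[j] - m) ** 2 for j, m in zip(feats, centroid))
        if d < best_d:
            best, best_d = c, d
    return best


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy, re-selecting features inside every fold."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k)  # held-out sample never seen here
        correct += nearest_centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 3], [0.3, 4], [0.4, 6], [1.1, 4], [1.2, 6], [1.3, 3], [1.4, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(select_features(X, y, k=1), loo_accuracy(X, y, k=1))
```

Feature selection runs inside each LOO fold on purpose: selecting on the full dataset first would let the held-out label influence which features are kept and would inflate the LOO estimate.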


2014, Vol 10 (1), pp. 55-76
Author(s): Mohammad Reza Keyvanpour, Somayyeh Seifi Moradi

In this study, a new model is proposed for customized privacy in privacy-preserving data mining, in which data owners define different privacy levels for different features. Additionally, to improve perturbation methods, a method combining singular value decomposition (SVD) with feature selection is defined so as to benefit from the advantages of both. To assess the distortion created by the proposed perturbation method, new distortion criteria are defined in which the distortion introduced during feature selection is weighted according to the privacy value of each feature. Experiments and analysis of the results show that, compared with previous approaches, the proposed method based on this model improves privacy, the accuracy of mining results and the efficiency of privacy-preserving data mining systems.
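
One way to picture SVD-based perturbation combined with per-feature privacy levels is the toy sketch below. It is an illustration under our own assumptions, not the authors' method: each feature j is blended toward its rank-1 SVD reconstruction in proportion to a hypothetical owner-assigned privacy level `privacy[j]` in [0, 1], and distortion is scored as each feature's relative squared error weighted by that same level. The rank-1 approximation is computed by power iteration so that only the standard library is needed.

```python
import math


def rank1_approx(A, iters=200):
    """Best rank-1 approximation (A v) v^T of A, with v found by power iteration on A^T A."""
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols  # assumed not orthogonal to the top right singular vector
    for _ in range(iters):
        av = [sum(a * b for a, b in zip(row, v)) for row in A]                 # A v
        w = [sum(A[i][j] * av[i] for i in range(rows)) for j in range(cols)]  # A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # A is the zero matrix
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in w]
    av = [sum(a * b for a, b in zip(row, v)) for row in A]  # equals sigma * u
    return [[s * vj for vj in v] for s in av]


def perturb(A, privacy):
    """Hypothetical scheme: move feature j toward its rank-1 reconstruction by privacy[j]."""
    R = rank1_approx(A)
    return [
        [(1 - p) * a + p * r for a, r, p in zip(row_a, row_r, privacy)]
        for row_a, row_r in zip(A, R)
    ]


def weighted_distortion(A, A_pert, privacy):
    """Hypothetical criterion: sum of privacy[j] * relative squared error of column j."""
    total = 0.0
    for j, p in enumerate(privacy):
        err = sum((A[i][j] - A_pert[i][j]) ** 2 for i in range(len(A)))
        size = sum(A[i][j] ** 2 for i in range(len(A))) or 1.0  # guard all-zero columns
        total += p * err / size
    return total


A = [[3.0, 0.0], [0.0, 1.0]]
privacy = [0.0, 1.0]  # feature 0 left as-is, feature 1 fully perturbed
A_pert = perturb(A, privacy)
print(A_pert, weighted_distortion(A, A_pert, privacy))
```

Here the rank-1 reconstruction keeps only the dominant direction (feature 0), so the fully protected feature 1 is effectively erased and its weighted distortion is about 1, while the unprotected feature contributes nothing.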


2017, Vol 27 (1), pp. 169-180
Author(s): Marton Szemenyei, Ferenc Vajda

Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that each object is represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector has the same number of dimensions but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require recognizing the individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that satisfy two criteria at the same time: separating the classes and separating the nodes of an object instance. We analyze and evaluate our methods on several synthetic and real-world datasets.
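
To make the two criteria concrete, the toy sketch below scores a one-dimensional projection direction `w` with two Fisher ratios (between-group scatter over within-group scatter): one over class labels and one over node labels. Taking the minimum of the two, so that a direction must do well on both, is our own hypothetical choice and not the combination proposed in the paper. The brute-force angle search is only practical for 2-D vectors, and the data are invented.

```python
import math


def fisher_ratio(values, groups):
    """Between-group scatter / within-group scatter of 1-D projected values."""
    mean = sum(values) / len(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    between = within = 0.0
    for vs in by_group.values():
        m = sum(vs) / len(vs)
        between += len(vs) * (m - mean) ** 2
        within += sum((v - m) ** 2 for v in vs)
    return between / (within + 1e-12)  # epsilon avoids division by zero


def two_criterion_score(X, classes, nodes, w):
    """Hypothetical score: a direction must separate classes AND nodes."""
    z = [sum(a * b for a, b in zip(x, w)) for x in X]
    return min(fisher_ratio(z, classes), fisher_ratio(z, nodes))


def best_direction_2d(X, classes, nodes):
    """Brute-force search over unit directions at 1-degree steps (2-D only)."""
    best_deg, best_score = None, -1.0
    for deg in range(180):
        t = math.radians(deg)
        score = two_criterion_score(X, classes, nodes, (math.cos(t), math.sin(t)))
        if score > best_score:
            best_deg, best_score = deg, score
    return best_deg, best_score


# Two objects (classes A and B), each with a "head" and a "tail" node.
X = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)]
classes = ["A", "A", "B", "B"]
nodes = ["head", "tail", "head", "tail"]
print(best_direction_2d(X, classes, nodes))
```

Projecting onto the x-axis separates the classes perfectly but collapses each object's head and tail together, so its score is 0; the search instead settles on a diagonal direction that keeps both kinds of separation.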


Data mining is the real-world process of discovering useful patterns in heterogeneous datasets. Almost every industry uses data mining in its day-to-day activities. Building an effective mining model requires a series of development steps, starting with identifying the business problem and ending with communicating the results. In this development life cycle, the most important step is data preparation, or data preprocessing: converting raw data into a form the machine can understand. Data normalization is the phase of data preprocessing in which data values are rescaled, typically to the range [0, 1]. Choosing the right normalization for a dataset leads to improved mining results. In this paper, students' academic data are used. The dataset is normalized using six normalization techniques, and a Multi Layer Perceptron (MLP) classifier is applied to each normalized dataset. The results of this study reveal which normalization technique is best suited to academic datasets. In short, the goal of this work is to identify the normalization technique that produces the best mining results on academic datasets.
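
The abstract does not name the six techniques that were compared, so the sketch below shows three standard ones purely as an illustration: min-max scaling to [0, 1] (the range described above), z-score standardization and decimal scaling. It works on a single numeric column in plain Python; the example marks are hypothetical. In the study's setting, each normalized copy of the dataset would then be given to the MLP classifier and the results compared.

```python
import statistics


def min_max(xs, new_min=0.0, new_max=1.0):
    """Linearly rescale xs so that min(xs) -> new_min and max(xs) -> new_max."""
    lo, hi = min(xs), max(xs)
    if hi == lo:  # constant column: map everything to new_min
        return [new_min for _ in xs]
    return [new_min + (x - lo) * (new_max - new_min) / (hi - lo) for x in xs]


def z_score(xs):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


def decimal_scaling(xs):
    """Divide by 10**j, with j the smallest integer making every |x| / 10**j < 1."""
    m = max(abs(x) for x in xs)
    j = 0
    while m / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in xs]


marks = [45, 60, 75, 90]  # hypothetical exam marks for one feature
print(min_max(marks))
print(z_score(marks))
print(decimal_scaling(marks))
```

Only min-max scaling guarantees the [0, 1] range; z-scores are unbounded and decimal scaling maps into (-1, 1), which is one reason the choice of technique can change a classifier's results.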


In this era of the Internet, concerns about information security are at their peak. One of the main threats in this cyber world is phishing, an email- or website-based fraud method that targets a genuine webpage or email and exploits it without the end user's consent. Various techniques help classify whether a website or email is legitimate or fake. The major contributors to detecting these phishing frauds are classification algorithms, feature selection techniques or dataset preparation methods, and feature extraction, which plays an important role in both detecting and preventing these attacks. This survey paper studies the effect of each of these contributors and the approaches applied in recent papers. The classification algorithms implemented include Decision Tree, Random Forest, Support Vector Machines, Logistic Regression, Lazy K Star, Naive Bayes and J48.
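
Feature extraction for phishing detection often starts from the URL itself. The sketch below computes a handful of common lexical URL features with the Python standard library (`urllib.parse.urlparse` and `ipaddress.ip_address`). The particular feature set and names are illustrative assumptions, not those of any surveyed paper. The resulting dictionaries could be converted to vectors and passed to any of the classifiers listed above, such as a decision tree or random forest.

```python
from ipaddress import ip_address
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical lexical features of a URL for a phishing classifier."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ip_address(host)  # raises ValueError unless host is a literal IP
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "length": len(url),                        # long URLs can hide the real domain
        "has_at": int("@" in url),                 # '@' can disguise the true target
        "host_is_ip": host_is_ip,                  # raw IP instead of a domain name
        "num_dots_host": host.count("."),          # many subdomains
        "hyphen_in_host": int("-" in host),        # e.g. brand-secure look-alikes
        "uses_https": int(parsed.scheme == "https"),
    }


for url in ["https://www.example.com/", "http://192.168.0.1/paypal@login"]:
    print(url, url_features(url))
```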


2021, Vol 15 (1)
Author(s): David K. Lim, Naim U. Rashid, Joseph G. Ibrahim
