Latest Tools for Data Mining and Machine Learning

Data mining is now widely used to extract information from data and, in turn, to acquire knowledge for decision making. It analyzes patterns in the data to obtain that information and knowledge. Many open-source and licensed tools, such as Weka, RapidMiner, KNIME, and Orange, are available for data mining and predictive analysis. This paper discusses the different tools available for data mining and machine learning and describes the pros and cons of each. The article provides details of the algorithms these tools offer for classification, regression, characterization, discretization, clustering, visualization, and feature selection. It is intended to support efficient decision making and suggests which tool is suitable for a given requirement.

Simultaneous Feature Selection and Tuple Selection for Efficient Classification

Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development ◽

10.4018/978-1-60566-748-5.ch012 ◽

2010 ◽

pp. 270-285

Author(s):

Manoranjan Dash ◽

Vivekanand Gopalkrishnan

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Distance Measure ◽

Microarray Gene Expression ◽

Research Areas ◽

Microarray Gene ◽

Selection For ◽

Learning Data

Feature selection and tuple selection help the classifier focus, so that it achieves similar (or even better) accuracy compared to classification without feature selection and tuple selection. Although feature selection and tuple selection have each been studied in research areas such as machine learning and data mining, they have rarely been studied together. The contribution of this chapter is a novel distance measure that the authors propose for selecting the most representative features and tuples. Their experiments are conducted on several microarray gene expression datasets and on UCI machine learning and KDD datasets. Results show that the proposed method outperforms the existing methods quite significantly.

Tools, Technologies, and Methodologies to Support Data Science

Advances in Data Mining and Database Management - Handbook of Research on Engineering, Business, and Healthcare Applications of Data Science and Analytics ◽

10.4018/978-1-7998-3053-5.ch004 ◽

2021 ◽

pp. 50-72

Author(s):

Ricardo A. Barrera-Cámara ◽

Ana Canepa-Saenz ◽

Jorge A. Ruiz-Vanoye ◽

Alejandro Fuentes-Penna ◽

Miguel Ángel Ruiz-Jaimes ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Making ◽

Information Systems ◽

Data Science ◽

Smart Phones ◽

Predictive Analysis ◽

Biomedical Equipment ◽

Abstract Knowledge ◽

Learning Data

Various devices such as smart phones, computers, tablets, biomedical equipment, sports equipment, and information systems generate a large amount of data and useful information in transactional information systems. However, the information they generate may not be perceptible or adequately analyzed for decision-making. Technologies, tools, algorithms, and models exist that support analysis, visualization, learning, and prediction. Data science involves techniques and methods for abstracting knowledge generated through diverse sources. It combines fields such as statistics, machine learning, data mining, visualization, and predictive analysis. This chapter aims to be a guide to the statistical and computational tools applicable in data science.

Machine Learning in Nutritional Follow-up Research

Open Computer Science ◽

10.1515/comp-2017-0008 ◽

2017 ◽

Vol 7 (1) ◽

pp. 41-45 ◽

Cited By ~ 5

Author(s):

Rita Reis ◽

Hugo Peixoto ◽

José Machado ◽

António Abelha

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Making ◽

Daily Basis ◽

Decision Makers ◽

Learning Tools ◽

Healthcare Organizations ◽

Data Mining Techniques ◽

Take The Best

Healthcare is one of the world’s fastest growing industries, with large volumes of data collected on a daily basis. It is generally perceived as being ‘information rich’ yet ‘knowledge poor’. Hidden relationships and valuable knowledge can be discovered in the collected data by applying data mining techniques. These techniques are increasingly being implemented in healthcare organizations to respond to the needs of doctors in their daily decision-making activities. To help decision-makers take the best decision, it is fundamental to develop a solution able to predict events before they occur. The aim of this project was to predict whether a patient would need to be followed by a nutrition specialist, by combining a nutritional dataset with data mining classification techniques, using WEKA machine learning tools. The results were very promising, with accuracy around 91%, specificity around 97%, and precision about 95%.
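As a reference for the three figures reported in this abstract, the following Python sketch shows how accuracy, specificity, and precision are usually derived from the counts of a binary confusion matrix. The function name `binary_metrics` and the counts in the usage line are illustrative assumptions; they are not taken from the study or from WEKA.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, specificity, and precision from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Accuracy: share of all cases that were classified correctly.
    accuracy = (tp + tn) / total
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Precision: share of predicted positives that were actually positive.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"accuracy": accuracy, "specificity": specificity, "precision": precision}


# Hypothetical counts, chosen only to illustrate the calculation.
example = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

High specificity combined with lower accuracy, as in the reported figures, typically indicates that most errors are missed positives rather than false alarms.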

Optimal feature selection for machine learning based intrusion detection system by exploiting attribute dependence

Materials Today Proceedings ◽

10.1016/j.matpr.2021.04.643 ◽

2021 ◽

Author(s):

Ghanshyam Prasad Dubey ◽

Dr. Rakesh Kumar Bhujade

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Optimal Feature Selection ◽

Selection For ◽

Optimal Feature

Feature Selection for Unsupervised Machine Learning of Accelerometer Data Physical Activity Clusters – A Systematic Review

Gait & Posture ◽

10.1016/j.gaitpost.2021.08.007 ◽

2021 ◽

Author(s):

Petra J. Jones ◽

Mike Catt ◽

Melanie J. Davies ◽

Charlotte L. Edwardson ◽

Evgeny M. Mirkes ◽

...

Keyword(s):

Physical Activity ◽

Machine Learning ◽

Systematic Review ◽

Feature Selection ◽

Accelerometer Data ◽

Unsupervised Machine Learning ◽

Selection For

A Perturbation Method Based on Singular Value Decomposition and Feature Selection for Privacy Preserving Data Mining

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2014010104 ◽

2014 ◽

Vol 10 (1) ◽

pp. 55-76 ◽

Cited By ~ 1

Author(s):

Mohammad Reza Keyvanpour ◽

Somayyeh Seifi Moradi

Keyword(s):

Data Mining ◽

Feature Selection ◽

Singular Value Decomposition ◽

Perturbation Method ◽

Privacy Preserving ◽

Singular Value ◽

Privacy Preserving Data Mining ◽

Selection For ◽

Value Decomposition ◽

Different Levels

In this study, a new model is provided for customized privacy in privacy preserving data mining, in which the data owners define different levels of privacy for different features. Additionally, in order to improve perturbation methods, a method combining singular value decomposition (SVD) and feature selection is defined so as to benefit from the advantages of both domains. To assess the amount of distortion created by the proposed perturbation method, new distortion criteria are also defined, in which the distortion created in the process of feature selection is weighed according to the value of privacy in each feature. Tests and analysis of the results show that, compared to previous approaches, the proposed method based on this model improves privacy, the accuracy of mining results, and the efficiency of privacy preserving data mining systems.

Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks

2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech) ◽

10.1109/dasc/picom/datacom/cyberscitec.2018.00040 ◽

2018 ◽

Cited By ~ 9

Author(s):

Yaokai Feng ◽

Hitoshi Akiyama ◽

Liang Lu ◽

Kouichi Sakurai

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Early Detection ◽

Cyber Attacks ◽

Selection For

Dimension Reduction for Objects Composed of Vector Sets

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2017-0012 ◽

2017 ◽

Vol 27 (1) ◽

pp. 169-180 ◽

Cited By ~ 1

Author(s):

Marton Szemenyei ◽

Ferenc Vajda

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Probability Distribution ◽

Dimension Reduction ◽

Pose Estimation ◽

Real World ◽

Single Object ◽

Real World Datasets

Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that objects are represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector is assumed to have the same number of dimensions, but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require the recognition of individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that are able to satisfy two criteria at the same time: separating between classes and between the nodes of an object instance. We analyze and evaluate our methods on several different synthetic and real-world datasets.

Application of all relevant feature selection for failure analysis of parameter-induced simulation crashes in climate models

Geoscientific Model Development Discussions ◽

10.5194/gmdd-8-5419-2015 ◽

2015 ◽

Vol 8 (7) ◽

pp. 5419-5435 ◽

Cited By ~ 1

Author(s):

W. Paja ◽

M. Wrzesień ◽

R. Niemiec ◽

W. R. Rudnicki

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Climate Models ◽

Original Study ◽

Relative Importance ◽

Relevant Feature ◽

Machine Learning Methods ◽

Selection For ◽

Robust Prediction ◽

Physical Components

Climate models are extremely complex pieces of software. They reflect the best available knowledge of the physical components of the climate; nevertheless, they contain several parameters that are too weakly constrained by observations and can potentially lead to a simulation crash. A recent study by Lucas et al. (2013) showed that machine learning methods can be used to predict which combinations of parameters can lead to a simulation crash, and hence which processes described by these parameters need refined analysis. In the current study we reanalyse the dataset used in that research using a different methodology. We confirm the main conclusion of the original study concerning the suitability of machine learning for predicting crashes. We show that only three of the eight parameters indicated in the original study as relevant for predicting a crash are indeed strongly relevant, three others are relevant but redundant, and two are not relevant at all. We also show that the variance due to the split of data between training and validation sets has a large influence on both the accuracy of predictions and the relative importance of variables; hence only a cross-validated approach can deliver robust prediction of performance and relevance of variables.
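The point about split-induced variance can be illustrated with a small Python sketch that compares repeated single train/validation splits against k-fold cross-validation. Everything here is an assumption made for illustration: the synthetic one-dimensional data, the toy nearest-centroid classifier, and all function names are hypothetical and are unrelated to the climate dataset or to the methods used in the study.

```python
import random
import statistics


def make_data(n, rng):
    """Balanced synthetic 1-D two-class data (hypothetical, for illustration only)."""
    return [(rng.gauss(1.0 if i % 2 else -1.0, 1.5), i % 2) for i in range(n)]


def fit_centroids(train):
    """Toy classifier: remember the mean feature value of each class."""
    m0 = statistics.mean(x for x, y in train if y == 0)
    m1 = statistics.mean(x for x, y in train if y == 1)
    return m0, m1


def accuracy(model, test):
    """Fraction of test rows whose nearest class mean matches the label."""
    m0, m1 = model
    hits = sum((abs(x - m1) < abs(x - m0)) == bool(y) for x, y in test)
    return hits / len(test)


def single_split_scores(data, rng, repeats=30, test_fraction=0.25):
    """Accuracy of one random train/validation split, repeated several times."""
    n_test = int(len(data) * test_fraction)
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_centroids(train), test))
    return scores


def k_fold_score(data, rng, k=5):
    """Mean accuracy over k folds; every row is used for validation exactly once."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit_centroids(train), folds[i]))
    return statistics.mean(scores)


rng = random.Random(0)
data = make_data(200, rng)
split_scores = single_split_scores(data, rng)
cv_scores = [k_fold_score(data, rng) for _ in range(30)]
spread_single = max(split_scores) - min(split_scores)
spread_cv = max(cv_scores) - min(cv_scores)
```

Comparing `spread_single` with `spread_cv` gives a rough sense of how much an estimate depends on one particular split; the cross-validated means would usually be expected to vary less, although the exact numbers depend on the seed and the data.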

Feature Selection for Knowledge Discovery and Data Mining

10.1007/978-1-4615-5689-3 ◽

1998 ◽

Cited By ~ 704

Author(s):

Huan Liu ◽

Hiroshi Motoda

Keyword(s):

Data Mining ◽

Feature Selection ◽

Knowledge Discovery ◽

Selection For
