A REVIEW OF FEATURE EXTRACTION METHODS ON MACHINE LEARNING

Extracting features from input data is vital for successful classification and machine learning tasks. Classification is the process of assigning an object to one of a set of predefined categories. Many different feature selection and feature extraction methods exist, and they are widely used. Feature extraction is the transformation of large input data into a low-dimensional feature vector, which serves as the input to a classification or machine learning algorithm. Feature extraction poses major challenges, which are discussed in this paper; chief among them is learning and extracting knowledge from text datasets in order to make correct decisions. The objective of this paper is to give an overview of the methods used in feature extraction for various applications, using a dataset containing a collection of texts taken from social media.
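
As an illustration of the transformation described in this abstract, the following minimal sketch turns short texts into fixed-length count vectors (a bag-of-words representation). It is not taken from the paper: the function names, the `max_features` parameter, and the three sample posts are assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents, max_features):
    """Keep the max_features most frequent tokens across all documents."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:max_features]]


def extract_features(text, vocabulary):
    """Map one text to a fixed-length vector of token counts."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocabulary]


# Hypothetical social-media style posts, invented for illustration only.
posts = [
    "Great phone, great battery",
    "Battery died after a day",
    "Great camera",
]
vocab = build_vocabulary(posts, max_features=3)
vectors = [extract_features(p, vocab) for p in posts]
```

Each post, whatever its length, is reduced to a vector with `max_features` entries, which is the kind of low-dimensional input a classifier can consume. Real pipelines usually add stop-word removal and a weighting scheme such as TF-IDF.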

Feature Engineering for Various Data Types in Data Science

Advances in Data Mining and Database Management - Handbook of Research on Automated Feature Engineering and Advanced Applications in Data Science ◽

10.4018/978-1-7998-6659-6.ch001 ◽

2021 ◽

pp. 1-16

Author(s):

Nilesh Kumar Sahu ◽

Manorama Patnaik ◽

Itu Snigdh

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Input Data ◽

Data Science ◽

Learning Algorithm ◽

Feature Engineering ◽

Machine Learning Algorithm ◽

Data Types ◽

Data Set ◽

Machine Learning Models

The precision of any machine learning algorithm depends on the data set, its suitability, and its volume. Data and its characteristics have therefore become the predominant components of any predictive or precision-based domain such as machine learning. Feature engineering refers to the process of transforming and preparing this input data so that it is ready for training machine learning models. Several feature types, such as categorical, numerical, mixed, date, and time, have to be considered for feature extraction in feature engineering. Characteristics of the data set, such as cardinality, missing data, and rare labels for categorical features, as well as distribution, outliers, and magnitude, are also taken into account. This chapter discusses various data types and the feature engineering techniques that apply to them. It also focuses on the implementation of these techniques for feature extraction.
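
To make the categorical-feature issues named in this abstract concrete (missing data, rare labels, cardinality), here is a small sketch in plain Python. It is only one possible treatment, not the chapter's implementation; the helper names, the `min_fraction` threshold, and the sample column are hypothetical.

```python
from collections import Counter


def impute_missing(values, fill="Missing"):
    """Replace missing entries (None) with an explicit placeholder label."""
    return [fill if v is None else v for v in values]


def group_rare_labels(values, min_fraction, rare_label="Rare"):
    """Merge labels seen in less than min_fraction of the rows into one label."""
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= min_fraction else rare_label for v in values]


def one_hot(values):
    """Encode each value as a 0/1 vector over the sorted set of categories."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical categorical column with one missing entry and two rare labels.
cities = ["Pune", "Delhi", None, "Pune", "Delhi",
          "Ranchi", "Pune", "Delhi", "Goa", "Pune"]

cleaned = group_rare_labels(impute_missing(cities), min_fraction=0.2)
categories, encoded = one_hot(cleaned)
```

Grouping rare labels before encoding reduces the cardinality of the column from five categories to three, so the one-hot vectors stay short and no category is backed by a single row.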

Feature extraction and prediction of Dengue Outbreaks

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206544 ◽

2020 ◽

pp. 216-222

Author(s):

Kunal Parikh ◽

Tanvi Makadia ◽

Harshil Patel

Keyword(s):

Public Health ◽

Machine Learning ◽

Developing Countries ◽

Feature Extraction ◽

Predictive Analytics ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Health Concerns ◽

The World ◽

Dengue Outbreaks

Dengue is unquestionably one of the biggest health concerns in India and in many other developing countries, and many people have lost their lives because of it. Every year, approximately 390 million dengue infections occur around the world; of these, 500,000 people are seriously infected and 25,000 die. Many factors, such as temperature, humidity, precipitation, and inadequate public health, could cause dengue. In this paper, we propose a method for performing predictive analytics on a dengue dataset using KNN, a machine learning algorithm. This analysis would help predict future cases and could save many lives.
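
The abstract names KNN (k-nearest neighbours) but gives no implementation details, so the following is only a generic sketch of the algorithm. The three features loosely follow the factors listed above (temperature, humidity, precipitation); the values, already scaled to the range 0 to 1, and the labels are invented for illustration and are not from the paper's dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each Euclidean distance with its label and sort by distance.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]


# Hypothetical rows: (temperature, humidity, precipitation), scaled to [0, 1].
train_X = [
    (0.90, 0.80, 0.70),
    (0.80, 0.90, 0.90),
    (0.85, 0.70, 0.80),
    (0.20, 0.30, 0.10),
    (0.30, 0.20, 0.20),
    (0.10, 0.40, 0.30),
]
train_y = ["outbreak", "outbreak", "outbreak",
           "no outbreak", "no outbreak", "no outbreak"]

hot_wet_week = knn_predict(train_X, train_y, (0.80, 0.80, 0.80), k=3)
cool_dry_week = knn_predict(train_X, train_y, (0.20, 0.30, 0.20), k=3)
```

Because KNN relies on distances, features measured on different scales (degrees, percent, millimetres) should be brought to a common range first; otherwise the feature with the largest numeric range dominates the result.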

Machine learning algorithm to identifies fraud emails with feature selection

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1088/1/012011 ◽

2021 ◽

Vol 1088 (1) ◽

pp. 012011

Author(s):

Anita Sindar Sinaga ◽

Musthafa Haris Munandar ◽

Arjon Samuel Sitio

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithm ◽

Machine Learning Algorithm

Intrusion Detection Using Feature Selection and Machine Learning Algorithm with Misuse Detection

International Journal of Computer Science and Information Technology ◽

10.5121/ijcsit.2016.8102 ◽

2016 ◽

Vol 8 (1) ◽

pp. 17-25 ◽

Cited By ~ 6

Author(s):

Harvinder Pal Singh Sasan ◽

Meenakshi Sharma

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Misuse Detection

Interpretation of Neural Networks Is Fragile

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013681 ◽

2019 ◽

Vol 33 ◽

pp. 3681-3688 ◽

Cited By ~ 20

Author(s):

Amirata Ghorbani ◽

Abubakar Abid ◽

James Zou

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Input Data ◽

Learning Algorithm ◽

Hessian Matrix ◽

Machine Learning Algorithm ◽

Feature Importance ◽

Adversarial Attack ◽

Measurement Biases

In order for machine learning to be trusted in many applications, it is critical to be able to reliably explain why the machine learning algorithm makes certain predictions. For this reason, a variety of methods have been developed recently to interpret neural network predictions by providing, for example, feature importance maps. For both scientific robustness and security reasons, it is important to know to what extent the interpretations can be altered by small systematic perturbations to the input data, which might be generated by adversaries or by measurement biases. In this paper, we demonstrate how to generate adversarial perturbations that produce perceptively indistinguishable inputs that are assigned the same predicted label, yet have very different interpretations. We systematically characterize the robustness of interpretations generated by several widely used feature importance interpretation methods (feature importance maps, integrated gradients, and DeepLIFT) on ImageNet and CIFAR-10. In all cases, our experiments show that systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly susceptible to adversarial attack. Our analysis of the geometry of the Hessian matrix gives insight into why robustness is a general challenge to current interpretation approaches.

Feature Extraction for Emotion Recognition in Speech with Machine Learning Algorithm

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/116942020 ◽

2020 ◽

Vol 9 (4) ◽

pp. 4998-5002

Author(s):

Aishwarya R.

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Emotion Recognition ◽

Learning Algorithm ◽

Machine Learning Algorithm

Classification of Diabetes using Random Forest with Feature Selection Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3595.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1295-1300 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Electronic Health Records ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Health Records

Diabetes has become a serious problem nowadays, so serious precautions are needed to eradicate it. To do so, we need to know its level of occurrence. In this project, we predict the level of occurrence of diabetes using Random Forest, a machine learning algorithm. Using patients' Electronic Health Records (EHR), we can build accurate models that predict the presence of diabetes.

Sequential Feature Selection and Machine Learning Algorithm-Based Patient’s Death Events Prediction and Diagnosis in Heart Disease

SN Computer Science ◽

10.1007/s42979-020-00370-1 ◽

2020 ◽

Vol 1 (6) ◽

Author(s):

Ritu Aggrawal ◽

Saurabh Pal

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Heart Disease ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Sequential Feature Selection

Feature selection using autoencoders with Bayesian methods to high-dimensional data

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211348 ◽

2021 ◽

pp. 1-10

Author(s):

Lei Shu ◽

Kun Huang ◽

Wenhao Jiang ◽

Wenming Wu ◽

Hongling Liu

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Bayesian Methods ◽

Large Scale ◽

High Dimensional Data ◽

Hybrid Approach ◽

High Dimensional ◽

Real World Data ◽

Learning Tasks ◽

Low Dimensional

Using real-world data directly in machine learning tasks easily leads to poor generalization, since such data is usually high-dimensional and limited. By learning low-dimensional representations of high-dimensional data, feature selection can retain the features that are useful for machine learning tasks, and these features allow machine learning models to be trained effectively. Selecting features from high-dimensional data is nevertheless a challenge. To address this issue, this paper proposes a novel hybrid feature selection approach consisting of an autoencoder and Bayesian methods. First, Bayesian methods are embedded in the proposed autoencoder as a special hidden layer; the purpose is to increase precision when selecting non-redundant features. Then, the other hidden layers of the autoencoder are used for non-redundant feature selection. Finally, the proposed method is compared with mainstream feature selection approaches and outperforms them. We find that combining autoencoders with probabilistic correction methods is more meaningful for feature selection than stacking architectures or adding constraints to autoencoders. We also demonstrate that stacked autoencoders are more suitable for large-scale feature selection, whereas sparse autoencoders are beneficial when selecting a smaller number of features. The proposed method also provides a theoretical reference for analyzing the optimality of feature selection.

An Effective Multi-Label Feature Selection Model Towards Eliminating Noisy Features

Applied Sciences ◽

10.3390/app10228093 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8093

Author(s):

Jun Wang ◽

Yuanyuan Xu ◽

Hengpeng Xu ◽

Zhe Sun ◽

Zhenglu Yang ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Performance ◽

Space Structures ◽

Learning Tasks ◽

Feature Spaces ◽

Selection Approach ◽

Label Correlations ◽

Feature Selection Approach ◽

Low Dimensional

A consistently great amount of effort has been devoted to feature selection as a means of dimension reduction for various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for learning targets. However, this strategy is weak in handling two kinds of features, namely the irrelevant and the redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach by embedding label correlations (dubbed ELC) to address these issues. In particular, we extract label correlations for reliable label space structures and employ them to steer feature selection. In this way, label and feature spaces can be expected to be consistent and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validated the superiority of ELC.
