A novel method of constrained feature selection by the measurement of pairwise constraints uncertainty

Abstract In recent decades, rapid advances in computer and database technologies have led to rapid growth in large-scale datasets. At the same time, data mining applications on high-dimensional datasets that demand both high speed and high accuracy are increasing rapidly. Semi-supervised learning is a class of machine learning in which unlabeled and labeled data are used together, here to improve feature selection. The goal of feature selection over partially labeled data (semi-supervised feature selection) is to choose a subset of the available features with the lowest redundancy among themselves and the highest relevance to the target class, which is the same objective as feature selection over entirely labeled data. The proposed method uses classification to reduce ambiguity in the range of values. First, the similarity value of each pair is collected; these values are then divided into intervals, and the average of each interval is determined. Next, the number of pairs falling in each interval is counted. Finally, using the strength and similarity matrices, a new constrained feature selection ranking is proposed. The method was compared with state-of-the-art and well-known semi-supervised feature selection approaches on eight datasets. The results indicate that the proposed approach improves on previous related approaches with respect to the accuracy of the constrained score. In particular, the numerical results show that it improved classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, the proposed method reduces the computational complexity of the machine learning algorithm while increasing classification accuracy.
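
The interval step described above (collect pairwise similarity values, split them into intervals, then take the average and the pair count of each interval) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not say how the intervals are chosen, so the equal-width bins, the function name `interval_summary`, and the `n_bins` parameter are all assumptions.

```python
import numpy as np

def interval_summary(similarities, n_bins=5):
    """Summarise pairwise similarity values per interval.

    Assumption: equal-width intervals between the smallest and largest
    similarity value. The abstract does not specify the binning rule.
    Returns (edges, means, counts).
    """
    s = np.asarray(similarities, dtype=float)
    edges = np.linspace(s.min(), s.max(), n_bins + 1)
    # Using only the inner edges maps every value to a bin index in
    # 0..n_bins-1, so the maximum value lands in the last interval.
    idx = np.digitize(s, edges[1:-1])
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=s, minlength=n_bins)
    # Empty intervals get NaN instead of triggering a division by zero.
    means = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)
    return edges, means, counts

# Hypothetical similarity values for five sample pairs.
edges, means, counts = interval_summary([0.0, 0.1, 0.5, 0.9, 1.0], n_bins=2)
print(edges, means, counts)
```

How these per-interval averages and counts feed into the strength matrix is not stated in the abstract, so the sketch stops at the summary statistics.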

A Novel Method of Constrained Feature Selection by the Measurement of Pairwise Constraints Uncertainty

10.21203/rs.3.rs-31751/v2 ◽

2020 ◽

Author(s):

Kamal Berahmand ◽

Mehrdad Rostami ◽

Saman Forouzandeh

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Rapid Growth ◽

Classification Accuracy ◽

High Speed ◽

Large Scale ◽

Learning Algorithm ◽

Target Class ◽

Partially Labeled Data ◽

Novel Method

A Novel Method of Constrained Feature Selection by the Measurement of Pairwise Constraints Uncertainty

10.21203/rs.3.rs-31751/v1 ◽

2020 ◽

Author(s):

Kamal Berahmand ◽

Mehrdad Rostami ◽

Saman Forouzandeh

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Learning Algorithm ◽

Side Information ◽

Computational Cost ◽

Cost Savings ◽

Learning Task ◽

Pairwise Constraint ◽

Target Class ◽

Partially Labeled Data

Abstract In recent years, advances in science and technology have produced considerably larger datasets across many fields, and these datasets now come with many features. In a high-dimensional dataset, many features are redundant and/or irrelevant for a given learning task, which adversely affects computational cost and/or performance. The goal of feature selection over partially labeled data (semi-supervised feature selection) is to choose a subset of the available features with the lowest redundancy among themselves and the highest relevance to the target class, which is the same objective as feature selection over entirely labeled data. Appropriately reducing the dimensionality improves performance in addition to saving time and cost. In this paper, side information in the form of pairwise constraints is used to rank features and reduce the dimensionality. The proposed method assesses the quality (strength or uncertainty) of each pairwise constraint; usually, the quality of the constraints used for dimension reduction is not evaluated. In the first step, a strength matrix is created from a similarity matrix and an uncertainty region. Then, using the strength and similarity matrices, a new constrained feature selection ranking is proposed. The method was compared with state-of-the-art and well-known semi-supervised feature selection approaches on eight datasets. The findings indicate that the proposed approach improves on previous related approaches with respect to the accuracy of constrained clustering. In particular, the numerical results show that it improved classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, the proposed method reduces the computational complexity of the machine learning algorithm while increasing classification accuracy.
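
To make the idea of ranking features from pairwise constraints concrete, the sketch below computes a classical constraint-score style ratio: for each feature, the squared differences over must-link pairs divided by the squared differences over cannot-link pairs, where lower is better. The optional `strength` matrix is a stand-in for the per-pair weights the abstract mentions. How the paper actually derives strengths from the similarity matrix and uncertainty region is not given here, so the weighting scheme, the function name, and the parameters are assumptions.

```python
import numpy as np

def weighted_constraint_score(X, must_link, cannot_link, strength=None):
    """Score each feature from pairwise constraints (lower is better).

    X           : (n_samples, n_features) data matrix
    must_link   : list of (i, j) pairs that should be in the same class
    cannot_link : list of (i, j) pairs that should be in different classes
    strength    : optional (n_samples, n_samples) per-pair weights;
                  hypothetical stand-in for the paper's strength matrix
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    W = np.ones((n, n)) if strength is None else np.asarray(strength, dtype=float)

    def weighted_sq_diff(pairs):
        i, j = np.asarray(pairs, dtype=int).T
        # One weighted squared difference per pair and feature, summed over pairs.
        return (W[i, j][:, None] * (X[i] - X[j]) ** 2).sum(axis=0)

    num = weighted_sq_diff(must_link)
    den = weighted_sq_diff(cannot_link)
    # Guard against features that are constant across all cannot-link pairs.
    return num / np.maximum(den, 1e-12)

# Toy data: feature 0 separates samples {0, 1} from {2, 3}; feature 1 does not.
X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [0.9, 1.0]]
scores = weighted_constraint_score(X, [(0, 1), (2, 3)], [(0, 2), (1, 3)])
print(scores)
```

With uniform weights this reduces to the unweighted ratio; passing a strength matrix lets uncertain constraints contribute less, which is the general intent the abstract describes.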

An Empirical Evaluation of Feature Selection Methods

Improving Knowledge Discovery through the Integration of Data Mining Techniques - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-8513-0.ch012 ◽

2015 ◽

pp. 233-258 ◽

Cited By ~ 1

Author(s):

Mohsin Iqbal ◽

Saif Ur Rehman ◽

Saira Gillani ◽

Sohail Asghar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Classification Accuracy ◽

Information Gain ◽

Learning Algorithm ◽

Empirical Evaluation ◽

Machine Learning Algorithms ◽

Selection Methods ◽

The One ◽

Processing And Storage

The key objective of this chapter is to study classification accuracy when feature selection is combined with machine learning algorithms. Feature selection reduces the dimensionality of the data and improves the accuracy of the learning algorithm. We test how integrated feature selection affects the accuracy of three classifiers by applying several feature selection methods. Among the filter methods, Information Gain (IG), Gain Ratio (GR), and Relief-F, and among the wrapper methods, Bagging and Naive Bayes (NB), enabled the classifiers to achieve the largest average increase in classification accuracy while reducing the number of unnecessary attributes. These conclusions can advise machine learning users on which classifier and feature selection methods to use to optimize classification accuracy. This is especially important in risk-sensitive applications of machine learning, where one of the aims is to reduce the costs of collecting, processing, and storing unnecessary data.
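
As an illustration of one of the filter criteria named above, Information Gain for a discrete feature is the entropy of the class labels minus the expected label entropy after splitting on the feature's values. The sketch below is a minimal version for categorical features; the function names are illustrative and do not come from the chapter.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """IG(labels; feature) = H(labels) - sum_v P(feature = v) * H(labels | feature = v)."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional

labels = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], labels))  # feature matches the labels exactly
print(information_gain([0, 1, 0, 1], labels))  # feature is independent of the labels
```

Ranking features by this value and keeping the top ones is the usual filter-style use; continuous features would need to be discretised first.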

Machine learning algorithm to identifies fraud emails with feature selection

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1088/1/012011 ◽

2021 ◽

Vol 1088 (1) ◽

pp. 012011

Author(s):

Anita Sindar Sinaga ◽

Musthafa Haris Munandar ◽

Arjon Samuel Sitio

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithm ◽

Machine Learning Algorithm

Geometrical design of a crystal growth system guided by a machine learning algorithm

CrystEngComm ◽

10.1039/d1ce00106j ◽

2021 ◽

Author(s):

Wancheng Yu ◽

Can Zhu ◽

Yosuke Tsunooka ◽

Wei Huang ◽

Yifan Dang ◽

...

Keyword(s):

Machine Learning ◽

Crystal Growth ◽

High Speed ◽

Learning Algorithm ◽

Computational Techniques ◽

Machine Learning Algorithm ◽

Geometrical Design ◽

Large Numbers ◽

Growth System ◽

Speed Method

This study proposes a new high-speed method for designing crystal growth systems. It can optimize large numbers of parameters simultaneously, which is difficult for traditional experimental and computational techniques.

Real-Time AI-Based Informational Decision-Making Support System Utilizing Dynamic Text Sources

Applied Sciences ◽

10.3390/app11136237 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6237

Author(s):

Azharul Islam ◽

KyungHi Chang

Keyword(s):

Machine Learning ◽

Decision Making ◽

Random Forest ◽

Support System ◽

Classification Accuracy ◽

Short Term Memory ◽

Learning Algorithm ◽

Unstructured Data ◽

Stochastic Gradient Descent ◽

Decision Making Support

Unstructured data from the internet constitute large sources of information, which need to be formatted in a user-friendly way. This research develops a model that classifies unstructured data obtained through data mining into labeled data and builds an informational decision-making support system (DMSS). Information is often collected by mining data from various sources, and the key challenge is to extract what is valuable. We observe a substantial improvement in classification accuracy on our datasets with both machine learning and deep learning algorithms. The highest classification accuracy (99% in training, 96% in testing) was achieved on a Covid corpus processed with a long short-term memory (LSTM) network. Furthermore, we conducted tests on large datasets relevant to the Disaster corpus, with an LSTM classification accuracy of 98%. In addition, random forest (RF), a machine learning algorithm, provides a reasonable 84% accuracy. The main objective of this research is to increase the application’s robustness by integrating intelligence into the developed DMSS, which provides insight into the user’s intent despite dealing with a noisy dataset. Our designed model selects between the random forest and stochastic gradient descent (SGD) algorithms based on their F1 scores; the RF method performs better, improving accuracy by 2% (to 83% from 81%) compared with a conventional method.

A REVIEW OF FEATURE EXTRACTION METHODS ON MACHINE LEARNING

Journal of Information System and Technology Management ◽

10.35631/jistm.622005 ◽

2021 ◽

Vol 6 (22) ◽

pp. 51-59

Author(s):

Mustazzihim Suhaidi ◽

Rabiah Abdul Kadir ◽

Sabrina Tiun

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Feature Selection ◽

Input Data ◽

Feature Vector ◽

Learning Algorithm ◽

Extraction Methods ◽

Machine Learning Algorithm ◽

Learning Tasks ◽

Low Dimensional

Extracting features from input data is vital for successful classification and machine learning tasks. Classification is the process of assigning an object to one of a set of predefined categories. Many different feature selection and feature extraction methods exist and are widely used. Feature extraction transforms large input data into a low-dimensional feature vector, which serves as the input to a classification or other machine learning algorithm. Feature extraction poses major challenges, which are discussed in this paper. The main challenge is to learn and extract knowledge from text datasets in order to make correct decisions. The objective of this paper is to give an overview of the methods used in feature extraction for various applications, with a dataset containing a collection of texts taken from social media.

Machine learning-based feature importance approach for sensitivity analysis of steel frames

10.31224/osf.io/mvkf3 ◽

2021 ◽

Author(s):

Hyeyoung Koh ◽

Hannah Beth Blum

Keyword(s):

Machine Learning ◽

Sensitivity Analysis ◽

Feature Selection ◽

Large Scale ◽

Failure Modes ◽

Model Development ◽

Predictive Performance ◽

Computational Effort ◽

Structural Systems ◽

Feature Importance

This study presents a machine learning-based approach for sensitivity analysis to examine how parameters affect a given structural response while accounting for uncertainty. Reliability-based sensitivity analysis involves repeated evaluations of the performance function incorporating uncertainties to estimate the influence of a model parameter, which can lead to prohibitive computational costs. This challenge is exacerbated for large-scale engineering problems, which often carry a large number of uncertain parameters. The proposed approach is based on feature selection algorithms that rank feature importance and remove redundant predictors during model development, which improves model generality and training performance by focusing only on the significant features. The approach allows sensitivity analysis of structural systems to be performed by providing feature rankings with reduced computational effort. The proposed approach is demonstrated with two designs of a two-bay, two-story planar steel frame with different failure modes: inelastic instability of a single member and progressive yielding. The feature variables in the data are uncertainties including material yield strength, Young’s modulus, frame sway imperfection, and residual stress. The Monte Carlo sampling method is used to generate random realizations of the frames from published distributions of the feature parameters, and the response variable is the frame ultimate strength obtained from finite element analyses. Decision trees are trained to identify important features. Feature rankings are derived by four feature selection techniques: impurity-based, permutation, SHAP, and Spearman's correlation. The predictive performance of the model including the important features is discussed using an evaluation metric for imbalanced datasets, the Matthews correlation coefficient. Finally, the results are compared with those from reliability-based sensitivity analysis on the same example frames to show the validity of the feature selection approach. As the proposed machine learning-based approach produces the same results as the reliability-based sensitivity analysis with improved computational efficiency and accuracy, it could be extended to other structural systems.
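
For reference, the Matthews correlation coefficient mentioned above is computed from the binary confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch for 0/1 labels. The function name `mcc` is illustrative, and returning 0.0 when the denominator vanishes is a common convention rather than something stated in the abstract.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (an assumption here): report 0.0 when a row or column
    # of the confusion matrix is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect agreement
print(mcc([1, 1, 0, 0], [0, 0, 1, 1]))  # complete disagreement
print(mcc([1, 1, 0, 0], [1, 0, 1, 0]))  # no better than chance
```

Because it uses all four cells of the confusion matrix, the coefficient stays informative when one class is much rarer than the other, which is why it suits imbalanced datasets.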

Zero-Shot Feature Selection via Transferring Supervised Knowledge

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2021040101 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-20

Author(s):

Zheng Wang ◽

Qiao Wang ◽

Tingzhang Zhao ◽

Chaokun Wang ◽

Xiaojun Ye

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Real World ◽

Rapid Growth ◽

Learning Systems ◽

Training Data ◽

Effective Technique ◽

Supervised Methods ◽

Real World Datasets

Feature selection, an effective technique for dimensionality reduction, plays an important role in many machine learning systems. Supervised knowledge can significantly improve the performance. However, faced with the rapid growth of newly emerging concepts, existing supervised methods might easily suffer from the scarcity and validity of labeled data for training. In this paper, the authors study the problem of zero-shot feature selection (i.e., building a feature selection model that generalizes well to “unseen” concepts with limited training data of “seen” concepts). Specifically, they adopt class-semantic descriptions (i.e., attributes) as supervision for feature selection, so as to utilize the supervised knowledge transferred from the seen concepts. For more reliable discriminative features, they further propose the center-characteristic loss which encourages the selected features to capture the central characteristics of seen concepts. Extensive experiments conducted on various real-world datasets demonstrate the effectiveness of the method.

Recognition Technology of Athlete’s Limb Movement Combined Based on the Integrated Learning Algorithm

Journal of Sensors ◽

10.1155/2021/3057557 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Fei Tan ◽

Xiaoqing Xie

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithm ◽

Human Motion ◽

Machine Learning Algorithms ◽

Support Vector ◽

Recording Device ◽

Table Tennis ◽

Movement Recognition ◽

Random Forest Tree

Human motion recognition based on inertial sensors is a new research direction in the field of pattern recognition. Inertial sensors are placed on the surface of the human body, the recorded data are preprocessed, and features are selected; the extracted features of human actions are then classified and recognized. There are many kinds of swing movements in table tennis, and accurately identifying these movement modes is of great significance for swing movement analysis. With the development of artificial intelligence technology, human movement recognition has made many breakthroughs in recent years, from machine learning to deep learning and from wearable sensors to visual sensors. However, there is not much work on movement recognition for table tennis, and the methods still mainly belong to traditional machine learning. Therefore, this paper uses an acceleration sensor as a motion recording device for a table tennis disc and explores the three-axis acceleration data of four common swing motions. Traditional machine learning algorithms (decision tree, random forest tree, and support vector) are used to classify the swing motion, and a classification algorithm based on the idea of integration is designed. Experimental results show that the ensemble learning algorithm developed in this paper outperforms the traditional machine learning algorithms, with an average recognition accuracy of 91%.
