Feature Reduction

Author(s):  
John A. Richards

2018 ◽  
Vol 4 (10) ◽  
pp. 6
Author(s):  
Shivangi Bhargava ◽  
Dr. Shivnath Ghosh

News popularity measures the amount of attention a particular news article attracts. The popularity of online news depends on various factors, such as the number of social media shares, the number of visitor comments, and the number of likes. It is therefore useful to build an automatic decision support system to predict the popularity of news, as this also supports business intelligence. The work presented in this study aims to find the best model for predicting the popularity of online news using machine learning methods. In this work, the result analysis is performed by applying a correlation-based algorithm, particle swarm optimization, and principal component analysis for feature reduction. For performance evaluation, support vector machine, naïve Bayes, k-nearest neighbor, and neural network classifiers are used to classify popular and unpopular articles. From the experimental results, it is observed that the support vector machine and naïve Bayes perform better with the correlation-based algorithm, while k-NN and the neural network perform better with particle swarm optimization.
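The pipeline this abstract describes (a filter-style feature reduction followed by classical classifiers) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' online-news dataset; the correlation filter here is a simple absolute-Pearson ranking, and the top-10 cutoff is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the news dataset: 30 features, 8 informative.
X, y = make_classification(n_samples=400, n_features=30,
                           n_informative=8, random_state=0)

# Correlation filter: rank features by absolute Pearson correlation
# with the popular/unpopular label and keep the top 10.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
top = np.argsort(corr)[::-1][:10]

Xtr, Xte, ytr, yte = train_test_split(X[:, top], y, random_state=0)
for clf in (SVC(), GaussianNB()):
    acc = clf.fit(Xtr, ytr).score(Xte, yte)
    print(type(clf).__name__, round(acc, 3))
```

The same filtered feature matrix can be handed to k-NN or a neural network for the comparison the study reports.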


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected, so no further feature reduction, or feature addition, is to be carried out. We then attempt to improve classification performance by passing the given feature set through a transformation that produces a new feature set, which we have named the "Binary Spectrum". Via a case study on Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation's potential for broader usage.
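The abstract does not detail the Binary Spectrum transformation itself, so the sketch below is only one hypothetical reading of a binary feature expansion: each continuous feature is compared against its own quantile thresholds and replaced by a vector of binary indicators. The function name `binary_spectrum`, the quantile scheme, and the bit count are all illustrative assumptions, not the authors' construction.

```python
import numpy as np

def binary_spectrum(X, n_bits=4):
    # Hypothetical sketch only: expand each continuous feature into
    # n_bits binary indicators by thresholding at its own quantiles.
    # This illustrates a binary feature expansion in general, not the
    # authors' exact Binary Spectrum transformation.
    cols = []
    for j in range(X.shape[1]):
        qs = np.quantile(X[:, j], np.linspace(0, 1, n_bits + 2)[1:-1])
        cols.append((X[:, j][:, None] > qs[None, :]).astype(float))
    return np.hstack(cols)

X = np.random.default_rng(0).normal(size=(100, 3))
Xb = binary_spectrum(X)
print(Xb.shape)  # each of the 3 continuous features expands to 4 binary columns
```

The expanded binary matrix can then be fed to an SVM in place of the raw continuous features.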


2021 ◽  
Vol 13 (3) ◽  
pp. 526
Author(s):  
Shengliang Pu ◽  
Yuanfeng Wu ◽  
Xu Sun ◽  
Xiaotong Sun

Nascent graph representation learning has shown superiority in handling graph data. Compared to conventional convolutional neural networks, graph-based deep learning has the advantages of delineating class boundaries and modeling feature relationships. For hyperspectral image (HSI) classification, the priority problem might be how to convert hyperspectral data from regular grids into irregular domains. In this regard, we present a novel method that performs localized graph convolutional filtering on HSIs based on spectral graph theory. First, we conducted principal component analysis (PCA) preprocessing to create localized hyperspectral data cubes with unsupervised feature reduction. These feature cubes, combined with localized adjacency matrices, were fed into a popular graph convolutional network in a standard supervised learning paradigm. Finally, we succeeded in analyzing diversified land covers by considering local graph structure with graph convolutional filtering. Experiments on real hyperspectral datasets demonstrated that the presented method offers promising classification performance compared with other popular competitors.
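The two preprocessing steps described above (unsupervised PCA reduction, then a localized graph convolutional filtering pass of the usual normalized form D^-1/2 A D^-1/2 X W) can be sketched on toy data. The pixel count, band count, k-nearest-neighbour adjacency, and random weights below are illustrative stand-ins, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.normal(size=(64, 100))  # toy HSI: 64 pixels x 100 spectral bands

# PCA preprocessing: unsupervised reduction to 5 spectral components.
Xc = cube - cube.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X = Xc @ Vt[:5].T

# Localized adjacency: connect each pixel to its 4 nearest neighbours,
# symmetrize, and add self-loops.
d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.zeros_like(d)
for i in range(len(d)):
    A[i, np.argsort(d[i])[1:5]] = 1.0
A = np.maximum(A, A.T) + np.eye(len(d))

# One localized graph convolutional filtering step:
# H = ReLU(D^-1/2 A D^-1/2 X W).
Dinv = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
W = rng.normal(size=(5, 8))
H = np.maximum(Dinv @ A @ Dinv @ X @ W, 0.0)
print(H.shape)
```

In a full network, W would be learned and several such filtering layers would be stacked before the classifier head.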


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3663
Author(s):  
Zun Shen ◽  
Qingfeng Wu ◽  
Zhi Wang ◽  
Guoyi Chen ◽  
Bin Lin

(1) Background: Diabetic retinopathy, one of the most serious complications of diabetes, is the primary cause of blindness in developed countries. Therefore, the prediction of diabetic retinopathy has a positive impact on its early detection and treatment. The prediction of diabetic retinopathy based on high-dimensional, small-sample datasets (such as biochemical data and physical data) was the problem to be solved in this study. (2) Methods: This study proposed the XGB-Stacking model, built on XGBoost and stacking. First, a wrapper feature selection algorithm, XGBIBS (Improved Backward Search Based on XGBoost), was used to reduce data feature redundancy and improve the effectiveness of a single ensemble learning classifier. Second, in view of the limitations of a single classifier, a stacking model fusion method, Sel-Stacking (Select-Stacking), which keeps Label-Proba as the input matrix of the meta-classifier and determines the optimal combination of learners by a global search, was used in the XGB-Stacking model. (3) Results: XGBIBS greatly improved the prediction accuracy and the feature reduction rate of a single classifier. Compared to a single classifier, the accuracy of the Sel-Stacking model improved to varying degrees. Experiments showed that the XGB-Stacking prediction model, based on the XGBIBS algorithm and the Sel-Stacking method, made effective predictions of diabetic retinopathy. (4) Conclusion: The XGB-Stacking prediction model of diabetic retinopathy based on biochemical and physical data showed outstanding performance. This is highly significant for improving the screening efficiency of diabetic retinopathy and reducing the cost of diagnosis.
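The wrapped backward-search idea behind XGBIBS can be sketched as a greedy loop that drops a feature whenever its removal does not reduce the cross-validated score. Since the abstract does not specify XGBIBS's internals, this is a generic backward search, with scikit-learn's GradientBoostingClassifier standing in for XGBoost and synthetic data in place of the biochemical/physical dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the high-dimensional, small-sample clinical data.
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=4, random_state=0)
clf = GradientBoostingClassifier(n_estimators=20, random_state=0)
feats = list(range(X.shape[1]))
base = cross_val_score(clf, X, y, cv=3).mean()

# Greedy backward search: drop a feature whenever removing it does not
# reduce the cross-validated score, and repeat until no drop helps.
improved = True
while improved and len(feats) > 1:
    improved = False
    for f in list(feats):
        rest = [g for g in feats if g != f]
        score = cross_val_score(clf, X[:, rest], y, cv=3).mean()
        if score >= base:
            feats, base, improved = rest, score, True
            break
print(len(feats), round(base, 3))
```

The surviving feature subset would then feed the stacking stage, whose meta-classifier consumes the base learners' label probabilities.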


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Parvathaneni Rajendra Kumar ◽  
Suban Ravichandran ◽  
Satyala Narayana

Abstract

Objectives: This research work aims to develop a novel heart disease prediction framework comprising three major phases: proposed feature extraction, dimensionality reduction, and proposed ensemble-based classification.

Methods: As the novelty, the training of the NN is carried out by a new enhanced optimization algorithm referred to as Sea Lion with Canberra Distance (S-CDF), which tunes the optimal weights. The improved S-CDF algorithm is an extended version of the existing Sea Lion Optimization (SLnO). Initially, statistical and higher-order statistical features are extracted, including central tendency, degree of dispersion, and qualitative variation. However, the "curse of dimensionality" becomes the greatest issue in this scenario, so dimensionality reduction of the extracted features is necessary. Hence, a principal component analysis (PCA)-based feature reduction approach is deployed. Finally, the dimensionally reduced features are fed as input to the proposed ensemble technique, which combines Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN) with an optimized Neural Network (NN) as the final classifier.

Results: An elaborative analysis and discussion are provided concerning parameters such as evaluation metrics, year of publication, accuracy, implementation tool, and the datasets used by various techniques. The experimental outcomes show that the accuracy of the proposed work with the proposed feature set is 5, 42.85, and 10% superior to the performance with other feature sets, namely central tendency + dispersion, central tendency + qualitative variation, and dispersion + qualitative variation, respectively.

Conclusions: The comparative evaluation shows that the presented work is appropriate for heart disease prediction, as it achieves higher accuracy than traditional works.
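The feature-extraction and PCA-reduction phases can be sketched as below. The statistical features (central tendency, dispersion, and a crude above-the-mean fraction standing in for the paper's qualitative-variation measures), the toy records, and the three-component target are illustrative choices; the S-CDF-optimized ensemble itself is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
records = rng.normal(size=(50, 200))  # toy stand-in for per-patient signals

# Statistical feature extraction per record: central tendency,
# degree of dispersion, and a crude above-the-mean proxy in place of
# the paper's qualitative-variation measures.
feats = np.column_stack([
    records.mean(axis=1),
    np.median(records, axis=1),
    records.std(axis=1),
    np.ptp(records, axis=1),
    (records > records.mean(axis=1, keepdims=True)).mean(axis=1),
])

# PCA-based feature reduction of the extracted statistics.
reduced = PCA(n_components=3).fit_transform(feats)
print(reduced.shape)
```

The reduced matrix is what would be handed to the SVM/RF/KNN ensemble with the optimized NN as the final classifier.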


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Hakan Gunduz

In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear, deep-learning (LSTM), and ensemble-learning (LightGBM) models. These models were trained with four different feature sets, and their performances were evaluated in terms of accuracy and F-measure metrics. While the first experiments directly used each stock's own features as model inputs, the second experiments utilized stock features reduced through Variational AutoEncoders (VAE). In the last experiments, in order to grasp the effects of the other banking stocks on individual stock performance, the features belonging to other stocks were also given as inputs to our models. Combining other stock features was done for both the own (named allstock_own) and VAE-reduced (named allstock_VAE) stock features, and the expanded dimensions of the feature sets were reduced by Recursive Feature Elimination. While the highest success rate reached 0.685 with allstock_own and the LSTM-with-attention model, the combination of allstock_VAE and the LSTM-with-attention model obtained an accuracy rate of 0.675. Although the classification results achieved with both feature types were close, allstock_VAE achieved these results using nearly 16.67% fewer features than allstock_own. When all experimental results were examined, it was found that the models trained with allstock_own and allstock_VAE achieved higher accuracy rates than those using individual stock features. It was also concluded that the results obtained with the VAE-reduced stock features were similar to those obtained with the own stock features.
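The Recursive Feature Elimination step used to shrink the expanded allstock feature sets can be sketched with scikit-learn's RFE. The synthetic data, the logistic-regression ranker, and the 12-feature target below are illustrative assumptions, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Toy stand-in for an expanded allstock feature matrix.
X, y = make_classification(n_samples=500, n_features=40,
                           n_informative=6, random_state=0)

# Recursive Feature Elimination: repeatedly fit the ranking model and
# drop the lowest-weighted features until 12 remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=12).fit(X, y)
X_red = rfe.transform(X)
print(X_red.shape)
```

The reduced matrix would then be windowed into sequences for the LSTM or passed directly to LightGBM.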

