1D embedding multi-category classification methods

Author(s):  
Luoqing Li ◽  
Chuanwu Yang ◽  
Qiwei Xie

In this paper, we propose a novel semi-supervised multi-category classification method based on one-dimensional (1D) multi-embedding. Using the multiple 1D embedding-based interpolation technique, we first embed the high-dimensional data into several different 1D manifolds and perform binary classification. We then construct the multi-category classifiers using the one-versus-rest and one-versus-one strategies separately. A weighting strategy is employed in our algorithm to improve classification performance. The proposed method shows promising results in the classification of handwritten digits and facial images.
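The two strategies named in the abstract can be illustrated with a small sketch. It is not the authors' method: the binary scorer below is a toy nearest-mean rule standing in for the 1D embedding-based classifier, the weighting strategy is not modelled, and the names `fit_binary`, `one_vs_rest`, and `one_vs_one` are hypothetical.

```python
from itertools import combinations

def fit_binary(points, labels):
    """Toy binary scorer (an assumption, not the paper's classifier).
    Labels are +1 / -1; the score is positive when x is closer to the +1 mean."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([p for p, l in zip(points, labels) if l == 1])
    neg = mean([p for p, l in zip(points, labels) if l == -1])
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return lambda x: sqdist(x, neg) - sqdist(x, pos)

def one_vs_rest(points, labels, x):
    """Train one 'class c vs. all others' scorer per class; return the top-scoring class."""
    classes = sorted(set(labels))
    scores = {}
    for c in classes:
        scorer = fit_binary(points, [1 if l == c else -1 for l in labels])
        scores[c] = scorer(x)
    return max(classes, key=lambda c: scores[c])

def one_vs_one(points, labels, x):
    """Train one scorer per pair of classes; each pair casts one vote."""
    classes = sorted(set(labels))
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        pair = [(p, l) for p, l in zip(points, labels) if l in (a, b)]
        scorer = fit_binary([p for p, _ in pair],
                            [1 if l == a else -1 for _, l in pair])
        votes[a if scorer(x) > 0 else b] += 1
    return max(classes, key=lambda c: votes[c])

# Three small clusters in the plane, one per class.
pts = [(0, 0), (0, 1), (5, 0), (5, 1), (0, 5), (1, 5)]
labs = [0, 0, 1, 1, 2, 2]
print(one_vs_rest(pts, labs, (4.5, 0.5)), one_vs_one(pts, labs, (4.5, 0.5)))
```

For K classes, one-versus-rest trains K scorers and one-versus-one trains K(K-1)/2, so the pairwise scheme costs more models but each is fitted on a smaller, more balanced subset.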

2012 ◽  
Vol 8 (2) ◽  
pp. 44-63 ◽  
Author(s):  
Baoxun Xu ◽  
Joshua Zhexue Huang ◽  
Graham Williams ◽  
Qiang Wang ◽  
Yunming Ye

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for the subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contain many features that are uninformative for classification, and random sampling often fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method which uses a novel feature weighting method for subspace selection and therefore enhances classification performance over high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that using a subspace size of features where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.
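The idea of weighted subspace selection can be sketched as follows. The abstract does not say how the feature weights are computed, so the weights here are simply given as input (an assumption), and `sample_subspace` is a hypothetical helper rather than the authors' implementation.

```python
import random

def sample_subspace(weights, size, rng):
    """Draw `size` distinct feature indices with probability proportional to weight.
    Assumes every weight is strictly positive (rng.choices rejects an all-zero total)."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(min(size, len(remaining))):
        pick = rng.choices(remaining,
                           weights=[weights[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return chosen

# Eight nearly uninformative features and two informative ones (indices 8 and 9).
weights = [1e-9] * 8 + [1.0, 1.0]
print(sorted(sample_subspace(weights, 2, random.Random(0))))
```

With uniform weights this reduces to the usual random subspace sampling; skewed weights make it far more likely that each tree sees at least some informative features.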


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7417
Author(s):  
Alex J. Hope ◽  
Utkarsh Vashisth ◽  
Matthew J. Parker ◽  
Andreas B. Ralston ◽  
Joshua M. Roper ◽  
...  

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. 
The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.
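For reference, the F1 score quoted above is the harmonic mean of precision and recall. The sketch below computes it for a single positive class; how the paper averages F1 across the impairment classes in Use Case 2 is not stated in the abstract, so no averaging scheme is assumed here, and the label values are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative labels only: 1 = concussion, 0 = healthy.
print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```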


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jing Zhang ◽  
Guang Lu ◽  
Jiaquan Li ◽  
Chuanwen Li

Mining useful knowledge from high-dimensional data is a hot research topic. Efficient and effective sample classification and feature selection are challenging tasks due to the high dimensionality and small sample size of microarray data. Feature selection is necessary when constructing the model to reduce time and space consumption. Therefore, a feature selection model based on prior knowledge and rough sets is proposed. Pathway knowledge is used to select feature subsets, and a rough set based on intersection neighborhood is then used to select the important features in each subset, since it can select features without redundancy and deal with numerical features directly. To improve the diversity among base classifiers and the efficiency of classification, it is necessary to select a subset of the base classifiers. Classifiers are grouped into several clusters by k-means clustering using the proposed combination distance of Kappa-based diversity and accuracy. The base classifier with the best classification performance in each cluster is selected to generate the final ensemble model. Experimental results on three Arabidopsis thaliana stress response datasets showed that the proposed method achieved better classification performance than existing ensemble models.
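Two ingredients of the selection step can be sketched in isolation: Cohen's kappa as an agreement measure between two base classifiers (low kappa suggests high diversity), and picking the most accurate classifier in each cluster. The paper's combination distance and its k-means step are not reproduced; the cluster assignments below are taken as given, and both function names are hypothetical.

```python
from collections import Counter

def cohen_kappa(pred_a, pred_b):
    """Chance-corrected agreement between two classifiers' predictions."""
    n = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always predict the same single class
    return (observed - expected) / (1.0 - expected)

def best_per_cluster(cluster_ids, accuracies):
    """Return the index of the most accurate classifier in each cluster."""
    best = {}
    for idx, (cid, acc) in enumerate(zip(cluster_ids, accuracies)):
        if cid not in best or acc > accuracies[best[cid]]:
            best[cid] = idx
    return sorted(best.values())

print(cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1]))
print(best_per_cluster([0, 0, 1, 1, 1], [0.7, 0.8, 0.9, 0.6, 0.85]))
```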


MATEMATIKA ◽  
2020 ◽  
Vol 36 (1) ◽  
pp. 43-49
Author(s):  
T Dwi Ary Widhianingsih ◽  
Heri Kuswanto ◽  
Dedy Dwi Prastyo

Logistic regression is one of the most commonly used classification methods. It has some advantages, specifically related to hypothesis testing and its objective function. However, it also has some disadvantages in the case of high-dimensional data, such as multicollinearity, over-fitting, and a high computational burden. Ensemble-based classification methods have been proposed to overcome these problems. The logistic regression ensemble (LORENS) method is expected to improve the classification performance of basic logistic regression. In this paper, we apply it to a drug discovery problem, with the objective of obtaining candidate compounds that protect normal non-cancerous cells, which involves a dataset of high dimensionality. The experimental results show that it performs well, with an accuracy of 69% and an AUC of 0.7306.
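A logistic regression ensemble can be sketched as below, under the assumption that the features are randomly partitioned into disjoint subsets, one small model is fitted per subset, and the predicted probabilities are averaged. The abstract does not describe the partitioning or thresholding details, so this is an illustration of the general idea, not LORENS itself; the plain gradient-descent fit and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """w[0] is the intercept, w[1:] are the feature coefficients."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def fit_logistic(rows, labels, steps=500, lr=0.1):
    """Batch gradient descent on the mean log-loss; labels are 0/1."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def fit_partition_ensemble(rows, labels, n_parts, rng):
    """Shuffle the feature indices, split them into disjoint parts, fit one model per part."""
    features = list(range(len(rows[0])))
    rng.shuffle(features)
    parts = [features[i::n_parts] for i in range(n_parts)]
    return [(part, fit_logistic([[r[j] for j in part] for r in rows], labels))
            for part in parts]

def ensemble_proba(models, x):
    """Average the per-part probabilities."""
    probs = [predict_proba(w, [x[j] for j in part]) for part, w in models]
    return sum(probs) / len(probs)

rows = [[1, 1, 1, 1], [2, 2, 2, 2], [-1, -1, -1, -1], [-2, -2, -2, -2]]
labels = [1, 1, 0, 0]
models = fit_partition_ensemble(rows, labels, n_parts=2, rng=random.Random(0))
print(ensemble_proba(models, [1.5] * 4) > 0.5)
```

Because each model sees only a fraction of the features, no single fit has to cope with the full dimensionality, which is the motivation the abstract gives for using an ensemble.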


Author(s):  
Jianzhong Wang

We propose a novel semi-supervised learning (SSL) scheme using adaptive interpolation on multiple one-dimensional (1D) embedded data. For a given high-dimensional dataset, we smoothly map it onto several different 1D sequences, so that the labeled subset is converted to a 1D subset for each of these sequences. By applying cubic interpolation to the labeled subset, we obtain a subset of unlabeled points that are assigned the same label in all interpolations. Selecting a proportion of these points at random and adding them to the current labeled subset, we build a larger labeled subset for the next interpolation. By repeating the embedding and interpolation, we gradually enlarge the labeled subset and finally reach a labeled set of reasonably large size, on which the final classifier is constructed. We explore the use of the proposed scheme in the classification of handwritten digits and show promising results.
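The enlargement loop described above can be sketched for the binary case. Two simplifications are swapped in and should not be read as the author's method: random linear projections stand in for the smooth 1D embeddings, and linear interpolation stands in for cubic interpolation. The function names and the parameters `n_embeddings`, `rounds`, and `keep` are hypothetical.

```python
import random

def interpolate_labels(coords, labeled):
    """Linearly interpolate +1/-1 labels along one 1D sequence.
    Returns {index: sign} for unlabeled points whose interpolated value is nonzero."""
    known = sorted((coords[i], y) for i, y in labeled.items())
    guesses = {}
    for i, t in enumerate(coords):
        if i in labeled:
            continue
        left = [(c, y) for c, y in known if c <= t]
        right = [(c, y) for c, y in known if c > t]
        if not left:
            value = right[0][1]       # before the first labeled point
        elif not right:
            value = left[-1][1]       # after the last labeled point
        else:
            (t0, y0), (t1, y1) = left[-1], right[0]
            value = y0 + (y1 - y0) * (t - t0) / (t1 - t0)
        if value != 0:
            guesses[i] = 1 if value > 0 else -1
    return guesses

def enlarge_labels(points, labeled, n_embeddings=5, rounds=3, keep=0.5, seed=0):
    """Repeat: embed, interpolate, keep a random share of the points all embeddings agree on."""
    rng = random.Random(seed)
    labeled = dict(labeled)
    dim = len(points[0])
    for _ in range(rounds):
        votes = []
        for _ in range(n_embeddings):
            # Stand-in embedding: projection onto a random direction.
            direction = [rng.gauss(0, 1) for _ in range(dim)]
            coords = [sum(d * x for d, x in zip(direction, p)) for p in points]
            votes.append(interpolate_labels(coords, labeled))
        agreed = [i for i in votes[0]
                  if all(v.get(i) == votes[0][i] for v in votes)]
        if not agreed:
            break
        rng.shuffle(agreed)
        for i in agreed[:max(1, int(keep * len(agreed)))]:
            labeled[i] = votes[0][i]
    return labeled

points = [(t, t) for t in (-5, -4, -3, 3, 4, 5)]
print(enlarge_labels(points, {0: -1, 5: 1}))
```

Requiring agreement across all sequences is what keeps early mistakes from being added to the labeled set; the final classifier would then be trained on the enlarged set.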


PLoS ONE ◽  
2019 ◽  
Vol 14 (8) ◽  
pp. e0220765
Author(s):  
Shesh N. Rai ◽  
Sudhir Srivastava ◽  
Jianmin Pan ◽  
Xiaoyong Wu ◽  
Somesh P. Rai ◽  
...  

2009 ◽  
Vol 21 (2) ◽  
pp. 203-216 ◽  
Author(s):  
Katarina Domijan ◽  
Simon P. Wilson
