MULTI-CLASS CLASSIFICATION VIA SUBSPACE MODELING

Aiming to build a satisfactory supervised classifier, this paper proposes a Multi-class Subspace Modeling (MSM) classification framework. The framework consists of three parts: the Principal Component Classifier Training Array, the Principal Component Classifier Testing Array, and the Label Coordinator. The Principal Component Classifier Training Array obtains a set of optimized parameters and principal components from each subspace-based training classifier and passes them to the corresponding subspace-based testing classifier in the Principal Component Classifier Testing Array. In each subspace-based training classifier, the instances are projected from the original space into the principal component (PC) subspace, which is constructed with a PC selection method developed for this purpose. In the Principal Component Classifier Testing Array, each subspace-based testing classifier uses the parameters and PCs from its corresponding subspace-based training classifier to decide whether to assign its class label to an instance. Since the Testing Array may assign an instance zero labels or more than one label, the Label Coordinator determines the final class label of an instance from its Attaching Proportion (AP) values towards multiple classes. To evaluate classification accuracy, 10 rounds of 3-fold cross-validation are conducted, with many popular classification algorithms (such as SVM, Decision Trees, Multi-layer Perceptron, and Logistic) serving as baselines. Experimental results show that the proposed MSM classification framework outperforms the compared classifiers on 10 data sets, on 8 of which the difference is significant at a confidence level above 99.5%. In addition, the framework shows an ability to handle imbalanced data sets. Finally, a demo is built to display the accuracy and detailed information of the classification.
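
The Label Coordinator step can be illustrated with a minimal sketch. The abstract does not define how AP values are computed or how ties are resolved, so the rule below is an assumption: `ap` is a hypothetical per-class score where higher means a stronger attachment, and the function name `coordinate_label` is invented for illustration.

```python
from typing import Dict, Hashable


def coordinate_label(accepted: Dict[Hashable, bool], ap: Dict[Hashable, float]) -> Hashable:
    """Pick one final label from the per-class decisions of the testing array.

    accepted[c] is True when the testing classifier of class c claimed the instance.
    ap[c] is a hypothetical Attaching Proportion score (assumed: higher = closer to c).
    """
    claimed = [c for c, ok in accepted.items() if ok]
    # Exactly one classifier claimed the instance: nothing to resolve.
    if len(claimed) == 1:
        return claimed[0]
    # Several claims: compare only the claiming classes.
    # No claims at all: fall back to comparing every class.
    candidates = claimed if claimed else list(ap)
    return max(candidates, key=lambda c: ap[c])
```

Under this assumed rule, an instance claimed by several classifiers goes to the claiming class with the largest AP, and an unclaimed instance goes to the class with the largest AP overall.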

Rotating Machinery Fault Diagnosis for Imbalanced Data Based on Fast Clustering Algorithm and Support Vector Machine

Journal of Sensors ◽

10.1155/2017/8092691 ◽

2017 ◽

Vol 2017 ◽

pp. 1-15 ◽

Cited By ~ 15

Author(s):

Xiaochen Zhang ◽

Dongxiang Jiang ◽

Te Han ◽

Nanfei Wang ◽

Wenguang Yang ◽

...

Keyword(s):

Support Vector Machine ◽

Fault Diagnosis ◽

Clustering Algorithm ◽

Imbalanced Data ◽

Principal Component ◽

Rotating Machinery ◽

Support Vector ◽

Data Set ◽

Diagnosis Model ◽

Sample Set

To diagnose rotating machinery faults from imbalanced data, a method based on a fast clustering algorithm (FCA) and a support vector machine (SVM) was proposed. Using variational mode decomposition (VMD) combined with principal component analysis (PCA), sensitive features of the rotating machinery faults were extracted to form the imbalanced fault sample set. Next, the fast clustering algorithm was adopted to reduce the number of majority-class samples in the imbalanced fault sample set. The balanced fault sample set thus consisted of the clustered data and the minority data from the imbalanced fault sample set. The SVM was then trained with the balanced fault sample set and tested with the imbalanced fault sample set to obtain the fault diagnosis model of the rotating machinery. Finally, a gearbox fault data set and a rolling bearing fault data set were used to test the fault diagnosis model. The experimental results showed that the fault diagnosis model could effectively diagnose rotating machinery faults from imbalanced data.
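
The balancing step can be sketched as follows. The abstract does not specify the fast clustering algorithm, so plain k-means (Lloyd's algorithm) is swapped in here purely for illustration. Replacing the majority class with its cluster centroids, and choosing as many clusters as there are minority samples, are also assumptions. The names `kmeans_centroids` and `balance_by_clustering` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centroids(points: List[Point], k: int, iters: int = 50) -> List[Point]:
    """Plain k-means with deterministic, evenly spaced initial centroids.

    Requires 1 <= k <= len(points).
    """
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster went empty.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def balance_by_clustering(majority: List[Point], minority: List[Point]):
    """Shrink the majority class to one centroid per minority sample (assumed ratio)."""
    k = min(len(minority), len(majority))
    reduced = kmeans_centroids(majority, k)
    samples = reduced + list(minority)
    labels = [0] * len(reduced) + [1] * len(minority)  # 0 = majority, 1 = minority
    return samples, labels
```

The returned `samples` and `labels` would then be passed to an SVM trainer, while evaluation would still use the original imbalanced set, as the abstract describes.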

Customer-Related Social Stressors

Journal of Personnel Psychology ◽

10.1027/1866-5888/a000132 ◽

2015 ◽

Vol 14 (4) ◽

pp. 165-181 ◽

Cited By ~ 14

Author(s):

Sarah Dudenhöffer ◽

Christian Dormann

Keyword(s):

Service Providers ◽

Principal Component ◽

Well Being ◽

Emotional Dissonance ◽

Social Stressors ◽

Data Set ◽

Confirmatory Factor Analyses ◽

Confirmatory Factor ◽

Service Jobs ◽

Single Data

Abstract. The purpose of this study was to replicate the dimensions of the customer-related social stressors (CSS) concept across service jobs, to investigate their consequences for service providers’ well-being, and to examine emotional dissonance as a mediator. Data from 20 studies comprising different service jobs (N = 4,199) were integrated into a single data set and meta-analyzed. Confirmatory factor analyses and explorative principal component analysis confirmed four CSS scales: disproportionate expectations, verbal aggression, ambiguous expectations, and disliked customers. These CSS scales were associated with burnout and job satisfaction. Most of the effects were partially mediated by emotional dissonance. Further analyses revealed that differences among jobs exist with regard to the factor solution. However, associations between CSS and outcomes are mainly invariant across service jobs.

Spectroscopy-Based Mapping with Scanning Microwave Impedance Microscopy

10.31399/asm.cp.istfa2018p0550 ◽

2018 ◽

Author(s):

Peter De Wolf ◽

Zhuangqun Huang ◽

Bede Pittenger

Keyword(s):

Single Point ◽

Electrical Characterization ◽

Principal Component ◽

High Sensitivity ◽

Data Cube ◽

Nanometer Scale ◽

Learning Approaches ◽

Data Set ◽

3D Data ◽

Higher Dimensional

Abstract. Methods are available to measure conductivity, charge, surface potential, carrier density, piezo-electric, and other electrical properties with nanometer-scale resolution. One of these methods, scanning microwave impedance microscopy (sMIM), has gained interest due to its capability to measure the full impedance (capacitive and resistive parts) with high sensitivity and high spatial resolution. This paper introduces a novel data-cube approach that combines sMIM imaging and sMIM point spectroscopy, producing an integrated and complete 3D data set. This approach replaces the subjective practice of guessing locations of interest (for single-point spectroscopy) with a big data approach, resulting in higher-dimensional data that can be sliced along any axis or plane and is conducive to principal component analysis or other machine learning approaches to data reduction. The data-cube approach is also applicable to other AFM-based electrical characterization modes.
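
The idea of slicing a data cube can be shown with a small sketch. The layout `cube[row][col][k]` (two spatial indices plus one index `k` along the spectroscopy sweep) is an assumption, as are the function names; the abstract does not say which quantity is swept or how the data are stored.

```python
from typing import List

Cube = List[List[List[float]]]  # assumed layout: cube[row][col][k]


def image_at(cube: Cube, k: int) -> List[List[float]]:
    """2D image slice: every pixel's value at sweep index k."""
    return [[pixel[k] for pixel in row] for row in cube]


def spectrum_at(cube: Cube, row: int, col: int) -> List[float]:
    """1D spectrum slice: the full sweep recorded at one pixel."""
    return list(cube[row][col])


def unfold(cube: Cube) -> List[List[float]]:
    """Flatten to a (pixels x sweep points) matrix, a common input shape for PCA."""
    return [list(pixel) for row in cube for pixel in row]


# Tiny 2 x 2 pixel cube with 3 sweep points per pixel (made-up numbers).
demo_cube = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]
middle_image = image_at(demo_cube, 1)
one_spectrum = spectrum_at(demo_cube, 1, 0)
```

Because every pixel carries a full spectrum, any location can be inspected after acquisition instead of being chosen beforehand.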

QSAR Study of PARP Inhibitors by GA-MLR, GA-SVM and GA-ANN Approaches

Current Analytical Chemistry ◽

10.2174/1573411016999200518083359 ◽

2020 ◽

Vol 16 (8) ◽

pp. 1088-1105

Author(s):

Nafiseh Vahedi ◽

Majid Mohammadhosseini ◽

Mehdi Nekoei

Keyword(s):

Present Report ◽

Principal Component ◽

Parp Inhibitors ◽

Support Vector ◽

Ann Model ◽

Statistical Parameters ◽

Qsar Study ◽

Data Set ◽

Test Set ◽

Non Linear

Background: The poly(ADP-ribose) polymerases (PARPs) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, several efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), were successfully used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set and selection of the training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors that contribute most significantly to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set, and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that non-linear modeling approaches, including SVM and ANN, provided much greater predictive capability. Conclusion: Among the constructed models, in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q2 LOO and Q2 LGO), and R2 and F-statistic values for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were better with the GA-ANN model.
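
The leave-one-out cross-validation coefficient mentioned above is commonly defined as Q2 = 1 − PRESS / SS_tot, where PRESS sums the squared errors of predictions made for each sample by a model fitted without that sample. The sketch below applies that definition to a single-descriptor least-squares line. This is a deliberate simplification, since the paper's models use several GA-selected descriptors. Taking SS_tot around the mean of all responses is one common convention and is assumed here.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def q2_loo(x: List[float], y: List[float]) -> float:
    """Leave-one-out Q2 = 1 - PRESS / SS_tot for a one-descriptor linear model."""
    n = len(y)
    mean_y = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict the held-out sample.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot
```

A Q2 close to 1 indicates that held-out samples are predicted almost as well as fitted ones; values near or below 0 indicate no predictive ability beyond the mean.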

Imbalanced Data Detection Kernel Method in Closed Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3652 ◽

2013 ◽

Vol 756-759 ◽

pp. 3652-3658

Author(s):

You Li Lu ◽

Jun Luo

Keyword(s):

Kernel Methods ◽

Kernel Method ◽

Imbalanced Data ◽

Data Detection ◽

Data Sets ◽

System Call ◽

Data Set ◽

Imbalanced Data Sets ◽

Lower Complexity ◽

Closed Systems

Building on kernel methods, this paper puts forward two improved algorithms, R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the sample space, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that both algorithms are more effective and that R-SVM has lower complexity.

Evaluation of the Aging Behavior of High Density Polyethylene in Thermal Oxidative Environment by Principal Component Analysis

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.727.447 ◽

2017 ◽

Vol 727 ◽

pp. 447-449 ◽

Cited By ~ 1

Author(s):

Jun Dai ◽

Hua Yan ◽

Jian Jian Yang ◽

Jun Jun Guo

Keyword(s):

Principal Component Analysis ◽

Impact Strength ◽

High Density Polyethylene ◽

Tensile Modulus ◽

Principal Component ◽

Component Analysis ◽

High Density ◽

Aging Behavior ◽

Data Set ◽

The Impact

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.
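
One way to collapse several degradation parameters into a single dimensionless index is to standardize each parameter and project onto the first principal component. The sketch below does exactly that and nothing more. The abstract does not say how Z is built from the components, so using only the first component, and finding it by power iteration on the correlation matrix, are assumptions made for illustration.

```python
import math
from typing import List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column (each column must vary, or the division fails)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def leading_component(z: List[List[float]], iters: int = 500) -> List[float]:
    """First eigenvector of the correlation matrix, found by power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0 / (i + 1) for i in range(p)]  # uneven start vector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Resolve the sign ambiguity: make the largest-magnitude loading positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    return v


def composite_index(rows: List[List[float]]) -> List[float]:
    """Score of each sample (row) on the first principal component."""
    z = standardize(rows)
    v = leading_component(z)
    return [sum(a * b for a, b in zip(row, v)) for row in z]
```

Each row would hold the parameters measured at one exposure time, so the returned scores form a single curve over aging time.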

Research on imbalanced data set preprocessing based on deep learning

2021 Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS) ◽

10.1109/acctcs52002.2021.00023 ◽

2021 ◽

Author(s):

Wang Fangyu ◽

Zhang Jianhui ◽

Bu Youjun ◽

Chen Bo

Keyword(s):

Deep Learning ◽

Imbalanced Data ◽

Data Set

Agreement and reliability of magnetic, angular rate, and gravity (MARG) sensors to assess multiple body segment’s external loads during off-road running

Proceedings of the Institution of Mechanical Engineers Part P Journal of Sports Engineering and Technology ◽

10.1177/1754337121996986 ◽

2021 ◽

pp. 175433712199698

Author(s):

Daniel Rojas-Valverde ◽

José Pino-Ortega ◽

Rafael Timón ◽

Randall Gutiérrez-Vargas ◽

Braulio Sánchez-Ureña ◽

...

Keyword(s):

Vastus Lateralis ◽

Angular Rate ◽

Wearable Sensors ◽

Intraclass Correlation ◽

Principal Component ◽

Test Reliability ◽

Data Set ◽

Body Segments ◽

External Loads ◽

Good Agreement

The extensive use of wearable sensors in sports medicine, exercise medicine, and health has increased the interest in their study. It is therefore necessary to test these technologies’ efficiency, effectiveness, agreement, and reliability in different settings. Consequently, the purpose of this article was to analyze the magnetic, angular rate, and gravity (MARG) sensor’s test-retest agreement and reliability when assessing multiple body segments’ external loads during off-road running. A total of 18 off-road runners (38.78 ± 10.38 years, 73.24 ± 12.6 kg, 172.17 ± 9.48 cm) ran two laps (1st and 2nd Lap) of a 12 km circuit wearing six MARG sensors. The sensors were attached to six different body segments: left (MPLeft) and right (MPRight) malleolus peroneus, left (VLLeft) and right (VLRight) vastus lateralis, lumbar (L1-L3), and thorax (T2-T4) using a special neoprene suit. After a principal component analysis (PCA) was performed, the total data set variance of all body segments was represented by 44.08%–70.64% for the 1st PCA factor considering two variables, Player LoadRT and Impacts, on L1-L3, respectively. These two variables were chosen among three total accelerometry-based external load indicators (ABELIs) to perform the agreement and reliability tests due to their relevance based on PCAs for each body segment. There were no significant differences between laps in the Player LoadRT or Impacts (p > 0.05, trivial). The intraclass correlation and linear correlation showed substantial to almost perfect test consistency over time, assessed via reliability, in both Player LoadRT and Impacts. Bias and t-test assessments showed good agreement between laps. It can be concluded that MARG sensors offer significant test-retest reliability and good agreement when assessing off-road kinematics in the six different body segments.

Effect of Speech-to-Noise Ratio and Luminance on a Range of Current and Potential Pupil Response Measures to Assess Listening Effort

Trends in Hearing ◽

10.1177/23312165211009351 ◽

2021 ◽

Vol 25 ◽

pp. 233121652110093

Author(s):

Patrycja Książek ◽

Adriana A. Zekveld ◽

Dorothea Wendt ◽

Lorenz Fiedler ◽

Thomas Lunner ◽

...

Keyword(s):

Time Course ◽

Principal Component ◽

Growth Curve Analysis ◽

Listening Effort ◽

Data Set ◽

Holistic View ◽

Hearing Research ◽

Speech In Noise ◽

Noise Test ◽

Noise Ratio

In hearing research, pupillometry is an established method of studying listening effort. The focus of this study was to evaluate several pupil measures extracted from the Task-Evoked Pupil Responses (TEPRs) in a speech-in-noise test. A range of analysis approaches was applied to extract these pupil measures, namely (a) pupil peak dilation (PPD); (b) mean pupil dilation (MPD); (c) index of pupillary activity; (d) growth curve analysis (GCA); and (e) principal component analysis (PCA). The effects of signal-to-noise ratio (SNR; Data Set A: –20 dB, –10 dB, +5 dB SNR) and luminance (Data Set B: 0.1 cd/m², 360 cd/m²) on the TEPRs were investigated. Data Sets A and B were recorded during a speech-in-noise test and included TEPRs from 33 and 27 normal-hearing native Dutch speakers, respectively. The main results were as follows: (a) a significant effect of SNR was revealed for all pupil measures extracted in the time domain (PPD, MPD, GCA, PCA); (b) two time series analysis approaches provided modeled temporal profiles of TEPRs (GCA) and time windows spanning subtasks performed in a speech-in-noise test (PCA); and (c) all pupil measures revealed a significant effect of luminance. In conclusion, multiple pupil measures showed similar effects of SNR, suggesting that effort may be reflected in multiple aspects of TEPR. Moreover, a direct analysis of the pupil time course seems to provide a more holistic view of TEPRs, yet further research is needed to understand and interpret its measures. Further research is also required to find pupil measures less sensitive to changes in luminance.
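
The two simplest measures, PPD and MPD, can be sketched as the maximum and the mean of a baseline-corrected pupil trace. The abstract does not give the baseline or analysis windows, so subtractive baseline correction and the parameters `baseline_s` and `window_s` are assumptions, and the function name is hypothetical.

```python
from typing import List, Tuple


def pupil_measures(
    trace: List[float], sample_rate_hz: float, baseline_s: float, window_s: float
) -> Tuple[float, float]:
    """Return (PPD, MPD) for one trial.

    trace: pupil size samples, starting with the pre-stimulus baseline period.
    baseline_s: assumed baseline duration at the start of the trace, in seconds.
    window_s: assumed analysis window right after the baseline, in seconds.
    """
    n_base = int(round(baseline_s * sample_rate_hz))
    n_win = int(round(window_s * sample_rate_hz))
    baseline = sum(trace[:n_base]) / n_base
    # Subtractive baseline correction over the analysis window.
    corrected = [v - baseline for v in trace[n_base:n_base + n_win]]
    ppd = max(corrected)                   # pupil peak dilation
    mpd = sum(corrected) / len(corrected)  # mean pupil dilation
    return ppd, mpd
```

Both measures share the same corrected trace, so they differ only in how they summarize it: PPD is sensitive to a single extreme sample, while MPD averages over the whole window.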
