Learning Target Class Feature Subspace (LTC-FS) Using Eigenspace Analysis and N-ary Search-Based Autonomous Hyperparameter Tuning for OCSVM

Author(s):  
Sanjay Kumar Sonbhadra ◽  
Sonali Agarwal ◽  
P. Nagabhushan

Existing dimensionality reduction (DR) techniques such as principal component analysis (PCA) and its variants are not suitable for target class mining because they neglect the unique statistical properties of class-of-interest (CoI) samples. Conventionally, these approaches use either the higher or the lower eigenvalued principal components (PCs) for data transformation. However, the higher eigenvalued PCs may split the target class, the lower eigenvalued PCs do not contribute significant information, and a wrong selection of PCs degrades performance. Considering these facts, the present research offers a novel target class-guided feature extraction method. In this approach, eigendecomposition is first performed on the variance–covariance matrix of the target class samples only. The eigenvectors with the highest and lowest eigenvalues are rejected via statistical analysis, and the remaining eigenvectors are used to extract the most promising feature subspace. The extracted feature subset gives a tighter description of the CoI with enhanced associativity among target class samples and ensures strong separation from nontarget class samples. A one-class support vector machine (OCSVM) is used to validate the performance of the learned features. To obtain optimized values of the OCSVM hyperparameters, a novel N-ary search-based autonomous method is also proposed. Exhaustive experiments on a wide variety of datasets are performed in feature space (original and reduced) and eigenspace (obtained from original and reduced features) to validate the performance of the proposed approach in terms of accuracy, precision, specificity and sensitivity.
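The abstract does not spell out the N-ary search, so the following is only a minimal sketch of the general idea. It assumes the validation score is unimodal in a single hyperparameter over a bounded interval. For an OCSVM this might be log10 of the RBF parameter γ, or ν. Each round evaluates the N - 1 interior points that split the interval into N equal parts and keeps the two parts around the best point. The function `nary_search` and its parameters are hypothetical and are not the authors' procedure.

```python
from typing import Callable


def nary_search(score: Callable[[float], float], lo: float, hi: float,
                n: int = 4, tol: float = 1e-6, max_rounds: int = 200) -> float:
    """Return an approximate maximizer of `score` on [lo, hi].

    Assumes `score` is unimodal on the interval (hypothetical setting, e.g.
    validation accuracy versus log10(gamma) of an OCSVM).
    """
    if n < 3:
        raise ValueError("n must be at least 3 for the interval to shrink")
    for _ in range(max_rounds):
        if hi - lo <= tol:
            break
        step = (hi - lo) / n
        # The n - 1 interior points that split [lo, hi] into n equal parts.
        points = [lo + k * step for k in range(1, n)]
        values = [score(p) for p in points]
        best = max(range(len(points)), key=lambda k: values[k])
        # For a unimodal score the maximum lies between the best point's neighbours.
        lo, hi = points[best] - step, points[best] + step
    return (lo + hi) / 2
```

For example, `nary_search(lambda g: -(g - 0.3) ** 2, 0.0, 1.0)` returns a value very close to 0.3. With `n = 4`, each round halves the interval at the cost of three evaluations. A larger `n` shrinks the interval faster per round but costs more evaluations. Real validation curves are rarely perfectly unimodal, so the result is best read as a local optimum.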

2011 ◽  
Vol 181-182 ◽  
pp. 830-835
Author(s):  
Min Song Li

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate way to select a feature subspace for text categorization, since it orders the extracted features according to their variance, not their classification power. We propose a method based on the support vector machine (SVM) to extract features and select an LSI feature subspace suited to classification. Experimental results indicate that the method improves classification performance with a more compact representation.
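A small sketch can illustrate the contrast drawn here between ordering latent features by variance and ordering them by how well they separate classes. It assumes the documents have already been projected onto LSI dimensions, for example via a truncated SVD of the term-document matrix. The abstract does not detail its SVM-based criterion, so this sketch swaps in a simple stand-in: each dimension is scored by the training accuracy of a single-threshold classifier (decision stump). All function names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def stump_accuracy(values: Sequence[float], labels: Sequence[int]) -> float:
    """Best training accuracy of a one-threshold rule on a single feature.

    Labels are 0/1. A simple stand-in for "classification power".
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    pos_total = sum(1 for _, y in pairs if y == 1)
    # A threshold outside the data range predicts one class for every sample.
    best = max(pos_total, n - pos_total)
    pos_left = 0
    for i in range(1, n):
        pos_left += pairs[i - 1][1] == 1
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        neg_left = i - pos_left
        pos_right = pos_total - pos_left
        neg_right = (n - i) - pos_right
        # Try both orientations: left -> class 0 / right -> class 1, and the reverse.
        best = max(best, neg_left + pos_right, pos_left + neg_right)
    return best / n


def rank_by_variance(columns: Sequence[Sequence[float]]) -> List[int]:
    """Order latent dimensions by variance, as plain LSI does."""
    return sorted(range(len(columns)), key=lambda j: pvariance(columns[j]), reverse=True)


def rank_by_stump(columns: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Order latent dimensions by the stump's training accuracy."""
    return sorted(range(len(columns)),
                  key=lambda j: stump_accuracy(columns[j], labels), reverse=True)
```

Consider toy data in which dimension 0 has large variance but carries no class information while dimension 1 separates the classes. On such data `rank_by_variance` puts dimension 0 first and `rank_by_stump` puts dimension 1 first, which is the mismatch the abstract points to.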


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ersen Yılmaz

A two-stage expert system is proposed for cardiac arrhythmia diagnosis. In the first stage, the Fisher score is used for feature selection to reduce the dimension of the data set's feature space. The second stage is the classification stage, in which a least squares support vector machine (LS-SVM) classifier uses the feature subset selected in the first stage to diagnose cardiac arrhythmia. The performance of the proposed expert system is evaluated on an arrhythmia data set taken from the UCI Machine Learning Repository.
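The first stage can be sketched with the usual Fisher score. This is the ratio of between-class to within-class scatter, computed per feature: F_j = Σ_k n_k (μ_kj - μ_j)² / Σ_k n_k σ²_kj. Here n_k, μ_kj and σ²_kj are the size, mean and variance of class k on feature j, and μ_j is the overall mean of feature j. The helper names and the top-k cut-off are assumptions, and the LS-SVM stage is not shown.

```python
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Hashable, List, Sequence, Tuple


def fisher_score(column: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Between-class scatter divided by within-class scatter for one feature."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[label].append(value)
    overall = fmean(column)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def select_top_k(rows: Sequence[Sequence[float]], labels: Sequence[Hashable],
                 k: int) -> Tuple[List[int], List[float]]:
    """Indices of the k highest-scoring features, plus all scores."""
    n_features = len(rows[0])
    scores = [fisher_score([row[j] for row in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k], scores
```

Because the score is computed independently per feature, it is cheap but blind to redundancy between features. Two highly correlated features can both rank near the top.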


Author(s):  
F. Samadzadegan ◽
H. Hasani

Hyperspectral imagery is a rich source of spectral information and plays a very important role in discriminating similar land-cover classes. In the past, many efforts have been made to improve hyperspectral imagery classification. Recently, interest in the joint use of LiDAR data and hyperspectral imagery has increased remarkably, because LiDAR can provide structural information about the scene while hyperspectral imagery provides spectral and spatial information. The complementary information of LiDAR and hyperspectral data may greatly improve classification performance, especially in complex urban areas. In this paper, feature-level fusion of hyperspectral and LiDAR data is proposed. Spectral and structural features are extracted from both datasets, and a hybrid feature space is then generated by feature stacking. A Support Vector Machine (SVM) classifier is applied to the hybrid feature space to classify the urban area. To optimize classification performance, two issues should be considered: determining the SVM parameter values and selecting the feature subset. The Bees Algorithm (BA), a powerful meta-heuristic optimization algorithm, is applied to determine the optimum SVM parameters and select the optimum feature subset simultaneously. The results show that the proposed method can improve classification accuracy while significantly reducing the dimension of the feature space.
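One common way for a single metaheuristic search to tune the SVM and pick features at the same time is to encode both in one solution vector, which the search then perturbs. The sketch below shows feature stacking and one possible decoding of such a vector. Several details are assumptions: the gene layout, the 0.5 mask threshold, and the log2 search ranges, which are borrowed from the grid commonly suggested in the LIBSVM practical guide. The bees' site selection and neighbourhood search are not shown; that search would score each decoded candidate by cross-validated SVM accuracy. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def stack_features(spectral: Sequence[float], structural: Sequence[float]) -> List[float]:
    """Feature-level fusion by stacking: concatenate one pixel's two feature vectors."""
    return list(spectral) + list(structural)


def _scale(u: float, lo: float, hi: float) -> float:
    """Map u (clipped to [0, 1]) linearly onto [lo, hi]."""
    u = min(max(u, 0.0), 1.0)
    return lo + u * (hi - lo)


def decode_candidate(candidate: Sequence[float], n_features: int,
                     log2_c: Tuple[float, float] = (-5.0, 15.0),
                     log2_gamma: Tuple[float, float] = (-15.0, 3.0)
                     ) -> Tuple[float, float, List[int]]:
    """Split one solution into (C, gamma, selected feature indices).

    Layout (an assumption): [c_gene, gamma_gene, mask_gene_0, ..., mask_gene_{n-1}],
    every gene in [0, 1]; a mask gene >= 0.5 keeps the corresponding feature.
    """
    if len(candidate) != 2 + n_features:
        raise ValueError("candidate must hold 2 parameter genes plus one gene per feature")
    c = 2.0 ** _scale(candidate[0], *log2_c)
    gamma = 2.0 ** _scale(candidate[1], *log2_gamma)
    selected = [j for j, gene in enumerate(candidate[2:]) if gene >= 0.5]
    return c, gamma, selected


def apply_mask(features: Sequence[float], selected: Sequence[int]) -> List[float]:
    """Keep only the selected entries of a stacked feature vector."""
    return [features[j] for j in selected]
```

Searching C and γ on a log2 scale lets the bees explore several orders of magnitude with a bounded, uniform neighbourhood size.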


2017 ◽  
Vol 37 (1) ◽  
pp. 68 ◽  
Author(s):  
Camilo Pulido Rojas ◽  
Leonardo Solaque Guzmán ◽  
Nelson Velasco Toledo

This paper presents a classification system for weeds and vegetables from outdoor crop images. The classifier is a support vector machine (SVM) extended to the nonlinear case with a radial basis function (RBF) kernel, whose scale parameter σ is optimized to smooth the decision boundary. The feature space is obtained by applying principal component analysis (PCA) to 10 texture measurements calculated from gray level co-occurrence matrices (GLCM). The results, validated with specificity, sensitivity and precision calculations, indicate that classifier performance is above 90%.
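The abstract does not list which 10 texture measurements were used. This sketch therefore only shows how a GLCM is built for one pixel offset and how three common Haralick-style measures (contrast, angular second moment and homogeneity) are read from it. In the described pipeline, such measures computed over several offsets would be projected with PCA before training the SVM. The scale parameter σ enters through the RBF kernel k(x, x') = exp(-||x - x'||² / (2σ²)). The function names and the assumption that gray levels are already quantized are illustrative.

```python
from typing import Dict, List, Sequence


def glcm(image: Sequence[Sequence[int]], levels: int, dx: int = 1, dy: int = 0,
         symmetric: bool = True) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy).

    `image` holds integer gray levels already quantized to range(levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def texture_measures(p: Sequence[Sequence[float]]) -> Dict[str, float]:
    """A few Haralick-style measures of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }
```

A perfectly uniform patch yields zero contrast and an angular second moment and homogeneity of 1. Textured regions such as weed leaves on soil push contrast up and the other two measures down.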


2018 ◽  
Author(s):  
Deva Surya Vivek Madala ◽  
Ayushree Gangal ◽  
Shreyash Krishna ◽  
Anjali Goyal ◽  
Ashish Sureka

Background. Automated Essay Scoring (AES) is an area at the intersection of computing and linguistics. AES systems conduct a linguistic analysis of a given essay or prose and then estimate the writing skill or the essay quality in the form of a numeric score or a letter grade. AES systems help schools, universities and testing companies efficiently and effectively scale the task of grading a large number of essays. Methods. We propose an approach for automatically grading a given essay based on 9 surface-level and deep linguistic features, 2 feature selection and ranking techniques and 4 text classification algorithms. We conduct a series of experiments on publicly available, manually graded and annotated essay data and demonstrate the effectiveness of our approach. We investigate the performance of two different feature selection techniques, (1) RELIEF and (2) Correlation-based Feature Subset Selection (CFS), with three different machine learning classifiers (kNN, SVM and Linear Regression). We also apply feature normalization and scaling. Results. Our results indicate that features such as word count with respect to the word limit, appropriate use of vocabulary, relevance of the terms in the essay to the given topic and coherency between sentences and paragraphs are good predictors of essay score. Our analysis reveals that not all features are equally important: a few features are more relevant and better correlated with the target class. We conduct experiments with k-nearest neighbour, logistic regression and support vector machine based classifiers. Our results on 4075 essays across multiple topics and grade score ranges are encouraging, with an accuracy of 73% to 93%. Discussion. Our experiments and approach are based on Grade 7 to Grade 10 essays and can be generalized to essays from other grades and levels after context-specific customization.
A few features are more relevant and important than the others, and it is the interplay or combination of multiple feature values that determines the final score. We observe that different classifiers result in different accuracies.
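The CFS technique mentioned above scores a feature subset with Hall's merit heuristic. The heuristic rewards features that correlate with the target and penalizes features that correlate with each other: Merit = k·r̄_cf / sqrt(k + k(k - 1)·r̄_ff). Here k is the number of features, r̄_cf is the mean feature-target correlation and r̄_ff is the mean feature-feature correlation. The sketch below makes two simplifying assumptions. It uses absolute Pearson correlation and a greedy forward search, whereas Hall's original formulation measures correlation with symmetric uncertainty on discretized attributes and uses a best-first search. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cfs_merit(features: Sequence[Sequence[float]], target: Sequence[float]) -> float:
    """Hall's CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns: Sequence[Sequence[float]],
               target: Sequence[float]) -> Tuple[List[int], float]:
    """Forward selection: add the feature that most raises the merit, stop when none does."""
    selected: List[int] = []
    best = 0.0
    remaining = set(range(len(columns)))
    while remaining:
        merit, j = max((cfs_merit([columns[i] for i in selected + [j]], target), j)
                       for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best
```

Suppose a word-count feature and an exact rescaled copy of it are both available. Once one of them has been selected, the copy is not added, because it raises the average inter-feature correlation as much as it raises the feature-target correlation, so the merit does not improve.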


Author(s):  
Soumia Kerrache ◽  
Beladgham Mohammed ◽  
Hamza Aymen ◽  
Kadri Ibrahim

Feature extraction is an essential process in biometric person identification because the effectiveness of the system depends on it. Multiresolution analysis can be used successfully in person identification and pattern recognition systems. In this paper, we present a feature extraction method for two-dimensional face and iris authentication. Our approach combines principal component analysis (PCA) and the curvelet transform as an improved fusion approach for feature extraction. The proposed fusion approach involves image denoising using the 2D curvelet transform to achieve compact representations of curve singularities. This is followed by the application of PCA as a fusion rule to improve the spatial resolution. The limitations of PCA alone are poor recognition speed and a heavy computational load; to reduce these limitations, we apply the curvelet transform. To assess the performance of the presented method, we employed three classification techniques: Neural Networks (NN), K-Nearest Neighbor (KNN) and Support Vector Machines (SVM). The results reveal that image feature extraction is more efficient using Curvelet/PCA.
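A widely used form of PCA as a fusion rule treats two co-registered inputs as two variables, for instance two sets of curvelet coefficients. It takes the eigenvector of their 2 × 2 covariance matrix that has the largest eigenvalue and uses its normalized components as fusion weights. This may or may not match the authors' exact scheme. The sketch works on flattened lists. The function names are hypothetical, and taking absolute values of the eigenvector components is a simplification for negatively correlated inputs.

```python
from math import sqrt
from statistics import fmean
from typing import List, Sequence, Tuple


def pca_fusion_weights(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Weights from the dominant eigenvector of the 2x2 covariance of a and b."""
    n = len(a)
    ma, mb = fmean(a), fmean(b)
    caa = sum((x - ma) ** 2 for x in a) / n
    cbb = sum((y - mb) ** 2 for y in b) / n
    cab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    # Largest eigenvalue of [[caa, cab], [cab, cbb]] in closed form.
    lam = (caa + cbb) / 2 + sqrt(((caa - cbb) / 2) ** 2 + cab ** 2)
    if cab != 0:
        v0, v1 = abs(cab), abs(lam - caa)  # eigenvector (cab, lam - caa)
    else:
        v0, v1 = (1.0, 0.0) if caa >= cbb else (0.0, 1.0)
    total = v0 + v1
    return v0 / total, v1 / total


def pca_fuse(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Weighted sum of the two inputs using the PCA-derived weights."""
    wa, wb = pca_fusion_weights(a, b)
    return [wa * x + wb * y for x, y in zip(a, b)]
```

The input with more variance, and therefore more detail, receives the larger weight. Two identical inputs receive equal weights of 0.5.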


2021 ◽  
pp. 5-20
Author(s):  
Ivan Murenin ◽  
Natalia Ampilova

The computational analysis of wheat images to identify wheat varieties and quality has wide applications in agriculture and production. This paper presents an approach to the analysis and classification of images of wheat samples obtained by the method of crystallization with additives. In the tests, 3 concentrations and 4 times for each concentration were used, so each type of wheat was characterized by 12 images. We used the images obtained for 5 classes. All the images have similar visual characteristics, which makes it difficult to use statistical methods of analysis. The multifractal spectrum obtained by calculating the local density function was used as the classifying feature. Classification was performed on a set of 60 wheat images corresponding to 5 different samples (classes) using various machine learning methods: linear regression, a naive Bayes classifier, a support vector machine and random forest. In some cases, principal component analysis was applied to reduce the dimension of the feature space. To identify the relationships between wheat samples obtained at different concentrations, 3 different clustering methods were used. The classification results showed that wheat samples obtained by crystallization with additives can be identified by using the multifractal spectrum as the classifying feature together with the random forest method combined with principal component analysis. The highest average classification accuracy was 74%.


2019 ◽  
Vol 892 ◽  
pp. 200-209
Author(s):  
Rayner Pailus ◽  
Rayner Alfred

The Viola-Jones AdaBoost method is a profound discovery in face detection, mainly because it is fast, light and one of the easiest face detection techniques. Viola-Jones uses Haar wavelet filters to detect face images and achieves almost 80% face detection accuracy. This paper discusses a proposed methodology and algorithms that use a larger library of filters to create more discriminative features among images. These are the proposed 15 Haar rectangular features, an extension of the 4 Haar wavelet filters of Viola-Jones, which are used in a multiple adaptive ensemble process for detecting face images. After face detection, the process continues with normalization, applying feature extraction such as PCA combined with LDA or LPP to extract the weak learners' wavelet features as additional classification features. After feature extraction, the proposed feature selection indexes the extracted data. These extracted vectors are used to train MADBoost (Multiple Adaptive Diversified Boost), an improvement of AdaBoost that uses multiple feature extraction methods combined with multiple classifiers and is able to capture, recognize and distinguish face images faster. MADBoost applies the ensemble approach with better weights for classification to produce better face recognition results. Three experiments were conducted to compare the performance of the proposed MADBoost with three other classifiers, Neural Network (NN), Support Vector Machines (SVM) and AdaBoost, using Principal Component Analysis (PCA) as the feature extraction method. These experiments were tested against the POIES obstacles (Pose, Obstruction, Illumination, Expression, Sizes).
Based on the results obtained, MADBoost is found to improve recognition performance in terms of matching failures, incorrect matching, matching success percentages and acceptable time taken to perform the classification task.
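Haar rectangular features like those discussed above are usually evaluated with an integral image (summed-area table). This makes any rectangle sum cost four lookups, regardless of the rectangle's size. The sketch shows only the standard Viola-Jones building blocks: the table and a two-rectangle edge feature. The abstract does not specify the paper's 15 extended feature shapes, and the function names are hypothetical.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Summed-area table with a zero row and column prepended."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            # Sum over rows 0..r and columns 0..c of the original image.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_horizontal(ii: List[List[int]], top: int, left: int,
                        height: int, width: int) -> int:
    """Haar-like edge feature: left half minus right half (width must be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

Any additional rectangle layout can be written the same way, as signed combinations of `rect_sum` calls over one shared table.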


2019 ◽  
Vol 9 (8) ◽  
pp. 1645-1654
Author(s):  
Zhizhong Wang ◽  
Hongyi Li ◽  
Chuang Han ◽  
Songwei Wang ◽  
Li Shi

Cardiovascular diseases have become increasingly prominent in recent years and have proven to be a major threat to people's health. Accurate detection of arrhythmia in patients has important implications for clinical treatment. The aim of this study was to propose a novel automatic classification method for arrhythmia in order to improve classification accuracy. The electrocardiogram (ECG) signal was first denoised using a wavelet transform. Then, local and global characteristics of each beat were extracted and fused: RR interval features consistent with clinical diagnostic criteria, morphology features based on wavelet packet decomposition, and statistical features including the kurtosis coefficient, skewness coefficient and variance. Meanwhile, the dimensionality of the wavelet packet coefficients was reduced via principal component analysis (PCA). Finally, these features were used as the input of a random forest classifier to train the model, which was then compared with support vector machine (SVM) and back propagation (BP) neural network classifiers. Based on 100,647 beats from the MIT-BIH database, the proposed method achieved an average accuracy, specificity and sensitivity of 99.08%, 99.00% and 89.31%, respectively, using intra-patient beats, and 92.31%, 89.98% and 37.47%, respectively, using inter-patient beats. Moreover, two classification schemes, namely the inter-patient and intra-patient schemes, were validated. Compared with the other methods referred to in this paper, the proposed method yielded better results.
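The RR-interval and statistical features described above can be sketched as follows. The abstract does not give the authors' exact definitions, so these are common choices and should be read as assumptions. They are the pre-RR and post-RR intervals, a local average of up to ten preceding RR intervals, the pre/post ratio, and the variance, skewness and non-excess kurtosis of a beat segment. R-peak positions are given in samples; the MIT-BIH records are sampled at 360 Hz. Function names are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def rr_features(r_peaks: Sequence[int], i: int, window: int = 10) -> Dict[str, float]:
    """RR-interval features for beat i, given R-peak positions in samples."""
    if not 1 <= i < len(r_peaks) - 1:
        raise ValueError("beat i needs a preceding and a following R peak")
    rr = [b - a for a, b in zip(r_peaks, r_peaks[1:])]  # rr[k]: peak k -> peak k+1
    pre, post = rr[i - 1], rr[i]
    local = fmean(rr[max(0, i - window):i])  # up to `window` intervals ending at beat i
    return {"pre_rr": pre, "post_rr": post, "local_rr": local, "pre_post_ratio": pre / post}


def moment_features(x: Sequence[float]) -> Dict[str, float]:
    """Variance, skewness and (non-excess) kurtosis of one beat segment."""
    m = fmean(x)
    var = pvariance(x, mu=m)
    sd = var ** 0.5
    skew = fmean([(v - m) ** 3 for v in x]) / sd ** 3 if sd > 0 else 0.0
    kurt = fmean([(v - m) ** 4 for v in x]) / var ** 2 if var > 0 else 0.0
    return {"variance": var, "skewness": skew, "kurtosis": kurt}
```

A premature beat shows up as a short pre-RR interval relative to the local average, with a pre/post ratio well below 1. This is why RR features are informative for classes such as premature ventricular contractions.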


Author(s):  
Xiaojing Gao ◽  
Heru Xue ◽  
Xin Pan ◽  
Xinhua Jiang ◽  
Yanqing Zhou ◽  
...  

In this paper, we propose a novel approach that combines Gabor features with bi-directional two-dimensional principal component analysis ((2D)²PCA) for somatic cell recognition. First, Gabor features of different orientations and scales are extracted by convolution with a Gabor filter bank. Second, (2D)²PCA is applied in both the row and column directions to reduce the dimensionality of the feature space. Finally, a Support Vector Machine (SVM) classifier performs the recognition. The experimental results are obtained using a large set of images from different sources. The proposed method is not only accurate and fast but also robust to illumination when recognizing somatic cells in bovine mastitis samples imaged by optical microscopy.
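The first step, a Gabor filter bank over several orientations and scales, can be sketched as below. The kernel is the real part of the standard Gabor function: a Gaussian envelope multiplied by a cosine carrier along the rotated axis. The wavelengths, the envelope aspect ratio `gamma` (unrelated to an SVM parameter), and the choice sigma = 0.56 × wavelength are assumptions, not values from the paper. That sigma roughly corresponds to a one-octave bandwidth. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def gabor_kernel(size: int, sigma: float, theta: float, wavelength: float,
                 gamma: float = 0.5, psi: float = 0.0) -> List[List[float]]:
    """Real part of a 2D Gabor filter sampled on a size x size grid (size odd)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size: int, wavelengths: Sequence[float],
               n_orientations: int) -> List[List[List[float]]]:
    """One kernel per (scale, orientation); sigma = 0.56 * wavelength (about one octave)."""
    return [
        gabor_kernel(size, sigma=0.56 * lam, theta=k * math.pi / n_orientations,
                     wavelength=lam)
        for lam in wavelengths
        for k in range(n_orientations)
    ]
```

In the described pipeline, each kernel would be convolved with the cell image. The resulting response maps would then be reduced with (2D)²PCA before the SVM.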

