Simultaneous Feature Selection and Classification for Data-Adaptive Kernel-Penalized SVM

Mathematics ◽  
2020 ◽  
Vol 8 (10) ◽  
pp. 1846
Author(s):  
Xin Liu ◽  
Bangxin Zhao ◽  
Wenqing He

Simultaneous feature selection and classification have been explored in the literature to extend support vector machine (SVM) techniques by adding penalty terms to the loss function directly. However, it is the kernel function that controls the performance of the SVM, and an imbalance in the data will deteriorate that performance. In this paper, we examine a new method of simultaneous feature selection and binary classification. Instead of incorporating the standard loss function of the SVM, a penalty is added to the data-adaptive kernel function directly to control the performance of the SVM, by first conformally transforming the kernel functions of the SVM and then re-conducting an SVM classifier based on the sparse features selected. Both convex and non-convex penalties, such as the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD) and the minimax concave penalty (MCP), are explored, and the oracle property of the estimator is established accordingly. An iterative optimization procedure is applied, as no analytic form of the estimated coefficients is available. Numerical comparisons show that the proposed method outperforms the competitors considered when data are imbalanced, and performs similarly to the competitors when data are balanced. The method can be easily applied to medical images from different platforms.
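The three penalties named in this abstract have standard textbook forms, which the following minimal sketch evaluates for a single coefficient. The function names and the default shape parameters (`a=3.7` for SCAD, `gamma=3.0` for MCP) are illustrative assumptions, not values taken from the paper, and the sketch shows only the penalty terms, not the kernel-penalized estimator itself.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta| (convex)."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (non-convex), standard piecewise form; requires a > 2.

    The default a=3.7 is a commonly used value, assumed here for illustration.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Beyond a * lam the penalty is constant, so large coefficients
    # are not shrunk further.
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty (non-convex), standard piecewise form; requires gamma > 1.

    The default gamma=3.0 is an assumption made for illustration.
    """
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)
    # Constant beyond gamma * lam.
    return gamma * lam * lam / 2
```

For small coefficients all three behave like the LASSO penalty; SCAD and MCP flatten out for large coefficients, which is the property usually associated with reduced bias on strong signals.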

2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Rakesh Patra ◽  
Sujan Kumar Saha

The support vector machine (SVM) is one of the popular machine learning techniques used in various text processing tasks, including named entity recognition (NER). The performance of the SVM classifier largely depends on the appropriateness of the kernel function. In the last few years, a number of task-specific kernel functions have been proposed and used in various text processing tasks, for example, the string kernel, graph kernel, and tree kernel. So far, very few efforts have been devoted to the development of NER task-specific kernels. In the literature, we found that the tree kernel has been used in the NER task only for entity boundary detection or reannotation. The conventional tree kernel is unable to execute the complete NER task on its own. In this paper, we propose a kernel function, motivated by the tree kernel, which is able to perform the complete NER task. To examine the effectiveness of the proposed kernel, we applied the kernel function to the openly available JNLPBA 2004 data. Our kernel executes the complete NER task and achieves reasonable accuracy.


Author(s):  
B. Yekkehkhany ◽  
A. Safari ◽  
S. Homayouni ◽  
M. Hasanlou

In this paper, a framework is developed based on Support Vector Machines (SVM) for crop classification using polarimetric features extracted from multi-temporal Synthetic Aperture Radar (SAR) imagery. The multi-temporal integration of data not only improves the overall retrieval accuracy but also provides more reliable estimates than single-date data. Several kernel functions are employed and compared in this study for mapping the input space to a higher-dimensional Hilbert space. These kernel functions include the linear, polynomial and Radial Basis Function (RBF) kernels.

The method is applied to several UAVSAR L-band SAR images acquired over an agricultural area near Winnipeg, Manitoba, Canada. In this research, the temporal alpha features of the H/A/α decomposition method are used in classification. The experimental tests show that an SVM classifier with an RBF kernel for three dates of data increases the Overall Accuracy (OA) by up to 3% in comparison to using a linear kernel function, and by up to 1% in comparison to a 3rd-degree polynomial kernel function.
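The three kernel families compared in this abstract can be written down in a few lines. The sketch below uses their usual definitions; the function names and the parameters `gamma`, `coef0` and `degree` are illustrative assumptions, and the settings actually used in the study are not given in the abstract.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The RBF kernel depends only on the distance between the two feature vectors, so it equals 1 for identical inputs and decays toward 0 as they move apart.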


2020 ◽  
Vol 10 (16) ◽  
pp. 5527 ◽  
Author(s):  
Aref Eskandari ◽  
Jafar Milimonfared ◽  
Mohammadreza Aghaei ◽  
Angèle H.M.E. Reinders

Photovoltaic (PV) monitoring and fault detection are crucial to enhancing the service life and reliability of PV systems. It is difficult to detect and classify faults at the Direct Current (DC) side of PV arrays with common protection devices, especially Line-to-Line (LL) faults, because such faults are not detectable under high impedance fault and low mismatch conditions. If these faults are not diagnosed, they may significantly reduce the output power of PV systems and even cause fire catastrophe. Recently, many efforts have been devoted to detecting and classifying LL faults. However, these methods could not efficiently detect and classify LL faults under high impedance and low mismatch. This paper proposes a novel fault diagnostic scheme consisting of two main stages. First, the key features are extracted by analyzing Current–Voltage (I–V) characteristics under various LL fault events and normal operation. Second, a genetic algorithm (GA) is used for feature selection and for parameter optimization of the kernel functions used in the Support Vector Machine (SVM) classifier, in order to obtain higher performance in diagnosing faults in PV systems. In contrast to previous studies, this method requires only a small dataset for the learning process and has a higher accuracy in detecting and classifying LL fault events under high impedance and low mismatch levels. The simulation results verify the validity and effectiveness of the proposed method in detecting and classifying LL faults in PV arrays even under complex conditions. The proposed method detects and classifies LL faults under any condition with an average accuracy of 96% and 97.5%, respectively.


2020 ◽  
Vol 20 ◽  
Author(s):  
Hongwei Zhang ◽  
Steven Wang ◽  
Tao Huang

Aims: We aim to identify biomarkers for chronic hypersensitivity pneumonitis (CHP) and facilitate precise gene therapy of CHP. Background: Chronic hypersensitivity pneumonitis (CHP) is an interstitial lung disease caused by hypersensitive reactions to inhaled antigens. Clinically, differentiating between CHP and other interstitial lung diseases, especially idiopathic pulmonary fibrosis (IPF), is challenging. Objective: In this study, we analyzed the publicly available gene expression profiles of 82 CHP patients, 103 IPF patients, and 103 control samples to identify CHP biomarkers. Method: The CHP biomarkers were selected with advanced feature selection methods: Monte Carlo Feature Selection (MCFS) and Incremental Feature Selection (IFS). A Support Vector Machine (SVM) classifier was built. Then, we analyzed these CHP biomarkers through functional enrichment analysis and differential co-expression analysis. Result: There were 674 identified CHP biomarkers. The co-expression network of these biomarkers in CHP included more negative regulations, and the network structure of CHP was quite different from the networks of IPF and control. Conclusion: The SVM classifier may serve as an important clinical tool to address the challenging task of differentiating between CHP and IPF. Many of the biomarker genes in the differential co-expression network showed great promise in revealing the underlying mechanisms of CHP.


Author(s):  
B. Venkatesh ◽  
J. Anuradha

In microarray data, high classification accuracy is difficult to achieve because the data are high-dimensional, noisy, and contain irrelevant features, with many gene expression measurements but few samples. To increase the classification accuracy and the processing speed of the model, an optimal subset of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets. Performance metrics such as Accuracy, Recall, Precision, and F1-Score are used for evaluation. The experimental results show that the proposed method outperforms the other feature selection methods.


Mathematics ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 936
Author(s):  
Jianli Shao ◽  
Xin Liu ◽  
Wenqing He

Imbalanced data exist in many classification problems. The classification of imbalanced data poses remarkable challenges in machine learning. The support vector machine (SVM) and its variants are popular among classifiers in machine learning thanks to their flexibility and interpretability. However, the performance of SVMs is impacted when the data are imbalanced, which is a typical data structure in the multi-category classification problem. In this paper, we employ the data-adaptive SVM with scaled kernel functions to classify instances for a multi-class population. We propose a multi-class data-dependent kernel function for the SVM by considering class imbalance and the spatial association among instances so that the classification accuracy is enhanced. Simulation studies demonstrate the superb performance of the proposed method, and a real multi-class prostate cancer image dataset is employed as an illustration. Not only does the proposed method outperform the competitor methods in terms of commonly used accuracy measures such as the F-score and G-means, but it also successfully detects more than 60% of instances from the rare class in the real data, while the competitors detect less than 20% of the rare class instances. The proposed method will benefit other scientific research fields, such as multiple region boundary detection.
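To illustrate why measures such as the F-score and G-mean are preferred over plain accuracy on imbalanced data, here is a minimal sketch for the binary case, treating the rare class as the positive class. The function name `imbalance_metrics` and the returned dictionary layout are assumptions made for illustration; the paper itself works with multi-class data, for which these quantities are typically computed per class.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, F-score and G-mean for a binary problem.

    The F-score is the harmonic mean of precision and recall; the G-mean
    is the geometric mean of recall (sensitivity) and specificity.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + fp + tn + fn

    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "f_score": f_score,
        "g_mean": math.sqrt(recall * specificity),
    }
```

A classifier that always predicts the majority class on a 2-versus-8 split reaches 80% accuracy, yet both its F-score and its G-mean are 0, because it never detects a rare-class instance.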


2018 ◽  
Vol 10 (7) ◽  
pp. 1123 ◽  
Author(s):  
Yuhang Zhang ◽  
Hao Sun ◽  
Jiawei Zuo ◽  
Hongqi Wang ◽  
Guangluan Xu ◽  
...  

Aircraft type recognition plays an important role in remote sensing image interpretation. Traditional methods suffer from poor generalization performance, while deep learning methods require large amounts of data with type labels, which are quite expensive and time-consuming to obtain. To overcome these problems, we propose an aircraft type recognition framework based on conditional generative adversarial networks (GANs). First, we design a new method to precisely detect aircraft keypoints, which are used to generate aircraft masks and locate the positions of the aircraft. Second, a conditional GAN with a region of interest (ROI)-weighted loss function is trained on unlabeled aircraft images and their corresponding masks. Third, an ROI feature extraction method is carefully designed to extract multi-scale features from the GAN in the aircraft regions. After that, a linear support vector machine (SVM) classifier is adopted to classify each sample using its features. Benefiting from the GAN, we can learn features that are strong enough to represent aircraft based on a large unlabeled dataset. Additionally, the ROI-weighted loss function and the ROI feature extraction method make the features more related to the aircraft than to the background, which improves the quality of the features and increases the recognition accuracy significantly. Thorough experiments were conducted on a challenging dataset, and the results demonstrate the effectiveness of the proposed aircraft type recognition framework.


Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and the support vector machine (SVM) is proposed in order to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated using mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated by analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined, and the degree of influence of the input variables on the output variables of the SVM network is calculated based on MIV. The importance weights of the features, described by MIV, are sorted in descending order. Finally, the SVM classifier is used to implement feature selection according to the classification accuracy of feature combinations, taking the MIV order of the features as a reference. Simulation experiments are carried out on three standard UCI data sets, and the results show that this method not only effectively reduces the feature dimension and achieves high classification accuracy, but also ensures good robustness.
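The mutual information that this abstract uses to measure correlation between features can be estimated from co-occurrence counts when the features are discrete. The following is a minimal sketch under that assumption; the function name `mutual_information` is illustrative, and the paper's own local and overall correlation measures and its MIV computation are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from two equal-length discrete sequences.

    Uses the plug-in estimate: sum over observed pairs of
    p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    """
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))

    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi
```

Two identical binary features with balanced values share 1 bit of information, which flags one of them as redundant, while two independent features share 0 bits.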


2020 ◽  
Vol 10 (9) ◽  
pp. 3282
Author(s):  
Angela Shin-Yu Lien ◽  
Yi-Der Jiang ◽  
Jia-Ling Tsai ◽  
Jawl-Shan Hwang ◽  
Wei-Chao Lin

Fatigue and poor sleep quality are the most common clinical complaints of people with diabetes mellitus (DM). These complaints are early signs of DM and are closely related to diabetic control and the presence of complications, which lead to a decline in the quality of life. Therefore, an accurate measurement of the relationship between fatigue, sleep status, and the complication of DM nephropathy could lead to a specific definition of fatigue and an appropriate medical treatment. This study recruited 307 people with Type 2 diabetes from two medical centers in Northern Taiwan through a questionnaire survey and a retrospective investigation of medical records. In an attempt to identify the related factors and accurately predict diabetic nephropathy, we applied hybrid research methods, integrated biostatistics, and feature selection methods in data mining and machine learning to compare and verify the results. The results demonstrated that patients with diabetic nephropathy have a higher fatigue level and Charlson comorbidity index (CCI) score than those without neuropathy, and that the presence of neuropathy leads to poor sleep quality, lower quality of life, and poor metabolism. Furthermore, by using feature selection to choose representative features or variables, we achieved consistent results with a support vector machine (SVM) classifier using merely ten representative factors, with a prediction accuracy as high as 74% in predicting the presence of diabetic nephropathy.


2020 ◽  
pp. 3397-3407
Author(s):  
Nur Syafiqah Mohd Nafis ◽  
Suryanti Awang

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique is able to measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence obtain promising text classification accuracy. TF-IDF acts as a filter approach that measures the importance of features in the text documents at the first stage. SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research executes sets of experiments using a text document retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique is able to achieve 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
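The TF-IDF scoring used as the first-stage filter in this abstract can be sketched as follows. This is one common unsmoothed variant (term frequency normalized by document length, inverse document frequency as `log(N / df)`); the function name `tf_idf` and the choice of variant are assumptions, since the abstract does not specify the exact weighting, and library implementations often apply smoothing. The SVM-RFE stage is not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term of each tokenized document with TF-IDF.

    docs is a list of token lists. Returns one {term: score} dict per
    document, where score = (count / doc_length) * log(N / df).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

A term that occurs in every document receives a score of 0 under this variant, so it would be filtered out before any ranking step, while terms concentrated in few documents score higher.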

