Feature Selection Method Based on Partial Least Squares and Analysis of Traditional Chinese Medicine Data

2019 ◽  
Vol 2019 ◽  
pp. 1-11 ◽  
Author(s):  
Canyi Huang ◽  
Jianqiang Du ◽  
Bin Nie ◽  
Riyue Yu ◽  
Wangping Xiong ◽  
...  

The partial least squares method has many advantages in multivariable linear regression, but it does not perform feature selection, so it cannot screen for the best feature subset (referred to in this study as the “Gold Standard”) or optimize the model. In contrast, the L1 norm yields a sparse representation of the parameters and thereby performs feature selection. In this study, a feature selection method based on partial least squares is proposed. The new method uses partial least squares to extract the latent variables required for multivariable linear regression and applies an L1 regularization constraint to the sum of the absolute values of the regression coefficients. This is combined with the coordinate descent method, which iterates repeatedly to select a better feature subset. Experiments on traditional Chinese medicine data and University of California, Irvine (UCI) datasets show that the feature selection method based on partial least squares adapts well to both.
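The L1-penalized step mentioned in this abstract can be illustrated with a minimal sketch of cyclic coordinate descent for the lasso. This is not the authors' algorithm: it omits the partial least squares latent-variable extraction entirely, and the function names (`soft_threshold`, `lasso_cd`), the penalty value `lam`, and the toy data are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never be selected
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical toy data: y depends only on the first feature.
X = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [4.0, -0.5]]
y = [2.0, 4.0, 6.0, 8.0]
w = lasso_cd(X, y, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
```

Features whose coefficients are driven exactly to zero drop out of `selected`, which is how an L1 penalty doubles as a feature selector.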

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Canyi Huang ◽  
Keding Li ◽  
Jianqiang Du ◽  
Bin Nie ◽  
Guoliang Xu ◽  
...  

The basic experimental data of traditional Chinese medicine are generally obtained by high-performance liquid chromatography and mass spectrometry. The data are often high-dimensional with few samples and contain many irrelevant and redundant features, which makes in-depth exploration of Chinese medicine material information challenging. A hybrid feature selection method based on an iterative approximate Markov blanket (CI_AMB) is proposed in this paper. First, the method uses the maximum information coefficient to measure the correlation between features and the target variable and filters out irrelevant features according to the evaluation criteria. Then, the iterative approximate Markov blanket strategy analyzes the redundancy between features, eliminates redundant features, and finally selects an effective feature subset. Comparative experiments on basic experimental data of traditional Chinese medicine materials and multiple public UCI datasets show that the new method is better at selecting a small number of highly explanatory features than Lasso, XGBoost, and the classic approximate Markov blanket method.
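The two-stage idea in this abstract (filter irrelevant features, then eliminate redundant ones) can be sketched as follows. Several things here are assumptions: the absolute Pearson correlation is swapped in for the maximum information coefficient, the redundancy test follows the spirit of the classic approximate Markov blanket criterion (a kept feature covers a candidate if their mutual correlation is at least the candidate's relevance), the iterative CI_AMB strategy itself is not reproduced, and all names, the threshold, and the data are hypothetical.

```python
from math import sqrt


def abs_corr(a, b):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return abs(cov) / sqrt(var_a * var_b)


def filter_then_prune(features, target, threshold):
    """features maps a name to a column; returns the names that survive both stages."""
    relevance = {name: abs_corr(col, target) for name, col in features.items()}
    # Stage 1: drop features whose relevance falls below the threshold.
    ranked = sorted(
        (name for name in relevance if relevance[name] >= threshold),
        key=lambda name: -relevance[name],
    )
    # Stage 2: walk from most to least relevant and drop a feature when an
    # already-kept feature correlates with it at least as strongly as the
    # feature correlates with the target.
    kept = []
    for name in ranked:
        redundant = any(
            abs_corr(features[k], features[name]) >= relevance[name] for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Hypothetical toy data: f2 nearly duplicates f1, f3 is unrelated to the target.
target = [1, 2, 3, 4, 5]
features = {
    "f1": [1, 2, 3, 4, 6],
    "f2": [1, 2, 3, 4, 7],
    "f3": [1, -1, 1, -1, 1],
    "f4": [3, 1, 2, 5, 4],
}
kept = filter_then_prune(features, target, threshold=0.3)
```

On this toy data, `f3` is removed by the relevance filter and `f2` by the redundancy check, leaving `f1` and `f4`.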


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Yan Cui ◽  
Shizhong Liao ◽  
Hongwu Wang

Objective. To select significant Haar-like features extracted from tongue images for health identification. Materials and Methods. 1,322 tongue cases were included in this study. Health information and tongue images of each case were collected. Cases were classified into the following groups: group containing 148 cases diagnosed as healthy; group containing 332 cases diagnosed as ill based on health information, even though the tongue image is normal; and group containing 842 cases diagnosed as ill. Haar-like features were extracted from tongue images. Then, we proposed a new boosting method in the ROC space for selecting significant features from the features extracted from these images. Results. A total of 27 features were obtained from groups A, B, and C. Seven features were selected from groups A and B, while 25 features were selected from groups A and C. Conclusions. The selected features in this study were mainly obtained from the root, top, and side areas of the tongue. This is consistent with the tongue partitions employed in traditional Chinese medicine. These results provide scientific evidence for TCM tongue diagnosis in health identification.
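For readers unfamiliar with Haar-like features, the sketch below shows the standard way a two-rectangle feature is computed from an integral image. It is a generic illustration, not the extraction pipeline or the ROC-space boosting method used in the study; the function names and the 4×4 toy image are assumptions.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[r][c] is the sum of img[:r][:c]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half of the window; width is assumed to be even."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Hypothetical 4x4 grayscale patch: bright on the left, dark on the right.
img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
feature = haar_two_rect_horizontal(ii, top=0, left=0, height=4, width=4)
```

A large absolute value indicates a strong left-right intensity contrast inside the window, which is the kind of local pattern a feature selector can then rank.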


2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Wang-ping Xiong ◽  
Tian-ci Li ◽  
Qing-xia Zeng ◽  
Jian-qiang Du ◽  
Bin Nie ◽  
...  

The partial least squares method has many advantages in multivariate linear regression modeling, but its internal cross-validation can sharply reduce the number of principal components, which lowers the accuracy of the regression equation, and the selection of principal components is particularly sensitive for traditional Chinese medicine data. This paper proposes a partial least squares method based on deep belief nets. The method uses a deep learning model to extract higher-level features from the original data and feeds the extracted features into the partial least squares model for multiple linear regression, which avoids the problem of selecting the number of principal components; the model parameters are adjusted repeatedly until a satisfactory accuracy is reached. Experiments on Dachengqitang experimental data and datasets from the UCI Machine Learning Repository show that the partial least squares analysis method based on deep belief nets adapts well to TCM data.


Author(s):  
ShuRui Li ◽  
Jing Jin ◽  
Ian Daly ◽  
Chang Liu ◽  
Andrzej Cichocki

Brain–computer interface (BCI) systems decode electroencephalogram (EEG) signals to establish a channel for direct interaction between the human brain and the external world without the need for muscle or nerve control. The P300 speller, one of the most widely used BCI applications, presents a selection of characters to the user and performs character recognition by identifying P300 event-related potentials from the EEG. Such P300-based BCI systems can reach good levels of accuracy but are difficult to use in day-to-day life because of redundant features and noisy signals, which leaves room for improvement. We propose a novel hybrid feature selection method for the P300-based BCI system to address the problem of feature redundancy, which combines the Menger curvature and linear discriminant analysis. First, the selected strategies are applied separately to a given dataset to estimate the gain for each feature. Then, each generated value set is ranked in descending order and judged against a predefined criterion for suitability in classification models. The intersection of the two approaches is then evaluated to identify an optimal feature subset. The proposed method is evaluated using three public datasets, i.e., BCI Competition III dataset II, BNCI Horizon dataset, and EPFL dataset. Experimental results indicate that, compared with other typical feature selection and classification methods, our proposed method has better or comparable performance. Additionally, our proposed method achieves the best classification accuracy after all epochs on the three datasets. In summary, our proposed method provides a new way to enhance the performance of the P300-based BCI speller.
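The Menger curvature named in this abstract has a simple closed form: for three points it is the reciprocal of the radius of the circle through them, which equals four times the triangle area divided by the product of the three side lengths. The sketch below computes only that quantity for 2D points; how the paper applies it to feature gains is not stated in the abstract, so nothing here should be read as the authors' procedure. The function name is an assumption.

```python
from math import dist


def menger_curvature(p, q, r):
    """Curvature of the circle through three 2D points; 0.0 if they are degenerate."""
    a, b, c = dist(p, q), dist(q, r), dist(r, p)
    if a == 0.0 or b == 0.0 or c == 0.0:
        return 0.0  # two points coincide, so no unique circle exists
    # Twice the triangle area from the 2D cross product of (q - p) and (r - p)
    twice_area = abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))
    # 4 * area / (a * b * c), written with twice_area = 2 * area
    return 2.0 * twice_area / (a * b * c)


# Three points on the unit circle give curvature 1; collinear points give 0.
on_unit_circle = menger_curvature((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0))
collinear = menger_curvature((0.0, 0.0), (1.0, 1.0), (2.0, 2.0))
```

Applied to consecutive points of a sorted score curve, a high value marks a sharp bend, which is one plausible reason such a measure is useful for choosing a cut-off.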


2020 ◽  
Vol 2020 ◽  
pp. 1-14 ◽  
Author(s):  
Yong Liu ◽  
Shenggen Ju ◽  
Junfeng Wang ◽  
Chong Su

Feature selection is designed to select representative feature subsets from the original feature set using different evaluations of feature relevance, with the aim of reducing the dimension of the features while maintaining the predictive accuracy of a classifier. In this study, we propose a feature selection method for text classification based on independent feature space search. Firstly, a relative document-term frequency difference (RDTFD) method is proposed to divide the features in all text documents into two independent feature sets according to the features’ ability to discriminate the positive and negative samples. This has two important functions: one is to increase the correlation between the features and the classes while reducing the correlation among the features, and the other is to reduce the search range of the feature space while maintaining appropriate feature redundancy. Secondly, the feature search strategy is used to search for the optimal feature subset in the independent feature space, which can improve the performance of text classification. Finally, we conduct several experiments on six benchmark corpora; the experimental results show that the RDTFD method based on independent feature space search is more robust than the other feature selection methods.


2020 ◽  
Vol 10 (2) ◽  
pp. 370-379 ◽  
Author(s):  
Jie Cai ◽  
Lingjing Hu ◽  
Zhou Liu ◽  
Ke Zhou ◽  
Huailing Zhang

Background: Mild cognitive impairment (MCI) patients are a high-risk group for Alzheimer's disease (AD). Each year, 10–15% of MCI patients convert to AD (MCI converters, MCI_C), while some MCI patients remain relatively stable and unconverted (MCI stable, MCI_S). MCI patients are considered the most suitable population for early intervention treatment for dementia, and magnetic resonance imaging (MRI) is clinically the most recommended means of imaging examination. Therefore, using MRI image features to reliably predict the conversion from MCI to AD can help physicians carry out an effective treatment plan for patients in advance so as to prevent or slow down the development of dementia. Methods: We proposed an embedded feature selection method based on the least squares loss function and within-class scatter to select the optimal feature subset. The optimal subsets of features were used for binary classification (AD, MCI_C, MCI_S, normal control (NC) in pairs) based on a support vector machine (SVM), and the optimal 3-class features were used for 3-class classification (AD, MCI_C, MCI_S, NC in triples) based on one-versus-one SVMs (OVOSVMs). To ensure the insensitivity of the results to the random train/test division, 10-fold cross-validation was repeated for each classification. Results: Using our method for feature selection, only 7 features were selected from the original 90 features. Using the optimal subset in the SVM, we classified MCI_C from MCI_S with an accuracy, sensitivity, and specificity of 71.17%, 68.33%, and 73.97%, respectively. In comparison, in the 3-class classification (AD vs. MCI_C vs. MCI_S) with OVOSVMs, our method selected 24 features, and the classification accuracy was 81.9%. The feature selection results were verified to be identical to the conclusions of the clinical diagnosis. Our feature selection method achieved the best performance compared with the existing methods using lasso and fused lasso for feature selection. Conclusion: The results of this study demonstrate the potential of the proposed approach for predicting the conversion from MCI to AD by identifying the affected brain regions undergoing this conversion.


Author(s):  
RONG LIU ◽  
ROBERT RALLO ◽  
YORAM COHEN

An unsupervised feature selection method is proposed for analysis of datasets of high dimensionality. The least square error (LSE) of approximating the complete dataset via a reduced feature subset is proposed as the quality measure for feature selection. Guided by the minimization of the LSE, a kernel least squares forward selection algorithm (KLS-FS) is developed that is capable of both linear and non-linear feature selection. An incremental LSE computation is designed to accelerate the selection process and, therefore, enhances the scalability of KLS-FS to high-dimensional datasets. The superiority of the proposed feature selection algorithm, in terms of keeping principal data structures, learning performances in classification and clustering applications, and robustness, is demonstrated using various real-life datasets of different sizes and dimensions.
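The quality measure described in this abstract, the least square error of reconstructing every column of the dataset from a chosen feature subset, can be sketched for the purely linear case with greedy forward selection. This is an illustration under assumptions, not KLS-FS: the kernel extension is omitted, Gram-Schmidt style deflation stands in for the authors' incremental LSE computation, and the function name and toy data are hypothetical.

```python
def forward_select_lse(columns, k, tol=1e-12):
    """Greedily pick up to k columns that best reconstruct all columns linearly.

    columns is a list of feature columns (each a list of n floats).
    Returns (selected indices, remaining sum of squared reconstruction errors).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # resid[j] holds column j minus its projection onto the selected columns.
    resid = [list(col) for col in columns]
    selected = []
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j, rj in enumerate(resid):
            norm_sq = dot(rj, rj)
            if j in selected or norm_sq <= tol:
                continue  # already chosen, or fully explained by the chosen set
            # Total drop in squared error over all columns if j were added
            gain = sum(dot(rk, rj) ** 2 for rk in resid) / norm_sq
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        direction = list(resid[best_j])
        norm_sq = dot(direction, direction)
        for rk in resid:
            coef = dot(rk, direction) / norm_sq
            for i in range(len(rk)):
                rk[i] -= coef * direction[i]  # remove the newly explained part
        selected.append(best_j)
    lse = sum(dot(r, r) for r in resid)
    return selected, lse


# Hypothetical toy data: the third column is the sum of the first two.
cols = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
one_feature = forward_select_lse(cols, k=1)
two_features = forward_select_lse(cols, k=2)
```

With one feature allowed, the sum column is chosen because it explains the most of the other two; adding a second feature drives the reconstruction error to zero.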


2018 ◽  
Vol 8 (11) ◽  
pp. 2143 ◽  
Author(s):  
Xianghong Tang ◽  
Jiachen Wang ◽  
Jianguang Lu ◽  
Guokai Liu ◽  
Jiadui Chen

Effective feature selection can help improve the classification performance in bearing fault diagnosis. This paper proposes a novel feature selection method for bearing fault diagnosis called Feature-to-Feature and Feature-to-Category-Maximum Information Coefficient (FF-FC-MIC), which considers the relevance among features and the relevance between features and fault categories by exploiting the nonlinearity-capturing capability of the maximum information coefficient. In this method, a weak correlation feature subset obtained by a Feature-to-Feature-Maximum Information Coefficient (FF-MIC) matrix and a strong correlation feature subset obtained by a Feature-to-Category-Maximum Information Coefficient (FC-MIC) matrix are merged into a final diagnostic feature set by an intersection operation. To evaluate the proposed FF-FC-MIC method, vibration data collected from two bearing fault experiment platforms (CWRU dataset and CUT-2 dataset) were employed. Experimental results showed that the accuracy of FF-FC-MIC reaches 97.50% and 98.75% on the CWRU dataset at motor speeds of 1750 rpm and 1772 rpm, respectively, and 91.75%, 94.69%, and 99.07% on the CUT-2 dataset at motor speeds of 2000 rpm, 2500 rpm, and 3000 rpm, respectively. A significant improvement of FF-FC-MIC has been confirmed, since the p-values between FF-FC-MIC and the other methods are 1.166 × 10⁻³, 2.509 × 10⁻⁵, and 3.576 × 10⁻², respectively. Compared with the other methods, FF-FC-MIC not only exceeds each of the baseline feature selection methods in diagnosis accuracy but also reduces the number of features.


2020 ◽  
Vol 59 (04/05) ◽  
pp. 151-161
Author(s):  
Yuchen Fei ◽  
Fengyu Zhang ◽  
Chen Zu ◽  
Mei Hong ◽  
Xingchen Peng ◽  
...  

Background: An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), due to limitations such as high variability, low contrast, and discontinuous boundaries in presenting soft tissues, the tumor margin can be extremely difficult to identify in magnetic resonance imaging (MRI), increasing the challenge of the NPC segmentation task. Objectives: The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation that requires minimal human intervention while delineating tumor margins with high accuracy and reproducibility. Methods: In this paper, we propose a novel feature selection algorithm for identifying the margin in NPC images, named modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic property (i.e., feature importance measure). Results: To evaluate the segmentation performance, we verify our method on the T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms the baseline methods and deep learning methods on the task of segmenting NPC images. Conclusion: The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.

