Feature Selection
Recently Published Documents





2022 ◽  
Vol 192 ◽  
pp. 106578
David Camilo Corrales ◽  
Céline Schoving ◽  
Hélène Raynal ◽  
Philippe Debaeke ◽  
Etienne-Pascal Journet ◽  

Maria Mohammad Yousef ◽  

Medical dataset classification has become one of the major problems in data mining research. Every database has a given number of features, but some of these features can be redundant or even harmful and can disrupt the classification process; this is known as the high dimensionality problem. Dimensionality reduction during data preprocessing is critical for improving the performance of machine learning algorithms. In addition, feature subset selection contributes to dimensionality reduction and gives a significant improvement in classification accuracy. In this paper, we propose a new hybrid feature selection approach based on a genetic algorithm (GA) assisted by k-Nearest Neighbor (kNN) to deal with high dimensionality in biomedical data classification. The proposed method first combines GA and kNN for feature selection to find the optimal subset of features, with the classification accuracy of kNN used as the fitness function for the GA. After the best subset of features has been selected, a Support Vector Machine (SVM) is used as the classifier. The proposed method is evaluated on five medical datasets from the UCI Machine Learning Repository. The suggested technique performs well on these datasets, achieving higher classification accuracy while using fewer features.
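The wrapper idea in this abstract, a GA searching over binary feature masks with kNN accuracy as the fitness function, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy dataset, the leave-one-out evaluation, the population size, the mutation rate, and all function names are assumptions made for the example.

```python
import random

def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of kNN using only the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # squared Euclidean distance from sample i to every other sample
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
            for m in range(len(X)) if m != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)

def ga_select(X, y, pop_size=12, generations=15, mutation_rate=0.1, seed=0):
    """Evolve binary feature masks; fitness is the kNN accuracy of the masked data."""
    rng = random.Random(seed)
    n_features = len(X[0])
    fitness = lambda mask: knn_loo_accuracy(X, y, mask)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0][:]]                # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)

def make_toy_data(n=40, n_noise=6, seed=1):
    """Hypothetical data: features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [3.0 * label + rng.gauss(0, 0.5),
                       -3.0 * label + rng.gauss(0, 0.5)]
        X.append(informative + [rng.gauss(0, 1) for _ in range(n_noise)])
        y.append(label)
    return X, y

X, y = make_toy_data()
best_mask, best_acc = ga_select(X, y)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
print("leave-one-out kNN accuracy:", round(best_acc, 3))
```

In the abstract, the selected subset is then passed to an SVM classifier; that final step is omitted here to keep the sketch dependency-free.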

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Jie Shan ◽  
Muhammad Talha

This article uses a multimodal smart music online teaching method combined with artificial intelligence to address the problem of smart music online teaching and to compensate for the shortcomings of the single-modal classification method, which uses only audio features. The subjects of this article are the selection of music intelligence models and classification models and the analysis and processing of music characteristics. It mainly studies how to use lyrics, and how to combine audio and lyrics, to intelligently classify music and to teach multimodal and monomodal smart music online. For online teaching of smart music based on lyrics, three parameters (frequency, concentration, and dispersion) are introduced on top of the traditional wireless network node feature selection method to adjust the statistical value of wireless network nodes, and an improved wireless network node feature selection method is proposed. After feature selection, the TF-IDF method is used to calculate the weights, and artificial intelligence is then used to perform a secondary dimensionality reduction on the lyrics. Experimental data show that, when intelligently classifying lyrics, the accuracy of the traditional wireless network node feature selection method is 58.20%, the accuracy of the improved method is 67.21%, and the accuracy of the improved method combined with artificial intelligence is 69.68%. The third method thus has higher accuracy and lower dimensionality. For online teaching of multimodal smart music based on audio and lyrics, this article improves the traditional fusion method to address the problem of multimodal fusion and compares various fusion methods experimentally. The experimental results show that the improved fusion method achieves the best classification performance, reaching 84.43%, which verifies the feasibility and effectiveness of the method.
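The TF-IDF weighting step mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which TF-IDF variant was used, so the formula below (term frequency normalized by document length, idf = log(N / df)) is an assumption, and the three "lyrics" are made-up toy documents.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / df); other
    TF-IDF variants (smoothing, sublinear tf) are equally common.
    """
    n_docs = len(docs)
    # document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy lyrics, tokenized by whitespace
docs = [
    "love love me do".split(),
    "rain on me".split(),
    "love in the rain".split(),
]
for doc_weights in tfidf(docs):
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

Terms that occur in every document receive a weight of zero under this variant, which is why TF-IDF is often paired with a feature selection step that discards low-weight terms.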

Zhenzhen Li ◽  
Jian Guo ◽  
Xiaolin Xu ◽  
Wenbin Wei ◽  
Junfang Xian

Objectives: To develop an MRI-based radiomics model to predict postlaminar optic nerve invasion (PLONI) in retinoblastoma (RB) and compare its predictive performance with radiologists’ subjective assessment. Methods: We retrospectively enrolled 124 patients with pathologically proven RB (90 in the training set and 34 in the validation set) who had MRI scans before surgery. A radiomics model for predicting PLONI was developed by extracting quantitative imaging features from axial T2-weighted images and contrast-enhanced T1-weighted images in the training set. The Kruskal-Wallis test, least absolute shrinkage and selection operator regression, and recursive feature elimination were used for feature selection, whereupon a radiomics model was built with a logistic regression (LR) classifier. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve and the accuracy were assessed to evaluate the predictive performance in the training and validation sets. The performance of the radiomics model was compared to the radiologists’ assessment using the DeLong test. Results: The AUC of the radiomics model for the prediction of PLONI was 0.928 in the training set and 0.841 in the validation set. The radiomics model produced better sensitivity than the radiologists’ assessment (81.1% vs 43.2% in the training set, 82.4% vs 52.9% in the validation set). In all 124 patients, the AUC of the radiomics model was 0.897, while that of the radiologists’ assessment was 0.674 (p < 0.001, DeLong test). Conclusion: The MRI-based radiomics model for predicting PLONI in RB patients was superior to visual assessment, with improved sensitivity and AUC, and may serve as a potential tool to guide personalized treatment.
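The two metrics reported in this abstract, AUC and sensitivity, can be computed from first principles as in the sketch below. It is only an illustration of the definitions: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the DeLong comparison used in the study is not implemented here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(labels, predictions):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for label, p in zip(labels, predictions) if label == 1 and p == 1)
    fn = sum(1 for label, p in zip(labels, predictions) if label == 1 and p == 0)
    return tp / (tp + fn)

# Hypothetical toy example: 1 = invasion present, 0 = absent
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

print(roc_auc(labels, scores))           # prints 0.75
print(sensitivity(labels, predictions))  # prints 0.5
```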

2021 ◽  
Vol 22 (1) ◽  
Liqian Zhou ◽  
Qi Duan ◽  
Xiongfei Tian ◽  
He Xu ◽  
Jianxin Tang ◽  

Background: Long noncoding RNAs (lncRNAs) have dense linkages with a plethora of important cellular activities. lncRNAs exert their functions by linking with corresponding RNA-binding proteins. Since experimental techniques to detect lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, computation-based LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, so researchers may fail to measure their generalization ability. (2) The majority of methods were validated under cross validation on lncRNA-protein pairs and did not investigate performance under other cross validations, especially cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation. Results: Under a hybrid framework (LPI-HyADBS) integrating feature selection based on AdaBoost and classification models including a deep neural network (DNN), extreme gradient boosting (XGBoost), and an SVM with a penalty coefficient of misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are arranged. Each dataset contains lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired based on Pyfeat. Third, the obtained features of lncRNAs and proteins are selected based on AdaBoost and concatenated to depict each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results from the above three classifiers. LPI-HyADBS is compared to six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS has the best LPI prediction performance under the four different cross validations. In particular, LPI-HyADBS obtains better classification ability than the other six approaches on the constructed independent dataset. Case analyses suggest that there is relevance between ZNF667-AS1 and Q15717. Conclusions: By integrating a feature selection approach based on AdaBoost with three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new linkages between lncRNAs and proteins.
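The distinction this abstract draws between cross validation on pairs and cross validation on lncRNAs or proteins comes down to how folds are formed: in the latter, every pair involving a held-out lncRNA (or protein) goes to the test fold. The sketch below shows one way to build such folds. It is an assumption about the general technique, not the protocol used in the paper, and the pair identifiers are hypothetical.

```python
import random

def grouped_folds(pairs, group_index, n_folds=5, seed=0):
    """Yield (train, test) lists of pairs with no group shared between them.

    group_index selects the grouping key of each (lncRNA, protein) pair:
    0 groups by lncRNA, 1 groups by protein.
    """
    groups = sorted({pair[group_index] for pair in pairs})
    random.Random(seed).shuffle(groups)
    for fold in range(n_folds):
        held_out = set(groups[fold::n_folds])  # every n_folds-th group
        test = [p for p in pairs if p[group_index] in held_out]
        train = [p for p in pairs if p[group_index] not in held_out]
        yield train, test

# Hypothetical interaction pairs: 10 lncRNAs x 4 proteins
pairs = [(f"lnc{i}", f"prot{j}") for i in range(10) for j in range(4)]

for train, test in grouped_folds(pairs, group_index=0):
    train_lncs = {p[0] for p in train}
    test_lncs = {p[0] for p in test}
    print(len(train), len(test), sorted(test_lncs))
```

Splitting on individual pairs instead would let the same lncRNA appear in both training and test data, which tends to give more optimistic estimates than the grouped variant.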

Thi Thanh Van Le

Vietnam has many traditional dances such as Xoan singing, “tuồng” or “chèo”. They all urgently need to be preserved in digital formats, especially in 3D motion capture format for dances. In digital formats, they bring many benefits, such as the ability to automatically classify and search the content of dance movements. In this paper, we propose a system for 3D movement search of Cheo dance postures and gestures. The system applies the sliding window technique, the Dynamic Time Warping algorithm, and a novel feature selection method named CheoAngle. Results show that the proposed system reaches good scores on several metrics. We also compare CheoAngle with other feature selection methods for 3D movement and show that CheoAngle gives the best results.
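The combination of a sliding window with Dynamic Time Warping (DTW) named in this abstract can be sketched as follows. The abstract does not describe CheoAngle or the motion capture features, so the example assumes a single one-dimensional sequence (for instance one joint angle per frame) with made-up values; the function names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def best_window(query, series, window):
    """Slide a fixed-length window over series; return (start, distance) of the best match."""
    candidates = (
        (dtw_distance(query, series[s:s + window]), s)
        for s in range(len(series) - window + 1)
    )
    dist, start = min(candidates)
    return start, dist

# Hypothetical joint-angle sequences: the query gesture appears, slightly
# stretched in time, starting at index 2 of the recording.
query = [0, 1, 2, 1, 0]
recording = [5, 5, 0, 1, 1, 2, 1, 0, 5, 5]

print(best_window(query, recording, window=6))  # prints (2, 0.0)
```

DTW tolerates the repeated frame in the recording, which a plain frame-by-frame Euclidean comparison would penalize.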

2021 ◽  
Vol 11 (1) ◽  
C. Bouvier ◽  
N. Souedet ◽  
J. Levy ◽  
C. Jan ◽  
Z. You ◽  

In preclinical research, histology images are produced using powerful optical microscopes to digitize entire sections at cell scale. Quantification of stained tissue relies on machine learning driven segmentation. However, such methods require multiple additional pieces of information, or features, which increase the quantity of data to process. As a result, the number of features is a drawback when large series or massive histological images must be processed rapidly and robustly. Existing feature selection methods can reduce the amount of required information, but the selected subsets lack reproducibility. We propose a novel methodology operating on high performance computing (HPC) infrastructures and aiming at finding small and stable sets of features for fast and robust segmentation of high-resolution histological images. This selection has two steps: (1) selection at the scale of feature families (an intermediate pool of features, between spaces and individual features) and (2) feature selection performed on the pre-selected feature families. We show that the selected sets of features are stable for two different neuron stainings. In order to test different configurations, one of these datasets is a mono-subject dataset and the other is a multi-subject dataset. Furthermore, the feature selection results in a significant reduction of computation time and memory cost. This methodology will allow exhaustive histological studies at a high-resolution scale on HPC infrastructures for both preclinical and clinical research.

2021 ◽  
pp. 134-146
Surbhi Sharma ◽  
Anthony J. Bustamante

In this paper, we focus on improving the performance of a speech-based uni-modal depression detection system, which is non-invasive and involves low cost and computation time in comparison to multi-modal systems. The performance of a decision system mainly depends on the choice of feature selection method and classifier. We have investigated the combination of four well-known multivariate filter methods (minimum Redundancy Maximum Relevance, Scatter Ratio, Mahalanobis Distance, Fast Correlation Based feature selection) and four well-known classifiers (k-Nearest Neighbour, Linear Discriminant Classifier (LDC), Decision Tree, Support Vector Machine) to obtain a minimal set of relevant and non-redundant features that improves performance. This will speed up the acquisition of features from speech and allow the decision system to be built with low cost and complexity. Experimental results on the high- and low-level features of recent work on the DAIC-WOZ dataset demonstrate the superior performance of the combination of Scatter Ratio and LDC, as well as that of Mahalanobis Distance and LDC, in comparison to other combinations and existing speech-based depression results, for both gender-independent and gender-based studies. Further, these combinations also outperformed a few multimodal systems. Low-level features were found to be more discriminatory and to provide a better F1 score.

2021 ◽  
Vol 12 (1) ◽  
Aydin Demircioğlu

Background: Many studies in radiomics use feature selection methods to identify the most predictive features. At the same time, they employ cross-validation to estimate the performance of the developed models. However, if the feature selection is performed before the cross-validation, data leakage can occur, and the results can be biased. To measure the extent of this bias, we collected ten publicly available radiomics datasets and conducted two experiments. First, the models were developed by incorrectly applying the feature selection prior to cross-validation. Then, the same experiment was conducted by applying feature selection correctly within cross-validation to each fold. The resulting models were then evaluated against each other in terms of AUC-ROC, AUC-F1, and Accuracy. Results: Applying the feature selection incorrectly prior to the cross-validation showed a bias of up to 0.15 in AUC-ROC, 0.29 in AUC-F1, and 0.17 in Accuracy. Conclusions: Incorrect application of feature selection and cross-validation can lead to highly biased results for radiomic datasets.
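The leakage described in this abstract can be reproduced on pure noise with a small, dependency-free sketch. Everything in it is an assumption made for illustration and unrelated to the study's actual pipeline: the random data, the mean-difference filter, the nearest-centroid classifier, and the function names. Because the labels carry no signal, any accuracy well above 0.5 in the "selection before cross-validation" variant is bias introduced by the leak.

```python
import random

def top_k_features(X, y, k):
    """Rank features by absolute difference of class means; return the k best indices."""
    def score(j):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def nearest_centroid_predict(X_train, y_train, x, cols):
    """Predict the class whose training centroid (on cols) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in cols]
        dist = sum((x[j] - c) ** 2 for j, c in zip(cols, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(X, y, k, n_folds, select_inside_folds):
    """k-fold accuracy with feature selection done inside each fold or once on all data."""
    n = len(X)
    leaked_cols = top_k_features(X, y, k)  # uses ALL samples: the incorrect variant
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        # correct variant: select features from the training part of this fold only
        cols = top_k_features(X_tr, y_tr, k) if select_inside_folds else leaked_cols
        correct += sum(
            nearest_centroid_predict(X_tr, y_tr, X[i], cols) == y[i] for i in test_idx
        )
    return correct / n

rng = random.Random(0)
n_samples, n_features = 40, 500
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]  # labels are independent of the features

leaky = cv_accuracy(X, y, k=10, n_folds=5, select_inside_folds=False)
honest = cv_accuracy(X, y, k=10, n_folds=5, select_inside_folds=True)
print("selection before CV (leaky): ", leaky)
print("selection inside each fold:  ", honest)
```

In libraries that offer pipeline objects, the same safeguard is usually obtained by wrapping the selector and the classifier in one pipeline and cross-validating the pipeline as a whole.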

BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Yong Tang ◽  
Chun Mei Yang ◽  
Song Su ◽  
Wei Jia Wang ◽  
Li Ping Fan ◽  

Background: Radiomics may provide more objective and accurate predictions for extrahepatic cholangiocarcinoma (ECC). In this study, we developed radiomics models based on magnetic resonance imaging (MRI) and machine learning to preoperatively predict the differentiation degree (DD) and lymph node metastasis (LNM) of ECC. Methods: A group of 100 patients diagnosed with ECC was included. The ECC status of all patients was confirmed by pathology. A total of 1200 radiomics features were extracted from axial T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), diffusion weighted imaging (DWI), and apparent diffusion coefficient (ADC) images. A systematic framework considering combinations of five feature selection methods and ten machine learning classification algorithms (classifiers) was developed and investigated. The predictive capabilities for DD and LNM were evaluated in terms of area under the precision recall curve (AUPRC), area under the receiver operating characteristic (ROC) curve (AUC), negative predictive value (NPV), accuracy (ACC), sensitivity, and specificity. The prediction performance of the models was statistically compared using the DeLong test. Results: For DD prediction, the feature selection method joint mutual information (JMI) and the Bagging Classifier achieved the best performance (AUPRC = 0.65, AUC = 0.90 (95% CI 0.75–1.00), ACC = 0.85 (95% CI 0.69–1.00), sensitivity = 0.75 (95% CI 0.30–0.95), and specificity = 0.88 (95% CI 0.64–0.97)), and the radiomics signature was composed of 5 selected features. For LNM prediction, the feature selection method minimum redundancy maximum relevance and the classifier eXtreme Gradient Boosting achieved the best performance (AUPRC = 0.95, AUC = 0.98 (95% CI 0.94–1.00), ACC = 0.90 (95% CI 0.77–1.00), sensitivity = 0.75 (95% CI 0.30–0.95), and specificity = 0.94 (95% CI 0.72–0.99)), and the radiomics signature was composed of 30 selected features. However, in the DeLong test these two chosen models were not significantly different from other models with higher AUC values, although they were significantly different from most of the models. Conclusion: MRI radiomics analysis based on machine learning demonstrated good predictive accuracy for DD and LNM of ECC. This sheds new light on the noninvasive diagnosis of ECC.
