scholarly journals Voxel-Wise Feature Selection Method for CNN Binary Classification of Neuroimaging Data

2021 ◽  
Vol 15 ◽  
Author(s):  
Domenico Messina ◽  
Pasquale Borrelli ◽  
Paolo Russo ◽  
Marco Salvatore ◽  
Marco Aiello

Voxel-wise group analysis is presented as a novel feature selection (FS) technique for a deep learning (DL) approach to brain imaging data classification. The method, based on a voxel-wise two-sample t-test and denoted as t-masking, is integrated into the learning procedure as a data-driven FS strategy. t-Masking has been introduced in a convolutional neural network (CNN) for the test bench of binary classification of very-mild Alzheimer’s disease vs. normal control, using a structural magnetic resonance imaging dataset of 180 subjects. To better characterize the t-masking impact on CNN classification performance, six different experimental configurations were designed. Moreover, the performances of the presented FS method were compared to those of similar machine learning (ML) models that relied on different FS approaches. Overall, our results show an enhancement of about 6% in performance when t-masking was applied. Moreover, the reported performance enhancement was higher with respect to similar FS-based ML models. In addition, evaluation of the impact of t-masking on various selection rates has been provided, serving as a useful characterization for future insights. The proposed approach is also highly generalizable to other DL architectures, neuroimaging modalities, and brain pathologies.

Author(s):  
Nor Idayu Mahat ◽  
Maz Jamilah Masnan ◽  
Ali Yeon Md Shakaff ◽  
Ammar Zakaria ◽  
Muhd Khairulzaman Abdul Kadir

This chapter overviews the issue of multicollinearity in electronic nose (e-nose) classification and investigates some analytical solutions to deal with the problem. Multicollinearity effect may harm classification analysis from producing good parameters estimate during the construction of the classification rule. The common approach to deal with multicollinearity is feature extraction. However, the criterion used in extracting the raw features based on variances may not be appropriate for the ultimate goal of classification accuracy. Alternatively, feature selection method would be advisable as it chooses only valuable features. Two distance-based criteria in determining the right features for classification purposes, Wilk's Lambda and bounded Mahalanobis distance, are applied. Classification with features determined by bounded Mahalanobis distance statistically performs better than Wilk's Lambda. This chapter suggests that classification of e-nose with feature selection is a good choice to limit the cost of experiments and maintain good classification performance.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2135 ◽  
Author(s):  
Waleed Khalifa ◽  
Malik Yousef ◽  
Müşerref Duygu Saçar Demirci ◽  
Jens Allmer

MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. It is involved to determine miRNAs experimentally and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although, in general, two-class classification (TCC) is used in the field; because negative examples are hard to come by, one-class classification (OCC) has been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC where the best feature selection method achieved an average accuracy of 95.6%, thereby being ∼29% better than the worst method which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection and its largest performance gap is ∼13% which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on parwith TCC given the proper set of features.


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract Background Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text by using machine learning methods improves recruitment efficiency to reduce the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as a loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model for feature distinguish. Soft Voting is applied to achieve final classification of the ensemble model. The dataset is from the standard evaluation task 3 of 5th China Health Information Processing Conference containing 38,341 eligibility criteria text in 44 categories. Results Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% improvement on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement. Conclusions A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the classification performance was improved by our ensemble model significantly. In addition, metric learning was able to improve word embedding representation and the focal loss reduced the impact of data imbalance to model performance.


2008 ◽  
Vol 18 (1) ◽  
pp. 123-138 ◽  
Author(s):  
Milos Radovanovic ◽  
Mirjana Ivanovic

Motivated by applying Text Categorization to classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of- words document representations on the performance of five major classifiers - Na?ve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and classifiers are trained to automatically determine the topics which may be relevant to a previously unseen Web-page. Different transformations of input data: stemming, normalization, logtf and idf, together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance measured by classical metrics - accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation which corresponds to each classifier, but rather on describing the effects of every individual transformation on classification, together with their mutual relationships. .


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7417
Author(s):  
Alex J. Hope ◽  
Utkarsh Vashisth ◽  
Matthew J. Parker ◽  
Andreas B. Ralston ◽  
Joshua M. Roper ◽  
...  

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1502
Author(s):  
Ben Wilkes ◽  
Igor Vatolkin ◽  
Heinrich Müller

We present a multi-modal genre recognition framework that considers the modalities audio, text, and image by features extracted from audio signals, album cover images, and lyrics of music tracks. In contrast to pure learning of features by a neural network as done in the related work, handcrafted features designed for a respective modality are also integrated, allowing for higher interpretability of created models and further theoretical analysis of the impact of individual features on genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre based on combinations of elementary features. For feature combination a two-level technique is used, which combines aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments have been conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually, with data reduction for improved perceptibility achieved by multi-objective analysis and restriction to non-dominated data. Feature- and classifier-related hypotheses are formulated based on the data, and their statistical significance is formally analyzed. The statistical analysis shows that the combination of two modalities almost always leads to a significant increase of performance and the combination of three modalities in several cases.


2020 ◽  
Author(s):  
Raquel Candido ◽  
Rafael Lama ◽  
Natália Chiari ◽  
Marcello Nogueira-Barbosa ◽  
Paulo Azevedo Marques ◽  
...  

Non-traumatic Vertebral Compression Fractures (VCFs) are generally caused by osteoporosis (benign VCFs) or metastatic cancer (malignant VCFs) and the success of the medical treatment strongly depends on a fast and correct classification of VCFs. Recently, methods for computer-aided diagnosis (CAD) based on machine learning have been proposed for classifying VCFs. In this work, we investigate the problem of clustering images of VCFs and the impact of feature selection by genetic algorithms, comparing the clustering i)with all features and ii)with feature selection through the purity results. The analysis of the clusters helps to understand the results of classifiers and difficulties of differentiating images of different classes by an expert. The results indicate that features selection improved the separability of clusters and purity. Feature selection also helps to understand which attributes are most important for analysing the images of vertebral bodies.


Sign in / Sign up

Export Citation Format

Share Document