Fixed versus mixed RSA: Explaining visual representations by fixed and mixed feature sets from shallow and deep computational models

2014 ◽  
Author(s):  
Seyed-Mahdi Khaligh-Razavi ◽  
Linda Henriksson ◽  
Kendrick Kay ◽  
Nikolaus Kriegeskorte

Abstract
Studies of the primate visual system have begun to test a wide range of complex computational object-vision models. Realistic models have many parameters, which in practice cannot be fitted using the limited amounts of brain-activity data typically available. Task performance optimization (e.g. using backpropagation to train neural networks) provides major constraints for fitting parameters and discovering nonlinear representational features appropriate for the task (e.g. object classification). Model representations can be compared to brain representations in terms of the representational dissimilarities they predict for an image set. This method, called representational similarity analysis (RSA), enables us to test the representational feature space as is (fixed RSA) or to fit a linear transformation that mixes the nonlinear model features so as to best explain a cortical area’s representational space (mixed RSA). Like voxel/population-receptive-field modelling, mixed RSA uses a training set (different stimuli) to fit one weight per model feature and response channel (voxels here), so as to best predict the response profile across images for each response channel. We analysed response patterns elicited by natural images, which were measured with functional magnetic resonance imaging (fMRI). We found that early visual areas were best accounted for by shallow models, such as a Gabor wavelet pyramid (GWP). The GWP model performed similarly with and without mixing, suggesting that the original features already approximated the representational space, obviating the need for mixing. However, a higher ventral-stream visual representation (lateral occipital region) was best explained by the higher layers of a deep convolutional network, and mixing of its feature set was essential for this model to explain the representation.
We suspect that mixing was essential because the convolutional network had been trained to discriminate a set of 1000 categories, whose frequencies in the training set did not match their frequencies in natural experience or their behavioural importance. The latter factors might determine the representational prominence of semantic dimensions in higher-level ventral-stream areas. Our results demonstrate the benefits of testing both the specific representational hypothesis expressed by a model’s original feature space and the hypothesis space generated by linear transformations of that feature space.

Highlights
- We tested computational models of representations in ventral-stream visual areas.
- We compared representational dissimilarities with/without linear remixing of model features.
- Early visual areas were best explained by shallow models, and higher areas by deep models.
- Unsupervised shallow models performed better without linear remixing of their features.
- A supervised deep convolutional net performed best with linear feature remixing.
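The fixed-versus-mixed contrast described above can be sketched in a few lines: fixed RSA correlates the model's representational dissimilarity matrix (RDM) with the brain RDM directly, while mixed RSA first ridge-fits one weight per model feature and voxel on a training set and builds the model RDM from the predicted voxel responses. The sketch below uses random stand-in data; all dimensions, the noise level, and the ridge penalty are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): 200 training images, 40 test images,
# 50 model features, 30 voxels.
n_train, n_test, n_feat, n_vox = 200, 40, 50, 30
F_train = rng.standard_normal((n_train, n_feat))  # model features, training stimuli
F_test = rng.standard_normal((n_test, n_feat))    # model features, test stimuli
W_true = rng.standard_normal((n_feat, n_vox))     # unknown feature-to-voxel mixing
B_train = F_train @ W_true + 0.1 * rng.standard_normal((n_train, n_vox))
B_test = F_test @ W_true + 0.1 * rng.standard_normal((n_test, n_vox))

def rdm(X):
    """Representational dissimilarity matrix: 1 - correlation between patterns."""
    return 1.0 - np.corrcoef(X)

def upper(M):
    """Vectorize the upper triangle (the unique pairwise dissimilarities)."""
    return M[np.triu_indices_from(M, k=1)]

# Fixed RSA: compare the model's RDM to the brain RDM as-is.
fixed_r = np.corrcoef(upper(rdm(F_test)), upper(rdm(B_test)))[0, 1]

# Mixed RSA: ridge-fit one weight per feature and voxel on the training set,
# then build the model RDM from the predicted voxel responses.
lam = 1.0  # ridge penalty (illustrative)
W = np.linalg.solve(F_train.T @ F_train + lam * np.eye(n_feat), F_train.T @ B_train)
mixed_r = np.corrcoef(upper(rdm(F_test @ W)), upper(rdm(B_test)))[0, 1]
print(fixed_r, mixed_r)
```

When the data are generated by an unknown linear mix of the model features, as here, the fitted remixing recovers the brain RDM almost exactly, which is the regime the paper argues applies to higher ventral-stream areas.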

2020 ◽  
Author(s):  
Yu Wang ◽  
ZAHEER ULLAH KHAN ◽  
Shaukat Ali ◽  
Maqsood Hayat

Abstract
Background
Bacteriophage, or phage, is a type of virus that replicates itself inside bacteria. It consists of genetic material surrounded by a protein structure. Bacteriophages play a vital role in the domains of phage therapy and genetic engineering. Phage and hydrolase enzyme proteins have a significant impact on the cure of pathogenic bacterial infections and disease treatment. Accurate identification of bacteriophage proteins is important for host subcellular localization, for further understanding of the interaction between phages and hydrolases, and for designing antibacterial drugs. Given the significance of bacteriophage proteins, several computational models have been developed so far in addition to wet-laboratory methods. However, their performance has not been considerable, due to inefficient feature schemes, redundancy, noise, and the lack of an intelligent learning engine. Therefore, we have developed an innovative bi-layered model named DeepEnzyPred. A hybrid feature vector was obtained via a novel multi-level multi-threshold subset feature selection (MLMT-SFS) algorithm. A two-dimensional convolutional neural network was adopted as the baseline classifier.
Results
A hybrid feature set was obtained via a serial combination of CTD and KSAACGP features. The optimal feature subset was selected via the novel multi-level multi-threshold subset feature selection algorithm. Over 5-fold cross-validation, an accuracy of 91.6%, a sensitivity of 63.39%, a specificity of 95.72%, an MCC of 0.6049, and an ROC value of 0.8772 were recorded for layer 1. Similarly, the model obtained an accuracy of 96.05%, a sensitivity of 96.22%, a specificity of 95.91%, an MCC of 0.9219, and an ROC value of 0.9899 for layer 2.
Conclusion
This paper presents a robust and effective classification model for bacteriophage proteins and their types. Primitive features were extracted via CTD and KSAACGP.
A novel method (MLMT-SFS) was devised for yielding an optimal hybrid feature space out of the primitive features. The results obtained over the hybrid feature space with the 2D-CNN show excellent classification performance. Based on the recorded results, we believe that the developed predictor will be a valuable resource for large-scale discrimination of unknown phage and hydrolase enzymes in particular, and for new antibacterial drug design in pharmaceutical companies in general.
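As an illustration of the kind of primitive feature the pipeline starts from, the following sketch computes the "composition" part of a CTD descriptor using the standard three-group hydrophobicity partition of the amino acid alphabet. The example sequence is arbitrary, and the full CTD/KSAACGP extraction and the MLMT-SFS selection step are not reproduced here.

```python
from collections import Counter

# Standard CTD hydrophobicity grouping of the 20 amino acids
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}

def ctd_composition(seq):
    """Fraction of residues falling in each hydrophobicity group."""
    counts = Counter()
    for aa in seq:
        for name, members in GROUPS.items():
            if aa in members:
                counts[name] += 1
    return [counts[g] / len(seq) for g in ("polar", "neutral", "hydrophobic")]

print(ctd_composition("MKTAYIAKQR"))  # → [0.4, 0.4, 0.2]
```

The full CTD descriptor also includes "transition" and "distribution" terms and further physicochemical groupings; the composition term alone already gives a fixed-length vector regardless of sequence length, which is what makes it usable as a CNN input.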


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of arecanut planting area and the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized by the RF features were determined as 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracy of the SVM, BPNN, and RF models following feature optimization was improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification model. The kappa coefficient also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. 
Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can serve as a theoretical and technical reference for the agricultural and forestry industries.
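The feature-space optimization described above, ranking candidate variables by random-forest importance and retraining on the selected subset, can be sketched as follows. Synthetic data stand in for the spectral and textural variables, and the subset size of 10 is an illustrative assumption, not the paper's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the spectral/textural feature variables
X, y = make_classification(n_samples=600, n_features=30, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Rank features by random-forest importance and keep the top 10
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[::-1][:10]

# Retrain the classifier on the optimized feature space
rf_opt = RandomForestClassifier(n_estimators=200, random_state=0)
rf_opt.fit(X_tr[:, top], y_tr)
acc = rf_opt.score(X_te[:, top], y_te)
print(round(acc, 3))
```

In the paper the same optimized feature space is then fed to SVM and BPNN classifiers as well, so the selection step and the final classifier are deliberately decoupled.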


Cancers ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 2111
Author(s):  
Bo-Wei Zhao ◽  
Zhu-Hong You ◽  
Lun Hu ◽  
Zhen-Hao Guo ◽  
Lei Wang ◽  
...  

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates rapidly. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture both the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), while the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. We also compared the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
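The local/global split can be illustrated on a toy graph: a single GCN-style normalized one-hop propagation stands in for the trained GCN (local view), and random-walk visit counts stand in for DeepWalk's skip-gram embeddings (global view); the two are concatenated before classification. Everything here (graph size, walk lengths, feature widths) is an illustrative assumption, not LGDTI's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # toy drug/target nodes (stand-in for the DTI graph)
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T           # undirected, no self-loops
X = rng.standard_normal((n, 8))          # initial node features

# Local view (GCN-style one-hop aggregation): D^{-1/2} (A + I) D^{-1/2} X
A_hat = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))
local = (d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]) @ X

# Global view (DeepWalk-style): visit counts from short random walks
global_feat = np.zeros((n, n))
for start in range(n):
    for _ in range(10):                  # 10 walks of length 5 per node
        v = start
        for _ in range(5):
            nbrs = np.flatnonzero(A[v])
            if nbrs.size == 0:
                break
            v = rng.choice(nbrs)
            global_feat[start, v] += 1

# Concatenated node representation handed to the random forest classifier
fused = np.hstack([local, global_feat])
print(fused.shape)
```

In LGDTI the GCN is trained and DeepWalk embeddings replace the raw visit counts; the key point the sketch preserves is that the classifier sees both neighbourhood views side by side.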


2021 ◽  
Vol 11 (5) ◽  
pp. 2039
Author(s):  
Hyunseok Shin ◽  
Sejong Oh

In machine learning applications, classification schemes have been widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate it. Furthermore, random sampling is traditionally used to divide datasets. The problem, however, is that the performance of the model is evaluated differently depending on how we divide the training and test sets. Therefore, in this study, we propose an improved sampling method for the accurate evaluation of a classification model. We first generate numerous candidate cases of train/test sets using the R-value-based sampling method. We evaluate the similarity of the distributions of the candidate cases to the whole dataset, and the case with the smallest distribution difference is selected as the final train/test set. Histograms and feature importance are used to evaluate the similarity of distributions. The proposed method produces more representative training and test sets than previous sampling methods, including random and non-random sampling.
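The candidate-and-select idea can be sketched compactly. Plain random splits stand in for the R-value-based candidate generation, and a summed per-feature histogram L1 distance stands in for the paper's histogram/feature-importance similarity measure; both substitutions are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))  # toy dataset (stand-in features)

def hist_distance(subset, full, bins=10):
    """Summed per-feature L1 distance between subset and full-set histograms."""
    d = 0.0
    for j in range(full.shape[1]):
        lo, hi = full[:, j].min(), full[:, j].max()
        hs, _ = np.histogram(subset[:, j], bins=bins, range=(lo, hi), density=True)
        hf, _ = np.histogram(full[:, j], bins=bins, range=(lo, hi), density=True)
        d += np.abs(hs - hf).sum()
    return d

# Generate candidate train/test splits and keep the one whose test-set
# feature distributions best match the full dataset.
best_split, best_d = None, np.inf
for seed in range(20):
    idx = np.random.default_rng(seed).permutation(len(X))
    train_idx, test_idx = idx[:400], idx[400:]
    d = hist_distance(X[test_idx], X)
    if d < best_d:
        best_split, best_d = (train_idx, test_idx), d
print(best_d)
```

The selected split is, by construction, at least as distribution-matched as any single random split among the candidates, which is the property the study exploits for more stable model evaluation.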


2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with the application of a limited quantity of input data. The possibility of using a limited set of learning data was achieved by developing a detailed scenario of the task, which strictly defined the conditions of detector operation in the considered case of a convolutional neural network. The described solution utilizes known architectures of deep neural networks in the process of learning and object detection. The article presents comparisons of detection results from the most popular deep neural networks while maintaining a limited training set composed of a specific number of selected images from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines. The object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model achieved a lower AP; however, the relationship between the number of input samples and the obtained results has a significantly lower influence than in the case of other CNN models, which, in the authors’ assessment, is a desired feature in the case of a limited training set.


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Huaping Guo ◽  
Xiaoyu Diao ◽  
Hongbing Liu

Rotation Forest is an ensemble learning approach that achieves better performance than Bagging and Boosting by building accurate and diverse classifiers in rotated feature spaces. However, like other conventional classifiers, Rotation Forest does not work well on imbalanced data, which are characterized by having far fewer examples of one class (the minority class) than the other (the majority class), where misclassifying minority-class examples is often much more costly than the reverse. This paper proposes a novel method called Embedding Undersampling Rotation Forest (EURF) to handle this problem by (1) sampling subsets from the majority class and learning a projection matrix from each subset, and (2) obtaining training sets by projecting re-undersampled subsets of the original data set into the new spaces defined by the matrices and constructing an individual classifier from each training set. In the first step, undersampling forces the rotation matrix to better capture the features of the minority class without harming the diversity between individual classifiers. In the second step, the undersampling technique aims to improve the performance of individual classifiers on the minority class. The experimental results show that EURF achieves significantly better performance compared to other state-of-the-art methods.
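The two-step construction can be sketched with PCA standing in for the rotation step and a decision tree as the individual classifier. The toy data, ensemble size, and undersampling ratio are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Imbalanced toy data: 300 majority (class 0) vs 50 minority (class 1)
X_maj = rng.standard_normal((300, 6))
X_min = rng.standard_normal((50, 6)) + 1.5
X = np.vstack([X_maj, X_min]); y = np.array([0]*300 + [1]*50)

members = []
for k in range(5):
    # (1) undersample the majority class, learn a rotation (PCA here) on the subset
    maj_idx = rng.choice(300, size=50, replace=False)
    sub = np.vstack([X_maj[maj_idx], X_min])
    rot = PCA(n_components=6).fit(sub)
    # (2) re-undersample, project into the rotated space, train one member
    maj_idx2 = rng.choice(300, size=50, replace=False)
    Xk = rot.transform(np.vstack([X_maj[maj_idx2], X_min]))
    yk = np.array([0]*50 + [1]*50)
    members.append((rot, DecisionTreeClassifier(random_state=k).fit(Xk, yk)))

# Majority vote across ensemble members
votes = np.mean([m.predict(r.transform(X)) for r, m in members], axis=0)
pred = (votes >= 0.5).astype(int)
print((pred[y == 1] == 1).mean())  # minority-class recall
```

Because every member trains on a balanced, rotated subset that always includes the full minority class, the vote is biased toward recognizing minority examples, which is the behaviour EURF is designed to obtain.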


Cancers ◽  
2021 ◽  
Vol 13 (24) ◽  
pp. 6300
Author(s):  
Panagiota Spyridonos ◽  
George Gaitanis ◽  
Aristidis Likas ◽  
Ioannis Bassukas

Malignant melanomas resembling seborrheic keratosis (SK-like MMs) are atypical, challenging-to-diagnose melanoma cases that carry the risk of delayed diagnosis and inadequate treatment. On the other hand, SK may mimic melanoma, producing a ‘false positive’ with unnecessary lesion excisions. The present study proposes a computer-based approach using dermoscopy images for the characterization of SK-like MMs. Dermoscopic images were retrieved from the International Skin Imaging Collaboration archive. Exploiting image embeddings from the pretrained convolutional network VGG16, we trained a support vector machine (SVM) classification model on a data set of 667 images. SVM optimal hyperparameter selection was carried out using the Bayesian optimization method. The classifier was tested on an independent data set of 311 images with atypical appearance: the MMs lacked a pigmented network and exhibited milia-like cysts, whereas the SK lesions lacked milia-like cysts and exhibited a pigmented network. Atypical MMs were characterized with a sensitivity and specificity of 78.6% and 84.5%, respectively. The advent of deep learning in image recognition has attracted the interest of computer science towards improved skin lesion diagnosis. Open-source, public-access archives of skin images further empower the implementation and validation of computer-based systems that might contribute significantly to complex clinical diagnostic problems such as the characterization of SK-like MMs.
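The embeddings-plus-SVM pipeline can be sketched as below. Random vectors stand in for the VGG16 image embeddings (extracting real embeddings requires the pretrained network and image data), and fixed RBF hyperparameters stand in for the Bayesian optimization step; both are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in for VGG16 image embeddings of the two lesion classes
emb_mm = rng.standard_normal((300, 128)) + 0.8   # "melanoma" embeddings
emb_sk = rng.standard_normal((300, 128))         # "seborrheic keratosis"
X = np.vstack([emb_mm, emb_sk]); y = np.array([1]*300 + [0]*300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# RBF-kernel SVM on the embedding space (hyperparameters fixed here;
# the study tunes them via Bayesian optimization)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
sens = clf.score(X_te[y_te == 1], y_te[y_te == 1])  # sensitivity
spec = clf.score(X_te[y_te == 0], y_te[y_te == 0])  # specificity
print(sens, spec)
```

Reporting sensitivity and specificity separately, as the study does, matters here because the clinical costs of missing an SK-like MM and of excising a benign SK are asymmetric.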


2020 ◽  
Author(s):  
Carolyn Lou ◽  
Pascal Sati ◽  
Martina Absinta ◽  
Kelly Clark ◽  
Jordan D. Dworkin ◽  
...  

Abstract
Background and Purpose
The presence of a paramagnetic rim around a white matter lesion has recently been shown to be a hallmark of a particular pathological type of multiple sclerosis (MS) lesion. Increased prevalence of these paramagnetic rim lesions (PRLs) is associated with a more severe disease course in MS. The identification of these lesions is time-consuming to perform manually. We present a method to automatically detect PRLs on 3T T2*-phase images.
Methods
T1-weighted, T2-FLAIR, and T2*-phase MRI of the brain were collected at 3T for 19 subjects with MS. The images were then processed with lesion segmentation, lesion center detection, lesion labelling, and lesion-level radiomic feature extraction. A total of 877 lesions were identified, 118 (13%) of which contained a paramagnetic rim. We divided our data into a training set (15 patients, 673 lesions) and a testing set (4 patients, 204 lesions). We fit a random forest classification model on the training set and assessed our ability to classify lesions as PRL on the test set.
Results
The number of PRLs per subject identified via our automated lesion labelling method was highly correlated with the gold standard count of PRLs per subject, r = 0.91 (95% CI [0.79, 0.97]). The classification algorithm using radiomic features can classify a lesion as PRL or not with an area under the curve of 0.80 (95% CI [0.67, 0.86]).
Conclusion
This study develops a fully automated technique for the detection of paramagnetic rim lesions using standard T1 and FLAIR sequences and a T2*-phase sequence obtained on 3T MR images.

Highlights
- A fully automated method for both the identification and classification of paramagnetic rim lesions is proposed.
- Radiomic features in conjunction with machine learning algorithms can accurately classify paramagnetic rim lesions.
- Challenges for classification are largely driven by heterogeneity between lesions, including equivocal rim signatures and lesion location.
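The classification stage, a random forest on lesion-level radiomic features evaluated by AUC, can be sketched with synthetic features. The feature construction, class balance, and simple holdout split below are illustrative assumptions (the study splits by patient, not by lesion).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Stand-in radiomic features; the threshold keeps positives rare,
# loosely mirroring the ~13% PRL prevalence in the study.
n = 800
X = rng.standard_normal((n, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n) > 1.5).astype(int)

# Simple holdout split for the sketch (the study uses a patient-wise split)
X_tr, y_tr, X_te, y_te = X[:600], y[:600], X[600:], y[600:]
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

Splitting by patient rather than by lesion, as the study does, is the important design choice: lesions from the same subject are correlated, and a lesion-wise split would inflate the reported AUC.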


2014 ◽  
Vol 602-605 ◽  
pp. 1634-1637
Author(s):  
Fang Nian Wang ◽  
Shen Shen Wang ◽  
Wan Fang Che ◽  
Yun Bai

An intrusion detection method based on RS-LSSVM is studied in this paper. First, an attribute reduction algorithm based on the generalized decision table is proposed to remove interfering features and reduce the dimension of the input feature space. Then the classification method based on the least squares support vector machine (LSSVM) is analyzed. The sample data after dimension reduction are used for LSSVM training, and the LSSVM classification model is obtained, giving it the ability to detect unknown intrusions. Simulation results show that the proposed method can effectively remove unnecessary features and improve the performance of network intrusion detection.
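Unlike a standard SVM, LSSVM training reduces to solving a single linear system of KKT conditions rather than a quadratic program. The sketch below implements that standard formulation on toy two-class data standing in for the reduced intrusion features; the data, kernel width, and regularization value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class data (stand-in for the dimension-reduced intrusion features)
X0 = rng.standard_normal((40, 3)); X1 = rng.standard_normal((40, 3)) + 2.0
X = np.vstack([X0, X1]); y = np.array([-1.0]*40 + [1.0]*40)

def rbf(A, B, s=1.0):
    """RBF kernel matrix between row sets A and B."""
    d = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-d / (2 * s**2))

# LSSVM training: solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]
gamma, n = 10.0, len(y)
Omega = np.outer(y, y) * rbf(X, X)
A = np.zeros((n + 1, n + 1))
A[0, 1:] = y; A[1:, 0] = y
A[1:, 1:] = Omega + np.eye(n) / gamma
rhs = np.concatenate([[0.0], np.ones(n)])
sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

def predict(Xnew):
    """Decision function: sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    return np.sign(rbf(Xnew, X) @ (alpha * y) + b)

print((predict(X) == y).mean())  # training accuracy
```

Replacing the hinge loss with equality constraints is what turns the QP into this linear system; the trade-off is that every training point becomes a support vector.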


2021 ◽  
Vol 11 (12) ◽  
pp. 3110-3116
Author(s):  
Jansi Rani Sella Veluswami ◽  
M. Ezhil Prasanth ◽  
K. Harini ◽  
U. Ajaykumar

Melanoma skin cancer is a common disease that develops in the melanocytes, the cells that produce melanin. In this work, a deep hybrid learning model is employed to detect and classify skin cancer. The dataset used contains two classes of skin lesions: benign and malignant. Since the dataset is imbalanced between the number of images of malignant lesions and benign lesions, an augmentation technique is used to balance it. To improve the clarity of the images, they are then enhanced using Contrast Limited Adaptive Histogram Equalization (CLAHE). To detect only the affected lesion area, the lesions are segmented using a neural-network-based ensemble model that combines the segmentation algorithms Fully Convolutional Network (FCN), SegNet, and U-Net, producing a binary image of the skin and the lesion, where the lesion is represented in white and the skin in black. These binary images are further classified using different pretrained models such as Inception ResNet V2, Inception V3, ResNet 50, DenseNet, and a CNN. Following that, fine-tuning of the best-performing pretrained model is carried out to improve the classification performance. To further improve the performance of the classification model, a method combining deep learning (DL) and machine learning (ML) is applied: feature extraction is done using DL models and classification is performed by a support vector machine (SVM). This computer-aided tool will assist doctors in diagnosing the disease faster than the traditional method. The proposed method yields a significant improvement of nearly 4% in classification performance.

