Fusion Based AER System Using Deep Learning Approach for Amplitude and Frequency Analysis

Author(s):  
A. Pramod Reddy ◽  
Vijayarajan V.

Automatic emotion recognition from speech (AERS) systems based on acoustical analysis reveal that some emotional classes remain ambiguous. This study employed an alternative method aimed at providing a deeper understanding of the amplitude–frequency characteristics of different emotions, in order to aid the development of more effective AER approaches in the near term. The study converted narrow 20 ms frames of speech into RGB or grey-scale spectrogram images, and the resulting features were used to fine-tune a feature-selection system previously trained to recognise emotions. Spectrograms are rendered on two spectral scales, linear and Mel, giving an inductive view of the amplitude and frequency characteristics of the emotional classes. We propose a two-channel deep fusion network for the efficient categorisation of these images. Linear and Mel spectrograms are derived from the speech signal, processed in the frequency domain, and fed to a deep neural network. The proposed AlexNet model, with five convolutional layers and two fully connected layers, extracts the most salient features from spectrogram images plotted on the amplitude–frequency scale. The approach is compared against the state of the art on the benchmark EMO-DB dataset. RGB and saliency images fed to the pre-trained AlexNet reach an accuracy of 72.18% on both EMO-DB and a Telugu dataset, while fused image features require less computation and reach an accuracy of 75.12%. The results show that transfer learning predicts more efficiently than a fine-tuned network. When tested on the EMO-DB dataset, the proposed system effectively learns discriminative features from speech spectrograms and outperforms many state-of-the-art techniques.
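
As a rough illustration of the front end described above, the sketch below computes linear- and Mel-scale spectrograms from 20 ms frames and pushes a spectrogram image through a pre-trained AlexNet. It assumes librosa and torchvision; the sampling rate, hop size, and Mel filter count are illustrative choices, not values reported by the authors.

```python
import librosa
import numpy as np
import torch
from torchvision import models

def speech_to_spectrograms(wav_path, sr=16000, frame_ms=20):
    """Convert a speech file into linear- and Mel-scale magnitude spectrograms."""
    y, sr = librosa.load(wav_path, sr=sr)
    n_fft = int(sr * frame_ms / 1000)          # 20 ms analysis frames
    hop = n_fft // 2
    stft = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    linear_db = librosa.amplitude_to_db(stft, ref=np.max)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, n_mels=64)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    return linear_db, mel_db

def spectrogram_features(spec_db):
    """Feed a spectrogram image through a pre-trained AlexNet, return conv features."""
    img = (spec_db - spec_db.min()) / (np.ptp(spec_db) + 1e-8)   # scale to [0, 1]
    img = np.stack([img] * 3, axis=0)                            # grey -> 3 channels
    x = torch.tensor(img, dtype=torch.float32).unsqueeze(0)
    x = torch.nn.functional.interpolate(x, size=(224, 224))      # AlexNet input size
    alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
    alexnet.eval()
    with torch.no_grad():
        feats = alexnet.features(x).flatten(1)                   # conv-stage features
    return feats
```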

Author(s):  
J. Men ◽  
L. Fang ◽  
Y. Liu ◽  
Y. Sun

Abstract. Learning efficient image representations is at the core of the classification of remote sensing imagery. Existing methods for image classification, based either on feature coding of features extracted from convolutional neural networks (CNNs) or on training new CNNs, can only generate image features with limited representative ability, which essentially prevents them from achieving better performance. In this paper, we investigate how to transfer features from successfully pre-trained CNNs for classification. We propose a scenario for generating image features by cascading features extracted from different CNNs. First, pre-trained CNNs such as CaffeNet, VGG-S and VGG-F are used as feature extractors, since their different structures help extract richer image information. Then the fully connected layers of the pre-trained CNNs are fine-tuned on the UC Merced land use dataset. Finally, the image features generated by cascading the outputs of the three networks are fed into a multi-class Optimal Margin Distribution Machine (mcODM) to obtain the final classification results. Extensive experiments on a public land use classification dataset demonstrate that the image features obtained by the proposed scenario achieve remarkable performance and improve the state of the art by a significant margin. The results reveal that features from pre-trained CNNs generalize well to the land use dataset and are more expressive than features from a single CNN.
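
A minimal sketch of the cascading idea follows. CaffeNet, VGG-S and VGG-F are not shipped with torchvision, so AlexNet, VGG-16 and ResNet-18 stand in here purely for illustration, and a linear SVM stands in for mcODM, which has no common library implementation.

```python
import torch
from torchvision import models
from sklearn.svm import LinearSVC

# Stand-in backbones for CaffeNet / VGG-S / VGG-F (illustrative only).
backbones = [
    models.alexnet(weights=models.AlexNet_Weights.DEFAULT),
    models.vgg16(weights=models.VGG16_Weights.DEFAULT),
    models.resnet18(weights=models.ResNet18_Weights.DEFAULT),
]

def cascade_features(batch):
    """Concatenate feature vectors from several pre-trained CNNs."""
    feats = []
    with torch.no_grad():
        for net in backbones:
            net.eval()
            # Strip the final classifier so the network emits features.
            trunk = torch.nn.Sequential(*list(net.children())[:-1])
            feats.append(trunk(batch).flatten(1))
    return torch.cat(feats, dim=1)

def train_classifier(train_x, train_y):
    """Linear SVM as a conventional stand-in for the mcODM classifier."""
    clf = LinearSVC(C=1.0)
    clf.fit(train_x.numpy(), train_y.numpy())
    return clf
```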


2020 ◽  
Vol 10 (8) ◽  
pp. 2929 ◽  
Author(s):  
Ibrahem Kandel ◽  
Mauro Castelli

Histopathology is the study of tissue structure under the microscope to determine whether cells are normal or abnormal. It is a very important examination used to determine a patient's treatment plan. The classification of histopathology images is difficult even for an experienced pathologist, and a second opinion is often needed. The convolutional neural network (CNN), a particular type of deep learning architecture, has obtained outstanding results in computer vision tasks such as image classification. In this paper, we propose a novel CNN architecture to classify histopathology images. The proposed model consists of 15 convolution layers and two fully connected layers. A comparison between different activation functions was performed to identify the most efficient one, taking into account two different optimizers. To train and evaluate the proposed model, the publicly available PatchCamelyon dataset was used, consisting of 220,000 annotated images for training and 57,000 unannotated images for testing. The proposed model achieved higher performance than state-of-the-art architectures, with an AUC of 95.46%.
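
The sketch below shows one plausible shape for such a network: 15 convolution layers followed by two fully connected layers, with the activation function left as a swappable parameter to mirror the comparison described above. Channel widths and the pooling schedule are assumptions, not the authors' exact configuration.

```python
import torch.nn as nn

class HistoCNN(nn.Module):
    """Sketch of a 15-convolution-layer classifier with two FC layers."""
    def __init__(self, num_classes=2, act=nn.ReLU):
        super().__init__()
        layers, in_ch = [], 3
        # Five stages of three conv layers each: 15 conv layers in total.
        for out_ch in [32, 64, 128, 256, 256]:
            for _ in range(3):
                layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                           nn.BatchNorm2d(out_ch), act()]
                in_ch = out_ch
            layers.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(          # two fully connected layers
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(256, 128), act(), nn.Linear(128, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))

# Swapping `act` (e.g. nn.ReLU vs. nn.ELU) mirrors the activation comparison.
model = HistoCNN(act=nn.ELU)
```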


2021 ◽  
Vol 15 ◽  
Author(s):  
Yuteng Xiao ◽  
Hongsheng Yin ◽  
Shui-Hua Wang ◽  
Yu-Dong Zhang

Early diagnosis of pathological brains leads to early intervention in brain diseases, which may help control the illness, prolong patients' lives, and even cure them. The classification of brain diseases is therefore a challenging but helpful task. However, brain images are hard to collect, and a superabundance of images is also a great challenge for computing resources. This study proposes a new approach named TReC (Transferred Residual Networks (ResNet) with a Convolutional Block Attention Module (CBAM)), a model designed for small-scale samples, to detect brain diseases from MRI. First, a ResNet model pre-trained on the ImageNet dataset serves as initialization. Subsequently, a simple attention mechanism, CBAM, is added to every ResNet residual block. At the same time, the fully connected (FC) layers of the ResNet are replaced with new FC layers that match the classification goal. Finally, all the parameters of the model (the ResNet, the CBAM, and the new FC layers) are retrained. The effectiveness of the proposed model is evaluated on brain magnetic resonance (MR) datasets for multi-class and two-class tasks. Compared with other state-of-the-art models, our model reaches the best performance for both two-class and multi-class brain-disease tasks.
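
A minimal sketch of this construction in PyTorch, assuming a ResNet-18 backbone (the abstract does not fix the ResNet depth): a standard CBAM module is appended to every residual block and the FC head is replaced before full retraining.

```python
import torch
import torch.nn as nn
from torchvision import models

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                 # channel attention
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(1, keepdim=True),            # spatial attention
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

def build_trec(num_classes):
    """Pre-trained ResNet-18, CBAM after every residual block, fresh FC head;
    all parameters remain trainable for the final retraining step."""
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for name in ["layer1", "layer2", "layer3", "layer4"]:
        layer = getattr(net, name)
        for i, block in enumerate(layer):
            layer[i] = nn.Sequential(block, CBAM(block.conv2.out_channels))
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net
```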


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3218
Author(s):  
Mohamed Touafria ◽  
Qiang Yang

This article addresses Automatic Target Recognition (ATR) on Synthetic Aperture Radar (SAR) images. By learning a hierarchy of features automatically from massive amounts of training data, networks such as convolutional neural networks (CNNs) have recently achieved state-of-the-art results on many tasks. To extract better features of SAR targets and obtain better accuracy, a new framework is proposed. First, three CNN models with different convolution and pooling kernel sizes are designed. Second, they are applied simultaneously to the SAR images to generate image features by extracting CNN features from different layers, in two scenarios. In the first scenario, the activation vectors obtained from the fully connected layers are taken as the final image features; in the second, dense features are extracted from the last convolutional layer and then encoded into global image features using a commonly used feature coding approach, Fisher Vectors (FVs). Finally, different combination and fusion strategies between the two sets of features are considered to construct the final representation of the SAR images for classification. Extensive experiments are conducted on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. The experimental results demonstrate the capability of the proposed method compared with several state-of-the-art methods.
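
The second scenario hinges on Fisher Vector encoding of dense conv-layer descriptors. The sketch below is a standard FV implementation over a diagonal-covariance GMM; the random descriptors stand in for real last-conv-layer features so the example runs end to end, and the dimensions are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode local descriptors (N x D) into a Fisher Vector via gradients
    w.r.t. the GMM means and variances (diagonal covariance)."""
    q = gmm.predict_proba(descriptors)               # N x K soft assignments
    n, _ = descriptors.shape
    mu, sigma, w = gmm.means_, np.sqrt(gmm.covariances_), gmm.weights_
    parts = []
    for j in range(gmm.n_components):
        diff = (descriptors - mu[j]) / sigma[j]      # normalised deviations
        g_mu = (q[:, j:j+1] * diff).sum(0) / (n * np.sqrt(w[j]))
        g_sig = (q[:, j:j+1] * (diff**2 - 1)).sum(0) / (n * np.sqrt(2 * w[j]))
        parts += [g_mu, g_sig]
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))           # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-8)          # L2 normalisation

# Stand-in for dense descriptors from the last convolutional layer.
local_feats = np.random.randn(500, 64)
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(local_feats)
image_feature = fisher_vector(local_feats, gmm)      # length 2 * 8 * 64
```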


2020 ◽  
Vol 34 (07) ◽  
pp. 12813-12820 ◽  
Author(s):  
Kaihua Zhang ◽  
Jin Chen ◽  
Bo Liu ◽  
Qingshan Liu

Object co-segmentation aims to segment the objects shared across multiple relevant images and has numerous applications in computer vision. This paper presents a spatially and semantically modulated deep network framework for object co-segmentation. A backbone network is adopted to extract multi-resolution image features. With the multi-resolution features of the relevant images as input, we design a spatial modulator to learn a mask for each image; it captures the correlations of image feature descriptors via unsupervised learning, and the learned mask can roughly localize the shared foreground object while suppressing the background. The semantic modulator is modelled as a supervised image classification task, for which we propose a hierarchical second-order pooling module to transform the image features. The outputs of the two modulators manipulate the multi-resolution features by a shift-and-scale operation so that the features focus on segmenting co-object regions. The proposed model is trained end-to-end without any intricate post-processing. Extensive experiments on four image co-segmentation benchmark datasets demonstrate the superior accuracy of the proposed method compared to state-of-the-art methods. The code is available at http://kaihuazhang.net/.
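
The shift-and-scale operation can be pictured as a FiLM-style modulation; the sketch below is one speculative parameterization under assumed shapes (a per-channel scale from the semantic branch, a per-pixel shift from the spatial mask), not the authors' exact design.

```python
import torch
import torch.nn as nn

class ShiftScaleModulator(nn.Module):
    """FiLM-style sketch: scale backbone features with a vector from the
    semantic branch and shift them with a map from the spatial mask."""
    def __init__(self, feat_ch, sem_dim):
        super().__init__()
        self.scale = nn.Linear(sem_dim, feat_ch)   # per-channel scale (semantic)
        self.shift = nn.Conv2d(1, feat_ch, 1)      # per-pixel shift (spatial mask)

    def forward(self, feats, sem_vec, mask):
        # feats: B x C x H x W, sem_vec: B x S, mask: B x 1 x H x W
        gamma = self.scale(sem_vec).unsqueeze(-1).unsqueeze(-1)
        beta = self.shift(mask)
        return feats * torch.sigmoid(gamma) + beta  # shift-and-scale
```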


2021 ◽  
pp. 1-16
Author(s):  
Ibtissem Gasmi ◽  
Mohamed Walid Azizi ◽  
Hassina Seridi-Bouchelaghem ◽  
Nabiha Azizi ◽  
Samir Brahim Belhaouari

A Context-Aware Recommender System (CARS) suggests more relevant services by adapting them to the user's specific context. Nevertheless, using many contextual factors can increase data sparsity, while too few context parameters fail to introduce contextual effects into the recommendations. Moreover, several CARSs are based on similarity measures, such as the cosine and Pearson correlation coefficients, which are not very effective on sparse datasets. This paper presents a context-aware model that integrates contextual factors into the prediction process when there are insufficient co-rated items. The proposed algorithm uses Latent Dirichlet Allocation (LDA) to learn the latent interests of users from the textual descriptions of items. It then integrates both the explicit contextual factors and their degree of importance into the prediction process through a weighting function, whose weights are learned and optimized with the Particle Swarm Optimization (PSO) algorithm. Results on the MovieLens 1M dataset show that the proposed model achieves an F-measure of 45.51% with a precision of 68.64%. Furthermore, the improvements in MAE and RMSE reach 41.63% and 39.69%, respectively, compared with state-of-the-art techniques.
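
As a rough illustration of the LDA step and the weighting function, the sketch below fits topics to toy item descriptions with scikit-learn and blends contextual factors into a base prediction. The PSO optimization of the weights is omitted; here the weights are assumed given, and all names and data are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy item descriptions stand in for the MovieLens textual metadata.
descriptions = ["space adventure with aliens",
                "romantic comedy in paris",
                "documentary about deep sea life"]

# LDA infers each item's latent topic mixture from its description.
counts = CountVectorizer().fit_transform(descriptions)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
item_topics = lda.fit_transform(counts)            # items x topics

def weighted_score(base_pred, context_factors, weights):
    """Blend a base rating prediction with explicit contextual factors;
    in the paper the weights come from PSO, here they are given."""
    return base_pred + np.dot(weights, context_factors)

score = weighted_score(3.4, np.array([1.0, 0.2]), np.array([0.3, 0.1]))
```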


2021 ◽  
Vol 11 (8) ◽  
pp. 3636
Author(s):  
Faria Zarin Subah ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Autism spectrum disorder (ASD) is a complex neuro-developmental disorder. Most existing methods use functional magnetic resonance imaging (fMRI) to detect ASD on very limited datasets, which yields high accuracy but poor generalization. To overcome this limitation and to enhance the performance of the automated autism diagnosis model, in this paper we propose an ASD detection model based on functional connectivity features of resting-state fMRI data. The model uses two commonly used brain atlases, Craddock 200 (CC200) and Automated Anatomical Labelling (AAL), and two rarely used atlases, Bootstrap Analysis of Stable Clusters (BASC) and Power. A deep neural network (DNN) classifier performs the classification. Simulation results indicate that the proposed model outperforms state-of-the-art methods in terms of accuracy: its mean accuracy was 88%, whereas the mean accuracy of state-of-the-art methods ranged from 67% to 85%. The sensitivity, F1-score, and area under the receiver operating characteristic curve (AUC) of the proposed model were 90%, 87%, and 96%, respectively. Comparative analysis across various scoring strategies shows the superiority of the BASC atlas over the other atlases in classifying ASD versus controls.
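
The core pipeline, functional connectivity as Pearson correlations between atlas regions fed to a DNN, can be sketched as follows. Simulated time series stand in for real atlas-extracted fMRI signals, and the layer sizes are illustrative rather than the paper's.

```python
import numpy as np
import torch
import torch.nn as nn

# Simulated ROI time series stand in for atlas-extracted resting-state
# signals (e.g. 200 regions for CC200); real data would come from nilearn.
n_rois, n_timepoints = 200, 150
ts = np.random.randn(n_timepoints, n_rois)

# Functional connectivity = Pearson correlation between every ROI pair;
# the upper triangle is vectorized into the classifier's input feature.
fc = np.corrcoef(ts.T)
iu = np.triu_indices(n_rois, k=1)
features = torch.tensor(fc[iu], dtype=torch.float32).unsqueeze(0)

# Plain DNN classifier over connectivity features (ASD vs. control).
dnn = nn.Sequential(
    nn.Linear(features.shape[1], 512), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(512, 64), nn.ReLU(),
    nn.Linear(64, 2))
logits = dnn(features)
```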


2020 ◽  
Vol 4 (1) ◽  
pp. 87-107
Author(s):  
Ranjan Mondal ◽  
Moni Shankar Dey ◽  
Bhabatosh Chanda

Abstract. Mathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing a mathematical-morphology algorithm is deciding the order of the operators/filters and the corresponding structuring elements (SEs). In this work, we develop a morphological network composed of alternating sequences of dilation and erosion layers, which, depending on the learned SEs, may form opening or closing layers. These layers in the right order, along with linear combinations of their outputs, are useful for extracting and processing image features. The structuring elements in the network are learned by back-propagation, guided by minimization of the loss function. The efficacy of the proposed network is established by applying it to two interesting image restoration problems, namely de-raining and de-hazing. The results are comparable to those of many state-of-the-art algorithms for most images. It is also worth noting that the number of network parameters is much smaller than that of popular convolutional neural networks for similar tasks. The source code is available at https://github.com/ranjanZ/Mophological-Opening-Closing-Net.
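
A minimal single-channel sketch of learnable grey-scale dilation and erosion layers follows (the repository above holds the authors' actual implementation). Grey-scale dilation takes the windowed maximum of input plus SE; erosion, the windowed minimum of input minus SE; both SEs are ordinary trainable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Dilation2d(nn.Module):
    """Grey-scale dilation with a learnable structuring element (SE):
    out(x) = max over window w of ( in(x + w) + SE(w) )."""
    def __init__(self, ksize=5):
        super().__init__()
        self.ksize = ksize
        self.se = nn.Parameter(torch.zeros(ksize * ksize))  # flat SE, learned

    def forward(self, x):                      # x: B x 1 x H x W
        patches = F.unfold(x, self.ksize, padding=self.ksize // 2)
        out, _ = (patches + self.se.view(1, -1, 1)).max(dim=1)
        b, _, h, w = x.shape
        return out.view(b, 1, h, w)

class Erosion2d(Dilation2d):
    """The dual operation: out(x) = min over window of ( in(x + w) - SE(w) )."""
    def forward(self, x):
        patches = F.unfold(x, self.ksize, padding=self.ksize // 2)
        out, _ = (patches - self.se.view(1, -1, 1)).min(dim=1)
        b, _, h, w = x.shape
        return out.view(b, 1, h, w)

# An "opening" layer: erosion followed by dilation, with both SEs learned
# by back-propagation exactly as ordinary weights would be.
opening = nn.Sequential(Erosion2d(), Dilation2d())
```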


Author(s):  
Masoumeh Zareapoor ◽  
Jie Yang

Image-to-image translation aims to learn a mapping from a source domain to a target domain. However, three main challenges are associated with this problem and need to be addressed: the lack of paired datasets, multimodality, and diversity. Convolutional neural networks (CNNs), despite their great performance on many computer vision tasks, fail to detect the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representative model we are looking for. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer that explicitly allows the spatial manipulation of data during training. This differentiable module can be added to the convolutional layers of the generative model, where it allows the generated distributions to be altered freely for image-to-image translation. To reap the benefits of the proposed module within the generative model, our architecture incorporates a new loss function to facilitate effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.
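
A differentiable module of this kind is commonly realized as a spatial transformer; the sketch below is a standard spatial-transformer layer that could be dropped between convolutional layers of a generator, not the authors' exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Differentiable spatial-manipulation layer: a small localization net
    predicts an affine warp that is applied to the incoming feature map."""
    def __init__(self, channels):
        super().__init__()
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(channels * 64, 32), nn.ReLU(),
            nn.Linear(32, 6))
        # Initialize to the identity transform so training starts stably.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0, 0, 0, 1, 0]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```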


Author(s):  
Xinmeng Li ◽  
Mamoun Alazab ◽  
Qian Li ◽  
Keping Yu ◽  
Quanjun Yin

Abstract. Knowledge graph question answering is an important technology for intelligent human–robot interaction; it aims to answer a natural language question automatically from a given knowledge graph. For multi-relation questions of greater variety and complexity, the tokens of the question have different priorities for triple selection during the reasoning steps. Most existing models take the question as a whole and ignore this priority information. To solve this problem, we propose a question-aware memory network for multi-hop question answering, named QA2MN, which updates the attention over the question at each step of the reasoning process. In addition, we incorporate graph context information into the knowledge graph embedding model to improve its ability to represent entities and relations; we use it to initialize QA2MN and fine-tune it during training. We evaluate QA2MN on PathQuestion and WorldCup2014, two representative datasets for complex multi-hop question answering. The results demonstrate that QA2MN achieves state-of-the-art Hits@1 accuracy on both datasets, which validates the effectiveness of our model.
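
The sketch below captures the central idea in a single simplified reasoning hop: question-token attention is refreshed from the current reasoning state before attending over the memory of candidate triples. The module layout and shapes are assumptions made for illustration, not QA2MN's published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionAwareHop(nn.Module):
    """One simplified reasoning hop: re-score question tokens against the
    current state, then read from the triple memory with the refreshed
    question vector and update the state."""
    def __init__(self, dim):
        super().__init__()
        self.q_attn = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, q_tokens, memory, state):
        # q_tokens: B x T x D, memory: B x M x D, state: B x D
        tok_scores = torch.einsum("btd,bd->bt", self.q_attn(q_tokens), state)
        q_vec = (q_tokens * F.softmax(tok_scores, -1).unsqueeze(-1)).sum(1)
        mem_scores = torch.einsum("bmd,bd->bm", memory, q_vec)
        read = (memory * F.softmax(mem_scores, -1).unsqueeze(-1)).sum(1)
        return self.update(read, state)    # new reasoning state
```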

