Detecting Bird and Frog Species Using Tropical Soundscape

Author(s):  
Anuranjan Pandey

Abstract: In the tropical jungle, hearing a species is considerably simpler than seeing it. The sounds of many birds and frogs may be heard in the woods even when the animals themselves cannot be seen. In these circumstances it is difficult for an expert to identify the many types of insects and harmful species that may be found in the wild. An audio-input model has been developed in this study. Intelligent signal processing is used to extract patterns and characteristics from the audio signal, and the output is used to identify the species. The sounds of birds and frogs in the tropical environment vary according to their species. In this research we developed a deep learning model that enhances the process of recognizing bird and frog species based on audio features, and it achieved a high level of accuracy. The ResNet model, which includes blocks of simple and convolutional neural networks, is effective in recognizing bird and frog species from the sound of the animal; accuracy above 90 percent is achieved on this classification task.

Keywords: Bird Frog Detection, Neural Network, ResNet, CNN.
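As a rough illustration of the pipeline this abstract describes (an audio front end feeding a residual CNN), the sketch below extracts a mel-spectrogram and classifies it with two residual blocks. The sample rate, species count, and layer sizes are illustrative assumptions rather than the authors' configuration.

```python
# Minimal sketch, not the authors' code: mel-spectrogram front end plus a
# small residual CNN. n_species, sample rate, and widths are assumptions.
import torch
import torch.nn as nn
import torchaudio

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection: the core ResNet idea

class SpeciesClassifier(nn.Module):
    def __init__(self, n_species=24):  # hypothetical species count
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(sample_rate=32000, n_mels=64)
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.blocks = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, n_species))

    def forward(self, waveform):             # waveform: (batch, samples)
        x = self.mel(waveform).unsqueeze(1)  # -> (batch, 1, mels, frames)
        return self.head(self.blocks(self.stem(x)))

logits = SpeciesClassifier()(torch.randn(2, 32000))  # two one-second clips
```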

2021, Vol. 71(2), pp. 200-208
Author(s):  
Narendra Kumar Mishra ◽  
Ashok Kumar ◽  
Kishor Choudhury

Ships are an integral part of maritime traffic, where they play both military and non-combatant roles. This vast maritime traffic needs to be managed and monitored by identifying and recognising vessels to ensure maritime safety and security. As an automated and efficient approach to this problem, a deep learning model exploiting the convolutional neural network (CNN) as its basic building block has been proposed in this paper. CNNs have been used predominantly in image recognition due to their automatic high-level feature extraction capabilities and exceptional performance. We have used a transfer learning approach with pre-trained CNNs based on the VGG16 architecture to develop an algorithm that classifies different ship types. This paper adopts data augmentation and fine-tuning to further improve and optimize the baseline VGG16 model. The proposed model attains an average classification accuracy of 97.08%, compared to the average classification accuracy of 88.54% obtained by the baseline model.
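The transfer-learning recipe described here (freeze the pretrained VGG16 convolutional base, fine-tune the upper layers, and retrain a new classification head with augmented data) can be sketched in torchvision as follows. The ten ship classes, the choice of layers to unfreeze, and the augmentation transforms are assumptions for illustration, not the paper's exact settings.

```python
# Hedged sketch of VGG16 transfer learning with fine-tuning and augmentation.
import torch.nn as nn
from torchvision import models, transforms

# data augmentation of the kind the paper mentions (exact transforms assumed)
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in vgg.features.parameters():
    p.requires_grad = False                    # freeze the convolutional base
for p in vgg.features[24:].parameters():       # fine-tune the last conv block
    p.requires_grad = True
vgg.classifier[6] = nn.Linear(4096, 10)        # new head; 10 ship classes assumed
```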


2020, Vol. 2020, pp. 1-15
Author(s):  
Gwenaelle Cunha Sergio ◽  
Minho Lee

Generating music with emotion similar to that of an input video is a very relevant problem nowadays. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually- and/or hearing-impaired people. Current approaches overlook the video's emotional characteristics in the music generation step, consider only static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict a video's emotion from its visual features and a deep Long Short-Term Memory (LSTM) Recurrent Neural Network to generate the corresponding audio signals with a similar emotional inkling. The former is able to model emotions appropriately due to its fuzzy properties, and the latter models data with dynamic time properties well due to the availability of the previous hidden state information. The novelty of our proposed method lies in the extraction of visual emotional features in order to transform them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms. This indicates that our model is able to perform domain transformation between visual and audio features appropriately. Based on the experimental results, our model can effectively generate audio that matches the scene, eliciting a similar emotion from the viewer in both datasets, and music generated by our model is also chosen more often (code available online at https://github.com/gcunhase/Emotional-Video-to-Audio-with-ANFIS-DeepRNN).
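The generation stage can be pictured with the toy fragment below: an LSTM that, conditioned on a scalar emotion score, maps a sequence of audio feature frames to predicted next frames. The dimensions and conditioning scheme are illustrative assumptions only; the authors' actual implementation, including the ANFIS emotion predictor, is in the linked repository.

```python
# Simplified sketch of an emotion-conditioned sequence generator (the ANFIS
# stage is omitted). Feature width and hidden size are assumptions.
import torch
import torch.nn as nn

class EmotionConditionedLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + 1, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)

    def forward(self, frames, emotion):          # frames: (B, T, feat); emotion: (B, 1)
        cond = emotion.unsqueeze(1).expand(-1, frames.size(1), -1)
        out, _ = self.lstm(torch.cat([frames, cond], dim=-1))
        return self.proj(out)                    # predicted next feature frames

model = EmotionConditionedLSTM()
pred = model(torch.randn(4, 50, 128), torch.rand(4, 1))  # 4 clips, 50 frames
```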


Author(s):  
Teddy Surya Gunawan ◽  
Mira Kartiwi

Of the many audio features available, this paper focuses on a comparison of the two most popular, line spectral frequencies (LSF) and Mel-frequency cepstral coefficients (MFCC). We trained a feedforward neural network with varying numbers of hidden layers and hidden nodes to identify five languages: Arabic, Chinese, English, Korean, and Malay. LSF, MFCC, and the combination of both were extracted as feature vectors. Systematic experiments were conducted to find the optimum parameters, i.e. sampling frequency, frame size, model order, and neural network structure. The recognition rate per frame was converted to a recognition rate per audio file using majority voting. On average, the recognition rates for LSF, MFCC, and the combination of both features are 96%, 92%, and 96%, respectively. Therefore, LSF is the most suitable feature to use for language identification with a feedforward neural network classifier.
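A hedged sketch of the feature extraction compared here is given below: LSF computed from LPC coefficients via the standard symmetric/antisymmetric polynomial split, and MFCC via librosa. The example audio, frame size, and model order are assumptions; frame-level predictions from a feedforward classifier would then be combined by per-file majority voting, as the abstract describes.

```python
# Sketch of per-frame LSF and MFCC extraction (parameters are assumptions).
import numpy as np
import librosa

def lsf(frame, order=10):
    """Line spectral frequencies via the standard P/Q polynomial split."""
    a = librosa.lpc(frame, order=order)               # LPC coefficients, a[0] == 1
    p = np.append(a, 0.0) + np.append(0.0, a[::-1])   # P(z): symmetric part
    q = np.append(a, 0.0) - np.append(0.0, a[::-1])   # Q(z): antisymmetric part
    ang = np.angle(np.concatenate([np.roots(p), np.roots(q)]))
    return np.sort(ang[(ang > 0) & (ang < np.pi)])    # frequencies in (0, pi)

y, sr = librosa.load(librosa.example("trumpet"), sr=8000)    # stand-in audio
frames = librosa.util.frame(y, frame_length=240, hop_length=120).T
lsf_feats = np.stack([lsf(f) for f in frames])               # (n_frames, order)
mfcc_feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (n_frames', 13)
# A feedforward classifier (e.g. sklearn's MLPClassifier) is trained on these
# frame vectors; a file's language is the majority vote over its frames.
```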


2021, Vol. 2021, pp. 1-16
Author(s):  
Mohammad Shorfuzzaman ◽  
Mehedi Masud ◽  
Hesham Alhumyani ◽  
Divya Anand ◽  
Aman Singh

The world is experiencing an unprecedented crisis due to the coronavirus disease (COVID-19) outbreak, which has affected nearly 216 countries and territories across the globe. Since the pandemic outbreak, there has been growing interest in computational model-based diagnostic technologies to support the screening and diagnosis of COVID-19 cases using medical imaging such as chest X-ray (CXR) scans. Initial studies found that patients infected with COVID-19 show abnormalities in their CXR images that represent specific radiological patterns, yet detecting these patterns is challenging and time-consuming even for skilled radiologists. In this study, we propose a novel convolutional neural network- (CNN-) based deep learning fusion framework using the transfer learning concept, where parameters (weights) from different models are combined into a single model to extract features from images, which are then fed to a custom classifier for prediction. We use gradient-weighted class activation mapping (Grad-CAM) to visualize the infected areas of CXR images. Furthermore, we provide feature representations through visualization to gain a deeper understanding of the class separability of the studied models with respect to COVID-19 detection. Cross-validation studies are used to assess the performance of the proposed models on open-access datasets containing CXR images of healthy subjects and of patients infected with COVID-19 or other types of pneumonia. Evaluation results show that the best performing fusion model attains a classification accuracy of 95.49% with high sensitivity and specificity.
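One plausible reading of the described fusion, features from differently pretrained CNNs combined before a custom classifier, is sketched below with two torchvision backbones. The choice of backbones and the three-class head are assumptions, and the Grad-CAM visualization step is omitted for brevity.

```python
# Hedged sketch: concatenate pooled features from two pretrained CNNs and
# feed them to a custom classifier head. Backbone choice is an assumption.
import torch
import torch.nn as nn
from torchvision import models

class FusionNet(nn.Module):
    def __init__(self, n_classes=3):   # e.g. healthy / COVID-19 / other pneumonia
        super().__init__()
        res = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        dense = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        self.res = nn.Sequential(*list(res.children())[:-1])     # -> (B, 2048, 1, 1)
        self.dense = nn.Sequential(dense.features, nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1))      # -> (B, 1024, 1, 1)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(2048 + 1024, 256), nn.ReLU(),
                                  nn.Dropout(0.5), nn.Linear(256, n_classes))

    def forward(self, x):                                        # x: (B, 3, 224, 224)
        feats = torch.cat([self.res(x), self.dense(x)], dim=1)   # fused features
        return self.head(feats)

logits = FusionNet()(torch.randn(2, 3, 224, 224))
```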


2020, Vol. 13(4), pp. 627-640
Author(s):  
Avinash Chandra Pandey ◽  
Dharmveer Singh Rajpoot

Background: Sentiment analysis is contextual mining of text that determines the viewpoint of users with respect to sentimental topics commonly discussed on social networking websites. Twitter is one such site, where people express their opinion about any topic in the form of tweets. These tweets can be examined using various sentiment classification methods to find the opinion of users. Traditional sentiment analysis methods use manually extracted features for opinion classification. Manual feature extraction is a complicated task since it requires predefined sentiment lexicons. Deep learning methods, on the other hand, automatically extract relevant features from data; hence, they provide better performance and richer representation capability than traditional methods. Objective: The main aim of this paper is to enhance sentiment classification accuracy and to reduce computational cost. Method: To achieve this objective, a hybrid deep learning model based on a convolutional neural network and a bidirectional long short-term memory neural network has been introduced. Results: The proposed sentiment classification method achieves the highest accuracy on most of the datasets. Further, the efficacy of the proposed method has been validated through statistical analysis. Conclusion: Sentiment classification accuracy can be improved by creating well-designed hybrid models. Moreover, performance can also be enhanced by tuning the hyperparameters of deep learning models.
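The hybrid architecture named in the Method section, a convolutional layer feeding a bidirectional LSTM, might look like the following sketch; the vocabulary size, embedding width, and filter counts are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of a CNN + bidirectional LSTM text classifier.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, vocab=20000, emb=128, filters=64, hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, filters, kernel_size=5, padding=2)
        self.pool = nn.MaxPool1d(2)
        self.lstm = nn.LSTM(filters, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens):                 # tokens: (B, T) integer ids
        x = self.emb(tokens).transpose(1, 2)   # -> (B, emb, T) for Conv1d
        x = self.pool(torch.relu(self.conv(x))).transpose(1, 2)
        out, _ = self.lstm(x)                  # local n-gram features in sequence
        return self.fc(out[:, -1])             # classify from the final timestep

logits = CNNBiLSTM()(torch.randint(0, 20000, (8, 64)))  # 8 tweets, 64 tokens
```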


Genes, 2021, Vol. 12(5), 618
Author(s):  
Yue Jin ◽  
Shihao Li ◽  
Yang Yu ◽  
Chengsong Zhang ◽  
Xiaojun Zhang ◽  
...  

A mutant of the ridgetail white prawn, which exhibits a rare orange-red body color with a higher concentration of free astaxanthin (ASTX) than the wild-type prawn, was obtained in our lab. In order to understand the mechanism underlying this high level of free astaxanthin, transcriptome analysis was performed to identify the differentially expressed genes (DEGs) between the mutant and wild-type prawns. A total of 78,224 unigenes were obtained, and 1863 were identified as DEGs, of which 902 unigenes showed higher expression levels and 961 showed lower expression levels in the mutant compared with the wild-type prawns. Based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, as well as further investigation of the annotated DEGs, we found that biological processes related to astaxanthin binding, transport, and metabolism differed significantly between the mutant and the wild-type prawns. Some genes related to these processes, including crustacyanin, apolipoprotein D (ApoD), cathepsin, and cuticle proteins, were identified as DEGs between the two types of prawns. These data may provide important information for understanding the molecular mechanism behind the high level of free astaxanthin in this prawn.
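For readers unfamiliar with DEG screening, a toy example of the kind of filtering involved is shown below; the fold-change and adjusted p-value cutoffs and the column names are assumptions, since the abstract does not state the authors' pipeline or thresholds.

```python
# Toy DEG filter on a per-unigene expression table (thresholds assumed).
import pandas as pd

expr = pd.DataFrame({
    "unigene": ["u1", "u2", "u3"],
    "log2fc_mutant_vs_wt": [2.3, -1.8, 0.2],   # mutant relative to wild type
    "padj": [0.001, 0.02, 0.6],                # adjusted p-value
})
deg = expr[(expr["log2fc_mutant_vs_wt"].abs() >= 1) & (expr["padj"] < 0.05)]
up = deg[deg["log2fc_mutant_vs_wt"] > 0]       # higher expression in the mutant
down = deg[deg["log2fc_mutant_vs_wt"] < 0]     # lower expression in the mutant
```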


2020, Vol. 98 (Supplement 4), p. 27
Author(s):  
Ricardo V Ventura ◽  
Rafael Z Lopes ◽  
Lucas T Andrietta ◽  
Fernando Bussiman ◽  
Julio Balieiro ◽  
...  

Abstract The Brazilian gaited horse industry is growing steadily, even after a recession period that affected different economic sectors across the whole country. Recent numbers suggest an increase in exports, which reveals the relevance of this segment of the horse market. Horses are classified according to gait criteria, which divide them into two groups according to the animal's movements: lateral (Marcha Picada) or diagonal (Marcha Batida). These two gait groups usually show remarkable differences in speed and number of steps per fixed unit of time, among other factors. Audio retrieval refers to the process of extracting information from audio signals. Compared with traditional methods of evaluating and classifying gait types (for example, subjective human evaluation and video monitoring), this new data analysis area provides a potential method for collecting phenotypes at reduced cost. Audio files (n = 80) were obtained by extracting audio features from freely available YouTube videos. Videos were manually labeled according to the two gait groups (Marcha Picada or Marcha Batida), and thirty animals were used after a quality control filter step. This study aimed to investigate different metrics associated with audio signal processing, first to cluster animals according to gait type and subsequently to include additional traits that could be useful for improving accuracy in the identification of genetically superior animals. Twenty-eight metrics, based on frequency or physical aspects of the audio, were evaluated individually or in groups of relative importance to perform principal component analysis (PCA), as well as to describe the two gait types. The PCA results indicated that over 87% of the animals were correctly clustered. Challenges regarding environmental interference and noise must be further investigated. These first findings suggest that audio information retrieval could potentially be implemented in animal breeding programs aiming to improve horse gaits.
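The audio-retrieval idea, summary metrics per clip followed by PCA, can be sketched as below. These four librosa features merely stand in for the twenty-eight metrics actually used in the study, and the file paths are hypothetical.

```python
# Sketch: per-clip summary metrics, then PCA to inspect gait clustering.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def clip_features(path):
    """A handful of summary metrics per clip (stand-ins for the study's 28)."""
    y, sr = librosa.load(path, sr=22050)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)            # step-rate proxy
    return np.array([
        float(tempo),
        librosa.feature.zero_crossing_rate(y).mean(),
        librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
        librosa.feature.rms(y=y).mean(),                      # loudness proxy
    ])

# X = np.stack([clip_features(p) for p in audio_paths])      # hypothetical paths
# pcs = PCA(n_components=2).fit_transform(X)                 # inspect gait clusters
```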

