Detecting Bird and Frog Species Using Tropical Soundscape

Author(s):  
Anuranjan Pandey

Abstract: In the tropical jungle, hearing a species is considerably simpler than seeing it. The sounds of many birds and frogs may be heard in the woods even when the animals themselves cannot be seen. In these circumstances it is difficult for an expert to identify the many types of insects and harmful species that may be found in the wild. An audio-input model has been developed in this study. Intelligent signal processing is used to extract patterns and characteristics from the audio signal, and the output is used to identify the species. The sounds of birds and frogs in the tropical environment vary according to their species. In this research we developed a deep learning model that enhances the process of recognizing bird and frog species based on audio features, and it achieved a high level of accuracy. The ResNet model, which includes blocks of simple and convolutional neural networks, is effective in recognizing bird and frog species from the sound of the animal; accuracy above 90 percent is achieved on this classification task.

Keywords: Bird Frog Detection, Neural Network, ResNet, CNN.
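As a rough illustration of the pipeline this abstract describes (an audio front end feeding a residual CNN), the sketch below extracts a mel-spectrogram and classifies it with two residual blocks. The sample rate, species count, and layer sizes are illustrative assumptions rather than the authors' configuration.

```python
# Minimal sketch, not the authors' code: mel-spectrogram front end plus a
# small residual CNN. n_species, sample rate, and widths are assumptions.
import torch
import torch.nn as nn
import torchaudio

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection: the core ResNet idea

class SpeciesClassifier(nn.Module):
    def __init__(self, n_species=24):  # hypothetical species count
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(sample_rate=32000, n_mels=64)
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.blocks = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, n_species))

    def forward(self, waveform):             # waveform: (batch, samples)
        x = self.mel(waveform).unsqueeze(1)  # -> (batch, 1, mels, frames)
        return self.head(self.blocks(self.stem(x)))

logits = SpeciesClassifier()(torch.randn(2, 32000))  # two one-second clips
```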

2021, Vol. 71(2), pp. 200-208
Author(s):  
Narendra Kumar Mishra ◽  
Ashok Kumar ◽  
Kishor Choudhury

Ships are an integral part of maritime traffic, where they play both military and non-combatant roles. This vast maritime traffic needs to be managed and monitored by identifying and recognising vessels to ensure maritime safety and security. As an automated and efficient approach to this problem, a deep learning model exploiting the convolutional neural network (CNN) as its basic building block has been proposed in this paper. CNNs have been used predominantly in image recognition due to their automatic high-level feature extraction capabilities and exceptional performance. We have used a transfer learning approach with pre-trained CNNs based on the VGG16 architecture to develop an algorithm that classifies different ship types. This paper adopts data augmentation and fine-tuning to further improve and optimize the baseline VGG16 model. The proposed model attains an average classification accuracy of 97.08%, compared to the average classification accuracy of 88.54% obtained by the baseline model.
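The transfer-learning recipe described here (freeze the pretrained VGG16 convolutional base, fine-tune the upper layers, and retrain a new classification head with augmented data) can be sketched in torchvision as follows. The ten ship classes, the choice of layers to unfreeze, and the augmentation transforms are assumptions for illustration, not the paper's exact settings.

```python
# Hedged sketch of VGG16 transfer learning with fine-tuning and augmentation.
import torch.nn as nn
from torchvision import models, transforms

# data augmentation of the kind the paper mentions (exact transforms assumed)
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in vgg.features.parameters():
    p.requires_grad = False                    # freeze the convolutional base
for p in vgg.features[24:].parameters():       # fine-tune the last conv block
    p.requires_grad = True
vgg.classifier[6] = nn.Linear(4096, 10)        # new head; 10 ship classes assumed
```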


2020, Vol. 2020, pp. 1-15
Author(s):  
Gwenaelle Cunha Sergio ◽  
Minho Lee

Generating music with emotion similar to that of an input video is a very relevant problem nowadays. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually- and/or hearing-impaired people. Current approaches overlook the video's emotional characteristics in the music generation step, consider only static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict a video's emotion from its visual features and a deep Long Short-Term Memory (LSTM) Recurrent Neural Network to generate the corresponding audio signals with a similar emotional inkling. The former is able to model emotions appropriately due to its fuzzy properties, and the latter models data with dynamic time properties well due to the availability of the previous hidden state information. The novelty of our proposed method lies in the extraction of visual emotional features in order to transform them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms. This indicates that our model is able to perform domain transformation between visual and audio features appropriately. Based on the experimental results, our model can effectively generate audio that matches the scene, eliciting a similar emotion from the viewer in both datasets, and music generated by our model is also chosen more often (code available online at https://github.com/gcunhase/Emotional-Video-to-Audio-with-ANFIS-DeepRNN).
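The generation stage can be pictured with the toy fragment below: an LSTM that, conditioned on a scalar emotion score, maps a sequence of audio feature frames to predicted next frames. The dimensions and conditioning scheme are illustrative assumptions only; the authors' actual implementation, including the ANFIS emotion predictor, is in the linked repository.

```python
# Simplified sketch of an emotion-conditioned sequence generator (the ANFIS
# stage is omitted). Feature width and hidden size are assumptions.
import torch
import torch.nn as nn

class EmotionConditionedLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + 1, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)

    def forward(self, frames, emotion):          # frames: (B, T, feat); emotion: (B, 1)
        cond = emotion.unsqueeze(1).expand(-1, frames.size(1), -1)
        out, _ = self.lstm(torch.cat([frames, cond], dim=-1))
        return self.proj(out)                    # predicted next feature frames

model = EmotionConditionedLSTM()
pred = model(torch.randn(4, 50, 128), torch.rand(4, 1))  # 4 clips, 50 frames
```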


Author(s):  
Teddy Surya Gunawan ◽  
Mira Kartiwi

Of the many audio features available, this paper focuses on a comparison of the two most popular, line spectral frequencies (LSF) and Mel-frequency cepstral coefficients (MFCC). We trained a feedforward neural network with varying numbers of hidden layers and hidden nodes to identify five languages: Arabic, Chinese, English, Korean, and Malay. LSF, MFCC, and the combination of both were extracted as feature vectors. Systematic experiments were conducted to find the optimum parameters, i.e. sampling frequency, frame size, model order, and neural network structure. The recognition rate per frame was converted to a recognition rate per audio file using majority voting. On average, the recognition rates for LSF, MFCC, and the combination of both features are 96%, 92%, and 96%, respectively. Therefore, LSF is the most suitable feature to use for language identification with a feedforward neural network classifier.
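A hedged sketch of the feature extraction compared here is given below: LSF computed from LPC coefficients via the standard symmetric/antisymmetric polynomial split, and MFCC via librosa. The example audio, frame size, and model order are assumptions; frame-level predictions from a feedforward classifier would then be combined by per-file majority voting, as the abstract describes.

```python
# Sketch of per-frame LSF and MFCC extraction (parameters are assumptions).
import numpy as np
import librosa

def lsf(frame, order=10):
    """Line spectral frequencies via the standard P/Q polynomial split."""
    a = librosa.lpc(frame, order=order)               # LPC coefficients, a[0] == 1
    p = np.append(a, 0.0) + np.append(0.0, a[::-1])   # P(z): symmetric part
    q = np.append(a, 0.0) - np.append(0.0, a[::-1])   # Q(z): antisymmetric part
    ang = np.angle(np.concatenate([np.roots(p), np.roots(q)]))
    return np.sort(ang[(ang > 0) & (ang < np.pi)])    # frequencies in (0, pi)

y, sr = librosa.load(librosa.example("trumpet"), sr=8000)    # stand-in audio
frames = librosa.util.frame(y, frame_length=240, hop_length=120).T
lsf_feats = np.stack([lsf(f) for f in frames])               # (n_frames, order)
mfcc_feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (n_frames', 13)
# A feedforward classifier (e.g. sklearn's MLPClassifier) is trained on these
# frame vectors; a file's language is the majority vote over its frames.
```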


2021, Vol. 2021, pp. 1-16
Author(s):  
Mohammad Shorfuzzaman ◽  
Mehedi Masud ◽  
Hesham Alhumyani ◽  
Divya Anand ◽  
Aman Singh

The world is experiencing an unprecedented crisis due to the coronavirus disease (COVID-19) outbreak, which has affected nearly 216 countries and territories across the globe. Since the pandemic outbreak, there has been growing interest in computational model-based diagnostic technologies to support the screening and diagnosis of COVID-19 cases using medical imaging such as chest X-ray (CXR) scans. Initial studies found that patients infected with COVID-19 show abnormalities in their CXR images that represent specific radiological patterns, yet detecting these patterns is challenging and time-consuming even for skilled radiologists. In this study, we propose a novel convolutional neural network- (CNN-) based deep learning fusion framework using the transfer learning concept, where parameters (weights) from different models are combined into a single model to extract features from images, which are then fed to a custom classifier for prediction. We use gradient-weighted class activation mapping (Grad-CAM) to visualize the infected areas of CXR images. Furthermore, we provide feature representations through visualization to gain a deeper understanding of the class separability of the studied models with respect to COVID-19 detection. Cross-validation studies are used to assess the performance of the proposed models on open-access datasets containing CXR images of healthy subjects and of patients infected with COVID-19 or other types of pneumonia. Evaluation results show that the best performing fusion model attains a classification accuracy of 95.49% with high sensitivity and specificity.
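One plausible reading of the described fusion, features from differently pretrained CNNs combined before a custom classifier, is sketched below with two torchvision backbones. The choice of backbones and the three-class head are assumptions, and the Grad-CAM visualization step is omitted for brevity.

```python
# Hedged sketch: concatenate pooled features from two pretrained CNNs and
# feed them to a custom classifier head. Backbone choice is an assumption.
import torch
import torch.nn as nn
from torchvision import models

class FusionNet(nn.Module):
    def __init__(self, n_classes=3):   # e.g. healthy / COVID-19 / other pneumonia
        super().__init__()
        res = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        dense = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        self.res = nn.Sequential(*list(res.children())[:-1])     # -> (B, 2048, 1, 1)
        self.dense = nn.Sequential(dense.features, nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1))      # -> (B, 1024, 1, 1)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(2048 + 1024, 256), nn.ReLU(),
                                  nn.Dropout(0.5), nn.Linear(256, n_classes))

    def forward(self, x):                                        # x: (B, 3, 224, 224)
        feats = torch.cat([self.res(x), self.dense(x)], dim=1)   # fused features
        return self.head(feats)

logits = FusionNet()(torch.randn(2, 3, 224, 224))
```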


2020, Vol. 13(4), pp. 627-640
Author(s):  
Avinash Chandra Pandey ◽  
Dharmveer Singh Rajpoot

Background: Sentiment analysis is contextual mining of text that determines the viewpoint of users with respect to sentimental topics commonly discussed on social networking websites. Twitter is one such site, where people express their opinion about any topic in the form of tweets. These tweets can be examined using various sentiment classification methods to find the opinion of users. Traditional sentiment analysis methods use manually extracted features for opinion classification. Manual feature extraction is a complicated task since it requires predefined sentiment lexicons. Deep learning methods, on the other hand, automatically extract relevant features from data; hence, they provide better performance and richer representation capability than traditional methods. Objective: The main aim of this paper is to enhance sentiment classification accuracy and to reduce computational cost. Method: To achieve this objective, a hybrid deep learning model based on a convolutional neural network and a bidirectional long short-term memory neural network has been introduced. Results: The proposed sentiment classification method achieves the highest accuracy on most of the datasets. Further, the efficacy of the proposed method has been validated through statistical analysis. Conclusion: Sentiment classification accuracy can be improved by creating well-designed hybrid models. Moreover, performance can also be enhanced by tuning the hyperparameters of deep learning models.
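The hybrid architecture named in the Method section, a convolutional layer feeding a bidirectional LSTM, might look like the following sketch; the vocabulary size, embedding width, and filter counts are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of a CNN + bidirectional LSTM text classifier.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, vocab=20000, emb=128, filters=64, hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, filters, kernel_size=5, padding=2)
        self.pool = nn.MaxPool1d(2)
        self.lstm = nn.LSTM(filters, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens):                 # tokens: (B, T) integer ids
        x = self.emb(tokens).transpose(1, 2)   # -> (B, emb, T) for Conv1d
        x = self.pool(torch.relu(self.conv(x))).transpose(1, 2)
        out, _ = self.lstm(x)                  # local n-gram features in sequence
        return self.fc(out[:, -1])             # classify from the final timestep

logits = CNNBiLSTM()(torch.randint(0, 20000, (8, 64)))  # 8 tweets, 64 tokens
```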


Genes, 2021, Vol. 12(5), 618
Author(s):  
Yue Jin ◽  
Shihao Li ◽  
Yang Yu ◽  
Chengsong Zhang ◽  
Xiaojun Zhang ◽  
...  

A mutant of the ridgetail white prawn, which exhibits a rare orange-red body color with a higher concentration of free astaxanthin (ASTX) than the wild-type prawn, was obtained in our lab. In order to understand the mechanism underlying this high level of free astaxanthin, transcriptome analysis was performed to identify the differentially expressed genes (DEGs) between the mutant and wild-type prawns. A total of 78,224 unigenes were obtained, and 1863 were identified as DEGs, of which 902 unigenes showed higher expression levels and 961 showed lower expression levels in the mutant compared with the wild-type prawns. Based on Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, as well as further investigation of the annotated DEGs, we found that biological processes related to astaxanthin binding, transport, and metabolism differed significantly between the mutant and the wild-type prawns. Some genes related to these processes, including crustacyanin, apolipoprotein D (ApoD), cathepsin, and cuticle proteins, were identified as DEGs between the two types of prawns. These data may provide important information for understanding the molecular mechanism behind the high level of free astaxanthin in this prawn.
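For readers unfamiliar with DEG screening, a toy example of the kind of filtering involved is shown below; the fold-change and adjusted p-value cutoffs and the column names are assumptions, since the abstract does not state the authors' pipeline or thresholds.

```python
# Toy DEG filter on a per-unigene expression table (thresholds assumed).
import pandas as pd

expr = pd.DataFrame({
    "unigene": ["u1", "u2", "u3"],
    "log2fc_mutant_vs_wt": [2.3, -1.8, 0.2],   # mutant relative to wild type
    "padj": [0.001, 0.02, 0.6],                # adjusted p-value
})
deg = expr[(expr["log2fc_mutant_vs_wt"].abs() >= 1) & (expr["padj"] < 0.05)]
up = deg[deg["log2fc_mutant_vs_wt"] > 0]       # higher expression in the mutant
down = deg[deg["log2fc_mutant_vs_wt"] < 0]     # lower expression in the mutant
```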


2020, Vol. 98 (Supplement 4), p. 27
Author(s):  
Ricardo V Ventura ◽  
Rafael Z Lopes ◽  
Lucas T Andrietta ◽  
Fernando Bussiman ◽  
Julio Balieiro ◽  
...  

Abstract The Brazilian gaited horse industry is growing steadily, even after a recession period that affected different economic sectors across the whole country. Recent numbers suggest an increase in exports, which reveals the relevance of this segment of the horse market. Horses are classified according to gait criteria, which divide them into two groups according to the animal's movements: lateral (Marcha Picada) or diagonal (Marcha Batida). These two gait groups usually show remarkable differences in speed and number of steps per fixed unit of time, among other factors. Audio retrieval refers to the process of extracting information from audio signals. Compared with traditional methods of evaluating and classifying gait types (for example, subjective human evaluation and video monitoring), this new data analysis area provides a potential method for collecting phenotypes at reduced cost. Audio files (n = 80) were obtained by extracting audio features from freely available YouTube videos. Videos were manually labeled according to the two gait groups (Marcha Picada or Marcha Batida), and thirty animals were used after a quality control filter step. This study aimed to investigate different metrics associated with audio signal processing, first to cluster animals according to gait type and subsequently to include additional traits that could be useful for improving accuracy in the identification of genetically superior animals. Twenty-eight metrics, based on frequency or physical aspects of the audio, were evaluated individually or in groups of relative importance to perform principal component analysis (PCA), as well as to describe the two gait types. The PCA results indicated that over 87% of the animals were correctly clustered. Challenges regarding environmental interference and noise must be further investigated. These first findings suggest that audio information retrieval could potentially be implemented in animal breeding programs aiming to improve horse gaits.
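The audio-retrieval idea, summary metrics per clip followed by PCA, can be sketched as below. These four librosa features merely stand in for the twenty-eight metrics actually used in the study, and the file paths are hypothetical.

```python
# Sketch: per-clip summary metrics, then PCA to inspect gait clustering.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def clip_features(path):
    """A handful of summary metrics per clip (stand-ins for the study's 28)."""
    y, sr = librosa.load(path, sr=22050)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)            # step-rate proxy
    return np.array([
        float(tempo),
        librosa.feature.zero_crossing_rate(y).mean(),
        librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
        librosa.feature.rms(y=y).mean(),                      # loudness proxy
    ])

# X = np.stack([clip_features(p) for p in audio_paths])      # hypothetical paths
# pcs = PCA(n_components=2).fit_transform(X)                 # inspect gait clusters
```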

