Comparing recurrent convolutional neural networks for large scale bird species classification

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gaurav Gupta ◽  
Meghana Kshirsagar ◽  
Ming Zhong ◽  
Shahrzad Gholami ◽  
Juan Lavista Ferres

Abstract We present a deep learning approach towards the large-scale prediction and analysis of bird acoustics from 100 different bird species. We use spectrograms constructed from bird audio recordings in the Cornell Bird Challenge (CBC) 2020 dataset, which includes recordings of multiple, potentially overlapping bird vocalizations with background noise. Our experiments show that a hybrid modeling approach, which uses a Convolutional Neural Network (CNN) to learn the representation of a slice of the spectrogram and a Recurrent Neural Network (RNN) to combine the slice representations across time-points, leads to the most accurate model on this dataset. We show results on a spectrum of models ranging from stand-alone CNNs to hybrid models of various types obtained by combining CNNs with other CNNs or with RNNs of the following types: Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and Legendre Memory Units (LMU). The best-performing model achieves an average accuracy of 67% over the 100 bird species, with the highest accuracy of 90% for the Red crossbill. We further analyze the learned representations visually and find them to be intuitive: related bird species are clustered close together. We also present a novel way to empirically interpret the representations learned by the LMU-based hybrid model, showing how memory channel patterns change over time with the changes seen in the spectrograms.
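
As a rough illustration of this hybrid design (not the authors' code), the sketch below encodes each spectrogram slice with a small CNN and combines the slice embeddings across time-points with a GRU; all shapes, layer sizes, and the smoke-test input are our own assumptions.

```python
# Hedged sketch of a CNN-slice-encoder + RNN-over-time hybrid (PyTorch).
import torch
import torch.nn as nn

class CnnRnnHybrid(nn.Module):
    def __init__(self, n_species=100, emb_dim=128, hidden=256):
        super().__init__()
        # Per-slice CNN encoder; input slices are (1, freq, time) patches.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, emb_dim),
        )
        # RNN combines slice embeddings across time-points.
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_species)

    def forward(self, x):            # x: (batch, slices, 1, freq, time)
        b, s = x.shape[:2]
        z = self.cnn(x.flatten(0, 1)).view(b, s, -1)
        _, h = self.rnn(z)           # final hidden state summarizes the clip
        return self.head(h[-1])      # species logits

logits = CnnRnnHybrid()(torch.randn(2, 8, 1, 128, 64))  # smoke test
```

Swapping nn.GRU for nn.LSTM gives the LSTM variant; an LMU variant would need a separate memory-cell implementation.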


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu ◽  
Jalluri Gnana SivaSai ◽  
Muhammad Fazal Ijaz ◽  
Akash Kumar Bhoi ◽  
Wonjoon Kim ◽  
...  

Deep learning models are efficient in learning the features that assist in understanding complex patterns precisely. This study proposes a computerized process for classifying skin disease through a deep learning-based combination of MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proves to be efficient, with good accuracy, and can run on lightweight computational devices, while the LSTM component maintains stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progression of diseased growth. The performance is compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture extended with a few changes. On the HAM10000 dataset, the proposed method outperforms these methods with more than 85% accuracy. It recognizes the affected region faster, with almost 2× fewer computations than the conventional MobileNet model, resulting in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action: it helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.
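
As a minimal sketch of the described MobileNet V2 + LSTM pairing (not the paper's implementation): MobileNet V2 convolutional features are read by an LSTM head. Feeding the 7×7 feature map to the LSTM as a 49-step sequence, the hidden size, and the seven-class output are our assumptions.

```python
# Illustrative MobileNet V2 -> LSTM classifier sketch (PyTorch/torchvision).
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class MobileNetLSTM(nn.Module):
    def __init__(self, n_classes=7, hidden=128):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features  # conv feature maps
        self.lstm = nn.LSTM(1280, hidden, batch_first=True)  # 1280 output channels
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                   # x: (batch, 3, 224, 224) skin images
        f = self.backbone(x)                # (batch, 1280, 7, 7)
        seq = f.flatten(2).transpose(1, 2)  # 49 spatial positions as a sequence
        _, (h, _) = self.lstm(seq)
        return self.head(h[-1])             # disease-class logits

logits = MobileNetLSTM()(torch.randn(1, 3, 224, 224))  # smoke test
```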


Author(s):  
Yuheng Hu ◽  
Yili Hong

Residents often rely on newspapers and television to gather hyperlocal news for community awareness and engagement. More recently, social media have emerged as an increasingly important source of hyperlocal news. Thus far, the literature on using social media to create desirable societal benefits, such as civic awareness and engagement, is still in its infancy. One key challenge in this research stream is to distill information from noisy social media data streams and deliver it to community members in a timely and accurate manner. In this work, we develop SHEDR (social media–based hyperlocal event detection and recommendation), an end-to-end neural event detection and recommendation framework with a particular use case for Twitter to facilitate residents’ information seeking about hyperlocal events. The key model innovation in SHEDR lies in the design of the hyperlocal event detector and the event recommender. First, we harness the power of two popular deep neural network models, the convolutional neural network (CNN) and long short-term memory (LSTM), in a novel joint CNN-LSTM model to characterize spatiotemporal dependencies for capturing unusualness in a region of interest, which is classified as a hyperlocal event. Next, we develop a neural pairwise ranking algorithm for recommending detected hyperlocal events to residents based on their interests. To alleviate the sparsity issue and improve personalization, our algorithm incorporates several types of contextual information covering topic, social, and geographical proximities. We perform comprehensive evaluations based on two large-scale data sets comprising geotagged tweets covering Seattle and Chicago. We demonstrate the effectiveness of our framework in comparison with several state-of-the-art approaches. We show that our hyperlocal event detection and recommendation models consistently and significantly outperform other approaches in terms of precision, recall, and F1 scores. Summary of Contribution: In this paper, we focus on a novel and important, yet largely underexplored application of computing: how to improve civic engagement in local neighborhoods via local news sharing and consumption based on social media feeds. To address this question, we propose two new computational and data-driven methods: (1) a deep learning–based hyperlocal event detection algorithm that scans spatially and temporally to detect hyperlocal events from geotagged Twitter feeds; and (2) a personalized deep learning–based hyperlocal event recommender system that systematically integrates several contextual cues, such as topical, geographical, and social proximity, to recommend the detected hyperlocal events to potential users. We conduct a series of experiments to examine our proposed models. The outcomes demonstrate that our algorithms are significantly better than the state-of-the-art models and can provide users with more relevant information about the local neighborhoods that they live in, which in turn may boost their community engagement.
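
The following is a hedged sketch of the pairwise-ranking idea only, in the spirit of Bayesian personalized ranking: a scorer over user-event pairs is trained so that observed events rank above unobserved ones. The embedding-dot-product scorer and all names are our stand-ins, not SHEDR's API; SHEDR additionally fuses topic, social, and geographical proximities, which are omitted here.

```python
# Minimal pairwise event-ranking sketch (PyTorch), assumptions as stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseRanker(nn.Module):
    def __init__(self, n_users, n_events, dim=64):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.event = nn.Embedding(n_events, dim)

    def score(self, u, e):
        # Affinity of user u for event e (dot product of embeddings).
        return (self.user(u) * self.event(e)).sum(-1)

    def loss(self, u, pos, neg):
        # Push events the user engaged with above sampled negatives.
        return -F.logsigmoid(self.score(u, pos) - self.score(u, neg)).mean()

model = PairwiseRanker(n_users=1000, n_events=5000)
u = torch.randint(0, 1000, (32,))
pos, neg = torch.randint(0, 5000, (32,)), torch.randint(0, 5000, (32,))
model.loss(u, pos, neg).backward()  # gradients for one training step
```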


2019 ◽  
Vol 9 (14) ◽  
pp. 2861 ◽  
Author(s):  
Alessandro Crivellari ◽  
Euro Beinat

The interest in human mobility analysis has increased with the rapid growth of positioning technology and motion tracking, leading to a variety of studies based on trajectory recordings. Mapping the routes that people commonly travel has proven very useful for location-based service applications, where individual mobility behaviors can potentially disclose meaningful information about each customer and be fruitfully used for personalized recommendation systems. This paper tackles a novel trajectory labeling problem related to the context of user profiling in “smart” tourism: inferring the nationality of individual users on the basis of their motion trajectories. In particular, we use large-scale motion traces of short-term foreign visitors as a way of detecting the nationality of individuals. This task is not trivial; it relies on the hypothesis that foreign tourists of different nationalities may not only visit different locations, but also move in a different way between the same locations. The problem is defined as a multinomial classification with a few tens of classes (nationalities) and sparse location-based trajectory data. We propose a machine learning-based methodology consisting of a long short-term memory (LSTM) neural network trained on vector representations of locations, in order to capture the underlying semantics of user mobility patterns. Experiments conducted on a real-world big dataset demonstrate that our method achieves considerably higher performance than baseline and traditional approaches.
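
A sketch, under our own assumptions, of how such a pipeline can look: location IDs are embedded (standing in for the paper's vector representations of locations) and an LSTM maps the trajectory to one of a few tens of nationality classes. Vocabulary size, dimensions, and class count are illustrative.

```python
# Trajectory -> nationality classifier sketch (PyTorch).
import torch
import torch.nn as nn

class TrajectoryClassifier(nn.Module):
    def __init__(self, n_locations=10000, n_nationalities=30, dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_locations, dim, padding_idx=0)  # 0 = padding
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_nationalities)

    def forward(self, loc_ids):             # (batch, seq_len) of location IDs
        _, (h, _) = self.lstm(self.emb(loc_ids))
        return self.head(h[-1])             # nationality logits

logits = TrajectoryClassifier()(torch.randint(1, 10000, (4, 50)))  # smoke test
```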


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6622
Author(s):  
Barış Bayram ◽  
Gökhan İnce

Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noises and human actions with objects. However, the spatio-temporal nature of the sound signals may not be stationary, and novel events may appear that eventually degrade the performance of the analysis. In this study, a self-learning-based ASA for acoustic event recognition (AER) is presented to detect and incrementally learn novel acoustic events while tackling catastrophic forgetting. The proposed ASA framework comprises six elements: (1) raw acoustic signal pre-processing, (2) low-level and deep audio feature extraction, (3) acoustic novelty detection (AND), (4) acoustic signal augmentations, (5) incremental class-learning (ICL) of the audio features of the novel events and (6) AER. The self-learning on different types of audio features extracted from the acoustic signals of various events occurs without human supervision. For the extraction of deep audio representations, in addition to visual geometry group (VGG) and residual neural network (ResNet) models, time-delay neural network (TDNN) and TDNN-based long short-term memory (TDNN–LSTM) networks are pre-trained on a large-scale audio dataset, Google AudioSet. The performance of ICL with AND using Mel-spectrograms, and of deep features extracted from the Mel-spectrograms with TDNNs, VGG, and ResNet, is validated on benchmark audio datasets such as ESC-10, ESC-50, UrbanSound8K (US8K), and an audio dataset collected by the authors in a real domestic environment.
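
TDNNs are commonly realized as stacks of dilated 1-D convolutions over frame-level features; the sketch below is a generic member of that family operating on Mel-spectrogram frames, not the authors' pre-trained network, and all layer sizes are assumptions.

```python
# Generic TDNN embedding extractor sketch (PyTorch).
import torch
import torch.nn as nn

class TDNN(nn.Module):
    def __init__(self, n_mels=64, emb_dim=256):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv1d(n_mels, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(512, emb_dim, kernel_size=1),
        )

    def forward(self, x):         # x: (batch, n_mels, frames) Mel-spectrogram
        h = self.layers(x)        # dilation widens the temporal context
        return h.mean(dim=-1)     # temporal pooling -> fixed-size embedding

emb = TDNN()(torch.randn(2, 64, 200))  # (2, 256) deep audio features
```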


2021 ◽  
Vol 33 (4) ◽  
pp. 593-608
Author(s):  
Chuhao Zhou ◽  
Peiqun Lin ◽  
Xukun Lin ◽  
Yang Cheng

Accurate traffic prediction on a large-scale road network is significant for traffic operations and management. In this study, we propose an approach that effectively combines traffic data and non-traffic data to achieve comprehensive and accurate prediction. Based on it, we develop a novel prediction model, called the adaptive deep neural network (ADNN). In the ADNN, we use two long short-term memory (LSTM) networks to extract spatial-temporal characteristics and temporal characteristics, respectively. A backpropagation neural network (BPNN) is also employed to represent contextual factors such as station index, forecast horizon, and weather. The experimental results show that the ADNN predicts with high accuracy for different stations and different forecast horizons; even one hour ahead, its performance is satisfactory. A comparison of the ADNN with several benchmark prediction models further indicates its robustness.
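
A rough sketch of the described fusion, with assumed input shapes: one LSTM reads the flows of neighboring stations (spatial-temporal branch), one reads the station's own series (temporal branch), and a small MLP stands in for the BPNN over contextual factors; the three representations are concatenated for the prediction.

```python
# ADNN-style two-LSTM + context-MLP fusion sketch (PyTorch), shapes assumed.
import torch
import torch.nn as nn

class ADNNSketch(nn.Module):
    def __init__(self, n_neighbors=5, ctx_dim=3, hidden=64):
        super().__init__()
        self.spatial = nn.LSTM(n_neighbors, hidden, batch_first=True)
        self.temporal = nn.LSTM(1, hidden, batch_first=True)
        self.context = nn.Sequential(nn.Linear(ctx_dim, hidden), nn.ReLU())
        self.head = nn.Linear(3 * hidden, 1)  # predicted traffic value

    def forward(self, neigh_flow, own_flow, ctx):
        # neigh_flow: (batch, T, n_neighbors); own_flow: (batch, T, 1);
        # ctx: (batch, ctx_dim), e.g., station index, horizon, weather code.
        _, (hs, _) = self.spatial(neigh_flow)
        _, (ht, _) = self.temporal(own_flow)
        hc = self.context(ctx)
        return self.head(torch.cat([hs[-1], ht[-1], hc], dim=-1))

pred = ADNNSketch()(torch.randn(8, 12, 5), torch.randn(8, 12, 1), torch.randn(8, 3))
```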


2021 ◽  
Vol 11 (19) ◽  
pp. 9226
Author(s):  
Burooj Ghani ◽  
Sarah Hallerberg

The automatic classification of bird sounds is an ongoing research topic, and several results have been reported for the classification of selected bird species. In this contribution, we use an artificial neural network fed with pre-computed sound features to study the robustness of bird sound classification. We investigate, in detail, if and how the classification results depend on the number of species and the selection of species in the subsets presented to the classifier. In more detail, a bag-of-birds approach is employed to randomly create balanced subsets of sounds from different species for repeated classification runs. The number of species present in each subset is varied between 10 and 300 by randomly drawing sounds of species from a dataset of 659 bird species taken from the Xeno-Canto database. We observed that the shallow artificial neural network trained on pre-computed sound features was able to classify the bird sounds, and the quality of the classifications was at least comparable to some previously reported results whenever the number of species allowed a direct comparison. The classification performance is evaluated using several common measures, such as precision, recall, accuracy, mean average precision, and the area under the receiver operating characteristic curve. All of these measures indicate a decrease in classification success as the number of species present in the subsets increases. We analyze this dependence in detail and compare the computed results to an analytic explanation that assumes these dependencies for an idealized perfect classifier. Moreover, we observed that the classification performance depended on the individual composition of the subset and varied across 20 randomly drawn subsets.
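
A sketch of the bag-of-birds protocol under our own assumptions (synthetic stand-in data; per-class balancing of the subsets is omitted for brevity): repeatedly draw a random subset of species, then train and score a shallow network on their pre-computed features.

```python
# Bag-of-birds subset evaluation sketch (Python, scikit-learn).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 40))    # stand-in pre-computed sound features
labels = rng.integers(0, 659, size=5000)  # stand-in species labels

def run_subset(n_species):
    species = rng.choice(np.unique(labels), size=n_species, replace=False)
    mask = np.isin(labels, species)
    X_tr, X_te, y_tr, y_te = train_test_split(
        features[mask], labels[mask], test_size=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)  # shallow ANN
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)          # accuracy on this subset

scores = [run_subset(50) for _ in range(20)]  # spread across 20 random draws
```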


2021 ◽  
Author(s):  
Wanjiao Song ◽  
Wenfang Lu ◽  
Qing Dong

El Niño is a large-scale ocean-atmospheric coupling phenomenon in the Pacific. The interaction among marine and atmospheric variables over the tropical Pacific modulates the evolution of El Niño. The latest research shows that machine learning and neural networks (NN) have emerged as effective tools for extracting meaningful information from multiple marine and atmospheric parameters. In this paper, we aim to predict the El Niño index more accurately and increase the forecast efficiency for El Niño events. We propose an approach that combines a neural network technique with a long short-term memory (LSTM) network to forecast the El Niño phenomenon. The model attributes follow from physical explanations that are tested against experiments and observations. The neural network represents the connections among multiple variables, and machine learning builds models to identify El Niño events. Preliminary experimental results show that training the NN-LSTM model on the time-series dataset has great potential for predicting the El Niño phenomenon at lead times of more than 6 months.
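
Illustrative only: a minimal LSTM regressor mapping a window of monthly ocean-atmosphere variables to the El Niño index at a fixed lead time. The variable count, window length, and lead time below are assumptions, not the authors' configuration.

```python
# LSTM regression sketch for El Niño index forecasting (PyTorch).
import torch
import torch.nn as nn

class EnsoLSTM(nn.Module):
    def __init__(self, n_vars=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_vars, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, months, n_vars)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])       # index value at the chosen lead time

pred = EnsoLSTM()(torch.randn(16, 24, 6))  # 24-month window, 6 variables
```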


Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3296 ◽  
Author(s):  
Byung-Kil Han ◽  
Je-Kwang Ryu ◽  
Seung-Chan Kim

In this paper, we present an intelligent system that is capable of estimating the status of a player engaging in winter activities based on the sequence analysis of multivariate time-series sensor signals. Among winter activities, this paper mainly focuses on downhill winter sports such as alpine skiing and snowboarding. Assuming that the mechanical vibrations generated by the physical interaction between the ground surface and a ski/snowboard in motion can describe the ground conditions and playing contexts, we utilize inertial and vibration signals to categorize the motion context. For example, the proposed system estimates whether the player is sitting on a ski lift, standing on an escalator, or skiing on wet or snowy ground. To measure the movement of a player during a game or on the move, we develop a custom embedded system comprising a motion sensor and a piezo transducer. The captured multivariate sequence signals are then used to train models in a supervised fashion. We adopt artificial neural network approaches (e.g., a 1D convolutional neural network and gated recurrent neural networks such as long short-term memory and gated recurrent units). The experimental results validate the feasibility of the proposed approach.
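
A hedged sketch of this model family (channel count, class count, and window length are assumptions): a 1D CNN front end over the inertial and vibration channels followed by a GRU that emits motion-context logits.

```python
# 1D CNN + GRU motion-context classifier sketch (PyTorch).
import torch
import torch.nn as nn

class MotionContextNet(nn.Module):
    def __init__(self, n_channels=7, n_contexts=5, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.gru = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_contexts)

    def forward(self, x):                 # x: (batch, channels, samples)
        h = self.conv(x).transpose(1, 2)  # (batch, time, 64) for the GRU
        _, hn = self.gru(h)
        return self.head(hn[-1])          # e.g., lift / escalator / wet / snow

logits = MotionContextNet()(torch.randn(4, 7, 256))  # smoke test
```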

