Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition

Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 811 ◽  
Author(s):  
Yang Li ◽  
Huahu Xu ◽  
Minjie Bian ◽  
Junsheng Xiao

Because of its important role in video surveillance, pedestrian attribute recognition has become an attractive facet of computer vision research. Changes in viewpoint, illumination, resolution and occlusion make the task very challenging. Existing pedestrian attribute recognition methods perform unsatisfactorily because they ignore the correlation between pedestrian attributes and spatial information. To address this, this paper treats the task as a spatiotemporal, sequential, multi-label image classification problem. An attention-based neural network consisting of convolutional neural networks (CNN), channel attention (CAtt) and convolutional long short-term memory (ConvLSTM) is proposed (CNN-CAtt-ConvLSTM). First, the salient and correlated visual features of pedestrian attributes are extracted by a pre-trained CNN and CAtt. Then, ConvLSTM is used to further extract spatial information and correlations from pedestrian attributes. Finally, pedestrian attributes are predicted with optimized sequences based on attribute image area size and importance. Extensive experiments are carried out on two common pedestrian attribute datasets, the PEdesTrian Attribute (PETA) dataset and the Richly Annotated Pedestrian (RAP) dataset, and higher performance than other state-of-the-art (SOTA) methods is achieved, which supports the superiority and validity of our method.
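The abstract does not specify how the channel attention (CAtt) module is built. As a rough illustration only, the sketch below assumes a squeeze-and-excitation style design: each channel is averaged to a scalar, a small bottleneck network turns those scalars into one gate per channel, and each channel is rescaled by its gate. The function name `channel_attention` and the weights `w1` and `w2` are hypothetical and not taken from the paper.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative assumption).

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: hidden x C weight matrix, w2: C x hidden weight matrix (hypothetical).
    Returns (gates, rescaled_feature_map).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map
    ]
    # Excite: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Rescale: multiply every value in a channel by that channel's gate.
    rescaled = [
        [[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_map)
    ]
    return gates, rescaled


# Toy input: two 2x2 channels and hand-picked weights.
example_map = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
example_gates, example_out = channel_attention(
    example_map, [[1.0, 0.0]], [[1.0], [-1.0]]
)
```

In a real network the weights would be learned, and the rescaled feature maps would be passed on to the ConvLSTM stage.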

Author(s):  
Xingjian Lai ◽  
Huanyi Shui ◽  
Jun Ni

Throughput bottlenecks define and constrain the productivity of a production line. Prediction of future bottlenecks provides strong support for decision-making on the factory floor, helping to foresee and formulate appropriate actions before production to improve the system throughput in a cost-effective manner. Bottleneck prediction remains a challenging task in the literature. The difficulty lies in the complex dynamics of manufacturing systems. Multiple factors jointly affect bottleneck conditions, such as machine performance, machine degradation, line structure, operator skill level, and product release schedules. These factors affect one another in a nonlinear manner and exhibit long-term temporal dependencies. State-of-the-art research utilizes various assumptions to simplify the modeling by reducing the input dimensionality. As a result, those models cannot accurately reflect the complex dynamics of the bottleneck in a manufacturing system. To tackle this problem, this paper proposes a systematic framework to design a two-layer Long Short-Term Memory (LSTM) network tailored to the dynamic bottleneck prediction problem in multi-job manufacturing systems. This neural-network-based approach takes advantage of historical high-dimensional factory floor data to predict system bottlenecks dynamically, considering the future production planning inputs. The model is demonstrated with data from an automotive underbody assembly line. The results show that the proposed method can achieve higher prediction accuracy than current state-of-the-art approaches.


2020 ◽  
Vol 23 (65) ◽  
pp. 124-135
Author(s):  
Imane Guellil ◽  
Marcelo Mendoza ◽  
Faical Azouaou

This paper presents an analytic study showing that it is entirely possible to analyze the sentiment of an Arabic dialect without constructing any resources. The idea of this work is to use the resources dedicated to a given dialect X for analyzing the sentiment of another dialect Y. The only condition is that X and Y belong to the same category of dialects. We apply this idea to the Algerian dialect, a Maghrebi Arabic dialect that suffers from a lack of the tools and other resources required for automatic sentiment analysis. To do this analysis, we rely on Maghrebi dialect resources and two manually annotated sentiment corpora, for the Tunisian and Moroccan dialects respectively. We also use a large corpus for the Maghrebi dialect. We use a state-of-the-art system and propose a new deep learning architecture for automatically classifying the sentiment of an Arabic dialect (the Algerian dialect). Experimental results show an F1-score of up to 83%, achieved by a Multilayer Perceptron (MLP) with the Tunisian corpus and by Long Short-Term Memory (LSTM) with the combination of the Tunisian and Moroccan corpora. An improvement of 15% compared to its closest competitor was observed through this study. Ongoing work is aimed at manually constructing an annotated sentiment corpus for the Algerian dialect and comparing the results.


Symmetry ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 1290 ◽  
Author(s):  
Rahman ◽  
Siddiqui

Abstractive text summarization, which generates a summary by paraphrasing a long text, remains a significant open problem in natural language processing. In this paper, we present an abstractive text summarization model, the multi-layered attentional peephole convolutional LSTM (long short-term memory) (MAPCoL), that automatically generates a summary from a long text. We optimize the parameters of MAPCoL using central composite design (CCD) in combination with the response surface methodology (RSM), which gives the highest accuracy in terms of summary generation. We record the accuracy of our model (MAPCoL) on the CNN/DailyMail dataset. We perform a comparative analysis of the accuracy of MAPCoL with that of state-of-the-art models in different experimental settings. MAPCoL also outperforms traditional LSTM-based models with respect to semantic coherence in the output summary.


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Haifeng Sang ◽  
Chuanzheng Wang ◽  
Dakuo He ◽  
Qing Liu

This paper presents a multi-information flow convolutional neural network (MiF-CNN) model for person reidentification (re-id). It contains several specific multilayer convolutional structures, where the input and output of a convolutional layer are concatenated along the channel dimension. With this idea, the layers of the model can go deeper and feature maps can be reused by each subsequent layer. Inspired by image captioning, a person attribute recognition network is proposed based on a long short-term memory network and an attention mechanism. By fusing the identification results of MiF-CNN and attribute recognition, this paper introduces an attribute-aided reranking algorithm to further improve the accuracy of person re-id. Experiments on the VIPeR, CUHK01, and Market1501 datasets verify that the proposed MiF-CNN can be trained sufficiently with small-scale datasets and obtains outstanding person re-id accuracy. Comparison experiments also confirm the effectiveness of the attribute-aided reranking algorithm.
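The concatenation of a layer's input and output along the channel dimension can be sketched as follows. The helper `mean_layer` is a hypothetical stand-in for a convolutional layer, not the MiF-CNN layer itself. It exists only to show how the channel count grows and how earlier feature maps stay available to every later layer.

```python
def mean_layer(growth):
    """Hypothetical stand-in for a conv layer: emits `growth` new channels,
    each equal to the per-pixel mean of all input channels."""
    def layer(channels):
        n = len(channels)
        rows, cols = len(channels[0]), len(channels[0][0])
        mean = [
            [sum(ch[r][c] for ch in channels) / n for c in range(cols)]
            for r in range(rows)
        ]
        return [[row[:] for row in mean] for _ in range(growth)]
    return layer


def dense_block(channels, layers):
    """Apply each layer, then concatenate its output to its input on the channel axis."""
    for layer in layers:
        new_channels = layer(channels)
        # Channel-wise concatenation: earlier feature maps are kept and reused.
        channels = channels + new_channels
    return channels


# Three 2x2 input channels filled with 0.0, 1.0 and 2.0.
block_input = [[[float(k)] * 2 for _ in range(2)] for k in range(3)]
block_output = dense_block(block_input, [mean_layer(2) for _ in range(4)])
```

With 3 input channels and 4 layers that each add 2 channels, the block ends with 3 + 4 × 2 = 11 channels, and the first 3 are the untouched inputs.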


2006 ◽  
Vol 15 (04) ◽  
pp. 623-650
Author(s):  
JUDY A. FRANKLIN

Recurrent (neural) networks have been deployed as models for learning musical processes, by computational scientists who study processes such as dynamic systems. Over time, more intricate music has been learned as the state of the art in recurrent networks improves. One particular recurrent network, the Long Short-Term Memory (LSTM) network shows promise for learning long songs, and generating new songs. We are experimenting with a module containing two inter-recurrent LSTM networks to cooperatively learn several human melodies, based on the songs' harmonic structures, and on the feedback inherent in the network. We show that these networks can learn to reproduce four human melodies. We then present as input new harmonizations, so as to generate new songs. We describe the reharmonizations, and show the new melodies that result. We also present a hierarchical structure for using reinforcement learning to choose LSTM modules during the course of melody generation.


Author(s):  
Jing Wang ◽  
Yingwei Pan ◽  
Ting Yao ◽  
Jinhui Tang ◽  
Tao Mei

Image paragraph generation is the task of producing a coherent story (usually a paragraph) that describes the visual content of an image. The problem is nevertheless not trivial, especially when there are multiple descriptive and diverse gists to be considered for paragraph generation, which often happens in real images. A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure. In this paper, we present a new design, Convolutional Auto-Encoding (CAE), that purely employs a convolutional and deconvolutional auto-encoding framework for topic modeling on the region-level features of an image. Furthermore, we propose an architecture, namely CAE plus Long Short-Term Memory (dubbed CAE-LSTM), that integrates the learnt topics in support of paragraph generation. Technically, CAE-LSTM capitalizes on a two-level LSTM-based paragraph generation framework with an attention mechanism. The paragraph-level LSTM captures the inter-sentence dependency in a paragraph, while the sentence-level LSTM generates one sentence conditioned on each learnt topic. Extensive experiments are conducted on the Stanford image paragraph dataset, and superior results are reported compared to state-of-the-art approaches. More remarkably, CAE-LSTM increases CIDEr performance from 20.93% to 25.15%.


Author(s):  
Amirul Sadikin Md Affendi ◽  
Marina Yusoff

This paper presents a review of anomalous sound event detection (SED) approaches. SED is becoming more applicable to real-world applications such as security, fire detection or other emergency alarms. Despite many previous research outcomes, further research is required to reduce false positives and improve accuracy. SED approaches are comprehensively organized by method, covering the system pipeline components of acoustic descriptors, classification engine, and decision finalization method. The review compares multiple approaches applied to a specific dataset. Security relies on detecting anomalous events: to prevent them, one must first find them. Audio surveillance has become more efficient as artificial intelligence has advanced. Autonomous SED could be used for early detection and prevention. The review finds that a state-of-the-art SED method using log-mel energy features in a convolutional recurrent neural network (CRNN) with long short-term memory (LSTM) and a thresholding verification step obtained a 93.1% F1 score and 0.1307 ER. It also finds that log-mel energy feature extraction is a highly reliable method, showing promising results across multiple experiments.
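The abstract reports an F1 score and an ER value without defining them. Assuming ER denotes the segment-based error rate commonly used in SED evaluation (substitutions, deletions and insertions divided by the number of reference events), both metrics can be computed as in the sketch below. The function name and the toy labels are illustrative only and are not taken from the reviewed systems.

```python
def segment_metrics(reference, predicted):
    """Segment-based F1 and error rate for sound event detection.

    reference, predicted: lists of sets, one set of active event labels
    per time segment. Assumes ER = (S + D + I) / N over all segments.
    """
    tp = fp = fn = 0
    subs = dels = ins = n_ref = 0
    for ref, pred in zip(reference, predicted):
        seg_fp = len(pred - ref)  # predicted but not in the reference
        seg_fn = len(ref - pred)  # in the reference but missed
        tp += len(ref & pred)
        fp += seg_fp
        fn += seg_fn
        # A miss paired with a false alarm in one segment is a substitution.
        subs += min(seg_fp, seg_fn)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += len(ref)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


# Toy example with four segments (labels are made up).
toy_reference = [{"alarm"}, {"alarm", "glass"}, set(), {"scream"}]
toy_predicted = [{"alarm"}, {"alarm"}, {"glass"}, {"alarm"}]
toy_f1, toy_er = segment_metrics(toy_reference, toy_predicted)
```

Unlike F1, the error rate can exceed 1.0 when a system produces many insertions, so lower is better and 0 is a perfect score.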


2021 ◽  
Vol 25 (3) ◽  
pp. 1671-1687
Author(s):  
Andreas Wunsch ◽  
Tanja Liesch ◽  
Stefan Broda

Abstract. It is now well established to use shallow artificial neural networks (ANNs) to obtain accurate and reliable groundwater level forecasts, which are an important tool for sustainable groundwater management. However, we observe an increasing shift from conventional shallow ANNs to state-of-the-art deep-learning (DL) techniques, but a direct comparison of the performance is often lacking. Although they have already clearly proven their suitability, shallow recurrent networks frequently seem to be excluded from the study design due to the euphoria about new DL techniques and their successes in various disciplines. Therefore, we aim to provide an overview of the predictive ability, in terms of groundwater levels, of shallow conventional recurrent ANNs, namely non-linear autoregressive networks with exogenous input (NARX), and popular state-of-the-art DL techniques such as long short-term memory (LSTM) and convolutional neural networks (CNNs). We compare the performance on both sequence-to-value (seq2val) and sequence-to-sequence (seq2seq) forecasting on a 4-year period while using only a few, widely available and easy-to-measure meteorological input parameters, which makes our approach widely applicable. Further, we also investigate the data dependency in terms of time series length of the different ANN architectures. For seq2val forecasts, NARX models on average perform best; however, CNNs are much faster and only slightly worse in terms of accuracy. For seq2seq forecasts, NARX mostly outperform both DL models and even almost reach the speed of CNNs. However, NARX are the least robust against initialization effects, which nevertheless can be handled easily using ensemble forecasting.
We showed that shallow neural networks, such as NARX, should not be neglected in comparison to DL techniques especially when only small amounts of training data are available, where they can clearly outperform LSTMs and CNNs; however, LSTMs and CNNs might perform substantially better with a larger dataset, where DL really can demonstrate its strengths, which is rarely available in the groundwater domain though.
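The difference between seq2val and seq2seq forecasting comes down to how training pairs are cut from a time series: a window of past values is mapped either to a single future value or to a sequence of future values. The sketch below shows this windowing on a single made-up series. It is a simplified assumption for illustration, since the study uses meteorological inputs to forecast groundwater levels, and the function name `make_windows` is hypothetical.

```python
def make_windows(series, n_in, n_out):
    """Cut (input window, target window) pairs from a series.

    n_out == 1 gives sequence-to-value pairs;
    n_out > 1 gives sequence-to-sequence pairs.
    """
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        pairs.append((inputs, targets))
    return pairs


toy_series = list(range(10))                  # stand-in for a measured series
seq2val_pairs = make_windows(toy_series, 4, 1)
seq2seq_pairs = make_windows(toy_series, 4, 3)
```

For a fixed series length, longer target windows yield fewer training pairs, which is one reason the amount of available data matters when comparing the two setups.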


2019 ◽  
Vol 7 ◽  
pp. 121-138 ◽  
Author(s):  
Rumen Dangovski ◽  
Li Jing ◽  
Preslav Nakov ◽  
Mićo Tatalović ◽  
Marin Soljačić

Stacking long short-term memory (LSTM) cells or gated recurrent units (GRUs) as part of a recurrent neural network (RNN) has become a standard approach to solving a number of tasks ranging from language modeling to text summarization. Although LSTMs and GRUs were designed to model long-range dependencies more accurately than conventional RNNs, they nevertheless have problems copying or recalling information from the long distant past. Here, we derive a phase-coded representation of the memory state, Rotational Unit of Memory (RUM), that unifies the concepts of unitary learning and associative memory. We show experimentally that RNNs based on RUMs can solve basic sequential tasks such as memory copying and memory recall much better than LSTMs/GRUs. We further demonstrate that by replacing LSTM/GRU with RUM units we can apply neural networks to real-world problems such as language modeling and text summarization, yielding results comparable to the state of the art.


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3115
Author(s):  
Wei Yang ◽  
Xiang Zhang ◽  
Qian Lei ◽  
Dengye Shen ◽  
Ping Xiao ◽  
...  

Accurate detection of lane lines is of great significance for improving vehicle driving safety. In our previous research, by improving the horizontal and vertical density of the detection grid in the YOLO v3 (You Only Look Once, the 3rd version) model, the obtained lane line (LL) algorithm, YOLO v3 (S × 2S), has high accuracy. However, like traditional LL detection algorithms, it does not use spatial information and has low detection accuracy under occlusion, deformation, wear, poor lighting, and other non-ideal environmental conditions. After studying the spatial information between LLs and learning the distribution law of LLs, an LL prediction model based on long short-term memory (LSTM) and recursive neural network (RcNN) was established; the method can predict the future LL position by using historical LL position information. Moreover, by combining the predicted LL information with the YOLO v3 (S × 2S) detection results using Dempster-Shafer (D-S) evidence theory, the LL detection accuracy can be improved effectively, and the uncertainty of the system can be reduced correspondingly. The results show that the accuracy of LL detection can be significantly improved in rainy and snowy weather and in scenes with obstacles.
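The fusion of detector and predictor outputs with Dempster-Shafer evidence theory can be illustrated with Dempster's rule of combination. The sketch below assumes a two-hypothesis frame ("lane" or "none") and made-up mass values. How the paper actually derives masses from the YOLO v3 (S × 2S) and LSTM outputs is not stated in the abstract.

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Normalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


LANE = frozenset({"lane"})
NONE = frozenset({"none"})
EITHER = LANE | NONE  # mass on the whole frame expresses uncertainty

# Hypothetical masses from a detector and from a position predictor.
detector_mass = {LANE: 0.6, NONE: 0.1, EITHER: 0.3}
predictor_mass = {LANE: 0.7, NONE: 0.1, EITHER: 0.2}
fused_mass = combine(detector_mass, predictor_mass)
```

In this toy case the fused mass on "lane" is higher than either source's mass, and the mass left on the whole frame (the uncertainty) is lower than in either source, which matches the qualitative claim in the abstract.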

