Online supervised attention-based recurrent depth estimation from monocular video

PeerJ Computer Science ◽

10.7717/peerj-cs.317 ◽

2020 ◽

Vol 6 ◽

pp. e317

Author(s):

Dmitrii Maslov ◽

Ilya Makarov

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Short Term Memory ◽

Depth Estimation ◽

Autonomous Driving ◽

Temporal Information ◽

Depth Information ◽

Safe Driving ◽

Monocular Video ◽

Depth Reconstruction

Autonomous driving depends heavily on depth information for safe driving. Recently, major advances have been made in both supervised and self-supervised methods for depth reconstruction. However, most current approaches focus on single-frame depth estimation, where the quality limit is hard to beat due to the limitations of supervised learning of deep neural networks in general. One way to improve the quality of existing methods is to utilize temporal information from frame sequences. In this paper, we study intelligent ways of integrating a recurrent block into a common supervised depth estimation pipeline. We propose a novel method that takes advantage of the convolutional gated recurrent unit (convGRU) and convolutional long short-term memory (convLSTM). We compare the use of convGRU and convLSTM blocks and determine the best model for the real-time depth estimation task. We carefully study the training strategy and provide new deep neural network architectures for depth estimation from monocular video, using information from past frames based on an attention mechanism. We demonstrate the efficiency of exploiting temporal information by comparing our best recurrent method with existing image-based and video-based solutions for monocular depth reconstruction.
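
To make the recurrent block mentioned in this abstract more concrete, here is a minimal sketch of one convGRU update using the textbook gate equations. It is not the authors' architecture: it assumes a single-channel feature map, zero-padded 3x3 kernels, and no bias terms, and the kernel names (`xz`, `hz`, `xr`, `hr`, `xn`, `hn`) are hypothetical labels chosen for this illustration.

```python
import math

def conv2d_same(img, kernel):
    """Single-channel 2-D cross-correlation with zero padding (output size = input size)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def combine(f, *maps):
    """Apply f element-wise across same-sized 2-D maps."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*maps)]

def conv_gru_step(x, h_prev, k):
    """One convGRU update; k maps the six hypothetical kernel names to 2-D kernels."""
    # Update gate z and reset gate r: sigmoid of convolved input plus convolved state.
    z = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, k["xz"]), conv2d_same(h_prev, k["hz"]))
    r = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, k["xr"]), conv2d_same(h_prev, k["hr"]))
    # Candidate state: the reset gate scales the previous state before its convolution.
    gated = combine(lambda a, b: a * b, r, h_prev)
    n = combine(lambda a, b: math.tanh(a + b),
                conv2d_same(x, k["xn"]), conv2d_same(gated, k["hn"]))
    # New state: per-pixel blend of the previous state and the candidate.
    return combine(lambda zi, hi, ni: (1.0 - zi) * hi + zi * ni, z, h_prev, n)
```

In a video setting the state returned for one frame would be passed as `h_prev` for the next frame, which is how information from past frames reaches the current prediction. A convLSTM block differs mainly in keeping a separate cell state alongside the hidden state.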

Sensor Simulation for Monocular Depth Estimation using Deep Neural Networks

10.1109/cw52790.2021.00010 ◽

2021 ◽

Author(s):

Christon R. Nadar ◽

Christian Kunert ◽

Tobias Schwandt ◽

Wolfgang Broll

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Depth Estimation ◽

Monocular Depth

Part-of-Speech Tagging via Deep Neural Networks for Northern-Ethiopic Languages

Information Technology And Control ◽

10.5755/j01.itc.49.4.26808 ◽

2020 ◽

Vol 49 (4) ◽

pp. 482-494

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Senait Gebremichael Tesfagergish

Keyword(s):

Neural Network ◽

Neural Networks ◽

Language Processing ◽

Deep Neural Networks ◽

Short Term Memory ◽

Parameter Tuning ◽

Feed Forward Neural Network ◽

Pos Tagging ◽

Part Of Speech ◽

Pos Tagger

Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging, the process of mapping words to their corresponding POS labels depending on the context. Despite the recent development of language technologies, low-resourced languages (such as the East African Tigrinya language) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We have selected Tigrinya as the testbed example and have tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We have evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory method – LSTM, Bidirectional LSTM – BiLSTM, and Convolutional Neural Network – CNN) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture, and its hyper-parameter set, both manual and automatic hyper-parameter tuning were performed. The BiLSTM method proved to be the most suitable for our task: it achieved the highest accuracy, 92%, which is 65% above the random baseline.

SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6722 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10901-10908 ◽

Cited By ~ 2

Author(s):

Abdullah Hamdi ◽

Matthias Mueller ◽

Bernard Ghanem

Keyword(s):

Neural Networks ◽

Recent Work ◽

Autonomous Navigation ◽

General Framework ◽

Deep Neural Networks ◽

Autonomous Driving ◽

Black Box ◽

Semantic Meaning ◽

Safety Critical ◽

Adversarial Attack

One major factor impeding more widespread adoption of deep neural networks (DNNs) is their lack of robustness, which is essential for safety-critical applications such as autonomous driving. This has motivated much recent work on adversarial attacks for DNNs, which mostly focuses on pixel-level perturbations void of semantic meaning. In contrast, we present a general framework for adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task as well as pixel-level attacks. To do this, we re-frame the adversarial attack problem as learning a distribution of parameters that always fools the agent. In the semantic case, our proposed adversary (denoted as BBGAN) is trained to sample parameters that describe the environment with which the black-box agent interacts, such that the agent performs its dedicated task poorly in this environment. We apply BBGAN to three different tasks, primarily targeting aspects of autonomous navigation: object detection, self-driving, and autonomous UAV racing. On these tasks, BBGAN can generate failure cases that consistently fool a trained agent.

Convolutional Nonlinear Differential Recurrent Neural Networks for Crowd Scene Understanding

International Journal of Semantic Computing ◽

10.1142/s1793351x18400196 ◽

2018 ◽

Vol 12 (04) ◽

pp. 481-500 ◽

Cited By ~ 1

Author(s):

Naifan Zhuang ◽

The Duc Kieu ◽

Jun Ye ◽

Kien A. Hua

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Scene Understanding ◽

Image Data ◽

High Density ◽

Temporal Information ◽

Deep Model ◽

End To End ◽

The Individual

With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogleNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Different from traditional non-end-to-end solutions which separate the steps of feature extraction and parameter learning, CNDRNN utilizes a unified deep model to optimize the parameters of CNN and RNN hand in hand. It thus has the potential of generating a more harmonious model. The proposed architecture takes sequential raw image data as input, and does not rely on tracklet or trajectory detection. It thus has clear advantages over the traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios of high density and low mobility. Taking advantage of CNN and RNN, CNDRNN can effectively analyze the crowd semantics. Specifically, CNN is good at modeling the semantic crowd scene information. On the other hand, nonlinear differential RNN models the motion information. The individual and increasing orders of derivative of states (DoS) in differential RNN can progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamical patterns in deeper stacked layers modeling higher orders of DoS. Lastly, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be “deep in time.” Our proposed method CNDRNN, however, models the spatial and temporal information in a unified architecture and achieves “deep in space and time.” Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.

Multiple Source Localization in a Shallow Water Waveguide Exploiting Subarray Beamforming and Deep Neural Networks

Sensors ◽

10.3390/s19214768 ◽

2019 ◽

Vol 19 (21) ◽

pp. 4768 ◽

Cited By ~ 2

Author(s):

Zhaoqiong Huang ◽

Ji Xu ◽

Zaixiao Gong ◽

Haibin Wang ◽

Yonghong Yan

Keyword(s):

Neural Network ◽

Neural Networks ◽

Shallow Water ◽

Source Localization ◽

Deep Neural Networks ◽

Short Term Memory ◽

Direction Finding ◽

Multiple Source ◽

Feed Forward Neural Network ◽

Subarray Beamforming

Deep neural networks (DNNs) have been shown to be effective for single sound source localization in shallow water environments. However, multiple source localization is a more challenging task because of the interactions among multiple acoustic signals. This paper proposes a framework for multiple source localization on underwater horizontal arrays using deep neural networks. Two-stage DNNs are adopted to determine the directions and then the ranges of multiple sources. A feed-forward neural network is trained for direction finding, while a long short-term memory recurrent neural network is used for source ranging. In particular, in the source ranging stage, we perform subarray beamforming to extract features of the sources detected by the direction finding stage, because subarray beamforming can enhance the mixed signal in the desired direction while preserving the horizontal-longitudinal correlations of the acoustic field. In this way, a universal model trained in the single-source scenario can be applied to multi-source scenarios with arbitrary numbers of sources. Both simulations and experiments in a range-independent shallow water environment of SWellEx-96 Event S5 are given to demonstrate the effectiveness of the proposed method.
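
The subarray beamforming step described in this abstract can be illustrated with a generic narrowband delay-and-sum beamformer. This is only a sketch under stated assumptions, not the authors' feature extractor: it assumes a uniform line array, a single frequency, a plane-wave arrival, and a nominal sound speed of 1500 m/s, and all function and parameter names are hypothetical.

```python
import cmath
import math

def steering_vector(n_sensors, spacing_m, freq_hz, angle_rad, sound_speed=1500.0):
    """Plane-wave phases across a uniform line array (sound speed is an assumed nominal value)."""
    k = 2.0 * math.pi * freq_hz / sound_speed  # wavenumber
    return [cmath.exp(-1j * k * spacing_m * n * math.sin(angle_rad))
            for n in range(n_sensors)]

def delay_and_sum(snapshot, spacing_m, freq_hz, angle_rad, sound_speed=1500.0):
    """Phase-align the sensors toward angle_rad and average them."""
    a = steering_vector(len(snapshot), spacing_m, freq_hz, angle_rad, sound_speed)
    return sum(w.conjugate() * x for w, x in zip(a, snapshot)) / len(snapshot)

def subarray_outputs(snapshot, subarray_len, spacing_m, freq_hz, angle_rad):
    """Beamform each contiguous, non-overlapping subarray toward the same direction."""
    return [delay_and_sum(snapshot[s:s + subarray_len], spacing_m, freq_hz, angle_rad)
            for s in range(0, len(snapshot) - subarray_len + 1, subarray_len)]
```

Each subarray is phase-referenced to its own first sensor, so the list of outputs keeps the phase differences between subarrays while signals from other directions are attenuated within each one. In this sketch, that retained phase structure is what a downstream ranging model could consume.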

Long Short-Term Memory (LSTM) Deep Neural Networks in Energy Appliances Prediction

2019 Panhellenic Conference on Electronics & Telecommunications (PACET) ◽

10.1109/pacet48583.2019.8956252 ◽

2019 ◽

Cited By ~ 1

Author(s):

Georgios N. Kouziokas

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

LSTM Deep Neural Networks Postfiltering for Enhancing Synthetic Voices

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141860008x ◽

2017 ◽

Vol 32 (01) ◽

pp. 1860008 ◽

Cited By ~ 8

Author(s):

Marvin Coto-Jiménez ◽

John Goddard-Close

Keyword(s):

Neural Networks ◽

Speech Synthesis ◽

Deep Neural Networks ◽

Short Term Memory ◽

Markov Models ◽

Natural Speech ◽

Objective Measures ◽

Recent Developments ◽

Small Footprint ◽

Synthetic Voices

Recent developments in speech synthesis have produced systems capable of producing speech which closely resembles natural speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. Speech synthesis based on Hidden Markov Models (HMM) is of great interest to researchers, due to its ability to produce sophisticated features with a small footprint. Despite some progress, its quality has not yet reached the level of the current predominant unit-selection approaches, which select and concatenate recordings of real speech, and work has been conducted to try to improve HMM-based systems. In this paper, we present an application of long short-term memory (LSTM) deep neural networks as a postfiltering step in HMM-based speech synthesis. Our motivation stems from a similar desire to obtain characteristics which are closer to those of natural speech. The paper analyzes four types of postfilters obtained using five voices, which range from a single postfilter to enhance all the parameters, to a multi-stream proposal which separately enhances groups of parameters. The different proposals are evaluated using three objective measures and are statistically compared to determine any significance between them. The results described in the paper indicate that HMM-based voices can be enhanced using this approach, especially for the multi-stream postfilters on the considered objective measures.

Importance-Aware Semantic Segmentation for Autonomous Driving System

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/208 ◽

2017 ◽

Cited By ~ 2

Author(s):

Bi-ke Chen ◽

Chen Gong ◽

Jian Yang

Keyword(s):

Deep Neural Networks ◽

Semantic Segmentation ◽

Autonomous Driving ◽

Learning Models ◽

Safe Driving ◽

Driving System ◽

Backward Propagation ◽

Autonomous Driving System ◽

Propagation Rules ◽

Different Levels

Semantic Segmentation (SS) partitions an image into several coherent, semantically meaningful parts and classifies each part into one of the pre-determined classes. In this paper, we argue that existing SS methods cannot be reliably applied to autonomous driving systems, as they ignore the different importance levels of distinct classes for safe driving. For example, pedestrians in the scene are much more important than sky when driving a car, so their segmentations should be as accurate as possible. To incorporate the importance information possessed by various object classes, this paper designs an "Importance-Aware Loss" (IAL) that specifically emphasizes the critical objects for autonomous driving. IAL operates under a hierarchical structure, and classes with different importance are located in different levels so that they are assigned distinct weights. Furthermore, we derive the forward and backward propagation rules for IAL and apply them to deep neural networks to realize SS in intelligent driving systems. The experiments on the CamVid and Cityscapes datasets reveal that, by employing the proposed loss function, existing deep learning models including FCN, SegNet and ENet consistently obtain improved segmentation results on the pre-defined important classes for safe driving.
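
The idea of assigning distinct weights to classes of different importance can be sketched with a plain class-weighted cross-entropy. This is a simplified stand-in, not the hierarchical IAL formulation from the paper: the function name, the flat per-class weight table, and the normalization by pixel count are all assumptions made for illustration.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean per-pixel cross-entropy, each term scaled by the weight of the true class.

    probs: per-pixel lists of predicted class probabilities
    labels: true class index for each pixel
    class_weights: hypothetical importance weight per class index
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # A mistake on a heavily weighted class (e.g. pedestrian) costs more
        # than the same mistake on a lightly weighted one (e.g. sky).
        total += -class_weights[y] * math.log(p[y])
    return total / len(labels)
```

For example, with weights `{0: 1.0, 1: 3.0}`, a pixel of class 1 predicted at probability 0.5 contributes three times as much loss as a pixel of class 0 at the same probability, which pushes training toward the class marked as more important.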

Classification of High Resolution Automotive Radar Imagery for Autonomous Driving Based on Deep Neural Networks

2019 20th International Radar Symposium (IRS) ◽

10.23919/irs.2019.8768156 ◽

2019 ◽

Author(s):

Ana Stroescu ◽

Mikhail Cherniakov ◽

Marina Gashinova

Keyword(s):

Neural Networks ◽

High Resolution ◽

Deep Neural Networks ◽

Autonomous Driving ◽

Automotive Radar ◽

Radar Imagery

Multilingual Convolutional, Long Short-Term Memory, Deep Neural Networks for Low Resource Speech Recognition

Procedia Computer Science ◽

10.1016/j.procs.2017.03.179 ◽

2017 ◽

Vol 107 ◽

pp. 842-847 ◽

Cited By ~ 8

Author(s):

Danish Bukhari ◽

Yutian Wang ◽

Hui Wang

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Deep Neural Networks ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Low Resource ◽

Long Short Term Memory
