Building pose estimation from the perspective of UAVs based on CNNs

Author(s):  
Zhenni Wu ◽  
Hengxin Chen ◽  
Bin Fang ◽  
Zihao Li ◽  
Xinrun Chen

With the rapid development of computer technology, building pose estimation combined with Augmented Reality (AR) can play a crucial role in urban planning and architectural design. For example, a virtual building model can be placed into a real scene captured by an Unmanned Aerial Vehicle (UAV) to check visually whether the building integrates well with its surroundings, so that the design can be optimized. In this work, we contribute a building dataset for pose estimation named BD3D. To obtain accurate building poses, we use a physical camera in Unity3D, which can simulate realistic cameras, to reproduce the UAV's perspective, and we use virtual building models as objects. We propose a novel neural network that combines the MultiBin module with the PoseNet architecture to estimate the building pose. Buildings are sometimes symmetric, and the resulting ambiguity gives different surfaces similar features, making it difficult for CNNs to learn features that distinguish those surfaces. We propose a generalized world coordinate system repositioning strategy to deal with this problem. We evaluate our network with the strategy on BD3D, and the angle error is reduced to [Formula: see text] from [Formula: see text]. Code and dataset are available at: https://github.com/JellyFive/Building-pose-estimation-from-the-perspective-of-UAVs-based-on-CNNs .
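As a rough illustration of how a MultiBin-style orientation output is usually turned back into an angle, the sketch below picks the most confident bin and adds the offset encoded by that bin's (cos, sin) residual. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the evenly spaced bin centers, and the example values are all hypothetical, and the repository linked above should be consulted for the actual network.

```python
import math


def bin_centers(num_bins):
    """Evenly spaced bin centers over [0, 2*pi). Assumed layout, for illustration."""
    return [i * 2.0 * math.pi / num_bins for i in range(num_bins)]


def decode_multibin(confidences, residuals):
    """Recover one angle (radians) from MultiBin-style outputs.

    confidences: one score per bin.
    residuals:   one (cos, sin) pair per bin, encoding the offset from that
                 bin's center.
    """
    centers = bin_centers(len(confidences))
    # Pick the bin the network is most confident about.
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    cos_r, sin_r = residuals[best]
    # atan2 turns the (cos, sin) pair back into a signed offset.
    offset = math.atan2(sin_r, cos_r)
    return (centers[best] + offset) % (2.0 * math.pi)


def angle_error_deg(pred, target):
    """Smallest absolute difference between two angles (radians in, degrees out)."""
    diff = (pred - target + math.pi) % (2.0 * math.pi) - math.pi
    return abs(math.degrees(diff))


# Hypothetical network output with four bins; bin 1 (center pi/2) wins.
confidences = [0.1, 0.7, 0.1, 0.1]
residuals = [(1.0, 0.0), (math.cos(0.2), math.sin(0.2)), (1.0, 0.0), (1.0, 0.0)]
predicted = decode_multibin(confidences, residuals)
```

The wrap-around in `angle_error_deg` matters for evaluation: a prediction of 350 degrees against a ground truth of 10 degrees is a 20 degree error, not 340.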

Author(s):  
Xinyao Sun ◽  
Anup Basu ◽  
Irene Cheng

Hand pose estimation for a continuous sequence has been an important topic not only in computer vision but also in human-computer interaction. Exploring the feasibility of using hand gestures to replace input devices, e.g., mouse, keyboard, joystick and touch screen, has attracted increasing attention from academic and industrial researchers. The fast advancement of hand pose estimation techniques is complemented by the rapid development of smart sensor technology such as Kinect and Leap. We introduce a multi-sensor hand pose estimation system. Two tracking models are proposed based on a Deep (Recurrent) Neural Network (DRNN) architecture. Data captured from different sensors are analyzed and fused to produce an optimal hand pose sequence. Experimental results show that our models outperform previous methods in accuracy while meeting real-time application requirements. Performance comparisons between DNN and DRNN, spatial and spatial-temporal features, and single- and dual-sensor setups are also presented.


2018 ◽  
Vol 1 (3) ◽  
pp. 76
Author(s):  
Bowen Hou

The rapid development of computer technology has accelerated the progress of construction technology, and virtual reality technology is now applied more and more widely, which has caused profound changes in the thinking and practice of traditional architectural design. It plays an important role in optimizing construction design schemes and in making architectural design more scientific and rational. To apply virtual reality technology effectively in future architectural design, research on its application must be intensified so that its value and advantages are fully realized. This paper discusses and analyzes the application and realization of virtual reality technology in future architectural design, and predicts its application prospects.


2020 ◽  
Vol 39 (6) ◽  
pp. 8927-8935
Author(s):  
Bing Zheng ◽  
Dawei Yun ◽  
Yan Liang

Under the impact of COVID-19, research on behavior recognition is highly needed. In this paper, we combine a self-adaptive coder with a recurrent neural network for behavior pattern recognition. At present, most research on human behavior recognition focuses on video data, which is based on the video number. At the same time, video image data are complex, and using them can easily violate personal privacy. With the rapid development of Internet of Things technology, this field has attracted the attention of a large number of experts and scholars. Researchers have tried many machine learning methods, such as random forests, support vector machines and other shallow learning methods, which perform well in the laboratory environment but are still far from practical application. In this paper, a recurrent neural network algorithm based on long short-term memory (LSTM) is proposed to recognize behavior patterns and thereby improve the accuracy of human activity recognition.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Sangmin Jeon ◽  
Kyungmin Clara Lee

Objective: The rapid development of artificial intelligence technologies for medical imaging has recently enabled automatic identification of anatomical landmarks on radiographs. The purpose of this study was to compare the results of an automatic cephalometric analysis using a convolutional neural network with those obtained by a conventional cephalometric approach. Material and methods: Cephalometric measurements of lateral cephalograms from 35 patients were obtained using an automatic program and a conventional program. Fifteen skeletal cephalometric measurements, nine dental cephalometric measurements, and two soft tissue cephalometric measurements obtained by the two methods were compared using the paired t test and Bland-Altman plots. Results: The paired t test comparing the automatic and conventional cephalometric analyses showed statistically significant differences in the saddle angle and in the linear measurements of maxillary incisor to NA line and mandibular incisor to NB line. All measurements were within the limits of agreement based on the Bland-Altman plots. The limits of agreement were wider for the dental measurements than for the skeletal measurements. Conclusions: Automatic cephalometric analyses based on a convolutional neural network may offer clinically acceptable diagnostic performance. Careful consideration and additional manual adjustment are needed for dental measurements regarding tooth structures for higher accuracy and better performance.
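For readers unfamiliar with Bland-Altman limits of agreement, the sketch below shows the conventional calculation: the mean of the paired differences (the bias) plus and minus 1.96 sample standard deviations. It is an illustrative sketch only. The function name and the measurement values are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean difference between the two methods; the limits of
    agreement are bias +/- 1.96 sample standard deviations of the differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings of one angle (degrees) from two programs.
automatic = [82.1, 79.4, 85.0, 80.2, 83.3]
conventional = [81.5, 79.9, 84.1, 80.0, 82.6]
bias, lower, upper = bland_altman_limits(automatic, conventional)
```

A wider interval between `lower` and `upper` means the two methods disagree more from patient to patient, which is the sense in which the abstract reports wider limits for dental than for skeletal measurements.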


2021 ◽  
Vol 13 ◽  
pp. 175682932110048
Author(s):  
Huajun Song ◽  
Yanqi Wu ◽  
Guangbing Zhou

With the rapid development of drones, many problems have arisen, such as invasion of privacy and threats to security. To achieve effective detection and robust tracking of small targets such as unmanned aerial vehicles, a biologically inspired binocular vision detection system is designed. The system is composed of a long-focus camera and a wide-angle camera, a servo pan-tilt unit, and dual processors for detecting and identifying targets. Because the spatio-temporal context target tracking algorithm cannot adapt to scale changes and tends to lose the target in complex scenes, a scale filter and a loss criterion are introduced to improve it. Qualitative and quantitative experiments show that the designed system adapts to scale changes and partial occlusion during detection and meets real-time requirements. Both the hardware system and the algorithm have reference value for anti-unmanned aerial vehicle systems.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 816
Author(s):  
Pingping Liu ◽  
Xiaokang Yang ◽  
Baixin Jin ◽  
Qiuzhan Zhou

Diabetic retinopathy (DR) is a common complication of diabetes mellitus (DM), and it is necessary to diagnose DR in the early stages of treatment. With the rapid development of convolutional neural networks in the field of image processing, deep learning methods have achieved great success in medical image processing. Various medical lesion detection systems have been proposed to detect fundus lesions. At present, image classification of diabetic retinopathy ignores the fine-grained properties of the diseased image, and most retinopathy image data sets have seriously uneven class distributions, which greatly limits the ability of the network to classify lesions. We propose a new non-homologous bilinear pooling convolutional neural network model and combine it with the attention mechanism to further improve the network's ability to extract specific features of the image. The experimental results show that, compared with the most popular fundus image classification models, the proposed network model can greatly improve prediction accuracy while maintaining computational efficiency.
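The generic bilinear pooling operation that such models build on can be sketched as follows: the features from two streams are combined by an outer product at each spatial location, summed over locations, then passed through a signed square root and L2 normalization. This is a minimal sketch of the generic operation, assuming two plain feature lists. It does not reproduce the authors' non-homologous architecture or attention mechanism, and the function name and inputs are hypothetical.

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Bilinear pooling of two feature streams.

    feats_a: list of per-location feature vectors from stream A (each length m).
    feats_b: list of per-location feature vectors from stream B (each length n).
    Returns a flat, L2-normalized descriptor of length m * n.
    """
    m, n = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (m * n)
    # Sum the outer product of the two feature vectors over all locations.
    for a, b in zip(feats_a, feats_b):
        for i in range(m):
            for j in range(n):
                pooled[i * n + j] += a[i] * b[j]
    # Signed square root keeps the sign while compressing large values.
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / norm for v in signed] if norm > 0 else signed


# Hypothetical features at two spatial locations.
stream_a = [[1.0, 0.0], [0.0, 2.0]]
stream_b = [[3.0], [4.0]]
descriptor = bilinear_pool(stream_a, stream_b)
```

Because every channel of one stream interacts with every channel of the other, the descriptor captures pairwise feature co-occurrences, which is why bilinear pooling is commonly used for fine-grained classification.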


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1792
Author(s):  
Juan Hagad ◽  
Tsukasa Kimura ◽  
Ken-ichi Fukui ◽  
Masayuki Numao

Two of the biggest challenges in building models for detecting emotions from electroencephalography (EEG) devices are the relatively small amount of labeled samples and the strong variability of signal feature distributions between different subjects. In this study, we propose a context-generalized model that tackles the data constraints and subject variability simultaneously using a deep neural network architecture optimized for normally distributed subject-independent feature embeddings. Variational autoencoders (VAEs) at the input level allow the lower feature layers of the model to be trained on both labeled and unlabeled samples, maximizing the use of the limited data resources. Meanwhile, variational regularization encourages the model to learn Gaussian-distributed feature embeddings, resulting in robustness to small dataset imbalances. Subject-adversarial regularization applied to the bi-lateral features further enforces subject-independence on the final feature embedding used for emotion classification. The results from subject-independent performance experiments on the SEED and DEAP EEG-emotion datasets show that our model generalizes better across subjects than other state-of-the-art feature embeddings when paired with deep learning classifiers. Furthermore, qualitative analysis of the embedding space reveals that our proposed subject-invariant bi-lateral variational domain adversarial neural network (BiVDANN) architecture may improve the subject-independent performance by discovering normally distributed features.
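The variational regularization mentioned above is typically implemented as a Kullback-Leibler penalty that pulls each embedding toward a standard normal distribution. The sketch below shows that standard closed-form term for a diagonal Gaussian. It is an assumption that the paper uses exactly this form, and the function name and example values are hypothetical.

```python
import math


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the standard normal N(0, I).

    mu and log_var are per-dimension mean and log-variance of the embedding.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# An embedding that already matches N(0, I) incurs no penalty.
no_penalty = gaussian_kl([0.0, 0.0], [0.0, 0.0])
# Shifting one mean to 1.0 costs 0.5 nats.
shifted = gaussian_kl([1.0], [0.0])
```

In training, this term would be added to the reconstruction and classification losses with some weight, so the encoder trades off fidelity against keeping the embedding distribution close to Gaussian.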


Author(s):  
Baiyu Peng ◽  
Qi Sun ◽  
Shengbo Eben Li ◽  
Dongsuk Kum ◽  
Yuming Yin ◽  
...  

Recent years have seen the rapid development of autonomous driving systems, which are typically designed in a hierarchical architecture or an end-to-end architecture. The hierarchical architecture is always complicated and hard to design, while the end-to-end architecture is more promising due to its simple structure. This paper puts forward an end-to-end autonomous driving method based on the deep reinforcement learning algorithm Dueling Double Deep Q-Network, making it possible for the vehicle to learn end-to-end driving by itself. First, this paper proposes an architecture for the end-to-end lane-keeping task. Unlike the traditional image-only state space, the presented state space is composed of both camera images and vehicle motion information. Second, a corresponding dueling neural network structure is introduced, which reduces the variance and improves sampling efficiency. Third, the proposed method is applied to The Open Racing Car Simulator (TORCS) to demonstrate its performance, where it surpasses human drivers. Finally, the saliency map of the neural network is visualized, which indicates that the trained network drives by observing the lane lines. A video of the presented work is available online at https://youtu.be/76ciJmIHMD8 or https://v.youku.com/v_show/id_XNDM4ODc0MTM4NA==.html.
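The two ideas combined in a Dueling Double Deep Q-Network can be sketched in a few lines: the dueling head splits Q-values into a state value plus mean-centered advantages, and the double Q-learning target selects the next action with the online network but evaluates it with the target network. This is a minimal sketch of the standard formulations, not the paper's code. The function names and numbers are hypothetical, and the neural networks are replaced by plain lists of their outputs.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a), subtracting the mean advantage."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, next_q_online, next_q_target, done):
    """Double Q-learning target for one transition.

    next_q_online: Q-values of the next state from the online network,
                   used only to choose the action.
    next_q_target: Q-values of the next state from the target network,
                   used to evaluate the chosen action.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    return reward + gamma * next_q_target[best_action]


# Hypothetical outputs for a state with three steering actions.
q_values = dueling_q_values(1.0, [1.0, 2.0, 3.0])
target = double_dqn_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0], done=False)
```

In the usage example, the online network prefers action 1, so the target uses the target network's value for action 1 (2.0) rather than its own maximum (4.0), which is how double Q-learning reduces overestimation.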

