HIGH QUALITY FACADE SEGMENTATION BASED ON STRUCTURED RANDOM FOREST, REGION PROPOSAL NETWORK AND RECTANGULAR FITTING

Author(s):  
K. Rahmani ◽  
H. Mayer

In this paper we present a pipeline for high-quality semantic segmentation of building facades using a Structured Random Forest (SRF), a Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN), and rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF. We show empirically that this is very effective, especially for doors and windows. Our pipeline is evaluated on two datasets, where we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and the rectangular fitting optimization to the accuracy of the result.

2018 ◽  
Vol 232 ◽  
pp. 01061
Author(s):  
Danhua Li ◽  
Xiaofeng Di ◽  
Xuan Qu ◽  
Yunfei Zhao ◽  
Honggang Kong

Pedestrian detection aims to localize and recognize every pedestrian instance in an image with a bounding box. The current state-of-the-art method is Faster RCNN, a network that uses a region proposal network (RPN) to generate high-quality region proposals, while Fast RCNN classifies the extracted features into the corresponding categories. The contribution of this paper is the integration of low-level and high-level features into a Faster RCNN-based pedestrian detection framework, which efficiently increases the capacity of the features. We comprehensively evaluate our framework on the Caltech pedestrian detection benchmark, where our method achieves state-of-the-art accuracy and a competitive result.


2018 ◽  
Vol 4 (9) ◽  
pp. 107 ◽  
Author(s):  
Mohib Ullah ◽  
Ahmed Mohammed ◽  
Faouzi Alaya Cheikh

Articulation modeling, feature extraction, and classification are the important components of pedestrian segmentation. Usually, these components are modeled independently of each other and then combined sequentially. However, this approach is prone to poor segmentation if any individual component is weakly designed. To cope with this problem, we propose a spatio-temporal convolutional neural network named PedNet, which exploits temporal information for spatial segmentation. The backbone of PedNet consists of an encoder–decoder network for downsampling and upsampling the feature maps, respectively. The input to the network is a set of three frames and the output is a binary mask of the segmented regions in the middle frame. Unlike classical deep models, where the convolution layers are followed by a fully connected layer for classification, PedNet is a Fully Convolutional Network (FCN). It is trained end-to-end, and the segmentation is achieved without any pre- or post-processing. The main characteristic of PedNet is its design: it performs segmentation on a frame-by-frame basis but uses temporal information from the previous and the future frame to segment the pedestrian in the current frame. Moreover, to combine the low-level features with the high-level semantic information learned by the deeper layers, we use long skip connections from the encoder to the decoder network and concatenate the output of the low-level layers with that of the higher-level layers. This approach helps to obtain a segmentation map with sharp boundaries. To show the potential benefits of temporal information, we also visualize different layers of the network. The visualization shows that the network learns different information from the consecutive frames and then combines that information to segment the middle frame.
We evaluated our approach on eight challenging datasets where humans are involved in different activities with severe articulation (football, road crossing, surveillance). On CamVid, the dataset most commonly used to measure the performance of segmentation algorithms, we compare against seven state-of-the-art methods. Performance is reported in terms of precision/recall, F1, F2, and mIoU. The qualitative and quantitative results show that PedNet achieves promising results against state-of-the-art methods, with substantial improvement in all the performance metrics.
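As a rough illustration of the metrics named above, the following sketch computes precision, recall, F-beta (F1 with `beta=1`, F2 with `beta=2`) and foreground IoU for a pair of binary masks. The function name and the flat-list mask representation are assumptions made for this example and do not come from the paper. mIoU would additionally average the IoU over classes, which is not shown here.

```python
def binary_mask_scores(pred, truth, beta=1.0):
    """Return (precision, recall, F-beta, IoU) for two flat 0/1 masks.

    Hypothetical helper for illustration; masks are flat lists of 0/1 pixels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Count true positives, false positives and false negatives per pixel.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # F-beta weights recall beta times as much as precision.
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0

    # Intersection over union of the foreground class.
    union = tp + fp + fn
    iou = tp / union if union else 0.0
    return precision, recall, f_beta, iou


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 0, 1, 1, 0]
    print(binary_mask_scores(pred, truth, beta=1.0))
    print(binary_mask_scores(pred, truth, beta=2.0))
```

F2 is typically chosen when missed pedestrian pixels (false negatives) should cost more than false alarms, since it weights recall more heavily than precision.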


Author(s):  
Jianwen Jiang ◽  
Di Bao ◽  
Ziqiang Chen ◽  
Xibin Zhao ◽  
Yue Gao

3D shape retrieval has recently attracted much attention and become a hot topic in the computer vision field. With the development of deep learning, 3D shape retrieval has also made great progress, and many view-based methods have been introduced in recent years. However, how to represent 3D shapes better is still a challenging problem. At the same time, the intrinsic hierarchical associations among views still have not been well utilized. To tackle these problems, we propose a multi-loop-view convolutional neural network (MLVCNN) framework for 3D shape retrieval. In this method, multiple groups of views are first extracted from different loop directions. Given these multiple loop views, the proposed MLVCNN framework introduces a hierarchical view-loop-shape architecture, i.e., the view level, the loop level, and the shape level, to represent 3D shapes at different scales. At the view level, a convolutional neural network is first trained to extract view features. Then, the proposed Loop Normalization and an LSTM are applied to each loop of views to generate the loop-level features, which take into account the intrinsic associations among the different views in the same loop. Finally, all the loop-level descriptors are combined into a shape-level descriptor for 3D shape representation, which is used for 3D shape retrieval. Our proposed method has been evaluated on the public 3D shape benchmark ModelNet40. Experiments and comparisons with state-of-the-art methods show that the proposed MLVCNN method achieves a significant performance improvement on 3D shape retrieval tasks: MLVCNN outperforms the state-of-the-art methods by 4.84% in mAP. We have also evaluated the proposed method on the 3D shape classification task, where MLVCNN also achieves superior performance compared with recent methods.


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1795 ◽  
Author(s):  
Xiao Lin ◽  
Dalila Sánchez-Escobedo ◽  
Josep R. Casas ◽  
Montse Pardàs

Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging these two problems into a single framework has been studied under the assumption that two highly correlated tasks may benefit each other when integrated, improving the estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed using a single RGB input image under a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separate to achieve a mutual improvement. Likewise, our approaches are evaluated under two different scenarios designed to compare our results with single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that our methodology outperforms the state of the art among single-task approaches, while obtaining competitive results compared with other multi-task methods.


2020 ◽  
Author(s):  
Fábia Isabella Pires Enembreck ◽  
Erikson Freitas de Morais ◽  
Marcella Scoczynski Ribeiro Martins

The person re-identification problem addresses the task of identifying whether a person being watched by security cameras in surveillance environments has ever been in the scene before. This problem is considered challenging, since the images obtained by cameras are subject to many variations, such as lighting, perspective and occlusions. This work aims to develop two robust approaches based on deep learning techniques for person re-identification, considering these variations. The first approach uses a Siamese neural network composed of two identical subnets. This model receives two input images that may or may not be from the same person. The second approach consists of a triplet neural network, with three identical subnets, which receives a reference image of a certain person, a second image of the same person and another image of a different person. Both approaches have identical subnets, composed of a convolutional neural network, which extracts general characteristics from each image, and an autoencoder model, responsible for handling the high variations that input images may undergo. To compare the developed networks, three datasets were used, and the accuracy and CMC curve metrics were applied in the analysis. The experiments showed an improvement in the results with the use of the autoencoder in the subnets. In addition, the triplet neural network presented promising results in comparison with the Siamese neural network and state-of-the-art methods.
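The abstract does not state which loss the triplet network is trained with. As a minimal sketch, assuming the commonly used margin-based triplet loss with Euclidean distance, the reference/same-person/different-person structure described above can be illustrated as follows. The function names, the margin value and the toy embeddings are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (an assumed choice, not confirmed by the source).

    The loss is zero once the different-person embedding is at least
    `margin` farther from the anchor than the same-person embedding.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for the outputs of the three subnets.
    anchor = [0.0, 0.0]
    same_person = [0.0, 1.0]
    other_person_far = [3.0, 4.0]
    other_person_near = [1.0, 0.0]
    print(triplet_margin_loss(anchor, same_person, other_person_far))
    print(triplet_margin_loss(anchor, same_person, other_person_near))
```

Because the three subnets are identical, a single embedding function would in practice be applied to all three images before the loss is computed.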


Author(s):  
Haitao Pu ◽  
Jian Lian ◽  
Mingqu Fan

In this paper, we propose an automatic convolutional neural network (CNN)-based method to recognize chicken behavior within a poultry farm using a Kinect sensor. It addresses the difficulties of flock behavior image classification by leveraging a data-driven mechanism and exploiting automatically extracted multi-scale image features, which combine both the local and global characteristics of the image. To the best of our knowledge, this is probably the first attempt to apply a deep learning strategy in the field of domestic animal behavior recognition. To assess the performance of our proposed method, we conducted experiments comparing it with state-of-the-art methods. Experimental results show that our proposed approach outperforms the state-of-the-art methods in both effectiveness and efficiency. Our proposed CNN architecture for recognizing the flock behavior of chickens achieves an accuracy of 99.17%.


2021 ◽  
Vol 16 ◽  
Author(s):  
Hoang V. Tran ◽  
Quang H. Nguyen

Background: Reactive oxygen species (ROS) have many roles in the body, such as cell signaling, homeostasis and protection from harmful bacteria. However, too much ROS in the body will damage lipids, proteins, and DNA. Many studies show that many environmental factors increase the amount of ROS produced in the body. Antioxidant proteins are responsible for neutralizing these ROS or free radicals. Although the amount of data on protein sequences has increased over the last two decades, we still lack bioinformatics tools that can accurately identify antioxidant protein sequences, while biochemical methods to determine antioxidant proteins are very expensive and time consuming, so a machine learning approach must be used to speed up the computation. Methods: In this study, we propose a new method that combines a convolutional neural network and Random Forest using two features, the normalized PSSM and the best selected feature of the ProtBert output. Result: Our model gave very good results on the independent test dataset, with 97.3% sensitivity and 95.9% specificity. Comparison with current state-of-the-art models shows that our model is superior. Conclusion: We have also deployed iAnt as an online website with a friendly interface, available at http://antixiodant.nguyenhongquang.edu.vn. iAnt has been developed to accurately identify antioxidant proteins. It outperforms the existing state-of-the-art methods, and it is available online.


2020 ◽  
Vol 10 (24) ◽  
pp. 8782
Author(s):  
Patrick L. Neary ◽  
Abbie T. Watnik ◽  
Kyle Peter Judd ◽  
James R. Lindle ◽  
Nicholas S. Flann

Turbulence and attenuation are signal-degrading factors that can severely hinder free-space and underwater OAM optical pattern demultiplexing. A variety of state-of-the-art convolutional neural network architectures are explored to identify which, if any, provide optimal performance under these non-ideal environmental conditions. Hyperparameter searches are performed on the architectures to ensure that near-ideal settings are used for training. Architectures are compared in various scenarios, and the best-performing ones, with their settings, are provided. We show that, among the current state-of-the-art architectures, DenseNet outperforms all others when memory is not a constraint. When memory footprint is a factor, ShuffleNet is shown to perform best.


Author(s):  
K. Rahmani ◽  
H. Huang ◽  
H. Mayer

In this paper we present a bottom-up approach for the semantic segmentation of building facades. Facades have a predefined topology, contain specific objects such as doors and windows, and follow architectural rules. Our goal is to create homogeneous segments for facade objects. To this end, we have created a pixelwise labeling method using a Structured Random Forest. The evaluation of the classifier on two datasets shows that we have achieved this goal: the method produces a nearly noise-free labeling image and performs on par with or even slightly better than the classifier-only stages of state-of-the-art approaches. This is due to the encoding of the local topological structure of the facade objects in the Structured Random Forest. Additionally, we have employed an iterative optimization approach to select the best possible labeling.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Enes Yiğit ◽  
Umut Özkaya ◽  
Şaban Öztürk ◽  
Dilbag Singh ◽  
Hassène Gritli

Power quality disturbance (PQD) analysis is essential for devices consuming electricity and for meeting today’s energy trends. This study presents an effective artificial intelligence (AI) framework for analyzing single or composite defects in power quality. A convolutional neural network (CNN) architecture, whose output is fed to a gated recurrent unit (GRU), is designed for this purpose. The proposed framework first obtains a matrix using a short-time Fourier transform (STFT) of PQD signals. This matrix contains the representation of the signal in the time and frequency domains and is suitable as CNN input. Features are automatically extracted from these matrices using the proposed CNN architecture without preprocessing. These features are classified using the GRU. The performance of the proposed framework is tested using a dataset containing a total of seven single and composite defects. The amount of noise in these examples varies between 20 and 50 dB. The performance of the proposed method is higher than that of current state-of-the-art methods. The proposed method obtained 98.44% ACC, 98.45% SEN, 99.74% SPE, 98.45% PRE, 98.45% F1-score, 98.19% MCC, and 93.64% kappa metric. A novel power quality disturbance (PQD) system has been proposed, and its application has been presented in our study. The proposed system could be used in industrial and factory settings.
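The STFT step described above, which turns a one-dimensional signal into a time-frequency matrix, can be sketched as follows. This is a plain-Python illustration under assumed settings (a Hann window, a frame length of 64 samples, a hop of 32 samples) and is not the authors' configuration. The function name and the test signal are hypothetical.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of DFT magnitudes for bins 0..frame_len//2.

    Illustrative only: window type, frame length and hop are assumed values.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        # Direct DFT of the windowed frame, keeping non-negative frequencies.
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames


if __name__ == "__main__":
    # Hypothetical test tone that completes 8 cycles per 64-sample frame.
    tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    spec = stft_magnitude(tone)
    print(len(spec), len(spec[0]))
    print(max(range(len(spec[0])), key=lambda k: spec[0][k]))
```

Each row of the returned matrix corresponds to one time frame and each column to one frequency bin, which is the kind of two-dimensional layout a CNN can take as input. A production pipeline would normally use an FFT-based routine rather than this direct DFT.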

