Global Contextual Dependency Network for Object Detection

2022 ◽  
Vol 14 (1) ◽  
pp. 27
Author(s):  
Junda Li ◽  
Chunxu Zhang ◽  
Bo Yang

Current two-stage object detectors extract the local visual features of Regions of Interest (RoIs) for object recognition and bounding-box regression. However, using only local visual features loses the global contextual dependencies that help to recognize objects with featureless appearances and to suppress false detections. To tackle the problem, a simple framework, named Global Contextual Dependency Network (GCDN), is presented to enhance the classification ability of two-stage detectors. Our GCDN mainly consists of two components, the Context Representation Module (CRM) and the Context Dependency Module (CDM). Specifically, a CRM is proposed to construct multi-scale context representations. With CRM, contextual information can be fully explored at different scales. Moreover, the CDM is designed to capture global contextual dependencies. Our GCDN includes multiple CDMs. Each CDM utilizes local Region of Interest (RoI) features and a single-scale context representation to generate single-scale contextual RoI features via the attention mechanism. Finally, the contextual RoI features generated independently by the parallel CDMs are combined with the original RoI features to help classification. Experiments on the MS-COCO 2017 benchmark dataset show that our approach brings consistent improvements for two-stage detectors.
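The attention step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain scaled dot-product attention with each RoI feature as the query and the context vectors of one scale as keys and values, and it assumes the per-scale outputs are fused with the original RoI features by simple addition. The function names and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_roi_features(roi_feats, context_feats):
    """Attend from each RoI feature (query) to every context vector (key and value).

    roi_feats: list of N vectors of length d (local RoI features).
    context_feats: list of M vectors of length d (one context scale).
    Returns N context-aware vectors of length d.
    """
    d = len(roi_feats[0])
    out = []
    for q in roi_feats:
        # Scaled dot-product similarity between this RoI and each context vector.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context_feats]
        weights = softmax(scores)
        # Attention-weighted sum of the context vectors.
        out.append([sum(w * k[i] for w, k in zip(weights, context_feats)) for i in range(d)])
    return out


def fuse(roi_feats, per_scale_contextual):
    """Add the contextual features of every scale to the original RoI features.

    Addition is an assumed fusion rule; the abstract only says the features are combined.
    """
    fused = [list(v) for v in roi_feats]
    for scale_feats in per_scale_contextual:
        for n, vec in enumerate(scale_feats):
            for i, x in enumerate(vec):
                fused[n][i] += x
    return fused


rois = [[1.0, 0.0], [0.0, 1.0]]
scales = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # hypothetical fine-scale context vectors
    [[0.5, 0.5]],                          # hypothetical coarse-scale context vector
]
# One attention pass per scale, run independently, then fused with the RoI features.
contextual = [contextual_roi_features(rois, ctx) for ctx in scales]
fused = fuse(rois, contextual)
```

Each entry of `contextual` plays the role of one parallel module operating on a single scale, so adding a scale only adds one more list to `scales`.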

Author(s):  
Xinhai Liu ◽  
Zhizhong Han ◽  
Yu-Shen Liu ◽  
Matthias Zwicker

Exploring contextual information in the local region is important for shape understanding and analysis. Existing studies often employ hand-crafted or explicit ways to encode contextual information of local regions. However, it is hard to capture fine-grained contextual information in hand-crafted or explicit manners, such as the correlation between different areas in a local region, which limits the discriminative ability of learned features. To resolve this issue, we propose a novel deep learning model for 3D point clouds, named Point2Sequence, to learn 3D shape features by capturing fine-grained contextual information in a novel implicit way. Point2Sequence employs a novel sequence learning model for point clouds to capture the correlations by aggregating multi-scale areas of each local region with attention. Specifically, Point2Sequence first learns the feature of each area scale in a local region. Then, it captures the correlation between area scales in the process of aggregating all area scales using a recurrent neural network (RNN) based encoder-decoder structure, where an attention mechanism is proposed to highlight the importance of different area scales. Experimental results show that Point2Sequence achieves state-of-the-art performance in shape classification and segmentation tasks.


Author(s):  
Yiran Zhu ◽  
Xing Xu ◽  
Fumin Shen ◽  
Yanli Ji ◽  
Lianli Gao ◽  
...  

Graph neural networks (GNNs) have been widely used in the 3D human pose estimation task, since the pose representation of a human body can be naturally modeled by the graph structure. Generally, most of the existing GNN-based models utilize the restricted receptive fields of filters and single-scale information, while neglecting the valuable multi-scale contextual information. To tackle this issue, we propose a novel Graph Transformer Encoder-Decoder with Atrous Convolution, named PoseGTAC, to effectively extract multi-scale context and long-range information. In our proposed PoseGTAC model, Graph Atrous Convolution (GAC) and Graph Transformer Layer (GTL), respectively for the extraction of local multi-scale and global long-range information, are combined and stacked in an encoder-decoder structure, where graph pooling and unpooling are adopted for the interaction of multi-scale information from local to global (e.g., part-scale and body-scale). Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that the proposed PoseGTAC model exceeds all previous methods and achieves state-of-the-art performance.


Author(s):  
Yutong Yan ◽  
Pierre-Henri Conze ◽  
Gwenolé Quellec ◽  
Mathieu Lamard ◽  
Beatrice Cochener ◽  
...  

Vibration ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. 49-63
Author(s):  
Waad Subber ◽  
Sayan Ghosh ◽  
Piyush Pandita ◽  
Yiming Zhang ◽  
Liping Wang

Industrial dynamical systems often exhibit multi-scale responses due to material heterogeneity and complex operating conditions. The smallest length scale of the system's dynamics controls the numerical resolution required to resolve the embedded physics. In practice, however, high numerical resolution is only required in a confined region of the domain where fast dynamics or localized material variability is exhibited, whereas a coarser discretization can be sufficient in the rest of the domain. Partitioning the complex dynamical system into smaller, easier-to-solve problems based on the localized dynamics and material variability can reduce the overall computational cost. The region of interest can be specified based on the localized features of the solution, user interest, and correlation length of the material properties. For problems where a region of interest is not evident, Bayesian inference can provide a feasible solution. In this work, we employ a Bayesian framework to update the prior knowledge of the localized region of interest using measurements of the system response. Once the region of interest is identified, the localized uncertainty is propagated forward through the computational domain. We demonstrate our framework using numerical experiments on a three-dimensional elastodynamic problem.


2020 ◽  
Vol 13 (1) ◽  
pp. 60
Author(s):  
Chenjie Wang ◽  
Chengyuan Li ◽  
Jun Liu ◽  
Bin Luo ◽  
Xin Su ◽  
...  

Most scenes in practical applications are dynamic scenes containing moving objects, so accurately segmenting moving objects is crucial for many computer vision applications. In order to efficiently segment all the moving objects in the scene, regardless of whether the object has a predefined semantic label, we propose a two-level nested octave U-structure network with a multi-scale attention mechanism, called U2-ONet. U2-ONet takes two RGB frames, the optical flow between these frames, and the instance segmentation of the frames as inputs. Each stage of U2-ONet is filled with the newly designed octave residual U-block (ORSU block) to enhance the ability to obtain more contextual information at different scales while reducing the spatial redundancy of the feature maps. In order to efficiently train the multi-scale deep network, we introduce a hierarchical training supervision strategy that calculates the loss at each level while adding a knowledge-matching loss to keep the optimization consistent. The experimental results show that the proposed U2-ONet method achieves state-of-the-art performance on several general moving object segmentation datasets.


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5137
Author(s):  
Elham Eslami ◽  
Hae-Bum Yun

Automated pavement distress recognition is a key step in smart infrastructure assessment. Advances in deep learning and computer vision have improved the automated recognition of pavement distresses in road surface images. This task remains challenging due to the high variation of defects in shapes and sizes, demanding a better incorporation of contextual information into deep networks. In this paper, we show that an attention-based multi-scale convolutional neural network (A+MCNN) improves the automated classification of common distress and non-distress objects in pavement images by (i) encoding contextual information through multi-scale input tiles and (ii) employing a mid-fusion approach with an attention module for heterogeneous image contexts from different input scales. A+MCNN is trained and tested with four distress classes (crack, crack seal, patch, pothole), five non-distress classes (joint, marker, manhole cover, curbing, shoulder), and two pavement classes (asphalt, concrete). A+MCNN is compared with four deep classifiers that are widely used in transportation applications and a generic CNN classifier (as the control model). The results show that A+MCNN consistently outperforms the baselines by 1% to 26% on average in terms of the F-score. A comprehensive discussion is also presented regarding how these classifiers perform differently on different road objects, which has been rarely addressed in the existing literature.


Author(s):  
Djamel Guessoum ◽  
Moeiz Miraoui ◽  
Chakib Tadj

Purpose: This paper aims to apply contextual case-based reasoning (CBR) to a mobile device. The CBR method was chosen because it does not require training, demands minimal processing resources and easily integrates with the dynamic and uncertain nature of pervasive computing. Based on a mobile user’s location and activity, which can be determined through the device’s inertial sensors and GPS capabilities, it is possible to select and offer appropriate services to this user. Design/methodology/approach: The proposed approach comprises two stages. The first stage uses simple semantic similarity measures to retrieve the case from the case base that best matches the current case. In the second stage, the obtained selection of services is then filtered based on current contextual information. Findings: This two-stage method adds a higher level of relevance to the services proposed to the user, yet it is easy to implement on a mobile device. Originality/value: A two-stage CBR approach that uses light processing methods and generates context-aware services is discussed. Ontological location modeling adds reasoning flexibility and knowledge-sharing capabilities.
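The two-stage procedure described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: Jaccard overlap of feature sets stands in for the paper's unspecified "simple semantic similarity measures", and the case layout, the `requires` field and all example values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieve_best_case(case_base, current):
    """Stage 1: return the stored case most similar to the current case."""
    return max(case_base, key=lambda case: jaccard(case["features"], current["features"]))


def filter_services(services, context):
    """Stage 2: keep only services whose requirements are met by the current context."""
    return [s for s in services if s["requires"] <= context]


# Hypothetical case base: each case pairs a situation with the services offered in it.
case_base = [
    {"features": {"location:home", "activity:resting"},
     "services": [{"name": "music", "requires": {"wifi"}},
                  {"name": "news", "requires": set()}]},
    {"features": {"location:street", "activity:walking"},
     "services": [{"name": "navigation", "requires": {"gps"}},
                  {"name": "video", "requires": {"wifi"}}]},
]

current = {"features": {"location:street", "activity:walking", "time:evening"}}
context = {"gps"}  # hypothetical capabilities available right now

best = retrieve_best_case(case_base, current)
offered = [s["name"] for s in filter_services(best["services"], context)]
```

Neither stage needs training, which matches the abstract's stated reason for choosing CBR on a resource-limited device.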


2021 ◽  
Vol 16 (1) ◽  
pp. 71-94
Author(s):  
Hairi Karim ◽  
Alias Abdul Rahman ◽  
Suhaibah Azri ◽  
Zurairah Halim

The CityGML model is now the norm for smart city or digital twin city development for better planning, management, risk-related modelling and other applications. CityGML comes with five levels of detail (LoD), mainly constructed from point cloud measurements and images of several systems, resulting in a variety of accuracies and detailed models. The LoDs, also known as pre-defined multi-scale models, consume more storage, memory and graphics resources than single-scale models. Furthermore, these multi-scale models have redundant geometries and attributes, are costly in terms of time and workload to update, and are difficult to view in a single viewer. It is essential for data owners to adopt a suitable multi-scale spatial management solution that minimizes the drawbacks of the current implementation. The proper construction, control and management of multi-scale models are needed to encourage and expedite data sharing among data owners, agencies, stakeholders and public users for efficient information retrieval and analyses. This paper discusses the construction of the CityGML model with different LoDs using several datasets. A scale unique ID is introduced to connect all respective LoDs for cross-LoD information queries within a single viewer. The paper also highlights the benefits of intermediate outputs and limitations of the proposed solution, as well as suggestions for the future.
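The idea of a scale unique ID that links the LoD models of one object can be sketched as a simple index. The record layout, the `scale_uid` field name and the attribute values are hypothetical and are not taken from the paper or from the CityGML schema.

```python
from collections import defaultdict

# Hypothetical records: every LoD model of the same building shares one scale-unique ID.
records = [
    {"scale_uid": "B-001", "lod": 1, "height_m": 12.0},
    {"scale_uid": "B-001", "lod": 2, "roof_type": "gabled"},
    {"scale_uid": "B-002", "lod": 1, "height_m": 7.5},
]


def index_by_scale_uid(records):
    """Group records as {scale_uid: {lod: record}} so LoDs of one object stay linked."""
    index = defaultdict(dict)
    for rec in records:
        index[rec["scale_uid"]][rec["lod"]] = rec
    return index


def cross_lod_query(index, scale_uid, attribute):
    """Return {lod: value} for every LoD of the object that stores the attribute."""
    lods = index.get(scale_uid, {})
    return {lod: rec[attribute] for lod, rec in sorted(lods.items()) if attribute in rec}


index = index_by_scale_uid(records)
roof = cross_lod_query(index, "B-001", "roof_type")
height = cross_lod_query(index, "B-001", "height_m")
```

Because an attribute is looked up across all linked LoDs, it only has to be stored once, which is one way the redundancy mentioned in the abstract could be reduced.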


2021 ◽  
Vol 7 (10) ◽  
pp. 850
Author(s):  
Veena Mayya ◽  
Sowmya Kamath Shevgoor ◽  
Uma Kulkarni ◽  
Manali Hazarika ◽  
Prabal Datta Barua ◽  
...  

Microbial keratitis is an infection of the cornea of the eye that is commonly caused by prolonged contact lens wear, corneal trauma, pre-existing systemic disorders and other ocular surface disorders. It can result in severe visual impairment if improperly managed. According to the latest World Vision Report, at least 4.2 million people worldwide suffer from corneal opacities caused by infectious agents such as fungi, bacteria, protozoa and viruses. In patients with fungal keratitis (FK), overt symptoms are often not evident until an advanced stage. Furthermore, it has been reported that clear discrimination between bacterial keratitis and FK is challenging even for trained corneal experts, with misdiagnosis in more than 30% of cases. However, if FK is diagnosed early, vision impairment can be prevented through early, cost-effective interventions. In this work, we propose a multi-scale convolutional neural network (MS-CNN) for accurate segmentation of the corneal region to enable early FK diagnosis. The proposed approach consists of a deep neural pipeline for corneal region segmentation followed by a ResNeXt model to differentiate between FK and non-FK classes. The model, trained on the segmented images in the region of interest, achieved a diagnostic accuracy of 88.96%. The features learnt by the model indicate that it can correctly identify dominant corneal lesions for detecting FK.

