Planar Prior Assisted PatchMatch Multi-View Stereo

2020 ◽  
Vol 34 (07) ◽  
pp. 12516-12523
Author(s):  
Qingshan Xu ◽  
Wenbing Tao

The completeness of 3D models is still a challenging problem in multi-view stereo (MVS) due to the unreliable photometric consistency in low-textured areas. Since low-textured areas usually exhibit strong planarity, planar models are advantageous to the depth estimation of low-textured areas. On the other hand, PatchMatch multi-view stereo is very efficient due to its sampling and propagation scheme. By taking advantage of planar models and PatchMatch multi-view stereo, we propose a planar prior assisted PatchMatch multi-view stereo framework in this paper. Specifically, we utilize a probabilistic graphical model to embed planar models into PatchMatch multi-view stereo and contribute a novel multi-view aggregated matching cost. This novel cost takes both photometric consistency and planar compatibility into consideration, making it suited to the depth estimation of both non-planar and planar regions. Experimental results demonstrate that our method can efficiently recover the depth information of extremely low-textured areas, thus obtaining highly complete 3D models and achieving state-of-the-art performance.
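A matching cost that blends photometric consistency with planar compatibility can be sketched as follows. This is a minimal illustration only: the Gaussian-style planar penalty and the parameters `sigma` and `alpha` are assumptions, not the paper's exact formulation.

```python
import math

def aggregated_cost(photo_cost, depth, planar_depth, sigma=0.3, alpha=0.5):
    """Illustrative multi-view aggregated matching cost.

    Combines a photometric matching cost with a planar-compatibility
    term that grows as the hypothesized depth deviates from the depth
    induced by a planar prior. `sigma` and `alpha` are hypothetical
    tuning parameters for this sketch.
    """
    planar_term = 1.0 - math.exp(-((depth - planar_depth) ** 2) / (2.0 * sigma ** 2))
    return (1.0 - alpha) * photo_cost + alpha * planar_term
```

With this blend, a depth hypothesis lying exactly on the prior plane pays no planar penalty, while hypotheses far from the plane are penalized even when their photometric cost is low, which is what stabilizes estimation in low-textured regions.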

2021 ◽  
Vol 11 (12) ◽  
pp. 5383
Author(s):  
Huachen Gao ◽  
Xiaoyu Liu ◽  
Meixia Qu ◽  
Shijie Huang

In recent studies, self-supervised learning methods have been explored for monocular depth estimation. They minimize the reconstruction loss of images, instead of depth information, as a supervised signal. However, existing methods usually assume that the corresponding points in different views should have the same color, which leads to unreliable unsupervised signals and ultimately damages the reconstruction loss during training. Meanwhile, in low-textured regions, such methods cannot predict the disparity values of pixels correctly because few features can be extracted there. To solve the above issues, we propose a network, PDANet, that integrates perceptual consistency and data augmentation consistency, which are more reliable unsupervised signals, into a regular unsupervised depth estimation model. Specifically, we apply a reliable data augmentation mechanism to minimize the loss between the disparity maps generated from the original image and the augmented image, which enhances the robustness of the prediction to color fluctuations. At the same time, we aggregate the features of different layers extracted by a pre-trained VGG16 network to explore the higher-level perceptual differences between the input image and the generated one. Ablation studies demonstrate the effectiveness of each component, and PDANet shows high-quality depth estimation results on the KITTI benchmark, improving on the state-of-the-art method by reducing the absolute relative error of depth estimation from 0.114 to 0.084.
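The data-augmentation consistency idea can be sketched as a simple L1 agreement term between the two predicted disparity maps. This is a hedged illustration of the principle, not PDANet's actual loss implementation:

```python
import numpy as np

def augmentation_consistency_loss(disp_original, disp_augmented):
    # L1 agreement between the disparity maps predicted from the original
    # image and from its color-augmented copy: photometric augmentation
    # should leave the scene geometry, and hence the disparity, unchanged.
    disp_original = np.asarray(disp_original, dtype=float)
    disp_augmented = np.asarray(disp_augmented, dtype=float)
    return float(np.mean(np.abs(disp_original - disp_augmented)))
```

Because color jitter does not move pixels, any disagreement between the two maps is attributable to the network's sensitivity to color, which this term suppresses.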


Author(s):  
Dat Ba Nguyen ◽  
Martin Theobald ◽  
Gerhard Weikum

Methods for Named Entity Recognition and Disambiguation (NERD) perform NER and NED in two separate stages. Therefore, NED may be penalized with respect to precision by NER false positives, and suffers in recall from NER false negatives. Conversely, NED does not fully exploit information computed by NER such as types of mentions. This paper presents J-NERD, a new approach to perform NER and NED jointly, by means of a probabilistic graphical model that captures mention spans, mention types, and the mapping of mentions to entities in a knowledge base. We present experiments with different kinds of texts from the CoNLL’03, ACE’05, and ClueWeb’09-FACC1 corpora. J-NERD consistently outperforms state-of-the-art competitors in end-to-end NERD precision, recall, and F1.
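The advantage of joint over pipelined decoding can be shown with a toy example. The candidates, scores, and entity names below are invented for illustration; J-NERD's actual model is a probabilistic graphical model over mention spans, types, and entity assignments, not this two-number sum.

```python
def joint_nerd(candidates):
    """Toy contrast between pipeline and joint NER+NED decoding.

    Each candidate is (mention, type_score, entity, link_score).
    A pipeline commits to the best NER reading first and links
    afterwards; joint decoding maximizes a combined objective.
    """
    pipeline = max(candidates, key=lambda c: c[1])      # NER decided first; NED inherits it
    joint = max(candidates, key=lambda c: c[1] + c[3])  # joint objective over both stages
    return pipeline, joint
```

In the test below, the pipeline locks in the higher-scoring NER reading and is then stuck with a poor link, while the joint objective recovers the globally better entity.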


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2691
Author(s):  
Seung-Jun Hwang ◽  
Sung-Jun Park ◽  
Gyu-Min Kim ◽  
Joong-Hwan Baek

A colonoscopy is a medical examination used to check for disease or abnormalities in the large intestine. If necessary, polyps or adenomas are removed through the scope during the colonoscopy, which can prevent colorectal cancer. However, the polyp detection rate differs depending on the condition and skill level of the endoscopist; some endoscopists even have a 90% chance of missing an adenoma. Artificial intelligence and robot technologies for colonoscopy are being studied to compensate for these problems. In this study, we propose self-supervised monocular depth estimation using spatiotemporal consistency in the colon environment. Our contributions are a loss function for reconstruction errors between adjacent predicted depths and a depth feedback network that uses the predicted depth information of the previous frame to predict the depth of the next frame. We performed quantitative and qualitative evaluations of our approach, and the proposed FBNet (depth FeedBack Network) outperformed state-of-the-art results for unsupervised depth estimation on the UCL datasets.
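The depth-feedback idea, i.e. conditioning the next frame's prediction on the previous frame's predicted depth, can be sketched as stacking that depth as an extra input channel. This is a hypothetical input layout; the actual FBNet architecture may wire the feedback differently.

```python
import numpy as np

def depth_feedback_input(frame_rgb, prev_depth):
    # Feed the previous frame's predicted depth back as a fourth input
    # channel, so the network predicting the current frame's depth is
    # conditioned on it (an assumed layout for illustration).
    return np.concatenate([frame_rgb, prev_depth[..., None]], axis=-1)
```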


2020 ◽  
Vol 34 (07) ◽  
pp. 12870-12877 ◽  
Author(s):  
Songyang Zhang ◽  
Houwen Peng ◽  
Jianlong Fu ◽  
Jiebo Luo

We address the problem of retrieving a specific moment from an untrimmed video with a query sentence. This is a challenging problem because a target moment may take place in relation to other temporal moments in the untrimmed video. Existing methods cannot tackle this challenge well, since they consider temporal moments individually and neglect the temporal dependencies. In this paper, we model the temporal relations between video moments by a two-dimensional map, where one dimension indicates the starting time of a moment and the other indicates the ending time. This 2D temporal map can cover diverse video moments with different lengths while representing their adjacent relations. Based on the 2D map, we propose a Temporal Adjacent Network (2D-TAN), a single-shot framework for moment localization. It is capable of encoding the adjacent temporal relations while learning discriminative features for matching video moments with referring expressions. We evaluate the proposed 2D-TAN on three challenging benchmarks, i.e., Charades-STA, ActivityNet Captions, and TACoS, where our 2D-TAN outperforms the state-of-the-art.
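The 2D temporal map itself is easy to sketch: index moments by their (start clip, end clip) pair, so adjacent cells correspond to temporally adjacent moments. A minimal construction, with scores supplied by some hypothetical matching model:

```python
import numpy as np

def build_2d_temporal_map(num_clips, moment_scores):
    # tmap[i, j] holds the matching score of the moment that starts at
    # clip i and ends at clip j; only the upper triangle (j >= i) names
    # a valid moment, so all other cells stay NaN.
    tmap = np.full((num_clips, num_clips), np.nan)
    for (start, end), score in moment_scores.items():
        assert end >= start, "a moment must end no earlier than it starts"
        tmap[start, end] = score
    return tmap
```

Neighboring cells in this grid share start or end boundaries, which is what lets a network convolve over the map to exploit adjacency between candidate moments.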


Author(s):  
B. Xiong ◽  
S. Oude Elberink ◽  
G. Vosselman

Nowadays many cities and countries are creating 3D models of their buildings for better daily management and smarter decision making. The newly created 3D models are required to be consistent with existing 2D footprint maps, so the 2D maps are usually combined with height data for the task of 3D reconstruction. Many buildings are composed of parts that are discontinuous over height. Building parts can be reconstructed independently and combined into a complete building. Therefore, most of the state-of-the-art work on 3D building reconstruction first decomposes a footprint map into parts. However, those works usually change the footprint maps for easier partitioning and cannot detect building parts that lie fully inside the footprint polygon. In order to solve those problems, we introduce two methodologies, one more dependent on height data and the other more dependent on footprints. We also experimentally evaluate the two methodologies and compare their advantages and disadvantages. The experiments use Airborne Laser Scanning (ALS) data and two vector maps, one at 1:10,000 scale and another at 1:500 scale.
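The notion of parts that are "discontinuous over height" can be illustrated with a toy height-clustering routine: roof-point heights are grouped into separate parts wherever consecutive sorted heights jump by more than a threshold. This is a stand-in sketch for the height-driven methodology, with an assumed gap threshold, not the paper's actual algorithm.

```python
def split_parts_by_height(heights, gap=1.0):
    # Cluster sorted roof-point heights into building parts wherever the
    # jump between consecutive heights exceeds `gap` (metres, assumed),
    # i.e. where the building is discontinuous over height.
    hs = sorted(heights)
    if not hs:
        return []
    parts, current = [], [hs[0]]
    for h in hs[1:]:
        if h - current[-1] > gap:
            parts.append(current)
            current = [h]
        else:
            current.append(h)
    parts.append(current)
    return parts
```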


Author(s):  
Gershon Elber

The computation of curve-curve and surface-surface intersections is considered a difficult problem in geometric design. Numerous results have been published annually on these topics over the last several decades. Moreover, the detection, and even more so the computation and elimination, of self-intersections in freeform curves and surfaces is viewed by many as a far more challenging problem, with far fewer satisfactory results. In recent years, several methods have been developed to robustly detect, compute, and even eliminate self-intersections in general freeform (typically NURBS) curves and surfaces, exploiting intrinsic and/or geometric properties on one side and the algebraic structure of the shape on the other. Other methods are specific and employ special properties of the problem at hand, as is the case for offset computation. In this work, we survey some of our results and others, and provide a bird's-eye view of the current state-of-the-art research on the self-intersection problem in the freeform domain.
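For intuition, the simplest numeric approach to curve self-intersection is a brute-force check of non-adjacent segment pairs on a sampled (piecewise-linear) approximation of the curve. This crude sketch is only a stand-in for the robust algebraic and geometric detection methods the survey actually discusses, and it misses intersections finer than the sampling.

```python
def _cross(o, a, b):
    # 2D cross product of vectors o->a and o->b.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def _segments_cross(p, q, r, s):
    # Proper (interior) intersection of segments pq and rs.
    return (_cross(p, q, r) * _cross(p, q, s) < 0
            and _cross(r, s, p) * _cross(r, s, q) < 0)

def polyline_self_intersects(pts):
    # Check every pair of non-adjacent segments on the sampled curve.
    n = len(pts) - 1
    for i in range(n):
        for j in range(i + 2, n):
            if _segments_cross(pts[i], pts[i + 1], pts[j], pts[j + 1]):
                return True
    return False
```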


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1692
Author(s):  
Lei Jin ◽  
Xiaojuan Wang ◽  
Mingshu He ◽  
Jingyue Wang

This paper focuses on 6DoF object pose estimation from a single RGB image. We tackle this challenging problem with a two-stage optimization framework. More specifically, we first introduce a translation estimation module to provide an initial translation based on an estimated depth map. Then, a pose regression module combines the ROI (Region of Interest) and the original image to predict the rotation and refine the translation. Compared with previous end-to-end methods that directly predict rotations and translations, our method can utilize depth information as weak guidance and significantly reduce the search space for the subsequent module. Furthermore, we design a new loss function for symmetric objects, which have been exceptionally difficult cases in prior works. Experiments show that our model achieves state-of-the-art object pose estimation on the YCB-Video dataset (Yale-CMU-Berkeley).
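Why symmetric objects need a special loss: several ground-truth orientations are visually indistinguishable, so a naive distance to one canonical pose penalizes correct predictions. A common remedy, sketched below under that assumption (the paper's exact formulation may differ), is to take the minimum average point distance over the symmetry-equivalent targets.

```python
import numpy as np

def symmetric_pose_loss(pred_pts, gt_pts_per_symmetry):
    # Minimum mean point distance over symmetry-equivalent ground-truth
    # point sets, so a pose matching any equivalent orientation of a
    # symmetric object incurs no spurious penalty.
    pred = np.asarray(pred_pts, dtype=float)
    losses = [float(np.mean(np.linalg.norm(pred - np.asarray(gt, dtype=float), axis=1)))
              for gt in gt_pts_per_symmetry]
    return min(losses)
```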


2021 ◽  
Author(s):  
Akila Pemasiri ◽  
Kien Nguyen ◽  
Sridha Sridharan ◽  
Clinton Fookes

This work addresses hand mesh recovery from a single RGB image. In contrast to most of the existing approaches, where parametric hand models are employed as the prior, we show that the hand mesh can be learned directly from the input image. We propose a new type of GAN called Im2Mesh GAN to learn the mesh through end-to-end adversarial training. By interpreting the mesh as a graph, our model is able to capture the topological relationship among the mesh vertices. We also introduce a 3D surface descriptor into the GAN architecture to further capture the associated 3D features. We conduct experiments with the proposed Im2Mesh GAN architecture in two settings: one where we can reap the benefits of coupled ground-truth data availability of the images and the corresponding meshes; and the other which combats the more challenging problem of mesh estimation without the corresponding ground truth. Through extensive evaluations we demonstrate that even without using any hand priors the proposed method performs on par with or better than the state-of-the-art.
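"Interpreting the mesh as a graph" amounts to working with the vertex adjacency structure induced by the triangle faces. A minimal sketch of that derivation (illustrative only; the actual Im2Mesh GAN generator operates on learned vertex features over this structure):

```python
def mesh_adjacency(faces, num_vertices):
    # Derive the symmetric vertex adjacency matrix from triangle faces:
    # two vertices are adjacent iff they share an edge of some triangle.
    adj = [[0] * num_vertices for _ in range(num_vertices)]
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (a, c)):
            adj[u][v] = adj[v][u] = 1
    return adj
```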


2011 ◽  
Vol 34 (10) ◽  
pp. 1897-1906 ◽  
Author(s):  
Kun YUE ◽  
Wei-Yi LIU ◽  
Yun-Lei ZHU ◽  
Wei ZHANG
