Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching

2020 ◽  
Vol 34 (07) ◽  
pp. 12926-12934
Author(s):  
Youmin Zhang ◽  
Yimin Chen ◽  
Xiao Bai ◽  
Suihanjin Yu ◽  
Kun Yu ◽  
...  

State-of-the-art deep learning based stereo matching approaches treat disparity estimation as a regression problem, where the loss function is defined directly on the true disparities and their estimates. However, disparity is just a byproduct of a matching process modeled by the cost volume, and indirectly learning the cost volume through disparity regression is prone to overfitting because the cost volume is under-constrained. In this paper, we propose to constrain the cost volume directly by filtering it with unimodal distributions peaked at the true disparities. In addition, the variance of the unimodal distribution at each pixel is estimated to explicitly model matching uncertainty under different contexts. The proposed architecture achieves state-of-the-art performance on Scene Flow and the two KITTI stereo benchmarks. In particular, our method ranked 1st in the KITTI 2012 evaluation and 4th in the KITTI 2015 evaluation (as of 2019-08-20). The code for AcfNet is available at: https://github.com/youmi-zym/AcfNet.
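
As a rough sketch of the idea (PyTorch assumed; the cost volume has shape (B, D, H, W) and is treated as logits over disparity bins, with the per-pixel width `sigma` standing in for the estimated matching uncertainty), the unimodal supervision could look like:

```python
import torch
import torch.nn.functional as F

def unimodal_target(d_gt, sigma, max_disp):
    """Unimodal ground-truth distribution peaked at the true disparity.

    d_gt:  (B, H, W) ground-truth disparities
    sigma: (B, H, W) per-pixel width, assumed to come from a small
           confidence network that models matching uncertainty
    Returns a (B, D, H, W) soft target over the disparity bins.
    """
    bins = torch.arange(max_disp, device=d_gt.device, dtype=d_gt.dtype)
    bins = bins.view(1, -1, 1, 1)
    # Peak at d_gt; larger sigma gives a flatter (more uncertain) peak.
    logits = -torch.abs(bins - d_gt.unsqueeze(1)) / sigma.unsqueeze(1)
    return F.softmax(logits, dim=1)

def unimodal_loss(cost_volume, d_gt, sigma):
    """Cross-entropy between the softmax of the cost volume and the target."""
    target = unimodal_target(d_gt, sigma, cost_volume.size(1))
    log_prob = F.log_softmax(cost_volume, dim=1)
    return -(target * log_prob).sum(dim=1).mean()
```

This constrains every disparity bin of the cost volume, rather than only the regressed disparity value.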

2021 ◽  
Vol 11 (15) ◽  
pp. 7046
Author(s):  
Jorge Francisco Ciprián-Sánchez ◽  
Gilberto Ochoa-Ruiz ◽  
Lucile Rossi ◽  
Frédéric Morandini

Wildfires stand as one of the most significant natural disasters worldwide, all the more so due to the effects of climate change and their impact at various societal and environmental levels. A significant amount of research has therefore been devoted to this issue, deploying a wide variety of technologies and following a multi-disciplinary approach. Computer vision, in particular, has played a fundamental role: it can extract and combine information from several imaging modalities for fire detection, characterization, and wildfire spread forecasting. In recent years, work on Deep Learning (DL)-based fire segmentation has shown very promising results. However, it is currently unclear whether the architecture of a model, its loss function, or the type of image employed (visible, infrared, or fused) has the most impact on the fire segmentation results. In the present work, we evaluate different combinations of state-of-the-art (SOTA) DL architectures, loss functions, and image types to identify the parameters most relevant to improving the segmentation results. We benchmark them to identify the top-performing combinations and compare them to traditional fire segmentation techniques. Finally, we evaluate whether adding attention modules to the best-performing architecture can further improve the segmentation results. To the best of our knowledge, this is the first work to evaluate the impact of the architecture, loss function, and image type on the performance of DL-based wildfire segmentation models.
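
The experimental design amounts to a grid search over those three axes; a minimal sketch (all names hypothetical, with a placeholder evaluation function) might look like:

```python
from itertools import product

# Hypothetical stand-ins for the three experimental axes in the study.
ARCHITECTURES = ["unet", "deeplabv3plus", "fcn"]
LOSS_FUNCTIONS = ["dice", "focal", "bce"]
IMAGE_TYPES = ["visible", "infrared", "fused"]

def run_experiment(arch: str, loss: str, image_type: str) -> float:
    """Placeholder: train a segmentation model with this combination
    and return a validation metric such as mean IoU."""
    return 0.0  # replace with actual training and evaluation

# Evaluate every (architecture, loss, image type) combination.
results = {
    combo: run_experiment(*combo)
    for combo in product(ARCHITECTURES, LOSS_FUNCTIONS, IMAGE_TYPES)
}
best_combo = max(results, key=results.get)
print("Top-performing combination:", best_combo)
```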


2020 ◽  
Vol 12 (24) ◽  
pp. 4025
Author(s):  
Rongshu Tao ◽  
Yuming Xiang ◽  
Hongjian You

As an essential step in 3D reconstruction, stereo matching still faces considerable problems due to the high resolution and complex structures of remote sensing images. Especially in areas occluded by tall buildings and in textureless areas such as water and woods, precise disparity estimation is a difficult but important task. In this paper, we develop a novel edge-sense bidirectional pyramid stereo matching network to address these problems. The cost volume is constructed from negative to positive disparities, since the disparity range in remote sensing images varies greatly and traditional deep learning networks only work well for positive disparities. Then, occlusion-aware maps based on the forward-backward consistency assumption are applied to reduce the influence of occluded areas. Moreover, we design an edge-sense smoothness loss to improve performance in textureless areas while preserving the main structure. The proposed network is compared with two baselines, DenseMapNet and PSMNet. The experimental results show that our method outperforms both in terms of average endpoint error (EPE) and the fraction of erroneous pixels (D1), with significant improvements in occluded and textureless areas.
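
The edge-sense smoothness idea resembles the edge-aware smoothness penalties common in disparity estimation; a minimal sketch (PyTorch; the paper's exact loss may differ) is:

```python
import torch

def edge_aware_smoothness(disp, img):
    """Edge-aware smoothness penalty: discourage disparity gradients
    except where the reference image itself has strong edges.

    disp: (B, 1, H, W) predicted disparity
    img:  (B, 3, H, W) reference image
    """
    # Disparity gradients along x and y.
    dx_d = torch.abs(disp[:, :, :, 1:] - disp[:, :, :, :-1])
    dy_d = torch.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])
    # Image gradients, averaged over color channels.
    dx_i = torch.mean(torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]), 1, keepdim=True)
    dy_i = torch.mean(torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]), 1, keepdim=True)
    # Down-weight the penalty where the image gradient is large (true edges).
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
```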


Electronics ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 924 ◽  
Author(s):  
Zhao Pei ◽  
Deqiang Wen ◽  
Yanning Zhang ◽  
Miao Ma ◽  
Min Guo ◽  
...  

In recent years, disparity estimation based on deep learning has been extensively studied and significant progress has been made. By contrast, traditional disparity estimation methods require considerable resources and time for processes such as stereo matching and 3D reconstruction. At present, most deep learning based disparity estimation methods estimate disparity from monocular images. Motivated by the finding from traditional methods that multi-view approaches are more accurate than monocular ones, especially for scenes that are textureless or contain thin structures, we present MDEAN, a new deep convolutional neural network that estimates disparity from multi-view images with an asymmetric encoder–decoder network structure. First, our method takes an arbitrary number of multi-view images as input. Next, we use these images to produce a set of plane-sweep cost volumes, which are combined to compute a high-quality disparity map using an end-to-end asymmetric network. The results show that our method outperforms state-of-the-art methods, in particular for outdoor scenes with sky, flat surfaces, and buildings.
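
A plane-sweep cost volume over an arbitrary number of views can be sketched as follows (PyTorch; `warp_fn`, which warps source features onto the reference view for a given depth plane, is a hypothetical helper since the camera geometry is omitted, and variance across views is one common matching cost):

```python
import torch

def plane_sweep_cost(ref_feat, src_feats, warp_fn, depth_planes):
    """Sketch of a plane-sweep cost volume over N views.

    ref_feat:  (B, C, H, W) reference-view features
    src_feats: list of (B, C, H, W) source-view features (any number)
    warp_fn:   hypothetical helper warping source features onto the
               reference view for a given depth plane
    Returns a (B, D, H, W) cost volume.
    """
    costs = []
    for d in depth_planes:
        views = [ref_feat] + [warp_fn(f, d) for f in src_feats]
        stack = torch.stack(views, dim=0)      # (V, B, C, H, W)
        # Low variance across views = good photometric consistency.
        var = stack.var(dim=0).mean(dim=1)     # (B, H, W)
        costs.append(var)
    return torch.stack(costs, dim=1)           # (B, D, H, W)
```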


Author(s):  
Kaiyuan Wu ◽  
Zhiming Zheng ◽  
Shaoting Tang

In this paper, we propose a powerful weak learner, the Vector Decision Tree (VDT), and a new Boosted Vector Decision Tree (BVDT) algorithm framework for multi-class classification. Unlike traditional scalar-valued boosting algorithms, the BVDT algorithm directly maps the feature space to the decision space in the multi-class setting, which facilitates convenient implementations of multi-class classification algorithms with diverse loss functions. By viewing the explicit hard threshold on the leaf node value applied in LogitBoost as a constrained optimization problem, we further develop two new variants of the BVDT algorithm: the [Formula: see text]-BVDT and the [Formula: see text]-BVDT. The performance of the proposed algorithm is evaluated on different datasets and compared with three state-of-the-art boosting algorithms, k-Nearest Neighbor (KNN), and Support Vector Machine (SVM). The results show that the proposed algorithm ranks first on all but one dataset and reduces the test error rate by 4% to 58% relative to the state-of-the-art boosting algorithms based on scalar-valued weak learners. Furthermore, we present a case study on the Abalone dataset, designing a new loss function that combines the negative log-likelihood loss of classification with the squared loss of regression.
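
A simplified sketch of boosting with vector-valued trees (using scikit-learn's multi-output regression trees as stand-ins for the paper's VDT weak learner, and the multi-class log-likelihood loss; the paper's leaf-value constraints are omitted):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_bvdt(X, y, n_classes, n_rounds=50, lr=0.1, depth=3):
    """Each round fits ONE multi-output tree to the negative gradient of
    the multi-class log loss, mapping features directly to the
    K-dimensional decision space (a simplification of BVDT)."""
    Y = np.eye(n_classes)[y]                 # one-hot labels, (n, K)
    F_scores = np.zeros((X.shape[0], n_classes))
    trees = []
    for _ in range(n_rounds):
        residual = Y - softmax(F_scores)     # negative gradient of log loss
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        trees.append(tree)
        F_scores += lr * tree.predict(X)
    return trees

def predict_bvdt(trees, X, n_classes, lr=0.1):
    F_scores = np.zeros((X.shape[0], n_classes))
    for tree in trees:
        F_scores += lr * tree.predict(X)
    return F_scores.argmax(axis=1)
```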


2019 ◽  
Vol 19 (3) ◽  
pp. 693-720 ◽  
Author(s):  
Ferenc Attila Somogyi ◽  
Mark Asztalos

In model-driven methodologies, model matching is the process of finding a matching pair for every model element between two or more software models. Model matching is an important task, as it is often used when differencing and merging models, which are key processes in version control systems. There are a number of different approaches to model matching, most of them focusing on different goals, e.g., the accuracy of the matching process or the generality of the algorithm. Moreover, there exist algorithms that use the textual representations of the models during the matching process. We present a systematic literature review carried out to capture the state of the art of model matching techniques. The search process was conducted following a well-defined methodology. We identified a total of 3274 non-duplicate studies, of which 119 were included as primary studies for this survey. We present the state of the art of model matching, highlighting the differences between matching techniques, focusing mainly on text-based and graph-based algorithms. Finally, the main open questions, challenges, and possible future directions in the field of model matching are discussed, including topics such as benchmarking, performance and scalability, and conflict handling.


2011 ◽  
Vol 10 (3) ◽  
pp. 65-72
Author(s):  
Shujun Zhang ◽  
Jianbo Zhang ◽  
Yun Liu

Current methods for binocular stereo matching can be divided into two categories: sparse-point-based and dense-point-based methods. Both, however, have shortcomings and limitations; there is no perfect solution to the disparity problem. Dense-point-based techniques obtain relatively more accurate results but at a higher computational cost. A large number of window-based adaptive correspondence techniques have emerged in recent years. To address the high time complexity and heavy computation of the matching process, we propose a new window-based correspondence search algorithm using mean shift and disparity estimation. Mean shift can aggregate the same or similar colors, so it can be applied to pre-process the source images to reduce their dynamic color range. Disparity estimation is then conducted on the two pre-processed images to compute disparities of uniform texture regions. Adaptive window matching, through similarity computation and window-based support aggregation, is finally executed to obtain an exact depth map. Experimental results show that our algorithm is more efficient and preserves smooth disparity better than the prior window-based method.
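
A compact sketch of such a pipeline, using OpenCV's pyrMeanShiftFiltering for the mean-shift pre-processing and a fixed-window SAD aggregation in place of the paper's adaptive window (a simplification, with hypothetical parameter values):

```python
import cv2
import numpy as np

def meanshift_stereo(left_path, right_path, max_disp=64, win=7):
    """Mean-shift color aggregation followed by window-based SAD matching."""
    left = cv2.imread(left_path)
    right = cv2.imread(right_path)
    # Reduce the dynamic color range by clustering similar colors.
    left = cv2.pyrMeanShiftFiltering(left, 10, 20)
    right = cv2.pyrMeanShiftFiltering(right, 10, 20)
    gl = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gr = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY).astype(np.float32)
    h, w = gl.shape
    cost = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    kernel = np.ones((win, win), np.float32)
    for d in range(max_disp):
        diff = np.abs(gl[:, d:] - gr[:, : w - d])
        # Box filter = window-based support aggregation of absolute differences.
        cost[d, :, d:] = cv2.filter2D(diff, -1, kernel)
    # Winner-take-all disparity selection.
    return cost.argmin(axis=0).astype(np.uint8)
```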


2020 ◽  
Vol 34 (07) ◽  
pp. 12508-12515
Author(s):  
Qingshan Xu ◽  
Wenbing Tao

Deep learning has been shown to be effective for depth inference in multi-view stereo (MVS). However, scalability and accuracy remain open problems in this domain, which can be attributed to the memory-hungry cost volume representation and inappropriate depth inference. Inspired by the group-wise correlation in stereo matching, we propose an average group-wise correlation similarity measure to construct a lightweight cost volume. This not only reduces memory consumption but also reduces the computational burden of cost volume filtering. Based on this effective cost volume representation, we propose a cascade 3D U-Net module to regularize the cost volume and further boost performance. Unlike previous methods that treat multi-view depth inference as a depth regression problem or an inverse depth classification problem, we recast it as an inverse depth regression task. This allows our network to achieve sub-pixel estimation and to scale to large scenes. Through extensive experiments on the DTU and Tanks and Temples datasets, we show that our proposed network with Correlation cost volume and Inverse DEpth Regression (CIDER) achieves state-of-the-art results, demonstrating its superior scalability and accuracy.
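
A minimal sketch of the two core ingredients, group-wise correlation and inverse depth regression (PyTorch; the number of groups and the inverse-depth sampling are assumptions):

```python
import torch

def groupwise_correlation(ref, warped, num_groups):
    """Average group-wise correlation between reference and warped features.

    Splits C channels into G groups and takes the mean inner product per
    group, giving a compact (B, G, H, W) similarity per depth hypothesis
    instead of a full concatenated feature volume.
    """
    B, C, H, W = ref.shape
    ch = C // num_groups
    ref_g = ref.view(B, num_groups, ch, H, W)
    warped_g = warped.view(B, num_groups, ch, H, W)
    return (ref_g * warped_g).mean(dim=2)      # (B, G, H, W)

def inverse_depth_regression(prob, inv_depths):
    """Soft-argmax over inverse-depth bins for sub-pixel depth estimation.

    prob:       (B, D, H, W) softmax probabilities over hypotheses
    inv_depths: (D,) inverse-depth values of the hypotheses
    """
    inv = (prob * inv_depths.view(1, -1, 1, 1)).sum(dim=1)
    return 1.0 / inv.clamp(min=1e-6)           # back to metric depth
```

Regressing in inverse depth keeps the bins roughly uniform in image space, which is what makes the soft-argmax usable for large-scale scenes.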


Algorithms ◽  
2019 ◽  
Vol 12 (8) ◽  
pp. 154 ◽  
Author(s):  
Mário P. Véstias

The convolutional neural network (CNN) is one of the most used deep learning models for image detection and classification, due to its high accuracy compared to other machine learning algorithms. CNNs achieve better results at the cost of higher computing and memory requirements, so inference of convolutional neural networks is usually done on centralized high-performance platforms. However, many CNN-based applications are migrating to edge devices near the source of the data, due to the unreliability of the transmission channel when exchanging data with a central server, channel latency that many applications cannot tolerate, security and data privacy concerns, etc. While advantageous, deep learning on the edge is quite challenging because edge devices are usually limited in performance, cost, and energy. Reconfigurable computing is being considered for inference on the edge because of its high performance and energy efficiency, while retaining the hardware flexibility that allows the target computing platform to be easily adapted to the CNN model. In this paper, we describe the features of the most common CNNs, the capabilities of reconfigurable computing for running CNNs, the state of the art of reconfigurable computing implementations proposed to run CNN models, and the trends and challenges for future edge reconfigurable platforms.


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6016
Author(s):  
Ming Wei ◽  
Ming Zhu ◽  
Yi Wu ◽  
Jiaqi Sun ◽  
Jiarong Wang ◽  
...  

Stereo matching networks based on deep learning are widely developed and can achieve excellent disparity estimation. In this work, we present a new end-to-end fast deep learning stereo matching network that determines the corresponding disparity from a pair of stereo images. We extract the characteristics of low-resolution feature maps using a stacked hourglass feature extractor and build a multi-level detailed cost volume. We also use the edges of the left image to guide disparity optimization and sub-sample the low-resolution data, ensuring both accuracy and speed. Furthermore, we design a multi-cross attention model for binocular stereo matching to improve matching accuracy and achieve end-to-end disparity regression effectively. We evaluate our network on the Scene Flow, KITTI2012, and KITTI2015 datasets, and the experimental results show that both the speed and accuracy of our method are excellent.
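
The abstract does not detail the multi-cross attention design; a generic sketch of cross-view attention along the width (epipolar) dimension, in PyTorch, could look like:

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Minimal cross-attention between left and right feature maps.

    Left features query right features along each image row, which is
    where stereo correspondences live after rectification. This is a
    generic sketch, not the paper's exact multi-cross attention module.
    """
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, left, right):
        B, C, H, W = left.shape
        # Flatten each row so attention runs along the width dimension.
        q = self.q(left).permute(0, 2, 3, 1).reshape(B * H, W, C)
        k = self.k(right).permute(0, 2, 3, 1).reshape(B * H, W, C)
        v = self.v(right).permute(0, 2, 3, 1).reshape(B * H, W, C)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = (attn @ v).reshape(B, H, W, C).permute(0, 3, 1, 2)
        return left + out  # residual connection back into the left features
```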


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call "Levenshtein augmentation", which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over non-augmented and conventionally SMILES-randomized data when used for training the baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the pattern-recognition capabilities of the underlying network with respect to molecular motifs.
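
A rough sketch of how such similarity-based pairing could work (assuming RDKit for SMILES randomization and difflib's ratio as a stand-in for Levenshtein distance; the paper's exact pairing criterion may differ):

```python
from difflib import SequenceMatcher
from rdkit import Chem  # assumed available

def randomized_smiles(smiles, n=10):
    """Generate n randomized (non-canonical) SMILES for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return [Chem.MolToSmiles(mol, doRandom=True) for _ in range(n)]

def levenshtein_pair(reactant, product, n=10):
    """Pick the reactant SMILES variant most similar to the product string,
    so that training pairs share local sub-sequences where possible."""
    variants = randomized_smiles(reactant, n)
    return max(variants,
               key=lambda s: SequenceMatcher(None, s, product).ratio())
```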

