scholarly journals Relative depth estimation from single monocular images with deep convolutional network

2017 ◽  
Author(s):  
◽  
Alex Yang

Depth estimation from single monocular images is a theoretical challenge in computer vision as well as a computational challenge in practice. This thesis addresses the problem of depth estimation from single monocular images using a deep convolutional neural fields framework; which consists of convolutional feature extraction, superpixel dimensionality reduction, and depth inference. Data were collected using a stereo vision camera, which generated depth maps though triangulation that are paired with visual images. The visual image (input) and computed depth map (desired output) are used to train the model, which has achieved 83 percent test accuracy at the standard 25 percent tolerance. The problem has been formulated as depth regression for superpixels and our technique is superior to existing state-of-the-art approaches based on its demonstrated its generalization ability, high prediction accuracy, and real-time processing capability. We utilize the VGG-16 deep convolutional network as feature extractor and conditional random fields depth inference. We have leveraged a multi-phase training protocol that includes transfer learning and network fine-tuning lead to high performance accuracy. Our framework has a robust modular nature with capability of replacing each component with different implementations for maximum extensibility. Additionally, our GPU-accelerated implementation of superpixel pooling has further facilitated this extensibility by allowing incorporation of feature tensors with exible shapes and has provided both space and time optimization. Based on our novel contributions and high-performance computing methodologies, the model achieves a minimal and optimized design. It is capable of operating at 30 fps; which is a critical step towards empowering real-world applications such as autonomous vehicle with passive relative depth perception using single camera vision-based obstacle avoidance, environment mapping, etc.

Author(s):  
Rama A ◽  
Kumaravel A ◽  
Nalini C

Implementing image processing tools demands its components produce better results in critical applications like medical image classification. TensorFlow is one open source with a machine learning framework for high performance and operates in heterogeneous environments. It heralds broad attention at a fine tuning of parameters for obtaining the final models, to obtain better performance. The main aim of this article is to prove the appropriate steps for the classification techniques for diagnosing the diseases with better accuracy. The proposed convolutional network is comprised of three convolutional layers, preceded by average pooling with a size equal to the size of the final feature maps. The final layer in this network has two outputs, corresponding to the number of classes considered to be either normal or abnormal. To train and evaluate such networks like the Deep Convolutional Neural Network (DCNN), a dataset of 2000 x-ray images of lungs was used and a comparative analysis between the proposed DCNN against previous methods is also made.


Author(s):  
Ramaprasad Poojary ◽  
Roma Raina ◽  
Amit Kumar Mondal

<span id="docs-internal-guid-cdb76bbb-7fff-978d-961c-e21c41807064"><span>During the last few years, deep learning achieved remarkable results in the field of machine learning when used for computer vision tasks. Among many of its architectures, deep neural network-based architecture known as convolutional neural networks are recently used widely for image detection and classification. Although it is a great tool for computer vision tasks, it demands a large amount of training data to yield high performance. In this paper, the data augmentation method is proposed to overcome the challenges faced due to a lack of insufficient training data. To analyze the effect of data augmentation, the proposed method uses two convolutional neural network architectures. To minimize the training time without compromising accuracy, models are built by fine-tuning pre-trained networks VGG16 and ResNet50. To evaluate the performance of the models, loss functions and accuracies are used. Proposed models are constructed using Keras deep learning framework and models are trained on a custom dataset created from Kaggle CAT vs DOG database. Experimental results showed that both the models achieved better test accuracy when data augmentation is employed, and model constructed using ResNet50 outperformed VGG16 based model with a test accuracy of 90% with data augmentation &amp; 82% without data augmentation.</span></span>


Sensors ◽  
2019 ◽  
Vol 19 (22) ◽  
pp. 4850 ◽  
Author(s):  
Carlos S. Pereira ◽  
Raul Morais ◽  
Manuel J. C. S. Reis

Frequently, the vineyards in the Douro Region present multiple grape varieties per parcel and even per row. An automatic algorithm for grape variety identification as an integrated software component was proposed that can be applied, for example, to a robotic harvesting system. However, some issues and constraints in its development were highlighted, namely, the images captured in natural environment, low volume of images, high similarity of the images among different grape varieties, leaf senescence, and significant changes on the grapevine leaf and bunch images in the harvest seasons, mainly due to adverse climatic conditions, diseases, and the presence of pesticides. In this paper, the performance of the transfer learning and fine-tuning techniques based on AlexNet architecture were evaluated when applied to the identification of grape varieties. Two natural vineyard image datasets were captured in different geographical locations and harvest seasons. To generate different datasets for training and classification, some image processing methods, including a proposed four-corners-in-one image warping algorithm, were used. The experimental results, obtained from the application of an AlexNet-based transfer learning scheme and trained on the image dataset pre-processed through the four-corners-in-one method, achieved a test accuracy score of 77.30%. Applying this classifier model, an accuracy of 89.75% on the popular Flavia leaf dataset was reached. The results obtained by the proposed approach are promising and encouraging in helping Douro wine growers in the automatic task of identifying grape varieties.


2021 ◽  
Vol 11 (4) ◽  
pp. 1428
Author(s):  
Haopeng Wu ◽  
Zhiying Lu ◽  
Jianfeng Zhang ◽  
Xin Li ◽  
Mingyue Zhao ◽  
...  

This paper addresses the problem of Facial Expression Recognition (FER), focusing on unobvious facial movements. Traditional methods often cause overfitting problems or incomplete information due to insufficient data and manual selection of features. Instead, our proposed network, which is called the Multi-features Cooperative Deep Convolutional Network (MC-DCN), maintains focus on the overall feature of the face and the trend of key parts. The processing of video data is the first stage. The method of ensemble of regression trees (ERT) is used to obtain the overall contour of the face. Then, the attention model is used to pick up the parts of face that are more susceptible to expressions. Under the combined effect of these two methods, the image which can be called a local feature map is obtained. After that, the video data are sent to MC-DCN, containing parallel sub-networks. While the overall spatiotemporal characteristics of facial expressions are obtained through the sequence of images, the selection of keys parts can better learn the changes in facial expressions brought about by subtle facial movements. By combining local features and global features, the proposed method can acquire more information, leading to better performance. The experimental results show that MC-DCN can achieve recognition rates of 95%, 78.6% and 78.3% on the three datasets SAVEE, MMI, and edited GEMEP, respectively.


2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Guo Liu ◽  
Liang Xu ◽  
Yi Wang

A novel high-performance circularly polarized (CP) antenna is proposed in this paper. Two separate antennas featuring the global positioning system (GPS) dual-band operation (1.575 GHz and 1.227 GHz for L1 band and L2 band, resp.) are integrated with good isolation. To enhance the gain at low angle, a new structure of patch and two parasitic metal elements are introduced. With the optimized design, good axial ratio and near-hemispherical radiation pattern are obtained.


Sign in / Sign up

Export Citation Format

Share Document