Relative depth estimation from single monocular images with deep convolutional network

Mapping Intimacies ◽

10.32469/10355/63579 ◽

2017 ◽

Author(s):

Alex Yang

Keyword(s):

High Performance ◽

Depth Estimation ◽

Fine Tuning ◽

Test Accuracy ◽

Relative Depth ◽

Optimized Design ◽

Convolutional Network ◽

Real Time Processing ◽

Deep Convolutional Network ◽

Depth Inference

Depth estimation from single monocular images is a theoretical challenge in computer vision as well as a computational challenge in practice. This thesis addresses the problem of depth estimation from single monocular images using a deep convolutional neural fields framework, which consists of convolutional feature extraction, superpixel dimensionality reduction, and depth inference. Data were collected using a stereo vision camera, which generated depth maps through triangulation that are paired with visual images. The visual image (input) and computed depth map (desired output) are used to train the model, which has achieved 83 percent test accuracy at the standard 25 percent tolerance. The problem has been formulated as depth regression for superpixels, and our technique is superior to existing state-of-the-art approaches based on its demonstrated generalization ability, high prediction accuracy, and real-time processing capability. We utilize the VGG-16 deep convolutional network as the feature extractor and conditional random fields for depth inference. We have leveraged a multi-phase training protocol that includes transfer learning and network fine-tuning, leading to high accuracy. Our framework has a robust modular nature, with the capability of replacing each component with a different implementation for maximum extensibility. Additionally, our GPU-accelerated implementation of superpixel pooling has further facilitated this extensibility by allowing incorporation of feature tensors with flexible shapes and has provided both space and time optimization. Based on our novel contributions and high-performance computing methodologies, the model achieves a minimal and optimized design. It is capable of operating at 30 fps, which is a critical step towards empowering real-world applications, such as autonomous vehicles, with passive relative depth perception using single-camera vision-based obstacle avoidance, environment mapping, etc.
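As a rough illustration of two ideas in this abstract, the sketch below pools per-pixel features into one vector per superpixel and scores depth predictions against a 25 percent tolerance. It is a minimal CPU sketch, not the thesis implementation: the function names are hypothetical, mean pooling is an assumed choice of pooling operator, and the accuracy measure assumes the commonly used ratio test max(pred/true, true/pred) < 1.25, which the abstract does not spell out.

```python
def superpixel_mean_pool(feature_map, labels):
    """Average the feature vectors of all pixels sharing a superpixel label.

    feature_map: H x W x C nested lists of numbers.
    labels:      H x W nested lists of integer superpixel ids.
    Returns a dict mapping each superpixel id to its mean C-dim feature.
    Mean pooling is an assumption made for this sketch.
    """
    sums = {}
    counts = {}
    for feature_row, label_row in zip(feature_map, labels):
        for feature, label in zip(feature_row, label_row):
            if label not in sums:
                sums[label] = [0.0] * len(feature)
                counts[label] = 0
            for channel, value in enumerate(feature):
                sums[label][channel] += value
            counts[label] += 1
    return {
        label: [total / counts[label] for total in sums[label]]
        for label in sums
    }


def threshold_accuracy(predicted, target, threshold=1.25):
    """Fraction of depths whose ratio to the ground truth is within threshold.

    Assumes all depths are strictly positive. The ratio test used here is
    the common convention, assumed rather than taken from the thesis.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    hits = 0
    for p, t in zip(predicted, target):
        if max(p / t, t / p) < threshold:
            hits += 1
    return hits / len(target)


if __name__ == "__main__":
    features = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 2 x 2 pixels, 2 channels
    superpixels = [[0, 0], [1, 1]]                    # top row and bottom row
    print(superpixel_mean_pool(features, superpixels))
    print(threshold_accuracy([1.0, 2.0, 3.0, 10.0], [1.1, 2.0, 4.0, 8.5]))
```

Pooling in this way reduces an H x W grid of features to one vector per superpixel, which is what makes per-superpixel depth regression tractable.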

Food image recognition using deep convolutional network with pre-training and fine-tuning

2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) ◽

10.1109/icmew.2015.7169816 ◽

2015 ◽

Cited By ~ 90

Author(s):

Keiji Yanai ◽

Yoshiyuki Kawano

Keyword(s):

Image Recognition ◽

Fine Tuning ◽

Convolutional Network ◽

Food Image ◽

Deep Convolutional Network

Depth Estimation and Object Detection for Monocular Semantic SLAM Using Deep Convolutional Network

2020 IEEE 20th International Conference on Software Quality, Reliability and Security Companion (QRS-C) ◽

10.1109/qrs-c51114.2020.00051 ◽

2020 ◽

Author(s):

Changbo Hou ◽

Xuejiao Zhao ◽

Yun Lin

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Convolutional Network ◽

Deep Convolutional Network

Construction of Deep Convolutional Neural Networks For Medical Image Classification

International Journal of Computer Vision and Image Processing ◽

10.4018/ijcvip.2019040101 ◽

2019 ◽

Vol 9 (2) ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Rama A ◽

Kumaravel A ◽

Nalini C

Keyword(s):

Image Classification ◽

Medical Image ◽

High Performance ◽

Fine Tuning ◽

Heterogeneous Environments ◽

Feature Maps ◽

Deep Convolutional Neural Networks ◽

Convolutional Network ◽

Final Layer ◽

Medical Image Classification

Implementing image processing tools demands that their components produce better results in critical applications like medical image classification. TensorFlow is an open-source machine learning framework that offers high performance and operates in heterogeneous environments. It draws broad attention to the fine-tuning of parameters to obtain final models with better performance. The main aim of this article is to demonstrate the appropriate steps for classification techniques that diagnose diseases with better accuracy. The proposed convolutional network comprises three convolutional layers, preceded by average pooling with a size equal to the size of the final feature maps. The final layer in this network has two outputs, corresponding to the two classes considered, normal and abnormal. To train and evaluate networks such as the Deep Convolutional Neural Network (DCNN), a dataset of 2000 x-ray images of lungs was used, and a comparative analysis of the proposed DCNN against previous methods is also made.
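Average pooling whose window equals the size of the feature maps is usually called global average pooling: each map collapses to a single number. The sketch below shows that operation feeding a two-output layer with softmax, as a plain-Python illustration only. The function names, weights, and biases are hypothetical, and nothing here reproduces the article's TensorFlow model or its layer ordering.

```python
import math


def global_average_pool(feature_maps):
    """Collapse each H x W feature map to its mean value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert logits to probabilities, shifted by the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Two 2 x 2 feature maps standing in for the final convolutional output.
    maps = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]
    pooled = global_average_pool(maps)
    # Hypothetical weights for the two classes (normal, abnormal).
    logits = dense(pooled, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(pooled, softmax(logits))
```

Because the pooled vector has one entry per feature map, the two-output layer needs only two weights per map, which keeps the classifier head small.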

Effect of data-augmentation on fine-tuned CNN model performance

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp84-92 ◽

2021 ◽

Vol 10 (1) ◽

pp. 84

Author(s):

Ramaprasad Poojary ◽

Roma Raina ◽

Amit Kumar Mondal

Keyword(s):

Neural Network ◽

Computer Vision ◽

Deep Learning ◽

High Performance ◽

Data Augmentation ◽

Model Performance ◽

Training Data ◽

Fine Tuning ◽

Test Accuracy ◽

Training Time

During the last few years, deep learning has achieved remarkable results in the field of machine learning when used for computer vision tasks. Among its many architectures, the deep neural network-based architecture known as the convolutional neural network has recently been widely used for image detection and classification. Although it is a great tool for computer vision tasks, it demands a large amount of training data to yield high performance. In this paper, a data augmentation method is proposed to overcome the challenges posed by insufficient training data. To analyze the effect of data augmentation, the proposed method uses two convolutional neural network architectures. To minimize the training time without compromising accuracy, models are built by fine-tuning the pre-trained networks VGG16 and ResNet50. To evaluate the performance of the models, loss functions and accuracies are used. The proposed models are constructed using the Keras deep learning framework and trained on a custom dataset created from the Kaggle CAT vs DOG database. Experimental results showed that both models achieved better test accuracy when data augmentation was employed, and the model constructed using ResNet50 outperformed the VGG16-based model, with a test accuracy of 90% with data augmentation and 82% without data augmentation.
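To make the idea of data augmentation concrete, the sketch below enlarges a small labeled image set with flipped copies. It is a minimal illustration under stated assumptions: the abstract does not say which transformations the authors applied, so horizontal and vertical flips are assumed here, images are plain nested lists rather than Keras tensors, and all function names are hypothetical.

```python
def horizontal_flip(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def vertical_flip(image):
    """Mirror an image (list of pixel rows) top to bottom."""
    return [list(row) for row in reversed(image)]


def augment_dataset(images, labels):
    """Return the originals plus two flipped copies of every image.

    Each copy keeps the label of the image it was derived from, so the
    dataset grows threefold without any new annotation effort.
    """
    out_images = []
    out_labels = []
    for image, label in zip(images, labels):
        variants = (
            [list(row) for row in image],
            horizontal_flip(image),
            vertical_flip(image),
        )
        for variant in variants:
            out_images.append(variant)
            out_labels.append(label)
    return out_images, out_labels


if __name__ == "__main__":
    images = [[[1, 2, 3], [4, 5, 6]]]   # one tiny 2 x 3 grayscale image
    labels = ["cat"]
    new_images, new_labels = augment_dataset(images, labels)
    print(len(new_images), new_labels)
```

Label-preserving transformations like these are the reason augmentation helps when training data are scarce: the network sees more varied inputs for the same classes.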

Deep Learning Techniques for Grape Plant Species Identification in Natural Images

Sensors ◽

10.3390/s19224850 ◽

2019 ◽

Vol 19 (22) ◽

pp. 4850 ◽

Cited By ~ 6

Author(s):

Carlos S. Pereira ◽

Raul Morais ◽

Manuel J. C. S. Reis

Keyword(s):

Transfer Learning ◽

Climatic Conditions ◽

Fine Tuning ◽

Variety Identification ◽

Test Accuracy ◽

Accuracy Score ◽

Learning Techniques ◽

Four Corners ◽

Integrated Software ◽

Grape Varieties

Frequently, the vineyards in the Douro Region present multiple grape varieties per parcel and even per row. An automatic algorithm for grape variety identification as an integrated software component was proposed that can be applied, for example, to a robotic harvesting system. However, some issues and constraints in its development were highlighted, namely, images captured in a natural environment, a low volume of images, high similarity of the images among different grape varieties, leaf senescence, and significant changes in the grapevine leaf and bunch images across harvest seasons, mainly due to adverse climatic conditions, diseases, and the presence of pesticides. In this paper, the performance of transfer learning and fine-tuning techniques based on the AlexNet architecture was evaluated when applied to the identification of grape varieties. Two natural vineyard image datasets were captured in different geographical locations and harvest seasons. To generate different datasets for training and classification, some image processing methods, including a proposed four-corners-in-one image warping algorithm, were used. An AlexNet-based transfer learning scheme, trained on the image dataset pre-processed through the four-corners-in-one method, achieved a test accuracy score of 77.30%. Applying this classifier model, an accuracy of 89.75% on the popular Flavia leaf dataset was reached. The results obtained by the proposed approach are promising and encouraging in helping Douro wine growers in the automatic task of identifying grape varieties.

E-DiCoNet: Extreme learning machine based classifier for diagnosis of COVID-19 using deep convolutional network

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-020-02688-3 ◽

2021 ◽

Author(s):

R. Murugan ◽

Tripti Goel

Keyword(s):

Extreme Learning Machine ◽

Convolutional Network ◽

Deep Convolutional Network ◽

Learning Machine

Image Classification Based On Deep Convolutional Network And Gaussian Aggregate Encoding

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai50040.2020.00089 ◽

2020 ◽

Author(s):

Fengge Wang ◽

Xiaolin Tian ◽

Yang Zhang ◽

Nan Jia ◽

Tiantian Lu

Keyword(s):

Image Classification ◽

Convolutional Network ◽

Deep Convolutional Network

Facial Expression Recognition Based on Multi-Features Cooperative Deep Convolutional Network

Applied Sciences ◽

10.3390/app11041428 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1428

Author(s):

Haopeng Wu ◽

Zhiying Lu ◽

Jianfeng Zhang ◽

Xin Li ◽

Mingyue Zhao ◽

...

Keyword(s):

Facial Expression ◽

Facial Expressions ◽

Facial Expression Recognition ◽

Video Data ◽

Expression Recognition ◽

Convolutional Network ◽

Facial Movements ◽

The Face ◽

Deep Convolutional Network ◽

Selection Of

This paper addresses the problem of Facial Expression Recognition (FER), focusing on unobvious facial movements. Traditional methods often cause overfitting problems or incomplete information due to insufficient data and manual selection of features. Instead, our proposed network, which is called the Multi-features Cooperative Deep Convolutional Network (MC-DCN), maintains focus on the overall feature of the face and the trend of key parts. The processing of video data is the first stage. The method of ensemble of regression trees (ERT) is used to obtain the overall contour of the face. Then, the attention model is used to pick out the parts of the face that are more susceptible to expressions. Under the combined effect of these two methods, an image that can be called a local feature map is obtained. After that, the video data are sent to MC-DCN, which contains parallel sub-networks. While the overall spatiotemporal characteristics of facial expressions are obtained through the sequence of images, the selection of key parts can better learn the changes in facial expressions brought about by subtle facial movements. By combining local features and global features, the proposed method can acquire more information, leading to better performance. The experimental results show that MC-DCN can achieve recognition rates of 95%, 78.6% and 78.3% on the three datasets SAVEE, MMI, and edited GEMEP, respectively.

DeepTLR: A single deep convolutional network for detection and classification of traffic lights

2016 IEEE Intelligent Vehicles Symposium (IV) ◽

10.1109/ivs.2016.7535408 ◽

2016 ◽

Cited By ~ 34

Author(s):

Michael Weber ◽

Peter Wolf ◽

J. Marius Zollner

Keyword(s):

Convolutional Network ◽

Traffic Lights ◽

Deep Convolutional Network

Modified Dual-Band Stacked Circularly Polarized Microstrip Antenna

International Journal of Antennas and Propagation ◽

10.1155/2013/382958 ◽

2013 ◽

Vol 2013 ◽

pp. 1-5 ◽

Cited By ~ 10

Author(s):

Guo Liu ◽

Liang Xu ◽

Yi Wang

Keyword(s):

Radiation Pattern ◽

Global Positioning System ◽

Microstrip Antenna ◽

High Performance ◽

Axial Ratio ◽

Dual Band ◽

Optimized Design ◽

Positioning System ◽

Circularly Polarized ◽

Global Positioning

A novel high-performance circularly polarized (CP) antenna is proposed in this paper. Two separate antennas featuring global positioning system (GPS) dual-band operation (1.575 GHz and 1.227 GHz for the L1 band and L2 band, respectively) are integrated with good isolation. To enhance the gain at low angles, a new patch structure and two parasitic metal elements are introduced. With the optimized design, a good axial ratio and a near-hemispherical radiation pattern are obtained.
