Food image recognition using deep convolutional network with pre-training and fine-tuning

Relative depth estimation from single monocular images with deep convolutional network

10.32469/10355/63579 ◽

2017 ◽

Author(s):

◽

Alex Yang

Keyword(s):

High Performance ◽

Depth Estimation ◽

Fine Tuning ◽

Test Accuracy ◽

Relative Depth ◽

Optimized Design ◽

Convolutional Network ◽

Real Time Processing ◽

Deep Convolutional Network ◽

Depth Inference

Depth estimation from single monocular images is a theoretical challenge in computer vision as well as a computational challenge in practice. This thesis addresses the problem of depth estimation from single monocular images using a deep convolutional neural fields framework; which consists of convolutional feature extraction, superpixel dimensionality reduction, and depth inference. Data were collected using a stereo vision camera, which generated depth maps though triangulation that are paired with visual images. The visual image (input) and computed depth map (desired output) are used to train the model, which has achieved 83 percent test accuracy at the standard 25 percent tolerance. The problem has been formulated as depth regression for superpixels and our technique is superior to existing state-of-the-art approaches based on its demonstrated its generalization ability, high prediction accuracy, and real-time processing capability. We utilize the VGG-16 deep convolutional network as feature extractor and conditional random fields depth inference. We have leveraged a multi-phase training protocol that includes transfer learning and network fine-tuning lead to high performance accuracy. Our framework has a robust modular nature with capability of replacing each component with different implementations for maximum extensibility. Additionally, our GPU-accelerated implementation of superpixel pooling has further facilitated this extensibility by allowing incorporation of feature tensors with exible shapes and has provided both space and time optimization. Based on our novel contributions and high-performance computing methodologies, the model achieves a minimal and optimized design. It is capable of operating at 30 fps; which is a critical step towards empowering real-world applications such as autonomous vehicle with passive relative depth perception using single camera vision-based obstacle avoidance, environment mapping, etc.

Download Full-text

E-DiCoNet: Extreme learning machine based classifier for diagnosis of COVID-19 using deep convolutional network

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-020-02688-3 ◽

2021 ◽

Author(s):

R. Murugan ◽

Tripti Goel

Keyword(s):

Extreme Learning Machine ◽

Convolutional Network ◽

Deep Convolutional Network ◽

Learning Machine

Download Full-text

Image Classification Based On Deep Convolutional Network And Gaussian Aggregate Encoding

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai50040.2020.00089 ◽

2020 ◽

Author(s):

Fengge Wang ◽

Xiaolin Tian ◽

Yang Zhang ◽

Nan Jia ◽

Tiantian Lu

Keyword(s):

Image Classification ◽

Convolutional Network ◽

Deep Convolutional Network

Download Full-text

Facial Expression Recognition Based on Multi-Features Cooperative Deep Convolutional Network

Applied Sciences ◽

10.3390/app11041428 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1428

Author(s):

Haopeng Wu ◽

Zhiying Lu ◽

Jianfeng Zhang ◽

Xin Li ◽

Mingyue Zhao ◽

...

Keyword(s):

Facial Expression ◽

Facial Expressions ◽

Facial Expression Recognition ◽

Video Data ◽

Expression Recognition ◽

Convolutional Network ◽

Facial Movements ◽

The Face ◽

Deep Convolutional Network ◽

Selection Of

This paper addresses the problem of Facial Expression Recognition (FER), focusing on unobvious facial movements. Traditional methods often cause overfitting problems or incomplete information due to insufficient data and manual selection of features. Instead, our proposed network, which is called the Multi-features Cooperative Deep Convolutional Network (MC-DCN), maintains focus on the overall feature of the face and the trend of key parts. The processing of video data is the first stage. The method of ensemble of regression trees (ERT) is used to obtain the overall contour of the face. Then, the attention model is used to pick up the parts of face that are more susceptible to expressions. Under the combined effect of these two methods, the image which can be called a local feature map is obtained. After that, the video data are sent to MC-DCN, containing parallel sub-networks. While the overall spatiotemporal characteristics of facial expressions are obtained through the sequence of images, the selection of keys parts can better learn the changes in facial expressions brought about by subtle facial movements. By combining local features and global features, the proposed method can acquire more information, leading to better performance. The experimental results show that MC-DCN can achieve recognition rates of 95%, 78.6% and 78.3% on the three datasets SAVEE, MMI, and edited GEMEP, respectively.

Download Full-text