Leaf Counting with Multi-Scale Convolutional Neural Network Features and Fisher Vector Coding

The number of leaves in maize plant is one of the key traits describing its growth conditions. It is directly related to plant development and leaf counts also give insight into changing plant development stages. Compared with the traditional solutions which need excessive human interventions, the methods of computer vision and machine learning are more efficient. However, leaf counting with computer vision remains a challenging problem. More and more researchers are trying to improve accuracy. To this end, an automated, deep learning based approach for counting leaves in maize plants is developed in this paper. A Convolution Neural Network(CNN) is used to extract leaf features. The CNN model in this paper is inspired by Google Inception Net V3, which using multi-scale convolution kernels in one convolution layer. To compress feature maps generated from some middle layers in CNN, the Fisher Vector (FV) is used to reduce redundant information. Finally, these encoded feature maps are used to regress the leaf numbers by using Random Forests. To boost the related research, a relatively single maize image dataset (Different growth stage with 2845 samples, which 80% for train and 20% for test) is constructed by our team. The proposed algorithm in single maize data set achieves Mean Square Error (MSE) of 0.32.

Download Full-text

MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer

Remote Sensing ◽

10.3390/rs13234743 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4743

Author(s):

Wei Yuan ◽

Wenbo Xu

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Network Model ◽

Remote Sensing Images ◽

Feature Maps ◽

Global Features ◽

Adaptive Network ◽

Data Set ◽

Multi Scale

The segmentation of remote sensing images by deep learning technology is the main method for remote sensing image interpretation. However, the segmentation model based on a convolutional neural network cannot capture the global features very well. A transformer, whose self-attention mechanism can supply each pixel with a global feature, makes up for the deficiency of the convolutional neural network. Therefore, a multi-scale adaptive segmentation network model (MSST-Net) based on a Swin Transformer is proposed in this paper. Firstly, a Swin Transformer is used as the backbone to encode the input image. Then, the feature maps of different levels are decoded separately. Thirdly, the convolution is used for fusion, so that the network can automatically learn the weight of the decoding results of each level. Finally, we adjust the channels to obtain the final prediction map by using the convolution with a kernel of 1 × 1. By comparing this with other segmentation network models on a WHU building data set, the evaluation metrics, mIoU, F1-score and accuracy are all improved. The network model proposed in this paper is a multi-scale adaptive network model that pays more attention to the global features for remote sensing segmentation.

Download Full-text

Application of convolution neural network in medical image processing

Technology and Health Care ◽

10.3233/thc-202657 ◽

2020 ◽

pp. 1-11

Author(s):

Jie Liu ◽

Hongbo Zhao

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Structure ◽

Medical Image ◽

Sampling Methods ◽

Medical Image Processing ◽

Convolution Neural Network ◽

Data Set ◽

Convolution Kernels ◽

Neural Network Structure

BACKGROUND: Convolution neural network is often superior to other similar algorithms in image classification. Convolution layer and sub-sampling layer have the function of extracting sample features, and the feature of sharing weights greatly reduces the training parameters of the network. OBJECTIVE: This paper describes the improved convolution neural network structure, including convolution layer, sub-sampling layer and full connection layer. This paper also introduces five kinds of diseases and normal eye images reflected by the blood filament of the eyeball “yan.mat” data set, convenient to use MATLAB software for calculation. METHODSL: In this paper, we improve the structure of the classical LeNet-5 convolutional neural network, and design a network structure with different convolution kernels, different sub-sampling methods and different classifiers, and use this structure to solve the problem of ocular bloodstream disease recognition. RESULTS: The experimental results show that the improved convolutional neural network structure is ideal for the recognition of eye blood silk data set, which shows that the convolution neural network has the characteristics of strong classification and strong robustness. The improved structure can classify the diseases reflected by eyeball bloodstain well.

Download Full-text

A Multi-Scale Neural Network for Traffic Sign Detection Based on Pyramid Feature Maps

2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) ◽

10.1109/hpcc/smartcity/dss.2019.00255 ◽

2019 ◽

Author(s):

Jia Liu ◽

Chongyang Zhang

Keyword(s):

Neural Network ◽

Feature Maps ◽

Traffic Sign ◽

Multi Scale ◽

Sign Detection ◽

Traffic Sign Detection

Download Full-text

Application Research of Deep Convolutional Neural Network in Computer Vision

Journal of Networking and Telecommunications ◽

10.18282/jnt.v2i2.886 ◽

2020 ◽

Vol 2 (2) ◽

pp. 23

Author(s):

Lei Wang

Keyword(s):

Neural Network ◽

Neural Networks ◽

Computer Vision ◽

Face Recognition ◽

Human Brain ◽

Image Classification ◽

Convolutional Neural Networks ◽

Convolution Neural Network ◽

Data Set ◽

Deep Convolution Neural Network

As an important research achievement in the field of brain like computing, deep convolution neural network has been widely used in many fields such as computer vision, natural language processing, information retrieval, speech recognition, semantic understanding and so on. It has set off a wave of neural network research in industry and academia and promoted the development of artificial intelligence. At present, the deep convolution neural network mainly simulates the complex hierarchical cognitive laws of the human brain by increasing the number of layers of the network, using a larger training data set, and improving the network structure or training learning algorithm of the existing neural network, so as to narrow the gap with the visual system of the human brain and enable the machine to acquire the capability of "abstract concepts". Deep convolution neural network has achieved great success in many computer vision tasks such as image classification, target detection, face recognition, pedestrian recognition, etc. Firstly, this paper reviews the development history of convolutional neural networks. Then, the working principle of the deep convolution neural network is analyzed in detail. Then, this paper mainly introduces the representative achievements of convolution neural network from the following two aspects, and shows the improvement effect of various technical methods on image classification accuracy through examples. From the aspect of adding network layers, the structures of classical convolutional neural networks such as AlexNet, ZF-Net, VGG, GoogLeNet and ResNet are discussed and analyzed. From the aspect of increasing the size of data set, the difficulties of manually adding labeled samples and the effect of using data amplification technology on improving the performance of neural network are introduced. This paper focuses on the latest research progress of convolution neural network in image classification and face recognition. Finally, the problems and challenges to be solved in future brain-like intelligence research based on deep convolution neural network are proposed.

Download Full-text

Attention-Guided Multi-Scale Segmentation Neural Network for Interactive Extraction of Region Objects from High-Resolution Satellite Imagery

Remote Sensing ◽

10.3390/rs12050789 ◽

2020 ◽

Vol 12 (5) ◽

pp. 789 ◽

Cited By ~ 1

Author(s):

Kun Li ◽

Xiangyun Hu ◽

Huiwei Jiang ◽

Zhen Shu ◽

Mi Zhang

Keyword(s):

Neural Network ◽

High Resolution ◽

Satellite Imagery ◽

Conditional Random Field ◽

Geodesic Distance ◽

Feature Maps ◽

Object Boundary ◽

Multi Scale ◽

High Resolution Satellite Imagery ◽

Fully Connected

Automatic extraction of region objects from high-resolution satellite imagery presents a great challenge, because there may be very large variations of the objects in terms of their size, texture, shape, and contextual complexity in the image. To handle these issues, we present a novel, deep-learning-based approach to interactively extract non-artificial region objects, such as water bodies, woodland, farmland, etc., from high-resolution satellite imagery. First, our algorithm transforms user-provided positive and negative clicks or scribbles into guidance maps, which consist of a relevance map modified from Euclidean distance maps, two geodesic distance maps (for positive and negative, respectively), and a sampling map. Then, feature maps are extracted by applying a VGG convolutional neural network pre-trained on the ImageNet dataset to the image X, and they are then upsampled to the resolution of X. Image X, guidance maps, and feature maps are integrated as the input tensor. We feed the proposed attention-guided, multi-scale segmentation neural network (AGMSSeg-Net) with the input tensor above to obtain the mask that assigns a binary label to each pixel. After a post-processing operation based on a fully connected Conditional Random Field (CRF), we extract the selected object boundary from the segmentation result. Experiments were conducted on two typical datasets with diverse region object types from complex scenes. The results demonstrate the effectiveness of the proposed method, and our approach outperforms existing methods for interactive image segmentation.

Download Full-text

Pedestrian Detection at Night in Infrared Images Using an Attention-Guided Encoder-Decoder Convolutional Neural Network

Applied Sciences ◽

10.3390/app10030809 ◽

2020 ◽

Vol 10 (3) ◽

pp. 809 ◽

Cited By ~ 4

Author(s):

Yunfan Chen ◽

Hyunchul Shin

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Pedestrian Detection ◽

Weather Conditions ◽

Superior Performance ◽

Low Resolution ◽

Feature Maps ◽

Camera System ◽

Ir Camera ◽

Multi Scale

Pedestrian-related accidents are much more likely to occur during nighttime when visible (VI) cameras are much less effective. Unlike VI cameras, infrared (IR) cameras can work in total darkness. However, IR images have several drawbacks, such as low-resolution, noise, and thermal energy characteristics that can differ depending on the weather. To overcome these drawbacks, we propose an IR camera system to identify pedestrians at night that uses a novel attention-guided encoder-decoder convolutional neural network (AED-CNN). In AED-CNN, encoder-decoder modules are introduced to generate multi-scale features, in which new skip connection blocks are incorporated into the decoder to combine the feature maps from the encoder and decoder module. This new architecture increases context information which is helpful for extracting discriminative features from low-resolution and noisy IR images. Furthermore, we propose an attention module to re-weight the multi-scale features generated by the encoder-decoder module. The attention mechanism effectively highlights pedestrians while eliminating background interference, which helps to detect pedestrians under various weather conditions. Empirical experiments on two challenging datasets fully demonstrate that our method shows superior performance. Our approach significantly improves the precision of the state-of-the-art method by 5.1% and 23.78% on the Keimyung University (KMU) and Computer Vision Center (CVC)-09 pedestrian dataset, respectively.

Download Full-text

VP-Detector: A 3D convolutional neural network for automated macromolecule localization and classification in cryo-electron tomograms

10.1101/2021.05.25.443703 ◽

2021 ◽

Author(s):

Yu Hao ◽

Biao Zhang ◽

Xiaohua Wan ◽

Rui Yan ◽

Zhiyong Liu ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Electron Tomography ◽

Class Imbalance ◽

Feature Maps ◽

Particle Detection ◽

Multi Scale ◽

Accurate Performance ◽

Fully Automatic

Motivation: Cryo-electron tomography (Cryo-ET) with sub-tomogram averaging (STA) is indispensable when studying macromolecule structures and functions in their native environments. However, current tomographic reconstructions suffer the low signal-to-noise (SNR) ratio and the missing wedge artifacts. Hence, automatic and accurate macromolecule localization and classification become the bottleneck problem for structural determination by STA. Here, we propose a 3D multi-scale dense convolutional neural network (MSDNet) for voxel-wise annotations of tomograms. Weighted focal loss is adopted as a loss function to solve the class imbalance. The proposed network combines 3D hybrid dilated convolutions (HDC) and dense connectivity to ensure an accurate performance with relatively few trainable parameters. 3D HDC expands the receptive field without losing resolution or learning extra parameters. Dense connectivity facilitates the re-use of feature maps to generate fewer intermediate feature maps and trainable parameters. Then, we design a 3D MSDNet based approach for fully automatic macromolecule localization and classification, called VP-Detector (Voxel-wise Particle Detector). VP-Detector is efficient because classification performs on the pre-calculated coordinates instead of a sliding window. Results: We evaluated the VP-Detector on simulated tomograms. Compared to the state-of-the-art methods, our method achieved a competitive performance on localization with the highest F1-score. We also demonstrated that the weighted focal loss improves the classification of hard classes. We trained the network on a part of training sets to prove the availability of training on relatively small datasets. Moreover, the experiment shows that VP-Detector has a fast particle detection speed, which costs less than 14 minutes on a test tomogram.

Download Full-text

Adaptive Multi-Scale Feature Fusion Based Residual U-net for Fracture Segmentation in Coal Rock Images

10.21203/rs.2.23959/v2 ◽

2020 ◽

Author(s):

Fengli Lu ◽

Chengcai Fu ◽

Guoying Zhang ◽

Jie Shi

Keyword(s):

Spatial Information ◽

Feature Fusion ◽

Ct Images ◽

Published Data ◽

Rock Fractures ◽

Feature Maps ◽

Data Set ◽

Scale Feature ◽

Multi Scale ◽

Coal Rock

Abstract Accurate segmentation of fractures in coal rock CT images is important for safe production and the development of coalbed methane. However, the coal rock fractures formed through natural geological evolution, which are complex, low contrast and different scales. Furthermore, there is no published data set of coal rock. In this paper, we proposed adaptive multi-scale feature fusion based residual U-uet (AMSFFR-U-uet) for fracture segmentation in coal rock CT images. The dilated residual blocks (DResBlock) with dilated ratio (1,2,3) are embedded into encoding branch of the U-uet structure, which can improve the ability of extract feature of network and capture different scales fractures. Furthermore, feature maps of different sizes in the encoding branch are concatenated by adaptive multi-scale feature fusion (AMSFF) module. And AMSFF can not only capture different scales fractures but also improve the restoration of spatial information. To alleviate the lack of coal rock fractures training data, we applied a set of comprehensive data augmentation operations to increase the diversity of training samples. Our network, U-net and Res-U-net are tested on our test set of coal rock CT images with five different region coal rock samples. The experimental results show that our proposed approach improve the average Dice coefficient by 2.9%, the average precision by 7.2% and the average Recall by 9.1% , respectively. Therefore, AMSFFR-U-net can achieve better segmentation results of coal rock fractures, and has stronger generalization ability and robustness.

Download Full-text

SEMANTIC SEGMENTATION OF AERIAL IMAGERY VIA MULTI-SCALE SHUFFLING CONVOLUTIONAL NEURAL NETWORKS WITH DEEP SUPERVISION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-1-29-2018 ◽

2018 ◽

Vol IV-1 ◽

pp. 29-36 ◽

Cited By ~ 4

Author(s):

K. Chen ◽

M. Weinmann ◽

X. Sun ◽

M. Yan ◽

S. Hinz ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Semantic Segmentation ◽

Aerial Imagery ◽

Geometric Features ◽

Feature Maps ◽

Multi Scale ◽

Intermediate Layers ◽

Segmentation Task ◽

The Impact

Abstract. In this paper, we address the semantic segmentation of aerial imagery based on the use of multi-modal data given in the form of true orthophotos and the corresponding Digital Surface Models (DSMs). We present the Deeply-supervised Shuffling Convolutional Neural Network (DSCNN) representing a multi-scale extension of the Shuffling Convolutional Neural Network (SCNN) with deep supervision. Thereby, we take the advantage of the SCNN involving the shuffling operator to effectively upsample feature maps and then fuse multiscale features derived from the intermediate layers of the SCNN, which results in the Multi-scale Shuffling Convolutional Neural Network (MSCNN). Based on the MSCNN, we derive the DSCNN by introducing additional losses into the intermediate layers of the MSCNN. In addition, we investigate the impact of using different sets of hand-crafted radiometric and geometric features derived from the true orthophotos and the DSMs on the semantic segmentation task. For performance evaluation, we use a commonly used benchmark dataset. The achieved results reveal that both multi-scale fusion and deep supervision contribute to an improvement in performance. Furthermore, the use of a diversity of hand-crafted radiometric and geometric features as input for the DSCNN does not provide the best numerical results, but smoother and improved detections for several objects.

Download Full-text

MFNet algorithm based on indoor scene segmentation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-212275 ◽

2021 ◽

pp. 1-10

Author(s):

Rui Cao ◽

Feng Jiang ◽

Zhao Wu ◽

Jia Ren

Keyword(s):

Deep Learning ◽

Learning Task ◽

Vital Role ◽

Scene Segmentation ◽

Feature Maps ◽

Real Time Processing ◽

Multi Scale ◽

Indoor Scene ◽

Convolution Kernels ◽

Hardware Platforms

With the advancement of computer performance, deep learning is playing a vital role on hardware platforms. Indoor scene segmentation is a challenging deep learning task because indoor objects tend to obscure each other, and the dense layout increases the difficulty of segmentation. Still, current networks pursue accuracy improvement, sacrifice speed, and augment memory resource usage. To solve this problem, achieve a compromise between accuracy, speed, and model size. This paper proposes Multichannel Fusion Network (MFNet) for indoor scene segmentation, which mainly consists of Dense Residual Module(DRM) and Multi-scale Feature Extraction Module(MFEM). MFEM uses depthwise separable convolution to cut the number of parameters, matches different sizes of convolution kernels and dilation rates to achieve optimal receptive field; DRM fuses feature maps at several levels of resolution to optimize segmentation details. Experimental results on the NYU V2 dataset show that the proposed method achieves very competitive results compared with other advanced algorithms, with a segmentation speed of 38.47 fps, nearly twice that of Deeplab v3+, but only 1/5 of the number of parameters of Deeplab v3 + . Its segmentation results were close to those of advanced segmentation networks, making it beneficial for the real-time processing of images.

Download Full-text