MFNet algorithm based on indoor scene segmentation

With the advancement of computer performance, deep learning is playing a vital role on hardware platforms. Indoor scene segmentation is a challenging deep learning task because indoor objects tend to obscure each other, and the dense layout increases the difficulty of segmentation. Still, current networks pursue accuracy improvement, sacrifice speed, and augment memory resource usage. To solve this problem, achieve a compromise between accuracy, speed, and model size. This paper proposes Multichannel Fusion Network (MFNet) for indoor scene segmentation, which mainly consists of Dense Residual Module(DRM) and Multi-scale Feature Extraction Module(MFEM). MFEM uses depthwise separable convolution to cut the number of parameters, matches different sizes of convolution kernels and dilation rates to achieve optimal receptive field; DRM fuses feature maps at several levels of resolution to optimize segmentation details. Experimental results on the NYU V2 dataset show that the proposed method achieves very competitive results compared with other advanced algorithms, with a segmentation speed of 38.47 fps, nearly twice that of Deeplab v3+, but only 1/5 of the number of parameters of Deeplab v3 + . Its segmentation results were close to those of advanced segmentation networks, making it beneficial for the real-time processing of images.

Download Full-text

Deep-Learning-Based Remaining Useful Life Prediction Based on a Multi-Scale Dilated Convolution Network

Mathematics ◽

10.3390/math9233035 ◽

2021 ◽

Vol 9 (23) ◽

pp. 3035

Author(s):

Feiyue Deng ◽

Yan Bi ◽

Yongqiang Liu ◽

Shaopu Yang

Keyword(s):

Deep Learning ◽

Rapid Development ◽

Operational Efficiency ◽

Remaining Useful Life ◽

Convolution Kernel ◽

Training Time ◽

Multi Scale ◽

Dilated Convolution ◽

Convolution Kernels ◽

Useful Life

Remaining useful life (RUL) prediction of key components is an important influencing factor in making accurate maintenance decisions for mechanical systems. With the rapid development of deep learning (DL) techniques, the research on RUL prediction based on the data-driven model is increasingly widespread. Compared with the conventional convolution neural networks (CNNs), the multi-scale CNNs can extract different-scale feature information, which exhibits a better performance in the RUL prediction. However, the existing multi-scale CNNs employ multiple convolution kernels with different sizes to construct the network framework. There are two main shortcomings of this approach: (1) the convolution operation based on multiple size convolution kernels requires enormous computation and has a low operational efficiency, which severely restricts its application in practical engineering. (2) The convolutional layer with a large size convolution kernel needs a mass of weight parameters, leading to a dramatic increase in the network training time and making it prone to overfitting in the case of small datasets. To address the above issues, a multi-scale dilated convolution network (MsDCN) is proposed for RUL prediction in this article. The MsDCN adopts a new multi-scale dilation convolution fusion unit (MsDCFU), in which the multi-scale network framework is composed of convolution operations with different dilated factors. This effectively expands the range of receptive field (RF) for the convolution kernel without an additional computational burden. Moreover, the MsDCFU employs the depthwise separable convolution (DSC) to further improve the operational efficiency of the prognostics model. Finally, the proposed method was validated with the accelerated degradation test data of rolling element bearings (REBs). The experimental results demonstrate that the proposed MSDCN has a higher RUL prediction accuracy compared to some typical CNNs and better operational efficiency than the existing multi-scale CNNs based on different convolution kernel sizes.

Download Full-text

Glassboxing Deep Learning to Enhance Aircraft Detection from SAR Imagery

Remote Sensing ◽

10.3390/rs13183650 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3650

Author(s):

Ru Luo ◽

Jin Xing ◽

Lifu Chen ◽

Zhouhao Pan ◽

Xingmin Cai ◽

...

Keyword(s):

Deep Learning ◽

Feature Fusion ◽

Detection Performance ◽

Great Success ◽

Sar Image ◽

Feature Maps ◽

Multi Scale ◽

Learning Techniques ◽

Sar Imagery ◽

Aircraft Detection

Although deep learning has achieved great success in aircraft detection from SAR imagery, its blackbox behavior has been criticized for low comprehensibility and interpretability. Such challenges have impeded the trustworthiness and wide application of deep learning techniques in SAR image analytics. In this paper, we propose an innovative eXplainable Artificial Intelligence (XAI) framework to glassbox deep neural networks (DNN) by using aircraft detection as a case study. This framework is composed of three parts: hybrid global attribution mapping (HGAM) for backbone network selection, path aggregation network (PANet), and class-specific confidence scores mapping (CCSM) for visualization of the detector. HGAM integrates the local and global XAI techniques to evaluate the effectiveness of DNN feature extraction; PANet provides advanced feature fusion to generate multi-scale prediction feature maps; while CCSM relies on visualization methods to examine the detection performance with given DNN and input SAR images. This framework can select the optimal backbone DNN for aircraft detection and map the detection performance for better understanding of the DNN. We verify its effectiveness with experiments using Gaofen-3 imagery. Our XAI framework offers an explainable approach to design, develop, and deploy DNN for SAR image analytics.

Download Full-text

MS-AFF: A Novel Semantic Segmentation Approach for Buried Object Based on Multi-scale Attentional Feature Fusion

10.21203/rs.3.rs-193757/v1 ◽

2021 ◽

Author(s):

Chao Lu ◽

Fansheng Chen ◽

Xiaofeng Su ◽

Dan Zeng

Keyword(s):

Deep Learning ◽

Spatial Information ◽

Feature Fusion ◽

Infrared Image ◽

Semantic Segmentation ◽

Target Object ◽

Infrared Images ◽

Feature Maps ◽

Multi Scale ◽

Visible Images

Abstract Infrared technology is a widely used in precision guidance and mine detection since it can capture the heat radiated outward from the target object. We use infrared (IR) thermography to get the infrared image of the buried obje cts. Compared to the visible images, infrared images present poor resolution, low contrast, and fuzzy visual effect, which make it difficult to segment the target object, specifically in the complex backgrounds. In this condition, traditional segmentation methods cannot perform well in infrared images since they are easily disturbed by the noise and non-target objects in the images. With the advance of deep convolutional neural network (CNN), the deep learning-based methods have made significant improvements in semantic segmentation task. However, few of them research Infrared image semantic segmentation, which is a more challenging scenario compared to visible images. Moreover, the lack of an Infrared image dataset is also a problem for current methods based on deep learning. We raise a multi-scale attentional feature fusion (MS-AFF) module for infrared image semantic segmentation to solve this problem. Precisely, we integrate a series of feature maps from different levels by an atrous spatial pyramid structure. In this way, the model can obtain rich representation ability on the infrared images. Besides, a global spatial information attention module is employed to let the model focus on the target region and reduce disturbance in infrared images' background. In addition, we propose an infrared segmentation dataset based on the infrared thermal imaging system. Extensive experiments conducted in the infrared image segmentation dataset show the superiority of our method.

Download Full-text

Medical Image Classification Based On Normalized Coding Network with Multiscale Perception

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2143.0881119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 2694-2697

Keyword(s):

Deep Learning ◽

Image Classification ◽

Medical Image ◽

Medical Images ◽

Vital Role ◽

Multi Scale ◽

Proposed Model ◽

Dataset Size ◽

High Resolution Images ◽

High Level

Medical imaging classification is playing a vital role in identifying and diagnoses the diseases, which is very helpful to doctor. Conventional ways classify supported the form, color, and/or texture, most of tiny problematic areas haven’t shown in medical images, which meant less efficient classification and that has poor ability to identify disease. Advanced deep learning algorithms provide an efficient way to construct a finished model that can compute final classification labels with the raw pixels of medical images. These conventional algorithms are not sufficient for high resolution images due to small dataset size, advanced deep learning models suffer from very high computational costs and limitations in the channels and multilayers in the channels. To overcome these limitations, we proposed a new algorithm Normalized Coding Network with Multi-scale Perceptron (NCNMP), which combines high-level features and traditional features. The Architecture of the proposed model includes three stages. Training, retrieve, fuse. We examined the proposed algorithm on medical image dataset NIH2626. We got an overall image classification accuracy of 91.35, which are greater than the present methods.

Download Full-text

Adaptive Weighted Multi-Level Fusion of Multi-Scale Features: A New Approach to Pedestrian Detection

Future Internet ◽

10.3390/fi13020038 ◽

2021 ◽

Vol 13 (2) ◽

pp. 38

Author(s):

Yao Xu ◽

Qin Yu

Keyword(s):

Deep Learning ◽

Feature Fusion ◽

Pedestrian Detection ◽

Feature Maps ◽

Scale Feature ◽

Multi Scale ◽

One Stage ◽

Current State ◽

Multi Level ◽

Feature Utilization

Great achievements have been made in pedestrian detection through deep learning. For detectors based on deep learning, making better use of features has become the key to their detection effect. While current pedestrian detectors have made efforts in feature utilization to improve their detection performance, the feature utilization is still inadequate. To solve the problem of inadequate feature utilization, we proposed the Multi-Level Feature Fusion Module (MFFM) and its Multi-Scale Feature Fusion Unit (MFFU) sub-module, which connect feature maps of the same scale and different scales by using horizontal and vertical connections and shortcut structures. All of these connections are accompanied by weights that can be learned; thus, they can be used as adaptive multi-level and multi-scale feature fusion modules to fuse the best features. Then, we built a complete pedestrian detector, the Adaptive Feature Fusion Detector (AFFDet), which is an anchor-free one-stage pedestrian detector that can make full use of features for detection. As a result, compared with other methods, our method has better performance on the challenging Caltech Pedestrian Detection Benchmark (Caltech) and has quite competitive speed. It is the current state-of-the-art one-stage pedestrian detection method.

Download Full-text

Leaf Counting with Multi-Scale Convolutional Neural Network Features and Fisher Vector Coding

Symmetry ◽

10.3390/sym11040516 ◽

2019 ◽

Vol 11 (4) ◽

pp. 516

Author(s):

Jiang ◽

Wang ◽

Zhuang ◽

Li ◽

...

Keyword(s):

Neural Network ◽

Computer Vision ◽

Plant Development ◽

Growth Conditions ◽

Feature Maps ◽

Data Set ◽

Vector Coding ◽

Multi Scale ◽

Fisher Vector ◽

Convolution Kernels

The number of leaves in maize plant is one of the key traits describing its growth conditions. It is directly related to plant development and leaf counts also give insight into changing plant development stages. Compared with the traditional solutions which need excessive human interventions, the methods of computer vision and machine learning are more efficient. However, leaf counting with computer vision remains a challenging problem. More and more researchers are trying to improve accuracy. To this end, an automated, deep learning based approach for counting leaves in maize plants is developed in this paper. A Convolution Neural Network(CNN) is used to extract leaf features. The CNN model in this paper is inspired by Google Inception Net V3, which using multi-scale convolution kernels in one convolution layer. To compress feature maps generated from some middle layers in CNN, the Fisher Vector (FV) is used to reduce redundant information. Finally, these encoded feature maps are used to regress the leaf numbers by using Random Forests. To boost the related research, a relatively single maize image dataset (Different growth stage with 2845 samples, which 80% for train and 20% for test) is constructed by our team. The proposed algorithm in single maize data set achieves Mean Square Error (MSE) of 0.32.

Download Full-text

A Phase Filtering Method with Scale Recurrent Networks for InSAR

Remote Sensing ◽

10.3390/rs12203453 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3453

Author(s):

Liming Pu ◽

Xiaoling Zhang ◽

Zenan Zhou ◽

Jun Shi ◽

Shunjun Wei ◽

...

Keyword(s):

Deep Learning ◽

Structural Phase ◽

Input Image ◽

Deformation Monitoring ◽

Feature Maps ◽

Data Set ◽

Real Time Processing ◽

Practical Applications ◽

Filtering Method ◽

Interferometric Phase

Phase filtering is a key issue in interferometric synthetic aperture radar (InSAR) applications, such as deformation monitoring and topographic mapping. The accuracy of the deformation and terrain height is highly dependent on the quality of phase filtering. Researchers are committed to continuously improving the accuracy and efficiency of phase filtering. Inspired by the successful application of neural networks in SAR image denoising, in this paper we propose a phase filtering method that is based on deep learning to efficiently filter out the noise in the interferometric phase. In this method, the real and imaginary parts of the interferometric phase are filtered while using a scale recurrent network, which includes three single scale subnetworks based on the encoder-decoder architecture. The network can utilize the global structural phase information contained in the different-scaled feature maps, because RNN units are used to connect the three different-scaled subnetworks and transmit current state information among different subnetworks. The encoder part is used for extracting the phase features, and the decoder part restores detailed information from the encoded feature maps and makes the size of the output image the same as that of the input image. Experiments on simulated and real InSAR data prove that the proposed method is superior to three widely-used phase filtering methods by qualitative and quantitative comparisons. In addition, on the same simulated data set, the overall performance of the proposed method is better than another deep learning-based method (DeepInSAR). The runtime of the proposed method is only about 0.043s for an image with a size of 1024×1024 pixels, which has the significant advantage of computational efficiency in practical applications that require real-time processing.

Download Full-text

DeepTorrent: a deep learning-based approach for predicting DNA N4-methylcytosine sites

Briefings in Bioinformatics ◽

10.1093/bib/bbaa124 ◽

2020 ◽

Cited By ~ 4

Author(s):

Quanzhong Liu ◽

Jinxiang Chen ◽

Yanze Wang ◽

Shuqin Li ◽

Cangzhi Jia ◽

...

Keyword(s):

Deep Learning ◽

Dna Sequences ◽

Short Term Memory ◽

Epigenetic Modification ◽

Vital Role ◽

Feature Maps ◽

Feature Representations ◽

Learning Techniques ◽

Feature Encoding ◽

Long Short Term Memory

Abstract DNA N4-methylcytosine (4mC) is an important epigenetic modification that plays a vital role in regulating DNA replication and expression. However, it is challenging to detect 4mC sites through experimental methods, which are time-consuming and costly. Thus, computational tools that can identify 4mC sites would be very useful for understanding the mechanism of this important type of DNA modification. Several machine learning-based 4mC predictors have been proposed in the past 3 years, although their performance is unsatisfactory. Deep learning is a promising technique for the development of more accurate 4mC site predictions. In this work, we propose a deep learning-based approach, called DeepTorrent, for improved prediction of 4mC sites from DNA sequences. It combines four different feature encoding schemes to encode raw DNA sequences and employs multi-layer convolutional neural networks with an inception module integrated with bidirectional long short-term memory to effectively learn the higher-order feature representations. Dimension reduction and concatenated feature maps from the filters of different sizes are then applied to the inception module. In addition, an attention mechanism and transfer learning techniques are also employed to train the robust predictor. Extensive benchmarking experiments demonstrate that DeepTorrent significantly improves the performance of 4mC site prediction compared with several state-of-the-art methods.

Download Full-text

Skin Lesion Segmentation Using Deep Learning with Auxiliary Task

Journal of Imaging ◽

10.3390/jimaging7040067 ◽

2021 ◽

Vol 7 (4) ◽

pp. 67

Author(s):

Lina Liu ◽

Ying Y. Tsui ◽

Mrinal Mandal

Keyword(s):

Deep Learning ◽

Skin Lesion ◽

Auxiliary Information ◽

Classification Task ◽

Lesion Segmentation ◽

Feature Maps ◽

Learning Framework ◽

Multi Scale ◽

Task Learning ◽

Segmentation Task

Skin lesion segmentation is a primary step for skin lesion analysis, which can benefit the subsequent classification task. It is a challenging task since the boundaries of pigment regions may be fuzzy and the entire lesion may share a similar color. Prevalent deep learning methods for skin lesion segmentation make predictions by ensembling different convolutional neural networks (CNN), aggregating multi-scale information, or by multi-task learning framework. The main purpose of doing so is trying to make use of as much information as possible so as to make robust predictions. A multi-task learning framework has been proved to be beneficial for the skin lesion segmentation task, which is usually incorporated with the skin lesion classification task. However, multi-task learning requires extra labeling information which may not be available for the skin lesion images. In this paper, a novel CNN architecture using auxiliary information is proposed. Edge prediction, as an auxiliary task, is performed simultaneously with the segmentation task. A cross-connection layer module is proposed, where the intermediate feature maps of each task are fed into the subblocks of the other task which can implicitly guide the neural network to focus on the boundary region of the segmentation task. In addition, a multi-scale feature aggregation module is proposed, which makes use of features of different scales and enhances the performance of the proposed method. Experimental results show that the proposed method obtains a better performance compared with the state-of-the-art methods with a Jaccard Index (JA) of 79.46, Accuracy (ACC) of 94.32, SEN of 88.76 with only one integrated model, which can be learned in an end-to-end manner.

Download Full-text

Real Time Gender and Age Prediction using Deep Learning Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2906.049620 ◽

2020 ◽

Vol 9 (6) ◽

pp. 797-801

Keyword(s):

Deep Learning ◽

Face Recognition ◽

Real Time ◽

Vital Role ◽

Learning Techniques ◽

Gender And Age ◽

The Face ◽

And Gender ◽

Gender Detection ◽

Age Prediction

Face recognition plays a vital role in security purpose. In recent years, the researchers have focused on the pose illumination, face recognition, etc,. The traditional methods of face recognition focus on Open CV’s fisher faces which results in analyzing the face expressions and attributes. Deep learning method used in this proposed system is Convolutional Neural Network (CNN). Proposed work includes the following modules: [1] Face Detection [2] Gender Recognition [3] Age Prediction. Thus the results obtained from this work prove that real time age and gender detection using CNN provides better accuracy results compared to other existing approaches.

Download Full-text