Research on a Convolution Kernel Initialization Method for Speeding Up the Convergence of CNN

2022 ◽  
Vol 12 (2) ◽  
pp. 633
Author(s):  
Chunyu Xu ◽  
Hong Wang

This paper presents a convolution kernel initialization method based on the local binary patterns (LBP) algorithm and sparse autoencoder. This method can be applied to the initialization of the convolution kernel in the convolutional neural network (CNN). The main function of the convolution kernel is to extract the local pattern of the image by template matching as the target feature of subsequent image recognition. In general, the Xavier initialization method and the He initialization method are used to initialize the convolution kernel. In this paper, firstly, some typical sample images were selected from the training set, and the LBP algorithm was applied to extract the texture information of the typical sample images. Then, the texture information was divided into several small blocks, and these blocks were input into the sparse autoencoder (SAE) for pre-training. After finishing the training, the weight values of the sparse autoencoder that met the statistical features of the data set were used as the initial value of the convolution kernel in the CNN. The experimental result indicates that the method proposed in this paper can speed up the convergence of the network in the network training process and improve the recognition rate of the network to an extent.
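The texture-extraction and block-splitting steps described in this abstract can be sketched in a few lines. This is only a minimal illustration under stated assumptions: it uses the basic 3x3 LBP operator with an arbitrarily chosen clockwise bit order, the block size is left as a free parameter, and the function names `lbp_image` and `split_blocks` are hypothetical. The abstract does not say which LBP variant or block size the authors used, and the sparse autoencoder pre-training itself is not shown.

```python
def lbp_image(img):
    """Basic 3x3 LBP code for every interior pixel of a 2D grayscale list."""
    # Assumed bit order: clockwise, starting at the top-left neighbour.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        out.append(row)
    return out


def split_blocks(mat, size):
    """Cut a 2D list into non-overlapping size x size blocks.

    Rows and columns that do not fill a whole block are dropped.
    """
    blocks = []
    for r in range(0, len(mat) - size + 1, size):
        for c in range(0, len(mat[0]) - size + 1, size):
            blocks.append([row[c:c + size] for row in mat[r:r + size]])
    return blocks
```

Each block returned by `split_blocks` would then be flattened and used as one training vector for the autoencoder, whose learned weights serve as the initial kernels.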

2021 ◽  
Vol 30 (1) ◽  
pp. 893-902
Author(s):  
Ke Xu

A portrait recognition system can play an important role in emergency evacuation in mass emergencies. This paper designed a portrait recognition system, analyzed the overall structure of the system and the method of image preprocessing, and used the Single Shot MultiBox Detector (SSD) algorithm for portrait detection. It also designed an improved algorithm combining principal component analysis (PCA) with linear discriminant analysis (LDA) for portrait recognition and tested the system by applying it in a shopping mall to collect and monitor the portrait and establish a data set. The results showed that the missing detection rate and false detection rate of the SSD algorithm were 0.78 and 2.89%, respectively, which were lower than those of the AdaBoost algorithm. Comparisons with PCA, LDA, and PCA + LDA algorithms demonstrated that the recognition rate of the improved PCA + LDA algorithm was the highest, which was 95.8%, the area under the receiver operating characteristic curve was the largest, and the recognition time was the shortest, which was 465 ms. The experimental results show that the improved PCA + LDA algorithm is reliable in portrait recognition and can be used for emergency evacuation in mass emergencies.


2016 ◽  
Vol 14 (1) ◽  
pp. 172988141769231 ◽  
Author(s):  
Yingfeng Cai ◽  
Youguo He ◽  
Hai Wang ◽  
Xiaoqiang Sun ◽  
Long Chen ◽  
...  

The emergence and development of deep learning theory in the machine learning field provide a new method for vision-based pedestrian recognition. To achieve better performance in this application, an improved weakly supervised hierarchical deep learning pedestrian recognition algorithm with two-dimensional deep belief networks is proposed. The improvements address weaknesses in the structure and training methods of existing classifiers. First, the traditional one-dimensional deep belief network is expanded to two dimensions, which allows the image matrix to be loaded directly and preserves more information about the sample space. Then, a determination regularization term with a small weight is added to the traditional unsupervised training objective function. This modification transforms the original unsupervised training into weakly supervised training, which gives the extracted features discriminative ability. Multiple sets of comparative experiments show that the proposed algorithm achieves a better recognition rate than other deep learning algorithms and outperforms most existing state-of-the-art methods on the non-occlusion pedestrian data set, while performing fairly on the weakly and heavily occluded data sets.


2017 ◽  
Vol 45 (2) ◽  
pp. 66-74
Author(s):  
Yufeng Ma ◽  
Long Xia ◽  
Wenqi Shen ◽  
Mi Zhou ◽  
Weiguo Fan

Purpose: The purpose of this paper is the automatic classification of TV series reviews based on generic categories. Design/methodology/approach: The authors mainly replace specific role and actor names in reviews with surrogates to make the reviews more generic. In addition, feature selection techniques and different kinds of classifiers are incorporated. Findings: With role and actor names replaced by generic tags, the experimental results showed that the model generalizes well to unseen TV series compared with reviews that keep the original names. Research limitations/implications: The model presented in this paper must be built on top of an existing knowledge base such as Baidu Encyclopedia. Building such a database takes a lot of work. Practical implications: As in a digital information supply chain, if reviews are part of the information to be transported or exchanged, the model presented in this paper can help automatically identify individual reviews according to different requirements and support information sharing. Originality/value: One original contribution is the surrogate-based approach to making reviews more generic. In addition, the authors built a review data set of popular Chinese TV series, which includes eight generic category labels for each review.


2021 ◽  
Vol 11 (6) ◽  
pp. 1592-1598
Author(s):  
Xufei Liu

The early detection of cardiovascular diseases based on the electrocardiogram (ECG) is very important for the timely treatment of cardiovascular patients, which increases their survival rate. The ECG is a visual representation of changes in cardiac bioelectricity and is the basis for assessing heart health. With the rise of edge machine learning and Internet of Things (IoT) technologies, small machine learning models have received attention. This study proposes an automatic ECG classification method based on IoT technology and an LSTM network to achieve early monitoring and early prevention of cardiovascular diseases. First, the paper proposes a single-layer bidirectional LSTM network structure that makes full use of the time-dependent features of the preceding and following sampling points to extract features automatically. The network structure is more lightweight and its computational complexity is lower. To verify the effectiveness of the proposed classification model, it is compared against related algorithms on the public MIT-BIH data set. Second, the model is embedded in a wearable device to automatically classify the collected ECG. Finally, when an abnormality is detected, the user is alerted by an alarm. The experimental results show that the proposed model has a simple structure and a high classification and recognition rate, which can meet the needs of wearable devices for monitoring patients' ECG.


2014 ◽  
Vol 539 ◽  
pp. 181-184
Author(s):  
Wan Li Zuo ◽  
Zhi Yan Wang ◽  
Ning Ma ◽  
Hong Liang

Accurate classification of text is a basic premise for efficiently extracting various types of information on the Web and properly utilizing network resources. In this paper, a new text classification method is proposed. The consistency analysis method is a type of iterative algorithm that trains different classifiers (weak classifiers) on the same training set and then gathers these classifiers to test how consistently the various classification methods label the same text, thereby exposing the knowledge of each type of classifier. It determines the weight of each sample according to whether that sample was classified correctly in each training set, as well as the accuracy of the last overall classification, and then sends the new data set with modified weights to the next classifier for training. In the end, the classifiers obtained in training are integrated into the final decision classifier. The classifier with consistency analysis can eliminate some unnecessary training data characteristics and place the emphasis on key training data. According to the experimental results, the average accuracy of this method is 91.0%, while the average recall rate is 88.1%.
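The sample reweighting described in this abstract resembles the weight update used in boosting. The sketch below shows the standard AdaBoost update as a stand-in. This is an assumption: the abstract does not give the authors' exact formula, and the function name `reweight` is hypothetical.

```python
import math


def reweight(weights, correct):
    """One AdaBoost-style round: raise the weight of misclassified samples.

    weights -- current sample weights (positive numbers)
    correct -- booleans, True where the current classifier was right
    Returns the normalised new weights and the classifier's vote weight.
    """
    total = sum(weights)
    err = sum(w for w, ok in zip(weights, correct) if not ok) / total
    # Clamp so a perfect or useless classifier does not divide by zero.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    norm = sum(new)
    return [w / norm for w in new], alpha
```

The reweighted samples are passed to the next classifier, and the returned `alpha` values can weight each classifier's vote in the final combined decision.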


Author(s):  
Teddy Surya Gunawan ◽  
Abdul Mutholib ◽  
Mira Kartiwi

Automatic Number Plate Recognition (ANPR) is an intelligent system capable of recognizing the characters on a vehicle number plate. Previous research implemented ANPR systems on personal computers (PCs) with high-resolution cameras and high computational capability. In contrast, little research has been conducted on the design and implementation of ANPR on smartphone platforms, which have limited camera resolution and processing speed. In this paper, various steps to optimize ANPR are described, including pre-processing, segmentation, and optical character recognition (OCR) using an artificial neural network (ANN) and template matching. The proposed ANPR algorithm was based on the Tesseract and Leptonica libraries. For comparison, template matching based OCR was compared with ANN based OCR. The performance of the proposed algorithm was evaluated on a purpose-built database of Malaysian number plate images captured by a smartphone camera. Results showed that the accuracy and processing time of the proposed algorithm using template matching were 97.5% and 1.13 seconds, respectively. In comparison, the traditional algorithm using template matching obtained only an 83.7% recognition rate with a 0.98 second processing time. This shows that the proposed ANPR algorithm improved the recognition rate with negligible additional processing time.
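As a rough illustration of what template matching means for OCR, the sketch below slides a small template over an image and scores each position with normalized cross-correlation. This is a generic textbook formulation and does not come from the paper or from the Tesseract or Leptonica APIs. The names `ncc` and `best_match` are hypothetical.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation between two equally sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) *
                    sum((b - mt) ** 2 for b in t))
    # A constant patch or template has no contrast; score it as 0.
    return num / den if den else 0.0


def best_match(image, template):
    """Return (score, (row, col)) of the best-scoring template position."""
    th, tw = len(template), len(template[0])
    best = (-2.0, None)  # NCC scores lie in [-1, 1]
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

In a character-recognition setting, each segmented character would be compared against one template per symbol, and the symbol with the highest score would be reported.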


2018 ◽  
Vol 7 (4.33) ◽  
pp. 487
Author(s):  
Mohamad Haniff Harun ◽  
Mohd Shahrieel Mohd Aras ◽  
Mohd Firdaus Mohd Ab Halim ◽  
Khalil Azha Mohd Annuar ◽  
Arman Hadi Azahar ◽  
...  

This investigation focuses solely on adapting a vision system algorithm to classify processes in order to regulate decision making related to tasks and defect recognition. The idea stresses a new vision algorithm that uses shape matching properties to classify defects occurring on the product. The problem faced previously was that the system had to process a large amount of data acquired from the object, which slightly reduced speed and efficiency. The proposed defect detection approach combines Region of Interest, Gaussian smoothing, correlation, and template matching. This application provides high computational savings and results in a better recognition rate of about 95.14%. Each detected defect is reported with its height, which corresponds to the z-coordinate, its length, which corresponds to the y-coordinate, and its width, which corresponds to the x-coordinate. These data are gathered from the proposed system using dual cameras to perform the three-dimensional transformation.


2013 ◽  
Vol 284-287 ◽  
pp. 2402-2406 ◽  
Author(s):  
Rong Choi Lee ◽  
King Chu Hung ◽  
Huan Sheng Wang

This thesis approaches license-plate recognition using the 2D Haar Discrete Wavelet Transform (HDWT) and an artificial neural network. It consists of three main parts: the first locates and extracts the license plate, the second trains on the license plate, and the third performs real-time scan recognition. Only the low-frequency part of the image after the second 2D Haar Discrete Wavelet Transform is kept, which reduces the image to one-sixteenth of its pixels and thereby speeds up recognition and reduces computer memory use. The method recognizes the scanned license plate as a whole, without recognizing the individual characters. The experimental results show a high recognition rate.
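The one-sixteenth figure follows from applying the 2D Haar transform twice and keeping only the low-frequency (LL) sub-band, since each level halves both the width and the height. A minimal sketch follows. It assumes even image dimensions and the orthonormal Haar scaling (each LL coefficient is the sum of a 2x2 block divided by 2). The function name `haar_ll` is hypothetical, and the abstract does not state which normalization the authors used.

```python
def haar_ll(img):
    """One level of the 2D Haar DWT, returning only the LL sub-band.

    Each output value summarises one non-overlapping 2x2 block of the input.
    """
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 2.0
         for c in range(0, len(img[0]) - 1, 2)]
        for r in range(0, len(img) - 1, 2)
    ]


# Two levels: an 8x8 image (64 pixels) shrinks to 2x2 (4 pixels), i.e. 1/16.
image = [[1.0] * 8 for _ in range(8)]
level1 = haar_ll(image)
level2 = haar_ll(level1)
```

The `level2` array is what would be fed to the neural network in place of the full-resolution plate image.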


2021 ◽  
Author(s):  
Meng Chen ◽  
Jianjun Wu ◽  
Feng Tian

Automatically extracting buildings from remote sensing images (RSI) plays an important role in urban planning, population estimation, disaster emergency response, etc. With the development of deep learning technology, convolutional neural networks (CNNs), which perform better than traditional methods, have been widely used to extract buildings from RSI. However, some problems remain. First, low-level features extracted by shallow layers and abstract features extracted by deep layers of the network cannot be fully fused. As a result, building extraction is often inaccurate, especially for buildings with complex structures, irregular shapes and small sizes. Second, a network has so many parameters to train that training occupies a lot of computing resources and consumes a lot of time. By analyzing the structure of the CNN, we found that abstract features extracted by deep layers with low geospatial resolution contain more semantic information. These abstract features help determine the category of pixels but are not sensitive to the boundaries of the buildings. We found that the stride of the convolution kernel and the pooling operation reduce the geospatial resolution of feature maps. This paper therefore proposes a simple and effective strategy to alleviate the above two bottlenecks: reduce the stride of the convolution kernel in one of the layers and reduce the number of convolutional kernels. This strategy was applied to the deeplabv3+ network and evaluated on both the WHU Building Dataset and the Massachusetts Building Dataset. Compared with the original deeplabv3+ network, the strategy performed better. On the WHU Building Dataset, the Intersection over Union (IoU) increased by 1.4% and the F1 score increased by 0.9%; on the Massachusetts Building Dataset, IoU increased by 3.31% and the F1 score increased by 2.3%.
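For readers unfamiliar with the two reported metrics, the sketch below computes IoU and the F1 score for a pair of binary building masks using their standard definitions. The function name `iou_and_f1` is hypothetical. Returning 1.0 when both masks are empty is an assumption made only so that the function always returns a number.

```python
def iou_and_f1(pred, truth):
    """IoU and F1 for two binary masks given as equally sized 2D lists."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # building predicted and present
            elif p and not t:
                fp += 1      # building predicted but absent
            elif t and not p:
                fn += 1      # building present but missed
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1
```

For example, a prediction that shares one building pixel with the ground truth, adds one false pixel and misses one true pixel scores an IoU of 1/3 and an F1 of 0.5.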


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
BinBin Zhang ◽  
Fumin Zhang ◽  
Xinghua Qu

Purpose: Laser-based measurement techniques offer various advantages over conventional measurement techniques, such as being non-destructive, non-contact and fast, and having a long measuring distance. In cooperative laser ranging systems, it is crucial to extract the center coordinates of retroreflectors to accomplish automatic measurement. To solve this problem, this paper aims to propose a novel method. Design/methodology/approach: We propose a method using Mask RCNN (Region Convolutional Neural Network), with ResNet101 (Residual Network 101) and FPN (Feature Pyramid Network) as the backbone, to localize retroreflectors, realizing automatic recognition against different backgrounds. Experiments comparing it with two other deep learning algorithms show that the recognition rate of Mask RCNN is better, especially for small-scale targets. Based on this, an ellipse detection algorithm is introduced to obtain the ellipses of retroreflectors from the recognized target areas. The center coordinates of the retroreflectors in the camera coordinate system are then obtained mathematically. Findings: To verify the accuracy of this method, an experiment was carried out: the distance between two retroreflectors with a known separation of 1,000.109 mm was measured with a root-mean-square error of 2.596 mm, meeting the requirements for coarse location of retroreflectors. Research limitations/implications: (i) As the data set has only 200 pictures, there is still room to improve the generalization ability of detection, even though data augmentation methods such as rotating, mirroring and cropping were used. (ii) The ellipse detection algorithm needs to work in relatively dark conditions, as the retroreflector is made of stainless steel, which easily reflects light.
Originality/value: The originality of the article lies in obtaining the center coordinates of multiple retroreflectors automatically even against a cluttered background; recognizing retroreflectors of different sizes, especially small targets; and meeting the recognition requirement for multiple targets in a large field of view while obtaining the 3D centers of targets by monocular model-based vision.

