Detecting Citrus in Orchard Environment by Using Improved YOLOv4

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Wenkang Chen ◽  
Shenglian Lu ◽  
Binghao Liu ◽  
Guo Li ◽  
Tingting Qian

Real-time detection of fruits in orchard environments is one of the crucial techniques for many precision agriculture applications, including yield estimation and automatic harvesting. Owing to complex conditions, such as different growth periods and occlusion among leaves and fruits, detecting fruits in natural environments is a considerable challenge. A rapid citrus recognition method that improves the state-of-the-art You Only Look Once version 4 (YOLOv4) detector is proposed in this paper. A Kinect V2 camera was used to collect RGB images of citrus trees. The Canopy algorithm and the K-Means++ algorithm were then used to automatically select the number and sizes of the prior (anchor) boxes from these RGB images. An improved YOLOv4 network structure was proposed to better detect smaller citrus fruits against complex backgrounds. Finally, the trained model underwent sparse training, unimportant channels and network layers were pruned, and the parameters of the pruned model were fine-tuned to restore some of the recognition accuracy. The experimental results show that the improved YOLOv4 detector works well for detecting citrus at different growth periods in a natural environment, with an average increase in accuracy of 3.15% (from 92.89% to 96.04%). This result is superior to those of the original YOLOv4, YOLOv3, and Faster R-CNN. The average detection time of the model is 0.06 s per frame at 1920 × 1080 resolution. The proposed method is suitable for rapidly detecting the type and location of citrus in natural environments and can be applied to citrus picking and yield evaluation in actual orchards.
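The anchor selection step can be illustrated with a short sketch. Below, scikit-learn's k-means++ initialization stands in for the paper's K-Means++ step; the Canopy algorithm, which the paper uses to choose the number of clusters, is replaced here by a fixed k for brevity, and the box sizes are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical ground-truth box sizes (width, height) in pixels, as would
# be parsed from the citrus dataset annotations.
box_sizes = np.array([[34, 36], [52, 55], [18, 20], [75, 80], [40, 44],
                      [22, 25], [60, 63], [28, 30], [90, 95], [48, 50]])

# k-means++ initialization approximates the paper's K-Means++ step; a fixed
# k replaces the Canopy-based cluster-count selection in this sketch.
k = 3
km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
km.fit(box_sizes)

# Cluster centres become the prior (anchor) box sizes for the detector.
anchors = sorted(km.cluster_centers_.round().astype(int).tolist())
print("anchor sizes (w, h):", anchors)
```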

2021 ◽  
Vol 271 ◽  
pp. 01039
Author(s):  
Dongsheng Ji ◽  
Yanzhong Zhao ◽  
Zhujun Zhang ◽  
Qianchuan Zhao

In view of the large demand for COVID-19 (novel coronavirus pneumonia) image recognition samples, recognition accuracy with limited data is often unsatisfactory. In this paper, a COVID-19-positive image recognition method based on small-sample recognition is proposed. First, the CT images are preprocessed and converted into the picture format required for transfer learning. Second, small-sample image augmentation and extension are performed on the converted images, such as shear (staggered) transformation, random rotation, and translation. Then, multiple transfer learning models are used to extract features, which are subsequently fused. Finally, the model is adjusted by fine-tuning and trained to obtain the experimental results. The experimental results show that our method achieves excellent recognition performance on COVID-19 images, even with only a small number of CT image samples.
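As a rough illustration of the multi-model feature-fusion step, the sketch below extracts features with two ImageNet-pretrained backbones and concatenates them before a classification head. The choice of ResNet-18 and VGG-16, the augmentation parameters, and the binary label space are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms

# Small-sample augmentation in the spirit of the paper's shear/rotation/
# translation step (parameters are assumptions).
augment = transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), shear=10)

# Two ImageNet-pretrained backbones used as feature extractors (assumed choices).
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = nn.Identity()                                   # -> 512-d features
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg.classifier = nn.Sequential(*list(vgg.classifier.children())[:-1])  # -> 4096-d

class FusionClassifier(nn.Module):
    """Concatenate features from several pretrained models, then classify."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.backbones = nn.ModuleList([resnet, vgg])
        self.head = nn.Linear(512 + 4096, n_classes)

    def forward(self, x):
        feats = [b(x) for b in self.backbones]     # per-model feature extraction
        return self.head(torch.cat(feats, dim=1))  # feature fusion -> logits

model = FusionClassifier()
logits = model(torch.randn(4, 3, 224, 224))        # batch of preprocessed CT slices
```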


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Hugo Masson ◽  
Amran Bhuiyan ◽  
Le Thanh Nguyen-Meidine ◽  
Mehrsan Javan ◽  
Parthipan Siva ◽  
...  

Recent years have witnessed a substantial increase in the deep learning (DL) architectures proposed for visual recognition tasks like person re-identification, where individuals must be recognized over multiple distributed cameras. Although these architectures have greatly improved the state-of-the-art accuracy, the computational complexity of the convolutional neural networks (CNNs) commonly used for feature extraction remains an issue, hindering their deployment on platforms with limited resources or in applications with real-time constraints. There is an obvious advantage to accelerating and compressing DL models without significantly decreasing their accuracy. However, the source (pruning) domain differs from operational (target) domains, and the domain shift between image data captured with different non-overlapping camera viewpoints leads to lower recognition accuracy. In this paper, we investigate the prunability of these architectures under different design scenarios. This paper first revisits pruning techniques that are suitable for reducing the computational complexity of deep CNNs applied to person re-identification. Then, these techniques are analyzed according to their pruning criteria and strategy, and according to different scenarios for exploiting pruning methods to fine-tune networks to target domains. Experimental results obtained using DL models with ResNet feature extractors and multiple benchmark re-identification datasets indicate that pruning can considerably reduce network complexity while maintaining a high level of accuracy. In scenarios where pruning is performed with large pretraining or fine-tuning datasets, the number of FLOPS required by ResNet architectures is reduced by half, while maintaining a comparable rank-1 accuracy (within 1% of the original model). Pruning while training a larger CNN can also provide significantly better performance than fine-tuning a smaller one.
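One widely used pruning criterion the paper reviews is filter magnitude. The sketch below ranks a convolution layer's output channels by the L1 norm of their weights and keeps the top fraction; it is a minimal illustration only, omitting the rewiring of downstream layers and the domain-specific fine-tuning the paper analyzes.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep the output channels whose filters have the largest L1 norm.

    A minimal sketch of magnitude-based filter pruning; a real pipeline must
    also rewire the next layer and fine-tune on the target re-identification
    domain afterwards.
    """
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 per filter
    keep = torch.topk(importance, n_keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
print(prune_conv_channels(conv))  # Conv2d(64, 64, ...) at keep_ratio=0.5
```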


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 551
Author(s):  
Xin Xiong ◽  
Haoyuan Wu ◽  
Weidong Min ◽  
Jianqiang Xu ◽  
Qiyan Fu ◽  
...  

Traffic police gesture recognition is important in automatic driving. Most existing traffic police gesture recognition methods extract pixel-level features from RGB images, which are uninterpretable because they lack gesture skeleton features and may yield inaccurate recognition due to background noise. Existing deep learning methods are not well suited to handling gesture skeleton features because they ignore the inherent connection between skeleton joint coordinate information and gestures. To alleviate these issues, a traffic police gesture recognition method based on a gesture skeleton extractor (GSE) and a multichannel dilated graph convolution network (MD-GCN) is proposed. To obtain discriminative and interpretable gesture skeleton coordinate information, the GSE extracts skeleton coordinates and removes redundant skeleton joints and bones. In the gesture discrimination stage, GSE-based features are fed into the proposed MD-GCN. The MD-GCN constructs a graph convolution with multichannel dilation to enlarge the receptive field, extracting body topology and spatiotemporal action features from the skeleton coordinates. Comparison experiments with state-of-the-art methods were conducted on a public dataset. The results show that the proposed method achieves an accuracy of 98.95%, the best result and at least 6% higher than that of the other methods.
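One way to read "multichannel dilation" on a skeleton graph is to aggregate over k-hop neighbourhoods in parallel branches, which widens the receptive field without extra layers. The sketch below implements that interpretation; the 5-joint chain skeleton, the dilation set, and the feature sizes are all hypothetical, not the MD-GCN's exact design.

```python
import torch
import torch.nn as nn

class MultiDilatedGraphConv(nn.Module):
    """Parallel branches aggregate over k-hop neighbourhoods (adjacency powers),
    enlarging the receptive field over the skeleton graph. This is one
    interpretation of multichannel dilation, not the MD-GCN's exact design."""
    def __init__(self, in_dim, out_dim, adj, dilations=(1, 2, 3)):
        super().__init__()
        # k-hop reachability masks, clipped so repeated paths do not inflate weights
        self.hops = [torch.linalg.matrix_power(adj, d).clamp(max=1.0)
                     for d in dilations]
        self.branches = nn.ModuleList(nn.Linear(in_dim, out_dim)
                                      for _ in dilations)

    def forward(self, x):  # x: (batch, joints, in_dim)
        out = 0
        for A, lin in zip(self.hops, self.branches):
            deg = A.sum(dim=1, keepdim=True).clamp(min=1.0)
            out = out + lin((A / deg) @ x)  # mean aggregation over the k-hop set
        return torch.relu(out)

# Toy 5-joint chain skeleton with self-loops (hypothetical).
A = torch.eye(5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

layer = MultiDilatedGraphConv(in_dim=3, out_dim=16, adj=A)  # 3 = (x, y, score)
feats = layer(torch.randn(2, 5, 3))                         # -> (2, 5, 16)
```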


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 1999
Author(s):  
Sadiq H. Abdulhussain ◽  
Basheera M. Mahmmod ◽  
Marwah Abdulrazzaq Naser ◽  
Muntadher Qasim Alsabah ◽  
Roslizah Ali ◽  
...  

Numeral recognition is considered an essential preliminary step for optical character recognition, document understanding, and other tasks. Although several handwritten numeral recognition algorithms have been proposed so far, achieving adequate recognition accuracy and execution time remains challenging. In particular, recognition accuracy depends on the feature extraction mechanism. As such, a fast and robust numeral recognition method is essential, one that meets the desired accuracy by extracting features efficiently while maintaining fast implementation time. Furthermore, most existing studies to date have evaluated their methods only in clean environments, limiting understanding of their potential application in more realistic noisy environments. Therefore, finding a feasible handwritten numeral recognition method that remains accurate in the more practical noisy environment is crucial. To this end, this paper proposes a new scheme for handwritten numeral recognition using hybrid orthogonal polynomials. Gradient and smoothed features are extracted using the hybrid orthogonal polynomial. To reduce the complexity of feature extraction, the embedded image kernel technique is adopted. In addition, a support vector machine is used to classify the extracted features of the different numerals. The proposed scheme is evaluated on three numeral recognition datasets: Roman, Arabic, and Devanagari. We compare the accuracy of the proposed numeral recognition method with that achieved by state-of-the-art recognition methods, including a recent convolutional neural network. The results show that the proposed method achieves almost the highest recognition accuracy among the existing methods in all the scenarios considered. Importantly, the results demonstrate that the proposed method is robust against noise distortion and outperforms the convolutional neural network considerably, which signifies the feasibility and effectiveness of the proposed approach under both clean and more realistic noisy environments.
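The overall pipeline, orthogonal-polynomial moments followed by an SVM, can be sketched briefly. Below, low-order Chebyshev moments stand in for the paper's hybrid orthogonal polynomial features, and the images and labels are randomly generated placeholders.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from sklearn.svm import SVC

def poly_moments(img, order=8):
    """Project an image onto low-order Chebyshev polynomials: a stand-in for
    the paper's hybrid orthogonal polynomial features (sketch only)."""
    n = img.shape[0]                       # assumes a square image
    x = np.linspace(-1, 1, n)
    # Row k of T is the k-th Chebyshev polynomial sampled on x.
    T = np.stack([C.chebval(x, np.eye(order)[k]) for k in range(order)])
    return (T @ img @ T.T).ravel()         # 2-D moment matrix, flattened

# Hypothetical placeholder data: random 28x28 "numerals" with random labels.
rng = np.random.default_rng(0)
X = np.array([poly_moments(rng.random((28, 28))) for _ in range(40)])
y = rng.integers(0, 10, size=40)

clf = SVC(kernel="rbf").fit(X, y)          # SVM classification, as in the paper
print(clf.predict(X[:5]))
```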


2021 ◽  
Vol 13 (12) ◽  
pp. 2417
Author(s):  
Savvas Karatsiolis ◽  
Andreas Kamilaris ◽  
Ian Cole

Estimating the height of buildings and vegetation in single aerial images is a challenging problem. A task-focused deep learning (DL) model is proposed that combines architectural features from successful DL models (U-NET and residual networks) and learns the mapping from a single aerial image to a normalized Digital Surface Model (nDSM). The model was trained on aerial images whose corresponding DSMs and Digital Terrain Models (DTMs) were available and was then used to infer the nDSM of images with no elevation information. The model was evaluated on a dataset covering a large area of Manchester, UK, as well as the 2018 IEEE GRSS Data Fusion Contest LiDAR dataset. The results suggest that the proposed DL architecture is suitable for the task and surpasses other state-of-the-art DL approaches by a large margin.
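The training target itself is simple to state: the nDSM is the DSM minus the DTM, i.e., above-ground height with the terrain removed. A minimal sketch with hypothetical 3 × 3 elevation rasters:

```python
import numpy as np

def normalized_dsm(dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """nDSM = DSM - DTM: removing the terrain elevation leaves only the
    above-ground height of buildings and vegetation."""
    return np.clip(dsm - dtm, 0, None)  # negative residuals treated as noise

# Hypothetical 3x3 elevation rasters in metres.
dsm = np.array([[12.0, 15.5, 30.2],
                [12.1, 14.0, 29.8],
                [12.0, 12.2, 12.1]])
dtm = np.array([[12.0, 12.1, 12.2],
                [12.0, 12.0, 12.1],
                [12.0, 12.1, 12.0]])
print(normalized_dsm(dsm, dtm))  # ~0 m on bare ground, ~18 m over the building
```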


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Young Jae Kim ◽  
Jang Pyo Bae ◽  
Jun-Won Chung ◽  
Dong Kyun Park ◽  
Kwang Gi Kim ◽  
...  

Colorectal cancer occurs in the gastrointestinal tract and is the third most common of the 27 major types of cancer in South Korea and worldwide. Colorectal polyps are known to increase the risk of developing colorectal cancer, and detected polyps need to be resected to reduce that risk. This research improved polyp classification performance by fine-tuning a Network-in-Network (NIN) after applying a model pre-trained on the ImageNet database. Random shuffling was performed 20 times on 1000 colonoscopy images. Each set of data was divided into 800 training images and 200 test images, and accuracy was evaluated on the 200 test images in each of the 20 experiments. Three comparison methods were constructed from AlexNet by transferring weights trained on three different state-of-the-art databases; a standard AlexNet-based method without transfer learning was also compared. The accuracy of the proposed method was higher, with statistical significance, than that of the four other state-of-the-art methods, and showed an 18.9% improvement over the standard AlexNet-based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. Given its high recall rate and accuracy, an automatic algorithm can assist endoscopists in identifying adenomatous polyps, enabling the timely resection of polyps at an early stage.
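The transfer-learning step the paper builds on follows a standard recipe: load pretrained weights, swap the classification head for the two polyp classes, and fine-tune with a small learning rate. The sketch below uses a torchvision ResNet-18 purely as a stand-in, since Network-in-Network is not in torchvision; the frozen layers and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# ResNet-18 as a stand-in backbone (torchvision has no Network-in-Network).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two-class polyp head (assumed)

# Freeze early layers so only higher-level features adapt (an assumption,
# not necessarily the paper's fine-tuning schedule).
for p in model.conv1.parameters():
    p.requires_grad = False
for p in model.layer1.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9)  # small learning rate preserves pretrained features
```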


Author(s):  
Anil S. Baslamisli ◽  
Partha Das ◽  
Hoang-An Le ◽  
Sezer Karaoglu ◽  
Theo Gevers

In general, intrinsic image decomposition algorithms interpret shading as one unified component including all photometric effects. As shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail to distinguish strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct shading (illumination) and indirect shading (ambient light and shadows) subcomponents. The aim is to distinguish strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner with a specialized fusion and refinement unit exploiting the fine-grained shading model. It is designed to learn specific reflectance cues separated from specific photometric effects to analyze the disentanglement capability. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground truths. Large-scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms utilizing unified shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS, and SRD datasets.
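A toy numeric example helps show why the split matters. Under the usual intrinsic model, image = albedo × shading; decomposing shading into a direct term plus an ambient/indirect term lets a hard shadow be explained without touching the albedo. The values below are hypothetical:

```python
import numpy as np

# image = albedo * (direct + indirect): hypothetical 2x2 example.
albedo   = np.array([[0.8, 0.8],
                     [0.2, 0.2]])   # reflectance (two materials)
direct   = np.array([[1.0, 0.3],
                     [1.0, 0.3]])   # direct light; right column is shadowed
indirect = np.full((2, 2), 0.1)     # ambient light
image = albedo * (direct + indirect)
print(image)
# The dark right column is fully explained by the direct-shading term, so a
# decomposition method need not mistake the shadow for an albedo edge.
```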


2021 ◽  
Vol 11 (5) ◽  
pp. 603
Author(s):  
Chunlei Shi ◽  
Xianwei Xin ◽  
Jiacai Zhang

Machine learning methods are widely used in autism spectrum disorder (ASD) diagnosis. Because labelled ASD data are scarce, multisite data are often pooled together to expand the sample size. However, the heterogeneity among different sites degrades machine learning models. Herein, three-way decision theory was introduced into unsupervised domain adaptation for the first time and applied to optimize the pseudolabels of the target domain/site from functional magnetic resonance imaging (fMRI) features related to ASD patients. The experimental results using multisite fMRI data show that our method not only narrows the gap in sample distribution among domains but is also superior to state-of-the-art domain adaptation methods in ASD recognition. Specifically, the ASD recognition accuracy of the proposed method improved on all six tasks, reaching 70.80%, 75.41%, 69.91%, 72.13%, 71.01%, and 68.85%, respectively, compared with the existing methods.
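Three-way decision theory partitions samples into accept, reject, and defer regions using two confidence thresholds, rather than forcing a binary choice. The sketch below applies this idea to target-domain pseudolabels; the thresholds and probabilities are hypothetical, and the paper's actual optimization is more involved.

```python
import numpy as np

def three_way_pseudolabels(probs, alpha=0.8, beta=0.4):
    """Three-way decision on target-domain samples (sketch): accept confident
    positives as pseudolabel 1, confident negatives as 0, and defer the
    boundary region for later rounds. Thresholds alpha/beta are hypothetical."""
    labels = np.full(len(probs), -1)    # -1 = deferred (boundary region)
    labels[probs >= alpha] = 1          # positive region -> pseudolabel ASD
    labels[probs <= beta] = 0           # negative region -> pseudolabel control
    return labels

p = np.array([0.95, 0.55, 0.10, 0.82, 0.47])
print(three_way_pseudolabels(p))        # [ 1 -1  0  1 -1]
```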


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1919
Author(s):  
Shuhua Liu ◽  
Huixin Xu ◽  
Qi Li ◽  
Fei Zhang ◽  
Kun Hou

To address the problem of robot object recognition in complex scenes, this paper proposes an object recognition method based on scene text reading. The proposed method simulates human-like behavior and accurately identifies objects bearing text by carefully reading it. First, deep learning models with high accuracy are adopted to detect and recognize text from multiple views. Second, datasets comprising 102,000 Chinese and English scene text images and their inverses are generated. Training the model on these two datasets improves the F-measure of text detection by 0.4% and the recognition accuracy by 1.26%. Finally, a robot object recognition method based on scene text reading is proposed. The robot detects and recognizes texts in the image and stores the recognition results in a text file. When the user gives the robot a fetching instruction, the robot searches the text files for the corresponding keywords and obtains the confidences of multiple objects in the scene image. The object with the maximum confidence is then selected as the target. The results show that the robot can accurately distinguish objects of arbitrary shape and category, effectively solving the problem of object recognition in home environments.
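The fetching step reduces to keyword matching over stored OCR results followed by an argmax over confidence. A minimal sketch; the record format, field names, and values are hypothetical:

```python
# Hypothetical OCR results as the robot might store them in a text file.
detections = [
    {"object_id": 1, "text": "green tea",  "confidence": 0.91},
    {"object_id": 2, "text": "black tea",  "confidence": 0.87},
    {"object_id": 3, "text": "greentea x", "confidence": 0.64},
]

def find_target(keyword, detections):
    """Match the instruction keyword against recognized texts and return
    the detection with the maximum confidence, or None if nothing matches."""
    matches = [d for d in detections if keyword.lower() in d["text"].lower()]
    return max(matches, key=lambda d: d["confidence"]) if matches else None

print(find_target("green tea", detections))  # -> object 1 (confidence 0.91)
```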


2011 ◽  
Vol 121-126 ◽  
pp. 2141-2145 ◽  
Author(s):  
Wei Gang Yan ◽  
Chang Jian Wang ◽  
Jin Guo

This paper proposes a new image segmentation algorithm to detect flames in video recorded in an enclosed compartment. To avoid contamination from soot and water vapor, the method first employs the cubic root of four color channels to transform an RGB image into a pseudo-gray one. The pseudo-gray image is then divided into many small stripes (child images), and Otsu's method is employed to segment each child image. Lastly, the processed child images are reconstructed into a whole image. A computer program using the OpenCV library was developed, and the new method was compared with other commonly used methods such as edge detection and the standard Otsu's method. The new method was found to achieve better flame recognition accuracy and can be used to extract flame shapes from noisy experimental video.
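A rough OpenCV sketch of the stripe-wise pipeline follows. The abstract does not give the exact pseudo-gray formula, so a cube-root combination of the three RGB channels is used here as a placeholder, and the stripe count and filenames are hypothetical:

```python
import cv2
import numpy as np

img = cv2.imread("flame_frame.png")        # hypothetical video frame
assert img is not None, "frame not found"
b, g, r = cv2.split(img.astype(np.float64))

# Placeholder pseudo-gray transform: a cube-root combination of the colour
# channels (the abstract does not give the exact formula).
pseudo = np.cbrt(b * g * r)
pseudo = cv2.normalize(pseudo, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# Divide into horizontal stripes, apply Otsu per stripe, then reassemble.
masks = []
for stripe in np.array_split(pseudo, 8, axis=0):   # stripe count is assumed
    _, m = cv2.threshold(stripe, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    masks.append(m)
flame_mask = np.vstack(masks)
cv2.imwrite("flame_mask.png", flame_mask)
```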

