scholarly journals A scale-aware YOLO model for pedestrian detection

Author(s):  
Xingyi Yang ◽  
Yong Wang ◽  
Robert Laganiere

<div>Pedestrian detection is considered one of the most challenging problems in computer vision, as it involves the combination of classification and localization within a scene. Recently, convolutional neural networks (CNNs) have been demonstrated to achieve superior detection results compared to traditional approaches. Although YOLOv3 (an improved You Only Look Once model) is proposed as one of state-of-the-art methods in CNN-based object detection, it remains very challenging to leverage this method for real-time pedestrian detection. In this paper, we propose a new framework called SA YOLOv3, a scale-aware You Only Look Once framework which improves YOLOv3 in improving pedestrian detection of small scale pedestrian instances in a real-time manner.</div><div>Our network introduces two sub-networks which detect pedestrians of different scales. Outputs from the sub-networks are then combined to generate robust detection results.</div><div>Experimental results show that the proposed SA YOLOv3 framework outperforms the results of YOLOv3 on public datasets and run at an average of 11 fps on a GPU.</div>

2020 ◽  
Author(s):  
Xingyi Yang ◽  
Yonghu Wang ◽  
Robert Laganiere

<div>Pedestrian detection is considered one of the most challenging problems in computer vision, as it involves the combination of classification and localization within a scene. Recently, convolutional neural networks (CNNs) have been demonstrated to achieve superior detection results compared to traditional approaches. Although YOLOv3 (an improved You Only Look Once model) is proposed as one of state-of-the-art methods in CNN-based object detection, it remains very challenging to leverage this method for real-time pedestrian detection. In this paper, we propose a new framework called SA YOLOv3, a scale-aware You Only Look Once framework which improves YOLOv3 in improving pedestrian detection of small scale pedestrian instances in a real-time manner.</div><div>Our network introduces two sub-networks which detect pedestrians of different scales. Outputs from the sub-networks are then combined to generate robust detection results.</div><div>Experimental results show that the proposed SA YOLOv3 framework outperforms the results of YOLOv3 on public datasets and run at an average of 11 fps on a GPU.</div>


2020 ◽  
Author(s):  
Xingyi Yang ◽  
Yong Wang ◽  
Robert Laganiere

<div>Pedestrian detection is considered one of the most challenging problems in computer vision, as it involves the combination of classification and localization within a scene. Recently, convolutional neural networks (CNNs) have been demonstrated to achieve superior detection results compared to traditional approaches. Although YOLOv3 (an improved You Only Look Once model) is proposed as one of state-of-the-art methods in CNN-based object detection, it remains very challenging to leverage this method for real-time pedestrian detection. In this paper, we propose a new framework called SA YOLOv3, a scale-aware You Only Look Once framework which improves YOLOv3 in improving pedestrian detection of small scale pedestrian instances in a real-time manner.</div><div>Our network introduces two sub-networks which detect pedestrians of different scales. Outputs from the sub-networks are then combined to generate robust detection results.</div><div>Experimental results show that the proposed SA YOLOv3 framework outperforms the results of YOLOv3 on public datasets and run at an average of 11 fps on a GPU.</div>


2021 ◽  
Vol 30 (1) ◽  
pp. 23-43
Author(s):  
Roopalakshmi R

In this pandemic-prone era, health is of utmost concern for everyone and hence eating good quality fruits is very much essential for sound health. Unfortunately, nowadays it is quite very difficult to obtain naturally ripened fruits, due to existence of chemically ripened fruits being ripened using hazardous chemicals such as calcium carbide. However, most of the state-of-the art techniques are primarily focusing on identification of chemically ripened fruits with the help of computer vision-based approaches, which are less effective towards quantification of chemical contaminations present in the sample fruits. To solve these issues, a new framework for chemical ripening and contamination detection is presented, which employs both visual and IR spectrometric signatures in two different stages. The experiments conducted on both the GUI tool as well as hardware-based setups, clearly demonstrate the efficiency of the proposed framework in terms of detection confidence levels followed by the percentage of presence of chemicals in the sample fruit.


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Tao Xiang ◽  
Tao Li ◽  
Mao Ye ◽  
Zijian Liu

Pedestrian detection with large intraclass variations is still a challenging task in computer vision. In this paper, we propose a novel pedestrian detection method based on Random Forest. Firstly, we generate a few local templates with different sizes and different locations in positive exemplars. Then, the Random Forest is built whose splitting functions are optimized by maximizing class purity of matching the local templates to the training samples, respectively. To improve the classification accuracy, we adopt a boosting-like algorithm to update the weights of the training samples in a layer-wise fashion. During detection, the trained Random Forest will vote the category when a sliding window is input. Our contributions are the splitting functions based on local template matching with adaptive size and location and iteratively weight updating method. We evaluate the proposed method on 2 well-known challenging datasets: TUD pedestrians and INRIA pedestrians. The experimental results demonstrate that our method achieves state-of-the-art or competitive performance.


2020 ◽  
Vol 226 ◽  
pp. 02020
Author(s):  
Alexey V. Stadnik ◽  
Pavel S. Sazhin ◽  
Slavomir Hnatic

The performance of neural networks is one of the most important topics in the field of computer vision. In this work, we analyze the speed of object detection using the well-known YOLOv3 neural network architecture in different frameworks under different hardware requirements. We obtain results, which allow us to formulate preliminary qualitative conclusions about the feasibility of various hardware scenarios to solve tasks in real-time environments.


2008 ◽  
Vol 392-394 ◽  
pp. 1030-1036 ◽  
Author(s):  
Yue Ming Wu ◽  
Han Wu He ◽  
J. Sun ◽  
T. Ru ◽  
De Tao Zheng

A real time hand tracking and gesture recognition approach which can deal with dynamic backgrounds is presented. This approach is based on computer vision. It segments hand from dynamic backgrounds using the color-based and appearance-based methods. Then, it locates the hand according to marker on the hand. Last, it recognizes the gesture based on geometry constraint of hand. In comparison with the traditional approaches, it provides a good real time performance, is easy to realize, does not require a stationary camera and is not sensitive with intensity different because its gesture recognition does not depend on the templates. Moreover, an augmented assembly system using the presented approach is described. The experiment result of the augmented assembly system demonstrated the effectiveness and robustness of our approach.


Author(s):  
M A Isayev ◽  
D A Savelyev

The comparison of different convolutional neural networks which are the core of the most actual solutions in the computer vision area is considers in hhe paper. The study includes benchmarks of this state-of-the-art solutions by some criteria, such as mAP (mean average precision), FPS (frames per seconds), for the possibility of real-time usability. It is concluded on the best convolutional neural network model and deep learning methods that were used at particular solution.


2020 ◽  
Vol 34 (07) ◽  
pp. 11229-11236
Author(s):  
Zhiwei Ke ◽  
Zhiwei Wen ◽  
Weicheng Xie ◽  
Yi Wang ◽  
Linlin Shen

Dropout regularization has been widely used in various deep neural networks to combat overfitting. It works by training a network to be more robust on information-degraded data points for better generalization. Conventional dropout and variants are often applied to individual hidden units in a layer to break up co-adaptations of feature detectors. In this paper, we propose an adaptive dropout to reduce the co-adaptations in a group-wise manner by coarse semantic information to improve feature discriminability. In particular, we showed that adjusting the dropout probability based on local feature densities can not only improve the classification performance significantly but also enhance the network robustness against adversarial examples in some cases. The proposed approach was evaluated in comparison with the baseline and several state-of-the-art adaptive dropouts over four public datasets of Fashion-MNIST, CIFAR-10, CIFAR-100 and SVHN.


Author(s):  
Ritwik Chavhan ◽  
Kadir Sheikh ◽  
Rishikesh Bondade ◽  
Swaraj Dhanulkar ◽  
Aniket Ninave ◽  
...  

Plant disease is an ongoing challenge for smallholder farmers, which threatens income and food security. The recent revolution in smartphone penetration and computer vision models has created an opportunity for image classification in agriculture. The project focuses on providing the data relating to the pesticide/insecticide and therefore the quantity of pesticide/insecticide to be used for associate degree unhealthy crop. The user, is that the farmer clicks an image of the crop and uploads it to the server via the humanoid application. When uploading the image the farmer gets associate degree distinctive ID displayed on his application screen. The farmer must create note of that ID since that ID must be utilized by the farmer later to retrieve the message when a minute. The uploaded image is then processed by Convolutional Neural Networks. Convolutional Neural Networks (CNNs) are considered state-of-the-art in image recognition and offer the ability to provide a prompt and definite diagnosis. Then the result consisting of the malady name and therefore the affected space is retrieved. This result's then uploaded into the message table within the server. Currently the Farmer are going to be ready to retrieve the whole info during a respectable format by coming into the distinctive ID he had received within the Application.


2021 ◽  
Author(s):  
Weihao Zhuang ◽  
Tristan Hascoet ◽  
Xunquan Chen ◽  
Ryoichi Takashima ◽  
Tetsuya Takiguchi ◽  
...  

Abstract Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature extraction capability. However, as the larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reduce the memory consumption of CNN inference. Our method consists of processing the input image in a sequence of carefully designed tiles within the lower subnetwork of the CNN, so as to minimize its peak memory consumption, while keeping the end-to-end computation unchanged. This method introduces a trade-off between memory consumption and computations, which is particularly suitable for high-resolution inputs. Our experimental results show that MobileNetV2 memory consumption can be reduced by up to 5.3 times with our proposed method. For ResNet50, one of the most commonly used CNN models in computer vision tasks, memory can be optimized by up to 2.3 times.


Sign in / Sign up

Export Citation Format

Share Document