Object Detectors’ Convolutional Neural Network Backbones: A Review and a Comparative Study

Computer vision is a scientific field that deals with how computers can acquire high-level understanding from digital images or videos. One of the keystones of computer vision is object detection, which aims to identify relevant features in an image or video in order to detect objects. The backbone is the first stage of an object detection algorithm and plays a crucial role in detection quality. Object detectors are usually built on backbone networks originally designed for image classification. Object detection performance depends heavily on the features extracted by the backbone; for instance, simply replacing a backbone with its deeper version can produce a large gain in accuracy. The backbone's importance is also demonstrated by its effect on the efficiency of real-time object detection. In this paper, we review the crucial role of the deep learning era, and of convolutional neural networks in particular, in object detection tasks. We analyze a wide range of convolutional neural networks used as backbones of object detection models, thereby building a review of backbones that researchers and scientists can use as a guideline for their work.
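To make the backbone's role concrete, the sketch below (a minimal illustration assuming a recent torchvision install; the resnet50/resnet101 pair, input size, and untrained weights are arbitrary choices, not results from the reviewed works) shows how a classification CNN is turned into a feature-extracting backbone by dropping its classifier head, and how one backbone can be swapped for a deeper member of the same family.

```python
import torch
import torchvision

def as_backbone(name):
    """Strip the global pooling and fully connected classifier from a
    torchvision classification model, keeping only the convolutional stages
    whose feature maps a detection head would consume."""
    cnn = getattr(torchvision.models, name)(weights=None)  # untrained, keeps the sketch offline
    return torch.nn.Sequential(*list(cnn.children())[:-2])

image = torch.randn(1, 3, 512, 512)  # stand-in for a real input image
for name in ("resnet50", "resnet101"):  # hypothetical backbone swap
    features = as_backbone(name)(image)
    print(name, tuple(features.shape))  # (1, 2048, 16, 16) feature maps for a detector head
```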

2021
Vol 6
pp. 93-101
Author(s):
Andrey Litvynchuk
Lesia Baranovska

Face recognition is one of the main tasks of computer vision, relevant because of its practical significance and the great interest it attracts from a wide range of scientists. It has many applications, which has led to a huge amount of research in this area. Although research in this field has been going on since the beginnings of computer vision, good results could be achieved only with the help of convolutional neural networks. In this work, a comparative analysis of facial recognition methods that predate convolutional neural networks was performed. A metric learning approach, augmentations, and learning rate schedulers are considered. A series of experiments and a comparative analysis of the considered methods for improving convolutional neural networks were carried out. As a result, a universal algorithm for training the face recognition model was obtained. In this work, we used SE-ResNet50 as the only neural network for the experiments. Metric learning is a method by which good accuracy in face recognition can be achieved. Overfitting is a major problem for neural networks, in particular because they have too many parameters and usually not enough data to guarantee the generalization of the model. Additional data labeling can be time-consuming and expensive, so augmentation is used instead. Augmentations artificially enlarge the training dataset, and as expected, this method improved the results relative to the baseline in all experiments; stronger and more aggressive forms of augmentation led to better results in this work. As expected, the best learning rate scheduler was the cosine scheduler with warm-ups and restarts. This schedule has few parameters, so it is also easy to use. In general, by combining these approaches, we were able to obtain an accuracy of 93.5%, which is 22% better than the baseline experiment. In subsequent studies, it is planned to improve not only the face recognition model but also face detection, since the accuracy of face recognition depends directly on the quality of face detection.
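A minimal sketch of the schedule mentioned above (linear warm-up followed by cosine annealing with warm restarts) using PyTorch's built-in schedulers; the model, optimizer settings, warm-up length, and restart period are placeholders, not the values used in these experiments.

```python
import torch

model = torch.nn.Linear(512, 128)  # stand-in for SE-ResNet50 with a metric-learning head
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

warmup_epochs = 5  # hypothetical warm-up length
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(40):
    # ... one epoch of metric-learning training (forward pass, loss, optimizer.step()) ...
    scheduler.step()
print("learning rate after 40 epochs:", optimizer.param_groups[0]["lr"])
```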


Sensors
2019
Vol 19 (14)
pp. 3111
Author(s):
Jing Pan
Hanqing Sun
Zhanjie Song
Jungong Han

Downsampling input images is a simple trick to speed up visual object-detection algorithms, especially in robotic vision and applied mobile vision systems. However, this trick comes with a significant decline in accuracy. In this paper, dual-resolution dual-path Convolutional Neural Networks (CNNs), named DualNets, are proposed to raise the accuracy of such detection applications. In contrast to previous methods that simply downsample the input images, DualNets explicitly take dual inputs at different resolutions and extract complementary visual features from them using dual CNN paths. The two paths in a DualNet are a backbone path and an auxiliary path; the auxiliary path accepts larger inputs and then rapidly downsamples them to relatively small feature maps. With the help of the carefully designed auxiliary CNN paths in DualNets, auxiliary features are extracted from the larger input with controllable computation. The auxiliary features are then fused with the backbone features using a proposed progressive residual fusion strategy to enrich the feature representation. This architecture, as the feature extractor, is further integrated with the Single Shot Detector (SSD) to accomplish latency-sensitive visual object-detection tasks. We evaluate the resulting detection pipeline on the Pascal VOC and MS COCO benchmarks. Results show that the proposed DualNets can raise the accuracy of CNN detection applications that are sensitive to computation payloads.
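A minimal sketch of the dual-resolution, dual-path idea described above: the backbone path processes the small input, the auxiliary path rapidly downsamples the larger input to the same feature-map size, and the two are fused with a residual addition. The channel counts, strides, and input sizes are illustrative assumptions, not the published DualNets configuration.

```python
import torch
import torch.nn as nn

class DualPathFusion(nn.Module):
    """Sketch of a dual-resolution, dual-path feature extractor with
    residual-style fusion of backbone and auxiliary features."""
    def __init__(self, channels=64):
        super().__init__()
        # Backbone path: ordinary stride-2 convolutions on the small input.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Auxiliary path: one extra stride-2 stage so the larger input is
        # rapidly reduced to the same spatial resolution as the backbone output.
        self.auxiliary = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, small_img, large_img):
        f_backbone = self.backbone(small_img)   # e.g. 300x300 -> 75x75
        f_aux = self.auxiliary(large_img)       # e.g. 600x600 -> 75x75
        return f_backbone + f_aux               # residual-style fusion

x_small = torch.randn(1, 3, 300, 300)
x_large = torch.randn(1, 3, 600, 600)
fused = DualPathFusion()(x_small, x_large)
print(tuple(fused.shape))  # (1, 64, 75, 75)
```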


The global development of and progress in scientific instrumentation and technology are the fundamental reasons for the rapid increase in data volume. Owing to this advancement, several significant techniques have been introduced for image processing and object detection. The promising features and transfer learning capability of the Convolutional Neural Network (CNN) have gained much attention around the globe from researchers as well as the computer vision community, as a result of which several remarkable breakthroughs have been achieved. This paper comprehensively reviews data classification, the history and architecture of CNNs, and well-known techniques along with their strengths and weaknesses. Finally, a discussion of implementing CNNs for object detection to obtain effective results, based on their critical analysis and performance, is presented.


Author(s):  
I. G. Zubov

Introduction. Computer vision systems are finding widespread application in various life domains. Monocular-camera-based systems can be used to solve a wide range of problems. The availability of digital cameras and large sets of annotated data, as well as the power of modern computing technologies, render monocular image analysis a dynamically developing direction in the field of machine vision. In order for any computer vision system to describe objects and predict their actions in the physical space of a scene, the image under analysis should be interpreted from the standpoint of the underlying 3D scene. This can be achieved by analysing a rigid object as a set of mutually arranged parts, which represents a powerful framework for reasoning about physical interaction. Objective. Development of an automatic method for detecting interest points of an object in an image. Materials and methods. An automatic method for identifying interest points of vehicles, such as license plates, in an image is proposed. The method localizes interest points by analysing the inner layers of convolutional neural networks trained for image classification and object detection, and it identifies interest points without incurring additional costs for data annotation and training. Results. The conducted experiments confirmed the correctness of the proposed method in identifying interest points; the accuracy of identifying a point on a license plate reached 97%. Conclusion. A new method for detecting interest points of an object by analysing the inner layers of convolutional neural networks is proposed. This method provides an accuracy similar to or exceeding that of other modern methods.
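The paper's exact procedure is not reproduced here; the sketch below only illustrates the general idea of reading an inner layer of a classification network and treating activation peaks as candidate interest points. The choice of resnet18, of layer3, and of the channel-summing saliency map are hypothetical.

```python
import torch
import torchvision

# Untrained weights keep the sketch self-contained; in practice a network
# trained for classification or detection would be used.
model = torchvision.models.resnet18(weights=None).eval()

features = {}
model.layer3.register_forward_hook(
    lambda module, inputs, output: features.update({"map": output.detach()}))

img = torch.randn(1, 3, 224, 224)  # stand-in for a real vehicle image
with torch.no_grad():
    model(img)

fmap = features["map"][0]              # (C, H, W) inner-layer activations
saliency = fmap.relu().sum(dim=0)      # aggregate channels into a single map
row, col = divmod(int(saliency.argmax()), saliency.shape[1])
stride = img.shape[-1] // saliency.shape[-1]  # map feature coordinates back to pixels
print("candidate interest point (x, y):", (col * stride, row * stride))
```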


Convolutional Neural Networks (CNNs) are a rapidly growing area of deep learning. Nowadays, CNNs are used in the greater part of object recognition tasks. They are applied in distinct areas such as speech recognition, pattern recognition, computer vision, object detection, and other image processing applications. A CNN classifies data on the basis of a probability value. In this work, an in-depth review of CNN structure and applications is built up, and a comparative study of different variants of CNN is also presented.


2021
Vol 11 (24)
pp. 11868
Author(s):
José Naranjo-Torres
Marco Mora
Claudio Fredes
Andres Valenzuela

Raspberries are a fruit of great importance for human beings, and their products are segmented by quality. However, estimating raspberry quality is a manual process carried out at the reception of the fruit processing plant, and it is thus exposed to factors that could distort the measurement. The agriculture industry has increased the use of deep learning (DL) in computer vision systems. To solve the problem of estimating the quality of raspberries in a picking tray, non-destructive computer vision equipment and methods are proposed: prototype equipment is developed to determine the quality of raspberry trays using computer vision techniques and convolutional neural networks, from images captured in the visible RGB spectrum. The Faster R-CNN object-detection algorithm is used, and different pretrained CNN networks are evaluated as backbones in developing the software for the equipment. To avoid imbalance in the dataset, an individual object-detection model is trained and optimized for each detection class. Finally, hardware and software are effectively integrated. A conceptual test is performed in a real industrial scenario, achieving an automatic evaluation of the quality of the raspberry tray and thereby eliminating the intervention of the human expert and the errors involved in visual analysis. Excellent results were obtained in the conceptual test, in some cases reaching a precision of 100% and reducing the evaluation time per raspberry tray image to 30 s on average, which allows the evaluation of a larger and more representative sample of the raspberry batch arriving at the processing plant.
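As a sketch of evaluating different pretrained CNNs as the backbone of Faster R-CNN, the snippet below follows the standard torchvision pattern for plugging a classification network's feature extractor into the detector; the MobileNetV2 backbone, the two-class setting, and the anchor/RoI parameters are illustrative assumptions, not the configuration used for the raspberry equipment.

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# Use a classification CNN's convolutional part as the detection backbone.
# weights=None keeps the sketch self-contained; in practice pretrained
# ImageNet weights would be loaded and then fine-tuned.
backbone = torchvision.models.mobilenet_v2(weights=None).features
backbone.out_channels = 1280  # FasterRCNN needs the backbone's output channel count

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=["0"],
                                                output_size=7,
                                                sampling_ratio=2)
# num_classes=2 is a placeholder (one object class plus background).
model = FasterRCNN(backbone, num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

model.eval()
with torch.no_grad():
    prediction = model([torch.randn(3, 480, 640)])[0]
print(prediction.keys())  # dict_keys(['boxes', 'labels', 'scores'])
```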


2021
Vol 11 (6)
pp. 2738
Author(s):
Xiangzhou Wang
Xiaohui Du
Lin Liu
Guangming Ni
Jing Zhang
...

Diagnosis of Trichomonas vaginalis infection is one of the most important factors in the routine examination of leucorrhea. Given the motion characteristics of Trichomonas vaginalis, a viable detection method is to use a microscopic camera to record videos of leucorrhea samples and apply video object detection algorithms to them. Most Trichomonas vaginalis are defocused and appear as shadow regions in microscopic images, and it is hard to recognize the movement of such shadow regions using traditional video object detection algorithms. To solve this problem, we propose two convolutional neural networks based on an encoder-decoder architecture. The first network learns the difference between frames and takes the image and optical flow information of three consecutive frames as input to perform rough detection. The second network corrects the coarse contours, taking the image information and the rough detection result of the current frame as input to perform fine detection. With these two networks applied, the mean intersection over union for Trichomonas vaginalis reaches 72.09% on the test videos. The proposed networks can effectively detect defocused Trichomonas vaginalis and suppress false alarms caused by the motion of formed elements or impurities.
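A minimal sketch of how the inputs to the two networks could be assembled; the tiny encoder-decoder stand-ins, the exact channel layout (three frames plus two optical-flow fields for the rough network, current frame plus rough mask for the fine network), and the image size are assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

def tiny_encoder_decoder(in_channels):
    """Hypothetical stand-in for the paper's encoder-decoder networks:
    maps `in_channels` input channels to a single-channel mask."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
    )

# Stage 1: three consecutive grayscale frames plus two optical-flow fields
# (2 channels each) -> rough mask of moving shadow regions.
rough_net = tiny_encoder_decoder(in_channels=3 + 4)
# Stage 2: current frame plus the rough mask -> refined mask.
fine_net = tiny_encoder_decoder(in_channels=1 + 1)

frames = torch.randn(1, 3, 256, 256)   # frames t-1, t, t+1
flows = torch.randn(1, 4, 256, 256)    # flow(t-1 -> t) and flow(t -> t+1)
rough = rough_net(torch.cat([frames, flows], dim=1))
fine = fine_net(torch.cat([frames[:, 1:2], rough], dim=1))
print(tuple(fine.shape))  # (1, 1, 256, 256)
```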


2019
Vol 10 (1)
pp. 83
Author(s):
Atakan Körez
Necaattin Barışçı

Object detection in remote sensing images is frequently used in a wide range of areas such as land planning, city monitoring, traffic monitoring, and agricultural applications. It is essential in the field of aerial and satellite image analysis, but it also remains a challenge. To overcome this challenging problem, many object detection models using convolutional neural networks (CNNs) have been proposed. The deformable convolution structure was introduced to eliminate the disadvantage of the fixed grid structure of convolutional neural networks. In this study, a multi-scale Faster R-CNN method based on deformable convolution is proposed for single/low graphics processing unit (GPU) systems. Weight standardization (WS) is used instead of batch normalization (BN) to make the proposed model more efficient for a small batch size (1 image per GPU) on single-GPU systems. Experiments were conducted on the publicly available 10-class geospatial object detection (NWPU VHR-10) dataset to evaluate the object detection performance of the proposed model. The results show that our model achieved 92.3 mAP, a 1.7% mAP increase compared to the best results of models using the same dataset.
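A sketch of weight standardization as it is commonly formulated, with the convolution kernel normalized to zero mean and unit variance per output filter before the convolution; the exact variant and the normalization layer it is paired with in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with weight standardization: the kernel is standardized per
    output channel before being applied."""
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5
        return F.conv2d(x, (w - mean) / std, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Weight standardization is typically paired with Group Normalization when the
# batch size is tiny (e.g., one image per GPU), since BatchNorm statistics
# become unreliable in that regime.
layer = nn.Sequential(WSConv2d(3, 32, 3, padding=1), nn.GroupNorm(8, 32), nn.ReLU())
print(tuple(layer(torch.randn(1, 3, 64, 64)).shape))  # (1, 32, 64, 64)
```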


Author(s):  
Samuel Humphries
Trevor Parker
Bryan Jonas
Bryan Adams
Nicholas J Clark

Quick identification of buildings and roads is critical for the execution of tactical US military operations in an urban environment. To this end, a gridded, referenced satellite image of an objective, often referred to as a gridded reference graphic or GRG, has become a standard product developed during intelligence preparation of the environment. At present, operational units identify key infrastructure by hand through the work of individual intelligence officers. Recent advances in Convolutional Neural Networks, however, allow this process to be streamlined through the use of object detection algorithms. In this paper, we describe an object detection algorithm designed to quickly identify and label both buildings and road intersections present in an image. Our work leverages both the U-Net architecture and the SpaceNet data corpus to produce an algorithm that accurately identifies a large breadth of buildings and different types of roads. In addition to predicting buildings and roads, our model numerically labels each building by means of a contour-finding algorithm. Most importantly, the dual U-Net model is capable of predicting buildings and roads on a diverse set of test images and using these predictions to produce clean GRGs.
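A small sketch of the contour-based numbering step, assuming OpenCV and a binary building mask; the mask below is synthetic and the drawing details are illustrative, not the paper's implementation.

```python
import cv2
import numpy as np

# `mask` stands in for a U-Net building prediction thresholded to a binary mask.
mask = np.zeros((256, 256), dtype=np.uint8)
mask[40:90, 50:120] = 255     # two hypothetical "buildings"
mask[150:200, 160:230] = 255

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
labelled = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
for i, contour in enumerate(contours, start=1):
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(labelled, (x, y), (x + w, y + h), (0, 255, 0), 1)
    cv2.putText(labelled, str(i), (x, y - 3), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 1)
print(f"{len(contours)} buildings labelled")
```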


2021
Vol 11 (15)
pp. 6721
Author(s):
Jinyeong Wang
Sanghwan Lee

With the push to increase manufacturing productivity through automated surface inspection in smart factories, the demand for machine vision is rising. Recently, convolutional neural networks (CNNs) have demonstrated outstanding performance and solved many problems in the field of computer vision. Consequently, many machine vision systems adopt CNNs for surface defect inspection. In this study, we developed an effective data augmentation method for grayscale images in CNN-based machine vision with mono cameras. Our method can be applied to grayscale industrial images, and we demonstrated outstanding performance on image classification and object detection tasks. The main contributions of this study are as follows: (1) We propose a data augmentation method that can be applied when training CNNs on industrial images taken with mono cameras. (2) We demonstrate that image classification and object detection performance improves when training with industrial image data augmented by the proposed method. Through the proposed method, many machine vision problems involving mono cameras can be effectively solved using CNNs.
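This abstract does not spell out the proposed augmentation method, so the sketch below only shows generic torchvision augmentations that operate on single-channel tensors; it illustrates the kind of pipeline involved rather than the method itself.

```python
import torch
from torchvision import transforms

# Generic augmentations applicable to mono-camera (single-channel) images.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # brightness/contrast work on grayscale tensors
    transforms.RandomErasing(p=0.3),
])

gray_image = torch.rand(1, 128, 128)  # stand-in for a grayscale industrial image in [0, 1]
augmented = augment(gray_image)
print(tuple(augmented.shape))  # (1, 128, 128)
```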

