OBJECT CLASSIFICATION AND OCCLUSION HANDLING USING QUADRATIC FEATURE CORRELATION MODEL AND NEURAL NETWORKS

Author(s):  
NA FAN

Occlusion handling is a long-standing and important problem for the computer vision and pattern recognition community. Features from different objects may become entangled with each other, and in many traditional object recognition algorithms the matched feature points may belong to different objects. To recognize occlusions, we should not only match objects across different viewpoints but also match features extracted from the same object. In this paper, we propose a method that considers these two perspectives simultaneously by encoding various types of relationships among feature points, such as geometry, color and texture, into a matrix and finding the quadratic feature correlation model that best fits them. Experiments on our own dataset and the publicly available PASCAL VOC dataset show that our method can robustly classify objects and handle heavily occluded objects, and that its performance is among the state of the art.

Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 167 ◽  
Author(s):  
Dan Malowany ◽  
Hugo Guterman

Computer vision is currently one of the most exciting and rapidly evolving fields of science, and it affects numerous industries. Research and development breakthroughs, mainly in the field of convolutional neural networks (CNNs), opened the way to unprecedented sensitivity and precision in object detection and recognition tasks. Nevertheless, findings in recent years on the sensitivity of neural networks to additive noise, lighting conditions, and the completeness of the training dataset indicate that this technology still lacks the robustness needed for the autonomous robotic industry. In an attempt to bring computer vision algorithms closer to the capabilities of a human operator, the mechanisms of the human visual system were analyzed in this work. Recent studies show that the mechanisms behind the recognition process in the human brain include continuous generation of predictions based on prior knowledge of the world. These predictions enable rapid generation of contextual hypotheses that bias the outcome of the recognition process. This mechanism is especially advantageous in situations of uncertainty, when visual input is ambiguous. In addition, the human visual system continuously updates its knowledge about the world based on the gaps between its predictions and the visual feedback. CNNs are feed-forward in nature and lack such top-down contextual attenuation mechanisms. As a result, although they process massive amounts of visual information during their operation, the information is not transformed into knowledge that can be used to generate contextual predictions and improve their performance. In this work, an architecture was designed that aims to integrate the concepts behind the top-down prediction and learning processes of the human visual system with state-of-the-art bottom-up object recognition models, e.g., deep CNNs.
The work focuses on two mechanisms of the human visual system: anticipation-driven perception and reinforcement-driven learning. Imitating these top-down mechanisms, together with the state-of-the-art bottom-up feed-forward algorithms, resulted in an accurate, robust, and continuously improving target recognition model.


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over both non-augmented data and data augmented by conventional SMILES randomization when used for training baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain, an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
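The pairing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain global Levenshtein (edit) distance rather than the local sub-sequence similarity the abstract mentions, and the helper names `levenshtein` and `closest_variant`, as well as the example strings, are hypothetical. Given several equivalent SMILES spellings of a reactant, it picks the one whose string is closest to the product string.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_variant(reactant_variants, product):
    """Return the reactant spelling with the smallest edit distance to the product.

    Hypothetical helper: the candidate spellings are assumed to be generated
    elsewhere (for example by a SMILES randomizer).
    """
    return min(reactant_variants, key=lambda s: levenshtein(s, product))


# Three spellings of ethanol paired against acetaldehyde ("CC=O").
variants = ["OCC", "C(O)C", "CCO"]
best = closest_variant(variants, "CC=O")
```

Here `best` is the spelling that shares the longest common layout with the product string, which is the kind of reactant/product alignment the augmentation aims to expose to the model.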


1999 ◽  
Vol 18 (3-4) ◽  
pp. 265-273
Author(s):  
Giovanni B. Garibotto

The paper is intended to provide an overview of advanced robotic technologies within the context of Postal Automation services. The main functional requirements of the application are briefly reviewed, as well as the state of the art and newly emerging solutions. Image Processing and Pattern Recognition have always played a fundamental role in Address Interpretation and Mail Sorting, and the new challenging objective is now off-line handwritten cursive recognition, in order to be able to handle all kinds of addresses in a uniform way. On the other hand, advanced electromechanical and robotic solutions are extremely important for solving the problems of mail storage, transportation and distribution, as well as for material handling and logistics. Finally, a short description of new Postal Automation services is given, covering the emerging services of hybrid mail and paper-to-electronic conversion.


2013 ◽  
Vol 2 (2) ◽  
pp. 66-79 ◽  
Author(s):  
Onsy A. Abdel Alim ◽  
Amin Shoukry ◽  
Neamat A. Elboughdadly ◽  
Gehan Abouelseoud

In this paper, a pattern recognition module that makes use of 3-D images of objects is presented. The proposed module takes advantage of both the generalization capability of neural networks and the possibility of manipulating 3-D images to generate views of the object to be recognized at different poses. This allows the construction of a robust 3-D object recognition module that can find use in various applications, including military, biomedical and mine detection applications. The paper proposes an efficient training procedure and decision-making strategy for the suggested neural network. Sample results of testing the module on 3-D images of several objects are also included, along with a discussion of the implications of the results.


2011 ◽  
Vol 16 (1-2) ◽  
pp. 49-56 ◽  
Author(s):  
Dariusz Jakóbczak

Curve Parameterization and Curvature via Method of Hurwitz-Radon Matrices

Parametric representation of a curve is more appropriate in computer vision applications than the explicit form y = f(x) or the implicit representation f(x, y) = 0. The proposed method of Hurwitz-Radon Matrices (MHR) can be used in parameterization and interpolation of curves in the plane. A suitable parameterization leads to curvature calculations. Points with locally maximal curvature are treated as feature points in object recognition and image analysis. This paper describes a way of parameterizing a curve and computing its curvature between two successive interpolation nodes via the MHR method. The proposed method is based on a family of Hurwitz-Radon (HR) matrices. The matrices are skew-symmetric and possess columns composed of orthogonal vectors. The operator of Hurwitz-Radon (OHR), built from these matrices, is described. It is shown how to create the orthogonal OHR and how to use it in the process of curve parameterization and curvature calculation.
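As a point of reference for the curvature step, the sketch below evaluates the standard curvature formula for a plane parametric curve (x(t), y(t)), namely |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2). It does not implement the MHR method itself; the function name `curvature` is hypothetical, and the derivatives are assumed to be supplied by whatever parameterization is in use.

```python
import math


def curvature(dx: float, dy: float, ddx: float, ddy: float) -> float:
    """Curvature of a plane parametric curve from its first and second derivatives."""
    speed_sq = dx * dx + dy * dy
    if speed_sq == 0.0:
        raise ValueError("curvature is undefined where the first derivative vanishes")
    return abs(dx * ddy - dy * ddx) / speed_sq ** 1.5


# Check on a circle of radius 2: x = 2 cos t, y = 2 sin t, so curvature is 1/2 everywhere.
t = 0.7
kappa = curvature(-2 * math.sin(t), 2 * math.cos(t), -2 * math.cos(t), -2 * math.sin(t))
```

Sampling this quantity along the parameter and keeping its local maxima gives candidate feature points of the kind the abstract refers to.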


2021 ◽  
Vol 11 (19) ◽  
pp. 9197
Author(s):  
Muhammad Tahir ◽  
Saeed Anwar

Person re-identification (re-ID) is an essential task in computer vision, particularly in surveillance applications. The aim is to identify a person based on an input image from surveillance photographs in various scenarios. Most person re-ID techniques utilize Convolutional Neural Networks (CNNs); however, Vision Transformers are replacing pure CNNs for various computer vision tasks such as object recognition and classification. Vision transformers contain information about local regions of the image, and current techniques exploit this property to improve accuracy on the task at hand. We propose to use vision transformers in conjunction with vanilla CNN models to investigate the true strength of transformers in person re-identification. We employ three backbones with different combinations of vision transformers on two benchmark datasets. The overall performance of the backbones increased, showing the importance of vision transformers. We provide ablation studies and show the importance of various components of the vision transformers in re-identification tasks.


Author(s):  
Ritwik Chavhan ◽  
Kadir Sheikh ◽  
Rishikesh Bondade ◽  
Swaraj Dhanulkar ◽  
Aniket Ninave ◽  
...  

Plant disease is an ongoing challenge for smallholder farmers and threatens income and food security. The recent revolution in smartphone penetration and computer vision models has created an opportunity for image classification in agriculture. The project focuses on providing information about the pesticide/insecticide, and the quantity of it, to be used for an unhealthy crop. The user, i.e., the farmer, takes a picture of the crop and uploads it to the server via the Android application. After uploading the image, the farmer gets a unique ID displayed on the application screen. The farmer must make a note of that ID, since it is needed later to retrieve the message after a minute. The uploaded image is then processed by Convolutional Neural Networks (CNNs), which are considered state-of-the-art in image recognition and offer the ability to provide a prompt and definite diagnosis. The result, consisting of the disease name and the affected area, is then retrieved and uploaded into the message table on the server. The farmer can then retrieve the complete information in a presentable format by entering the unique ID received in the application.


2021 ◽  
Author(s):  
Weihao Zhuang ◽  
Tristan Hascoet ◽  
Xunquan Chen ◽  
Ryoichi Takashima ◽  
Tetsuya Takiguchi ◽  
...  

Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature extraction capability. However, as the larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reduce the memory consumption of CNN inference. Our method consists of processing the input image in a sequence of carefully designed tiles within the lower subnetwork of the CNN, so as to minimize its peak memory consumption, while keeping the end-to-end computation unchanged. This method introduces a trade-off between memory consumption and computations, which is particularly suitable for high-resolution inputs. Our experimental results show that MobileNetV2 memory consumption can be reduced by up to 5.3 times with our proposed method. For ResNet50, one of the most commonly used CNN models in computer vision tasks, memory can be optimized by up to 2.3 times.
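The claim that tiling can leave the computation unchanged can be seen in a toy setting. The sketch below is only an illustration under strong simplifications: a single 1-D "valid" convolution instead of the lower layers of a 2-D CNN, with hypothetical function names `conv1d_valid` and `conv1d_tiled`. Each tile reads the inputs it needs plus a halo of `len(kernel) - 1` extra samples, so only one small slice is live at a time while the concatenated output matches the untiled result.

```python
def conv1d_valid(signal, kernel):
    """Plain 1-D sliding dot product without padding ('valid' mode)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv1d_tiled(signal, kernel, tile_out):
    """Same result as conv1d_valid, computed tile by tile.

    tile_out is the number of output samples produced per tile (must be > 0).
    Each tile needs tile_out + len(kernel) - 1 input samples: the overlap
    between neighbouring tiles is the recomputation cost of tiling.
    """
    if tile_out <= 0:
        raise ValueError("tile_out must be positive")
    k = len(kernel)
    n_out = len(signal) - k + 1
    out = []
    for start in range(0, n_out, tile_out):
        stop = min(start + tile_out, n_out)
        tile = signal[start:stop + k - 1]  # inputs for outputs start..stop-1
        out.extend(conv1d_valid(tile, kernel))
    return out


signal = list(range(10))
kernel = [1, 0, -1]
full = conv1d_valid(signal, kernel)
tiled = conv1d_tiled(signal, kernel, tile_out=3)
```

Smaller tiles lower the peak working set but re-read more halo samples, which mirrors the memory versus computation trade-off described above.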


2021 ◽  
Vol 2042 (1) ◽  
pp. 012002
Author(s):  
Roberto Castello ◽  
Alina Walch ◽  
Raphaël Attias ◽  
Riccardo Cadei ◽  
Shasha Jiang ◽  
...  

The integration of solar technology in the built environment is realized mainly through rooftop-installed panels. In this paper, we leverage state-of-the-art Machine Learning and computer vision techniques applied to overhead images to provide a geo-localization of the rooftop surfaces available for solar panel installation. We further exploit a 3D building database to associate them with the corresponding roof geometries by means of a geospatial post-processing approach. The stand-alone Convolutional Neural Network used to segment suitable rooftop areas reaches an intersection over union of 64% and an accuracy of 93%, while a post-processing step using the building database improves the rejection of false positives. The model is applied to a case study area in the canton of Geneva, and the results are compared with another recent method used in the literature to derive the realistic available area.
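For readers unfamiliar with the two reported metrics, the sketch below computes intersection over union and pixel accuracy for a pair of binary masks. It assumes the usual definitions for binary segmentation (the abstract does not spell out its exact evaluation protocol), and the function name `iou_and_accuracy` and the toy masks are hypothetical.

```python
def iou_and_accuracy(pred, truth):
    """Return (IoU, pixel accuracy) for two equally shaped binary masks.

    Masks are nested lists of 0/1 values; 1 marks a pixel labelled as
    suitable rooftop area.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("masks must contain at least one pixel")
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # both masks empty: treat as perfect overlap
    accuracy = (tp + tn) / total
    return iou, accuracy


pred = [[1, 1, 0],
        [0, 0, 0]]
truth = [[1, 0, 0],
         [1, 0, 0]]
iou, accuracy = iou_and_accuracy(pred, truth)
```

Because background pixels count toward accuracy but not toward IoU, accuracy can be much higher than IoU when suitable areas cover a small share of the image, which is consistent with the gap between the two figures quoted above.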

