Real Time Eye Detector with Cascaded Convolutional Neural Networks

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Bin Li ◽  
Hong Fu

An accurate and efficient eye detector is essential for many computer vision applications. In this paper, we present an efficient method to estimate the eye location from facial images. First, a group of candidate regions with regional extreme points is quickly proposed; then, a set of convolutional neural networks (CNNs) is adopted to determine the most likely eye region and classify the region as left or right eye; finally, the center of the eye is located with other CNNs. In the experiments using GI4E, BioID, and our datasets, our method attained a detection accuracy comparable to existing state-of-the-art methods; meanwhile, our method was faster and adaptable to variations of the images, including external light changes, facial occlusion, and changes in image modality.

Author(s):  
PASQUALE FOGGIA ◽  
GENNARO PERCANNELLA ◽  
CARLO SANSONE ◽  
MARIO VENTO

In some computer vision applications there is a need to group only a part of the whole dataset into one or more clusters. This happens, for example, when samples of interest for the application at hand are present together with several noisy samples. In this paper we present a graph-based algorithm for cluster detection that is particularly suited for detecting clusters of any size and shape, without the need to specify either the number of clusters or other parameters. The algorithm has been tested on data coming from two different computer vision applications. A comparison with four other state-of-the-art graph-based algorithms is also provided, demonstrating the effectiveness of the proposed approach.


2020 ◽  
Author(s):  
Jawad Khan

Activity recognition is a topic undergoing massive research in the field of computer vision. Applications of activity recognition include sports summaries, human-computer interaction, violence detection, and surveillance. In this paper, we propose a modification of the standard local binary patterns (LBP) descriptor to obtain a concatenated histogram of lower dimensions. This helps to encode the spatial and temporal information of various actions happening in a frame. The method helps to overcome the dimensionality problem that occurs with LBP, and the results show that it performs comparably with state-of-the-art methods.
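For readers unfamiliar with the descriptor being modified, the following is a minimal sketch of the standard 8-neighbour local binary pattern and its 256-bin histogram. It is not the concatenated, lower-dimensional variant the abstract proposes, whose details are not given here. The function names, the neighbour ordering, and the plain list-of-lists image format are illustrative assumptions.

```python
from typing import List

# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
# The ordering is an assumption; any fixed ordering gives a valid LBP code.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Return the 8-bit LBP code of an interior pixel of a grayscale image."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the center sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """Count LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

The 256-bin histogram per frame region is what makes the plain descriptor grow quickly when spatial and temporal neighbourhoods are combined, which is the dimensionality problem the abstract refers to.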


Author(s):  
Ritwik Chavhan ◽  
Kadir Sheikh ◽  
Rishikesh Bondade ◽  
Swaraj Dhanulkar ◽  
Aniket Ninave ◽  
...  

Plant disease is an ongoing challenge for smallholder farmers, which threatens income and food security. The recent revolution in smartphone penetration and computer vision models has created an opportunity for image classification in agriculture. The project focuses on providing information about the pesticide or insecticide, and the quantity of it, to be used for an unhealthy crop. The user, who is the farmer, takes a picture of the crop and uploads it to the server via the Android application. After uploading the image, the farmer gets a unique ID displayed on the application screen. The farmer must make a note of that ID, since it is needed later to retrieve the message after a minute. The uploaded image is then processed by Convolutional Neural Networks (CNNs), which are considered state-of-the-art in image recognition and offer the ability to provide a prompt and definite diagnosis. The result, consisting of the disease name and the affected area, is then retrieved and uploaded into the message table on the server. The farmer can then retrieve the complete information in a readable format by entering the unique ID received in the application.


2021 ◽  
Author(s):  
Weihao Zhuang ◽  
Tristan Hascoet ◽  
Xunquan Chen ◽  
Ryoichi Takashima ◽  
Tetsuya Takiguchi ◽  
...  

Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature extraction capability. However, as larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reduce the memory consumption of CNN inference. Our method consists of processing the input image in a sequence of carefully designed tiles within the lower subnetwork of the CNN, so as to minimize its peak memory consumption, while keeping the end-to-end computation unchanged. This method introduces a trade-off between memory consumption and computations, which is particularly suitable for high-resolution inputs. Our experimental results show that MobileNetV2 memory consumption can be reduced by up to 5.3 times with our proposed method. For ResNet50, one of the most commonly used CNN models in computer vision tasks, memory can be optimized by up to 2.3 times.
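The idea of tiling while keeping the result unchanged can be illustrated on a single convolution layer. The sketch below is an assumption-laden toy, not the authors' tiling scheme: it splits the output into horizontal strips, feeds each strip only the input rows it needs (including the overlapping "halo" rows), and reproduces the untiled result exactly. The names `conv2d_valid`, `conv2d_tiled`, and `tile_rows` are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Plain 'valid' 2D cross-correlation (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv2d_tiled(image: Matrix, kernel: Matrix, tile_rows: int) -> Matrix:
    """Compute the same output strip by strip to bound the working set."""
    kh = len(kernel)
    out_h = len(image) - kh + 1
    output: Matrix = []
    for start in range(0, out_h, tile_rows):
        stop = min(start + tile_rows, out_h)
        # Output rows [start, stop) need input rows [start, stop + kh - 1):
        # the extra kh - 1 rows are the halo shared with the next strip.
        strip = image[start:stop + kh - 1]
        output.extend(conv2d_valid(strip, kernel))
    return output
```

In this single-layer case the halo rows are only read twice. When several layers are tiled together, the overlapping regions of intermediate feature maps have to be recomputed, which is one way to understand the memory versus computation trade-off the abstract mentions.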


2021 ◽  
Vol 2042 (1) ◽  
pp. 012002
Author(s):  
Roberto Castello ◽  
Alina Walch ◽  
Raphaël Attias ◽  
Riccardo Cadei ◽  
Shasha Jiang ◽  
...  

The integration of solar technology in the built environment is realized mainly through rooftop-installed panels. In this paper, we leverage state-of-the-art Machine Learning and computer vision techniques applied to overhead images to provide a geo-localization of the available rooftop surfaces for solar panel installation. We further exploit a 3D building database to associate them with the corresponding roof geometries by means of a geospatial post-processing approach. The stand-alone Convolutional Neural Network used to segment suitable rooftop areas reaches an intersection over union of 64% and an accuracy of 93%, while a post-processing step using the building database improves the rejection of false positives. The model is applied to a case study area in the canton of Geneva and the results are compared with another recent method used in the literature to derive the realistic available area.
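The two reported metrics measure different things, which is why they can differ so much for the same segmentation. The sketch below computes both for binary masks. It assumes the usual definitions (intersection over union of the positive class, and pixel accuracy over all pixels); the abstract does not state exactly how its figures were computed, and the function name is hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]


def iou_and_accuracy(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (IoU of the positive class, pixel accuracy) for binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # rooftop pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as rooftop
            elif t:
                fn += 1          # rooftop pixel missed
            else:
                tn += 1          # background correctly predicted
    union = tp + fp + fn
    # Convention (assumption): two empty masks agree perfectly.
    iou = tp / union if union else 1.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 1.0
    return iou, accuracy


pred = [[1, 1, 0, 0], [0, 0, 0, 0]]
truth = [[1, 0, 1, 0], [0, 0, 0, 0]]
iou, accuracy = iou_and_accuracy(pred, truth)
print(f"IoU={iou:.2f} accuracy={accuracy:.2f}")  # IoU=0.33 accuracy=0.75
```

Because true negatives count toward accuracy but not toward IoU, accuracy is typically the higher number when background pixels dominate the image.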


2021 ◽  
Vol 3 (4) ◽  
pp. 966-989
Author(s):  
Vanessa Buhrmester ◽  
David Münch ◽  
Michael Arens

Deep Learning is a state-of-the-art technique to make inference on extensive or complex data. As black-box models, due to their multilayer nonlinear structure, Deep Neural Networks are often criticized as being non-transparent and their predictions not traceable by humans. Furthermore, the models learn from artificially generated datasets, which often do not reflect reality. By basing decision-making algorithms on Deep Neural Networks, prejudice and unfairness may be promoted unknowingly due to a lack of transparency. Hence, several so-called explanators, or explainers, have been developed. Explainers try to give insight into the inner structure of machine learning black boxes by analyzing the connection between the input and output. In this survey, we present the mechanisms and properties of explaining systems for Deep Neural Networks for Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out the drawbacks and gaps and summarize further research ideas.


Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 167 ◽  
Author(s):  
Dan Malowany ◽  
Hugo Guterman

Computer vision is currently one of the most exciting and rapidly evolving fields of science, which affects numerous industries. Research and development breakthroughs, mainly in the field of convolutional neural networks (CNNs), opened the way to unprecedented sensitivity and precision in object detection and recognition tasks. Nevertheless, the findings in recent years on the sensitivity of neural networks to additive noise, light conditions, and to the wholeness of the training dataset, indicate that this technology still lacks the robustness needed for the autonomous robotic industry. In an attempt to bring computer vision algorithms closer to the capabilities of a human operator, the mechanisms of the human visual system were analyzed in this work. Recent studies show that the mechanisms behind the recognition process in the human brain include continuous generation of predictions based on prior knowledge of the world. These predictions enable rapid generation of contextual hypotheses that bias the outcome of the recognition process. This mechanism is especially advantageous in situations of uncertainty, when visual input is ambiguous. In addition, the human visual system continuously updates its knowledge about the world based on the gaps between its prediction and the visual feedback. CNNs are feed-forward in nature and lack such top-down contextual attenuation mechanisms. As a result, although they process massive amounts of visual information during their operation, the information is not transformed into knowledge that can be used to generate contextual predictions and improve their performance. In this work, an architecture was designed that aims to integrate the concepts behind the top-down prediction and learning processes of the human visual system with the state-of-the-art bottom-up object recognition models, e.g., deep CNNs. The work focuses on two mechanisms of the human visual system: anticipation-driven perception and reinforcement-driven learning. Imitating these top-down mechanisms, together with the state-of-the-art bottom-up feed-forward algorithms, resulted in an accurate, robust, and continuously improving target recognition model.


2020 ◽  
Vol 34 (10) ◽  
pp. 13714-13715
Author(s):  
Subhajit Chaudhury

Neural networks have contributed to tremendous progress in the domains of computer vision, speech processing, and other real-world applications. However, recent studies have shown that these state-of-the-art models can be easily compromised by adding small imperceptible perturbations. My thesis summary frames the problem of adversarial robustness as an equivalent problem of learning suitable features that lead to good generalization in neural networks. This is motivated by learning in humans, which is not trivially fooled by such perturbations thanks to robust feature learning that shows good out-of-sample generalization.


2018 ◽  
Vol 28 (05) ◽  
pp. 1750056 ◽  
Author(s):  
Ezequiel López-Rubio ◽  
Miguel A. Molina-Cabello ◽  
Rafael Marcos Luque-Baena ◽  
Enrique Domínguez

One of the most important challenges in computer vision applications is background modeling, especially when the background is dynamic and the input distribution might not be stationary, i.e. the distribution of the input data could change with time (e.g. changing illumination, waving trees, water, etc.). In this work, an unsupervised learning neural network is proposed which is able to cope with progressive changes in the input distribution. It is based on a dual learning mechanism which manages the changes of the input distribution separately from the cluster detection. The proposal is adequate for scenes where the background varies slowly. The performance of the method is tested against several state-of-the-art foreground detectors both quantitatively and qualitatively, with favorable results.
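To make the notion of a slowly varying background concrete, here is a minimal sketch of a much simpler, classical baseline: a per-pixel running average with a fixed threshold. This is a swapped-in technique for illustration only, not the unsupervised neural network with dual learning that the abstract proposes. The function names and the default values of `alpha` and `threshold` are assumptions.

```python
from typing import List

Frame = List[List[float]]


def update_background(background: Frame, frame: Frame,
                      alpha: float = 0.05) -> Frame:
    """Blend the new frame into the background model.

    A small alpha makes the model follow gradual changes (e.g. illumination)
    while mostly ignoring short-lived foreground objects.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame,
                    threshold: float = 30.0) -> List[List[bool]]:
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]
```

A typical loop would call `foreground_mask` on each incoming frame and then `update_background` with the same frame, so that the model drifts with the scene over time.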

