A Novel Plausible Model for Visual Perception

Author(s):  
Zhiwei Shi ◽  
Zhongzhi Shi ◽  
Hong Hu

Traditionally, bridging the gap between low-level visual features and high-level semantic concepts has been a difficult task for researchers. In this article, we propose a novel plausible model, cellular Bayesian networks (CBNs), to model the process of visual perception. The new model takes advantage of both the low-level visual features of target objects, such as colors, textures, and shapes, and the interrelationships between known objects, and integrates them into a Bayesian framework, which possesses both a firm theoretical foundation and wide practical applications. The novel model successfully overcomes some weaknesses of the traditional Bayesian network (BN) that prevent BNs from being applied to large-scale cognitive problems. Experimental simulation also demonstrates that the CBN model outperforms a purely bottom-up strategy by 6% or more in the task of shape recognition. Finally, although the CBN model is designed for visual perception, it has great potential to be applied to other areas as well.
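The Bayesian fusion at the heart of the CBN idea, combining a contextual prior derived from known-object relationships with low-level feature evidence, can be sketched as follows; the hypothesis names and probability values are hypothetical illustrations, not taken from the paper:

```python
# Minimal sketch of Bayesian fusion of contextual priors with low-level
# feature evidence, in the spirit of the CBN approach. All hypothesis
# names and numbers are hypothetical.

def posterior(prior, likelihood):
    """Normalize prior * likelihood over a dict of hypotheses."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Context (e.g. a "desk" scene) raises the prior of related objects.
prior = {"monitor": 0.6, "toaster": 0.4}
# Low-level features (shape, color, texture) give a likelihood per hypothesis.
likelihood = {"monitor": 0.7, "toaster": 0.3}

post = posterior(prior, likelihood)
```

Here the contextually supported hypothesis dominates because the prior and the feature evidence reinforce each other; with conflicting evidence the normalization would arbitrate between them.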

2015 ◽  
Vol 28 (17) ◽  
pp. 6743-6762 ◽  
Author(s):  
Catherine M. Naud ◽  
Derek J. Posselt ◽  
Susan C. van den Heever

Abstract The distribution of cloud and precipitation properties across oceanic extratropical cyclone cold fronts is examined using four years of combined CloudSat radar and CALIPSO lidar retrievals. The global annual mean cloud and precipitation distributions show that low-level clouds are ubiquitous in the postfrontal zone while higher-level cloud frequency and precipitation peak in the warm sector along the surface front. Increases in temperature and moisture within the cold front region are associated with larger high-level but lower mid-/low-level cloud frequencies and precipitation decreases in the cold sector. This behavior seems to be related to a shift from stratiform to convective clouds and precipitation. Stronger ascent in the warm conveyor belt tends to enhance cloudiness and precipitation across the cold front. A strong temperature contrast between the warm and cold sectors also encourages greater post-cold-frontal cloud occurrence. While the seasonal contrasts in environmental temperature, moisture, and ascent strength are enough to explain most of the variations in cloud and precipitation across cold fronts in both hemispheres, they do not fully explain the differences between Northern and Southern Hemisphere cold fronts. These differences are better explained when the impact of the temperature contrast across the cold front is also considered. In addition, these large-scale parameters do not explain the relatively high frequency of springtime postfrontal precipitation.


2021 ◽  
Author(s):  
Maryam Nematollahi Arani

Object recognition has become a central topic in computer vision applications such as image search, robotics, and vehicle safety systems. However, it is a challenging task due to the limited discriminative power of low-level visual features in describing the considerably diverse range of high-level visual semantics of objects. The semantic gap between low-level visual features and high-level concepts is a bottleneck in most systems, and new content analysis models need to be developed to bridge it. In this thesis, algorithms based on conditional random fields (CRFs), from the class of probabilistic graphical models, are developed to tackle the problem of multiclass image labeling for object recognition. Image labeling assigns a specific semantic category, from a predefined set of object classes, to each pixel in the image. By capturing spatial interactions of visual concepts well, CRF modeling has proved to be a successful tool for image labeling. This thesis proposes novel approaches to empowering CRF modeling for robust image labeling. Our primary contributions are twofold. To better represent feature distributions of CRF potentials, new feature functions based on generalized Gaussian mixture models (GGMMs) are designed and their efficacy is investigated. Owing to its shape parameter, a GGMM can properly fit the multi-modal and skewed distributions of data in natural images. The new model proves more successful than Gaussian and Laplacian mixture models, and also outperforms a deep neural network model on the Corel image set by 1% in accuracy. Further in this thesis, we apply scene-level contextual information to integrate the global visual semantics of the image with the pixel-wise dense inference of a fully connected CRF, to preserve small objects of foreground classes and to make dense inference robust to initial misclassifications by the unary classifier.
The proposed inference algorithm factorizes the joint probability of the labeling configuration and the image scene type to obtain prediction update equations for labeling individual image pixels as well as the overall scene type of the image. The proposed context-based dense CRF model outperforms the conventional dense CRF model by about 2% in labeling accuracy on the MSRC image set and by 4% on the SIFT Flow image set. The proposed model also obtains the highest scene classification rate, 86%, on the MSRC dataset.
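The scene-type factorization described in this abstract can be sketched as marginalizing per-pixel label scores over scene types; the scene probabilities and conditional label likelihoods below are hypothetical illustrations, not the thesis's learned values:

```python
# Sketch: combine a pixel-level classifier with scene-level context by
# marginalizing over scene types, P(label | x) = sum_s P(label | x, s) P(s).
# All probabilities are hypothetical.

scene_prob = {"street": 0.8, "indoor": 0.2}

# Label likelihood for one pixel, conditioned on the scene type.
label_given_scene = {
    "street": {"car": 0.7, "sofa": 0.3},
    "indoor": {"car": 0.1, "sofa": 0.9},
}

def contextual_label_prob(label):
    """Marginalize the pixel's label probability over scene types."""
    return sum(scene_prob[s] * label_given_scene[s][label] for s in scene_prob)

p_car = contextual_label_prob("car")    # 0.8*0.7 + 0.2*0.1 = 0.58
p_sofa = contextual_label_prob("sofa")  # 0.8*0.3 + 0.2*0.9 = 0.42
```

The scene context shifts a pixel that is locally ambiguous toward the label consistent with the dominant scene type, which is how such a model can recover from unary-classifier misclassifications.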


Author(s):  
Ranjan Parekh ◽  
Nalin Sharda

Semantic characterization is necessary for developing intelligent multimedia databases, because humans tend to search for media content based on its inherent semantics. However, automated inference of semantic concepts from media components stored in a database is still a challenge. The aim of this chapter is to demonstrate how layered architectures and “visual keywords” can be used to develop intelligent search systems for multimedia databases. The layered architecture is used to extract meta-data from multimedia components at various layers of abstraction. While the lower layers handle physical file attributes and low-level features, the upper layers handle high-level features and attempt to remove ambiguities inherent in them. To access the various abstracted features, a query schema is presented, which provides a single point of access while establishing hierarchical pathways between feature classes. Minimization of the semantic gap is addressed using the concept of the “visual keyword” (VK). Visual keywords are segmented portions of images with associated low- and high-level features, implemented within a semantic layer on top of the standard low-level features layer, for characterizing semantic content in media components. Semantic information, however, is predominantly expressed in textual form, and is hence susceptible to the limitations of textual descriptors, viz. ambiguities related to synonyms, homonyms, hypernyms, and hyponyms. To handle such ambiguities, this chapter proposes a domain-specific ontology-based layer on top of the semantic layer, to increase the effectiveness of the search process.


2018 ◽  
Vol 8 (12) ◽  
pp. 2367 ◽  
Author(s):  
Hongling Luo ◽  
Jun Sang ◽  
Weiqun Wu ◽  
Hong Xiang ◽  
Zhili Xiang ◽  
...  

In recent years, trampling events due to overcrowding have occurred frequently, creating a demand for crowd counting in high-density environments. At present, there are few studies on monitoring crowds in large-scale crowded environments; existing technologies have drawbacks, and mature systems are lacking. Aiming to solve the high-density crowd counting problem in complex environments, a feature-fusion-based deep convolutional neural network method, FF-CNN (Feature Fusion of Convolutional Neural Network), was proposed in this paper. The proposed FF-CNN mapped a crowd image to its crowd density map, and then obtained the head count by integration. Geometry-adaptive kernels were adopted to generate the high-quality density maps used as ground truth for network training. The deconvolution technique was used to fuse high-level and low-level features into richer features, and two loss functions, density map loss and absolute count loss, were used for joint optimization. To increase sample diversity, the original images were cropped with a random cropping method at each iteration. The experimental results for FF-CNN on the public ShanghaiTech dataset showed that the fusion of low-level and high-level features extracts richer features, improving the precision of density map estimation and, in turn, the accuracy of crowd counting.
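The density-map ground truth and count-by-integration steps can be sketched as follows; the paper uses geometry-adaptive kernel widths, whereas a fixed Gaussian width is used here for brevity, and the head positions are hypothetical:

```python
import numpy as np

# Sketch: a crowd-density ground truth places a normalized Gaussian at each
# annotated head position; integrating (summing) the map recovers the head
# count. A fixed sigma stands in for the paper's geometry-adaptive kernels.

def density_map(heads, shape, sigma=4.0):
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape)
    for hy, hx in heads:
        g = np.exp(-((ys - hy) ** 2 + (xs - hx) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()  # each head contributes exactly 1 to the integral
    return dmap

heads = [(20, 30), (40, 50), (25, 70)]  # hypothetical head coordinates
dmap = density_map(heads, (64, 96))
count = dmap.sum()  # ~3.0, the number of annotated heads
```

Normalizing each kernel by its own sum keeps the integral equal to the head count even when a Gaussian is truncated by the image boundary.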


1960 ◽  
Vol 41 (6) ◽  
pp. 291-297 ◽  
Author(s):  
John H. Conover ◽  
James C. Sadler

Time-lapse films of the earth from high-flying ballistic missiles have provided the meteorologist with the first synoptic detailed coverage of cloud patterns over large areas. Analysis of the film obtained on 24 August 1959 shows the cloud patterns over an area corresponding to one-twentieth of the earth's total surface. Comparison of the rectified cloud positions with the high- and low-level synoptic charts shows large-scale cloud patterns directly associated with high-level vortices and troughs as well as patterns associated with a quasi-stationary front and the intertropical convergence zone. Details suggesting low-level vortices, frontal waves, and a squall line appear, but they cannot be verified due to sparse surface observations. Other details, such as the effects of large and small islands, coastlines, and rivers upon the pattern of vertical motion, are indicated by the clouds.


2019 ◽  
Author(s):  
Michael B. Bone ◽  
Fahad Ahmad ◽  
Bradley R. Buchsbaum

Abstract When recalling an experience of the past, many of the component features of the original episode may be, to a greater or lesser extent, reconstructed in the mind's eye. There is strong evidence that the pattern of neural activity that occurred during an initial perceptual experience is recreated during episodic recall (neural reactivation), and that the degree of reactivation is correlated with the subjective vividness of the memory. However, while we know that reactivation occurs during episodic recall, we have lacked a way of precisely characterizing the contents of a reactivated memory in terms of its featural constituents. Here we present a novel approach, feature-specific informational connectivity (FSIC), that leverages hierarchical representations of image stimuli derived from a deep convolutional neural network to decode neural reactivation in fMRI data collected while participants performed an episodic recall task. We show that neural reactivation associated with low-level visual features (e.g. edges), high-level visual features (e.g. facial features), and semantic features (e.g. “terrier”) occurs throughout the dorsal and ventral visual streams and extends into the frontal cortex. Moreover, we show that reactivation of both low- and high-level visual features correlates with the vividness of the memory, whereas only reactivation of low-level features correlates with recognition accuracy when the lure and target images are semantically similar. In addition to demonstrating the utility of FSIC for mapping feature-specific reactivation, these findings resolve the relative contributions of low- and high-level features to the vividness of visual memories, clarify the role of the frontal cortex during episodic recall, and challenge a strict interpretation of the posterior-to-anterior visual hierarchy.
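Informational-connectivity-style analyses of this kind typically correlate trial-wise decoder evidence for a feature across conditions or regions; a minimal sketch with hypothetical evidence values (not real fMRI data, and not the actual FSIC pipeline):

```python
# Sketch: correlate trial-wise decoder evidence for one feature level
# (e.g. edges) between perception and recall; a high correlation is taken
# as a signature of feature-specific reactivation. Values are hypothetical.

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical trial-wise decoder evidence for a low-level feature.
perception = [0.9, 0.2, 0.7, 0.4, 0.8]
recall     = [0.8, 0.3, 0.6, 0.3, 0.7]

reactivation = pearson(perception, recall)  # close to 1: strong reactivation
```

The actual FSIC method operates on deep-network feature representations and controls for shared signals across feature levels; this sketch shows only the core correlation step.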


2019 ◽  
Vol 11 (8) ◽  
pp. 922 ◽  
Author(s):  
Juli Zhang ◽  
Junyi Zhang ◽  
Tao Dai ◽  
Zhanzhuang He

Manually annotating remote sensing images is laborious work, especially on large-scale datasets. To improve the efficiency of this work, we propose an automatic annotation method for remote sensing images. The proposed method formulates the multi-label annotation task as a recommendation problem based on non-negative matrix tri-factorization (NMTF). The labels of remote sensing images can be recommended directly by recovering the image–label matrix. To learn more efficient latent feature matrices, two graph regularization terms are added to NMTF that explore the affiliated relationships on the image graph and the label graph simultaneously. In order to reduce the gap between semantic concepts and visual content, both low-level visual features and high-level semantic features are exploited to construct the image graph. Meanwhile, label co-occurrence information is used to build the label graph, which captures semantic meaning to enhance label prediction for unlabeled images. By employing information from both images and labels, the proposed method can efficiently deal with the sparsity and cold-start problems brought about by limited image–label pairs. Experimental results on the UCMerced and Corel5k datasets show that our model outperforms most baseline algorithms for multi-label annotation of remote sensing images and performs efficiently on large-scale unlabeled datasets.
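The recommendation step, recovering the image–label matrix from tri-factorization factors, can be sketched as follows; the factor matrices here are small hypothetical stand-ins rather than factors learned by the graph-regularized NMTF:

```python
import numpy as np

# Sketch: after NMTF learns non-negative factors F (images x k1),
# S (k1 x k2), and G (labels x k2), the image-label matrix is recovered
# as X_hat = F S G^T and labels are recommended by score. The factors
# below are hypothetical, not learned from data.

F = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])   # 3 images, 2 latent image factors
S = np.array([[1.0, 0.2],
              [0.1, 1.0]])   # interaction between latent factor spaces
G = np.array([[1.0, 0.0],
              [0.8, 0.3],
              [0.0, 1.0]])   # 3 labels, 2 latent label factors

X_hat = F @ S @ G.T          # recovered image-label score matrix

def recommend(image_idx, top=1):
    """Return the indices of the top-scoring labels for one image."""
    return [int(i) for i in np.argsort(X_hat[image_idx])[::-1][:top]]

top_label_for_image2 = recommend(2)
```

Because the reconstruction is dense, an image with no observed labels (the cold-start case) still receives scores for every label through the shared latent factors.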


2019 ◽  
Author(s):  
Remington Mallett ◽  
Anurima Mummaneni ◽  
Jarrod Lewis-Peacock

Working memory persists in the face of distraction, yet not without consequence. Previous research has shown that memory for low-level visual features is systematically influenced by the maintenance or presentation of a similar distractor stimulus. Responses are frequently biased in stimulus space towards a perceptual distractor, though this has yet to be determined for high-level stimuli. We investigated whether these influences are shared for complex visual stimuli such as faces. To quantify response accuracies for these stimuli, we used a delayed-estimation task with a computer-generated “face space” consisting of eighty faces that varied continuously as a function of age and sex. In a set of three experiments, we found that responses for a target face held in working memory were biased towards a distractor face presented during the maintenance period. The amount of response bias did not vary as a function of distance between target and distractor. Our data suggest that, similar to low-level visual features, high-level face representations in working memory are biased by the processing of related but task-irrelevant information.
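The distractor-pull reported here is commonly quantified by signing each response error toward the distractor's direction in stimulus space and averaging across trials; a minimal sketch with hypothetical trial values on a single hypothetical stimulus dimension:

```python
# Sketch: quantify bias toward a distractor by signing the response error
# relative to the distractor's direction in stimulus space. A positive mean
# indicates attraction toward the distractor. Trial values are hypothetical.

def signed_bias(target, distractor, response):
    """Positive when the response error points toward the distractor."""
    error = response - target
    toward = 1 if distractor > target else -1
    return error * toward

# (target, distractor, response) on one stimulus dimension, e.g. face age.
trials = [
    (50, 70, 56),  # pulled toward the distractor: +6
    (50, 70, 53),  # +3
    (50, 30, 49),  # error -1, distractor below target: +1
]
mean_bias = sum(signed_bias(t, d, r) for t, d, r in trials) / len(trials)
```

Real face-space analyses work in two dimensions (age and sex) and compare bias against distance between target and distractor, but the signing logic is the same per dimension.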

