visual features
Recently Published Documents


TOTAL DOCUMENTS: 1611 (five years: 523)
H-INDEX: 54 (five years: 10)

In the modern context, interior design has inevitably become part of social culture. The modeling, decoration and furnishings of modern interior spaces express people's pursuit of, and desire for, a better life. These different styles of modern interior design rely on science and technology and take culture and art as their connotation; their development often reflects the cultural spirit of a nation. Aesthetic evaluation plays an important role in modern interior design. With the proliferation of digital devices, a large number of digital images have emerged, and the rapid development of computer vision and artificial intelligence makes automatic aesthetic evaluation of interior design possible. This paper implements an intelligent aesthetic evaluation framework for interior design that helps people choose appropriate and effective interior designs from collected images or mobile digital devices.


PLoS ONE, 2022, Vol. 17 (1), pp. e0258832
Author(s): Jonathan C. Flavell, Harriet Over, Tim Vestner, Richard Cook, Steven P. Tipper

Using visual search displays of interacting and non-interacting pairs, previous work has demonstrated that the detection of social interactions is facilitated: for example, two people facing each other are found faster than two people with their backs turned, an effect that may reflect social binding. However, recent work has shown the same effects with non-social arrow stimuli, where toward-facing arrows are detected faster than away-facing arrows. This latter work suggests that a primary mechanism is an attention-orienting process driven by basic low-level direction cues. Evidence for lower-level attentional processes does not, however, preclude an additional role for higher-level social processes. In this series of experiments, we therefore test this idea further by directly comparing basic visual features that orient attention with representations of socially interacting individuals. The results confirm the potency of attentional orienting via low-level visual features in the detection of interacting objects. In contrast, there is little evidence that the representation of social interactions influences initial search performance.


2022, Vol. 14 (1), pp. 27
Author(s): Junda Li, Chunxu Zhang, Bo Yang

Current two-stage object detectors extract the local visual features of Regions of Interest (RoIs) for object recognition and bounding-box regression. However, relying on local visual features alone discards global contextual dependencies, which help recognize objects with featureless appearances and suppress false detections. To tackle this problem, a simple framework named Global Contextual Dependency Network (GCDN) is presented to enhance the classification ability of two-stage detectors. GCDN consists of two components: a Context Representation Module (CRM) and a Context Dependency Module (CDM). The CRM constructs multi-scale context representations so that contextual information can be fully explored at different scales, while the CDM captures global contextual dependencies. GCDN includes multiple CDMs; each CDM uses the local RoI features and a single-scale context representation to generate single-scale contextual RoI features via an attention mechanism. Finally, the contextual RoI features generated independently by the parallel CDMs are combined with the original RoI features to aid classification. Experiments on the MS-COCO 2017 benchmark show that our approach brings consistent improvements to two-stage detectors.
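To make the attention step concrete, the following is a minimal sketch of how a Context Dependency Module of this kind could be realized: the RoI features attend over one single-scale context representation, and the resulting contextual RoI features are concatenated with the original RoI features before classification. The class name, feature dimensions and fusion by concatenation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextDependencyModule(nn.Module):
    """Illustrative CDM: RoI features (queries) attend over a context map (keys/values)."""
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, roi_feats, context_feats):
        # roi_feats:     (num_rois, d_model)  local RoI features
        # context_feats: (num_ctx,  d_model)  single-scale context representation
        q = roi_feats.unsqueeze(0)          # (1, num_rois, d_model)
        kv = context_feats.unsqueeze(0)     # (1, num_ctx,  d_model)
        ctx_roi, _ = self.attn(q, kv, kv)   # contextual RoI features
        return ctx_roi.squeeze(0)

# Contextual RoI features from parallel CDMs would then be combined with the
# original RoI features (here simply by concatenation) before the classifier head.
rois = torch.randn(100, 256)               # toy RoI features
context = torch.randn(49, 256)             # e.g. a 7x7 context map, flattened
cdm = ContextDependencyModule()
fused = torch.cat([rois, cdm(rois, context)], dim=-1)  # (100, 512)
```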


2022
Author(s): Lisa M Kroell, Martin Rolfs

Despite the fovea's singular importance for active human vision, the impact of large eye movements on foveal processing remains elusive. Building on findings from passive fixation tasks, we hypothesized that during the preparation of rapid eye movements (saccades), foveal processing anticipates soon-to-be-fixated visual features. Using a dynamic large-field noise paradigm, we indeed demonstrate that sensitivity to the defining features of a saccade target is enhanced in the pre-saccadic center of gaze. The enhancement manifested in higher hit rates for foveal probes with target-congruent orientation and in a sensitization to incidental, target-like orientation information in foveally presented noise. It was spatially confined to the center of gaze and its immediate vicinity. We suggest a crucial, previously overlooked contribution of foveal processing to trans-saccadic visual continuity: foveal processing of saccade targets commences before the movement is executed and thereby enables a seamless transition once the center of gaze reaches the target.


Author(s): Wei Li, Haiyu Song, Hongda Zhang, Houjie Li, Pengjie Wang

The ever-increasing volume of images has made automatic image annotation one of the most important tasks in machine learning and computer vision. Despite continuous efforts to invent new annotation algorithms and models, the results of state-of-the-art image annotation methods are often unsatisfactory. In this paper, to further improve annotation refinement performance, we propose a novel approach based on weighted mutual information that automatically refines the original annotations of images. Unlike traditional refinement models that use only visual features, the proposed model uses semantic embedding to map labels and visual features to a meaningful semantic space. To accurately measure the relevance between a particular image and its original annotations, the proposed model utilizes all available information, including image-to-image, label-to-label and image-to-label relations. Experimental results on three typical datasets show not only the validity of the refinement but also the superiority of the proposed algorithm over existing ones. The improvement largely benefits from our weighted mutual information method and from utilizing all available information.
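As an illustration of the kind of label-to-label statistic involved, here is a hedged sketch (not the paper's implementation) that scores every candidate label by its positive pointwise mutual information with an image's original annotations, using one weight per original annotation; the function names and the weighting scheme are assumptions.

```python
import numpy as np

def ppmi_matrix(label_image):
    """label_image: binary (num_labels, num_images) indicator matrix."""
    n_images = label_image.shape[1]
    p = label_image.mean(axis=1)                       # P(label)
    joint = (label_image @ label_image.T) / n_images   # P(label_i, label_j)
    pmi = np.log((joint + 1e-12) / (np.outer(p, p) + 1e-12))
    return np.maximum(pmi, 0.0)                        # keep positive PMI only

def refine_scores(ppmi, original_labels, weights):
    """Weighted relevance of every label to an image's original annotations."""
    w = np.asarray(weights)                            # one weight per original label
    return (ppmi[:, original_labels] * w).sum(axis=1)  # shape: (num_labels,)
```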


2022, Vol. 2022, pp. 1-8
Author(s): Xiaoyue Cui

To address the low retrieval accuracy and slow retrieval speed of existing image database retrieval algorithms, this paper designs a clothing image database retrieval algorithm based on the wavelet transform. First, the color coherence vector of the clothing image is computed and a color histogram captures the composition and distribution of image colors; the visual features of the clothing image are quantized and aggregated into a fixed-size representation vector using a Fisher Vector (FV) model, completing the collection of the clothing image data. Then, the clothing image is resized with a size-transformation step and the clothing pattern is divided into four equally sized blocks; on this basis, the clothing image is discretized with the help of Hu invariant moments, completing the preprocessing of the clothing image data. Finally, the generating function of the wavelet transform is determined and a family of basis functions is obtained by translation and dilation. The wavelet filter is decomposed into basic modules and the wavelet transform is applied step by step: the clothing image data are treated as a signal that is split, predicted, and updated before being fed into the wavelet model, which completes the retrieval over the clothing image database. The experimental results show that the designed retrieval algorithm is sound, its retrieval accuracy is high, and its retrieval speed is fast.
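As a concrete illustration of the feature side of such a pipeline, the following is a minimal sketch, assuming OpenCV and PyWavelets, that combines a quantized color histogram, Hu invariant moments and wavelet sub-band energies into one retrieval vector; the specific descriptors, quantization levels and wavelet chosen here are assumptions, not the paper's specification.

```python
import cv2
import numpy as np
import pywt

def clothing_features(path, size=(256, 256)):
    img = cv2.resize(cv2.imread(path), size)                  # size normalization
    # Color distribution: quantized HSV histogram
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 8], [0, 180, 0, 256]).flatten()
    hist /= hist.sum() + 1e-12
    # Shape: Hu invariant moments of the grayscale image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()
    # Texture: mean magnitudes of a 2-level wavelet decomposition
    coeffs = pywt.wavedec2(gray.astype(float), "haar", level=2)
    energies = [np.abs(c).mean() for band in coeffs[1:] for c in band]
    return np.concatenate([hist, hu, np.asarray(energies)])

# Retrieval would then rank database images by distance to the query vector, e.g.
# np.linalg.norm(clothing_features(query_path) - clothing_features(db_path)).
```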


2022
Author(s): Laurent Caplette, Nicholas Turk-Browne

Revealing the contents of mental representations is a longstanding goal of cognitive science. However, there is currently no general framework for providing direct access to representations of high-level visual concepts. We asked participants to indicate what they perceived in images synthesized from random visual features in a deep neural network. We then inferred a mapping between the semantic features of their responses and the visual features of the images. This allowed us to reconstruct the mental representation of virtually any common visual concept, both those reported and others extrapolated from the same semantic space. We successfully validated 270 of these reconstructions as containing the target concept in a separate group of participants. The visual-semantic mapping uncovered with our method further generalized to new stimuli, participants, and tasks. Finally, it allowed us to reveal how the representations of individual observers differ from each other and from those of neural networks.
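A hedged, highly simplified sketch of the mapping step described above: fit a linear (ridge) regression from the random visual features of each synthesized image to the semantic embedding of the participant's response, then project any concept's semantic vector back into visual-feature space to approximate its representation. The function names, the use of ridge regression and the readout by transposition are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_visual_semantic_mapping(visual_feats, semantic_embs, alpha=1.0):
    """visual_feats: (n_trials, n_visual); semantic_embs: (n_trials, n_semantic)."""
    model = Ridge(alpha=alpha).fit(visual_feats, semantic_embs)
    return model.coef_                    # (n_semantic, n_visual) linear mapping

def concept_visual_pattern(mapping, concept_vec):
    """Project a concept's semantic vector back into visual-feature space."""
    return concept_vec @ mapping          # (n_visual,) pattern for that concept
```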


2022
Author(s): Akshay Vivek Jagadeesh, Justin Gardner

The human visual ability to recognize objects and scenes is widely thought to rely on representations in category-selective regions of visual cortex. These representations could support object vision by specifically representing objects or, more simply, by representing complex visual features regardless of the particular spatial arrangement needed to constitute real-world objects, that is, by representing visual textures. To discriminate between these hypotheses, we leveraged an image synthesis approach that, unlike previous methods, provides independent control over the complexity and spatial arrangement of visual features. We found that human observers could easily detect a natural object among synthetic images with similar complex features that were spatially scrambled. However, observer models built from BOLD responses in category-selective regions, as well as a model of macaque inferotemporal cortex and ImageNet-trained deep convolutional neural networks, were all unable to identify the real object. This inability was not due to insufficient signal-to-noise, as all of these observer models could predict human performance in image categorization tasks. How, then, might these texture-like representations in category-selective regions support object perception? An image-specific readout from category-selective cortex yielded a representation that was more selective for natural feature arrangement, showing that the information necessary for object discrimination is available. Thus, our results suggest that the role of human category-selective visual cortex is not to explicitly encode objects but rather to provide a basis set of texture-like features that can be infinitely reconfigured to flexibly learn and identify new object categories.
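For intuition only, here is a hedged sketch of the simplest kind of observer model for such an oddity judgment: given a model's feature responses to one natural image and several spatially scrambled, feature-matched foils, pick the item whose response is most dissimilar from the others. This is an assumption about the general form of such a readout, not the study's analysis code.

```python
import numpy as np

def oddity_choice(responses):
    """responses: (n_items, n_features) model activations for the candidate images."""
    d = np.linalg.norm(responses[:, None] - responses[None, :], axis=-1)  # pairwise distances
    return int(d.sum(axis=1).argmax())    # index of the most dissimilar ("odd") item
```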


2022, Vol. 2022, pp. 1-9
Author(s): Junlong Feng, Jianping Zhao

Recent image captioning models based on the encoder-decoder framework have achieved remarkable success in generating human-like sentences. However, the explicit separation between encoder and decoder creates a disconnect between the image and the sentence. It usually leads to rough image descriptions: the generated caption contains only the main instances and unexpectedly neglects additional objects and scenes, which reduces the caption's consistency with the image. To address this issue, we propose an image captioning system with context-fused guidance. It incorporates regional and global image representations as compositional visual features to learn the objects and attributes in images. To integrate image-level semantic information, visual concepts are employed. To avoid misleading the decoding, a context fusion gate is introduced that computes the textual context by selectively aggregating visual-concept and word-embedding information. The context-fused image guidance is then formulated from the compositional visual features and the textual context, providing the decoder with informative semantic knowledge. Finally, a captioner with a two-layer LSTM architecture generates the captions. To overcome exposure bias, we train the proposed model with sequential decision-making. Experiments on the MS COCO dataset show the strong performance of our approach, and linguistic analysis demonstrates that our model improves the consistency of captions with their images.
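The following is a minimal sketch, assuming a standard sigmoid gate, of what a context fusion gate of this kind could look like: the textual context is a learned element-wise mixture of the visual-concept vector and the current word embedding. The class name, dimensions and exact gating form are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class ContextFusionGate(nn.Module):
    """Illustrative gate mixing a visual-concept vector with a word embedding."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, visual_concept, word_emb):
        # visual_concept, word_emb: (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([visual_concept, word_emb], dim=-1)))
        return g * visual_concept + (1.0 - g) * word_emb   # textual context

# The textual context would then be fused with the compositional visual features
# to form the image guidance fed to the two-layer LSTM captioner.
```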


2022
Author(s): Jun Kai Ho, Tomoyasu Horikawa, Kei Majima, Yukiyasu Kamitani

The sensory cortex is characterized by general organizational principles such as topography and hierarchy. However, measured brain activity given identical input exhibits substantially different patterns across individuals. While anatomical and functional alignment methods have been proposed in functional magnetic resonance imaging (fMRI) studies, it remains unclear whether and how hierarchical and fine-grained representations can be converted between individuals while preserving the encoded perceptual content. In this study, we evaluated machine learning models, called neural code converters, that predict one individual's (target) brain activity pattern from another's (source) given the same stimulus, assessing them through the decoding of hierarchical visual features and the reconstruction of perceived images. The training data for the converters consisted of fMRI data obtained while identical sets of natural images were presented to pairs of individuals. Converters were trained on all visual cortical voxels from V1 through the ventral object areas, without explicit labels of the visual areas. We decoded the converted brain activity patterns into the hierarchical visual features of a deep neural network (DNN) using decoders pre-trained on the target brain and then reconstructed images from the decoded features. Without explicit information about the visual cortical hierarchy, the converters automatically learned the correspondence between visual areas of the same level. DNN feature decoding at each layer showed higher decoding accuracies from the corresponding levels of visual areas, indicating that the hierarchical representations were preserved after conversion. The viewed images were faithfully reconstructed, with recognizable silhouettes of objects, even with relatively small amounts of converter training data. The conversion also allows data to be pooled across multiple individuals, leading to stably high reconstruction accuracy compared with conversions between single pairs of individuals. These results demonstrate that the conversion learns hierarchical correspondence and preserves fine-grained representations of visual features, enabling visual image reconstruction using decoders trained on other individuals.
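As a rough illustration of the converter idea, here is a hedged sketch that maps one individual's visual-cortex patterns into another's voxel space with a simple ridge regression fitted on responses to shared stimuli; the choice of ridge regression, the regularization value and the function names are assumptions, not the study's exact model.

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_converter(source_patterns, target_patterns, alpha=100.0):
    """source_patterns: (n_stimuli, n_source_voxels); target_patterns: (n_stimuli, n_target_voxels)."""
    return Ridge(alpha=alpha).fit(source_patterns, target_patterns)

def convert(converter, new_source_patterns):
    """Map new source-brain activity into the target brain's voxel space, where
    DNN-feature decoders pre-trained on the target brain can then be applied."""
    return converter.predict(new_source_patterns)
```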

