Intermediate, Wholistic Shape Representation in Object Recognition: A Pre-Attentive Stage of Processing?

2021 ◽  
Vol 15 ◽  
Author(s):  
Jarrod Hollis ◽  
Glyn W. Humphreys ◽  
Peter M. Allen

Evidence is presented for intermediate, wholistic visual representations of objects and non-objects that are computed online and independent of visual attention. Short-term visual priming was examined between visually similar shapes, with targets either falling at the (valid) location cued by primes or at another (invalid) location. Object decision latencies were facilitated when the overall shapes of the stimuli were similar, irrespective of whether the location of the prime was valid or invalid, with the effects being equally large for object and non-object targets. In addition, the effects were based on the overall outlines of the stimuli and low spatial frequency components, not on local parts. In conclusion, wholistic shape representations based on outline form are rapidly computed online during object recognition. Moreover, activation of common wholistic shape representations primes the processing of subsequent objects and non-objects irrespective of whether they appear at attended or unattended locations. Rapid derivation of wholistic form provides a key intermediate stage of object recognition.

2007 ◽  
Vol 98 (3) ◽  
pp. 1733-1750 ◽  
Author(s):  
Charles Cadieu ◽  
Minjoon Kouh ◽  
Anitha Pasupathy ◽  
Charles E. Connor ◽  
Maximilian Riesenhuber ◽  
...  

Object recognition in primates is mediated by the ventral visual pathway and is classically described as a feedforward hierarchy of increasingly sophisticated representations. Neurons in macaque monkey area V4, an intermediate stage along the ventral pathway, have been shown to exhibit selectivity to complex boundary conformation and invariance to spatial translation. How could such a representation be derived from the signals in lower visual areas such as V1? We show that a quantitative model of hierarchical processing, which is part of a larger model of object recognition in the ventral pathway, provides a plausible mechanism for the translation-invariant shape representation observed in area V4. Simulated model neurons successfully reproduce V4 selectivity and invariance through a nonlinear, translation-invariant combination of locally selective subunits, suggesting that a similar transformation may occur or culminate in area V4. Specifically, this mechanism models the selectivity of individual V4 neurons to boundary conformation stimuli, exhibits the same degree of translation invariance observed in V4, and produces observed V4 population responses to bars and non-Cartesian gratings. This work provides a quantitative model of the widely described shape selectivity and invariance properties of area V4 and points toward a possible canonical mechanism operating throughout the ventral pathway.
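As a rough illustration of how a nonlinear combination of locally selective subunits can yield translation invariance, the sketch below pools template-matching responses with a maximum. The abstract does not say which pooling operation the model uses, so the max operation, the one-dimensional signal, and the names `subunit_responses` and `invariant_response` are all assumptions made for illustration.

```python
def subunit_responses(signal, template):
    """Locally selective subunits: match the template against each window."""
    width = len(template)
    return [
        sum(s * t for s, t in zip(signal[i:i + width], template))
        for i in range(len(signal) - width + 1)
    ]


def invariant_response(signal, template):
    """Nonlinear pooling (assumed here to be a max) over all positions."""
    return max(subunit_responses(signal, template))


# Hypothetical toy stimuli: the same pattern at two different positions.
template = [1, 2, 1]
stimulus = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 2, 1, 0]

# The pooled response is the same wherever the pattern appears.
print(invariant_response(stimulus, template))
print(invariant_response(shifted, template))
```

Because only the strongest subunit response survives pooling, moving the pattern changes which subunit wins but not the pooled value.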


1998 ◽  
Vol 53 (7-8) ◽  
pp. 610-621 ◽  
Author(s):  
Jeffrey C. Liter ◽  
Heinrich H. Bülthoff

In this report we present a general introduction to object recognition. We begin with brief discussions of the terminology used in the object recognition literature and the psychophysical tasks that are used to investigate object recognition. We then discuss models of shape representation. We dispense with the idea that shape representations are like the 3-D models used in computer-aided design and explore instead models of shape representation that are based on feature descriptions. As these descriptions encode only the features that are visible from a particular viewpoint, they are generally viewpoint-specific. We discuss various means of achieving viewpoint-invariant recognition using such descriptions, including reliance on diagnostic features visible from a wide range of viewpoints, storage of multiple descriptions for each object, and the use of transformation mechanisms. Finally, we discuss how differences in viewpoint dependence that are often observed for within-category and between-category recognition tasks could be due to differences in the types of features that are naturally available to distinguish among different objects in these tasks.


Author(s):  
Dan Guo ◽  
Shengeng Tang ◽  
Meng Wang

Online sign interpretation suffers from challenges presented by hybrid semantics learning among sequential variations of visual representations, sign linguistics, and textual grammars. This paper proposes a Connectionist Temporal Modeling (CTM) network for sentence translation and sign labeling. To acquire short-term temporal correlations, a Temporal Convolution Pyramid (TCP) module is applied to 2D CNN features to realize (2D+1D)=pseudo 3D' CNN features. CTM aligns the pseudo 3D' features with the original 3D CNN clip features and fuses them. Next, we implement a connectionist decoding scheme for long-term sequential learning. Here, we embed dynamic programming into the decoding scheme, which directly learns the temporal mapping among features, sign labels, and the generated sentence. The dynamic-programming solution to sign labeling is treated as pseudo labels. Finally, we utilize the pseudo supervision cues in an end-to-end framework. A joint objective function is designed to measure feature correlation, entropy regularization on sign labeling, and probability maximization on sentence decoding. The experimental results using the RWTH-PHOENIX-Weather and USTC-CSL datasets demonstrate the effectiveness of the proposed approach.


2021 ◽  
Vol 9 (4) ◽  
pp. 399-420
Author(s):  
Weiguo Chen ◽  
Shufen Zhou ◽  
Yin Zhang ◽  
Yi Sun

According to behavioral finance theory, investor sentiment generally exists in investors’ trading activities and influences the financial market. In order to investigate the interaction between investor sentiment and the stock market as well as the financial industry, this study decomposed investor sentiment, the stock price index and the SWS index of the financial industry into IMF components at different scales by using the BEMD algorithm. Moreover, the fluctuation characteristics of the time series at different time scales were extracted, and the IMF components were reconstructed into short-term high-frequency components, medium-term important-event low-frequency components and long-term trend components. The short-term interaction between investor sentiment and the Shanghai Composite Index, the Shenzhen Component Index and the financial industries represented by the SWS index was investigated based on the spillover index. The time difference correlation coefficient was employed to determine the medium-term and long-term correlation among variables. Results demonstrate that investor sentiment has a strong correlation with the Shanghai Composite Index, the Shenzhen Component Index and the different financial industries represented by the SWS index at the original scale, and that the change in investor sentiment is mainly influenced by external market information. The interaction between most markets at the short-term scale is weaker than that at the original scale. Investor sentiment is more significantly correlated with SWS Bond, SWS Diversified Finance and the Shanghai Composite Index at the long-term scale than at the medium-term scale.
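A time difference correlation coefficient is commonly computed as the Pearson correlation between one series and a lagged copy of another. The sketch below assumes that definition; the abstract does not give the exact formula, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means x leads y."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    return pearson(xs, ys)


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest absolute correlation."""
    return max(
        range(-max_lag, max_lag + 1),
        key=lambda lag: abs(lagged_correlation(x, y, lag)),
    )


# Hypothetical toy series: y repeats x two steps later.
sentiment = [1, 3, 2, 5, 4, 6, 3, 7]
index = [0, 0, 1, 3, 2, 5, 4, 6]
print(best_lag(sentiment, index, max_lag=2))
```

The lag with the largest absolute coefficient indicates which series leads and by how many periods. The sketch does not guard against constant series, for which the coefficient is undefined.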


2020 ◽  
Vol 13 (2) ◽  
pp. 72-89
Author(s):  
D.S. Alekseeva ◽  
V.V. Babenko ◽  
D.V. Yavna

Visual perceptual representations are formed from the results of processing the input image in parallel pathways with different spatial-frequency tunings. It is known that these representations are created gradually, starting from low spatial frequencies. However, the order of information transfer from the perceptual representation to short-term memory has not yet been determined. The purpose of our study is to determine the order in which information of different spatial frequencies enters short-term memory. We used an unfamiliar-face matching task. Digitized photographs of faces were filtered by six filters with a frequency tuning step of 1 octave. These filters reproduced the spatial-frequency characteristics of the human visual pathways. In the experiment, the target face was shown first. Its duration was variable and limited by a mask. Then four test faces were presented. Their presentation was not limited in time. The observer had to identify the face that corresponded to the target. The dependence of matching accuracy on target-face duration was determined for different ranges of spatial frequencies. When the target stimuli were unfiltered (broadband) faces, the test faces were filtered, and vice versa. It was found that short-term memory receives information about an unfamiliar face in a certain order, starting from the medium spatial frequencies, and this sequence does not depend on the processing method (holistic or featural).

