Real world scene analysis in perspective

1975 ◽  
Author(s):  
Bruce L. Bullock


2010 ◽  
Vol 3 (3) ◽  
Author(s):  
Hsueh-Cheng Wang ◽  
Alex D. Hwang ◽  
Marc Pomplun

During text reading, the duration of eye fixations decreases with greater frequency and predictability of the currently fixated word (Rayner, 1998, 2009). However, it has not been tested whether these results also apply to scene viewing. We computed object frequency and predictability from both linguistic data and visual scene analysis (LabelMe; Russell et al., 2008), using Latent Semantic Analysis (Landauer et al., 1998) to estimate predictability. In a scene-viewing experiment, we found that, for small objects, linguistics-based frequency, but not scene-based frequency, affected first fixation duration, gaze duration, and total time. Both linguistic and scene-based predictability affected total time. As in reading, fixation durations decreased with higher frequency and predictability. For large objects, the direction of the effects was the inverse of that found in reading studies. These results suggest that the recognition of small objects in scene viewing shares some characteristics with the recognition of words in reading.
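As a rough illustration of the predictability measure, the sketch below estimates an object's predictability as the Latent Semantic Analysis cosine similarity between the object's label and the labels of its scene context. The toy corpus, the label sets, and the use of a mean context vector are assumptions for illustration, not the authors' actual pipeline.

    # Sketch: object predictability as LSA similarity between an object
    # label and its scene context. Corpus and labels are placeholders.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    corpus = [
        "kitchen counter sink faucet kettle toaster",
        "office desk monitor keyboard mouse lamp",
        "street car bus traffic light pedestrian",
    ]  # stand-in for a large scene-description corpus

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(corpus)   # term-document matrix
    svd = TruncatedSVD(n_components=2)     # LSA = truncated SVD of that matrix
    svd.fit(X)
    term_vecs = svd.components_.T          # one LSA vector per vocabulary term
    vocab = vectorizer.vocabulary_

    def predictability(obj, context_words):
        """Cosine similarity between an object and its scene context."""
        v = term_vecs[vocab[obj]]
        ctx = np.mean([term_vecs[vocab[w]] for w in context_words], axis=0)
        return float(np.dot(v, ctx) / (np.linalg.norm(v) * np.linalg.norm(ctx)))

    print(predictability("kettle", ["sink", "faucet", "toaster"]))  # typically higher
    print(predictability("kettle", ["car", "bus", "pedestrian"]))   # typically lower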


Robotica ◽  
1992 ◽  
Vol 10 (5) ◽  
pp. 389-396 ◽  
Author(s):  
R. A. Jarvis

SUMMARY This paper argues the case for extracting as complete a set of sensory data as practicable from scenes consisting of complex assemblages of objects, with the goal of completing the task of scene analysis (placement, pose, identity, and relationships among the components) in a robust manner that supports goal-directed robotic action, including collision-free trajectory planning, grip site location, and manipulation of selected object classes. The emphasis of the paper is on the sensor fusion of range and surface colour data, including preliminary results on proximity, surface normal directionality, and colour-based scene segmentation through semantic-free clustering processes. The larger context is that of embedding the results of such analysis in a graphics world containing an articulated robotic manipulator and of carrying out experiments in that world prior to replicating safe manipulation sequences in the real world.
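The semantic-free clustering idea lends itself to a compact sketch: fuse range and colour cues into one feature vector per pixel and segment by clustering. K-means, the feature scaling, and the synthetic inputs below are stand-in assumptions, not the paper's actual clustering processes; surface normal directions could be appended as extra feature columns in the same way.

    # Sketch: semantic-free segmentation by clustering fused range (depth)
    # and colour features per pixel. Inputs are synthetic placeholders.
    import numpy as np
    from sklearn.cluster import KMeans

    H, W = 64, 64
    rng = np.random.default_rng(0)
    depth = rng.random((H, W))        # stand-in range image (proximity cue)
    rgb = rng.random((H, W, 3))       # stand-in surface colour image

    ys, xs = np.mgrid[0:H, 0:W]
    features = np.column_stack([
        xs.ravel() / W,               # normalised image coordinates keep
        ys.ravel() / H,               # clusters spatially coherent
        depth.ravel(),                # proximity
        rgb.reshape(-1, 3),           # colour
    ])

    labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(features)
    segmentation = labels.reshape(H, W)   # one cluster id per pixel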


2011 ◽  
Vol 17 (2) ◽  
Author(s):  
Jennifer Iverson

Charles’s Ives’s collages, such as “Putnam’s Camp,”The Fourth of July, and selected movements of theFourth Symphony, present listeners with extraordinarily complex sound environments. This article uses Albert Bregman’sAuditory Scene Analysisas a source for methodology to analyze how listeners may parse and organize the chaotic surface of a musical collage. Since scene analysis problems in Ives’s collages often mimic real-world environments, Ives creates music that seems “spatial” or “pictorial” as a result. Finally, the article compares and contrasts the perception of space in Ives’s musical collages with their historical parallel in visual art, Cubist collage.


2021 ◽  
Author(s):  
Daniel Kaiser ◽  
Radoslaw M. Cichy

During natural vision, our brains are constantly exposed to complex, but regularly structured environments. Real-world scenes are defined by typical part-whole relationships, where the meaning of the whole scene emerges from configurations of localized information present in individual parts of the scene. Such typical part-whole relationships suggest that information from individual scene parts is not processed independently, but that there are mutual influences between the parts and the whole during scene analysis. Here, we review recent research that used a straightforward, but effective approach to study such mutual influences: by dissecting scenes into multiple arbitrary pieces, these studies provide new insights into how the processing of whole scenes is shaped by their constituent parts and, conversely, how the processing of individual parts is determined by their role within the whole scene. We highlight three facets of this research: First, we discuss studies demonstrating that the spatial configuration of multiple scene parts has a profound impact on the neural processing of the whole scene. Second, we review work showing that cortical responses to individual scene parts are shaped by the context in which these parts typically appear within the environment. Third, we discuss studies demonstrating that missing scene parts are interpolated from the surrounding scene context. Bridging these findings, we argue that efficient scene processing relies on an active use of the scene's part-whole structure, where the visual brain matches scene inputs with internal models of what the world should look like.
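The dissection approach reduces to a simple image manipulation: cut a scene into equal parts and recombine them in the typical versus a jumbled spatial configuration. The 2x2 grid and the particular permutation in this sketch are illustrative choices, not the stimuli of any specific study reviewed here.

    # Sketch: dissect a scene into a 2x2 grid of parts and reassemble them
    # in the original vs. a jumbled configuration to probe part-whole effects.
    import numpy as np

    def dissect(img, rows=2, cols=2):
        """Split an HxWxC image into a row-major list of equal parts."""
        h, w = img.shape[0] // rows, img.shape[1] // cols
        return [img[r*h:(r+1)*h, c*w:(c+1)*w]
                for r in range(rows) for c in range(cols)]

    def assemble(parts, order, rows=2, cols=2):
        """Reassemble parts in the given order into one image."""
        grid = [parts[i] for i in order]
        return np.vstack([np.hstack(grid[r*cols:(r+1)*cols])
                          for r in range(rows)])

    scene = np.random.default_rng(1).random((128, 128, 3))  # stand-in scene
    parts = dissect(scene)
    intact = assemble(parts, [0, 1, 2, 3])    # typical configuration
    jumbled = assemble(parts, [3, 0, 2, 1])   # disrupted configuration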


Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2527
Author(s):  
Minji Jung ◽  
Heekyung Yang ◽  
Kyungha Min

The advancement and popularity of computer games have made game scene analysis one of the most interesting research topics in the computer vision community. Among the various computer vision techniques, we employ object detection algorithms for the analysis, since they can both recognize and localize objects in a scene. However, applying existing object detection algorithms to game scenes does not guarantee the desired performance, since the algorithms are trained on datasets collected from the real world. To achieve the desired performance on game scenes, we built a dataset of collected game scenes and retrained object detection algorithms that had been pre-trained on real-world datasets. We selected five object detection algorithms, namely YOLOv3, Faster R-CNN, SSD, FPN, and EfficientDet, and eight games from various genres, including first-person shooting, role-playing, sports, and driving. Pascal VOC and MS COCO were employed for the pre-training of the object detection algorithms. We demonstrated the improvement that our strategy yields in two respects: recognition and localization. The improvement in recognition performance was measured using mean average precision (mAP) and the improvement in localization using intersection over union (IoU).
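For reference, the localization measure reduces to a few lines. This sketch assumes boxes given as (x1, y1, x2, y2) corner coordinates, which is an assumption about representation rather than the paper's code.

    # Sketch: intersection over union (IoU) between two axis-aligned boxes,
    # each assumed to be (x1, y1, x2, y2) corner coordinates.
    def iou(box_a, box_b):
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        # Overlap rectangle; width/height clamp to zero if boxes are disjoint.
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (bx2 - bx1) * (by2 - by1) - inter)
        return inter / union if union > 0 else 0.0

    print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # 400/2800 ~= 0.143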



