scene information
Recently Published Documents


TOTAL DOCUMENTS

110
(FIVE YEARS 41)

H-INDEX

9
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Xenia Grande ◽  
Magdalena Sauvage ◽  
Andreas Becke ◽  
Emrah Duzel ◽  
David Berron

Cortical processing streams for item and contextual information come together in the entorhinal-hippocampal circuitry. Multiple lines of evidence suggest that information-specific pathways organize the cortical–entorhinal interaction and the circuitry's inner communication along the transversal axis. Here, we leveraged ultra-high-field functional imaging to advance the work of Maass, Berron et al. (2015), who reported two functional routes segregating the entorhinal cortex (EC) and subiculum. Our data show specific scene processing in the functionally connected posterior-medial EC and distal subiculum. The regions of a second route, which connects the anterior-lateral EC and a newly identified retrosplenial-based anterior-medial EC subregion with the CA1/subiculum border, process object and scene information similarly. Our results support topographically organized information flow in human entorhinal-hippocampal subregions, with convergence of cortical processing streams and a unique route for contextual information. They characterize the functional organization of the circuitry and underpin its central role in memory function and pathological decline.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Shaosong Dou ◽  
Zhiquan Feng ◽  
Jinglan Tian ◽  
Xue Fan ◽  
Ya Hou ◽  
...  

This paper proposes an intention understanding algorithm (KDI) for an elderly service robot, which combines a neural network with a semi-naive Bayesian classifier to infer the user's intention. The KDI algorithm uses a CNN to analyze gesture and action information, while YOLOv3 performs object detection to provide scene information. These outputs are fed into a semi-naive Bayesian classifier, with key properties set as the super-parent to enhance their contribution to an intent, realizing intention understanding based on prior knowledge. In addition, we introduce the actual distance between the user and objects and assign each object a different purpose, implementing intention understanding based on object-user distance. The two methods are combined to strengthen the intention understanding. The main contributions of this paper are as follows: (1) an intention reasoning model (KDI) based on prior knowledge and distance, which combines a neural network with a semi-naive Bayesian classifier; (2) a robot companion system built around this model and applied in the elderly service scene.
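The abstract leaves the classifier's structure at a high level; the following is a minimal sketch of a super-parent (SPODE-style) semi-naive Bayes scoring step under assumed inputs: a CNN gesture label, YOLOv3 object detections, and a distance band, with one key property acting as the super-parent. The probability tables are supplied by the caller and purely illustrative.

```python
import math

def spode_posterior(priors, p_super, p_feat, super_value, features):
    """Super-parent semi-naive Bayes (sketch, assumed structure).

    priors  : {intent: P(intent)}
    p_super : {(super_value, intent): P(super_parent | intent)}
    p_feat  : {(feature, super_value, intent): P(feature | super_parent, intent)}
    """
    log_scores = {}
    for intent, prior in priors.items():
        s = math.log(prior) + math.log(p_super[(super_value, intent)])
        for f in features:  # e.g. gesture label, distance band
            s += math.log(p_feat[(f, super_value, intent)])
        log_scores[intent] = s
    # normalize log scores into a posterior distribution
    m = max(log_scores.values())
    z = sum(math.exp(v - m) for v in log_scores.values())
    return {k: math.exp(v - m) / z for k, v in log_scores.items()}
```

Conditioning every feature on a single super-parent (here, a detected key object) is what distinguishes this semi-naive variant from plain naive Bayes, which would treat all features as independent given the intent.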


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hui Qian ◽  
Mengxuan Dai ◽  
Yong Ma ◽  
Jiale Zhao ◽  
Qinghua Liu ◽  
...  

Video situational information detection is widely used in video query, character anomaly detection, surveillance analysis, and related fields. However, most existing research focuses on the subjects or video backgrounds and pays little attention to recognizing situational information. Moreover, because there is no strong relation between pixel information and scene information, it is difficult for computers to derive high-level scene information from the low-level pixel information of video data. Video scene information detection mainly detects and analyzes multiple features in a video and labels its scenes. The aim is to automatically extract scene information from all kinds of original video data and to recognize scene information through "comprehensive consideration of pixel information and spatiotemporal continuity." To solve the problem of transforming pixel information into scene information, this paper proposes a video scene information detection method based on entity recognition. The model integrates the spatiotemporal relationship between the video subject and object on the basis of entity recognition, realizing scene information recognition by establishing a mapping relation. The effectiveness and accuracy of the model are verified by simulation experiments with a TV series as experimental data, in which the model achieves an accuracy above 85%.
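The abstract does not specify the mapping relation, so the following is only a schematic sketch of the general idea: entities recognized per frame are filtered for temporal persistence (spatiotemporal continuity) and then mapped to a scene label; the rule table and threshold are hypothetical.

```python
from collections import Counter

# Hypothetical entity-set -> scene-label rules for illustration only.
SCENE_RULES = {
    frozenset({"desk", "computer", "person"}): "office",
    frozenset({"bed", "person"}): "bedroom",
    frozenset({"car", "road"}): "street",
}

def label_window(frame_entities, min_support=0.6):
    """frame_entities: list of sets of entity labels, one set per frame."""
    n = len(frame_entities)
    # keep entities that persist across most of the window
    counts = Counter(e for ents in frame_entities for e in ents)
    stable = {e for e, c in counts.items() if c / n >= min_support}
    # pick the rule whose required entities are best covered
    best, best_cov = "unknown", 0.0
    for required, scene in SCENE_RULES.items():
        cov = len(required & stable) / len(required)
        if cov > best_cov:
            best, best_cov = scene, cov
    return best if best_cov >= 1.0 else "unknown"
```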


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bin Li ◽  
Ting Zhang

To obtain the scene information of ordinary football matches more comprehensively, an algorithm for collecting football match scene information from web documents is proposed. The commonly used T-graph web crawler model first collects sample nodes for the football scene information topic and then, after the crawling stage, collects the topic's edge document information. A semantic-analysis feature extraction algorithm then extracts the feature items of the football scene information according to their similarity, forming a web document. Finally, a complex network is constructed and a community-discovery feature selection algorithm, based on local contribution and the overlap coefficient, selects the web document's features to realize the collection of football match scene information. Experimental results show that the algorithm has high topic collection capability and low computational cost, its average balanced accuracy stays around 98%, and it has strong quantification capabilities for web crawlers and communities.
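The two community-discovery quantities named in the abstract have common textbook definitions; a small sketch under those assumed definitions (the paper's exact formulas are not given in the abstract):

```python
def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap between two feature/community sets."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))

def local_contribution(node, community, adj):
    """Fraction of a node's edges that stay inside its community
    (one common definition). adj maps node -> set of neighbors."""
    neigh = adj[node]
    return len(neigh & community) / len(neigh) if neigh else 0.0
```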


2021 ◽  
Author(s):  
Siyuan Song ◽  
Brecht Desplanques ◽  
Celest De Moor ◽  
Kris Demuynck ◽  
Nilesh Madhu

We present an iVector based Acoustic Scene Classification (ASC) system suited for real-life settings where active foreground speech can be present. In the proposed system, each recording is represented by a fixed-length iVector that models the recording's important properties. A regularized Gaussian backend classifier with class-specific covariance models is used to extract the relevant acoustic scene information from these iVectors. To alleviate the large performance degradation when a foreground speaker dominates the captured signal, we investigate the use of the iVector framework on Mel-Frequency Cepstral Coefficients (MFCCs) that are derived from an estimate of the noise power spectral density. This noise floor can be extracted in a statistical manner for single-channel recordings. We show that the use of noise-floor features is complementary to multi-condition training, in which foreground speech is added to the training signals to reduce the mismatch between training and testing conditions. Experimental results on the DCASE 2016 Task 1 dataset show that the noise-floor based features and multi-condition training realize significant classification accuracy gains of up to more than 25 percentage points (absolute) in the most adverse conditions. These promising results can further facilitate the integration of ASC in resource-constrained devices such as hearables.
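The paper's exact statistical noise-floor estimator is not detailed in the abstract; the sketch below approximates it with a simple minimum-statistics-style sliding minimum over the spectrogram before the mel/DCT stages, which is one standard way to suppress transient foreground speech energy.

```python
import numpy as np
from scipy.signal import stft
from scipy.fft import dct

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

def mel_filterbank(n_mels, n_bins, fs):
    """Triangular mel filterbank (n_mels, n_bins)."""
    edges = mel_to_hz(np.linspace(0, hz_to_mel(fs / 2), n_mels + 2))
    bins = np.floor((n_bins - 1) * edges / (fs / 2)).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def noise_floor_mfcc(x, fs, n_mels=40, n_mfcc=20, win=1024, min_track=50):
    """MFCCs of an estimated noise floor (assumed pipeline, see lead-in)."""
    _, _, Z = stft(x, fs, nperseg=win)
    psd = np.abs(Z) ** 2                      # (freq, time) power
    # sliding minimum over time tracks the noise power floor,
    # suppressing short-lived foreground speech energy
    floor = np.empty_like(psd)
    for i in range(psd.shape[1]):
        lo = max(0, i - min_track)
        floor[:, i] = psd[:, lo:i + 1].min(axis=1)
    logmel = np.log(mel_filterbank(n_mels, win // 2 + 1, fs) @ floor + 1e-10)
    return dct(logmel, axis=0, norm="ortho")[:n_mfcc]
```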


2021 ◽  
Author(s):  
Yuzhong Wu ◽  
Tan Lee

Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded. The log-mel feature and convolutional neural network (CNN) have recently become the most popular time-frequency (TF) feature representation and classifier in ASC. An audio signal recorded in a scene may include various sounds overlapping in time and frequency. A previous study suggests that separately considering long-duration and short-duration sounds in the CNN may improve ASC accuracy. This study addresses the generalization ability of acoustic scene classifiers. In practice, the characteristics of acoustic scene signals may be affected by various factors, such as the choice of recording devices and the change of recording locations. When an established ASC system predicts scene classes on audio recorded in unseen scenarios, its accuracy may drop significantly. Long-duration sounds not only contain domain-independent acoustic scene information but also channel information determined by the recording conditions, which is prone to over-fitting. For a more robust ASC system, we propose a robust feature learning (RFL) framework to train the CNN. The RFL framework down-weights CNN learning specifically on long-duration sounds. The method trains an auxiliary classifier with only long-duration sound information as input, using an auxiliary loss function that assigns less learning weight to poorly classified examples than the standard cross-entropy loss. Experimental results show that the proposed RFL framework obtains a more robust acoustic scene classifier with respect to unseen devices and cities.
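The abstract only states that poorly classified examples receive less weight than under standard cross-entropy; one loss with exactly that property (assumed here, not necessarily the paper's formula) scales the per-example cross-entropy by p_t**gamma, the inverse of the focal-loss weighting:

```python
import torch
import torch.nn.functional as F

def auxiliary_loss(logits, targets, gamma=2.0):
    """Cross-entropy scaled by p_t**gamma (sketch of an 'anti-focal' loss).

    Confidently classified examples (high p_t) keep nearly full weight;
    poorly classified examples are down-weighted, discouraging the
    auxiliary classifier from fitting hard, channel-dependent cases.
    """
    log_p = F.log_softmax(logits, dim=-1)
    p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    ce = F.nll_loss(log_p, targets, reduction="none")
    return ((p_t.detach() ** gamma) * ce).mean()
```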


Author(s):  
Sarah Morris ◽  
Ari Goldman ◽  
Brian Thurow

Time-of-Flight (ToF) cameras are a type of range-imaging camera that provides three-dimensional scene information from a single camera. This paper assesses the suitability of ToF technology for three-dimensional particle tracking velocimetry (3D-PTV). Using a commercially available ToF camera, various aspects of 3D-PTV are considered, including minimum resolvable particle size, environmental factors (reflections and refractive-index changes), and time resolution. Although an off-the-shelf ToF camera is found not to be a viable alternative to traditional 3D-PTV measurement systems, basic 3D-PTV measurements are demonstrated with large (6 mm) particles in both air and water to indicate future potential as the technology develops. The necessary technological advances are also summarized.
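For context, continuous-wave ToF cameras recover depth from the phase shift between emitted and received amplitude-modulated light; this is the standard textbook relation, not a formula from the paper:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(phase_shift_rad, f_mod_hz):
    """Depth in metres from the measured phase shift at modulation frequency."""
    return C * phase_shift_rad / (4 * math.pi * f_mod_hz)

print(tof_depth(math.pi / 2, 20e6))  # pi/2 shift at 20 MHz -> ~1.87 m
```

The modulation frequency sets both the depth resolution and the unambiguous range (C / (2 f_mod), here about 7.5 m), one of the trade-offs relevant to small measurement volumes in PTV.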


2021 ◽  
Vol 13 (13) ◽  
pp. 2640
Author(s):  
Jake J. Gristey ◽  
Wenying Su ◽  
Norman G. Loeb ◽  
Thomas H. Vonder Haar ◽  
Florian Tornow ◽  
...  

Observing the Earth radiation budget (ERB) from satellites is crucial for monitoring and understanding Earth’s climate. One of the major challenges for ERB observations, particularly for reflected shortwave radiation, is the conversion of the measured radiance to the more energetically relevant quantity of radiative flux, or irradiance. This conversion depends on the solar-viewing geometry and the scene composition associated with each instantaneous observation. We first outline the theoretical basis for algorithms to convert shortwave radiance to irradiance, most commonly known as empirical angular distribution models (ADMs). We then review the progression from early ERB satellite observations that applied relatively simple ADMs to current ERB satellite observations that apply highly sophisticated ADMs. A notable development is the dramatic increase in the number of scene types, made possible by both the extended observational record and the enhanced scene information now available from collocated imagers. Compared with their predecessors, current shortwave ADMs result in a more consistent average albedo as a function of viewing zenith angle and lead to more accurate instantaneous and mean regional irradiance estimates. One implication of the increased complexity is that the algorithms may not be directly applicable to observations with insufficient accompanying imager information, or for existing or new satellite instruments where detailed scene information is not available. Recent advances that complement and build on the base of current approaches, including machine learning applications and semi-physical calculations, are highlighted.
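For reference, the radiance-to-irradiance conversion at the heart of these ADMs can be written as follows (standard CERES-style formulation; the notation is assumed here, not quoted from the review):

```latex
% R is the scene-type-dependent anisotropic factor, built from the
% ADM-mean radiance \hat{I} and ADM-mean flux \hat{F}; theta_0, theta,
% phi are solar zenith, viewing zenith, and relative azimuth angles.
R(\theta_0,\theta,\phi) = \frac{\pi\,\hat{I}(\theta_0,\theta,\phi)}{\hat{F}(\theta_0)},
\qquad
F(\theta_0) = \frac{\pi\,I(\theta_0,\theta,\phi)}{R(\theta_0,\theta,\phi)}
```

The growth in scene types described above amounts to building finer-grained estimates of R, so that the anisotropy assumed for each observation better matches the actual scene.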


2021 ◽  
Vol 38 (3) ◽  
pp. 607-617
Author(s):  
Sumanth Kumar Panguluri ◽  
Laavanya Mohan

Multimodal image fusion is now widely used as an important processing tool in various image-related applications. Different sensors have been developed to capture useful information, chiefly the infrared (IR) and visible (VI) image sensors; fusing their outputs provides more complete and accurate scene information. The major application areas of such fused images are military, surveillance, and remote sensing. For better identification of targets and understanding of the overall scene, the fused image has to provide better contrast and more edge information. This paper introduces a novel multimodal image fusion method for improving both contrast and edge information. The first step of the algorithm is to resize the source images. A 3×3 sharpening filter and a morphological hat transform are applied separately to the resized IR and VI images. The discrete wavelet transform (DWT) is then used to produce low-frequency and high-frequency sub-bands. A "filters based mean-weighted fusion rule" and a "filters based max-weighted fusion rule" are newly introduced in this algorithm for combining the low-frequency and high-frequency sub-bands, respectively. The fused image is reconstructed with the inverse DWT (IDWT). The proposed method outperforms similar existing techniques both subjectively and objectively.
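The paper's "filters based" weighted rules are not fully specified in the abstract; the sketch below shows the generic DWT fusion skeleton they plug into, with plain mean and max-magnitude rules standing in for the proposed ones.

```python
import numpy as np
import pywt

def fuse_dwt(ir, vi, wavelet="db1"):
    """Single-level DWT fusion sketch of same-sized IR/VI images."""
    a1, (h1, v1, d1) = pywt.dwt2(ir.astype(float), wavelet)
    a2, (h2, v2, d2) = pywt.dwt2(vi.astype(float), wavelet)
    # low-frequency sub-bands: mean-weighted combination
    a = 0.5 * (a1 + a2)
    # high-frequency sub-bands: keep the larger-magnitude coefficient,
    # preserving the stronger edge response from either modality
    pick = lambda c1, c2: np.where(np.abs(c1) >= np.abs(c2), c1, c2)
    return pywt.idwt2((a, (pick(h1, h2), pick(v1, v2), pick(d1, d2))), wavelet)
```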


2021 ◽  
Vol 13 (11) ◽  
pp. 2038
Author(s):  
Linbo Qing ◽  
Lindong Li ◽  
Yuchen Wang ◽  
Yongqiang Cheng ◽  
Yonghong Peng

People’s interactions with each other form the social relations in society. Understanding human social relations in public spaces is of great importance for supporting public administration. Recognizing social relations through visual data captured by remote sensing cameras is one of the most efficient ways to observe human interactions in a public space. Generally speaking, persons in the same scene tend to know each other, and the relations between person pairs are strongly correlated. The scene in which people interact is also one of the important cues for social relation recognition (SRR). Existing works have not explored the correlations between scene information and people's interactions: scene information has only been extracted at a simple level, and high-level semantic features to support social relation understanding are lacking. To address this issue, we propose a social relation structure-aware local-global model for SRR that exploits the high-level semantic global information of the scene in which the social relation structure is explored. In our proposed model, graph neural networks (GNNs) are employed to reason over the interactions (local information) between social relations and the global contextual information contained in the constructed scene-relation graph. Experiments demonstrate that our proposed local-global information-reasoned social relation recognition model (SRR-LGR) can reason through local-global information, and the results of the final model show that our method outperforms the state-of-the-art methods. In addition, we discuss whether the global information contributes equally to different social relations in the same scene by exploiting an attention mechanism in our model. Further applications of SRR for human observation are also explored.
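The SRR-LGR architecture is not detailed in the abstract; the layer below is a schematic sketch of the scene-relation graph idea under assumed structure: each person-pair relation is a node, and a global scene node exchanges messages with all relation nodes so local and global information inform each other.

```python
import torch
import torch.nn as nn

class SceneRelationLayer(nn.Module):
    """One round of local-global message passing (assumed structure)."""

    def __init__(self, dim):
        super().__init__()
        self.rel_update = nn.Linear(2 * dim, dim)    # relation <- scene
        self.scene_update = nn.Linear(2 * dim, dim)  # scene <- pooled relations

    def forward(self, rel_feats, scene_feat):
        # rel_feats: (N, dim) relation-node features; scene_feat: (dim,)
        n = rel_feats.size(0)
        scene_b = scene_feat.expand(n, -1)
        # each relation node receives the global scene context
        rel_out = torch.relu(self.rel_update(torch.cat([rel_feats, scene_b], -1)))
        # the scene node aggregates the updated relation nodes
        pooled = rel_out.mean(dim=0)
        scene_out = torch.relu(self.scene_update(torch.cat([scene_feat, pooled], -1)))
        return rel_out, scene_out
```

Replacing the mean pooling with a learned attention over relation nodes is one way to test whether the global information contributes unequally to different relations, as the paper's attention analysis does.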

