Computational modeling of human reasoning processes for interpretable visual knowledge: a case study with radiographers

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yu Li ◽  
Hongfei Cao ◽  
Carla M. Allen ◽  
Xin Wang ◽  
Sanda Erdelez ◽  
...  

Abstract Visual reasoning is critical in many complex visual tasks in medicine, such as radiology or pathology. Reasoning processes are challenging to explain explicitly because of the dynamic nature of real-time human cognition. A deeper understanding of such reasoning processes is necessary for improving diagnostic accuracy and computational tools. Most computational analysis methods for visual attention rely on black-box algorithms that lack explainability and are therefore limited in their ability to illuminate visual reasoning processes. In this paper, we propose a computational method to quantify and dissect visual reasoning. The method characterizes spatial and temporal features and identifies common and contrasting visual reasoning patterns to extract significant gaze activities. The visual reasoning patterns are explainable and can be compared across groups to discover differences in strategy. Experiments were conducted with radiographers of varied expertise on 10 levels of visual tasks. Our empirical observations show that the method can capture the temporal and spatial features of human visual attention and distinguish expertise levels. The extracted patterns are further examined and interpreted to showcase key differences between expertise levels in the visual reasoning processes. By revealing task-related reasoning processes, this method demonstrates potential for explaining human visual understanding.
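As a rough illustration of the kind of spatial and temporal gaze features such a method might compute, the following Python sketch summarizes an eye-tracking record as a dwell-time map over screen regions plus a scan-path transition entropy. The grid size, feature choices, and function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: characterizing spatial and temporal gaze
# features from an eye-tracking record. Thresholds are illustrative.
import numpy as np

def gaze_features(x, y, t, grid=8):
    """x, y: normalized gaze coordinates in [0, 1]; t: timestamps (s)."""
    # Spatial feature: dwell-time distribution over a grid of regions.
    dt = np.diff(t, append=t[-1])
    ix = np.clip((x * grid).astype(int), 0, grid - 1)
    iy = np.clip((y * grid).astype(int), 0, grid - 1)
    dwell = np.zeros((grid, grid))
    np.add.at(dwell, (iy, ix), dt)
    dwell /= dwell.sum()

    # Temporal feature: entropy of transitions between regions,
    # a simple proxy for how systematic the scan path is.
    cells = iy * grid + ix
    moves = cells[1:][cells[1:] != cells[:-1]]
    if moves.size == 0:
        return dwell, 0.0
    trans = np.bincount(moves, minlength=grid * grid).astype(float)
    p = trans[trans > 0] / trans.sum()
    scan_entropy = float(-(p * np.log2(p)).sum())
    return dwell, scan_entropy
```

Features of this kind could then be compared between expertise groups, in the spirit of the pattern contrasts the abstract describes.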

Author(s):  
Rachel M. Brown ◽  
Erik Friedgen ◽  
Iring Koch

Abstract Actions we perform every day generate perceivable outcomes with both spatial and temporal features. According to the ideomotor principle, we plan our actions by anticipating the outcomes, but this principle does not directly address how sequential movements are influenced by different outcomes. We examined how sequential action planning is influenced by the anticipation of temporal and spatial features of action outcomes. We further explored the influence of action sequence switching. Participants performed cued sequences of button presses that generated visual effects which were either spatially compatible or incompatible with the sequences, and the spatial effects appeared after a short or long delay. The sequence cues switched or repeated across trials, and the predictability of action sequence switches was varied across groups. The results showed a delay-anticipation effect for sequential action, whereby a shorter anticipated delay between action sequences and their outcomes speeded initiation and execution of the cued action sequences. Delay anticipation was increased by predictable action switching, but it was not strongly modified by the spatial compatibility of the action outcomes. The results extend previous demonstrations of delay anticipation to the context of sequential action. The temporal delay between actions and their outcomes appears to be retrieved for sequential planning and influences both the initiation and the execution of actions.


2021 ◽  
Vol 11 (3) ◽  
pp. 1327
Author(s):  
Rui Zhang ◽  
Zhendong Yin ◽  
Zhilu Wu ◽  
Siyang Zhou

Automatic Modulation Classification (AMC) is of paramount importance in wireless communication systems. Existing methods usually adopt a single category of neural network or stack different categories of networks in series, and rarely extract different types of features simultaneously in a principled way. At the output layer, the softmax function is typically applied for classification to expand the inter-class distance. In this paper, we propose a hybrid parallel network for the AMC problem. Our method uses a hybrid parallel structure in which a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) extract spatial features and temporal features, respectively. Instead of superposing these two categories of features directly, three different attention mechanisms are applied to assign weights to the different types of features. Finally, a cosine-similarity-based output layer, the Additive Margin softmax (AM-softmax), which expands the inter-class distance and compresses the intra-class distance simultaneously, is adopted for classification. Simulation results demonstrate that the proposed method achieves remarkable performance on an open-access dataset.
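The AM-softmax output layer the abstract refers to has a standard form that can be sketched briefly in PyTorch: classes are scored by cosine similarity between L2-normalized features and class weights, and a margin m is subtracted from the true class before scaling by s. The hyperparameter defaults below are common illustrative values, not the paper's.

```python
# Minimal AM-softmax sketch; s and m are assumed defaults.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmax(nn.Module):
    def __init__(self, feat_dim, num_classes, s=30.0, m=0.35):
        super().__init__()
        self.W = nn.Parameter(torch.randn(feat_dim, num_classes))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        # Cosine similarity via L2-normalized features and weights.
        cos = F.normalize(feats, dim=1) @ F.normalize(self.W, dim=0)
        # Subtract the margin from the true-class cosine only, then scale.
        onehot = F.one_hot(labels, cos.size(1)).float()
        logits = self.s * (cos - self.m * onehot)
        return F.cross_entropy(logits, labels)
```

Relative to plain softmax, the margin forces the true-class cosine to exceed the others by m, which is what compresses intra-class distance while expanding inter-class distance.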


2021 ◽  
pp. 1-12
Author(s):  
Omid Izadi Ghafarokhi ◽  
Mazda Moattari ◽  
Ahmad Forouzantabar

With the development of wide-area monitoring systems (WAMS), power system operators can obtain accurate and fast estimates of time-varying load parameters. This study proposes a spatial-temporal deep network with a new attention concept to capture the dynamic and static patterns of electrical load consumption by modeling the complicated, non-stationary interdependencies between time sequences. The designed attention-based deep network uses a long short-term memory (LSTM) component, arranged as an encoder-decoder recurrent neural network, to learn temporal features in the time and frequency domains. Furthermore, to learn spatial features, a convolutional neural network (CNN) based attention mechanism is developed. In addition, this paper develops a loss function based on the pseudo-Huber concept to enhance the robustness of the proposed network in noisy conditions and to improve training performance. Simulation results on the IEEE 68-bus system demonstrate the effectiveness and superiority of the proposed network in comparison with several previously presented and state-of-the-art methods.
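The pseudo-Huber loss mentioned above has a standard closed form: quadratic near zero and asymptotically linear, so large residuals from noisy samples do not dominate training. A one-function PyTorch sketch, with an assumed threshold delta:

```python
# Pseudo-Huber loss sketch: L(r) = delta^2 * (sqrt(1 + (r/delta)^2) - 1).
# delta is a tunable threshold assumed here, not a value from the paper.
import torch

def pseudo_huber(pred, target, delta=1.0):
    r = (pred - target) / delta
    return (delta ** 2 * (torch.sqrt(1.0 + r ** 2) - 1.0)).mean()
```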


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2003 ◽  
Author(s):  
Xiaoliang Zhu ◽  
Shihao Ye ◽  
Liang Zhao ◽  
Zhicheng Dai

Improving performance on the AFEW (Acted Facial Expressions in the Wild) dataset, a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), is a popular benchmark for emotion recognition under various real-world constraints, including uneven illumination, head deflection, and varied facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, in a video sequence, faces in each frame are detected, and the corresponding face ROI (region of interest) is extracted to obtain the face images. Then, the face images in each frame are aligned based on the positions of the facial feature points in the images. Second, the aligned face images are input to a residual neural network to extract the spatial features of the facial expressions. The spatial features are input to the hybrid attention module to obtain fused facial expression features. Finally, the fused features are input to a gated recurrent unit to extract the temporal features of the facial expressions, and the temporal features are input to a fully connected layer to classify and recognize the expressions. Experiments using the CK+ (extended Cohn-Kanade), Oulu-CASIA (Institute of Automation, Chinese Academy of Sciences) and AFEW datasets obtained recognition accuracy rates of 98.46%, 87.31%, and 53.44%, respectively. This demonstrates that the proposed method not only achieves performance competitive with state-of-the-art methods but also improves by more than 2% on the AFEW dataset, a significant gain for facial expression recognition in natural environments.
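The cascade described above can be condensed into a short PyTorch sketch: a ResNet backbone extracts per-frame spatial features, a placeholder attention module reweights them (standing in for the paper's hybrid attention), and a GRU plus a fully connected layer produce the expression class. Module sizes and names are assumptions for illustration, not the paper's architecture.

```python
# Illustrative cascade: ResNet spatial features -> attention -> GRU -> FC.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CascadeFER(nn.Module):
    def __init__(self, num_classes=7, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()        # expose 512-d spatial features
        self.backbone = backbone
        self.attn = nn.Linear(512, 1)      # placeholder frame attention
        self.gru = nn.GRU(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):              # clips: (B, T, 3, H, W)
        B, T = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(B, T, -1)
        w = torch.softmax(self.attn(feats), dim=1)  # weights over frames
        out, _ = self.gru(feats * w)        # temporal features over time
        return self.fc(out[:, -1])          # classify from the last state
```

Face detection and landmark-based alignment would precede this module, producing the aligned face crops that form `clips`.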


2021 ◽  
Author(s):  
Ibrahim Mohammad Hussain Rahman

The human visual attention (HVA) system encompasses a set of interconnected neurological modules that analyze visual stimuli by attending to salient regions. Two contrasting biological mechanisms exist in HVA systems: bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection.

Very few computational models of top-down attention have been proposed, for three main reasons. First, the functionality of the top-down process involves many influential factors. Second, top-down responses vary from task to task. Finally, many biological aspects of the top-down process are not yet well understood. For these reasons, it is difficult to devise a generalized top-down model applicable to all high-level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step in analyzing images before performing more complex visual tasks, and it has not been investigated thoroughly in top-down saliency modelling; hence, it constitutes the main application domain for this thesis.

The thesis investigates methods to model top-down attention through various kinds of high-level data acquired from images, as well as strategies to dynamically combine bottom-up and top-down processes to improve detection accuracy and the computational efficiency of existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency:

1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to the features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor tunes the weighting of low-level features to maximize detection accuracy; incorporating context into the feature weighting mechanism improves the quality of the assigned weights.

2. Two modules of target features combined with contextual weighting to improve detection accuracy for the target object. In this model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, a drastic increase in detection accuracy is achieved in images with complex backgrounds and a variety of target objects.

3. A combination model for top-down and bottom-up attention based on feature interaction. This model combines both processes dynamically by formulating the problem as feature selection. The selection exploits the interaction between features, yielding a robust feature set that maximizes both the detection accuracy and the overall efficiency of the system.

4. A feature map quality score estimation model that accurately predicts the detection accuracy of a previously unseen feature map without ground-truth data. The model extracts various local, global, geometrical and statistical characteristics from a feature map, and these characteristics guide a regression model that estimates the quality of a novel map.

5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime (a sketch follows below). If the estimation model can accurately predict the quality score of any novel feature map, feature maps can be integrated dynamically based on the estimated values. Two frameworks for feature map integration using the estimation model are proposed. The proposed integration framework achieves higher human fixation prediction accuracy with fewer feature maps than combining all feature maps.

The work proposed in this thesis provides new directions in modelling top-down saliency for target object detection. In addition, dynamic approaches to combining top-down and bottom-up attention show considerable improvements over existing approaches in both efficiency and accuracy.
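To make points 4 and 5 concrete, here is a minimal Python sketch of quality-guided feature map integration: a regressor trained on map descriptors predicts each map's quality score, and only the top-scoring maps are fused. The descriptor and regressor below are simplified stand-ins for the local, global, geometrical and statistical characteristics the thesis describes; all names are hypothetical.

```python
# Hypothetical sketch of quality-guided feature map integration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def descriptor(fmap):
    """Crude statistical summary of one saliency feature map."""
    f = fmap.ravel()
    return np.array([f.mean(), f.std(), f.max(),
                     (f > f.mean() + 2 * f.std()).mean()])

def integrate(feature_maps, regressor, k=3):
    """Fuse the k maps with the highest predicted quality scores."""
    X = np.stack([descriptor(m) for m in feature_maps])
    scores = regressor.predict(X)           # estimated quality per map
    best = np.argsort(scores)[-k:]          # indices of the top-k maps
    fused = sum(scores[i] * feature_maps[i] for i in best)
    return fused / (fused.max() + 1e-8)     # normalized fused saliency

# Training (offline, with ground truth available):
# regressor = RandomForestRegressor().fit(X_train, accuracy_scores),
# where X_train holds descriptors of maps with known detection accuracy.
```

At runtime no ground truth is needed: the regressor's estimates alone decide which maps enter the fusion, which is the dynamic integration idea of point 5.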


2022 ◽  
Vol 12 (1) ◽  
pp. 87
Author(s):  
Conrad Perry ◽  
Heidi Long

This critical review examined current issues concerning the role of visual attention in reading. To do this, we searched for and reviewed 18 recent articles, including all those published after 2019 that used a Latin alphabet. Inspection of these articles showed that the Visual Attention Span task was run a number of times in well-controlled studies and was typically a small but significant predictor of reading ability, even after potential covariation with phonological effects was accounted for. A number of other tasks were used to examine different aspects of visual attention, and differences between dyslexic readers and controls were typically found. However, most of these studies did not adequately control for phonological effects, and those that did found only very weak, non-significant results. Furthermore, in the smaller studies, separate within-group correlations between the tasks and reading performance were generally not reported, making causal effects of the manipulations difficult to ascertain. Overall, it seems reasonable to suggest that understanding how and why different types of visual tasks affect particular aspects of reading performance is an important area for future research.


2021 ◽  
Vol 13 (16) ◽  
pp. 3338
Author(s):  
Xiao Xiao ◽  
Zhiling Jin ◽  
Yilong Hui ◽  
Yueshen Xu ◽  
Wei Shao

With the development of sensors and the Internet of Things (IoT), smart cities can provide people with a variety of information for a more convenient life. Effective on-street parking availability prediction can improve parking efficiency and, at times, alleviate city congestion. Conventional methods of parking availability prediction often do not consider the spatial–temporal features of parking duration distributions. To this end, we propose a parking space prediction scheme called hybrid spatial–temporal graph convolution networks (HST-GCNs). We use graph convolutional networks and gated linear units (GLUs) with a 1D convolutional neural network to obtain the spatial features and the temporal features, respectively. Then, we construct a spatial–temporal convolutional block to obtain the instantaneous spatial–temporal correlations. We also propose an attention mechanism called distAtt that measures the similarity of parking duration distributions across zones. Through the distAtt mechanism, we add long-term spatial–temporal correlations to our spatial–temporal convolutional block, and thus we can capture complex hybrid spatial–temporal correlations to achieve higher accuracy in parking availability prediction. Based on real-world datasets, we compare the proposed scheme with benchmark models. The experimental results show that the proposed scheme performs best in predicting the parking occupancy rate.
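As one plausible reading of the distAtt idea, the sketch below derives attention weights between parking zones from the similarity of their parking-duration histograms. The use of the Jensen-Shannon distance is an assumption for illustration; the paper defines its own similarity measure.

```python
# Hedged sketch of distribution-similarity attention between zones.
# Jensen-Shannon distance is an assumed stand-in for the paper's metric.
import numpy as np
from scipy.spatial.distance import jensenshannon

def dist_attention(duration_hists):
    """duration_hists: (N, bins) normalized histograms, one per zone."""
    N = len(duration_hists)
    sim = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            # Similarity decays with Jensen-Shannon distance.
            sim[i, j] = np.exp(-jensenshannon(duration_hists[i],
                                              duration_hists[j]))
    # Row-wise softmax turns similarities into attention weights.
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

Weights of this kind could link zones with similar parking behavior even when they are far apart in the road graph, which is how long-term correlations enter the spatial–temporal block.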


2019 ◽  
Vol 16 (1) ◽  
Author(s):  
Tianci Chu ◽  
Yi Ping Zhang ◽  
Zhisen Tian ◽  
Chuyuan Ye ◽  
Mingming Zhu ◽  
...  

Abstract
Background: The glial response in multiple sclerosis (MS), especially the recruitment and differentiation of oligodendrocyte progenitor cells (OPCs), predicts the success of remyelination of MS plaques and the return of function. As a central player in neuroinflammation, the activation and polarization of microglia/macrophages (M/M), which modulate the inflammatory niche and cytokine components in demyelinating lesions, may impact the OPC response and the progression of demyelination and remyelination. However, the dynamic behaviors of M/M and OPCs during demyelination and spontaneous remyelination are poorly understood, and the complex role of neuroinflammation in the demyelination-remyelination process is not well known. In this study, we utilized two focal demyelination models with different dynamic patterns of M/M to investigate the correlation between M/M polarization and the demyelination-remyelination process.
Methods: The temporal and spatial features of M/M activation/polarization and the OPC response in two focal demyelination models, induced by lysolecithin (LPC) and lipopolysaccharide (LPS), were examined in mice. Morphology, sensorimotor function, diffusion tensor imaging (DTI), inflammation-relevant cytokines, and glial responses were compared in detail between the two models at different phases.
Results: LPC and LPS induced distinct temporal and spatial lesion patterns. LPS produced diffuse demyelination lesions, with a delayed peak of demyelination and functional decline compared to LPC. Oligodendrocytes, astrocytes, and M/M were scattered throughout the LPS-induced lesions but distributed in a layer-like pattern throughout the LPC-induced lesions. The specific M/M polarization was tightly correlated with the lesion pattern and with balance beam function.
Conclusions: This study elaborated the spatial and temporal features of neuroinflammation mediators and the glial response during the demyelination-remyelination processes in two focal demyelination models. Specific M/M polarization is highly correlated with the demyelination-remyelination process, probably via modulation of the inflammatory niche, cytokine components, and the OPC response. These findings provide a basis for understanding the complex and dynamic glial phenotypes and behaviors, and reveal potential targets for promoting or inhibiting certain M/M phenotypes at the appropriate time for efficient remyelination.


2011 ◽  
Vol 34 (1) ◽  
pp. 138-155 ◽  
Author(s):  
Richard Huyghe

This paper deals with the spatial features of event-denoting nouns [EvNs], which are often overlooked in the linguistic literature on space. EvNs can refer to spatial entities, as they can be used as trajectors in localization sentences (Il y a une cérémonie dans l’église ‘There is a ceremony in the church’). Still, EvNs differ in several ways from nouns denoting prototypical spatial entities. They do not combine with complements denoting spatial extension (*une cérémonie de deux hectares ‘a two-hectare ceremony’), and they are associated with specific nouns and verbs of location (le lieu / *la place de la cérémonie ‘the location / the place of the ceremony’, Une cérémonie a lieu / *se trouve dans l’église ‘A ceremony takes place / is in the church’). It is assumed that the peculiarity of the spatial denotation of EvNs is due to their direct relation to time. The dependence between the spatial and temporal properties of EvNs becomes apparent when these nouns are used as landmarks (Pierre se rend à la cérémonie ‘Peter goes to the ceremony’). First, spatial eventive landmarks bear a temporal specification. Second, the temporal features of events determine their ability to be used as spatial landmarks.


2002 ◽  
Vol 14 (5) ◽  
pp. 687-701 ◽  
Author(s):  
Jason Proksch ◽  
Daphne Bavelier

There is much anecdotal suggestion of improved visual skills in congenitally deaf individuals, but careful investigations of visual skills in deaf individuals have yielded only mixed results. Psychophysical assessments of visual functions have, for the most part, failed to validate the view of enhanced visual skills after deafness. Only a few studies have shown an advantage for deaf individuals in visual tasks. Interestingly, all of these studies share the requirement that participants process visual information in their peripheral visual field under demanding conditions of attention. This work has led us to propose that congenital auditory deprivation alters the gradient of visual attention from the central to the peripheral field by enhancing peripheral processing. This hypothesis was tested by adapting a search task from Lavie and colleagues in which the interference of distracting information with the search task provides a measure of attentional resources. These authors established that during an easy central search for a target, any surplus attention will involuntarily process a peripheral distractor that the subject has been instructed to ignore. Attentional resources can be measured by adjusting the difficulty of the search task to the point at which no surplus resources are available for the distractor. Through modification of this paradigm, central and peripheral attentional resources were compared in deaf and hearing individuals. Deaf individuals possessed greater attentional resources in the periphery but fewer in the center compared to hearing individuals. Furthermore, results from native hearing signers showed that sign language experience alone could not be responsible for these changes. We conclude that auditory deprivation from birth leads to compensatory changes within the visual system that enhance attentional processing of the peripheral visual field.

