Robust Deep Co-Saliency Detection with Group Semantic

Author(s):  
Chong Wang ◽  
Zheng-Jun Zha ◽  
Dong Liu ◽  
Hongtao Xie

High-level semantic knowledge, in addition to low-level visual cues, is crucial for co-saliency detection. This paper proposes a novel end-to-end deep learning approach for robust co-saliency detection that simultaneously learns a high-level group-wise semantic representation and deep visual features of a given image group. Inter-image interaction at the semantic level and the complementarity between group semantics and visual features are exploited to boost the inference of co-salient regions. Specifically, the proposed approach consists of a co-category learning branch and a co-saliency detection branch. The former learns a group-wise semantic vector using the co-category association of an image group as supervision, while the latter infers precise co-saliency maps based on the ensemble of group semantic knowledge and deep visual cues. The group semantic vector is broadcast to each spatial location of multi-scale visual feature maps and used as top-down semantic guidance to boost the bottom-up inference of co-saliency. The co-category learning and co-saliency detection branches are jointly optimized in a multi-task learning manner, further improving the robustness of the approach. Moreover, we construct a new large-scale co-saliency dataset, COCO-SEG, to facilitate research on co-saliency detection. Extensive experiments on COCO-SEG and the widely used benchmark Cosal2015 demonstrate the superiority of the proposed approach over state-of-the-art methods.
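The broadcasting step the abstract describes can be sketched in a few lines of numpy. This is a minimal illustration only: the shapes, the concatenation-style fusion, and all names are assumptions, not the paper's actual architecture.

```python
import numpy as np

def fuse_group_semantics(feature_map, semantic_vec):
    """feature_map: (C, H, W) visual features; semantic_vec: (K,) group
    semantic vector. Tile the vector at every spatial location and
    concatenate it with the visual features along the channel axis."""
    _, h, w = feature_map.shape
    tiled = np.broadcast_to(semantic_vec[:, None, None],
                            (semantic_vec.size, h, w))
    return np.concatenate([feature_map, tiled], axis=0)

feat = np.random.rand(64, 32, 32)  # deep visual features at one scale
sem = np.random.rand(80)           # group semantic vector (e.g., co-category scores)
fused = fuse_group_semantics(feat, sem)
print(fused.shape)  # (144, 32, 32)
```

In the paper's setting this fusion would be repeated per scale of the multi-scale feature maps; a learned projection rather than plain concatenation is equally plausible.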

2012 ◽  
Vol 24 (1) ◽  
pp. 133-147 ◽  
Author(s):  
Carin Whitney ◽  
Marie Kirk ◽  
Jamie O'Sullivan ◽  
Matthew A. Lambon Ralph ◽  
Elizabeth Jefferies

To understand the meanings of words and objects, we need to have knowledge about these items themselves plus executive mechanisms that compute and manipulate semantic information in a task-appropriate way. The neural basis for semantic control remains controversial. Neuroimaging studies have focused on the role of the left inferior frontal gyrus (LIFG), whereas neuropsychological research suggests that damage to a widely distributed network elicits impairments of semantic control. There is also debate about the relationship between semantic and executive control more widely. We used TMS in healthy human volunteers to create “virtual lesions” in structures typically damaged in patients with semantic control deficits: LIFG, left posterior middle temporal gyrus (pMTG), and intraparietal sulcus (IPS). The influence of TMS on tasks varying in semantic and nonsemantic control demands was examined for each region within this hypothesized network to gain insights into (i) their functional specialization (i.e., involvement in semantic representation, controlled retrieval, or selection) and (ii) their domain dependence (i.e., semantic or cognitive control). The results revealed that LIFG and pMTG jointly support both the controlled retrieval and selection of semantic knowledge. IPS specifically participates in semantic selection and responds to manipulations of nonsemantic control demands. These observations are consistent with a large-scale semantic control network, as predicted by lesion data, that draws on semantic-specific (LIFG and pMTG) and domain-independent executive components (IPS).


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zelin Deng ◽  
Qiran Zhu ◽  
Pei He ◽  
Dengyong Zhang ◽  
Yuansheng Luo

Using convolutional neural networks (CNNs) for image emotion recognition is a research hotspot in deep learning. Previous studies tend to use visual features obtained from a global perspective and ignore the role of local visual features in emotional arousal. Moreover, shallow CNN feature maps contain image content information; using such maps directly to describe low-level visual features may introduce redundancy. To enhance image emotion recognition performance, an improved CNN is proposed in this work. First, a saliency detection algorithm is used to locate the emotional region of the image, which serves as supplementary information for better emotion recognition. Second, a Gram matrix transform is performed on the shallow CNN feature maps to reduce the redundancy of image content information. Finally, a new loss function is designed using both hard labels and probability labels of the image emotion category, to reduce the influence of the subjectivity of image emotion. Extensive experiments have been conducted on benchmark datasets, including FI (Flickr and Instagram), IAPSsubset, ArtPhoto, and Abstract. The experimental results show that, compared with existing approaches, our method has good application prospects.
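The Gram matrix transform mentioned above is a standard operation: it turns a (C, H, W) feature map into a C x C matrix of channel correlations, discarding the spatial layout (and with it much of the image-content detail). A minimal numpy sketch, with illustrative shapes:

```python
import numpy as np

def gram_matrix(feature_maps):
    """feature_maps: (C, H, W). Returns the C x C Gram matrix of
    channel-wise correlations, normalized by the number of spatial
    positions. Spatial structure is discarded in the process."""
    c, h, w = feature_maps.shape
    flat = feature_maps.reshape(c, h * w)
    return flat @ flat.T / (h * w)

fmap = np.random.rand(16, 8, 8)  # a toy shallow feature map
g = gram_matrix(fmap)
print(g.shape)  # (16, 16)
```

Gram matrices are symmetric by construction, which is one reason they are cheap to use as a low-level "style"-like descriptor.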


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 1009
Author(s):  
Ilaria De Santis ◽  
Michele Zanoni ◽  
Chiara Arienti ◽  
Alessandro Bevilacqua ◽  
Anna Tesei

Subcellular spatial location is an essential descriptor of a molecule's biological function. Super-resolution microscopy techniques currently enable quantification of the distribution of subcellular objects in fluorescence images, but they rely on instrumentation, tools, and expertise that are not standard in most laboratories. We propose a method that resolves the location of subcellular structures by reinforcing each pixel position with information from its surroundings. Although designed for entry-level laboratory equipment with common resolving powers, our method is independent of the imaging device's resolution and can thus also benefit super-resolution microscopy. The approach generates density distribution maps (DDMs) informative of both the absolute location of objects and their relative displacement, thereby reducing location uncertainty and increasing the accuracy of signal mapping. This work proves the capability of DDMs to: (a) improve the informativeness of spatial distributions; (b) empower the analysis of subcellular molecule distributions; and (c) extend their applicability beyond mere spatial object mapping. Finally, the possibility of enhancing, or even disclosing, latent distributions can concretely speed up routine, large-scale, and follow-up experiments, besides benefiting all spatial distribution studies, independently of the image acquisition resolution. DDMaker, software endowed with a user-friendly graphical user interface (GUI), is also provided to support users in creating DDMs.
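The core idea of reinforcing each pixel with information from its surroundings can be approximated by a simple neighbourhood-density pass. The sketch below is a toy stand-in for the paper's DDMs, not the DDMaker algorithm; the window size and binary-mask input are assumptions.

```python
import numpy as np

def density_map(mask, radius=1):
    """mask: binary (H, W) image of detected subcellular signal.
    Each output pixel is the count of positive pixels in its
    (2*radius+1)^2 neighbourhood, so isolated detections are kept in
    context of their surroundings."""
    padded = np.pad(mask.astype(float), radius)  # zero-pad the border
    h, w = mask.shape
    out = np.zeros((h, w))
    k = 2 * radius + 1
    for dy in range(k):          # sum the k*k shifted copies
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out

m = np.zeros((5, 5)); m[2, 2] = 1.0
d = density_map(m)
print(d[2, 2], d[0, 0])  # 1.0 0.0
```

A real implementation would likely use a weighted kernel (e.g., Gaussian) rather than a flat box, so that closer neighbours contribute more.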


2020 ◽  
Vol 34 (07) ◽  
pp. 11693-11700 ◽  
Author(s):  
Ao Luo ◽  
Fan Yang ◽  
Xin Li ◽  
Dong Nie ◽  
Zhicheng Jiao ◽  
...  

Crowd counting is an important yet challenging task due to large variations in scale and density. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from an auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture remains a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve this problem by interweaving the multi-scale features for crowd density with those of its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph that jointly represents the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing feature dependencies across scales and (ii) mutually beneficial relations building bridges for cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, yielding robust and accurate results. HyGnn performs remarkably well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, and UCF_QNRF, outperforming state-of-the-art algorithms by a large margin.
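Message passing over such a graph has a simple generic skeleton: each node repeatedly mixes its own features with an aggregate of its neighbours'. The toy sketch below illustrates that skeleton only; the node features, adjacency, and fixed 0.5/0.5 update are illustrative assumptions, not HyGnn's learned layers.

```python
import numpy as np

def message_pass(node_feats, adj, steps=2):
    """node_feats: (N, D) per-node features (e.g., one node per scale
    per task); adj: (N, N) adjacency, 1 where an edge (cross-scale or
    counting<->localization) exists. Mean-aggregation updates."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    h = node_feats
    for _ in range(steps):
        h = 0.5 * h + 0.5 * (adj @ h) / deg  # mix self and neighbour info
    return h

h0 = np.array([[0.0], [2.0]])              # two connected nodes
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
h = message_pass(h0, adj, steps=1)
print(h)  # both nodes move to the shared mean: [[1.], [1.]]
```

In a trained GNN the mixing weights would be learned (and typically edge-type specific, matching the two edge types above), but the propagation pattern is the same.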


1976 ◽  
Vol 28 (2) ◽  
pp. 193-202 ◽  
Author(s):  
Philip Merikle

Report of single letters from centrally fixated, seven-letter target rows was probed by either auditory or visual cues. The target rows were presented for 100 ms, and the report cues were single digits indicating the spatial location of a letter. In three separate experiments, report was always better with the auditory cues. The advantage for the auditory cues was maintained both when the target rows were masked by a patterned stimulus and when the auditory cues were presented 500 ms later than comparable visual cues. The results indicate that visual cues produce modality-specific interference operating at a level of processing beyond iconic representation.


2013 ◽  
Vol 765-767 ◽  
pp. 1401-1405
Author(s):  
Chi Zhang ◽  
Wei Qiang Wang

Object-level saliency detection is an important branch of visual saliency. In this paper, we propose a novel method that conducts object-level saliency detection in both images and videos in a unified way. We employ a more effective spatial compactness assumption to measure saliency, instead of the popular contrast assumption. In addition, we present a combination framework that integrates multiple saliency maps generated from different feature maps. The proposed algorithm automatically selects saliency maps of high quality according to a quality evaluation score we define. The experimental results demonstrate that the proposed method outperforms all state-of-the-art methods on datasets of both still images and video sequences.
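One common way to operationalize a spatial compactness assumption is the weighted spatial variance of a map: a spatially concentrated (compact) response scores low, a scattered one scores high. The sketch below is a generic illustration of that idea, not the paper's exact score.

```python
import numpy as np

def spatial_variance(saliency_map):
    """saliency_map: (H, W), nonnegative. Returns the saliency-weighted
    spatial variance about the map's centroid: low for compact maps,
    high for spread-out ones."""
    h, w = saliency_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    wsum = saliency_map.sum()
    cy = (ys * saliency_map).sum() / wsum   # weighted centroid
    cx = (xs * saliency_map).sum() / wsum
    return (((ys - cy) ** 2 + (xs - cx) ** 2) * saliency_map).sum() / wsum

compact = np.zeros((9, 9)); compact[4, 4] = 1.0  # all mass at the centre
spread = np.ones((9, 9))                          # mass everywhere
print(spatial_variance(compact) < spatial_variance(spread))  # True
```

A saliency method built on this assumption would then prefer feature channels whose response is compact, the opposite ranking to a pure contrast criterion.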


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bangtong Huang ◽  
Hongquan Zhang ◽  
Zihong Chen ◽  
Lingling Li ◽  
Lihua Shi

Deep learning algorithms face limitations in virtual reality applications due to memory cost, computation cost, and real-time constraints. Models with strong performance often suffer from enormous parameter counts and large-scale structure, making them hard to port to embedded devices. In this paper, inspired by GhostNet, we propose an efficient structure, ShuffleGhost, that exploits the redundancy in feature maps to alleviate the cost of computation while tackling some drawbacks of GhostNet. Since GhostNet suffers from the high computational cost of convolution in the Ghost module and shortcut, the restriction on downsampling makes it difficult to apply the Ghost module and Ghost bottleneck to other backbones. This paper proposes three new kinds of ShuffleGhost structures to tackle these drawbacks. The ShuffleGhost module and ShuffleGhost bottlenecks employ the shuffle layer and group convolution from ShuffleNet; they are designed to redistribute the feature maps concatenated from the Ghost feature maps and primary feature maps, eliminating the gap between them while extracting features. An SENet layer is then adopted to reduce the computational cost of the group convolution, as well as to evaluate the importance of the concatenated feature maps and assign them proper weights. Our experiments show that ShuffleGhostV3 has fewer trainable parameters and FLOPs while preserving accuracy, and that with proper design it can be more efficient on both GPU and CPU.
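The shuffle layer borrowed from ShuffleNet is a fixed permutation of channels: reshape the channel axis into (groups, channels_per_group), transpose, and flatten, so the next group convolution sees channels mixed across groups. A minimal numpy version (applying it to a concatenated Ghost/primary feature map is the paper's setting; the shapes here are illustrative):

```python
import numpy as np

def channel_shuffle(x, groups):
    """x: (C, H, W) feature map. ShuffleNet-style channel shuffle:
    interleave channels across groups so group convolutions can
    exchange information between groups."""
    c, h, w = x.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))

x = np.arange(6, dtype=float).reshape(6, 1, 1)  # channels labelled 0..5
y = channel_shuffle(x, groups=2)
print(y.ravel())  # [0. 3. 1. 4. 2. 5.]
```

Because it is a pure permutation, the shuffle adds no parameters and negligible compute, which is why it pairs well with cheap group convolutions.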


ACTA IMEKO ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 98
Author(s):  
Valeria Croce ◽  
Gabriella Caroti ◽  
Andrea Piemonte ◽  
Marco Giorgio Bevilacqua

The digitization of Cultural Heritage paves the way for new approaches to the surveying and restitution of historical sites. With a view to managing integrated programs of documentation and conservation, research is now focusing on the creation of information systems that link the digital representation of a building to semantic knowledge. With reference to the emblematic case study of the Calci Charterhouse, also known as the Pisa Charterhouse, this contribution illustrates an approach for the transition from 3D survey information, derived from laser scanner and photogrammetric techniques, to the creation of semantically enriched 3D models. The proposed approach is based on the recognition (segmentation and classification) of elements in the original raw point cloud, and on the manual mapping of NURBS elements onto it. For this shape recognition process, reference to architectural treatises and vocabularies of classical architecture is a key step. The created building components are finally imported into an H-BIM environment, where they are enriched with semantic information related to historical knowledge, documentary sources, and restoration activities.


2020 ◽  
Vol 34 (05) ◽  
pp. 7554-7561
Author(s):  
Pengxiang Cheng ◽  
Katrin Erk

Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and on a range of end tasks such models have achieved state-of-the-art results, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning, where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting a new state of the art, while containing only a tiny fraction of GPT-2's parameters. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions for applying similar techniques to other problems.
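Supervising a self-attention head with coreference links typically means penalizing the head when it fails to attend to a token's antecedent. The sketch below shows one plausible auxiliary loss in numpy; the target encoding (each token pointing at its antecedent, or itself when it has none) and all names are assumptions in the spirit of Strubell et al.'s supervised attention, not this paper's exact formulation.

```python
import numpy as np

def coref_attention_loss(attn, coref_targets):
    """attn: (T, T) attention weights of one supervised head (each row
    sums to 1); coref_targets: length-T int array where coref_targets[i]
    is the index of token i's coreferent antecedent (i itself if none).
    Negative log-likelihood of the head attending to the antecedent."""
    rows = np.arange(len(coref_targets))
    return -np.log(attn[rows, coref_targets] + 1e-9).mean()

T = 4
attn = np.eye(T)            # a head that attends only to the token itself
targets = np.arange(T)      # every token is its own antecedent here
loss = coref_attention_loss(attn, targets)
print(loss < 1e-6)  # True: perfect agreement gives ~zero loss
```

During training this term would be added, with some weight, to the main language-modeling loss, leaving the remaining heads free to learn unconstrained attention patterns.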

