feature representations
Recently Published Documents


TOTAL DOCUMENTS

473
(FIVE YEARS 261)

H-INDEX

28
(FIVE YEARS 12)

2022 ◽  
Vol 15 ◽  
Author(s):  
Chongwen Wang ◽  
Zicheng Wang

Facial action unit (AU) detection is an important task in affective computing and has attracted extensive attention in the field of computer vision and artificial intelligence. Previous studies for AU detection usually encode complex regional feature representations with manually defined facial landmarks and learn to model the relationships among AUs via graph neural network. Albeit some progress has been achieved, it is still tedious for existing methods to capture the exclusive and concurrent relationships among different combinations of the facial AUs. To circumvent this issue, we proposed a new progressive multi-scale vision transformer (PMVT) to capture the complex relationships among different AUs for the wide range of expressions in a data-driven fashion. PMVT is based on the multi-scale self-attention mechanism that can flexibly attend to a sequence of image patches to encode the critical cues for AUs. Compared with previous AU detection methods, the benefits of PMVT are 2-fold: (i) PMVT does not rely on manually defined facial landmarks to extract the regional representations, and (ii) PMVT is capable of encoding facial regions with adaptive receptive fields, thus facilitating representation of different AU flexibly. Experimental results show that PMVT improves the AU detection accuracy on the popular BP4D and DISFA datasets. Compared with other state-of-the-art AU detection methods, PMVT obtains consistent improvements. Visualization results show PMVT automatically perceives the discriminative facial regions for robust AU detection.


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Thirza Dado ◽  
Yağmur Güçlütürk ◽  
Luca Ambrogioni ◽  
Gabriëlle Ras ◽  
Sander Bosch ◽  
...  

AbstractNeural decoding can be conceptualized as the problem of mapping brain responses back to sensory stimuli via a feature space. We introduce (i) a novel experimental paradigm that uses well-controlled yet highly naturalistic stimuli with a priori known feature representations and (ii) an implementation thereof for HYPerrealistic reconstruction of PERception (HYPER) of faces from brain recordings. To this end, we embrace the use of generative adversarial networks (GANs) at the earliest step of our neural decoding pipeline by acquiring fMRI data as participants perceive face images synthesized by the generator network of a GAN. We show that the latent vectors used for generation effectively capture the same defining stimulus properties as the fMRI measurements. As such, these latents (conditioned on the GAN) are used as the in-between feature representations underlying the perceived images that can be predicted in neural decoding for (re-)generation of the originally perceived stimuli, leading to the most accurate reconstructions of perception to date.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Jiedong Zhang ◽  
Yong Jiang ◽  
Yunjie Song ◽  
Peng Zhang ◽  
Sheng He

Regions sensitive to specific object categories as well as organized spatial patterns sensitive to different features have been found across the whole ventral temporal cortex (VTC). However, it is unclear that within each object category region, how specific feature representations are organized to support object identification. Would object features, such as object parts, be represented in fine-scale spatial tuning within object category-specific regions? Here, we used high-field 7T fMRI to examine the spatial tuning to different face parts within each face-selective region. Our results show consistent spatial tuning of face parts across individuals that within right posterior fusiform face area (pFFA) and right occipital face area (OFA), the posterior portion of each region was biased to eyes, while the anterior portion was biased to mouth and chin stimuli. Our results demonstrate that within the occipital and fusiform face processing regions, there exist systematic spatial tuning to different face parts that support further computation combining them.


Healthcare ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 36
Author(s):  
Yubin Wu ◽  
Qianqian Lin ◽  
Mingrun Yang ◽  
Jing Liu ◽  
Jing Tian ◽  
...  

The main objective of yoga pose grading is to assess the input yoga pose and compare it to a standard pose in order to provide a quantitative evaluation as a grade. In this paper, a computer vision-based yoga pose grading approach is proposed using contrastive skeleton feature representations. First, the proposed approach extracts human body skeleton keypoints from the input yoga pose image and then feeds their coordinates into a pose feature encoder, which is trained using contrastive triplet examples; finally, a comparison of similar encoded pose features is made. Furthermore, to tackle the inherent challenge of composing contrastive examples in pose feature encoding, this paper proposes a new strategy to use both a coarse triplet example—comprised of an anchor, a positive example from the same category, and a negative example from a different category, and a fine triplet example—comprised of an anchor, a positive example, and a negative example from the same category with different pose qualities. Extensive experiments are conducted using two benchmark datasets to demonstrate the superior performance of the proposed approach.


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 73
Author(s):  
Marjan Stoimchev ◽  
Marija Ivanovska ◽  
Vitomir Štruc

In the past few years, there has been a leap from traditional palmprint recognition methodologies, which use handcrafted features, to deep-learning approaches that are able to automatically learn feature representations from the input data. However, the information that is extracted from such deep-learning models typically corresponds to the global image appearance, where only the most discriminative cues from the input image are considered. This characteristic is especially problematic when data is acquired in unconstrained settings, as in the case of contactless palmprint recognition systems, where visual artifacts caused by elastic deformations of the palmar surface are typically present in spatially local parts of the captured images. In this study we address the problem of elastic deformations by introducing a new approach to contactless palmprint recognition based on a novel CNN model, designed as a two-path architecture, where one path processes the input in a holistic manner, while the second path extracts local information from smaller image patches sampled from the input image. As elastic deformations can be assumed to most significantly affect the global appearance, while having a lesser impact on spatially local image areas, the local processing path addresses the issues related to elastic deformations thereby supplementing the information from the global processing path. The model is trained with a learning objective that combines the Additive Angular Margin (ArcFace) Loss and the well-known center loss. By using the proposed model design, the discriminative power of the learned image representation is significantly enhanced compared to standard holistic models, which, as we show in the experimental section, leads to state-of-the-art performance for contactless palmprint recognition. Our approach is tested on two publicly available contactless palmprint datasets—namely, IITD and CASIA—and is demonstrated to perform favorably against state-of-the-art methods from the literature. The source code for the proposed model is made publicly available.


2021 ◽  
Author(s):  
Qingxing Cao ◽  
Wentao Wan ◽  
Xiaodan Liang ◽  
Liang Lin

Despite the significant success in various domains, the data-driven deep neural networks compromise the feature interpretability, lack the global reasoning capability, and can’t incorporate external information crucial for complicated real-world tasks. Since the structured knowledge can provide rich cues to record human observations and commonsense, it is thus desirable to bridge symbolic semantics with learned local feature representations. In this chapter, we review works that incorporate different domain knowledge into the intermediate feature representation.These methods firstly construct a domain-specific graph that represents related human knowledge. Then, they characterize node representations with neural network features and perform graph convolution to enhance these symbolic nodes via the graph neural network(GNN).Lastly, they map the enhanced node feature back into the neural network for further propagation or prediction. Through integrating knowledge graphs into neural networks, one can collaborate feature learning and graph reasoning with the same supervised loss function and achieve a more effective and interpretable way to introduce structure constraints.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Vincent Majanga ◽  
Serestina Viriri

Recent advances in medical imaging analysis, especially the use of deep learning, are helping to identify, detect, classify, and quantify patterns in radiographs. At the center of these advances is the ability to explore hierarchical feature representations learned from data. Deep learning is invaluably becoming the most sought out technique, leading to enhanced performance in analysis of medical applications and systems. Deep learning techniques have achieved great performance results in dental image segmentation. Segmentation of dental radiographs is a crucial step that helps the dentist to diagnose dental caries. The performance of these deep networks is however restrained by various challenging features of dental carious lesions. Segmentation of dental images becomes difficult due to a vast variety in topologies, intricacies of medical structures, and poor image qualities caused by conditions such as low contrast, noise, irregular, and fuzzy edges borders, which result in unsuccessful segmentation. The dental segmentation method used is based on thresholding and connected component analysis. Images are preprocessed using the Gaussian blur filter to remove noise and corrupted pixels. Images are then enhanced using erosion and dilation morphology operations. Finally, segmentation is done through thresholding, and connected components are identified to extract the Region of Interest (ROI) of the teeth. The method was evaluated on an augmented dataset of 11,114 dental images. It was trained with 10 090 training set images and tested on 1024 testing set images. The proposed method gave results of 93 % for both precision and recall values, respectively.


Author(s):  
Francesco Bronzino ◽  
Paul Schmitt ◽  
Sara Ayoubi ◽  
Hyojoon Kim ◽  
Renata Teixeira ◽  
...  

Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10~Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.


2021 ◽  
Author(s):  
Jie Hao ◽  
William Zhu

Abstract Differentiable architecture search (DARTS) approach has made great progress in reducing the com- putational costs of neural architecture search. DARTS tries to discover an optimal architecture module called cell from a predefined super network. However, the obtained cell is then repeatedly and simply stacked to build a target network, failing to extract layered fea- tures hidden in different network depths. Therefore, this target network cannot meet the requirements of prac- tical applications. To address this problem, we propose an effective approach called Layered Feature Repre- sentation for Differentiable Architecture Search (LFR- DARTS). Specifically, we iteratively search for multiple cells with different architectures from shallow to deep layers of the super network. For each iteration, we optimize the architecture of a cell by gradient descent and prune out weak connections from this cell. After obtain- ing the optimal architecture of this cell, we deepen the super network by increasing the number of this cell, so as to create an adaptive network context to search for a deeper-adaptive cell in the next iteration. Thus, our LFR-DARTS can discover the architecture of each cell at a specific and adaptive network depth, which embeds the ability of layered feature representations into each cell to sufficiently extract layered features in different depths. Extensive experiments show that our algorithm achieves an advanced performance on the datasets of CIFAR10, fashionMNIST and ImageNet while at low search costs.


2021 ◽  
Vol 9 ◽  
Author(s):  
Chengzhe Yuan ◽  
Yi He ◽  
Ronghua Lin ◽  
Yong Tang

The academic social networks (ASNs) play an important role in promoting scientific collaboration and innovation in academic society. Accompanying the tremendous growth of scholarly big data, finding suitable scholars on ASNs for collaboration has become more difficult. Different from friend recommendation in conventional social networks, scholar recommendation in ASNs usually involves different academic entities (e.g., scholars, scientific publications, and status updates) and various relationships (e.g., collaboration relationship between team members, citations, and co-authorships), which forms a complex heterogeneous academic network. Our goal is to recommend potential similar scholars for users in ASNs. In this article, we propose to design a graph embedding-based scholar recommendation system by leveraging academic auxiliary information. First, we construct enhanced ASNs by integrating two types of academic features extracted from scholars’ academic information with original network topology. Then, the refined feature representations of the scholars are obtained by a graph embedding framework, which helps the system measure the similarity between scholars based on their representation vectors. Finally, the system generates potential similar scholars for users in ASNs for the final recommendation. We evaluate the effectiveness of our model on five real-world datasets: SCHOLAT, Zhihu, APS, Yelp and Gowalla. The experimental results demonstrate that our model is effective and achieves promising improvements than the other competitive baselines.


Sign in / Sign up

Export Citation Format

Share Document