Progressive Multi-Scale Vision Transformer for Facial Action Unit Detection

2022 ◽  
Vol 15 ◽  
Author(s):  
Chongwen Wang ◽  
Zicheng Wang

Facial action unit (AU) detection is an important task in affective computing and has attracted extensive attention in computer vision and artificial intelligence. Previous studies of AU detection usually encode complex regional feature representations with manually defined facial landmarks and model the relationships among AUs via graph neural networks. Although some progress has been achieved, it remains difficult for existing methods to capture the exclusive and concurrent relationships among different combinations of facial AUs. To circumvent this issue, we propose a new progressive multi-scale vision transformer (PMVT) that captures the complex relationships among AUs across a wide range of expressions in a data-driven fashion. PMVT is based on a multi-scale self-attention mechanism that can flexibly attend to a sequence of image patches to encode the critical cues for AUs. Compared with previous AU detection methods, the benefits of PMVT are two-fold: (i) PMVT does not rely on manually defined facial landmarks to extract regional representations, and (ii) PMVT encodes facial regions with adaptive receptive fields, thus facilitating flexible representation of different AUs. Experimental results show that PMVT improves AU detection accuracy on the popular BP4D and DISFA datasets and obtains consistent improvements over other state-of-the-art AU detection methods. Visualization results show that PMVT automatically perceives the discriminative facial regions needed for robust AU detection.
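To make the core mechanism concrete, the sketch below is a minimal pure-Python illustration (an assumption of mine, not the paper's implementation) of the two ingredients the abstract names: splitting an image grid into patch tokens at two scales, and letting tokens attend to one another with plain scaled dot-product self-attention. The helpers `image_to_patches` and `self_attention`, and all sizes, are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of patch embeddings.
    Every token attends to every other token, so cues for co-occurring AUs
    in distant facial regions can influence one another."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

def image_to_patches(image, patch):
    """Split a 2D grid into non-overlapping, flattened patch tokens."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens

# Multi-scale: the same image yields a fine and a coarse token sequence.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
fine = image_to_patches(image, 1)    # 16 tokens of dimension 1
coarse = image_to_patches(image, 2)  # 4 tokens of dimension 4
attended = self_attention(coarse)
```

The adaptive receptive fields the abstract mentions correspond to attending over token sequences built at different patch sizes.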

Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4222
Author(s):  
Shushi Namba ◽  
Wataru Sato ◽  
Masaki Osumi ◽  
Koh Shimokawa

In the field of affective computing, accurate automatic detection of facial movements is an important issue, and great progress has already been made. However, a systematic evaluation of these systems against dynamic facial databases remains an unmet need. This study compared the performance of three systems (FaceReader, OpenFace, AFARtoolbox) that detect the facial movements corresponding to action units (AUs) derived from the Facial Action Coding System. All three systems detected the presence of AUs in the dynamic facial database at above-chance levels. Moreover, OpenFace and AFAR achieved higher area under the receiver operating characteristic curve (AUC) values than FaceReader. In addition, several confusion biases between facial components (e.g., AU12 and AU14) were observed for each automated AU detection system, and the static mode was superior to the dynamic mode for analyzing the posed facial database. These findings characterize the prediction patterns of each system and provide guidance for research on facial expressions.
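The AUC metric used to compare the three systems has a simple rank-based definition that can be computed directly; the sketch below (illustrative, not taken from the study) implements it in pure Python via the Mann-Whitney statistic.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic.
    AUC is the probability that a randomly chosen positive frame receives a
    higher detector score than a randomly chosen negative frame; 0.5 is
    chance-level AU detection, the baseline all three systems exceeded."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, a detector that scores every AU-present frame above every AU-absent frame gets an AUC of 1.0, and a constant-score detector gets 0.5.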


Author(s):  
Guanbin Li ◽  
Xin Zhu ◽  
Yirui Zeng ◽  
Qing Wang ◽  
Liang Lin

Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the field of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs into a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. As the learned features involve both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
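A single, heavily simplified propagation step over such an AU relation graph might look like the sketch below (my illustrative assumption; a real GGNN replaces this plain blend with a GRU-style gated update). Each AU node mixes its own feature with the mean of its neighbours' features, so AUs linked by co-occurrence edges reinforce each other's representations.

```python
def propagate(features, adjacency, alpha=0.5):
    """One simplified message-passing step over an AU relation graph.
    features: one feature vector per AU node; adjacency[i][j] is truthy
    when AU i and AU j are related (e.g., frequently co-occur)."""
    n = len(features)
    out = []
    for i in range(n):
        neigh = [features[j] for j in range(n) if adjacency[i][j]]
        if not neigh:
            out.append(features[i][:])  # isolated node keeps its feature
            continue
        d = len(features[i])
        mean = [sum(v[k] for v in neigh) / len(neigh) for k in range(d)]
        out.append([(1 - alpha) * features[i][k] + alpha * mean[k]
                    for k in range(d)])
    return out

# Three AU nodes: 0 and 1 are related, 2 is isolated.
feats = [[1.0], [0.0], [0.0]]
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
updated = propagate(feats, adj)
```

After one step, the two related nodes have pulled toward each other while the isolated node is unchanged.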


2019 ◽  
Vol 11 (2) ◽  
pp. 142 ◽  
Author(s):  
Wenping Ma ◽  
Hui Yang ◽  
Yue Wu ◽  
Yunta Xiong ◽  
Tao Hu ◽  
...  

In this paper, a novel change detection approach based on multi-grained cascade forest (gcForest) and multi-scale fusion is proposed for synthetic aperture radar (SAR) images. It detects the changed and unchanged areas of the images using a well-trained gcForest. Most existing change detection methods need to select an appropriate image-block size; however, a single block size provides only part of the local information, and gcForest cannot then fully exploit its representation learning ability. Therefore, the proposed approach feeds image blocks of different sizes into gcForest, which allows it to learn more image characteristics and reduces the influence of purely local information on the classification result. In addition, to improve the detection accuracy for pixels whose gray values change abruptly, the proposed approach combines the gradient information of the difference image with the probability map obtained from the well-trained gcForest. Extracting the image gradient information thus enhances edge information and improves the accuracy of edge detection. Experiments on four data sets indicate that the proposed approach outperforms other state-of-the-art algorithms.
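The fusion idea, combining per-scale classifier probabilities with gradient evidence from the difference image, can be sketched as below. The function name, the averaging over scales, and the weighting `beta` are all illustrative assumptions, not the paper's exact formulation.

```python
def fuse_change_maps(prob_maps, gradient_map, beta=0.3):
    """Fuse per-scale change-probability maps with edge evidence.
    prob_maps: one change-probability map per image-block size (as a
    classifier such as gcForest might produce); gradient_map: normalised
    gradient magnitude of the difference image, emphasising abrupt
    gray-value changes at edges."""
    h, w = len(prob_maps[0]), len(prob_maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            mean_p = sum(m[r][c] for m in prob_maps) / len(prob_maps)
            fused[r][c] = min(1.0,
                              (1 - beta) * mean_p + beta * gradient_map[r][c])
    return fused

# Two scales' probability maps over a 1x2 image, plus a gradient map
# that is strong only at the second pixel.
fused = fuse_change_maps([[[0.2, 0.8]], [[0.4, 0.6]]], [[0.0, 1.0]])
```

The gradient term boosts pixels at sharp edges that any single block size might smooth over.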


2018 ◽  
Vol 10 (12) ◽  
pp. 1987 ◽  
Author(s):  
Rocío Ramos-Bernal ◽  
René Vázquez-Jiménez ◽  
Raúl Romero-Calcerrada ◽  
Patricia Arrogante-Funes ◽  
Carlos Novillo

Natural hazards include a wide range of high-impact phenomena that affect socioeconomic and natural systems. Landslides are a natural hazard whose destructive power has caused a significant number of victims and substantial damage around the world. Remote sensing provides many data types and techniques that can be applied to monitor their effects through landslide inventory maps. Three unsupervised change detection methods were applied to Advanced Spaceborne Thermal Emission and Reflection Radiometer (Aster)-derived images from an area prone to landslides in the south of Mexico. Linear Regression (LR), Chi-Square Transformation, and Change Vector Analysis were applied to the principal component and the Normalized Difference Vegetation Index (NDVI) data to obtain the difference image of change. Thresholding was performed on the change histogram using two approaches: statistical parameters and the secant method. Following previous works, a slope mask was used to classify pixels as landslide/no-landslide, a cloud mask was used to eliminate false positives, and, finally, landslides smaller than 450 m2 (two Aster pixels) were discarded. To assess the landslide detection accuracy, 617 polygons (35,017 pixels) were sampled, classified as real landslide/no-landslide, and defined as ground truth according to the interpretation of color aerial photo slides, from which omission/commission errors and the Kappa coefficient of agreement were obtained. The results showed that LR using NDVI data yields the best landslide detection. Change detection is a suitable technique for landslide mapping, and we think it can be replicated in other parts of the world with results similar to those obtained in the present work.
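The statistical-parameter thresholding mentioned above can be illustrated with a common mean-plus-k-standard-deviations rule; the sketch below is an assumption of that general form (the study's calibrated parameters are not reproduced here).

```python
import math

def threshold_change(diff, k=1.5):
    """Flag a pixel as changed when its difference-image value exceeds
    mean + k * std of the change histogram. k=1.5 is an illustrative
    choice, not the study's calibrated value."""
    values = [v for row in diff for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    t = mean + k * std
    return [[1 if v > t else 0 for v in row] for row in diff]

# A 3x3 difference image with one strongly changed pixel in the centre.
mask = threshold_change([[0.0, 0.0, 0.0],
                         [0.0, 10.0, 0.0],
                         [0.0, 0.0, 0.0]])
```

Only values far in the upper tail of the change histogram survive, which is the intent of the statistical-parameter approach.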


2021 ◽  
Author(s):  
Hung-Hao Chen ◽  
Chia-Hung Wang ◽  
Hsueh-Wei Chen ◽  
Pei-Yung Hsiao ◽  
Li-Chen Fu ◽  
...  

Current fusion-based methods transform LiDAR data into bird's eye view (BEV) representations or 3D voxels, leading to information loss and the heavy computational cost of 3D convolution. In contrast, we directly consume raw point clouds and perform fusion between the two modalities. We employ the concept of a region proposal network to generate proposals from the two streams, respectively. To let the two sensors compensate for each other's weaknesses, we utilize the calibration parameters to project proposals from one stream onto the other. With the proposed multi-scale feature aggregation module, we combine the extracted region-of-interest-level (RoI-level) features of the RGB stream from different receptive fields, enriching the feature representation. Experiments on the KITTI dataset show that our proposed network outperforms other fusion-based 3D object detection methods with meaningful improvements under challenging settings.
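Projecting a proposal from the LiDAR stream onto the image stream uses the calibration parameters as a standard homogeneous projection; the sketch below shows the arithmetic for one 3D point with an illustrative 3x4 projection matrix `P` (not KITTI's actual calibration).

```python
def project_point(P, xyz):
    """Project a 3D point into the image plane with a 3x4 calibration
    matrix P, in homogeneous coordinates: [u, v, w]^T = P [x, y, z, 1]^T,
    then divide by w to get pixel coordinates."""
    x, y, z = xyz
    u = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
    v = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
    w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
    return (u / w, v / w)

# A trivial pinhole-like matrix: depth z ends up as the divisor.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
uv = project_point(P, (2.0, 4.0, 2.0))
```

Projecting the corners of a 3D proposal this way yields the corresponding 2D region in the RGB stream.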


2019 ◽  
Vol 11 (5) ◽  
pp. 531 ◽  
Author(s):  
Yuanyuan Wang ◽  
Chao Wang ◽  
Hong Zhang ◽  
Yingbo Dong ◽  
Sisi Wei

Independent of daylight and weather conditions, synthetic aperture radar (SAR) imagery is widely applied to ship detection in marine surveillance. Ships appear at multiple scales in SAR imagery owing to multi-resolution imaging modes and the ships' varied sizes. Conventional ship detection methods depend heavily on statistical models of sea clutter or on hand-crafted features, and their robustness needs to be strengthened. To overcome this obstacle, we apply the RetinaNet object detector, a deep learning model that learns representations automatically. First, feature pyramid networks (FPN) are used to extract multi-scale features for both ship classification and localization. Then, focal loss is used to address the class imbalance and to increase the weight of hard examples during training. We use 86 scenes of Chinese Gaofen-3 imagery at four resolutions, i.e., 3 m, 5 m, 8 m, and 10 m, to evaluate our approach, and two Gaofen-3 images and one Constellation of Small Satellites for Mediterranean basin Observation (Cosmo-SkyMed) image to evaluate its robustness. The experimental results reveal that (1) RetinaNet not only can efficiently detect multi-scale ships but also has high detection accuracy, and (2) compared with other object detectors, RetinaNet achieves more than 96% mean average precision (mAP). These results demonstrate the effectiveness of our proposed method.
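The focal loss that RetinaNet uses for the class-imbalance problem has a compact closed form; the sketch below implements the standard binary version for a single prediction (standard formula, not code from this paper).

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    The (1 - p_t)^gamma factor down-weights easy, well-classified examples
    (e.g., obvious sea clutter), so training concentrates on hard ones.
    With gamma=0 and alpha=1 it reduces to plain cross-entropy."""
    pt = p if y == 1 else 1.0 - p
    at = alpha if y == 1 else 1.0 - alpha
    return -at * (1.0 - pt) ** gamma * math.log(pt)
```

An easy positive (p = 0.9) therefore contributes far less loss than a hard positive (p = 0.5), which is exactly the re-weighting effect the abstract refers to.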


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5125
Author(s):  
Pengcheng Xu ◽  
Zhongyuan Guo ◽  
Lei Liang ◽  
Xiaohang Xu

In the field of surface defect detection, the scale differences among product surface defects are often huge. Existing defect detection methods based on Convolutional Neural Networks (CNNs) are more inclined to express macro, abstract features, and their ability to express local, small defects is insufficient, resulting in an imbalance of feature expression capability. In this paper, a Multi-Scale Feature Learning Network (MSF-Net) based on a Dual Module Feature (DMF) extractor is proposed. The DMF extractor is mainly composed of optimized Concatenated Rectified Linear Units (CReLUs) and optimized Inception feature extraction modules, which increase the diversity of feature receptive fields while reducing the amount of computation. Middle-layer feature maps with different receptive field sizes are merged to increase the richness of the receptive fields in the last layer of feature maps, and residual shortcut connections, batch normalization, and an average pooling layer replacing the fully connected layer improve training efficiency while making the multi-scale feature learning ability more balanced. Experiments on two representative multi-scale defect data sets verify the effectiveness of the proposed MSF-Net in detecting surface defects with multi-scale features.
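The CReLU building block mentioned above has a very small definition: it keeps both the positive and the negated-then-rectified response of each filter, doubling the output channels from one set of weights. The sketch below shows the standard operation on a plain list of activations (illustrative, not this paper's optimized variant).

```python
def crelu(xs):
    """Concatenated ReLU: concat(ReLU(x), ReLU(-x)).
    Both phases of every filter response are preserved, so one set of
    filters yields twice the channels at little extra cost, which is how
    CReLU adds receptive-field diversity cheaply."""
    return [max(x, 0.0) for x in xs] + [max(-x, 0.0) for x in xs]
```

For a pre-activation vector `[1.0, -2.0]`, the positive half keeps the 1.0 and the negated half recovers the 2.0 that a plain ReLU would have discarded.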


2019 ◽  
Vol 16 (4) ◽  
pp. 172988141987067
Author(s):  
Enze Yang ◽  
Linlin Huang ◽  
Jian Hu

Vehicle detection is involved in a wide range of intelligent transportation and smart city applications, and the demand for fast, accurate detection of vehicles is increasing. In this article, we propose a convolutional neural network-based framework, called the separable reverse connected network, for multi-scale vehicle detection. In this network, the reverse connected structure enriches the semantic context information of earlier layers, while separable convolution is introduced for sparse representation of the heavy feature maps generated by the subnetworks. Further, we use a multi-scale training scheme, online hard example mining, and model compression to accelerate training and reduce the number of parameters. Experimental results on Pascal Visual Object Classes (VOC) 2007 + 2012 and MicroSoft Common Objects in COntext (MS COCO) 2014 demonstrate that the proposed method yields state-of-the-art performance. Moreover, with separable convolution and model compression, the two-stage detector is accelerated by about two times with little loss of detection accuracy.
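The parameter saving behind separable convolution is easy to quantify: a depthwise-separable layer replaces one k x k convolution over all channel pairs with per-channel k x k filters plus a 1x1 pointwise projection. The arithmetic below is the standard count (biases omitted), shown with illustrative layer sizes rather than this network's actual configuration.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution: k*k*c_in*c_out."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k filters (k*k*c_in) plus a 1x1 pointwise
    projection (c_in*c_out)."""
    return k * k * c_in + c_in * c_out

# Example layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = conv_params(3, 64, 128)          # 73,728 weights
separable = separable_conv_params(3, 64, 128)  # 8,768 weights
```

For this layer the separable form uses roughly 8x fewer weights, which is the kind of saving that, combined with model compression, yields the roughly two-times speed-up reported above.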


2017 ◽  
Vol 10 (1) ◽  
pp. 199-208 ◽  
Author(s):  
Hsu-Yung Cheng ◽  
Chih-Lung Lin

Abstract. Cloud detection is important for providing necessary information, such as cloud cover, in many applications. Existing cloud detection methods include red-to-blue ratio thresholding and other classification-based techniques. In this paper, we propose to perform cloud detection using supervised learning techniques with multi-resolution features. One of the major contributions of this work is that the features are extracted from local image patches of different sizes to include local structure and multi-resolution information. The cloud models are learned through the training process. We consider classifiers including random forest, support vector machine, and Bayesian classifier. To take advantage of the clues provided by multiple classifiers and various patch sizes, we employ a voting scheme to combine their results and further increase the detection accuracy. In the experiments, we show that the proposed method distinguishes cloud and non-cloud pixels more accurately than existing works.
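The voting scheme that combines the classifiers and patch sizes can be sketched as a simple per-pixel majority vote (an illustrative assumption; the paper's exact combination rule may weight votes differently).

```python
from collections import Counter

def vote(predictions):
    """Majority vote over the decisions of each (classifier, patch-size)
    pair for one pixel. predictions: labels such as 'cloud'/'non-cloud';
    ties resolve to the first-seen label."""
    return Counter(predictions).most_common(1)[0][0]

# Three voters: random forest and SVM on small patches say cloud,
# the Bayesian classifier on a large patch disagrees.
label = vote(['cloud', 'cloud', 'non-cloud'])
```

Combining many weakly correlated voters this way tends to suppress the individual classifiers' patch-size-specific mistakes.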


2013 ◽  
Vol 734-737 ◽  
pp. 2815-2818
Author(s):  
Hui Liu ◽  
Chun Xian Gao ◽  
Xing Hao Ding ◽  
Zhe Zeng

Owing to their high mobility and wide monitoring range, vehicle detection and tracking systems based on airborne mobile platforms are becoming central to investigation and monitoring. Self-motion of the camera and external interference on low-altitude platforms destabilize the acquired video and hinder the correct detection of moving targets and subsequent analysis. Given the characteristics of low-altitude video, an image stabilization algorithm based on SURF combined with the normal vector of the optical flow is proposed to support moving vehicle detection in low-altitude video. The experimental results show that (1) compared with other moving vehicle detection methods, the proposed method achieves better detection efficiency and accuracy, and (2) in complex scenes, it can effectively detect moving vehicles. The experiments show that this method has theoretical and practical value for moving target detection in airborne video.

