Face Tracking via Content Aware Correlation Filter

Author(s):  
Houjie Li ◽  
Shuangshuang Yin ◽  
Fuming Sun ◽  
Fasheng Wang

Face tracking is an important task in many computer vision based augmented reality systems and a core problem in human-computer applications. Correlation filters (CFs) have been applied with great success to several computer vision problems, including object detection, classification, and tracking, but few CF-based methods have been proposed for face tracking. In this paper, we present a content-aware CF for face tracking. In our work, face content refers to the locality sensitive histogram based foreground feature together with learning samples extracted from the complex background, meaning that both foreground and background information are considered in constructing the face tracker. The foreground feature is introduced into the objective function, which lets the tracker learn an efficient model that adapts to variations in face appearance. To evaluate the proposed tracker, we built a dataset of 97 video sequences covering 11 challenging attributes of face tracking. Extensive experiments on this dataset demonstrate that the proposed face tracker outperforms several state-of-the-art tracking algorithms.
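To make the correlation-filter machinery concrete, here is a minimal MOSSE-style tracking sketch in the Fourier domain; it is not the paper's content-aware formulation. The foreground weight map `fg_weight`, which stands in for the locality sensitive histogram feature, and the regularization value `lam` are illustrative assumptions.

```python
# Minimal MOSSE-style correlation filter (a sketch, not the paper's method).
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def train_filter(patch, fg_weight, lam=1e-2):
    """Learn a filter H mapping the foreground-weighted patch to a Gaussian."""
    g = np.fft.fft2(gaussian_response(patch.shape))
    f = np.fft.fft2(patch * fg_weight)          # emphasise foreground pixels
    return (g * np.conj(f)) / (f * np.conj(f) + lam)

def detect(patch, H):
    """Correlate a search patch with the filter; the peak is the new position."""
    resp = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(resp), resp.shape)

# Usage with random data standing in for a grayscale face patch:
patch = np.random.rand(64, 64)
H = train_filter(patch, fg_weight=np.ones_like(patch))
print(detect(patch, H))  # approximately (32, 32) for the training patch
```

A full tracker would additionally update `H` online with a running average so the model adapts as the face appearance changes.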

2021 ◽  
Vol 13 (14) ◽  
pp. 2656
Author(s):  
Furong Shi ◽  
Tong Zhang

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during down-sampling, which results in discontinuous segmentation or inaccurate segmentation boundaries. To compensate for this loss of shape information, two shape-related auxiliary tasks (boundary prediction and distance estimation) were jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency-constraint losses were designed on top of the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on these features, one regression loss and two classification losses were used for predicting the distance-transform map, the segmentation mask, and the boundary. Two inter-task consistency-loss functions were constructed to enforce consistency between distance maps and masks, and between masks and boundary maps. Experimental results on three public aerial image datasets showed that our method achieved superior performance over recent state-of-the-art models.
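A hedged PyTorch sketch of this loss structure may help: three task losses plus two inter-task consistency terms. The soft threshold used to recover a mask from the distance map, the gradient-based edge proxy, and the weight `w_consistency` are assumptions, not the paper's exact consistency-loss definitions.

```python
# Sketch of a multi-task loss with inter-task consistency terms.
import torch
import torch.nn.functional as F

def multitask_loss(mask_logits, boundary_logits, dist_pred,
                   mask_gt, boundary_gt, dist_gt, w_consistency=0.1):
    # Three task losses: segmentation, boundary (classification), distance (regression).
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    l_bnd = F.binary_cross_entropy_with_logits(boundary_logits, boundary_gt)
    l_dist = F.mse_loss(dist_pred, dist_gt)

    # Consistency 1: a mask recovered from the predicted signed-distance map
    # (soft threshold at zero) should agree with the predicted mask.
    mask_from_dist = torch.sigmoid(dist_pred * 10.0)
    l_cons1 = F.l1_loss(mask_from_dist, torch.sigmoid(mask_logits))

    # Consistency 2: the spatial gradient magnitude of the predicted mask
    # should agree with the predicted boundary map.
    m = torch.sigmoid(mask_logits)
    gx = m[..., :, 1:] - m[..., :, :-1]
    gy = m[..., 1:, :] - m[..., :-1, :]
    edge = F.pad(gx.abs(), (0, 1)) + F.pad(gy.abs(), (0, 0, 0, 1))
    l_cons2 = F.l1_loss(edge, torch.sigmoid(boundary_logits))

    return l_mask + l_bnd + l_dist + w_consistency * (l_cons1 + l_cons2)
```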


2019 ◽  
Vol 2019 ◽  
pp. 1-21 ◽  
Author(s):  
Naeem Ratyal ◽  
Imtiaz Ahmad Taj ◽  
Muhammad Sajid ◽  
Anzar Mahmood ◽  
Sohail Razzaq ◽  
...  

Face recognition aims to establish the identity of a person from facial characteristics and is a challenging problem due to the complex nature of the facial manifold. A wide range of face recognition applications are based on classification techniques, in which a class label is assigned to a test image of unknown identity. In this paper, a pose-invariant, deeply learned, multiview 3D face recognition approach is proposed that addresses two problems: face alignment and face recognition in both identification and verification setups. The proposed alignment algorithm handles frontal as well as profile face images. It employs a nose tip heuristic based pose learning approach to estimate the acquisition pose of the face, followed by coarse-to-fine nose tip alignment using L2 norm minimization. The whole face is then aligned through a transformation using knowledge learned from the nose tip alignment. Inspired by the intrinsic symmetry of the Left Half Face (LHF) and Right Half Face (RHF), deeply learned Multi-View Average Half Face (d-MVAHF) features are employed for face identification using a deep convolutional neural network (dCNN). For face verification, a d-MVAHF-Support Vector Machine (d-MVAHF-SVM) approach is employed. The performance of the proposed methodology is demonstrated through extensive experiments on four databases: GavabDB, Bosphorus, UMB-DB, and FRGC v2.0. The results show that the proposed approach yields superior performance compared to existing state-of-the-art methods.
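The half-face averaging idea is easy to illustrate. The sketch below, assuming a face image already aligned on its vertical midline (and of even width), averages the left half with the mirrored right half; the paper's d-MVAHF pipeline then feeds such representations, rendered from multiple views, to a dCNN.

```python
# Sketch of the "average half face" idea behind d-MVAHF.
import numpy as np

def average_half_face(aligned_face):
    """aligned_face: HxW array, face centred on the vertical midline (W even)."""
    h, w = aligned_face.shape
    left = aligned_face[:, : w // 2]
    right_mirrored = aligned_face[:, w // 2:][:, ::-1]  # mirror the right half
    return 0.5 * (left + right_mirrored)                # H x W/2 average half face

face = np.random.rand(128, 128)  # stands in for an aligned range/intensity image
print(average_half_face(face).shape)  # (128, 64)
```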


2018 ◽  
Vol 10 (12) ◽  
pp. 1934 ◽  
Author(s):  
Bao-Di Liu ◽  
Wen-Yang Xie ◽  
Jie Meng ◽  
Ye Li ◽  
Yanjiang Wang

In recent years, the collaborative representation-based classification (CRC) method has achieved great success in visual recognition by directly using training images as dictionary bases. However, it describes a test sample with all training samples to extract shared attributes, and does not consider representing the test sample with the training samples of a specific class to extract class-specific attributes. For remote-sensing images, both the shared and the class-specific attributes are important for classification. In this paper, we propose a hybrid collaborative representation-based classification approach. The proposed method improves the classification of remote-sensing images by embedding class-specific collaborative representation into conventional collaborative representation-based classification. Moreover, we extend the proposed method to arbitrary kernel spaces to exploit the nonlinear characteristics hidden in remote-sensing image features and further enhance classification performance. Extensive experiments on several benchmark remote-sensing image datasets clearly demonstrate the superior performance of the proposed algorithm over state-of-the-art approaches.
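For readers unfamiliar with the baseline, here is a minimal sketch of conventional CRC, which the hybrid method extends; the class-specific representation term and the kernel extension are omitted, and the regularization value is an assumption.

```python
# Conventional CRC: code the test sample over ALL training samples with
# ridge regularisation, then classify by per-class reconstruction residual.
import numpy as np

def crc_classify(X, labels, y, lam=1e-3):
    """X: d x n matrix of training samples (columns); y: d-dim test sample."""
    n = X.shape[1]
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    residuals = {}
    for c in np.unique(labels):
        idx = labels == c
        residuals[c] = np.linalg.norm(y - X[:, idx] @ alpha[idx])
    return min(residuals, key=residuals.get)

X = np.random.rand(100, 30)              # 30 training samples, 10 per class
labels = np.repeat(np.arange(3), 10)
print(crc_classify(X, labels, X[:, 0]))  # should recover class 0
```

The hybrid approach adds a second, class-wise coding term to this shared coding, and the kernel variant replaces the inner products with kernel evaluations.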


Author(s):  
Kamal Naina Soni

Abstract: Human expressions play an important role in extracting an individual's emotional state. They help determine a person's current state and mood from various features of the face, such as the eyes, cheeks, forehead, or even the curve of the smile. Surveys confirm that people use music as a form of expression and often relate a particular piece of music to their emotions. Considering how music affects the human brain and body, our project extracts the user's facial expressions and features to determine the user's current mood. Once the emotion is detected, a playlist of songs suited to that mood is presented to the user. This can help lift the user's mood or simply calm them, and it also surfaces a fitting song more quickly, saving the time otherwise spent browsing. The result is software that can be used anywhere and plays music according to the detected emotion. Keywords: Music, Emotion recognition, Categorization, Recommendations, Computer vision, Camera
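A minimal sketch of the described pipeline follows, with `detect_emotion` as a placeholder for any facial-expression classifier (the abstract does not specify one); the emotion labels and playlist names are illustrative assumptions.

```python
# Emotion-to-playlist pipeline sketch: detect the user's facial emotion
# from a camera frame, then pick a playlist for that mood.
import random

PLAYLISTS = {
    "happy": ["Upbeat Mix", "Feel-Good Pop"],
    "sad": ["Calm Acoustic", "Gentle Piano"],
    "angry": ["Cool-Down Ambient"],
    "neutral": ["Daily Mix"],
}

def detect_emotion(frame) -> str:
    """Placeholder: a real system would run a CNN or landmark-based classifier."""
    return random.choice(list(PLAYLISTS))

def recommend(frame):
    emotion = detect_emotion(frame)
    return emotion, random.choice(PLAYLISTS[emotion])

print(recommend(frame=None))  # e.g. ('happy', 'Upbeat Mix')
```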


Author(s):  
Sally-Ann Treharne

The Falklands War between Britain and Argentina from April to June 1982 was an emotive political and ideological issue for the UK and its Prime Minister, who fought tirelessly to safeguard the Falkland islanders’ right to self-determination. The war represented a considerable financial and moral commitment by the British to the Falkland Islands and their 1,800 inhabitants at a time of significant economic uncertainty in the UK. Notwithstanding this, Britain’s hegemony and influence over the islands were reasserted in the face of perceived Argentine aggression. Britain’s victory was considered a great success in the UK given the strategic difficulties involved in orchestrating a war in a wind-swept archipelago nearly 8,000 miles from the British mainland but a mere 400 miles from Argentina. Moreover, it helped to secure Thatcher’s re-election the following year and was a source of national pride for the jubilant British public.


Author(s):  
Chamin Morikawa ◽  
Michael J. Lyons

Interaction methods based on computer vision hold the potential to become the next powerful technology supporting breakthroughs in the field of human-computer interaction. Non-invasive vision-based techniques permit unconventional interaction methods, including the use of face and head movements for intentional gestural control of computer systems. Facial gesture interfaces open new possibilities for assistive input technologies. This chapter gives an overview of research aimed at developing vision-based head- and face-tracking interfaces, work with important implications for future assistive input devices. To illustrate this concretely, the authors describe their own research, in which they developed two vision-based facial feature tracking algorithms for human-computer interaction and assistive input. Evaluation forms a critical component of this research, and the authors provide examples of new quantitative evaluation tasks as well as the use of model real-world applications for the qualitative evaluation of new interaction styles.


2020 ◽  
Vol 128 (10-11) ◽  
pp. 2665-2683 ◽  
Author(s):  
Grigorios G. Chrysos ◽  
Jean Kossaifi ◽  
Stefanos Zafeiriou

Abstract: Conditional image generation lies at the heart of computer vision, and conditional generative adversarial networks (cGANs) have recently become the method of choice for this task owing to their superior performance. The focus so far has largely been on performance improvement, with little effort devoted to making cGANs more robust to noise. However, the regression (of the generator) might lead to arbitrarily large errors in the output, which makes cGANs unreliable for real-world applications. In this work, we introduce a novel conditional GAN model, called RoCGAN, which leverages structure in the target space of the model to address this issue. Specifically, we augment the generator with an unsupervised pathway, which encourages the outputs of the generator to span the target manifold even in the presence of intense noise. We prove that RoCGAN shares similar theoretical properties with GANs and establish the merits of our model on both synthetic and real data. We perform a thorough experimental validation on large-scale datasets of natural scenes and faces and observe that our model outperforms existing cGAN architectures by a large margin. We also empirically demonstrate the performance of our approach in the face of two types of noise (adversarial and Bernoulli).
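The two-pathway generator is the core architectural idea and can be sketched briefly. The hedged PyTorch toy model below shows a regression pathway and an unsupervised autoencoder pathway sharing one decoder; layer sizes and depths are illustrative assumptions, not the paper's architecture.

```python
# RoCGAN-style generator sketch: two encoders, one SHARED decoder, so
# generator outputs are pushed onto the target manifold.
import torch
import torch.nn as nn

class TwoPathwayGenerator(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.enc_reg = nn.Sequential(nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU())  # source -> latent
        self.enc_ae = nn.Sequential(nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU())   # target -> latent
        self.dec = nn.Sequential(nn.ConvTranspose2d(dim, 3, 4, 2, 1), nn.Tanh())  # shared

    def forward(self, x_src, x_tgt):
        y_fake = self.dec(self.enc_reg(x_src))  # regression pathway output
        y_rec = self.dec(self.enc_ae(x_tgt))    # unsupervised reconstruction
        return y_fake, y_rec

G = TwoPathwayGenerator()
x_src, x_tgt = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
y_fake, y_rec = G(x_src, x_tgt)
# Training would add an adversarial loss on y_fake and a reconstruction
# loss ||y_rec - x_tgt|| that regularises the shared decoder.
print(y_fake.shape, y_rec.shape)  # torch.Size([2, 3, 64, 64]) twice
```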


2020 ◽  
Vol 10 (18) ◽  
pp. 6227
Author(s):  
Ebenezer Nii Ayi Hammond ◽  
Shijie Zhou ◽  
Hongrong Cheng ◽  
Qihe Liu

Facial age estimation is of interest due to its potential applications in many real-life situations. However, recent age estimation efforts do not consider juveniles. We therefore introduce a juvenile age detection scheme called LaGMO, which focuses on the juvenile aging cues of facial shape and appearance. LaGMO combines facial landmark points with Term Frequency Inverse Gravity Moment (TF-IGM) weighting. Inspired by the formation of words from morphemes, we obtain facial appearance features comprising facial shape and wrinkle texture and represent them as terms describing the age of the face. Leveraging the implicit ordinal relationship between the frequencies of these terms in a face, TF-IGM is used to compute the weights of the terms. From these weights, we build a reference matrix of the probabilities that a face belongs to each age. We then reduce the reference matrix to the juvenile age range (0–17 years), avoiding an exhaustive search through the entire training set. LaGMO detects the age by projecting an unlabeled face image onto the reference matrix; the largest projection value indicates the age to which the image most probably belongs. With a Mean Absolute Error (MAE) of 89% on the Face and Gesture Recognition Research Network (FG-NET) dataset, our proposal demonstrated superior performance in juvenile age estimation.
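TF-IGM itself is a known term-weighting scheme from text classification, and a small sketch shows the weighting behaviour the abstract relies on. The adaptation of class frequencies to age classes and the value of `lam` are assumptions here.

```python
# Term Frequency - Inverse Gravity Moment (TF-IGM) weighting sketch.
import numpy as np

def tf_igm(tf, class_freqs, lam=7.0):
    """tf: raw term frequency; class_freqs: the term's frequency in each class."""
    f = np.sort(np.asarray(class_freqs, dtype=float))[::-1]  # descending
    ranks = np.arange(1, len(f) + 1)
    igm = f[0] / np.sum(f * ranks)   # inverse gravity moment
    return tf * (1.0 + lam * igm)

# A term concentrated in one class (age) gets a high weight ...
print(tf_igm(3, [10, 0, 0, 0]))  # igm = 1.0 -> weight 24.0
# ... while one spread evenly over classes is down-weighted.
print(tf_igm(3, [3, 3, 3, 3]))   # igm = 0.1 -> weight 5.1
```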


2011 ◽  
pp. 5-44 ◽  
Author(s):  
Daijin Kim ◽  
Jaewon Sung

Face detection is the most fundamental step in research on image-based automated face analysis, such as face tracking, face recognition, face authentication, facial expression recognition, and facial gesture recognition. When a novel face image is given, we must know where the face is located and how large its scale is in order to limit our attention to the face patch in the image and to normalize the patch's scale and orientation. Usually, face detection results are not stable: the scale of the detected face rectangle can be larger or smaller than that of the real face in the image. Therefore, many researchers use eye detectors to obtain stably normalized face images. Because the eyes form salient patterns in the human face, they can be located reliably and used for face image normalization. Eye detection becomes even more important when we want to apply model-based face image analysis approaches.
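A minimal sketch of this detect-then-normalize pipeline follows, using OpenCV's stock Haar cascades; the rotation from the two eye centres is the standard normalization recipe, not necessarily the chapter's exact method.

```python
# Face detection, eye detection, and eye-based rotation normalization.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def normalized_face(gray):
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi)
    if len(eyes) < 2:
        return None
    # Eye centres, left-to-right; rotate so the inter-eye line is horizontal.
    (ex1, ey1, ew1, eh1), (ex2, ey2, ew2, eh2) = sorted(eyes[:2], key=lambda e: e[0])
    c1 = (ex1 + ew1 / 2, ey1 + eh1 / 2)
    c2 = (ex2 + ew2 / 2, ey2 + eh2 / 2)
    angle = np.degrees(np.arctan2(c2[1] - c1[1], c2[0] - c1[0]))
    M = cv2.getRotationMatrix2D(c1, angle, 1.0)
    return cv2.warpAffine(roi, M, (w, h))
```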


2020 ◽  
Vol 12 (10) ◽  
pp. 1660 ◽  
Author(s):  
Qiang Li ◽  
Qi Wang ◽  
Xuelong Li

Deep learning-based hyperspectral image super-resolution (SR) methods have achieved great success recently. However, previous works suffer from two main problems. One is the use of full three-dimensional convolutions, which inflates the number of network parameters. The other is that they pay too little attention to mining the spatial information of hyperspectral images while extracting the spectral information. To address these issues, we propose a mixed convolutional network (MCNet) for hyperspectral image super-resolution. We design a novel mixed convolutional module (MCM) that extracts potential features with both 2D and 3D convolutions rather than a single type of convolution, enabling the network to mine the spatial features of hyperspectral images more thoroughly. To exploit effective features from the 2D units, we design a local feature fusion that adaptively analyzes all hierarchical features in the 2D units. In the 3D unit, we employ spatially and spectrally separable 3D convolutions to extract spatial and spectral information, which reduces otherwise unaffordable memory usage and training time. Extensive evaluations and comparisons on three benchmark datasets demonstrate that the proposed approach achieves superior performance compared to existing state-of-the-art methods.
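The separable 3D convolution is simple to sketch. In the hedged PyTorch toy module below, a 1x3x3 kernel covers the spatial dimensions and a 3x1x1 kernel the spectral one, so a pair costs 9 + 3 = 12 weights per channel pair instead of 27 for a full 3x3x3 kernel; channel counts are illustrative assumptions, and the full MCM block adds 2D units and feature fusion.

```python
# Spatial/spectral separable 3D convolution sketch for hyperspectral cubes.
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Spatial: 1x3x3 kernel (no mixing across bands).
        self.spatial = nn.Conv3d(ch, ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Spectral: 3x1x1 kernel (mixing across bands only).
        self.spectral = nn.Conv3d(ch, ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):                    # x: (B, C, bands, H, W)
        return torch.relu(self.spectral(torch.relu(self.spatial(x))))

x = torch.randn(1, 16, 31, 32, 32)          # 31-band hyperspectral feature cube
print(SeparableConv3d(16)(x).shape)         # torch.Size([1, 16, 31, 32, 32])
```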

