TFW: Annotated Thermal Faces in the Wild Dataset

Author(s):  
Askat Kuzdeuov ◽  
Dana Aubakirova ◽  
Darina Koishigarina ◽  
Hüseyin Atakan Varol

Face detection and localization of facial landmarks are the primary steps in building many face applications in computer vision. Numerous algorithms and benchmark datasets have been proposed to develop accurate face and facial landmark detection models in the visual domain. However, varying illumination conditions still pose challenging problems. Thermal cameras can address this problem because they operate at longer wavelengths. However, thermal face detection and localization of facial landmarks in the wild have been overlooked, mainly because most existing thermal face datasets were collected in controlled environments. In addition, many of them contain no annotations of face bounding boxes and facial landmarks. In this work, we present a thermal face dataset with manually labeled bounding boxes and facial landmarks to address these problems. The dataset contains 9,202 images of 145 subjects, collected in both controlled and wild conditions. As a baseline, we trained the YOLOv5 object detection model and its adaptation for face detection, YOLO5Face, on our dataset. To show the efficacy of our dataset, we evaluated these models on the RWTH-Aachen thermal face dataset in addition to our test set. We have made the dataset, source code, and pretrained models publicly available at https://github.com/IS2AI/TFW to bolster research in thermal face analysis.
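
As a rough illustration of the baseline pipeline, the sketch below runs a YOLOv5-family detector on a thermal image via the Ultralytics torch.hub interface; the checkpoint filename is hypothetical and merely stands in for weights trained on the TFW dataset (the actual pretrained models are in the linked repository).

```python
# Minimal sketch: running a YOLOv5 detector fine-tuned on thermal faces.
# Assumes a checkpoint file (path is hypothetical) produced by a training
# recipe like the one described above; requires torch and the ultralytics/yolov5 hub code.
import torch

# 'tfw_yolov5.pt' is a placeholder name for weights trained on the TFW dataset.
model = torch.hub.load("ultralytics/yolov5", "custom", path="tfw_yolov5.pt")

# Run inference on a single thermal image (8-bit grayscale or RGB-mapped).
results = model("thermal_face.jpg")
results.print()          # summary: number of faces, confidence scores
boxes = results.xyxy[0]  # tensor of [x1, y1, x2, y2, conf, class] per detected face
```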


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5360
Author(s):  
Taehyung Kim ◽  
Jiwon Mok ◽  
Euichul Lee

For accurate and fast detection of facial landmarks, we propose a new facial landmark detection method. Previous facial landmark detection models generally perform a face detection step before landmark detection, so landmark detection performance depends heavily on which face detection model is used. Therefore, we propose a model that simultaneously detects a face region and its landmarks without a separate face detection step. The proposed single-shot detection model is based on the framework of YOLOv3, a one-stage object detection method, and the loss function and structure are altered to learn faces and landmarks at the same time. In addition, EfficientNet-B0 was utilized as the backbone network to increase processing speed and accuracy. The model was trained on the 300W-LP database with 64 facial landmarks. The average normalized error of the proposed model was 2.32 pixels, the processing time per frame was about 15 milliseconds, and the average precision of face detection was about 99%. The evaluation confirmed that the single-shot detection model outperforms previous methods in both accuracy and speed. In addition, when the proposed method was verified on the COFW database, which has 29 landmarks instead of 64, the average normalized error was 2.56 pixels, again showing promising performance.
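
For context, the following is a minimal sketch of how an average normalized landmark error of this kind can be computed; the normalization distance and landmark indices are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative sketch of a normalized landmark error metric
# (the paper's exact normalization may differ; inter-ocular distance is a common choice).
import numpy as np

def normalized_mean_error(pred, gt, norm_dist):
    """pred, gt: (N, 2) arrays of landmark coordinates; norm_dist: scalar normalizer."""
    per_point = np.linalg.norm(pred - gt, axis=1)   # Euclidean error per landmark
    return per_point.mean() / norm_dist

# Example with 64 landmarks, normalized by inter-ocular distance.
pred = np.random.rand(64, 2) * 256
gt = pred + np.random.randn(64, 2)                  # small synthetic perturbation
inter_ocular = np.linalg.norm(gt[36] - gt[45])      # eye-corner indices are illustrative only
print(normalized_mean_error(pred, gt, inter_ocular))
```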


Author(s):  
Xuhai Xu ◽  
Ebrahim Nemati ◽  
Korosh Vatanparvar ◽  
Viswam Nathan ◽  
Tousif Ahmed ◽  
...  

The prevalence of ubiquitous computing enables new opportunities for lung health monitoring and assessment. In the past few years, there have been extensive studies on cough detection using passively sensed audio signals. However, the generalizability of a cough detection model when applied to external datasets, especially in real-world implementations, is questionable and has not been explored adequately. Beyond detecting coughs, researchers have looked into how cough sounds can be used to assess lung health. However, due to the challenges of collecting both cough sounds and lung health ground truth, previous studies have been hindered by limited datasets. In this paper, we propose Listen2Cough to address these gaps. We first build an end-to-end deep learning architecture using public cough sound datasets to detect coughs within raw audio recordings. We employ a pre-trained MobileNet and integrate a number of augmentation techniques to improve the generalizability of our model. Without additional fine-tuning, our model achieves an F1 score of 0.948 when tested against a new clean dataset and 0.884 on another in-the-wild noisy dataset, an average advantage of 5.8% and 8.4%, respectively, over the best baseline model. Then, to mitigate the issue of limited lung health data, we propose to transform the cough detection task into lung health assessment tasks so that the rich cough data can be leveraged. Our hypothesis is that these tasks extract and utilize similar effective representations from cough sounds. We embed the cough detection model into a multi-instance learning framework with an attention mechanism and further tune the model for lung health assessment tasks. Our final model achieves an F1 score of 0.912 on healthy vs. unhealthy, 0.870 on obstructive vs. non-obstructive, and 0.813 on COPD vs. asthma classification, outperforming the baseline by 10.7%, 6.3%, and 3.7%, respectively. Moreover, the weight values in the attention layer can be used to identify important coughs highly correlated with lung health, which can potentially provide interpretability for expert diagnosis in the future.
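
As an illustration of the multi-instance learning idea described above, here is a minimal attention-based pooling sketch in PyTorch; layer sizes and names are assumptions, not the authors' implementation.

```python
# Hedged sketch of attention-based multi-instance pooling over cough embeddings.
# Feature and hidden dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, instance_feats):
        # instance_feats: (num_coughs, feat_dim) embeddings from a cough encoder
        weights = torch.softmax(self.attn(instance_feats), dim=0)  # (num_coughs, 1)
        bag_feat = (weights * instance_feats).sum(dim=0)           # weighted bag embedding
        return bag_feat, weights  # weights indicate which coughs drive the prediction

pool = AttentionMILPooling()
bag, w = pool(torch.randn(10, 128))  # 10 cough instances from one subject
```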


2021 ◽  
Vol 13 (4) ◽  
pp. 796
Author(s):  
Long Zhang ◽  
Xuezhi Yang ◽  
Jing Shen

The locations and breathing signals of people in disaster areas are significant information for search and rescue missions in prioritizing operations to save more lives. For detecting living people who are lying on the ground and covered with dust, debris, or ashes, a motion magnification-based method has recently been proposed. This method estimates the locations and breathing signal of people from a drone video by assuming that only human breathing-related motions exist in the video. However, in natural disasters, background motions, such as swaying trees and grass caused by wind, are mixed with human breathing, violating this assumption and resulting in misleading or even missed life-sign locations. Life signs in disaster areas are therefore challenging to detect due to these undesired background motions. Note that human breathing is a natural physiological phenomenon and a periodic motion with a steady peak frequency, whereas background motions involve complex space-time behaviors whose peak frequencies vary over time. Therefore, in this work we analyze and focus on the frequency properties of motions to model a frequency variability feature that extracts only human breathing while eliminating irrelevant background motions in the video, easing the detection and localization of life signs. The proposed method was validated with both drone and camera videos recorded in the wild. The average precision measures of our method for drone and camera videos were 0.94 and 0.92, higher than those of the compared methods, demonstrating that our method is more robust and accurate against background motions. The implications and limitations of the frequency variability feature are also discussed.
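
To make the frequency-variability intuition concrete, here is a minimal sketch that estimates the peak frequency of a motion trace over successive windows and measures how much it drifts; the sampling rate and window length are illustrative assumptions only.

```python
# Illustrative sketch of a frequency-variability cue: breathing-like motion keeps a
# stable spectral peak across time windows, background motion does not.
import numpy as np

def peak_frequency(signal, fps):
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs[np.argmax(spectrum)]

def frequency_variability(motion, fps, win=150):
    """motion: 1-D motion trace at one image location; returns std of per-window peak freq."""
    peaks = [peak_frequency(motion[s:s + win], fps)
             for s in range(0, len(motion) - win + 1, win)]
    return np.std(peaks)  # small for quasi-periodic breathing, larger for background clutter

fps = 30
t = np.arange(0, 30, 1.0 / fps)
breathing = np.sin(2 * np.pi * 0.3 * t)          # ~18 breaths per minute
print(frequency_variability(breathing, fps))     # close to zero
```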


2018 ◽  
Vol 10 (8) ◽  
pp. 80
Author(s):  
Lei Zhang ◽  
Xiaoli Zhi

Convolutional neural networks (CNNs) have made great progress in face detection. They mostly take computation-intensive networks as the backbone in order to obtain high precision, and they cannot achieve a good detection speed without the support of high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially speed-dependent ones. To alleviate this problem, we propose a lightweight face detector in this paper, which takes a fast residual network as the backbone. Our method can run fast even on cheap, ordinary GPUs. To guarantee its detection precision, multi-scale features and multi-context are fully exploited in efficient ways. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Then multi-context, including both local and global context, is added to these multi-scale features without extra computational burden. The local context is added through a depthwise separable convolution-based approach, and the global context through simple global average pooling. Experimental results show that our method can run at about 110 fps on VGA (Video Graphics Array)-resolution images, while still maintaining competitive precision on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets compared with its state-of-the-art counterparts.
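
A minimal sketch of the two context cues mentioned above is given below: local context via a depthwise separable convolution and global context via global average pooling; channel sizes are illustrative, not the paper's configuration.

```python
# Hedged sketch of enriching a feature map with local and global context.
import torch
import torch.nn as nn

class ContextEnrich(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # depthwise + pointwise convolution = depthwise separable convolution
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        self.global_fc = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        local_ctx = self.local(x)
        # global average pooling -> 1x1 descriptor broadcast back over the map
        global_ctx = self.global_fc(x.mean(dim=(2, 3), keepdim=True))
        return x + local_ctx + global_ctx

feat = torch.randn(1, 256, 40, 40)
out = ContextEnrich()(feat)  # same shape, enriched with local and global context
```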


2021 ◽  
Vol 7 (7) ◽  
pp. 112
Author(s):  
Domonkos Varga

The goal of no-reference image quality assessment (NR-IQA) is to evaluate the perceptual quality of digital images without using their distortion-free, pristine counterparts. NR-IQA is an important part of multimedia signal processing since digital images can undergo a wide variety of distortions during storage, compression, and transmission. In this paper, we propose a novel architecture that extracts deep features from the input image at multiple scales to improve the effectiveness of feature extraction for NR-IQA using convolutional neural networks. Specifically, the proposed method extracts deep activations for local patches at multiple scales and maps them onto perceptual quality scores with the help of trained Gaussian process regressors. Extensive experiments demonstrate that the introduced algorithm performs favorably against state-of-the-art methods on three large benchmark datasets with authentic distortions (LIVE In the Wild, KonIQ-10k, and SPAQ).
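
As a rough sketch of such a pipeline, the code below extracts pooled CNN activations at several input scales and would feed them to scikit-learn's GaussianProcessRegressor; the backbone and scales are assumptions for illustration, not the authors' exact setup.

```python
# Hedged sketch: multi-scale deep features mapped to quality scores by a GP regressor.
import torch
import torch.nn.functional as F
import torchvision.models as models
from sklearn.gaussian_process import GaussianProcessRegressor

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # use pooled 2048-D activations as features
backbone.eval()

def multiscale_features(img):       # img: (1, 3, H, W) normalized tensor
    feats = []
    for scale in (1.0, 0.5, 0.25):  # illustrative scales
        x = F.interpolate(img, scale_factor=scale, mode="bilinear", align_corners=False)
        with torch.no_grad():
            feats.append(backbone(x).squeeze(0))
    return torch.cat(feats).numpy()

# X: list of feature vectors for training images, y: their subjective quality scores (MOS).
# gpr = GaussianProcessRegressor().fit(X, y); score = gpr.predict([multiscale_features(img)])
```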


2020 ◽  
Author(s):  
Hendrick ◽  
Surfa Yondri ◽  
Rahmat Hidayat ◽  
Albar Albar ◽  
Hanifa Fitri ◽  
...  

2020 ◽  
Vol 8 ◽  
Author(s):  
Sohaib Younis ◽  
Marco Schmidt ◽  
Claus Weiland ◽  
Stefan Dressler ◽  
Bernhard Seeger ◽  
...  

As herbarium specimens are increasingly being digitised and made accessible in online repositories, advanced computer vision techniques are being used to extract information from them. The presence of certain plant organs on herbarium sheets is useful information in various scientific contexts, and automatic recognition of these organs will help mobilise such information. In our study, we use deep learning to detect plant organs on digitised herbarium specimens with Faster R-CNN. For our experiment, we manually annotated hundreds of herbarium scans with thousands of bounding boxes for six types of plant organs and used them for training and evaluating the plant organ detection model. The model worked particularly well on leaves and stems; flowers, although also present in large numbers on the sheets, were not recognised equally well.
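
For reference, a minimal sketch of adapting torchvision's off-the-shelf Faster R-CNN to a six-class organ detection task is shown below; the class count handling and training details are assumptions for illustration, not the study's exact configuration.

```python
# Hedged sketch: fine-tuning torchvision's Faster R-CNN for six plant-organ classes.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 7  # six organ types + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-classification head so it predicts the organ classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# During training the model expects a list of image tensors and a list of target dicts
# with 'boxes' (N, 4) and 'labels' (N,) per herbarium scan.
```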


MENDEL ◽  
2020 ◽  
Vol 26 (2) ◽  
pp. 17-22
Author(s):  
Alzbeta Tureckova ◽  
Tomas Holik ◽  
Zuzana Kominkova Oplatkova

This work presents a real-world application of object detection, one of the current research lines in computer vision. Researchers commonly focus on human face detection. In contrast, the current paper presents the challenging task of detecting a dog face, an object with extensive variability in appearance. The system utilises the YOLO network, a deep convolutional neural network, to predict bounding boxes and class confidences simultaneously. This paper documents the extensive dataset of dog faces gathered from two different sources and the training procedure of the detector. The proposed system was designed for realization on mobile hardware. The resulting Doggie Smile application helps to snapshot dogs at the moment they face the camera. The proposed mobile application can simultaneously evaluate the gaze directions of three dogs in a scene more than 13 times per second, measured on an iPhone XR. The average precision of the dog face detection system is 0.92.
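
As background for the reported detection metric, the sketch below shows the IoU overlap test that typically underlies average precision computation; the 0.5 threshold is the common convention, not necessarily the one used in the paper.

```python
# Illustrative sketch of the IoU matching behind detection metrics such as average precision.
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive if it overlaps a ground-truth dog face
# by at least the chosen threshold (0.5 is the common convention).
print(iou((10, 10, 60, 60), (15, 15, 65, 65)) >= 0.5)
```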

