TFW: Annotated Thermal Faces in the Wild Dataset

Author(s):  
Askat Kuzdeuov ◽  
Dana Aubakirova ◽  
Darina Koishigarina ◽  
Hüseyin Atakan Varol

Face detection and localization of facial landmarks are the primary steps in building many face applications in computer vision. Numerous algorithms and benchmark datasets have been proposed to develop accurate face and facial landmark detection models in the visual domain. However, varying illumination conditions still pose challenging problems. Thermal cameras can address this problem because they operate at longer wavelengths. However, thermal face detection and localization of facial landmarks in the wild have been overlooked, mainly because most existing thermal face datasets were collected in controlled environments. In addition, many of them contain no annotations of face bounding boxes and facial landmarks. In this work, we present a thermal face dataset with manually labeled bounding boxes and facial landmarks to address these problems. The dataset contains 9,202 images of 145 subjects, collected in both controlled and wild conditions. As a baseline, we trained the YOLOv5 object detection model and its adaptation for face detection, YOLO5Face, on our dataset. To show the efficacy of our dataset, we evaluated these models on the RWTH-Aachen thermal face dataset in addition to our test set. We have made the dataset, source code, and pretrained models publicly available at https://github.com/IS2AI/TFW to bolster research in thermal face analysis.
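
As a rough illustration of the baseline pipeline, the sketch below runs a YOLOv5-family detector on a thermal image via the Ultralytics torch.hub interface; the checkpoint filename is hypothetical and merely stands in for weights trained on the TFW dataset (the actual pretrained models are in the linked repository).

```python
# Minimal sketch: running a YOLOv5 detector fine-tuned on thermal faces.
# Assumes a checkpoint file (path is hypothetical) produced by a training
# recipe like the one described above; requires torch and the ultralytics/yolov5 hub code.
import torch

# 'tfw_yolov5.pt' is a placeholder name for weights trained on the TFW dataset.
model = torch.hub.load("ultralytics/yolov5", "custom", path="tfw_yolov5.pt")

# Run inference on a single thermal image (8-bit grayscale or RGB-mapped).
results = model("thermal_face.jpg")
results.print()          # summary: number of faces, confidence scores
boxes = results.xyxy[0]  # tensor of [x1, y1, x2, y2, conf, class] per detected face
```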


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5360
Author(s):  
Taehyung Kim ◽  
Jiwon Mok ◽  
Euichul Lee

For accurate and fast detection of facial landmarks, we propose a new facial landmark detection method. Previous facial landmark detection models generally perform a face detection step before landmark detection, so landmark detection performance depends heavily on which face detection model is used. Therefore, we propose a model that simultaneously detects a face region and its landmarks without a separate face detection step. The proposed single-shot detection model is based on the framework of YOLOv3, a one-stage object detection method, and the loss function and structure are altered to learn faces and landmarks at the same time. In addition, EfficientNet-B0 was utilized as the backbone network to increase processing speed and accuracy. The model was trained on the 300W-LP database with 64 facial landmarks. The average normalized error of the proposed model was 2.32 pixels, the processing time per frame was about 15 milliseconds, and the average precision of face detection was about 99%. The evaluation confirmed that the single-shot detection model outperforms previous methods in both accuracy and speed. In addition, when the proposed method was verified on the COFW database, which has 29 landmarks instead of 64, the average normalized error was 2.56 pixels, again showing promising performance.
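
For context, the following is a minimal sketch of how an average normalized landmark error of this kind can be computed; the normalization distance and landmark indices are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative sketch of a normalized landmark error metric
# (the paper's exact normalization may differ; inter-ocular distance is a common choice).
import numpy as np

def normalized_mean_error(pred, gt, norm_dist):
    """pred, gt: (N, 2) arrays of landmark coordinates; norm_dist: scalar normalizer."""
    per_point = np.linalg.norm(pred - gt, axis=1)   # Euclidean error per landmark
    return per_point.mean() / norm_dist

# Example with 64 landmarks, normalized by inter-ocular distance.
pred = np.random.rand(64, 2) * 256
gt = pred + np.random.randn(64, 2)                  # small synthetic perturbation
inter_ocular = np.linalg.norm(gt[36] - gt[45])      # eye-corner indices are illustrative only
print(normalized_mean_error(pred, gt, inter_ocular))
```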


Author(s):  
Xuhai Xu ◽  
Ebrahim Nemati ◽  
Korosh Vatanparvar ◽  
Viswam Nathan ◽  
Tousif Ahmed ◽  
...  

The prevalence of ubiquitous computing enables new opportunities for lung health monitoring and assessment. In the past few years, there have been extensive studies on cough detection using passively sensed audio signals. However, the generalizability of a cough detection model when applied to external datasets, especially in real-world implementations, is questionable and has not been explored adequately. Beyond detecting coughs, researchers have looked into how cough sounds can be used to assess lung health. However, due to the challenges of collecting both cough sounds and lung health ground truth, previous studies have been hindered by limited datasets. In this paper, we propose Listen2Cough to address these gaps. We first build an end-to-end deep learning architecture using public cough sound datasets to detect coughs within raw audio recordings. We employ a pre-trained MobileNet and integrate a number of augmentation techniques to improve the generalizability of our model. Without additional fine-tuning, our model achieves an F1 score of 0.948 when tested against a new clean dataset and 0.884 on another in-the-wild noisy dataset, an average advantage of 5.8% and 8.4%, respectively, over the best baseline model. Then, to mitigate the issue of limited lung health data, we propose to transform the cough detection task into lung health assessment tasks so that the rich cough data can be leveraged. Our hypothesis is that these tasks extract and utilize similar effective representations from cough sounds. We embed the cough detection model into a multi-instance learning framework with an attention mechanism and further tune the model for lung health assessment tasks. Our final model achieves an F1 score of 0.912 on healthy vs. unhealthy, 0.870 on obstructive vs. non-obstructive, and 0.813 on COPD vs. asthma classification, outperforming the baseline by 10.7%, 6.3%, and 3.7%, respectively. Moreover, the weight values in the attention layer can be used to identify important coughs highly correlated with lung health, which can potentially provide interpretability for expert diagnosis in the future.
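
As an illustration of the multi-instance learning idea described above, here is a minimal attention-based pooling sketch in PyTorch; layer sizes and names are assumptions, not the authors' implementation.

```python
# Hedged sketch of attention-based multi-instance pooling over cough embeddings.
# Feature and hidden dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, instance_feats):
        # instance_feats: (num_coughs, feat_dim) embeddings from a cough encoder
        weights = torch.softmax(self.attn(instance_feats), dim=0)  # (num_coughs, 1)
        bag_feat = (weights * instance_feats).sum(dim=0)           # weighted bag embedding
        return bag_feat, weights  # weights indicate which coughs drive the prediction

pool = AttentionMILPooling()
bag, w = pool(torch.randn(10, 128))  # 10 cough instances from one subject
```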


2021 ◽  
Vol 13 (4) ◽  
pp. 796
Author(s):  
Long Zhang ◽  
Xuezhi Yang ◽  
Jing Shen

The locations and breathing signals of people in disaster areas are significant information for search and rescue missions in prioritizing operations to save more lives. For detecting living people who are lying on the ground and covered with dust, debris, or ashes, a motion magnification-based method has recently been proposed. This method estimates the locations and breathing signal of people from a drone video by assuming that only human breathing-related motions exist in the video. However, in natural disasters, background motions, such as swaying trees and grass caused by wind, are mixed with human breathing, violating this assumption and resulting in misleading or even missed life-sign locations. Life signs in disaster areas are therefore challenging to detect due to these undesired background motions. Note that human breathing is a natural physiological phenomenon and a periodic motion with a steady peak frequency, whereas background motions involve complex space-time behaviors whose peak frequencies vary over time. Therefore, in this work we analyze and focus on the frequency properties of motions to model a frequency variability feature that extracts only human breathing while eliminating irrelevant background motions in the video, easing the detection and localization of life signs. The proposed method was validated with both drone and camera videos recorded in the wild. The average precision measures of our method for drone and camera videos were 0.94 and 0.92, higher than those of the compared methods, demonstrating that our method is more robust and accurate against background motions. The implications and limitations of the frequency variability feature are also discussed.
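
To make the frequency-variability intuition concrete, here is a minimal sketch that estimates the peak frequency of a motion trace over successive windows and measures how much it drifts; the sampling rate and window length are illustrative assumptions only.

```python
# Illustrative sketch of a frequency-variability cue: breathing-like motion keeps a
# stable spectral peak across time windows, background motion does not.
import numpy as np

def peak_frequency(signal, fps):
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs[np.argmax(spectrum)]

def frequency_variability(motion, fps, win=150):
    """motion: 1-D motion trace at one image location; returns std of per-window peak freq."""
    peaks = [peak_frequency(motion[s:s + win], fps)
             for s in range(0, len(motion) - win + 1, win)]
    return np.std(peaks)  # small for quasi-periodic breathing, larger for background clutter

fps = 30
t = np.arange(0, 30, 1.0 / fps)
breathing = np.sin(2 * np.pi * 0.3 * t)          # ~18 breaths per minute
print(frequency_variability(breathing, fps))     # close to zero
```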


2018 ◽  
Vol 10 (8) ◽  
pp. 80
Author(s):  
Lei Zhang ◽  
Xiaoli Zhi

Convolutional neural networks (CNNs) have made great progress in face detection. They mostly take computation-intensive networks as the backbone in order to obtain high precision, and they cannot achieve a good detection speed without the support of high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially speed-dependent ones. To alleviate this problem, we propose a lightweight face detector in this paper, which takes a fast residual network as the backbone. Our method can run fast even on cheap, ordinary GPUs. To guarantee its detection precision, multi-scale features and multi-context are fully exploited in efficient ways. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Then multi-context, including both local and global context, is added to these multi-scale features without extra computational burden. The local context is added through a depthwise separable convolution-based approach, and the global context through simple global average pooling. Experimental results show that our method can run at about 110 fps on VGA (Video Graphics Array)-resolution images, while still maintaining competitive precision on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets compared with its state-of-the-art counterparts.
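
A minimal sketch of the two context cues mentioned above is given below: local context via a depthwise separable convolution and global context via global average pooling; channel sizes are illustrative, not the paper's configuration.

```python
# Hedged sketch of enriching a feature map with local and global context.
import torch
import torch.nn as nn

class ContextEnrich(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # depthwise + pointwise convolution = depthwise separable convolution
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        self.global_fc = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        local_ctx = self.local(x)
        # global average pooling -> 1x1 descriptor broadcast back over the map
        global_ctx = self.global_fc(x.mean(dim=(2, 3), keepdim=True))
        return x + local_ctx + global_ctx

feat = torch.randn(1, 256, 40, 40)
out = ContextEnrich()(feat)  # same shape, enriched with local and global context
```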


2021 ◽  
Vol 7 (7) ◽  
pp. 112
Author(s):  
Domonkos Varga

The goal of no-reference image quality assessment (NR-IQA) is to evaluate the perceptual quality of digital images without using their distortion-free, pristine counterparts. NR-IQA is an important part of multimedia signal processing since digital images can undergo a wide variety of distortions during storage, compression, and transmission. In this paper, we propose a novel architecture that extracts deep features from the input image at multiple scales to improve the effectiveness of feature extraction for NR-IQA using convolutional neural networks. Specifically, the proposed method extracts deep activations for local patches at multiple scales and maps them onto perceptual quality scores with the help of trained Gaussian process regressors. Extensive experiments demonstrate that the introduced algorithm performs favorably against state-of-the-art methods on three large benchmark datasets with authentic distortions (LIVE In the Wild, KonIQ-10k, and SPAQ).
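
As a rough sketch of such a pipeline, the code below extracts pooled CNN activations at several input scales and would feed them to scikit-learn's GaussianProcessRegressor; the backbone and scales are assumptions for illustration, not the authors' exact setup.

```python
# Hedged sketch: multi-scale deep features mapped to quality scores by a GP regressor.
import torch
import torch.nn.functional as F
import torchvision.models as models
from sklearn.gaussian_process import GaussianProcessRegressor

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # use pooled 2048-D activations as features
backbone.eval()

def multiscale_features(img):       # img: (1, 3, H, W) normalized tensor
    feats = []
    for scale in (1.0, 0.5, 0.25):  # illustrative scales
        x = F.interpolate(img, scale_factor=scale, mode="bilinear", align_corners=False)
        with torch.no_grad():
            feats.append(backbone(x).squeeze(0))
    return torch.cat(feats).numpy()

# X: list of feature vectors for training images, y: their subjective quality scores (MOS).
# gpr = GaussianProcessRegressor().fit(X, y); score = gpr.predict([multiscale_features(img)])
```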


2020 ◽  
Author(s):  
Hendrick ◽  
Surfa Yondri ◽  
Rahmat Hidayat ◽  
Albar Albar ◽  
Hanifa Fitri ◽  
...  

2020 ◽  
Vol 8 ◽  
Author(s):  
Sohaib Younis ◽  
Marco Schmidt ◽  
Claus Weiland ◽  
Stefan Dressler ◽  
Bernhard Seeger ◽  
...  

As herbarium specimens are increasingly being digitised and made accessible in online repositories, advanced computer vision techniques are being used to extract information from them. The presence of certain plant organs on herbarium sheets is useful information in various scientific contexts, and automatic recognition of these organs will help mobilise such information. In our study, we use deep learning to detect plant organs on digitised herbarium specimens with Faster R-CNN. For our experiment, we manually annotated hundreds of herbarium scans with thousands of bounding boxes for six types of plant organs and used them for training and evaluating the plant organ detection model. The model worked particularly well on leaves and stems; flowers, although also present in large numbers on the sheets, were not recognised equally well.
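
For reference, a minimal sketch of adapting torchvision's off-the-shelf Faster R-CNN to a six-class organ detection task is shown below; the class count handling and training details are assumptions for illustration, not the study's exact configuration.

```python
# Hedged sketch: fine-tuning torchvision's Faster R-CNN for six plant-organ classes.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 7  # six organ types + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-classification head so it predicts the organ classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# During training the model expects a list of image tensors and a list of target dicts
# with 'boxes' (N, 4) and 'labels' (N,) per herbarium scan.
```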


MENDEL ◽  
2020 ◽  
Vol 26 (2) ◽  
pp. 17-22
Author(s):  
Alzbeta Tureckova ◽  
Tomas Holik ◽  
Zuzana Kominkova Oplatkova

This work presents a real-world application of object detection, one of the current research lines in computer vision. Researchers commonly focus on human face detection. In contrast, the current paper presents the challenging task of detecting a dog face, an object with extensive variability in appearance. The system utilises the YOLO network, a deep convolutional neural network, to predict bounding boxes and class confidences simultaneously. This paper documents the extensive dataset of dog faces gathered from two different sources and the training procedure of the detector. The proposed system was designed for realization on mobile hardware. The resulting Doggie Smile application helps to snapshot dogs at the moment they face the camera. The proposed mobile application can simultaneously evaluate the gaze directions of three dogs in a scene more than 13 times per second, measured on an iPhone XR. The average precision of the dog face detection system is 0.92.
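
As background for the reported detection metric, the sketch below shows the IoU overlap test that typically underlies average precision computation; the 0.5 threshold is the common convention, not necessarily the one used in the paper.

```python
# Illustrative sketch of the IoU matching behind detection metrics such as average precision.
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive if it overlaps a ground-truth dog face
# by at least the chosen threshold (0.5 is the common convention).
print(iou((10, 10, 60, 60), (15, 15, 65, 65)) >= 0.5)
```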

