A Fast and Lightweight Method with Feature Fusion and Multi-Context for Face Detection

Lei Zhang; Xiaoli Zhi

doi:10.3390/fi10080080

Future Internet ◽

10.3390/fi10080080 ◽

2018 ◽

Vol 10 (8) ◽

pp. 80

Author(s):

Lei Zhang ◽

Xiaoli Zhi

Keyword(s):

Face Detection ◽

Graphics Processing Units ◽

High Performance ◽

Feature Fusion ◽

Local Context ◽

Data Set ◽

Global Context ◽

Detection Algorithms ◽

Multi Scale ◽

Benchmark Datasets

Convolutional neural networks (CNNs) have made great progress in face detection. Most CNN-based detectors use computation-intensive backbone networks to obtain high precision, so they cannot reach a good detection speed without high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially speed-dependent ones. To alleviate this problem, we propose a lightweight face detector that uses a fast residual network as its backbone. Our method runs fast even on cheap, ordinary GPUs. To preserve detection precision, multi-scale features and multi-context are exploited efficiently. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Multi-context, comprising both local and global context, is then added to these multi-scale features without extra computational burden. The local context is added through an approach based on depthwise separable convolution, and the global context through simple global average pooling. Experimental results show that our method runs at about 110 fps on VGA (Video Graphics Array)-resolution images while maintaining competitive precision on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets compared with its state-of-the-art counterparts.
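To make the cost argument behind depthwise separable convolution and global average pooling concrete, here is a minimal sketch in plain Python. The channel counts (128 in, 128 out), the 3 x 3 kernel, and all function names are illustrative assumptions; the abstract does not state the layer sizes the authors use.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def global_average_pool(feature_map):
    """Reduce a [channels][rows][cols] nested list to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(128, 128, 3)        # 147456 weights
sep = depthwise_separable_params(128, 128, 3)  # 17536 weights

# Two 2 x 2 channels collapse to two scalars that summarize the whole map.
context = global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]])  # [2.5, 2.0]
```

Under these assumed sizes the separable layer needs roughly one eighth of the weights of the standard layer, which is the kind of saving that makes a local-context branch cheap to add.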

DCPNet: A Densely Connected Pyramid Network for Monocular Depth Estimation

Sensors ◽

10.3390/s21206780 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6780

Author(s):

Zhitong Lai ◽

Rui Tian ◽

Zhiguo Wu ◽

Nannan Ding ◽

Linjian Sun ◽

...

Keyword(s):

Multiple Scales ◽

Feature Fusion ◽

State Of The Art ◽

Depth Estimation ◽

Multi Scale ◽

Pyramid Structure ◽

Benchmark Datasets ◽

The Common ◽

Monocular Depth ◽

Multiple Stages

Pyramid architecture is a useful strategy for fusing multi-scale features in deep monocular depth estimation approaches. However, most pyramid networks fuse features only between adjacent stages of the pyramid structure. To take full advantage of the pyramid structure, and inspired by the success of DenseNet, this paper presents DCPNet, a densely connected pyramid network that fuses multi-scale features from multiple stages of the pyramid structure. DCPNet performs feature fusion not only between adjacent stages but also between non-adjacent stages. To fuse these features, we design a simple and effective dense connection module (DCM). In addition, we offer a new consideration of the common upscale operation in our approach. We believe DCPNet offers a more efficient way to fuse features from multiple scales in a pyramid-like network. We perform extensive experiments using both outdoor and indoor benchmark datasets (i.e., the KITTI and the NYU Depth V2 datasets), and DCPNet achieves state-of-the-art results.

An Algorithm for the Detection of Faces on the Basis of Gabor Features and Information Maximization

Neural Computation ◽

10.1162/089976604773717577 ◽

2004 ◽

Vol 16 (6) ◽

pp. 1163-1191 ◽

Cited By ~ 3

Author(s):

Hitoshi Imaoka ◽

Kenji Okajima

Keyword(s):

Face Detection ◽

Computational Cost ◽

Generalization Error ◽

High Detection Rate ◽

Data Set ◽

Vast Number ◽

Detection Algorithms ◽

Gabor Features ◽

Information Maximization ◽

Maximization Principle

We propose an algorithm for the detection of facial regions within input images. The characteristics of this algorithm are (1) a vast number of Gabor-type features (196,800) in various orientations, and with various frequencies and central positions, which are used as feature candidates in representing the patterns of an image, and (2) an information maximization principle, which is used to select several hundred features that are suitable for the detection of faces from among these candidates. Using only the selected features in face detection leads to reduced computational cost and is also expected to reduce generalization error. We applied the system, after training, to 42 input images with complex backgrounds (Test Set A from the Carnegie Mellon University face data set). The result was a high detection rate of 87.0%, with only six false detections. We compared the result with other published face detection algorithms.
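The selection step can be illustrated with a small sketch that ranks discrete features by their mutual information with the face/non-face label and keeps the top k. This is a stand-in under stated assumptions: the abstract does not give the exact information maximization criterion, and the function names and toy data below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information in bits between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        # p(f, l) * log2( p(f, l) / (p(f) * p(l)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (feature_counts[f] * label_counts[l]))
    return mi


def select_features(features, labels, k):
    """Return the indices of the k features with the highest mutual information."""
    scores = [mutual_information(column, labels) for column in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data: 1 = face, 0 = non-face; each inner list is one thresholded feature.
labels = [1, 1, 0, 0]
features = [
    [1, 1, 0, 0],  # perfectly informative
    [1, 0, 1, 0],  # independent of the label
    [1, 1, 1, 0],  # partly informative
]
chosen = select_features(features, labels, 2)  # [0, 2]
```

Scoring each feature independently ignores redundancy between features, so a practical selector over 196,800 candidates would likely need a criterion that accounts for features already chosen.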

Learning conditional photometric stereo with high-resolution features

Computational Visual Media ◽

10.1007/s41095-021-0223-y ◽

2021 ◽

Vol 8 (1) ◽

pp. 105-118

Author(s):

Yakun Ju ◽

Yuxin Peng ◽

Muwei Jian ◽

Feng Gao ◽

Junyu Dong

Keyword(s):

Neural Networks ◽

High Resolution ◽

Feature Fusion ◽

Low Frequency ◽

Photometric Stereo ◽

Surface Orientation ◽

Smooth Functions ◽

Multi Scale ◽

Benchmark Datasets ◽

Deep Feature Extraction

Photometric stereo aims to reconstruct 3D geometry by recovering the dense surface orientation of a 3D object from multiple images under differing illumination. Traditional methods normally adopt simplified reflectance models to make the surface orientation computable. However, the real reflectances of surfaces greatly limit applicability of such methods to real-world objects. While deep neural networks have been employed to handle non-Lambertian surfaces, these methods are subject to blurring and errors, especially in high-frequency regions (such as crinkles and edges), caused by spectral bias: neural networks favor low-frequency representations and so exhibit a bias towards smooth functions. In this paper, therefore, we propose a self-learning conditional network with multi-scale features for photometric stereo, avoiding blurred reconstruction in such regions. Our explorations include: (i) a multi-scale feature fusion architecture, which keeps high-resolution representations and deep feature extraction, simultaneously, and (ii) an improved gradient-motivated conditionally parameterized convolution (GM-CondConv) in our photometric stereo network, with different combinations of convolution kernels for varying surfaces. Extensive experiments on public benchmark datasets show that our calibrated photometric stereo method outperforms the state-of-the-art.

Adaptive Multi-Scale Feature Fusion Based Residual U-net for Fracture Segmentation in Coal Rock Images

10.21203/rs.2.23959/v2 ◽

2020 ◽

Author(s):

Fengli Lu ◽

Chengcai Fu ◽

Guoying Zhang ◽

Jie Shi

Keyword(s):

Spatial Information ◽

Feature Fusion ◽

Ct Images ◽

Published Data ◽

Rock Fractures ◽

Feature Maps ◽

Data Set ◽

Scale Feature ◽

Multi Scale ◽

Coal Rock

Accurate segmentation of fractures in coal rock CT images is important for safe production and the development of coalbed methane. However, coal rock fractures are formed through natural geological evolution and are complex, low in contrast, and varied in scale. Furthermore, there is no published data set of coal rock. In this paper, we propose an adaptive multi-scale feature fusion based residual U-net (AMSFFR-U-net) for fracture segmentation in coal rock CT images. Dilated residual blocks (DResBlock) with dilation rates (1, 2, 3) are embedded into the encoding branch of the U-net structure, which improves the network's feature extraction ability and captures fractures at different scales. Furthermore, feature maps of different sizes in the encoding branch are concatenated by an adaptive multi-scale feature fusion (AMSFF) module. AMSFF not only captures fractures at different scales but also improves the restoration of spatial information. To alleviate the lack of coal rock fracture training data, we apply a set of comprehensive data augmentation operations to increase the diversity of training samples. Our network, U-net, and Res-U-net are tested on our test set of coal rock CT images, with coal rock samples from five different regions. The experimental results show that our proposed approach improves the average Dice coefficient by 2.9%, the average precision by 7.2%, and the average recall by 9.1%. Therefore, AMSFFR-U-net achieves better segmentation results for coal rock fractures and has stronger generalization ability and robustness.
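The three reported metrics follow standard definitions, sketched below for binary masks. The function name and the toy masks are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
def segmentation_metrics(pred, truth):
    """Dice, precision, and recall for flat binary masks (1 = fracture pixel)."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # fracture found
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false alarm
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # fracture missed
    # Two empty masks agree perfectly, so Dice is defined as 1.0 in that case.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall


# Toy masks: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
dice, precision, recall = segmentation_metrics(pred, truth)
```

Because thin fractures occupy few pixels, Dice and recall tend to be more informative here than pixel accuracy, which is dominated by the background.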

LCF: A Local Context Focus Mechanism for Aspect-Based Sentiment Classification

Applied Sciences ◽

10.3390/app9163389 ◽

2019 ◽

Vol 9 (16) ◽

pp. 3389 ◽

Cited By ~ 6

Author(s):

Biqing Zeng ◽

Heng Yang ◽

Ruyang Xu ◽

Wu Zhou ◽

Xuli Han

Keyword(s):

State Of The Art ◽

Sentiment Classification ◽

Experimental Results ◽

Local Context ◽

Baseline Model ◽

Global Context ◽

Benchmark Datasets ◽

Art Performance ◽

Context Features

Aspect-based sentiment classification (ABSC) aims to predict the sentiment polarities of different aspects within sentences or documents. Many previous studies have addressed this problem, but they fail to notice the correlation between an aspect's sentiment polarity and its local context. In this paper, a Local Context Focus (LCF) mechanism based on Multi-head Self-Attention (MHSA) is proposed for aspect-based sentiment classification. This mechanism, called the LCF design, uses Context Features Dynamic Mask (CDM) and Context Features Dynamic Weighted (CDW) layers to pay more attention to local context words. Moreover, a BERT-shared layer is adopted in the LCF design to capture internal long-term dependencies of the local context and global context. Experiments are conducted on three common ABSC datasets: the laptop and restaurant datasets of SemEval-2014 and the ACL twitter dataset. Experimental results demonstrate that the LCF baseline model achieves considerable performance. In addition, we conduct ablation experiments to demonstrate the significance and effectiveness of the LCF design. In particular, by incorporating the BERT-shared layer, the LCF-BERT model sets new state-of-the-art performance on all three benchmark datasets.
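One way to picture the difference between masking and weighting local context is a per-token factor based on distance from the aspect term. The sketch below is an assumption for illustration only: the abstract does not give the distance measure, the threshold, or the weighting formula, and the function and parameter names are hypothetical.

```python
def context_factors(n_tokens, aspect_start, aspect_len, threshold, mode="mask"):
    """Per-token factors that keep tokens near the aspect and suppress distant ones.

    mode="mask" zeroes tokens beyond the threshold (a CDM-like behavior);
    mode="weight" decays them linearly with distance (a CDW-like behavior).
    """
    aspect_end = aspect_start + aspect_len - 1
    factors = []
    for i in range(n_tokens):
        # Distance in tokens to the nearest aspect token (0 inside the aspect).
        if i < aspect_start:
            dist = aspect_start - i
        elif i > aspect_end:
            dist = i - aspect_end
        else:
            dist = 0
        if dist <= threshold:
            factors.append(1.0)
        elif mode == "mask":
            factors.append(0.0)
        else:
            factors.append(1.0 - (dist - threshold) / n_tokens)
    return factors


# Eight tokens, aspect at positions 3 and 4, one token of context kept on each side.
masked = context_factors(8, 3, 2, 1, mode="mask")
weighted = context_factors(8, 3, 2, 1, mode="weight")
```

In a model, each factor would multiply the corresponding token's feature vector before the next attention layer, so masking discards distant context entirely while weighting only attenuates it.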

The Application of Improved YOLO V3 in Multi-Scale Target Detection

Applied Sciences ◽

10.3390/app9183775 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3775 ◽

Cited By ~ 17

Author(s):

Ju ◽

Luo ◽

Wang ◽

Hui ◽

Chang

Keyword(s):

Target Detection ◽

Feature Fusion ◽

Detection Algorithm ◽

Detection Performance ◽

Research Directions ◽

Detection Algorithms ◽

Multi Scale ◽

Derivation Method ◽

Small Targets ◽

Mathematical Derivation

Target detection is one of the most important research directions in computer vision. Recently, a variety of target detection algorithms have been proposed. Since targets in a scene vary in size, it is essential to be able to detect them at different scales. To improve the detection performance for targets of different sizes, a multi-scale target detection algorithm based on an improved YOLO (You Only Look Once) V3 is proposed. The main contributions of our work include: (1) a mathematical derivation method based on Intersection over Union (IOU) is proposed to select the number and the aspect ratio dimensions of the candidate anchor boxes for each scale of the improved YOLO V3; (2) to further improve the detection performance of the network, the detection scales of YOLO V3 are extended from 3 to 4, and a feature fusion target detection layer downsampled by 4× is established to detect small targets; (3) to avoid vanishing gradients and enhance the reuse of features, the six convolutional layers in front of the output detection layer are transformed into two residual units. The experimental results on the PASCAL VOC and KITTI datasets show that the proposed method performs better than other state-of-the-art target detection algorithms.
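Two quantities in this abstract are easy to make concrete: the IOU between boxes, and the grid size implied by each detection stride. The sketch below uses the standard IOU definition and the usual YOLO V3 strides of 32, 16, and 8 with the added stride of 4; the 416-pixel input size and the function names are assumptions, since the abstract states neither.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grid_sizes(input_size, strides=(32, 16, 8, 4)):
    """Side length of the detection grid at each downsampling stride."""
    return [input_size // s for s in strides]


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # 1 / 7
grids = grid_sizes(416)                    # [13, 26, 52, 104]
```

Under the assumed 416-pixel input, the extra 4× scale yields a 104 x 104 grid, four times as many cells as the finest original scale, which is why it helps with small targets.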

Transmission Line Obstacle Detection Based on Structural Constraint and Feature Fusion

Symmetry ◽

10.3390/sym12030452 ◽

2020 ◽

Vol 12 (3) ◽

pp. 452

Author(s):

Xuhui Ye ◽

Dong Wang ◽

Daode Zhang ◽

Xinyu Hu

Keyword(s):

Feature Fusion ◽

Recognition Rate ◽

Obstacle Detection ◽

Support Vector ◽

Structural Constraints ◽

Structural Constraint ◽

Image Block ◽

Long Distance ◽

Data Set ◽

Multi Scale

Accurate detection and identification of obstacles play an important role in the navigation and behavior planning of a patrol robot. For a patrol robot with a symmetrically mounted camera, an obstacle detection method based on structural constraint and feature fusion is proposed. First, to discover the region of interest, a bounding box algorithm is used to propose the region. The location of the detected ground wire is used to constrain the region, and the image block of interest is clipped. Second, to accurately represent multi-view and multi-scale obstacle images, the global shape features and the improved local corner features are fused with different weights. Then, a particle swarm-optimized support vector machine (PSO-SVM) is used to classify and recognize obstacles. On block data set B, which contains multi-view and multi-scale obstacle images, the recognition rate of this method reaches up to 86.2%, which shows the effectiveness of weighted fusion of global and local features. On data set A, which contains complete images taken at different distances, the detection success rate for long-distance obstacles reaches 80.2%. These results verify the validity of the proposed method based on structural constraints and feature fusion.

Multi-Scale Feature Aggregation Network for Water Area Segmentation

Remote Sensing ◽

10.3390/rs14010206 ◽

2022 ◽

Vol 14 (1) ◽

pp. 206

Author(s):

Kai Hu ◽

Meng Li ◽

Min Xia ◽

Haifeng Lin

Keyword(s):

High Performance ◽

Semantic Information ◽

Feature Fusion ◽

Water Area ◽

Detection Methods ◽

Practical Significance ◽

Scale Feature ◽

Multi Scale ◽

Feature Aggregation ◽

Deep Feature Extraction

Water area segmentation is an important branch of remote sensing image segmentation, but in reality, most water area images have complex and diverse backgrounds. Traditional detection methods cannot accurately identify small tributaries because they mine and use semantic information incompletely, and the edges of their segmentations are rough. To solve these problems, we propose a multi-scale feature aggregation network. To improve the network's ability to process boundary information, we design a deep feature extraction module that uses a multi-scale pyramid to extract features; combined with the designed attention mechanism and strip convolution, it extracts multi-scale deep semantic information and enhances spatial and location information. Then, the multi-branch aggregation module lets features of different scales interact to enhance the positioning information of the pixels. Finally, the two high-performance branches designed in the Feature Fusion Upsample module are used to deeply extract the semantic information of the image, and the deep information is fused with the shallow information generated by the multi-branch module to improve the ability of the network. Global and local features are used to determine the location distribution of each image category. The experimental results show that the segmentation method in this paper is more accurate than previous detection methods and has practical significance for real water area segmentation.

Context-aware Cross-level Fusion Network for Camouflaged Object Detection

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/142 ◽

2021 ◽

Author(s):

Yujia Sun ◽

Geng Chen ◽

Tao Zhou ◽

Yi Zhang ◽

Nian Liu

Keyword(s):

Object Detection ◽

State Of The Art ◽

Context Aware ◽

Global Context ◽

Feature Representations ◽

Multi Scale ◽

Benchmark Datasets ◽

Multi Level ◽

High Level ◽

Level Fusion

Camouflaged object detection (COD) is a challenging task due to the low boundary contrast between the object and its surroundings. In addition, the appearance of camouflaged objects varies significantly, e.g., in object size and shape, aggravating the difficulties of accurate COD. In this paper, we propose a novel Context-aware Cross-level Fusion Network (C2F-Net) to address the challenging COD task. Specifically, we propose an Attention-induced Cross-level Fusion Module (ACFM) to integrate the multi-level features with informative attention coefficients. The fused features are then fed to the proposed Dual-branch Global Context Module (DGCM), which yields multi-scale feature representations for exploiting rich global context information. In C2F-Net, the two modules are applied to high-level features in a cascaded manner. Extensive experiments on three widely used benchmark datasets demonstrate that our C2F-Net is an effective COD model and outperforms state-of-the-art models remarkably. Our code is publicly available at: https://github.com/thograce/C2FNet.

Prime Proportion Affects Masked Priming of Fixed and Free-Choice Responses

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169/a000043 ◽

2010 ◽

Vol 57 (5) ◽

pp. 360-366 ◽

Cited By ~ 23

Author(s):

Glen E. Bodner ◽

Rehman Mulji

Keyword(s):

Decision Process ◽

Free Choice ◽

Masked Priming ◽

Local Context ◽

Previous Trial ◽

Target Response ◽

Global Context ◽

Sequential Trial

Left/right “fixed” responses to arrow targets are influenced by whether a masked arrow prime is congruent or incongruent with the required target response. Left/right “free-choice” responses on trials with ambiguous targets that are mixed among fixed trials are also influenced by masked arrow primes. We show that the magnitude of masked priming of both fixed and free-choice responses is greater when the proportion of fixed trials with congruent primes is .8 rather than .2. Unconscious manipulation of context can thus influence both fixed and free choices. Sequential trial analyses revealed that these effects of the overall prime context on fixed and free-choice priming can be modulated by the local context (i.e., the nature of the previous trial). Our results support accounts of masked priming that posit a memory-recruitment, activation, or decision process that is sensitive to aspects of both the local and global context.