Learning to Combine Local and Global Image Information for Contactless Palmprint Recognition

Marjan Stoimchev; Marija Ivanovska; Vitomir Štruc

doi:10.3390/s22010073

Learning to Combine Local and Global Image Information for Contactless Palmprint Recognition

Sensors ◽

10.3390/s22010073 ◽

2021 ◽

Vol 22 (1) ◽

pp. 73

Author(s):

Marjan Stoimchev ◽

Marija Ivanovska ◽

Vitomir Štruc

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Input Image ◽

Palmprint Recognition ◽

Learning Approaches ◽

Elastic Deformations ◽

Feature Representations ◽

Palmar Surface ◽

Proposed Model ◽

Visual Artifacts

In the past few years, there has been a leap from traditional palmprint recognition methodologies, which use handcrafted features, to deep-learning approaches that are able to automatically learn feature representations from the input data. However, the information that is extracted from such deep-learning models typically corresponds to the global image appearance, where only the most discriminative cues from the input image are considered. This characteristic is especially problematic when data is acquired in unconstrained settings, as in the case of contactless palmprint recognition systems, where visual artifacts caused by elastic deformations of the palmar surface are typically present in spatially local parts of the captured images. In this study we address the problem of elastic deformations by introducing a new approach to contactless palmprint recognition based on a novel CNN model, designed as a two-path architecture, where one path processes the input in a holistic manner, while the second path extracts local information from smaller image patches sampled from the input image. As elastic deformations can be assumed to most significantly affect the global appearance, while having a lesser impact on spatially local image areas, the local processing path addresses the issues related to elastic deformations, thereby supplementing the information from the global processing path. The model is trained with a learning objective that combines the Additive Angular Margin (ArcFace) Loss and the well-known center loss. By using the proposed model design, the discriminative power of the learned image representation is significantly enhanced compared to standard holistic models, which, as we show in the experimental section, leads to state-of-the-art performance for contactless palmprint recognition. Our approach is tested on two publicly available contactless palmprint datasets, namely IITD and CASIA, and is demonstrated to perform favorably against state-of-the-art methods from the literature. The source code for the proposed model is made publicly available.
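
The combined learning objective named in the abstract can be illustrated with a minimal single-sample sketch in plain Python. This is not the authors' implementation: the function names, the scale and margin values, and the weighting factor lam are assumptions made for illustration, and the sketch omits the special handling that practical ArcFace implementations apply when the angle plus margin exceeds pi.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_loss(embedding, class_weights, label, scale=30.0, margin=0.5):
    """Cross-entropy over cosine logits, with an additive angular margin
    applied to the target class only. scale and margin are assumed values."""
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        theta = math.acos(cos_theta)
        if j == label:
            theta += margin  # penalize the target class angle
        logits.append(scale * math.cos(theta))
    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(embedding, centers, label):
    """Half the squared distance between the embedding and its class center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(embedding, centers[label]))


def combined_loss(embedding, class_weights, centers, label,
                  lam=0.01, scale=30.0, margin=0.5):
    """ArcFace term plus a weighted center term; lam is a hypothetical weight."""
    return (arcface_loss(embedding, class_weights, label, scale, margin)
            + lam * center_loss(embedding, centers, label))
```

The ArcFace term pushes embeddings of different classes apart in angle, while the center term pulls each embedding toward its class center; how the two are balanced in the actual model is not stated in the abstract.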

Deep learning approaches for speech emotion recognition: state of the art and research challenges

Multimedia Tools and Applications ◽

10.1007/s11042-020-09874-7 ◽

2021 ◽

Author(s):

Rashid Jahangir ◽

Ying Wah Teh ◽

Faiqa Hanif ◽

Ghulam Mujtaba

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

State Of The Art ◽

Speech Emotion Recognition ◽

Learning Approaches ◽

Research Challenges

Particle Size Estimation in Mixed Commercial Waste Images Using Deep Learning

10.36227/techrxiv.14762043.v1 ◽

2021 ◽

Author(s):

Phongsathorn Kittiworapanya ◽

Kitsuchart Pasupa ◽

Peter Auer

Keyword(s):

Computer Vision ◽

Particle Size ◽

Deep Learning ◽

Waste Management ◽

State Of The Art ◽

Learning Algorithms ◽

Input Image ◽

Size Estimation ◽

Waste Particles ◽

Set Up

We assessed several state-of-the-art deep learning algorithms and computer vision techniques for estimating the particle size of mixed commercial waste from images. In waste management, the first step is often coarse shredding, using the particle size to set up the shredder machine. The difficulty is separating the waste particles in an image, which cannot be performed well. This work focused on estimating size by using the texture from the input image, captured at a fixed height from the camera lens to the ground. We found that EfficientNet achieved the best performance, with an F1-score of 0.72 and an accuracy of 75.89%.

A Survey of Graphical Page Object Detection with Deep Neural Networks

10.20944/preprints202104.0739.v1 ◽

2021 ◽

Author(s):

Jwalin Bhatt ◽

Khurram Azeem Hashmi ◽

Muhammad Zeshan Afzal ◽

Didier Stricker

Keyword(s):

Deep Learning ◽

Object Detection ◽

Conceptual Understanding ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Approaches ◽

Document Images ◽

Essential Information ◽

Current State ◽

High Level

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components. It leads to a high-level conceptual understanding of the documents that makes digitization of documents viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in document images. We discuss the most relevant deep learning-based approaches and the state of the art in graphical page object detection in document images. This work provides a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss leading datasets along with the quantitative evaluation. Moreover, we briefly discuss promising directions that can be pursued for further improvements.

Covid-19 detection via deep neural network and occlusion sensitivity maps

10.36227/techrxiv.14100890 ◽

2021 ◽

Author(s):

Noor Ahmad ◽

Muhammad Aminu ◽

Mohd Halim Mohd Noor

Keyword(s):

Neural Network ◽

Deep Learning ◽

Deep Neural Network ◽

State Of The Art ◽

Color Images ◽

Fine Tuning ◽

Training Dataset ◽

Learning Approaches ◽

Learning Models ◽

Sensitivity Maps

Deep learning approaches have attracted a lot of attention in the automatic detection of Covid-19, and transfer learning is the most common approach. However, the majority of pre-trained models are trained on color images, which can cause inefficiencies when fine-tuning the models on Covid-19 images, which are often grayscale. To address this issue, we propose a deep learning architecture called CovidNet, which requires a relatively small number of parameters. CovidNet accepts grayscale images as inputs and is suitable for training with a limited training dataset. Experimental results show that CovidNet outperforms other state-of-the-art deep learning models for Covid-19 detection.

Unsupervised Deep Learning via Affinity Diffusion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6757 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11029-11036

Author(s):

Jiabo Huang ◽

Qi Dong ◽

Shaogang Gong ◽

Xiatian Zhu

Keyword(s):

Deep Learning ◽

State Of The Art ◽

General Purpose ◽

Training Data ◽

Learning Approach ◽

Model Learning ◽

Feature Representations ◽

Discriminative Feature ◽

Training Samples ◽

Unsupervised Deep Learning

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning with the need for massive labelled training data, limiting dramatically their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the performance superiority of the proposed method over the state-of-the-art unsupervised learning models using six common image recognition benchmarks including MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet.

A Review on Deep Image Contrast Enhancement

SMART MOVES JOURNAL IJOSCIENCE ◽

10.24113/ijoscience.v6i1.258 ◽

2020 ◽

Vol 6 (1) ◽

pp. 4

Author(s):

Puspad Kumar Sharma ◽

Nitesh Gupta ◽

Anurag Shrivastava

Keyword(s):

Image Processing ◽

Deep Learning ◽

Image Enhancement ◽

Research Work ◽

Atmospheric Condition ◽

Input Image ◽

Learning Approaches ◽

Sensing Applications ◽

Image Contrast Enhancement

In image processing applications, one of the main preprocessing phases is image enhancement, which is used to produce an image of higher quality than the original input image. These enhanced images can be used in many applications, such as remote sensing applications, geo-satellite images, etc. The quality of an image is affected by several conditions, such as poor illumination, atmospheric conditions, a wrong lens aperture setting of the camera, noise, etc. [2]. Such degraded or low-exposure images need to be enhanced by increasing their brightness as well as their contrast, which can be achieved through image enhancement. In this research work, different image enhancement techniques are discussed and reviewed along with their results. The aim of this study is to determine the application of deep learning approaches that have been used for image enhancement. Deep learning is a machine learning approach that is currently revolutionizing a number of disciplines, including image processing and computer vision. This paper will attempt to apply deep learning to image filtering, specifically low-light image enhancement. The review given in this paper is intended to help future researchers overcome problems in designing efficient algorithms that enhance image quality.

Deep Learning for Historical Document Analysis and Recognition—A Survey

Journal of Imaging ◽

10.3390/jimaging6100110 ◽

2020 ◽

Vol 6 (10) ◽

pp. 110

Author(s):

Francesco Lombardi ◽

Simone Marinai

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Point Of View ◽

Historical Documents ◽

Learning Approaches ◽

Historical Document ◽

Research Directions ◽

Research Fields ◽

Definition Of ◽

Novel Applications

Nowadays, deep learning methods are employed in a broad range of research fields. The analysis and recognition of historical documents, as we survey in this work, is not an exception. Our study analyzes the papers published in the last few years on this topic from different perspectives: we first provide a pragmatic definition of historical documents from the point of view of the research in the area, then we look at the various sub-tasks addressed in this research. Guided by these tasks, we go through the different input-output relations that are expected from the deep learning approaches used, and we describe the most used models accordingly. We also discuss research datasets published in the field and their applications. This analysis shows that the latest research is a leap forward, since it is not the simple application of recently proposed algorithms to previous problems; novel tasks and novel applications of state-of-the-art methods are now considered. Rather than just providing a conclusive picture of the current research on the topic, we lastly suggest some potential future trends that can represent a stimulus for innovative research directions.

Multi-Person Pose Estimation using an Orientation and Occlusion Aware Deep Learning Network

Sensors ◽

10.3390/s20061593 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1593 ◽

Cited By ~ 1

Author(s):

Yanlei Gu ◽

Huiyang Zhang ◽

Shunsuke Kamijo

Keyword(s):

Deep Learning ◽

Pose Estimation ◽

Body Orientation ◽

Learning Approaches ◽

The Public ◽

Learning Network ◽

Proposed Model ◽

Body Boundary ◽

Public Dataset ◽

Deep Learning Network

Image-based human behavior and activity understanding has been a hot topic in the field of computer vision and multimedia. As an important part of it, skeleton estimation, also called pose estimation, has attracted considerable interest. For pose estimation, most deep learning approaches mainly focus on the joint feature. However, the joint feature is not sufficient, especially when the image contains multiple people and the pose is occluded or not fully visible. This paper proposes a novel multi-task framework for multi-person pose estimation. The proposed framework is based on Mask Region-based Convolutional Neural Networks (R-CNN) and extended to integrate the joint feature, body boundary, body orientation and occlusion condition together. To further improve the performance of multi-person pose estimation, this paper proposes organizing the different information in serial multi-task models instead of the widely used parallel multi-task network. The proposed models are trained on the public dataset Common Objects in Context (COCO), which is further augmented by ground truths of body orientation and mutual-occlusion mask. Experiments demonstrate the performance of the proposed method for multi-person pose estimation and body orientation estimation. The proposed method achieves 84.6% on the Percentage of Correct Keypoints (PCK) metric and an 83.7% Correct Detection Rate (CDR). Comparisons further illustrate that the proposed model can reduce over-detection compared with other methods.
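
For readers unfamiliar with the PCK metric reported above, the following is a minimal sketch of how such a score is commonly computed. The abstract does not state the threshold or the reference length used, so alpha, reference_length, and the function name are assumptions for illustration only.

```python
import math


def pck(predicted, ground_truth, reference_length, alpha=0.5):
    """Fraction of predicted keypoints lying within alpha * reference_length
    of their ground-truth positions. Keypoints are (x, y) tuples.
    alpha and reference_length are assumed, not taken from the paper."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    if not predicted:
        return 0.0
    threshold = alpha * reference_length
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(predicted)
```

In practice the reference length is often tied to the person's size in the image (for example a head or torso measurement), so the tolerance scales with the subject.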

Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets

Mathematics ◽

10.3390/math8112075 ◽

2020 ◽

Vol 8 (11) ◽

pp. 2075

Author(s):

Óscar Apolinario-Arzube ◽

José Antonio García-Díaz ◽

José Medina-Moreira ◽

Harry Luna-Aveiga ◽

Rafael Valencia-García

Keyword(s):

Machine Learning ◽

Deep Learning ◽

User Interfaces ◽

State Of The Art ◽

Learning Approaches ◽

Word Embeddings ◽

Linguistic Features ◽

Intended Meaning ◽

Language User ◽

Learning Architectures

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier for finding linguistic clues that can determine whether a text is satirical or not. For this, the state-of-the-art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate. However, as far as our knowledge goes, there are no comprehensive studies that evaluate these techniques in Spanish in the satire identification domain. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word-embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experimentation revealed that term-counting features achieved similar results to deep-learning approaches based on word-embeddings, both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine learning models provide competitive results regarding automatic satire identification, slightly outperforming state-of-the-art models.
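
As a rough illustration of what a term-counting feature looks like, the sketch below turns a text into a fixed-length count vector over a given vocabulary. The tokenization rule, the function names, and the example vocabulary are assumptions; the exact features used in the paper may differ (for example, they may be weighted or use n-grams).

```python
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word tokens in a text (simple regex tokenization)."""
    return Counter(re.findall(r"\w+", text.lower()))


def to_vector(counts, vocabulary):
    """Project term counts onto a fixed vocabulary, using 0 for absent terms."""
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical usage: a short Spanish sentence and a toy vocabulary.
counts = term_counts("El gato y el perro")
features = to_vector(counts, ["el", "gato", "casa"])
```

Vectors of this kind can be fed to a traditional classifier, which is the type of baseline the abstract compares against embedding-based models.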

Visual Saliency Prediction Based on Deep Learning

Information ◽

10.3390/info10080257 ◽

2019 ◽

Vol 10 (8) ◽

pp. 257 ◽

Cited By ~ 7

Author(s):

Bashir Ghariba ◽

Mohamed S. Shehata ◽

Peter McGuire

Keyword(s):

Deep Learning ◽

Saliency Detection ◽

Visual Saliency ◽

Semantic Segmentation ◽

Input Image ◽

Human Eye ◽

Proposed Model ◽

Global Accuracy ◽

Visual Saliency Detection ◽

Deep Learning Model

Human eye movement is one of the most important functions for understanding our surroundings. When a human eye processes a scene, it quickly focuses on dominant parts of the scene, a process commonly known as visual saliency detection or visual attention prediction. Recently, neural networks have been used to predict visual saliency. This paper proposes a deep learning encoder-decoder architecture, based on a transfer learning technique, to predict visual saliency. In the proposed model, visual features are extracted through convolutional layers from raw images to predict visual saliency. In addition, the proposed model uses the VGG-16 network for semantic segmentation, which uses a pixel classification layer to predict the categorical label for every pixel in an input image. The proposed model is applied to several datasets, including TORONTO, MIT300, MIT1003, and DUT-OMRON, to illustrate its efficiency. The results of the proposed model are quantitatively and qualitatively compared to classic and state-of-the-art deep learning models. Using the proposed deep learning model, a global accuracy of up to 96.22% is achieved for the prediction of visual saliency.
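
Global accuracy for a per-pixel classifier is usually understood as the share of pixels whose predicted label matches the ground truth, regardless of class. The sketch below assumes label maps stored as nested lists of equal shape; the function name and data layout are illustrative and not taken from the paper.

```python
def global_accuracy(predicted_labels, true_labels):
    """Fraction of pixels whose predicted label equals the true label.
    Both inputs are 2D nested lists (rows of per-pixel labels)."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label maps must have the same number of rows")
    total = 0
    correct = 0
    for pred_row, true_row in zip(predicted_labels, true_labels):
        if len(pred_row) != len(true_row):
            raise ValueError("label map rows must have the same length")
        for p, t in zip(pred_row, true_row):
            total += 1
            if p == t:
                correct += 1
    return correct / total if total else 0.0
```

Because every pixel counts equally, this measure can be dominated by the majority class (typically the non-salient background), which is worth keeping in mind when reading a single headline figure.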
