Pose-preserving Cross Spectral Face Hallucination

To narrow the inherent sensing gap in heterogeneous face recognition (HFR), recent methods have resorted to generative models and explored the ?recognition via generation? framework. Even though, it remains a very challenging task to synthesize photo-realistic visible faces (VIS) from near-infrared (NIR) images especially when paired training data are unavailable. We present an approach to avert the data misalignment problem and faithfully preserve pose, expression and identity information during cross-spectral face hallucination. At the pixel level, we introduce an unsupervised attention mechanism to warping that is jointly learned with the generator to derive pixel-wise correspondence from unaligned data. At the image level, an auxiliary generator is employed to facilitate the learning of mapping from NIR to VIS domain. At the domain level, we first apply the mutual information constraint to explicitly measure the correlation between domains and thus benefit synthesis. Extensive experiments on three heterogeneous face datasets demonstrate that our approach not only outperforms current state-of-the-art HFR methods but also produce visually appealing results at a high resolution.

Download Full-text

Transcription Alignment of Historical Vietnamese Manuscripts without Human-Annotated Learning Samples

Applied Sciences ◽

10.3390/app11114894 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4894

Author(s):

Anna Scius-Bertrand ◽

Michael Jungo ◽

Beat Wolf ◽

Andreas Fischer ◽

Marc Bui

Keyword(s):

Object Detection ◽

State Of The Art ◽

Positive Impact ◽

Detection System ◽

Training Data ◽

Detection Accuracy ◽

Current State ◽

Alignment Task ◽

Scanned Image ◽

Automatic Transcription

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems reducing the required amount of annotations. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages using a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on the character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.

Download Full-text

QASC: A Dataset for Question Answering via Sentence Composition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6319 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8082-8090

Author(s):

Tushar Khot ◽

Peter Clark ◽

Michal Guerquin ◽

Peter Jansen ◽

Ashish Sabharwal

Keyword(s):

Common Sense ◽

Human Performance ◽

Question Answering ◽

State Of The Art ◽

Multiple Choice ◽

Training Data ◽

Language Models ◽

Current State ◽

New Concepts ◽

Large Corpus

Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using common-sense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved as this model still lags by 20% behind human performance.

Download Full-text

Generate, Segment, and Refine: Towards Generic Manipulation Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.7007 ◽

2020 ◽

Vol 34 (07) ◽

pp. 13058-13065 ◽

Cited By ~ 1

Author(s):

Peng Zhou ◽

Bor-Chun Chen ◽

Xintong Han ◽

Mahyar Najibi ◽

Abhinav Shrivastava ◽

...

Keyword(s):

State Of The Art ◽

Training Data ◽

The Internet ◽

Generation Process ◽

Image Generation ◽

Image Sharing ◽

Current State ◽

Photo Editing ◽

Traditional Work ◽

False News

Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in a large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns on the spread of false news and misinformation is growing. Current state of the art methods for detecting these manipulated images suffers from the lack of training data due to the laborious labeling process. We address this problem in this paper, for which we introduce a manipulated image generation process that creates true positives using currently available datasets. Drawing from traditional work on image blending, we propose a novel generator for creating such examples. In addition, we also propose to further create examples that force the algorithm to focus on boundary artifacts during training. Strong experimental results validate our proposal.

Download Full-text

Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification (Extended Abstract)

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/802 ◽

2018 ◽

Author(s):

Alejandro Moreo Fernández ◽

Andrea Esuli ◽

Fabrizio Sebastiani

Keyword(s):

Domain Adaptation ◽

State Of The Art ◽

Sentiment Classification ◽

Training Data ◽

Target Domain ◽

Source Domain ◽

Machine Learning Methods ◽

Cross Domain ◽

Current State ◽

Cross Lingual

Domain Adaptation (DA) techniques aim at enabling machine learning methods learn effective classifiers for a “target” domain when the only available training data belongs to a different “source” domain. In this extended abstract, we briefly describe our new DA method called Distributional Correspondence Indexing (DCI) for sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. The experiments we have conducted show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification.

Download Full-text

Exemplification Modeling: Can You Give Me an Example, Please?

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/520 ◽

2021 ◽

Author(s):

Edoardo Barba ◽

Luigi Procopio ◽

Caterina Lacerra ◽

Tommaso Pasini ◽

Roberto Navigli

Keyword(s):

Gold Standard ◽

State Of The Art ◽

Word Sense Disambiguation ◽

Full Range ◽

Training Data ◽

Training Procedure ◽

Word Sense ◽

The Novel ◽

Current State ◽

Sense Disambiguation

Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically-coherent with the input definitions as the sentences in manually-annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.

Download Full-text

Disentangled Variational Representation for Heterogeneous Face Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019005 ◽

2019 ◽

Vol 33 ◽

pp. 9005-9012 ◽

Cited By ~ 15

Author(s):

Xiang Wu ◽

Huaibo Huang ◽

Vishal M. Patel ◽

Ran He ◽

Zhenan Sun

Keyword(s):

Face Recognition ◽

Latent Variable ◽

Near Infrared ◽

Heterogeneous Data ◽

Face Representation ◽

Identity Information ◽

Variational Representation ◽

Heterogeneous Face Recognition ◽

Latent Space ◽

Variable Space

Visible (VIS) to near infrared (NIR) face matching is a challenging problem due to the significant domain discrepancy between the domains and a lack of sufficient data for training cross-modal matching algorithms. Existing approaches attempt to tackle this problem by either synthesizing visible faces from NIR faces, extracting domain-invariant features from these modalities, or projecting heterogeneous data onto a common latent space for cross-modal matching. In this paper, we take a different approach in which we make use of the Disentangled Variational Representation (DVR) for crossmodal matching. First, we model a face representation with an intrinsic identity information and its within-person variations. By exploring the disentangled latent variable space, a variational lower bound is employed to optimize the approximate posterior for NIR and VIS representations. Second, aiming at obtaining more compact and discriminative disentangled latent space, we impose a minimization of the identity information for the same subject and a relaxed correlation alignment constraint between the NIR and VIS modality variations. An alternative optimization scheme is proposed for the disentangled variational representation part and the heterogeneous face recognition network part. The mutual promotion between these two parts effectively reduces the NIR and VIS domain discrepancy and alleviates over-fitting. Extensive experiments on three challenging NIR-VIS heterogeneous face recognition databases demonstrate that the proposed method achieves significant improvements over the state-of-the-art methods.

Download Full-text

Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification.

Journal of Artificial Intelligence Research ◽

10.1613/jair.4762 ◽

2016 ◽

Vol 55 ◽

pp. 131-163 ◽

Cited By ~ 13

Author(s):

Alejandro Moreo Fernández ◽

Andrea Esuli ◽

Fabrizio Sebastiani

Keyword(s):

Domain Adaptation ◽

State Of The Art ◽

Computational Cost ◽

Sentiment Classification ◽

Training Data ◽

Target Domain ◽

Machine Learning Methods ◽

Cross Domain ◽

Current State ◽

Cross Lingual

Domain Adaptation (DA) techniques aim at enabling machine learning methods learn effective classifiers for a "target'' domain when the only available training data belongs to a different "source'' domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. Term correspondence is quantified by means of a distributional correspondence function (DCF). We propose a number of efficient DCFs that are motivated by the distributional hypothesis, i.e., the hypothesis according to which terms with similar meaning tend to have similar distributions in text. Experiments show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification. DCI also brings about a significantly reduced computational cost, and requires a smaller amount of human intervention. As a final contribution, we discuss a more challenging formulation of the domain adaptation problem, in which both the cross-domain and cross-lingual dimensions are tackled simultaneously.

Download Full-text

Near-infrared detector arrays: current state of the art

10.1117/12.395442 ◽

2000 ◽

Cited By ~ 2

Author(s):

Klaus-Werner Hodapp

Keyword(s):

Near Infrared ◽

State Of The Art ◽

Infrared Detector ◽

Detector Arrays ◽

Current State

Download Full-text

Real-time near-infrared fluorescence guided surgery in gynecologic oncology: A review of the current state of the art

Gynecologic Oncology ◽

10.1016/j.ygyno.2014.08.005 ◽

2014 ◽

Vol 135 (3) ◽

pp. 606-613 ◽

Cited By ~ 53

Author(s):

Henricus J.M. Handgraaf ◽

Floris P.R. Verbeek ◽

Quirijn R.J.G. Tummers ◽

Leonora S.F. Boogerd ◽

Cornelis J.H. van de Velde ◽

...

Keyword(s):

Real Time ◽

Near Infrared ◽

State Of The Art ◽

Gynecologic Oncology ◽

Near Infrared Fluorescence ◽

Guided Surgery ◽

Current State ◽

Fluorescence Guided Surgery

Download Full-text

Multi-Margin based Decorrelation Learning for Heterogeneous Face Recognition

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/96 ◽

2019 ◽

Cited By ~ 2

Author(s):

Bing Cao ◽

Nannan Wang ◽

Xinbo Gao ◽

Jie Li ◽

Zhifeng Li

Keyword(s):

Face Recognition ◽

Large Scale ◽

State Of The Art ◽

Representation Learning ◽

Training Data ◽

Superior Performance ◽

Neural Network Approach ◽

Cross Domain ◽

Face Images ◽

Heterogeneous Face Recognition

Heterogeneous face recognition (HFR) refers to matching face images acquired from different domains with wide applications in security scenarios. However, HFR is still a challenging problem due to the significant cross-domain discrepancy and the lacking of sufficient training data in different domains. This paper presents a deep neural network approach namely Multi-Margin based Decorrelation Learning (MMDL) to extract decorrelation representations in a hyperspherical space for cross-domain face images. The proposed framework can be divided into two components: heterogeneous representation network and decorrelation representation learning. First, we employ a large scale of accessible visual face images to train heterogeneous representation network. The decorrelation layer projects the output of the first component into decorrelation latent subspace and obtain decorrelation representation. In addition, we design a multi-margin loss (MML), which consists of tetradmargin loss (TML) and heterogeneous angular margin loss (HAML), to constrain the proposed framework. Experimental results on two challenging heterogeneous face databases show that our approach achieves superior performance on both verification and recognition tasks, comparing with state-of-the-art methods.

Download Full-text