Bilingual sentence matching using Kernel CCA

Sentence semantic matching requires an agent to determine the semantic relation between two sentences and is widely used in natural language tasks such as Natural Language Inference (NLI) and Paraphrase Identification (PI). Among matching methods, the attention mechanism plays an important role in capturing semantic relations and properly aligning the elements of two sentences. Previous methods used attention to select the important parts of sentences in a single pass. However, the important parts of a sentence change dynamically during semantic matching as understanding of the sentence deepens, so selecting them in a single pass may be insufficient for semantic understanding. To this end, we propose a Dynamic Re-read Network (DRr-Net) for sentence semantic matching, which attends closely to a small region of the sentences at each step and re-reads the important words for better semantic understanding. Specifically, we first employ an Attention Stack-GRU (ASG) unit to model the original sentence repeatedly and preserve all information from the bottom-most word embedding input to the top-most recurrent output. Second, we use a Dynamic Re-read (DRr) unit to attend to one important word at a time, taking the learned information into account, and to re-read the important words. Extensive experiments on three sentence matching benchmark datasets demonstrate that DRr-Net models sentence semantics more precisely and significantly improves the performance of sentence semantic matching. In addition, some of the findings in our experiments are consistent with findings from psychological research.
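
The re-read idea in the abstract above can be illustrated with a small sketch. This is not the authors' DRr unit: it assumes a plain tanh recurrent cell in place of a GRU, a sharpened softmax (the hypothetical `beta` parameter) to approximate picking one word per step, and randomly shaped weight matrices (`W_a`, `U_a`, `w_s`, `W_h`, `U_h`) whose names are invented for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def dynamic_reread(words, W_a, U_a, w_s, W_h, U_h, steps=3, beta=50.0):
    """Simplified re-read loop (illustrative assumption, not the published model).

    words: (T, d) word representations of one sentence.
    W_a: (k, d), U_a: (k, m), w_s: (k,) attention parameters.
    W_h: (m, d), U_h: (m, m) recurrent parameters.
    Returns the final state and the index of the word focused on at each step.
    """
    h = np.zeros(U_h.shape[0])
    picked = []
    for _ in range(steps):
        # Score every word given the current state of understanding h.
        scores = np.tanh(words @ W_a.T + h @ U_a.T) @ w_s
        # A large beta pushes the weights toward a one-hot choice of a single word.
        alpha = softmax(beta * scores)
        picked.append(int(np.argmax(alpha)))
        focus = alpha @ words
        # Re-read the focused word and update the state (tanh cell stands in for a GRU).
        h = np.tanh(W_h @ focus + U_h @ h)
    return h, picked
```

Because the attention scores depend on `h`, the word chosen at one step can differ from the word chosen at the next, which is the dynamic behavior the abstract describes.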

Syntax-Aware Sentence Matching with Graph Convolutional Networks

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-29563-9_31 ◽ 2019 ◽ pp. 353-364

Author(s): Yangfan Lei ◽ Yue Hu ◽ Xiangpeng Wei ◽ Luxi Xing ◽ Quanchao Liu

Keyword(s): Convolutional Networks ◽ Sentence Matching

Adversarial Cross-domain Community Question Retrieval

ACM Transactions on Asian and Low-Resource Language Information Processing ◽ 10.1145/3487291 ◽ 2022 ◽ Vol 21 (3) ◽ pp. 1-22

Author(s): Aibo Guo ◽ Xinyi Li ◽ Ning Pang ◽ Xiang Zhao

Keyword(s): Social Media ◽ Information Sharing ◽ Information Access ◽ Professional Knowledge ◽ Feature Space ◽ Online Information ◽ Cross Domain ◽ Adversarial Training ◽ Sentence Matching ◽ Target Data

A community Q&A forum is a special type of social media that provides a platform for forum participants to raise and answer questions, facilitating online information sharing. Currently, community Q&A forums in professional domains have attracted a large number of users by offering professional knowledge. To support information access and save users the effort of raising new questions, these forums usually come with a question retrieval function, which retrieves existing questions (and their answers) similar to a user's query. However, it can be difficult for community Q&A forums to cover all domains, especially recently emerging ones that have little labeled data and differ greatly from existing domains. We refer to this scenario as cross-domain question retrieval. To handle the unique challenges of cross-domain question retrieval, we design a model based on adversarial training, namely X-QR, which consists of two modules: a domain discriminator and a sentence matcher. The domain discriminator aims to align the source and target data distributions and unify the feature space through domain-adversarial training. With the assistance of the domain discriminator, the sentence matcher is able to learn domain-consistent knowledge for the final matching prediction. To the best of our knowledge, this work is among the first to investigate the domain adaptation problem of sentence matching for question retrieval in community Q&A forums. The experimental results suggest that the proposed X-QR model performs better than conventional sentence matching methods on cross-domain community Q&A tasks.
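
The two-module arrangement described above follows the general pattern of domain-adversarial training, which can be sketched with manual gradients. This is not the X-QR implementation: it assumes a linear shared feature extractor `W`, a linear matcher head `u`, a linear domain discriminator head `v`, a single training example, and a reversal weight `lam`; all of these names are hypothetical.

```python
import numpy as np

def bce_with_logit(logit, label):
    # Binary cross-entropy on a raw logit, in a numerically stable form.
    return np.logaddexp(0.0, logit) - label * logit

def bce_grad(logit, label):
    # Derivative of the loss above with respect to the logit: sigmoid(logit) - label.
    return 1.0 / (1.0 + np.exp(-logit)) - label

def adversarial_step(W, u, v, x, y_match, y_domain, lr=0.1, lam=1.0):
    """One SGD step of domain-adversarial training on a single example (sketch).

    W: (m, d) shared feature extractor, u: (m,) matcher, v: (m,) discriminator.
    """
    f = W @ x                               # shared features
    g_match = bce_grad(u @ f, y_match)      # matcher loss gradient at its logit
    g_dom = bce_grad(v @ f, y_domain)       # discriminator loss gradient at its logit
    # Each head descends its own loss.
    u_new = u - lr * g_match * f
    v_new = v - lr * g_dom * f
    # The feature extractor descends the matching loss but receives the
    # discriminator gradient with its sign flipped (gradient reversal), so the
    # features become harder to tell apart by domain.
    grad_f = g_match * u - lam * g_dom * v
    W_new = W - lr * np.outer(grad_f, x)
    return W_new, u_new, v_new
```

With the discriminator held fixed, a step on `W` raises the domain loss, which is the sense in which the source and target feature distributions are pushed toward each other.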

Fs-DSM: Few-Shot Diagram-Sentence Matching via Cross-Modal Attention Graph Model

IEEE Transactions on Image Processing ◽ 10.1109/tip.2021.3112294 ◽ 2021 ◽ pp. 1-1

Author(s): Xin Hu ◽ Lingling Zhang ◽ Jun Liu ◽ Qinghua Zheng ◽ Jianlong Zhou

Keyword(s): Graph Model ◽ Sentence Matching

The Processing Implementation of Syntactic Constraints: The Sentence Matching Debate

Studies in Theoretical Psycholinguistics - Island Constraints ◽ 10.1007/978-94-017-1980-3_16 ◽ 1992 ◽ pp. 419-443

Author(s): Laurie A. Stowe

Keyword(s): Syntactic Constraints ◽ Sentence Matching

Robust kernel canonical correlation analysis to detect gene-gene co-associations: A case study in genetics

Journal of Bioinformatics and Computational Biology ◽ 10.1142/s0219720019500288 ◽ 2019 ◽ Vol 17 (04) ◽ pp. 1950028 ◽ Cited By ~ 2

Author(s): Md. Ashad Alam ◽ Osamu Komori ◽ Hong-Wen Deng ◽ Vince D. Calhoun ◽ Yu-Ping Wang

Keyword(s): Correlation Analysis ◽ Canonical Correlation Analysis ◽ Canonical Correlation ◽ Enrichment Analysis ◽ Superior Performance ◽ Covariance Operator ◽ Contaminated Data ◽ Cross Covariance ◽ Kernel Canonical Correlation Analysis ◽ Kernel CCA

The kernel canonical correlation analysis based U-statistic (KCCU) is used to detect nonlinear gene–gene co-associations. Estimating the variance of the KCCU is, however, computationally intensive. In addition, kernel canonical correlation analysis (kernel CCA) is not robust to contaminated data. Using a robust kernel mean element and a robust kernel (cross-)covariance operator potentially enables a robust kernel CCA, which is studied in this paper. We first propose an influence function-based estimator for the variance of the KCCU. We then present a non-parametric robust KCCU, which is designed for dealing with contaminated data. The robust KCCU is less sensitive to noise than the KCCU. We investigate the proposed method using both synthesized data and real data from the Mind Clinical Imaging Consortium (MCIC). We show through simulation studies that the power of the proposed methods is a monotonically increasing function of sample size, and that the robust test statistics bring incremental gains in power. To demonstrate the advantage of the robust kernel CCA, we study MCIC data among 22,442 candidate schizophrenia genes for gene–gene co-associations. We select 768 genes with strong evidence for shedding light on gene–gene interaction networks for schizophrenia. Through gene ontology enrichment analysis, pathway analysis, gene–gene network analysis, and other studies, the proposed robust methods can find undiscovered genes in addition to significant gene pairs, and they demonstrate superior performance over several current approaches.
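
For orientation, the standard (non-robust) regularized kernel CCA that the abstract builds on can be sketched as follows. This is not the robust KCCU proposed in the paper: it assumes an RBF kernel with a hypothetical bandwidth `gamma`, a ridge term `reg`, and the common formulation in which the first canonical correlation is the largest singular value of (Kx + n·reg·I)⁻¹ Kx Ky (Ky + n·reg·I)⁻¹ for centered Gram matrices Kx and Ky.

```python
import numpy as np

def rbf_gram(x, gamma):
    # Gram matrix of the RBF kernel exp(-gamma * ||a - b||^2) for rows of x.
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center(K):
    # Center the Gram matrix in feature space: H K H with H = I - 11^T / n.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def first_kernel_canonical_correlation(x, y, gamma=1.0, reg=0.05):
    """First canonical correlation of plain regularized kernel CCA (sketch).

    x, y: arrays of shape (n, p) and (n, q) holding paired samples.
    """
    n = x.shape[0]
    Kx = center(rbf_gram(x, gamma))
    Ky = center(rbf_gram(y, gamma))
    Rx = Kx + n * reg * np.eye(n)
    Ry = Ky + n * reg * np.eye(n)
    # M = Rx^{-1} Kx Ky Ry^{-1}; the second factor uses symmetry of Ky and Ry.
    M = np.linalg.solve(Rx, Kx) @ np.linalg.solve(Ry, Ky).T
    return float(np.linalg.svd(M, compute_uv=False)[0])
```

Because every sample enters the Gram matrices with equal weight, a few contaminated observations can shift this estimate, which is the sensitivity the robust variant in the abstract is meant to reduce.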
