Bilingual sentence matching using Kernel CCA

Author(s):  
Abhishek Tripathi ◽  
Arto Klami ◽  
Sami Virpioja
Author(s):  
Peixin Chen ◽  
Wu Guo ◽  
Zhi Chen ◽  
Jian Sun ◽  
Lanhua You

Cognition ◽  
1987 ◽  
Vol 26 (2) ◽  
pp. 171-186 ◽  
Author(s):  
K.I. Forster ◽  
B.J. Stevenson

Author(s):  
Kun Zhang ◽  
Guangyi Lv ◽  
Linyuan Wang ◽  
Le Wu ◽  
Enhong Chen ◽  
...  

Sentence semantic matching requires an agent to determine the semantic relation between two sentences, and is widely used in natural language tasks such as Natural Language Inference (NLI) and Paraphrase Identification (PI). Among matching methods, the attention mechanism plays an important role in capturing semantic relations and properly aligning the elements of two sentences. Previous methods used the attention mechanism to select the important parts of sentences in a single pass. However, the important parts of a sentence change dynamically during semantic matching as understanding of the sentence deepens, so selecting them in a single pass may be insufficient for semantic understanding. To this end, we propose a Dynamic Re-read Network (DRr-Net) approach for sentence semantic matching, which pays close attention to a small region of the sentences at each step and re-reads the important words for better semantic understanding. Specifically, we first employ an Attention Stack-GRU (ASG) unit to model the original sentence repeatedly and preserve all the information from the bottom-most word embedding input to the top-most recurrent output. Second, we use a Dynamic Re-read (DRr) unit to attend to one important word at a time, taking the already learned information into account, and to re-read the important words. Extensive experiments on three sentence matching benchmark datasets demonstrate that DRr-Net models sentence semantics more precisely and significantly improves the performance of sentence semantic matching. In addition, some of the findings in our experiments are consistent with findings from psychological research.
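
The idea of attending to one word per step, conditioned on what has been read so far, can be illustrated with a toy sketch. This is not the authors' DRr unit: the function name `reread`, the weight matrices, and the plain tanh recurrence (used here in place of a GRU) are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def reread(words, W_att, W_in, W_rec, steps):
    """Toy hard-attention re-reading loop (illustrative, not the DRr-Net unit).

    words : (n, d) word representations of one sentence
    W_att : (d, h) scores each word against the current state
    W_in  : (h, d) maps the selected word into the state space
    W_rec : (h, h) recurrent weights of a plain tanh cell (stand-in for a GRU)
    """
    # Start from a summary of the whole sentence so the first scores are informative.
    state = np.tanh(W_in @ words.mean(axis=0))
    chosen = []
    for _ in range(steps):
        # Score every word given what has been read so far.
        scores = words @ (W_att @ state)
        idx = int(np.argmax(scores))  # attend to exactly one word
        chosen.append(idx)
        # Re-read the selected word and update the state.
        state = np.tanh(W_in @ words[idx] + W_rec @ state)
    return chosen, state
```

Because the score depends on the evolving state, the same word can be selected again at a later step, which is the "re-read" behaviour the abstract describes.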


Author(s):  
Aibo Guo ◽  
Xinyi Li ◽  
Ning Pang ◽  
Xiang Zhao

A community Q&A forum is a special type of social media that provides a platform where forum participants both raise and answer questions, facilitating online information sharing. Community Q&A forums in professional domains have attracted a large number of users by offering professional knowledge. To support information access and spare users the effort of raising new questions, they usually come with a question retrieval function, which retrieves existing questions (and their answers) similar to a user's query. However, it can be difficult for community Q&A forums to cover all domains, especially recently emerging ones that have little labeled data and differ greatly from existing domains. We refer to this scenario as cross-domain question retrieval. To handle the unique challenges of cross-domain question retrieval, we design a model based on adversarial training, namely X-QR, which consists of two modules: a domain discriminator and a sentence matcher. The domain discriminator aims to align the source and target data distributions and unify the feature space through domain-adversarial training. With the assistance of the domain discriminator, the sentence matcher is able to learn domain-consistent knowledge for the final matching prediction. To the best of our knowledge, this work is among the first to investigate the domain adaptation problem of sentence matching for question retrieval in community Q&A forums. The experimental results suggest that the proposed X-QR model performs better than conventional sentence matching methods on cross-domain community Q&A tasks.
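
Domain-adversarial training is commonly implemented with a gradient reversal step: the discriminator descends the domain-classification loss while the shared feature extractor receives the negated gradient. The sketch below shows that step for a linear feature map and a logistic discriminator. It is a generic illustration under those assumptions, not the X-QR architecture; the names `domain_loss`, `adversarial_grads`, and the weight `lam` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def domain_loss(W, v, X, dom):
    """Mean binary cross-entropy of a logistic domain discriminator.

    W : (k, d) linear feature extractor, v : (k,) discriminator weights,
    X : (n, d) inputs, dom : (n,) domain labels (0 = source, 1 = target).
    """
    p = sigmoid((X @ W.T) @ v)
    eps = 1e-12  # guards against log(0)
    return -np.mean(dom * np.log(p + eps) + (1 - dom) * np.log(1 - p + eps))

def adversarial_grads(W, v, X, dom, lam):
    """Gradients for one domain-adversarial step.

    Returns the ordinary gradient for the discriminator weights v and the
    reversed gradient (scaled by -lam) for the feature extractor W.
    """
    feats = X @ W.T                      # (n, k) shared features
    p = sigmoid(feats @ v)               # predicted probability of "target"
    err = (p - dom) / len(dom)           # d(loss)/d(logit) for each sample
    grad_v = feats.T @ err               # discriminator descends this
    grad_W = np.outer(err, v).T @ X      # d(loss)/dW before reversal
    return grad_v, -lam * grad_W         # reversal: features try to fool the discriminator
```

In a full model the feature extractor would also receive the gradient of the matching loss, so its update combines the task gradient with the reversed domain gradient.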


Author(s):  
Xin Hu ◽  
Lingling Zhang ◽  
Jun Liu ◽  
Qinghua Zheng ◽  
Jianlong Zhou

2019 ◽  
Vol 17 (04) ◽  
pp. 1950028 ◽  
Author(s):  
Md. Ashad Alam ◽  
Osamu Komori ◽  
Hong-Wen Deng ◽  
Vince D. Calhoun ◽  
Yu-Ping Wang

The kernel canonical correlation analysis based U-statistic (KCCU) is used to detect nonlinear gene–gene co-associations. Estimating the variance of the KCCU is, however, computationally intensive. In addition, kernel canonical correlation analysis (kernel CCA) is not robust to contaminated data. Using a robust kernel mean element and a robust kernel (cross-)covariance operator potentially enables a robust kernel CCA, which is studied in this paper. We first propose an influence function-based estimator for the variance of the KCCU. We then present a non-parametric robust KCCU, which is designed for dealing with contaminated data. The robust KCCU is less sensitive to noise than the KCCU. We investigate the proposed method using both synthesized data and real data from the Mind Clinical Imaging Consortium (MCIC). We show through simulation studies that the power of the proposed methods is a monotonically increasing function of sample size, and that the robust test statistics bring incremental gains in power. To demonstrate the advantage of the robust kernel CCA, we study MCIC data among 22,442 candidate Schizophrenia genes for gene–gene co-associations. We select 768 genes with strong evidence for shedding light on gene–gene interaction networks for Schizophrenia. Through gene ontology enrichment analysis, pathway analysis, gene–gene network analysis, and other studies, the proposed robust methods can find undiscovered genes in addition to significant gene pairs, and demonstrate superior performance over several current approaches.
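
For orientation, standard (non-robust) regularized kernel CCA on two views can be sketched as follows. This is a minimal sketch of one common regularized formulation, not the robust estimator or the KCCU statistic from the abstract above; the function names, the RBF kernel choice, and the parameters `gamma` and `reg` are assumptions for illustration.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center_gram(K):
    """Center a Gram matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_cca(Kx, Ky, reg, n_components=1):
    """Regularized kernel CCA on centered Gram matrices.

    Maximizes alpha^T Kx Ky beta subject to
    alpha^T (Kx + reg I)^2 alpha = beta^T (Ky + reg I)^2 beta = 1.
    Substituting a = (Kx + reg I) alpha and b = (Ky + reg I) beta turns this
    into an SVD of M = (Kx + reg I)^{-1} Kx Ky (Ky + reg I)^{-1}.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Rx = np.linalg.solve(Kx + reg * I, Kx)   # (Kx + reg I)^{-1} Kx
    Ry = np.linalg.solve(Ky + reg * I, Ky)   # (Ky + reg I)^{-1} Ky
    M = Rx @ Ry.T                            # Ry.T equals Ky (Ky + reg I)^{-1}
    U, s, Vt = np.linalg.svd(M)
    alpha = np.linalg.solve(Kx + reg * I, U[:, :n_components])
    beta = np.linalg.solve(Ky + reg * I, Vt[:n_components].T)
    return s[:n_components], alpha, beta
```

The projections of the training samples are `Kx @ alpha` and `Ky @ beta`. The regularizer `reg` matters: without it, kernel CCA on full-rank Gram matrices can return trivially perfect correlations.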

