Estimation of perceptual spaces for speaker identities based on the cross-lingual discrimination task

Author(s):  
Minoru Tsuzaki ◽  
Keiichi Tokuda ◽  
Hisashi Kawai ◽  
Jinfu Ni
Author(s):  
Mikel Artetxe ◽  
Sebastian Ruder ◽  
Dani Yogatama

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shaofei Wang ◽  
Depeng Dang

Purpose: Previous knowledge base question answering (KBQA) models consider only the monolingual scenario and cannot be directly extended to the cross-lingual scenario, in which the language of the questions differs from that of the knowledge base (KB). Although a machine translation (MT) model can bridge the gap by translating questions into the language of the KB, noise in the translated questions can accumulate and sharply impair final performance. The authors therefore propose a method to improve the robustness of KBQA models in the cross-lingual scenario.
Design/methodology/approach: The authors propose a knowledge distillation-based robustness enhancement (KDRE) method. First, a monolingual model (the teacher) is trained on ground truth (GT) data. Then, to imitate noise encountered in practice, a noise-generating model injects two types of noise into questions: general noise and translation-aware noise. Finally, the noisy questions are fed to the student model, which is jointly trained on GT data and on distilled data derived from the teacher when it is fed the GT questions.
Findings: The experimental results demonstrate that KDRE can improve the performance of models in the cross-lingual scenario. The performance of each module in the KBQA model is improved by KDRE. The knowledge distillation (KD) and the noise-generating model complement each other in boosting model robustness.
Originality/value: The authors are the first to extend KBQA models from the monolingual to the cross-lingual scenario, and the first to apply KD to KBQA to develop robust cross-lingual models.
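The joint training of the student on GT data and distilled data can be illustrated with a generic knowledge-distillation loss. This is a minimal sketch, not the authors' KDRE implementation: the abstract does not specify the loss, so the weighted sum of cross-entropy and KL divergence, the weight `alpha`, the `temperature`, and all function names are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, gold_index):
    """Hard-label loss against the ground-truth answer."""
    return -math.log(probs[gold_index])

def kl_divergence(p, q):
    """KL(p || q) between teacher distribution p and student distribution q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_loss(student_logits, teacher_logits, gold_index,
               alpha=0.5, temperature=2.0):
    """Assumed joint objective: alpha * GT loss + (1 - alpha) * distillation loss.

    student_logits: student scores for a *noisy* question.
    teacher_logits: teacher scores for the matching *clean* GT question.
    """
    hard = cross_entropy(softmax(student_logits), gold_index)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1 - alpha) * soft

# Hypothetical scores over three candidate answers.
loss = joint_loss(student_logits=[1.0, 0.2, -0.5],
                  teacher_logits=[2.0, 0.1, -1.0],
                  gold_index=0)
```

Under this assumed objective, the distillation term pulls the student's predictions on noisy input toward the teacher's predictions on clean input, which is one common way to encourage robustness.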


2021 ◽  
Author(s):  
Benjamin Muller ◽  
Yanai Elazar ◽  
Benoît Sagot ◽  
Djamé Seddah

2017 ◽  
Vol 52 (3) ◽  
pp. 771-799 ◽  
Author(s):  
Olga Majewska ◽  
Ivan Vulić ◽  
Diana McCarthy ◽  
Yan Huang ◽  
Akira Murakami ◽  
...  

2013 ◽  
Vol 5 (2) ◽  
pp. 143-167 ◽  
Author(s):  
Oliver Čulo

Translation can generally be seen as a task in which the meaning of the original should be preserved as far as possible. This paper formulates the preservation of meaning in terms of the primacy of the frame hypothesis: ideally, the frame of the original is matched by the frame of the translation. I investigate one factor overriding this principle in translations between English and German through the examination of two grammatical constructions, one in English, one in German, which are not commonly available in the other language. Picking a construction comparable in function in the target language leads to frame shifts. In addition to highlighting the interplay between construction and frame choice, the paper explores how frame-to-frame relations can be used to describe the semantic relatedness of original and translation in cases of frame divergences. Theoretical and methodological questions and implications of the cross-lingual application of frame relations are discussed at the end.


2018 ◽  
Vol 24 (5) ◽  
pp. 677-694 ◽  
Author(s):  
D. LANGLOIS ◽  
M. SAAD ◽  
K. SMAILI

Abstract: The objective of this article is to address the comparability of documents that are extracted from different sources and written in different languages. These documents are not necessarily translations of each other. This material is referred to as multilingual comparable corpora. Such language resources are useful for multilingual natural language processing applications, especially for low-resourced language pairs. In this paper, we collect data in Arabic, English, and French. Two corpora are built using the available hyperlinks for Wikipedia and Euronews. Euronews is an aligned multilingual (Arabic, English, and French) corpus of 34k documents collected from the Euronews website. A more challenging issue is to build a comparable corpus from two different and independent media with distinct editorial lines, such as the British Broadcasting Corporation (BBC) and Al Jazeera (JSC). To build such a corpus, we propose to use the Cross-Lingual Latent Semantic approach. For this purpose, documents were harvested from the BBC and JSC websites for each month of 2012 and 2013. Comparability is calculated for each Arabic–English pair of documents in each month. The output of this automatic task is then validated by hand. This led to a multilingual (Arabic–English) aligned corpus of 305 pairs of documents (233k English words and 137k Arabic words). In addition, this paper presents a study analyzing the performance of three methods from the literature for measuring the comparability of documents on the multilingual reference corpora. A recall at rank 1 of 50.16 per cent is achieved with the Cross-lingual LSI approach on the BBC–JSC test corpus, while the dictionary-based method reaches a recall of only 35.41 per cent.
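The recall-at-rank-1 figures quoted above can be read as follows: for each source document, the candidate with the highest comparability score is retrieved, and the metric is the fraction of source documents whose top candidate is the true counterpart. The sketch below assumes a precomputed score matrix in which the true counterpart of source document `i` is target document `i`; the function name and the toy scores are hypothetical and not taken from the paper.

```python
def recall_at_1(scores):
    """Fraction of source documents whose best-scoring target is the true pair.

    scores[i][j] is the comparability of source document i and target
    document j; the true counterpart of source i is assumed to be target i.
    """
    hits = 0
    for i, row in enumerate(scores):
        best = max(range(len(row)), key=lambda j: row[j])
        if best == i:
            hits += 1
    return hits / len(scores)

# Toy scores for three Arabic documents against three English documents.
toy_scores = [
    [0.9, 0.2, 0.1],  # source 0: top candidate is target 0 (correct)
    [0.7, 0.4, 0.3],  # source 1: top candidate is target 0 (wrong)
    [0.1, 0.2, 0.8],  # source 2: top candidate is target 2 (correct)
]
result = recall_at_1(toy_scores)  # 2 of 3 sources retrieved correctly
```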


Author(s):  
Louis-Philippe Huberdeau ◽  
Sébastien Paquet ◽  
Alain Désilets

Author(s):  
Aida Hakimova ◽  
Michael Charnine ◽  
Aleksey Klokov ◽  
Evgenii Sokolov

This paper develops a methodology for evaluating the semantic similarity of texts in different languages. The study is based on the hypothesis that the proximity of vector representations of terms in semantic space can be interpreted as semantic similarity in a cross-lingual environment. Each text is associated with a vector in a single multilingual semantic vector space, and the semantic similarity of two texts is determined by the proximity of the corresponding vectors. We propose a quantitative indicator called the Index of Semantic Textual Similarity (ISTS), which measures the degree of semantic similarity of multilingual texts on the basis of identified cross-lingual implicit semantic links. The parameters are set based on the correlation with the presence of a formal reference between documents. The measure of semantic similarity expresses the existence of two common terms, phrases or word combinations. Optimal parameters of the algorithm for identifying implicit links are selected on a thematic collection by maximizing the correlation between explicit and implicit connections. The developed algorithm can facilitate the search for closely related documents in the analysis of multilingual patent documentation.
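The underlying hypothesis, that texts mapped into one multilingual vector space can be compared by vector proximity, can be sketched as below. This is not the ISTS indicator, whose definition the abstract does not give. The toy term vectors, the English/Russian language pair, the averaging of term vectors into a text vector, and the use of cosine similarity as the proximity measure are all assumptions made for illustration.

```python
import math

# Hypothetical term vectors in a shared multilingual space (assumed values).
TERM_VECTORS = {
    "patent":  [0.90, 0.10, 0.00],
    "патент":  [0.85, 0.15, 0.00],
    "search":  [0.10, 0.90, 0.00],
    "поиск":   [0.15, 0.85, 0.00],
    "weather": [0.00, 0.10, 0.90],
}

def text_vector(terms):
    """Represent a text as the average of its known term vectors."""
    known = [TERM_VECTORS[t] for t in terms if t in TERM_VECTORS]
    if not known:
        raise ValueError("text contains no known terms")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity, used here as the proximity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

english = text_vector(["patent", "search"])
russian = text_vector(["патент", "поиск"])
unrelated = text_vector(["weather"])

related_score = cosine(english, russian)      # close in the shared space
unrelated_score = cosine(english, unrelated)  # far apart in the shared space
```

With these toy vectors, the English and Russian texts about patent search score much higher than the pairing with the unrelated text, which is the behaviour the hypothesis predicts.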

