Utilizing Orthographic Similarity for Unsupervised Transliteration

Author(s):  
Anoop Kunchukuttan ◽  
Pushpak Bhattacharyya
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Candice Frances ◽  
Eugenia Navarra-Barindelli ◽  
Clara D. Martin

Abstract: Language perception studies on bilinguals often show that words that share form and meaning across languages (cognates) are easier to process than words that share only meaning. This facilitatory phenomenon is known as the cognate effect. Most previous studies have shown this effect visually, whereas the auditory modality, as well as the interplay between type of similarity and modality, remains largely unexplored. In this study, highly proficient late Spanish–English bilinguals carried out a lexical decision task in their second language, both visually and auditorily. Words had high or low phonological and orthographic similarity, fully crossed. We also included orthographically identical words (perfect cognates). Our results suggest that similarity in the same modality (i.e., orthographic similarity in the visual modality and phonological similarity in the auditory modality) leads to improved signal detection, whereas similarity across modalities hinders it. We provide support for the idea that perfect cognates are a special category within cognates. Results suggest a need for a conceptual and practical separation between types of similarity in cognate studies. The theoretical implication is that the representations of items are active in both modalities of the non-target language during language processing, which needs to be incorporated into our current processing models.


2016 ◽  
Vol 22 (4) ◽  
pp. 517-548 ◽  
Author(s):  
ANN IRVINE ◽  
CHRIS CALLISON-BURCH

Abstract: We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present a detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can be used to combine various signals of translation equivalence (such as contextual similarity, temporal similarity, orthographic similarity and topic similarity). Our discriminative model produces higher-accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features in a phrase-based SMT system. These monolingually estimated features enhance low-resource SMT systems in addition to allowing end-to-end machine translation without parallel corpora.
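As a rough illustration of what an orthographic similarity signal can look like, the sketch below scores two word forms by normalized edit distance. The abstract does not say which measure the authors used, so the choice of Levenshtein distance and the function names `levenshtein` and `orthographic_similarity` are assumptions made for this example only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def orthographic_similarity(a: str, b: str) -> float:
    """Illustrative score in [0, 1]; 1.0 means identical spelling.
    Normalizing by the longer word is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

In a lexicon induction setting, a score like this would be one feature among several (contextual, temporal, topic) passed to the discriminative model, not a translation decision on its own.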


2019 ◽  
Author(s):  
Stephen Skalicky ◽  
Scott Crossley ◽  
Cynthia M. Berger

In this study we analyze a large database of lexical decision times for English content words made by speakers of English as an additional language residing in the United States. Our first goal was to test whether statistical measures better able to model variation associated with participants and items would replicate the findings of a previous analysis of these data (Berger, Crossley, & Skalicky, 2019). Our second goal was to determine whether variables related to experiences using and learning English would interact with linguistic features of the target words. Results from our statistical analysis suggest affirmative answers to both of these questions. First, our results included significant effects for linguistic features related to contextual diversity and contextual distinctiveness, replicating the original study's finding that words appearing in more textual and lexical contexts were responded to more quickly. Second, a measure of length of English learning and a measure of daily English use interacted with a measure of orthographic similarity. Our study provides further evidence regarding how a large, crowdsourced database can be used to obtain a better understanding of second language lexical recognition behavior and provides suggestions for further research.


Author(s):  
Muhlise Coşgun Ögeyik

Marked and unmarked language forms can be distinguished by the level of simplicity or complexity the forms denote. Unmarked target language forms may create little or no difficulty, even if they do not exist in the native language of the learner, while marked forms can be relatively difficult for language learners. In addition to the notions of markedness and unmarkedness, there has also been an emphasis on similarity and dissimilarity between items of the first (L1) and second (L2) languages. Depending on the similarity or dissimilarity of L1 and L2 forms, the level of difficulty may vary enormously across language-specific procedures. This chapter therefore aims to build an understanding of the recognized pronunciation and orthographic problems of similar loanwords in both Turkish (L1 of the participants) and English (L2).


2008 ◽  
Vol 20 (3) ◽  
pp. 406-420 ◽  
Author(s):  
Atira Bick ◽  
Gadi Goelman ◽  
Ram Frost

Is morphology a discrete and independent element of lexical structure or does it simply reflect a fine tuning of the system to the statistical correlation that exists among the orthographic and semantic properties of words? Imaging studies in English failed to show unequivocal morphological activation that is distinct from semantic or orthographic activation. Cognitive research in Hebrew has revealed that morphological decomposition is an important component of print processing. In Hebrew, morphological relatedness does not necessarily induce a clear semantic relatedness; thus, Hebrew provides a unique opportunity to investigate the neural substrates of morphological processing. In this functional magnetic resonance imaging study, participants were required to perform judgment tasks of morphological relatedness, semantic relatedness, rhyming, and orthographic similarity. Half of the morphologically related words were semantically related and half were semantically unrelated. This design was chosen to induce explicit morphological processing. We identified two locations involved in morphological processing: the left middle frontal gyrus and the left inferior parietal sulcus. Comparing the locations of morphology-related activation to the locations of semantics- and orthography-related activation, we found that the areas neighbored but only partially overlapped. The similarity in activation between the two morphological conditions eliminates the possibility that morphological activation simply results from the semantic properties of the words. These results demonstrate the important role of morphological processing in reading and suggest that morphological analysis is a distinct process of visual word recognition.


2014 ◽  
Vol 17 ◽  
Author(s):  
Joana Acha ◽  
Itziar Laka ◽  
Josu Landa ◽  
Pello Salaburu

Abstract: This article presents EHME, the frequency dictionary of Basque structure, an online program that enables researchers in psycholinguistics to extract word and nonword stimuli, based on a broad range of statistics concerning the properties of Basque words. The database consists of 22.7 million tokens, and the properties available include morphological structure frequency and word-similarity measures, in addition to the classical indexes: word frequency, orthographic structure, orthographic similarity, bigram and biphone frequency, and syllable-based measures. Measures are indexed at the lemma, morpheme and word level. We include reliability and validation analyses. The application is freely available, and enables the user to extract words based on concrete statistical criteria, as well as to obtain statistical characteristics from a list of words.
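To make one of the classical indexes concrete, the sketch below counts letter bigrams over a word list and averages those counts for a target word. It is only an illustration: the abstract does not describe how EHME computes bigram frequency (for example, whether counts are type-based or token-based, or position-specific), and the names `bigram_counts` and `mean_bigram_frequency` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def letter_bigrams(word: str) -> list[str]:
    """Adjacent letter pairs of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def bigram_counts(words: Iterable[str]) -> Counter:
    """Count each letter bigram across a word list.
    Every listed word counts once (a type-based count, by assumption)."""
    counts: Counter = Counter()
    for word in words:
        counts.update(letter_bigrams(word))
    return counts


def mean_bigram_frequency(word: str, counts: Counter) -> float:
    """Average corpus count of the word's bigrams; 0.0 for words
    shorter than two letters."""
    bigrams = letter_bigrams(word)
    if not bigrams:
        return 0.0
    return sum(counts[b] for b in bigrams) / len(bigrams)
```

With a real corpus, the word list would be replaced by the database's lexicon, and token-based counts would weight each word by its corpus frequency.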

