Construindo corpora bilíngues quimbundo-português-quimbundo / Building Kimbundu-Portuguese-Kimbundu bilingual corpora

2021 ◽  
Vol 29 (2) ◽  
pp. 771
Author(s):  
Paulo Jeferson Pilar Araújo
2014 ◽  
Vol 4 (2) ◽  
pp. 53-65 ◽  
Author(s):  
Kristina HMELJAK SANGAWA

Learning vocabulary is one of the most challenging tasks faced by learners with a non-kanji background when learning Japanese as a foreign language. However, learners are often not aware of the range of different aspects of word knowledge they need in order to successfully use Japanese. This includes not only the spoken and written form of a word and its meaning, but also morphological, grammatical, collocational, connotative and pragmatic knowledge as well as knowledge of social constraints to be observed. In this article, we present some background data on the use of dictionaries among students of Japanese at the University of Ljubljana, a selection of resources and a series of exercises developed with the following aims: a) to foster greater awareness of the different aspects of Japanese vocabulary, both from a monolingual and a contrastive perspective, b) to learn about tools and methods that can be applied in different contexts of language learning and language use, and c) to develop strategies for learning new vocabulary, reinforcing knowledge about known vocabulary, and effectively using this knowledge in receptive and productive language tasks.


2016 ◽  
Author(s):  
Long Duong ◽  
Hiroshi Kanayama ◽  
Tengfei Ma ◽  
Steven Bird ◽  
Trevor Cohn

2011 ◽  
Vol 4 (2) ◽  
pp. 153-183 ◽  
Author(s):  
Diana Carter ◽  
Peredur Davies ◽  
Margaret Deuchar ◽  
María del Carmen Parafita Couto

Abstract In this paper we compare the code-switching (CS) patterns in three bilingual corpora collected in Wales, Miami and Patagonia, Argentina. Using the Matrix Language Framework to do a clause-based analysis of a sample of data, we consider the impact of structural relationships and extra-linguistic factors on CS patterns. We find that the Matrix Language (ML) is uniform where the language pairs have contrasting word orders, as in Welsh-English (VSO-SVO) and Welsh-Spanish (VSO-SVO) but diverse where the word order is similar as in Spanish-English (SVO-SVO). We find that the diversity of the ML in Miami is related to the diversity of degrees of proficiency, ethnic identities, and social networks amongst members of that community, while the uniformity of the ML in Wales is related to the uniformity of these factors. This is not so clear in Patagonia, however, where there is little CS produced in conversation. We suggest that the members of the speech community use Spanish or Welsh mostly in a monolingual mode, depending on the interlocutor and the social situation.


Information ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 267 ◽  
Author(s):  
Bin Li ◽  
Jianmin Yao

Bilingual web pages are widely used to mine translations of unknown terms. This study focused on an effective solution for obtaining relevant web pages, extracting translations with correct lexical boundaries, and ranking the translation candidates. This research adopted co-occurrence information to obtain the subject terms and then expanded the source query with the translation of the subject terms to collect effective bilingual search engine snippets. Afterwards, valid candidates were extracted from small-sized, noisy bilingual corpora using an improved frequency change measurement that combines adjacent information. This research developed a method that considers surface patterns, frequency–distance, and phonetic features to select an appropriate translation. The experimental results revealed that the proposed method performed remarkably well for mining translations of unknown terms.


2017 ◽  
Vol 108 (1) ◽  
pp. 283-294 ◽  
Author(s):  
Álvaro Peris ◽  
Mara Chinea-Ríos ◽  
Francisco Casacuberta

Abstract Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of the domain adaptation field, aimed to extract those sentences from an out-of-domain corpus that are the most useful to translate a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, able to deal with monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality, compared to a state-of-the-art method (cross-entropy), requiring substantially less data. Moreover, the results obtained are coherent across different language pairs, demonstrating the robustness of our proposal.


2018 ◽  
Vol 18 (1) ◽  
pp. 100-119
Author(s):  
Barbara E. Bullock ◽  
Jacqueline Serigos ◽  
Almeida Jacqueline Toribio ◽  
Arthur Wendorf

Abstract This article describes efforts to collect, process, and automatically annotate a corpus of Spanish as spoken in Texas. It elaborates the protocols for the development of the corpus and the procedures for automatic annotation, illustrating the common pitfalls to language identification in bilingual corpora and potential methods for circumventing them. The benefits of a comparative corpus approach to contact varieties are illustrated by a case study of a putative verbal calque from the Spanish in Texas data. It is demonstrated that the relative frequency of the verb is much higher than in its source Mexican variety and that the verb selects different complements in Texas than it does in other varieties. The article concludes with a discussion of how computational tools might be fruitfully exploited to resolve long-standing debates about language variation in contact settings.


2020 ◽  
pp. 136700692095672
Author(s):  
Antje Endesfelder Quick ◽  
Dorota Gaskins ◽  
Oksana Bailleul ◽  
Maria Frick ◽  
Elina Palola

Objectives: This study investigates monolingual and code-mixed utterances in four bilingual children with different language combinations (German–English, English–Polish, Finnish–English, and French–Russian) in terms of utterance lengths (MLUs) and complexities offering a usage-based (UB) explanation based on cognitive mechanisms. Methodology: Utterances from four different child bilingual corpora were extracted and coded for individual monolingual languages and bilingual utterances. Data and analysis: 35,441 utterances produced between the ages of 2 and 4 were analyzed in terms of MLU and syntactic complexity. Findings/conclusions: Results showed that for all children monolingual MLUs and complexities reflect their input situations: the more input in one language, the longer and more complex those utterances were. However, in all four children code-mixed utterances were longer and more complex from the beginning of the recordings. Implications: This is the first study that systematically compares MLU scores and complexities of monolingual and bilingual utterances taking diverse language combinations into account and offering a UB explanation based on chunking and entrenchment processes as a new alternative for further research in bilingualism.

