Using ParaConc to extract bilingual terminology from parallel corpora: A case of English and Ndebele

Literator ◽  
2016 ◽  
Vol 37 (1) ◽  
Author(s):  
Ketiwe Ndhlovu

The development of African languages into languages of science and technology depends on action being taken to promote the use of these languages in specialised fields such as technology, commerce, administration, media, law, science and education, among others. One possible way of developing African languages is the compilation of specialised dictionaries (Chabata 2013). This article explores how parallel corpora can be interrogated using a bilingual concordancer (ParaConc) to extract bilingual terminology that can be used to create specialised bilingual dictionaries. An English–Ndebele Parallel Corpus was used as a resource and, through ParaConc, an alphabetic list was compiled from which headwords and possible translations were sought. These translations provided possible terms for entry in a bilingual dictionary. The frequency feature and ‘hot words’ tool in ParaConc were used to determine the suitability of terms for inclusion in the dictionary and to identify possible synonyms, respectively. Since parallel corpora are aligned and data are presented in context (Key Word in Context), it was possible to draw examples showing how headwords are used. This approach produced results quickly and accurately, whilst minimising the manual translation of terms. The quality of the dictionary depends on the quality of the corpus, so the need for a representative and clean corpus must be emphasised. Although technology has multiple benefits in dictionary making, the research underscores the importance of collaboration between lexicographers, translators, subject experts and target communities so that representative dictionaries are created.
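The workflow described in this abstract (an alphabetic frequency list, then Key Word in Context lookups that carry the aligned translation along) can be illustrated with a minimal Python sketch. This is not ParaConc itself; the `ALIGNED` data, the function names and the whitespace tokenisation are all assumptions made for illustration, and the target-side strings are placeholders rather than real Ndebele.

```python
from collections import Counter

# Hypothetical aligned segments: (source, target). The target strings are
# placeholders for illustration only, not real Ndebele text.
ALIGNED = [
    ("the court heard the appeal", "<target segment 1>"),
    ("the appeal was dismissed", "<target segment 2>"),
    ("the court adjourned", "<target segment 3>"),
]


def frequency_list(pairs):
    """Count source-side word forms and return them in alphabetical order."""
    counts = Counter(word for src, _ in pairs for word in src.lower().split())
    return sorted(counts.items())


def kwic(pairs, keyword, width=2):
    """Return (left context, keyword, right context, aligned target) hits."""
    hits = []
    for src, tgt in pairs:
        tokens = src.lower().split()
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, tok, right, tgt))
    return hits


if __name__ == "__main__":
    for word, count in frequency_list(ALIGNED):
        print(f"{word}\t{count}")
    for left, key, right, target in kwic(ALIGNED, "appeal"):
        print(f"{left:>12} [{key}] {right:<15} || {target}")
```

Each hit keeps the aligned target segment next to the source context, which is what lets a lexicographer read off candidate equivalents and usage examples for a headword.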

2016 ◽  
Vol 36 (1) ◽  
pp. 147
Author(s):  
Beatriz Sánchez Cárdenas ◽  
Pamela Faber

http://dx.doi.org/10.5007/2175-7968.2016v36nesp1p147

Research in terminology has traditionally focused on nouns. Considerably less attention has been paid to other grammatical categories such as adverbs. However, these words can also be problematic for the novice translator, who tends to use the translation correspondences in bilingual dictionaries without realizing that formal equivalence is not necessarily the same as textual equivalence. Semantic values acquired in context go far beyond dictionary meaning and are related to phenomena such as semantic prosody and preferences of lexical selection that can vary depending on text type and specialized domain. This research explored the reasons why certain adverbial discourse connectors, apparently easy to translate, are a source of translation problems that cannot be easily resolved with a bilingual dictionary. Moreover, this study analyzed the use of parallel corpora in the translation classroom and how it can increase the quality of text production. For this purpose, we compared student translations before and after receiving training on the use of corpus analysis tools.


Literator ◽  
2021 ◽  
Vol 42 (1) ◽  
Author(s):  
Nomsa J. Skosana ◽  
Respect Mlambo

The scarcity of adequate resources for South African languages poses a huge challenge for their functional development in specialised fields such as science and technology. The study examines the Autshumato Machine Translation (MT) Web Service, created by the Centre for Text Technology at the North-West University. This software supports both formal and informal translations as a machine-aided human translation tool. We investigate the system in terms of its advantages and limitations and suggest possible solutions for South African languages. The results show that the system is essential as it offers high-speed translation and operates as an open-source platform. It also provides multiple translations from sentences, documents and web pages. Some South African languages were included whilst others were excluded and we find this to be a limitation of the system. We also find that the system was trained with a limited amount of data, and this has an adverse effect on the quality of the output. The study suggests that adding specialised parallel corpora from various contemporary fields for all official languages and involving language experts in the pre-editing of training data can be a major step towards improving the quality of the system’s output. The study also outlines that developers should consider integrating the system with other natural language processing applications. Finally, the initiatives discussed in this study will help to improve this MT system to be a more effective translation tool for all the official languages of South Africa.


2012 ◽  
Vol 43 ◽  
pp. 135-171 ◽  
Author(s):  
T. Flati ◽  
R. Navigli

Bilingual machine-readable dictionaries are knowledge resources useful in many automatic tasks. However, compared to monolingual computational lexicons like WordNet, bilingual dictionaries typically provide a lower amount of structured information, such as lexical and semantic relations, and often do not cover the entire range of possible translations for a word of interest. In this paper we present Cycles and Quasi-Cycles (CQC), a novel algorithm for the automated disambiguation of ambiguous translations in the lexical entries of a bilingual machine-readable dictionary. The dictionary is represented as a graph, and cyclic patterns are sought in the graph to assign an appropriate sense tag to each translation in a lexical entry. Further, we use the algorithm's output to improve the quality of the dictionary itself, by suggesting accurate solutions to structural problems such as misalignments, partial alignments and missing entries. Finally, we successfully apply CQC to the task of synonym extraction.
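To make the graph view concrete, the sketch below treats a dictionary as a directed graph and looks for short cycles that lead from a headword through a translation and back again. It is a simplified illustration of the cycle idea only, not the CQC algorithm itself (it ignores quasi-cycles and sense tagging); the toy entries, the `lang:word` node naming and the `max_len` bound are assumptions.

```python
# Toy bilingual dictionary as a directed graph: headword -> translations.
# Entries and the "lang:word" naming scheme are illustrative assumptions.
TOY_DICT = {
    "en:plane": ["it:aereo", "it:piano"],
    "it:aereo": ["en:plane", "en:aircraft"],
    "en:aircraft": ["it:aereo"],
    "it:piano": ["en:floor"],
    "en:floor": ["it:pavimento"],
}


def cycles_through(graph, start, max_len=4):
    """Enumerate simple cycles start -> ... -> start with at most max_len edges."""
    found = []

    def dfs(node, path):
        for nxt in graph.get(node, ()):
            if nxt == start and len(path) >= 2:
                # Closing edge back to the headword completes a cycle.
                found.append(path + [start])
            elif nxt not in path and len(path) < max_len:
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    return found


def supported_translations(graph, headword, max_len=4):
    """Translations of headword that lie on at least one cycle back to it."""
    return {cycle[1] for cycle in cycles_through(graph, headword, max_len)}


if __name__ == "__main__":
    print(cycles_through(TOY_DICT, "en:plane"))
    print(supported_translations(TOY_DICT, "en:plane"))
```

In this toy graph only `it:aereo` is reachable on a path that returns to `en:plane`, so it is the only translation with cyclic support; a translation with no return path is a candidate for a missing or misaligned entry.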


2020 ◽  
Vol 17 (1) ◽  
pp. 54-60
Author(s):  
B. S. Sowmya Lakshmi ◽  
B. R. Shambhavi

Visvesvaraya Technological University, Belagavi, Karnataka, India

Parallel corpora are considered one of the most promising resources for extracting dictionaries. Most substantial work is based on parallel corpora, but building a parallel corpus is a challenging task for resource-scarce language pairs. To overcome this issue, researchers have found that comparable corpora can serve as an alternative source for dictionary extraction. The proposed approach extracts a dictionary for the low-resource language pair English and Kannada using comparable corpora obtained from Wikipedia dumps and a corpus received from the Indian Language Corpus Initiative (ILCI). The constructed dictionary comprises both translation and transliteration entities with term-level associations from English to Kannada. The resulting dictionary contains 77545 tokens and has a precision score of 0.79. The proposed work is language-independent and could be extended to other language pairs.


2021 ◽  
pp. 016555152199275
Author(s):  
Juryong Cheon ◽  
Youngjoong Ko

Translation language resources, such as bilingual word lists and parallel corpora, are important factors affecting the effectiveness of cross-language information retrieval (CLIR) systems. When large domain-appropriate parallel corpora are not available, developing an effective CLIR system is particularly difficult. Furthermore, creating a large parallel corpus is costly and requires considerable effort. Therefore, we here demonstrate the construction of parallel corpora from Wikipedia as well as improved query translation, wherein the queries are used for a CLIR system. To do so, we first constructed a bilingual dictionary, termed WikiDic. Then, we evaluated individual language resources and combinations of them in terms of their ability to extract parallel sentences; the combination of our proposed WikiDic with the translation probability from the Web’s bilingual example sentence pairs and WikiDic was found to be best suited to parallel sentence extraction. Finally, to evaluate the parallel corpus generated from this best combination of language resources, we compared its performance in query translation for CLIR to that of a manually created English–Korean parallel corpus. The corpus generated by our proposed method performed better than the manually created corpus, demonstrating the effectiveness of the proposed method for automatic parallel corpus extraction. The method demonstrated herein can be used to inform the construction of other parallel corpora from readily available language resources, and the parallel sentence extraction method will naturally improve as Wikipedia continues to be used and its content develops.


Linguistica ◽  
1980 ◽  
Vol 20 (1) ◽  
pp. 183-218
Author(s):  
Otto Hietsch

A Critical Look at Two German-English Examples, and A Glossary. Officers, without a word of German, were billeted on families, and the town swarmed with G.I.s. Lucia, whose English was always considered so good, had great difficulty in understanding what they said. She had a bewildered feeling of not being able - in the language sense - to 'hear' the phrases used. 'It beats the crap outa me,' she heard one say. She could not find the key-word in her English-German dictionary. Nor many other words they used. Ethel Mannin, Bavarian Story (London: Arrow Books, 1964), pp. 143f. (abridged). Lucia's plight in 1945, and that of untold other non-native speakers before and after, is a common one. In the three decades and a half since then, some very good bilingual dictionaries in the pocket-size, desk and encyclopaedic ranges have been published. Yet, in spite of the praises that have been sung about such publications, most of them fail to do justice both to the richness of the spoken language on either side, and to the many ways and means by which that richness can, and should, be matched level for level. Such a discovery is as inevitable as it is disconcerting. These general dictionaries, both in what they offer and in what they withhold, are, all in all, a sadly distorted reflection of living speech: far too frequently their renderings merely approximate to the usage of native speakers.


2019 ◽  
Vol 28 (3) ◽  
pp. 465-477 ◽  
Author(s):  
Amarnath Pathak ◽  
Partha Pakray

Machine Translation bridges communication barriers and eases interaction among people having different linguistic backgrounds. Machine Translation mechanisms exploit a range of techniques and linguistic resources for translation prediction. Neural machine translation (NMT), in particular, seeks optimality in translation by training a neural network on a parallel corpus containing a considerable number of instances in the form of parallel source and target sentences. Easy availability of parallel corpora for major Indian language forms and the ability of NMT systems to better analyze context and produce fluent translation make NMT a prominent choice for the translation of Indian languages. We have trained, tested, and analyzed NMT systems for English to Tamil, English to Hindi, and English to Punjabi translations. Predicted translations have been evaluated using Bilingual Evaluation Understudy and by human evaluators to assess the quality of translation in terms of its adequacy, fluency, and correspondence with human-predicted translation.
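As a rough illustration of the automatic metric mentioned here, the sketch below computes a simplified sentence-level Bilingual Evaluation Understudy (BLEU) score: clipped n-gram precision combined with a brevity penalty. It assumes a single reference, whitespace-tokenised input, n-grams up to length 2 and no smoothing; it is not the evaluation setup used in the study, and the function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU (no smoothing; max_n is an assumption)."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(bleu("the cat sat on the mat".split(), ref))
    print(bleu("the cat sat".split(), ref))
```

The clipping step is what stops a candidate from scoring well by repeating a common reference word, and the brevity penalty lowers the score of candidates that are shorter than the reference.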


Author(s):  
М.А. Дударенко

A multilingual probabilistic topic model is proposed that simultaneously takes into account a bilingual dictionary and the links between documents of a parallel or comparable collection. Additive regularization of topic models (ARTM) is used to combine these two kinds of information. Two ways of using the bilingual dictionary are proposed: the first takes into account only the fact that two words are linked as translations, whereas the second learns the translation probabilities within each topic. The quality of the multilingual models is measured on a cross-language search task, in which the query is a document in one language and the search is performed among documents in another language. It is shown that jointly accounting for word translations from the bilingual dictionary and for linked documents improves cross-language search quality compared with models that use only one type of information. A comparison of different methods of incorporating bilingual dictionaries into the model shows that estimating translation probabilities not only improves the quality of the model but also makes it possible to determine a topical context (a set of topics) for each word-translation pair in which the translation is appropriate.


Author(s):  
Martina Nied Curcio

Misunderstandings between speakers of different languages occur not only on a linguistic level but also on a cultural one. Consultation of a bilingual dictionary does not necessarily help in this case, as information on the cultural level is often missing. In this paper we will discuss how bilingual dictionaries can draw attention to cultural divergences so that the dictionary user acquires cultural knowledge and is able to build an intercultural competence. Examples from four bilingual dictionaries (German-Italian) are given to illustrate how culture-bound words are represented. For this purpose, a classification of culture-bound words is offered. Finally, the prerequisites and possibilities of an appropriate representation of culture-bound items in bilingual dictionaries will be proposed.

