An Arabic WordNet enrichment approach using machine translation and external linguistic resources

Author(s):  
Cilia Lachichi ◽  
Chahrazad Bendiaf ◽  
Lamia Berkani ◽  
Ahmed Guessoum


LEKSIKA ◽  
2019 ◽  
Vol 12 (2) ◽  
pp. 66
Author(s):  
Dewi Puspitasari

This article explores the use of digital storytelling (DST) and explicit teaching at a university in Indonesia, examining how students use DST to support the learning process. Grounded in multimodal theory, the term DST has been increasingly used by scholars to describe various forms of learning support that help students learn successfully in the classroom. Despite being widely used in educational contexts in many countries, DST has received scant attention from teachers, especially in ESP classes. This article specifically describes our experience of using DST as a learning aid with students aged 18 to 19. In this project, they individually collected photographs, chosen according to their interests and a specified theme, to form a multimodal text. In the process, they drew on two linguistic resources (Bahasa Indonesia and English) to help them understand the creation process. Support from machine translation and machine pronunciation software was employed during the creation of the DST project. The results show that DST helps students compose narrative writing by analyzing visual prompts. This suggests that DST is a promising support for the writing process, as students were engaged throughout.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11 ◽  
Author(s):  
Wenbin Xu ◽  
Chengbo Yin

With the continuous advancement of technology, the amount of information and knowledge disseminated on the Internet grows severalfold every day. At the same time, a large amount of bilingual data is produced in the real world. These data are undoubtedly a great asset for statistical machine translation research. Starting from sentence-pair quality screening, two corpus screening strategies are first proposed: one based on the sentence-pair length ratio and one based on word-alignment information. The innovation of these two methods is that no additional linguistic resources, such as bilingual dictionaries or syntactic analyzers, are needed as auxiliaries. No manual intervention is required; poor-quality sentence pairs can be filtered out automatically, and the methods can be applied to any language pair. Secondly, a domain-adaptation method based on a massive corpus is proposed. It uses a massive-corpus mechanism to carry out multidomain automatic model migration: each domain learns its intradomain model independently, while different domains share the same general model. Through the massive-corpus method, these models can be combined and adjusted to make model learning more accurate. Finally, the massive-corpus filtering and statistical machine translation adaptation method is verified on a cloud platform. Experiments show that both methods are effective and can improve the quality of statistical machine translation.
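The length-ratio screening idea described in this abstract can be sketched in a few lines. The following is a minimal illustration only; the threshold value and the toy corpus are assumptions for demonstration, not the paper's settings.

```python
# Minimal sketch of sentence-pair length-ratio filtering for a parallel
# corpus: pairs whose token-length ratio is implausibly large are
# discarded without any dictionaries or syntactic analysis.

def filter_by_length_ratio(pairs, max_ratio=2.0):
    """Keep only sentence pairs whose token-length ratio is plausible."""
    kept = []
    for src, tgt in pairs:
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if src_len == 0 or tgt_len == 0:
            continue  # discard pairs with an empty side
        ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
        if ratio <= max_ratio:
            kept.append((src, tgt))
    return kept

corpus = [
    ("the cat sat on the mat", "le chat était assis sur le tapis"),
    ("hello", "bonjour tout le monde et bienvenue dans ce long discours"),
]
print(filter_by_length_ratio(corpus))  # only the first pair survives
```

Because the heuristic uses only token counts, it needs no language-specific resources, which is precisely what makes it applicable to any language pair.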


2017 ◽  
Vol 68 (2) ◽  
pp. 169-178
Author(s):  
Leonid Iomdin

Abstract Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus.


2019 ◽  
Vol 28 (3) ◽  
pp. 479-492
Author(s):  
Mary Priya Sebastian ◽  
G. Santhosh Kumar

Abstract Machine translation (MT) from English to foreign languages is a fast-developing area of research, and various translation techniques are discussed in the literature. However, translation from English to Malayalam, a Dravidian language, is still at an early stage, and work in this field has not flourished to a great extent so far. The main reason for this shortcoming is the non-availability of linguistic resources and translation tools for the Malayalam language. An aligned parallel corpus is one such resource essential for a machine translation system. This paper focuses on a technique that enables the automatic construction of a verb-aligned parallel corpus by exploring the internal structure of the English and Malayalam languages, which in turn facilitates the task of machine translation from English to Malayalam.
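The notion of a verb-aligned corpus can be illustrated with a toy sketch. Everything here is an assumption for illustration: the pre-tagged input format, the coarse "V" tag, and the naive one-to-one pairing heuristic; the paper's actual technique exploits the internal structure of the two languages rather than simple positional pairing.

```python
# Illustrative sketch only: pairing verbs across a sentence pair when
# both sides already carry part-of-speech tags. Pairs the n-th verb of
# the source with the n-th verb of the target.

def align_verbs(src_tagged, tgt_tagged):
    """Return (source_verb, target_verb) pairs in order of occurrence."""
    src_verbs = [w for w, tag in src_tagged if tag == "V"]
    tgt_verbs = [w for w, tag in tgt_tagged if tag == "V"]
    return list(zip(src_verbs, tgt_verbs))

# Hypothetical tagged English and Malayalam (romanized) sentences.
src = [("she", "PRON"), ("reads", "V"), ("books", "N")]
tgt = [("aval", "PRON"), ("pusthakangal", "N"), ("vayikkunnu", "V")]
print(align_verbs(src, tgt))  # [('reads', 'vayikkunnu')]
```

Note how the positional pairing works even though Malayalam places the verb clause-finally, since only the relative order of verbs on each side matters.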


Author(s):  
Bilous O ◽  
Mishchenko A ◽  
Datska T ◽  
Ivanenko N ◽  
...  

How often students use IT resources is a key factor in the acquisition of skills associated with new technologies. Strategies aimed at increasing student autonomy need to be developed and should offer resources that encourage students to make use of computing tools during class hours. This paper analyzes modern linguistic technologies concerning the intelligent language processing necessary for creating and operating highly effective knowledge-management technologies. Computerization of the information sphere has triggered an extensive search for solutions to the problem of using natural-language mechanisms in automated systems of various types. One of these was the creation of controlled languages, based on a restricted set of features, which made machine translation more refined. Triggered by economic demand, these are not artificial languages like Esperanto, but natural languages simplified in vocabulary, grammatical structures, and syntax. More than ever, the tasks of modern computational linguistics include creating software for natural language processing, information retrieval in large data sets, and support for technical authors creating professional texts and for users of computer technology, hence creating new translation tools. Powerful linguistic resources such as text corpora, terminology databases, and ontologies may facilitate more efficient use of modern multilingual information technology. Creating and improving all the methods considered will help make the translator's job more efficient. One of the programs considered, CLAT, does not aim at producing machine translation, but allows technical editors to create flawless, consistent professional texts through integrated punctuation and spelling modules. Other programs under consideration are to be implemented in Ukrainian translation departments.
Moreover, the databases considered in the paper enable the study of the dynamics of the linguistic system and the development of areas of applied research such as terminography, terminology, and automated data processing. Effective cooperation among developers, translators, and research institutions in the creation of innovative linguistic technologies will promote the further development of translation studies and applied linguistics.


2016 ◽  
Vol 106 (1) ◽  
pp. 193-204
Author(s):  
Víctor M. Sánchez-Cartagena ◽  
Juan Antonio Pérez-Ortiz ◽  
Felipe Sánchez-Martínez

Abstract This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn will make rule-based machine translation a very appealing alternative for under-resourced language pairs because it avoids the need for human experts to handcraft transfer rules and requires, in contrast to statistical machine translation, a small amount of parallel corpora (a few hundred parallel sentences proved to be sufficient). The inference algorithm implemented by ruLearn has been recently published by the same authors in Computer Speech & Language (volume 32). It is able to produce rules whose translation quality is similar to that obtained by using hand-crafted rules. ruLearn generates rules that are ready for their use in the Apertium platform, although they can be easily adapted to other platforms. When the rules produced by ruLearn are used together with a hybridisation strategy for integrating linguistic resources from shallow-transfer rule-based machine translation into phrase-based statistical machine translation (published by the same authors in Journal of Artificial Intelligence Research, volume 55), they help to mitigate data sparseness. This paper also shows how to use ruLearn and describes its implementation.


2016 ◽  
Vol 55 ◽  
pp. 17-61 ◽  
Author(s):  
Víctor M. Sánchez-Cartagena ◽  
Juan Antonio Pérez-Ortiz ◽  
Felipe Sánchez-Martínez

We describe a hybridisation strategy whose objective is to integrate linguistic resources from shallow-transfer rule-based machine translation (RBMT) into phrase-based statistical machine translation (PBSMT). It basically consists of enriching the phrase table of a PBSMT system with bilingual phrase pairs matching transfer rules and dictionary entries from a shallow-transfer RBMT system. This new strategy takes advantage of how the linguistic resources are used by the RBMT system to segment the source-language sentences to be translated, and overcomes the limitations of existing hybrid approaches that treat RBMT systems as a black box. Experimental results confirm that our approach delivers translations of higher quality than existing ones, and that it is especially useful when the parallel corpus available for training the SMT system is small or when translating out-of-domain texts that are well covered by the RBMT dictionaries. A combination of this approach with a recently proposed unsupervised shallow-transfer rule inference algorithm results in significantly greater translation quality than that of a baseline PBSMT system; in this case, the only hand-crafted resources used are the dictionaries commonly used in RBMT. Moreover, the translation quality achieved by the hybrid system built with automatically inferred rules is similar to that obtained by systems built with hand-crafted rules.
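The core idea of enriching a PBSMT phrase table with RBMT dictionary entries can be sketched as follows. The Moses-style "src ||| tgt ||| score" line format is standard in phrase-based SMT, but the uniform score assigned to dictionary entries here is an assumption for illustration; the paper derives its scores from how the RBMT system segments the input.

```python
# Sketch: append RBMT bilingual-dictionary pairs to a PBSMT phrase
# table, skipping pairs the table already contains.

def enrich_phrase_table(table_lines, dictionary, default_score=0.5):
    """Return the phrase table extended with unseen dictionary pairs."""
    known = set()
    for line in table_lines:
        src, tgt, _ = (field.strip() for field in line.split("|||"))
        known.add((src, tgt))
    enriched = list(table_lines)
    for src, tgt in dictionary:
        if (src, tgt) not in known:
            enriched.append(f"{src} ||| {tgt} ||| {default_score}")
    return enriched

table = ["house ||| casa ||| 0.8"]          # existing PBSMT phrase table
rbmt_dict = [("house", "casa"), ("dog", "perro")]  # RBMT dictionary
print(enrich_phrase_table(table, rbmt_dict))
```

The mitigation of data sparseness follows directly: dictionary pairs absent from the parallel training data still become available to the decoder as phrase-table entries.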


2019 ◽  
Vol 28 (3) ◽  
pp. 465-477 ◽  
Author(s):  
Amarnath Pathak ◽  
Partha Pakray

Abstract Machine translation bridges communication barriers and eases interaction among people with different linguistic backgrounds. Machine translation mechanisms exploit a range of techniques and linguistic resources for translation prediction. Neural machine translation (NMT), in particular, seeks optimal translations by training a neural network on a parallel corpus containing a considerable number of instances in the form of parallel source and target sentences. The ready availability of parallel corpora for major Indian languages and the ability of NMT systems to better analyze context and produce fluent translations make NMT a prominent choice for translating Indian languages. We have trained, tested, and analyzed NMT systems for English-to-Tamil, English-to-Hindi, and English-to-Punjabi translation. Predicted translations have been evaluated using the Bilingual Evaluation Understudy (BLEU) metric and by human evaluators to assess translation quality in terms of adequacy, fluency, and correspondence with human-predicted translations.
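BLEU's core mechanism, clipped n-gram precision combined with a brevity penalty, can be shown in a self-contained toy. This is an illustration of the idea only, not a replacement for the standard implementations used in evaluations like the one above; real BLEU uses up to 4-grams, multiple references, and smoothing.

```python
import math
from collections import Counter

# Toy sentence-level BLEU: geometric mean of clipped n-gram precisions
# (here up to bigrams) times a brevity penalty for short candidates.

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def toy_bleu(candidate, reference, max_n=2):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(clipped, 1e-9) / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(toy_bleu("the cat sat on the mat", "the cat sat on the mat"), 2))  # 1.0
```

An exact match scores 1.0, while a short or divergent candidate is pushed toward 0 by the clipped precisions and the brevity penalty.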

