“There are many ways to translate it”

2010 ◽  
Vol 10 (1) ◽  
pp. 29-53 ◽  
Author(s):  
May L-Y Wong

The study is motivated by Mona Baker’s (1992) observation that it is almost impossible to find a grammatical category which can be expressed uniformly and regularly across languages. The aim of the present study is to verify Baker’s claim by investigating existential sentences from an English-Chinese contrastive perspective. The data was taken from the Babel English-Chinese Parallel Corpus, which is part-of-speech tagged and aligned at sentence level. Variation in the verbs used in English and Chinese existential clauses is discussed, and patterns of notional subjects (i.e. the noun phrase following the existential verb) and how they are translated are considered. The paper also looks into the applicability of Halliday’s theme-rheme approach to studying Chinese existentials and proposes that the topic-prominence analysis offers a more cogent account for the findings reported here.

Author(s):  
Necva Bölücü ◽  
Burcu Can

Part-of-speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing: it assigns a syntactic category (such as noun, verb, or adjective) to each word within a given sentence or context. These categories can be used to further analyze sentence-level syntax (e.g., dependency parsing) and thereby extract the meaning of the sentence (e.g., semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting, without any annotated corpora. Log-linear models are among the most widely used methods for the tagging problem, and the initialization of their parameters is crucial for inference. Different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize its parameters in a cascaded framework. We thus transfer knowledge between two different unsupervised models to improve PoS tagging results, with the log-linear model benefiting from the Bayesian model’s expertise. We present results for Turkish as a morphologically rich language and for English as a comparatively morphologically poor language, both in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.


enadakultura ◽  
2021 ◽  
Author(s):  
Tamar Makharoblidze

The question of derivatives has been raised repeatedly in the teaching of language grammar and general linguistics, and this circumstance prompted this short article. It is well known that a word-form can be changeable or unchangeable, and that this is determined by the part of speech. Form-changing words can undergo two types of change: inflectional and derivative. In inflectional change, the form of the word changes, but its lexical and semantic content does not. A classic example of this type of change is the flexion of nouns. Derivation is the formation of a word from another word by the addition of non-inflectional affixes. Derivation can be of two types. The first is lexical derivation, in which the derivative affix produces a word with a different lexical content. The resulting word-form can belong to another part of speech, or to the same part of speech with a different lexical content. The second type is grammatical derivation, in which grammatical categories are produced. A grammatical category in general (and a word-form in general as well) unites morphological and semantic aspects. There is no separate semantics without morphology. Any semantic category or content must be conveyed in a specific form, so only a specific form has a specific morphosemantics, which can be produced by grammatical derivatives. The main difference between the two types of derivation mentioned above (and therefore between the two types of derivatives) lies in the levels of the language hierarchy at which they operate. Affixes of the first type work at the lexical level of the language, while derivatives of the second type produce forms at the morphological and semantic levels. Derivatives of the second type are inter-level affixes, because they act on two hierarchical levels. Any grammatical category includes specific morphosemantic oppositional forms.
Thus, unlike inflectional affixes, all other morphological affixes are inter-level derivatives of one type or another. It should be noted that the preverb in Kartvelian languages is the only linguistic unit with all possible functions of an affix.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Michael Adjeisah ◽  
Guohua Liu ◽  
Douglas Omwenga Nyabuga ◽  
Richard Nuetey Nortey ◽  
Jinling Song

Scaling natural language processing (NLP) to low-resource languages in order to improve machine translation (MT) performance remains an open problem. This research contributes to the field with a study of low-resource English-Twi translation based on filtered synthetic parallel corpora. It is often hard to tell what a good-quality corpus looks like in low-resource conditions, especially where the target corpus is the only sample text of the parallel language. To improve MT performance for such low-resource language pairs, we propose to expand the training data by injecting a synthetic parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we performed unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we make extensive use of three different sentence-level similarity metrics after round-trip translation. Experimental results on varying amounts of available parallel data demonstrate that injecting a pseudoparallel corpus and filtering extensively with sentence-level similarity metrics significantly improves the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach shows substantial improvements in BLEU and TER scores.
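The filtering idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which features are measured or how the threshold is chosen, so everything concrete here is an assumption made for illustration: the two features per sentence pair (a length ratio and a similarity score), the function names, the reference set of trusted pairs, and the threshold of 6.0. It is not the authors' implementation.

```python
from statistics import mean


def fit_center_and_cov(points):
    """Return the mean and sample covariance of 2-D feature vectors.

    The covariance is returned as ((sxx, sxy), (sxy, syy)).
    """
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def squared_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance of a 2-D point from the center."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # The inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det.
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def keep_close_pairs(candidates, reference, threshold):
    """Indices of candidate pairs whose features lie close to the reference set."""
    center, cov = fit_center_and_cov(reference)
    return [
        i
        for i, features in enumerate(candidates)
        if squared_mahalanobis(features, center, cov) <= threshold
    ]


# Hypothetical (length ratio, similarity score) features.
reference = [(0.9, 0.80), (1.1, 0.80), (0.9, 0.90), (1.1, 0.90)]
candidates = [(1.0, 0.85), (1.05, 0.88), (3.0, 0.20)]
kept = keep_close_pairs(candidates, reference, threshold=6.0)
print(kept)
```

In this toy data the third candidate lies far from the reference pairs in both features, so it is the one that gets filtered out.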


2020 ◽  
Vol 51 (1) ◽  
pp. 161-174
Author(s):  
Suzan Alamin

This study provides a detailed description of word order types, agreement patterns and alternations found in Tagoi, a Kordofanian language traditionally spoken in South Kordofan. After a brief presentation of the language (section 1), the noun class system is introduced (section 2) and the word order and agreement patterns are examined at the noun phrase level (section 3). Section 4 gives information about the constituent order at clause and sentence level, while section 5 summarizes the findings and concludes the paper. All in all, the paper aims to contribute to a better understanding of the grammar, structure and typological features of Tagoi.


2003 ◽  
Vol 9 (2) ◽  
pp. 209-237 ◽  
Author(s):  
Jan Rijkhoff

Research conducted within the wider theoretical framework of Dik’s Functional Grammar has resulted in important contributions to linguistic typology, and, vice versa, empirical facts from a wide variety of languages have significantly improved the theory of Functional Grammar, especially regarding its typological adequacy. This article discusses the following contributions to Linguistic Typology: the development of a sound sampling methodology, classification of noun categories (Seinsarten), an account of (so-called) number discord, the introduction of the new grammatical category of ‘nominal aspect’, a new typology of classifiers, and a universal concerning the occurrence of adjectives as a distinct word class. Conversely, it will be shown that facts from many different languages have played an important role in the development of a layered model of the noun phrase in Functional Grammar, and how these facts are currently used to test hypotheses concerning parallels between NPs and clauses.


2019 ◽  
Vol 32 (4) ◽  
pp. 432-457
Author(s):  
Anna Dziemianko

The current study tests empirically whether linguistically homogeneous or heterogeneous signposts better serve dictionary users. It aims to determine which signposts, homogeneous or heterogeneous, are more beneficial to sense identification, language reception, and production as well as immediate and delayed retention of meaning. The paper also investigates whether the usefulness of the type of signposting is dependent on the grammatical category of headwords. The results indicate that entries with heterogeneous signposts are significantly more useful for sense identification and reception. In production, the results obtained after reference to entries with homogeneous and heterogeneous signposts were comparable. Immediate and delayed retention was significantly better when the subjects had consulted entries with homogeneous signposts. The influence of signpost type on the scores for any task was not dependent on the part of speech.


2019 ◽  
Vol 24 (2) ◽  
pp. 266-288 ◽  
Author(s):  
Abdelkader Hermas

This study investigates the acquisition of genericity in advanced third language (L3) English. The learners are first language (L1) Moroccan Arabic–second language (L2) French adults. They completed an acceptability judgment task testing the interpretation of five count nominal types in noun phrase (NP)-level and sentence-level genericity: definite, indefinite and bare singulars, definite and bare plurals. The study defines the generic or non-generic status of every NP form in the learners’ L3 interlanguage. The results show that the L3 learners are target-like on the generic interpretation of bare plurals, although these are strictly existential in their native language and illicit in L2 French. Definite and bare singulars do not pose any difficulty either. In contrast, non-facilitative L1 transfer induces the generic interpretation of definite plurals and restricts indefinite singulars to the existential interpretation. The results show that the L3 learners do not distinguish NP-level from sentence-level genericity, reflecting L1 Arabic grammar where the two merge. They use the same pattern of NP types for the two types. Thus, knowledge of genericity in L3 English is a patchwork of target-like and non-target-like exponents.


2017 ◽  
pp. 35-46 ◽  
Author(s):  
Irene Doval

This paper reviews the author’s experiences of tokenizing and POS tagging a bilingual parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts. This is part of an ongoing process of annotating the corpus for part-of-speech information. This study discusses the specific problems encountered so far. On the one hand, tagging performance degrades significantly when applied to fictional data and, on the other, pre-existing annotation schemes are all language specific. To further improve accuracy during post-editing, the author has developed a common tagset and identified major error patterns.


Repositor ◽  
2020 ◽  
Vol 2 (7) ◽  
pp. 897
Author(s):  
Dyah Anitia ◽  
Yuda Munarko ◽  
Yufis Azhar

In this research, we investigated a parser that uses the left-corner approach on Indonesian-language tweets. The collection of 850 tweets was divided into three data sets: training data for the POS tagger, training data for the parser, and test data. The left-corner approach combines two methods, top-down and bottom-up. The top-down step processes a sequence of words and attaches a part-of-speech tag to each, and the bottom-up step recognizes the sentence structure. We used 23 tags in the top-down step (41 according to the English version of this abstract), and the phrases used to define the sentence structure are noun, verb, adjective, adverb and prepositional phrases. The left-corner approach achieved a precision of 88.29%, a recall of 68.3% and an F1 measure of 77.02%. These values are higher than those of the bottom-up approach, which achieved a precision of 68.79%, a recall of 47.12% and an F1 measure of 55.9%. This is because the word classes assigned in the top-down process affect the sentence structure built in the bottom-up process.
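As a quick consistency check on the figures in this abstract, the F1 measure can be recomputed from the reported precision and recall using the standard harmonic-mean definition. This is a minimal sketch: the function name is an assumption made for illustration, and the bottom-up precision is taken to be the 68.79% given in the Indonesian abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


left_corner_f1 = f1_score(88.29, 68.3)
bottom_up_f1 = f1_score(68.79, 47.12)

print(round(left_corner_f1, 2))  # 77.02
print(round(bottom_up_f1, 2))    # 55.93
```

Both recomputed values agree with the reported F1 measures of 77.02% and 55.9% to the precision given.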


2017 ◽  
Vol 18 (2) ◽  
pp. 207-229
Author(s):  
Belén Labrador

The present paper reports on a translation-based teaching-oriented study of the expression of path and manner of motion (Talmy 1972) in English and Spanish. The aim is to explore contrastive differences by analysing translations, with special attention to crossed transposition (Molina and Hurtado Albir 2002), which implies a double shift of part-of-speech from the source text to the target text, and is the expected type of transfer between a satellite-framed language like English and a verb-framed language like Spanish. Two corpora have been used: a monolingual corpus of Children’s Short Stories, the CSS-corpus, and an English-Spanish parallel corpus, P-ACTRES 2.0. The results show a high tendency for implicitation of either path or manner and for compression in the translations into Spanish, whereas crossed transposition is preferred in the translations into English. Also, some pedagogical applications are suggested for including these motion expressions in TEFL to young learners through storytelling.

