Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque

Author(s):  
Larraitz Uria ◽  
Ainara Estarrona ◽  
Izaskun Aldezabal ◽  
Maria Jesús Aranzabe ◽  
Arantza Díaz de Ilarraza ◽  
...  
2018 ◽  
Vol 18 (1) ◽  
pp. 74-99
Author(s):  
Achim Stein

Abstract This contribution presents two syntactically annotated corpora of Old French, Modéliser le changement: les voies du français (MCVF) and the Syntactic Reference Corpus of Medieval French (SRCMF). The focus is on how the underlying syntactic theory (constituency vs. dependency) influences the grammar model and how this choice is reflected in the syntactic annotations of the corpora. The comparison relates to the most relevant general properties of the corpora as well as to two phenomena, null subjects and cleft constructions. Null subjects highlight possible conflicts between syntactic annotation models and syntactic theory, and the information-structural properties of cleft constructions pose a particular problem for the interpretation and annotation of historical corpora. Both phenomena are major instances of diachronic variation in French. The study is relevant for corpus users working on diachronic syntax, as well as for corpus builders wishing to design a grammar model for annotation.


Author(s):  
Izaskun Aldezabal ◽  
Maria Jesus Aranzabe ◽  
Jose Mari Arriola ◽  
Arantza Diaz de Ilarraza

Author(s):  
Norwati Roslim ◽  
Muhammad Hakimi Tew Abdullah ◽  
Anealka Aziz ◽  
Vahid Nimehchisalem ◽  
Azhani Almuddin

Numerous corpus studies have suggested that teaching materials design could greatly benefit from the empirical information about language use provided by corpus linguistics. Despite the awareness that corpus-based research can offer valuable insights for materials development, only a relatively small number of studies report on the practical applications of corpus data for teaching materials development. There is no clear guideline or framework on how corpora and corpus studies could assist in developing teaching materials. Hence, this study focusses on one grammatical item which poses problems to Malaysian learners, that is, prepositions. The objectives are (i) to identify prepositions in the British National Corpus as a reference corpus and the descriptions offered by linguists and grammarians as a reference grammar, and (ii) to provide a framework for using a reference corpus, a reference grammar and corpus-based research as resources for developing materials in the teaching of prepositions. In order to meet the objectives, content analysis was used as the methodology throughout this study. The findings showed that a reference corpus, a reference grammar and corpus-based research could be used systematically as guidance to develop corpus-informed materials. It is hoped that this contribution could have an impact on second language learning and teaching.


2022 ◽  
Author(s):  
Vít Dovalil ◽  
Adriana Hanulíková

Abstract Grammar is the structural foundation of successful communication, language use, and literacy development. Grammar is therefore sometimes viewed as the heart of language with an important place in language teaching. In a classroom setting, regulation of grammar knowledge through teachers is strongly influenced by teachers’ linguistic competence and beliefs. In this paper, we will first show the diversity in this knowledge by means of teacher interviews and speeded grammatical-acceptability data from pupils and students. We will then sketch a socio- and psycholinguistic perspective on several selected morphosyntactic variables in German. These will be discussed with reference to social forces that determine what is standard in a language (language norm authorities, language experts, model texts, and codifiers). Finally, we will draw a roadmap for teachers, language practitioners and editors looking for a qualified solution to grammatical cases of doubt in contemporary German and provide practical examples by drawing upon the German reference corpus.


1999 ◽  
Vol 2 (2) ◽  
pp. 211-229 ◽  
Author(s):  
Tony McEnery ◽  
Richard Xiao

This paper uses an English-Chinese parallel corpus, an L1 Chinese comparable corpus, and an L1 Chinese reference corpus to examine how aspectual meanings in English are translated into Chinese and to explore the effects of domains, text types and translation on aspect marking. We will show that while English and Chinese both mark aspect grammatically, the aspect systems of the two languages differ considerably. Even though Chinese, as an aspect language, is rich in aspect markers, covert marking (LVM) is a frequent and important strategy in Chinese discourse. The distribution of aspect markers varies significantly across domain and text type. The study also sheds new light on the translation effect by contrasting aspect marking in translated Chinese texts and L1 Chinese texts.


Author(s):  
V. F. Vydrin ◽  
J. J. Méric

A model for the development of a corpus-driven spelling dictionary for the Bambara language is described. First, a list of about 4,000 lexemes characterized by spelling variability is extracted from an electronic Bambara-French dictionary. At the next stage, a script is applied to determine the number of occurrences of each spelling variant in the Bambara Reference Corpus, separately for the entire Corpus (more than 11 million words) and for its disambiguated subcorpus (about 1.5 million words). Statistics on the diversity of sources and authors are also obtained automatically. The statistical data are then sorted manually into two lists of lexemes: those whose standard spelling can be established statistically, and those requiring evaluation by expert linguists. Some difficult cases are discussed in the paper. At the final stage, a representative expert commission will discuss all those lexemes for which statistical data alone do not suffice to define a standard spelling variant, before taking a final decision on each. The resulting Bambara spelling dictionary will be published electronically and on paper.
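
The counting and sorting stages described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the placeholder tokens, and the `min_total` and `min_share` thresholds are all assumptions made for illustration, and the abstract states that the actual sorting into two lists is done manually.

```python
from collections import Counter


def count_variants(tokens, variant_groups):
    """Count how often each spelling variant occurs in a list of corpus tokens.

    variant_groups maps a lexeme label to its list of competing spellings.
    """
    counts = Counter(tokens)
    return {
        lexeme: {variant: counts[variant] for variant in variants}
        for lexeme, variants in variant_groups.items()
    }


def triage(variant_counts, min_total=10, min_share=0.8):
    """Split lexemes into 'decidable by statistics' and 'needs expert review'.

    The thresholds are hypothetical; the source does not state any cut-off.
    """
    decided, needs_experts = {}, []
    for lexeme, counts in variant_counts.items():
        total = sum(counts.values())
        best, best_n = max(counts.items(), key=lambda kv: kv[1])
        # A variant wins only if there is enough evidence and a clear majority.
        if total >= min_total and best_n / total >= min_share:
            decided[lexeme] = best
        else:
            needs_experts.append(lexeme)
    return decided, needs_experts


# Placeholder tokens, not real Bambara data.
tokens = ["a1"] * 9 + ["a2"] + ["b1"] * 3 + ["b2"] * 3
groups = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
decided, needs_experts = triage(count_variants(tokens, groups))
```

In this toy run, lexeme `A` has a dominant variant and enough occurrences, so it lands in `decided`, while `B` has too few, evenly split occurrences and is passed to expert review. Running the same functions once on the full corpus and once on the disambiguated subcorpus would mirror the two separate counts mentioned in the abstract.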


2020 ◽  
Vol 25 (1) ◽  
pp. 101-123
Author(s):  
Dirk Speelman ◽  
Stefan Grondelaers ◽  
Benedikt Szmrecsanyi ◽  
Kris Heylen

Abstract In this paper, we revisit earlier analyses of the distribution of er ‘there’ in adjunct-initial sentences to demonstrate the merits of computational upscaling in syntactic variation research. Contrary to previous studies, in which major semantic and pragmatic predictors (viz. adjunct type, adjunct concreteness, and verb specificity) had to be coded manually, the present study operationalizes these predictors on the basis of distributional analysis: instead of hand-coding for specific semantic classes, we determine the semantic class of the adjunct, verb, and subject automatically by clustering the lexemes in those slots on the basis of their ‘semantic passport’ (as established on the basis of their distributional behaviour in a reference corpus). These clusters are subsequently interpreted as proxies for semantic classes. In addition, the pragmatic factor ‘subject predictability’ is operationalized automatically on the basis of collocational attraction measures, as well as distributional similarity between the other slots and the subject. We demonstrate that the distribution of er can be modelled equally successfully with the automated approach as in manual annotation-based studies. Crucially, the new method replicates our earlier findings that the Netherlandic data are easier to model than the Belgian data, and that lexical collocations play a bigger role in the Netherlandic than in the Belgian data. On a methodological level, the proposed automatization opens up a window of opportunities. Most important is its scalability: it allows for a larger gamut of alternations that can be investigated in one study, and for much larger datasets to represent each alternation.
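
The idea of comparing lexemes by their distributional behaviour in a reference corpus can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the vector model, the similarity measure, or the clustering algorithm, so the window-based co-occurrence counts, the cosine measure, the function names, and the toy English sentences below are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Build a co-occurrence count vector for every word in tokenized sentences."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy corpus, assumed for illustration only.
sentences = [
    ["in", "the", "garden", "stands", "a", "tree"],
    ["in", "the", "park", "stands", "a", "bench"],
]
vecs = context_vectors(sentences)
same_slot = cosine(vecs["garden"], vecs["park"])
other_slot = cosine(vecs["garden"], vecs["tree"])
```

Here `garden` and `park` share all their contexts and therefore score higher than `garden` and `tree`. Pairwise similarities of this kind are what a clustering step would group into proxies for semantic classes.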

