corpus design
Recently Published Documents


Total documents: 97 (five years: 28)
H-index: 12 (five years: 2)

2022 ◽  
Vol 2022 ◽  
pp. 1-9
Author(s):  
Jiahui Gu

The traditional mixed oral English teaching model has obvious shortcomings, such as the inability to correct students’ pronunciation errors and give feedback in time, which slows the improvement of students’ English. For this reason, this paper proposes a guided teaching model based on core literacy. Following the structure of the mixed oral English teaching model, the paper determines an application plan for the model, designs the development environment, obtains the corpus, designs the oral training model, extracts oral features, and identifies and promptly corrects mispronunciations. For evaluation, it clarifies the evaluation purpose, obtains preliminary evaluation indicators, reduces the indicators and determines their weights, obtains indicator feature information, generates fuzzy rules, obtains fuzzy matrices to achieve quantitative evaluation, and synthesizes all evaluation scores into a result vector matrix. The results indicate that the mixed teaching method is effective and feasible and can improve the accuracy of the evaluation results for the mixed oral English teaching model.
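
As an illustration of the evaluation step described above, here is a minimal Python sketch of a weighted-average fuzzy comprehensive evaluation. The abstract does not specify the composition operator, the indicators, or the grade scale, so the function name `fuzzy_evaluate`, the three indicators, their weights, the membership matrix, and the grade scores are all hypothetical values chosen for illustration, not the paper's actual method.

```python
def fuzzy_evaluate(weights, membership, grade_scores):
    """Weighted-average fuzzy comprehensive evaluation (illustrative sketch).

    weights      -- one weight per indicator, summing to 1 (assumed)
    membership   -- fuzzy matrix: membership[i][j] is the degree to which
                    indicator i belongs to grade j
    grade_scores -- numeric score attached to each grade (assumed scale)
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("indicator weights must sum to 1")
    if len(weights) != len(membership):
        raise ValueError("need one membership row per indicator")

    n_grades = len(grade_scores)
    # Result vector: B[j] = sum_i weights[i] * membership[i][j]
    result_vector = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_grades)
    ]
    # Collapse the result vector into one score using the grade scores.
    score = sum(b * s for b, s in zip(result_vector, grade_scores))
    return result_vector, score


# Hypothetical example: three indicators, three grades (good, fair, poor).
weights = [0.5, 0.3, 0.2]
membership = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]
grade_scores = [90, 75, 60]

result_vector, score = fuzzy_evaluate(weights, membership, grade_scores)
print(result_vector, score)
```

The weighted-average operator is only one common choice; other composition operators (for example max-min) would give a different result vector from the same weights and fuzzy matrix.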


2021 ◽  
pp. 69-76
Author(s):  
V.N. Bazylev ◽  
I.A. Kuperman ◽  
E.V. Chmykhova ◽  
M.L. Aranovich ◽  
...  

The article presents the results of a pilot study for the structural transformation of 18 Bachelor’s Degree Programs for e-learning using Consolidated Knowledge Corpus. Consolidated Knowledge Corpus is a multidimensional object-oriented structure of educational content objects and their connections. All academic topics from 224 disciplines were combined into a Consolidated Knowledge Corpus to unify similarities and eliminate duplication. A unified matrix of educational content for the first year of study for 18 bachelor’s programs was compiled and an optimized modular structure of the curriculum was built. As a result, the necessary production volume of digital educational products and subsequently the necessary volume of investments were reduced by 55 percent.


2021 ◽  
pp. 026765832110505
Author(s):  
Cristóbal Lozano

This article presents and reviews a new methodological resource for research in second language acquisition (SLA), CEDEL2 (Corpus Escrito del Español L2 ‘L2 Spanish Written Corpus’), and its free online search-engine interface (cedel2.learnercorpora.com). CEDEL2 is a multi-first-language corpus (Spanish, English, German, Dutch, Portuguese, Italian, French, Greek, Russian, Japanese, Chinese, and Arabic) of L2 Spanish learners at all proficiency levels. It additionally contains several native control subcorpora (English, Portuguese, Greek, Japanese, and Arabic). Its latest release (version 2) holds material from around 4,400 speakers, which amounts to over 1,100,000 words. CEDEL2 follows strict corpus-design criteria (Sinclair, 2005) and L2 corpus-design recommendations (Tracy-Ventura and Paquot, 2021), and all subcorpora are equally designed to be fully contrastable, as recommended by Contrastive Interlanguage Analysis (Granger, 2015). Thanks to its design and web interface, CEDEL2 allows for complex searches which can be further narrowed down according to its SLA-motivated variables, e.g. first language (L1), proficiency level, self-reported proficiency level, age of onset to the L2, length of exposure to the L2, length of residence in a Spanish-speaking country, knowledge of other foreign languages, type of task, etc. These CEDEL2 features allow L2 researchers to address SLA questions and hypotheses.


Author(s):  
Cristóbal Lozano ◽  
Joana Teixeira ◽  
Ana Madeira

This paper presents the L1 Portuguese – L2 Spanish subcorpus of Corpus Escrito del Español L2 (CEDEL2), a new methodological resource for second language acquisition (SLA) research, which is freely searchable and downloadable (http://cedel2.learnercorpora.com). CEDEL2 is a large-scale, multi-L1 learner corpus of L2 Spanish which contains written productions from learners at all proficiency levels as well as 6 native control subcorpora (total size: over 1,100,000 words from over 4,000 participants). CEDEL2 follows strict corpus design criteria (Sinclair, 2005) and learner corpus design recommendations (Tracy-Ventura & Paquot, 2021a). In its current version (CEDEL2 v. 2), its Portuguese component includes an L1 Portuguese – L2 Spanish subcorpus, with 21,662 words written by 164 participants, and an L1 Portuguese native subcorpus, with 3,500 words from 16 L1 speakers of European Portuguese. Thanks to their design features (e.g., same design across subcorpora, inclusion of metadata about SLA-relevant variables, dual native control subcorpora) and freely available web interface, CEDEL2 and its Portuguese subcorpora allow researchers to investigate a wide range of topics in SLA.


2021 ◽  
Author(s):  
Akio Kobayashi ◽  
Keiichi Yasu ◽  
Hiromitsu Nishizaki ◽  
Norihide Kitaoka

2021 ◽  
pp. 000276422110216
Author(s):  
Erdem Yörük ◽  
Ali Hürriyetoğlu ◽  
Fırat Duruşan ◽  
Çağrı Yoltar

What is the best way of creating a gold standard corpus for training a machine learning system that is designed for automatically collecting protest information in a cross-country context? We show that creating a gold standard corpus for training and testing machine learning models on the basis of randomly chosen news articles from news archives yields better performance than selecting news articles on the basis of keyword filtering, which is the most prevalent method currently used in automated event coding. We advance this new bottom-up approach to ensure generalizability and reliability in cross-country comparative protest event collection from international and local news in different countries, languages, sources and time periods, which entails a large variety of event types, actors, and targets. We present the results of comparing our random-sample approach with keyword filtering. We show that machine learning algorithms, particularly state-of-the-art deep learning tools, perform much better when they are trained with a gold standard corpus built from a randomly selected set of news articles from China, India, and South Africa. Finally, we also present our approach to overcoming the major ethical issues that are intrinsic to protest event coding.
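
To make the contrast between the two selection strategies concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function names `keyword_filter` and `random_sample`, the keyword list, and the toy archive are hypothetical and serve only to show how the two candidate sets for annotation differ.

```python
import random


def keyword_filter(articles, keywords):
    """Keep only articles that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [a for a in articles if any(k in a.lower() for k in lowered)]


def random_sample(articles, k, seed=0):
    """Draw k articles uniformly at random, without looking at their content."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(articles, min(k, len(articles)))


# Hypothetical toy archive; a real archive would hold full news texts.
archive = [
    "Thousands join protest over fuel prices",
    "Workers walk out at the port; union demands talks",
    "Central bank holds interest rates steady",
    "Villagers block highway after water cuts",
    "Local team wins regional championship",
]

by_keyword = keyword_filter(archive, ["protest", "strike"])
by_random = random_sample(archive, k=3)

print(by_keyword)
print(by_random)
```

In this toy archive the keyword filter returns only the article that literally contains a listed keyword, so the walkout and the highway blockade never reach the annotators, while a random draw can include such articles along with unrelated ones that serve as negative examples.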


Author(s):  
Aissa Amrouche ◽  
Ahcène Abed ◽  
Kamel Ferrat ◽  
Khadidja Nesrine Boubakeur ◽  
Youssouf Bentrcia ◽  
...  
