Phraseological Analysis of Learner Corpus Based on Language Model

The article examines the problem of nonsense and its role in the development of creative thinking and fantasy, and how the interpretation of nonsense affects children's imagination. The imagination inherent in a person, and especially in a child, has powerful potential: it can artificially create new metaphorical models and absurd, highly improbable situations based on self-amazement. Children are able to measure the properties of unfamiliar objects against the properties of known things. It is not difficult for these young researchers to replace incomprehensible meanings with familiar ones, to think situations through, to draw analogies, and to transfer the signs and properties of one object to another. The problem of nonsense research is interesting and relevant. The element of play is an integral component of nonsense. In the process of playing, children come to know the world and learn to interact with it by imitating adults' behavior. Imagination and fantasy help children invent their own rules of the game and choose the language elements that best suit their ideas. Children use the productive models they have learned from the language system to create their own models and their own language, drawing on linguistic signs: words, morphs, sentences. Children's vocabulary stimulates word formation and language nomination processes. Nonsense words are the result of children's vocabulary, speech errors and occasional formations, presented in the form of contamination, phonetic transformations and lexical substitution, implemented on certain models. The first two models are phonetic imitation and hybrid speech, based on the natural language model. The third model of designing nonsense is represented by words that have no meaning at all and can be classed as portmanteau words.
Due to the flexibility of interframe relationships and the lack of algorithmic thinking, children can not only capture the implicit similarity of objects and phenomena, but also create it through their imagination. Interpretation of nonsense is an effective method of developing imagination in children, because metaphor and nonsense, as means of creating new meanings and modeling new content from fragments of one's own experience, are a powerful incentive for creative thinking.

Text Genre Detection Using Doc2Vec Word-embedding Language Model

Language and Information ◽

10.29403/li.23.2.2 ◽

2019 ◽

Vol 23 (2) ◽

pp. 23-43

Author(s):

Dongsung Kim

Keyword(s):

Language Model ◽

Word Embedding ◽

Text Genre

N-gram based Language Model for the QWERTY Keyboard Input Errors in a Touch Screen Environment

Korean Institute of Smart Media ◽

10.30693/smj.2018.7.2.54 ◽

2018 ◽

Vol 7 (2) ◽

pp. 54-59

Author(s):

Yoon Gee Ong ◽

Seung Shik Kang

Keyword(s):

Language Model ◽

Touch Screen ◽

Keyboard Input ◽

N Gram

Extractive Summarisation Based on Keyword Profile and Language Model

10.3115/v1/n15-1013 ◽

2015 ◽

Cited By ~ 1

Author(s):

Han Xu ◽

Eric Martin ◽

Ashesh Mahidadia

Keyword(s):

Language Model

Learner corpus profiles

10.3726/978-3-0351-0567-4 ◽

2014 ◽

Cited By ~ 2

Author(s):

Madalina Chitez

Keyword(s):

Learner Corpus

Studies in Learner Corpus Linguistics

10.3726/978-3-0351-0736-4 ◽

2016 ◽

Cited By ~ 1

Keyword(s):

Corpus Linguistics ◽

Learner Corpus

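Jaunu burtu veidošana ar diakritiskajām zīmēm latviešu valodas kā svešvalodas apguvēju tekstos [Creating New Letters with Diacritical Marks in Texts by Learners of Latvian as a Foreign Language]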

Valodu apguve: problēmas un perspektīva : zinātnisko rakstu krājums = Language Acquisition: Problems and Perspective : conference proceedings - Valodu apguve: problēmas un perspektīva = Language Acquisition: Problems and Perspective ◽

10.37384/va.2020.16.102 ◽

2020 ◽

pp. 102-110 ◽

Cited By ~ 1

Author(s):

Inga Kaija

Keyword(s):

Quantitative Analysis ◽

Foreign Language ◽

Computer Science ◽

Individual Feature ◽

Common Occurrence ◽

Average Amount ◽

Learner Corpus ◽

Diacritical Mark ◽

Areas Of Interest ◽

Larger Sample

A Latvian learner corpus, “LaVA”, is being built at the Institute of Mathematics and Computer Science, University of Latvia. The corpus includes texts written by beginner learners in the first two semesters of learning Latvian as a foreign language. The texts are written by hand and digitized afterwards in order to reduce the issues that could arise from having to learn not only to write but also to use a foreign keyboard. One feature that cannot be digitized is the new letters created by adding diacritical marks in ways that are not used in the standard Latvian alphabet. Since one of the essential steps in learning to write in a language is learning the letters and diacritical marks of that language, this study aims to find instances of such newly made letters and to discuss the basic quantitative measures in order to define hypotheses and areas of interest for further research into such usage. Altogether 322 texts were searched, and 175 examples were found. The number of examples found in 2nd semester texts was less than half the number found in 1st semester texts, but the percentage of texts containing examples was higher than expected: more than 33% in the 1st semester and almost 20% in the 2nd semester. This leads to the conclusion that this is quite a common occurrence, but one that tends to decrease in the second semester. The corpus does not provide any data on later semesters, so it cannot be predicted when such instances should become a rare, individual feature rather than a common one. The average number of examples in a text is not high, though. Counting only the texts where at least one example was found, the average number of examples per text is 2.136 in the 1st semester and 1.690 in the 2nd semester. Considering that the lowest possible value here is 1, this should not be considered a high value.
Therefore, using diacritical marks to make new letters, while a common feature of the Latvian interlanguage, could be characterized as casual rather than systemic. However, that does not exclude the possibility of certain patterns in usage. The data collected so far already show that there are some words, such as garšo, viņš, ļoti, četri, where examples were found in more than one author’s text. Examples of unsuitable diacritical marks are also sometimes found next to letters for which those marks would be suitable. This should be explored more thoroughly using qualitative methods. The corpus keeps growing; its expected size upon completion is 1000 texts. When that size is reached, it would be useful to repeat the study and check whether the larger amount of data still confirms the same assumptions. The larger sample size would also allow for more detailed quantitative analysis covering each letter, diacritical mark, placement of the diacritical mark, and metadata collected for the corpus, such as the gender, native language and other languages spoken by the authors of the texts.
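
The two measures reported in this abstract (the percentage of texts containing at least one example, and the average number of examples per such text) can be illustrated with a minimal Python sketch. The function name `summarize` and the per-text counts below are hypothetical and invented purely for illustration; they are not data from the LaVA corpus, and this is not the author's own procedure.

```python
from statistics import mean


def summarize(counts):
    """Return (share, average) for a list of per-text example counts.

    share   -- percentage of texts with at least one example
    average -- mean number of examples, counting only texts with >= 1 example
    """
    if not counts:
        raise ValueError("counts must contain at least one text")
    with_examples = [c for c in counts if c > 0]
    share = 100 * len(with_examples) / len(counts)
    average = mean(with_examples) if with_examples else 0.0
    return share, average


# Hypothetical counts for ten texts (NOT taken from the corpus).
hypothetical_counts = [0, 0, 3, 0, 1, 2, 0, 0, 4, 0]
share, average = summarize(hypothetical_counts)
print(f"{share:.1f}% of texts contain examples; {average:.3f} per such text")
```

Because only texts with at least one example enter the average, its minimum possible value is 1, which is why values such as 2.136 and 1.690 are read in the abstract as low.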
