THE POTENTIAL OF GOOGLE SEARCH FOR STUDIES IN COGNITIVE CORPUS LINGUISTICS

2019
pp. 127-142
Author(s):
Inna M. Petrova

The paper investigates the possibility of employing the Google search system as an analogue of a text corpus for use in further cognitive research on language. The purpose of the article is to elucidate the significance of the statistical data made available by the different operators and filters of the search system for the study of cognitive mechanisms of representation of linguistic reality in speech. Experimental observations were made to compare the results of competing queries in Google and the Russian National Corpus, based on the word order variability of binomial phrases. The results showed that the volume and variety of language data justify the employment of Google for these purposes. This leads to the conclusion that these data can be considered valid linguistic material for further interpretation in cognitive language research.
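As an illustration of how competing queries on binomial word order might be compared, the sketch below computes, for each source, the share of hits that use one ordering of a binomial ("A and B") over the other ("B and A"). This is not the author's procedure: the function name `order_preference` and every count shown are hypothetical placeholders, and raw search-engine hit counts are estimates that would need to be read with caution.

```python
def order_preference(count_ab: int, count_ba: int) -> float:
    """Return the share of hits using the "A and B" order.

    count_ab and count_ba are hit counts for the two competing
    orderings of the same binomial phrase in one source.
    """
    total = count_ab + count_ba
    if total == 0:
        raise ValueError("no hits for either ordering")
    return count_ab / total


# Hypothetical, made-up counts for one binomial in two sources.
hits = {
    "web search": (750, 250),
    "reference corpus": (30, 10),
}

for source, (ab, ba) in hits.items():
    share = order_preference(ab, ba)
    print(f"{source}: {share:.2f} of hits use the A-and-B order")
```

If the two sources give similar shares despite very different absolute counts, that would be one informal sign that they agree on the preferred order.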

2020
Vol 14 (3)
pp. 549-556
Author(s):
Maria Petrovna Bezenova
Grigory Leonidovich Grigoryev

Corpus linguistics is currently one of the most popular branches of linguistics. Most of the major languages of the world already have their own digital corpora of tens or hundreds of millions of word usages. Recently, special attention has also been paid to the creation of text corpora in the languages of the peoples of Russia, since, on the one hand, corpus research makes it possible to look at the structure of a language from a completely different perspective and, on the other hand, a corpus is a form of storing language data. The article describes the Udmurt National Corpus, which has been under development since the end of 2019 by the staff of the philological research department of the Udmurt Institute of History, Language and Literature of the Udmurt Federal Research Center of the Ural Branch of the Russian Academy of Sciences. It describes in detail the capabilities of the information and reference system currently being created, as well as the prospects for using the corpus of texts when conducting research, preparing dictionaries, and creating various programs in the Udmurt language. The article also deals with the Hunspell-based Udmurt spell checker developed by Grigory Grigoriev, which plays an important role in replenishing the Udmurt National Corpus. Before new texts are uploaded to the site, all of them undergo a mandatory check for spelling errors that may have remained after proofreading. This extension for text editors relies on a vocabulary database associated with an affix file, which contains all possible morphological variants of the lexemes of the main dictionary. It identifies spelling errors in the text, so that only the most thoroughly verified texts are uploaded to the website of the Udmurt National Corpus.


2020
Vol 74 (4)
pp. 291-299
Author(s):
K. Pirmanova
B. Karbozova
D. Tokmyrzhayev
...  

When studying the Kazakh language, it is necessary to pay great attention to the field of corpus linguistics and to study its theoretical and practical aspects at the world level. Special editions of scientific journals also publish articles on general and specific issues related to the creation and operation of text corpora around the world. However, Kazakh linguistics still requires special study of many issues related to corpus linguistics. These include the definition of corpus linguistics and its main concepts, the place of corpus linguistics in the structure of linguistics, its methods, etc. These theoretical and practical aspects of corpus linguistics should also be taken into account when creating a database of texts in the Kazakh language based on a computer corpus. If corpus linguistics is formed as a special section of Kazakh linguistics, it will allow many specialists in the Kazakh language to use large-scale experimental materials, find the necessary language data and make appropriate changes. All this contributes to a new look at the empirical approaches to the reliability of research in the Kazakh language and the introduction of the most important language materials in the field of science.


sjesr
2020
Vol 3 (4)
pp. 262-267
Author(s):
Abdul Ghaffar Bhatti
Muhammad Imran
Muhammad Younas

Technology plays a pivotal role in the ESL teaching and education sector. In language teaching, gender and language research mostly favors the idea of potential differences in language use between men and women. This paper explores different indicators of gender in the writing of males and females in a large subset of the British National Corpus (BNC) covering the domain of fiction, using a corpus tool. Robin Lakoff's four key linguistic terms that mark female language have been used as benchmarks against which the study has been conducted. Previous researchers such as Argamon, Koppel, and Shimoni claim that females use more pronouns and fewer nouns than men. The hits and frequencies of Lakoff's terms and the researchers' claims have been checked against the BNC to arrive at empirical findings. Taking the BNC as a general corpus, the corpus research method has been used to answer the research questions. The study found a substantial difference between texts written by male and female authors. It was also found that females use many more pronouns and males use many more nouns. Assumptions made regarding Lakoff's terms have been partially substantiated, since the results vary a little concerning the use of empty adjectives like 'cute' and 'divine'. The work is a valuable addition to the existing body of knowledge about gender differences in language and it provides space for researchers to work in even broader perspectives.


Author(s):  
Deo Kawalya ◽  
Koen Bostoen ◽  
Gilles-Maurice de Schryver

This article employs a 4-million-word diachronic corpus to examine how the expression of possibility has evolved in Luganda from the 1890s to the present, by focusing on the language’s three main potential markers -yînz-, -sóból- and -andi-, and their historical interaction. It is shown that while the auxiliary -yînz- originally covered the whole modal subdomain of possibility, the auxiliary -sóból- has steadily taken over the more objective categories of dynamic possibility. Currently, -yînz- first and foremost conveys deontic and epistemic possibility. It still prevails in these more subjective modal categories even though the prefix -andi-, a conditional marker in origin, has been expressing epistemic possibility since the 1940s, and -sóból- deontic possibility since the 1970s. More generally, this article demonstrates the potential of corpus linguistics for the study of diachronic semantics beyond language comparison. This is an important achievement in Bantu linguistics, where written language data tend to be young.


2021
pp. 191-210
Author(s):
Nikolay D. Golev
Irina P. Falomkina

The paper is dedicated to describing the word-building system of the Russian language in terms of its vocabulary. Lexical factors are discussed influencing the formation of lexical units’ potential as motivating units of word-building processes and relations and the realization of this potential in language activities. Of most interest for the authors are anthropocentric determinants, most of which are coordinating the lexical system and, through its mediation, the word-building system with the worldview of native speakers of the Russian language. The proposed model of derivational development of vocabulary provides such coordination through studying the deep-seated process of conceptualization of the words that are the potential motivators of neologisms. This study identifies the word frequency as an external manifestation of conceptualization. The frequency data were obtained from Google search system statistical data. Capturing not only usual but also occasional and potential words, this source is an effective tool for studying word-building processes and their results. This study has unveiled the interrelation between the language worldview of native speakers of Russian and their “word-building behavior” in language activities. The worldview has been found, first of all, to be determined by the pragmatic factor, which primarily influences the usage of a word in the speech reflected by its frequency. The frequency ranks lexical units due to their derivational potential and thereby provides a researcher with a reliable instrument for its study.


2021
pp. 002383092110530
Author(s):
Dan Villarreal
Lynn Clark

A growing body of research in psycholinguistics, corpus linguistics, and sociolinguistics shows that we have a strong tendency to repeat linguistic material that we have recently produced, seen, or heard. The present paper investigates whether priming effects manifest in continuous phonetic variation the way it has been reported in phonological, morphological, and syntactic variation. We analyzed nearly 60,000 tokens of vowels involved in the New Zealand English short front vowel shift (SFVS), a change in progress in which trap/dress move in the opposite direction to kit, from a topic-controlled corpus of monologues (166 speakers), to test for effects that are characteristic of priming phenomena: repetition, decay, and lexical boost. Our analysis found evidence for all three effects. Tokens that were relatively high and front tended to be followed by tokens that were also high and front; the repetition effect weakened with greater time between the prime and target; and the repetition effect was stronger if the prime and target belonged to (different tokens of) the same word. Contrary to our expectations, however, the cross-vowel effects suggest that the repetition effect responded not to the direction of vowel changes within the SFVS, but rather the peripherality of the tokens. We also found an interaction between priming behavior and gender, with stronger repetition effects among men than women. While these findings both indicate that priming manifests in continuous phonetic variation and provide further evidence that priming is among the factors providing structure to intraspeaker variation, they also challenge unitary accounts of priming phenomena.


2016
Vol 51 (3)
pp. 63-94
Author(s):  
Radosław Dylewski

The onset of Professor Jacek Fisiak’s scholarly career is marked by his 1961 Ph.D. dissertation devoted to the lexical influence of English upon Polish. This study, conducted 55 years ago, offers a multilayered analysis and set the standards of studies on lexical transfer from English to Polish for the years to come. The present article is a tribute to Fisiak’s first scholarly endeavor; it examines the fate, in the second decade of the 21st century, of the lexical items comprising Fisiak’s corpus. More specifically, by conducting searches in the National Corpus of Polish as well as a Google search, the paper checks which borrowings into the Polish language listed and scrutinized by Fisiak gained popularity, which fell out of use, and which underwent semantic changes.


2020
Vol 7 (5(74))
pp. 33-38
Author(s):  
M.V. Kozis

The paper focuses on the language conceptualization of objects’ state of being within the framework of the frame approach, studying the cognitive frame of the spatial position of entities. The author offers a linguistic overview of the Russian metaphoric posture verbs stoyat', sidet', lezhat'. The analysis is based on a sample of over 1,500 Russian sentences from the Russian National Corpus, Google search results and utterances offered by native speakers of Russian. Distributive analysis made it possible to define the co-occurrence of posture verbs with nouns denoting different objects and to hypothesize the verbs' meanings. A triangulation approach involving a corpus experiment, a semantics experiment and search engine queries revealed the frequency and acceptability of the verbs stoyat', sidet', lezhat' in utterances representing various denotative situations, which made it possible to verify the hypothesis on the verbs' meaning and describe their semantics. The study reveals variability in the cognitive interpretation of physical objects’ state of being and the key role of the human prototype in the conceptualization of the spatial position of entities. The study shows that the language representation of the frame “the object’s state of being in space” relies on its salient element, the possible one-to-one correspondence between the object’s position and a human posture. The final stage of the research features a semantic description of the verbs under study.


AILA Review
2014
Vol 27
pp. 80-97
Author(s):  
Alison Mackey

Since its inception, the field of second language research has utilized methods from a number of areas, including general linguistics, psychology, education, sociology, anthropology and, recently, neuroscience and corpus linguistics. As the questions and objectives expand, researchers are increasingly pushing methodological boundaries to gain a clearer picture of second language learning. At one end for example, we see measures of cognition (e.g., brain imaging and eye tracking) and at the other end we see exploration of issues of culture and identity (e.g., ethnographies, deep dive case studies, introspective and narrative analyses). There is an emerging emphasis on research synthesis, meta-analysis, and replication. This article illustrates a few of the advancements in methods and research agendas in SLA. I will conclude by highlighting some of the ways that second language researchers can continue to incorporate, assimilate, and shape methodology, as well as pointing out some of the potential pitfalls, and overall, how these methodological innovations benefit the field.


2021
pp. 1-12
Author(s):
Anita Ramalingam
Subalalitha Chinnaudayar Navaneethakrishnan

Thirukkural, a classic work of Tamil literature written in 300 BCE, is a didactic text. Although Thirukkural comprises 1330 couplets organized into three sections and 133 chapters, a better organization of the Thirukkural is needed in order to retrieve meaningful couplets for a given query in search systems. This paper lays such a foundation by classifying the Thirukkural into ten new categories called superclasses, which is helpful for building a better Information Retrieval (IR) system. The classifier is trained using the Multinomial Naïve Bayes algorithm. Each superclass is further classified into two subcategories based on the didactic information. The proposed classification framework is evaluated using precision, recall and F-score metrics and achieves an overall F-score of 82.33%; a comparative analysis has been done with the Support Vector Machine, Logistic Regression and Random Forest algorithms. An IR system is built on top of the proposed system, and its performance has been compared with Google search and a locally built keyword search. The proposed classification framework has achieved a mean average precision score of 89%, whereas Google search and keyword search have yielded 59% and 68% respectively.
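For readers unfamiliar with the evaluation metrics named in this abstract, the following minimal sketch shows one common way to compute precision, recall, F-score and mean average precision. It is not the authors' code: the function names and the toy inputs are hypothetical, and average precision is normalized here by the number of relevant items retrieved in each ranking, which is only one of several conventions.

```python
from typing import Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked result list.

    relevance[i] is True if the result at rank i + 1 is relevant.
    Assumption: normalized by the number of relevant results retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of average precision over several queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


# Toy example with made-up relevance judgments for two queries.
toy_runs = [[True, False, True], [False, True]]
print(mean_average_precision(toy_runs))
```

A higher mean average precision means that relevant results tend to appear nearer the top of the ranking across queries, which is the sense in which the abstract compares the three search systems.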

