corpus analysis
Recently Published Documents


Total documents: 789 (five years: 299)
H-index: 27 (five years: 2)

Linguistics, 2022, Vol 0 (0)
Author(s): Zhuo Jing-Schmidt, Jun Lang, Heidi Hui Shi, Steffi H. Hung, Lin Zhu

Despite extensive research efforts to explain the Mandarin Chinese particle le, confusion persists in the absence of a unitary theory and sufficient empirical evidence. This study provides a unitary account of le by adopting a usage-based constructionist approach, one that liberates grammatical aspect from, and is able to accommodate, lexical aspect. We argue that le participates in two distinct family resemblance constructions of aspect construal associated with two distinct sentential positions. The clause-internal le construction construes the closing or final boundary of an event and the clause-final le construction construes the opening or initial boundary of an event. Corpus analysis showed that the two aspect constructions have distinct patterns in natural language uses that are consistent with the proposed construals. Results from elicited response data showed that native speakers paid attention to construction-level formal and semantic cues in making family resemblance judgments about tokens of the two constructions. This study has both theoretical and methodological implications for crosslinguistic research on grammatical aspect in relation to lexical aspect and for usage-based constructionist approaches to grammatical categories beyond aspect.


2022
Author(s): Petar Gabrić, Mija Vandek

Verbal fluency tasks are often used in neuropsychological research and may have predictive and diagnostic utility in psychiatry and neurology. However, researchers using verbal fluency have uncritically assumed that there are no category- or phoneme-specific effects on verbal fluency performance. We recruited 16 young adult subjects and administered two semantic (animals, trees) and phonemic (K, M) fluency tasks. Because of the small sample size, results should be regarded as preliminary. On the animal compared to the tree task, subjects produced significantly more legal words, had a significantly lower intrusion rate, significantly shorter first-response latencies and final silence periods, as well as significantly shorter between-cluster response latencies. These differences may be explained by differences in the category sizes, integrity of the categories' borders, and efficiency of the functional connectivity between subcategories. On the K compared to the M task, subjects produced significantly more legal words and had significantly shorter between-cluster response times. Counterintuitively, a corpus analysis revealed there are more words starting with m compared to k in the experimental language. Our results have important implications for research utilizing verbal fluency, including decreased reproducibility, unreliability of diagnostic and predictive tools based on verbal fluency, and decreased knowledge accumulation.


2021, Vol 8 (4), pp. 15-27
Author(s): Mustafa Dolmaci, Hatice Sezgin

In order to provide “a common basis for the elaboration of language syllabuses, curriculum guidelines, examinations, textbooks, etc. across Europe”, the Common European Framework of Reference for Languages (CEFR) was published in 2001 by the Council of Europe. It has affected the way languages are taught, learnt and assessed, and also how foreign language proficiency levels are defined all around the world. The CEFR adopts an intercultural approach to foreign language, and the main purpose is to protect cultural diversity and to give importance to cultural activities rather than being a part of foreign language education. For this reason, culture is at the very core of the CEFR. In 2018 and 2020, two Companion Volumes were published to complement the CEFR. The present paper offers a comparative corpus analysis of these three texts, focusing on the occurrences of culture-related items and using the n-gram tool of Sketch Engine (Lexical Computing, n.d.), which creates frequency lists of sequences of tokens. The findings suggest that, according to the CEFR, foreign language education should focus less on the national culture of the native speakers of the target language and more on the “new culture” formed by the encounters of people coming from different cultures.


2021, Vol 45
Author(s): Roman Roszko

On New Manually Aligned and Tagged Bilingual Parallel Corpora and Their Applications

This article is devoted to the manually aligned and tagged bilingual parallel CLARIN-PL-BIZ corpora of the Baltic and Slavic languages, which are currently being developed. The study discusses the essential features of these corpora that make their applications go far beyond typical corpus analysis. Applications of these corpora include the design of cross-language models for the development of machine translation and artificial intelligence. The article also draws attention to the high potential of these resources as a reference training base for testing natural language processing tools.


Virittäjä, 2021, Vol 125 (4)
Author(s): Terhi Ainiala, Paula Sjöblom

Research methods in the field of onomastics in Virittäjä

The article examines onomastic writings published in Virittäjä between 1897 and 2019, primarily from a methodological perspective. During this period, Virittäjä published nearly 500 onomastic writings, of which almost half, 232, are articles, review articles and lectures given at the public defences of doctoral dissertations. These original research papers are classified by method into seven groups. The classification is necessarily rough, and there is some overlap between the categories. Book reviews and conference reports are covered in the discussion but are not included in the calculations. Etymological research was prevalent throughout the period: over half of all onomastic papers published in the journal fall into this category. The next most popular methods are name typological analysis, socio-onomastics and contact onomastics, each accounting for ca. 10% of the original research papers. Functional-semantic analysis and quantitative corpus analysis are fairly rare; only 3–4% of the papers follow either of these methods. About 10% have been classified as multidisciplinary papers because they use some non-linguistic method alongside onomastic methods. The writings in Virittäjä reflect the methodological development of Finnish onomastics, which has largely built on earlier methods: for instance, features of typological thinking began to appear in etymological papers, and socio-onomastic research in turn drew momentum from typological analysis. The analysis of the semantic features and functions of names was a natural consequence of studying the typology of different nomenclatures, whereas corpus analysis evolved naturally out of earlier approaches and out of the tools for analysing large datasets that have come into use in linguistics in general.

