Methodological Corpora Toolkit and its Possibilities for Modelling Cognitive and Semantic Matrices

Scientific Journal of National Pedagogical Dragomanov University Series 9 Current Trends in Language Development ◽

10.31392/npu-nc.series9.2019.19.03 ◽

2020 ◽

pp. 36-46

Author(s):

N. M. Bober

Keyword(s):

Corpus Linguistics ◽

Native Speakers ◽

Linguistic Meaning ◽

Methodological Principle ◽

Structural Method ◽

New Model ◽

Semantic Clustering ◽

Cognitive Semantic ◽

Structural Semantics ◽

Narrow Context

The article substantiates the necessity and effectiveness of using corpus tools to study the semantics of a word from the standpoint of its cognitive interpretation, an approach whose representatives, unlike scholars of classical structural semantics, have defended the encyclopaedic nature of meaning in general. In this connection, the article comments on the correctness of Plungyan’s hypothesis that linguistics “outlines the contours of a new model of language, which is significantly and fundamentally different from the former models postulated in the last quarter of the XX century.” Given this understanding of linguistic meaning and its role in presenting a new model of language, the article suggests that meaning should be studied in both broad and narrow contexts, in particular in terms of the combinatorial potential of words: their lexical and grammatical compatibility, which corpus linguistics closely links with the concepts of collocation and colligation. The definitions of both terms are clarified, and arguments are made that collocations are conditionally free combinations of words used to characterize stereotypical situations and appear in the language of native speakers as ready-made phrases with inherent semantics, while colligations are limited by the morphological-syntactic frame of a certain structure. The methodological experience of corpus studies of colligations and collocations is analysed and proposed as a basis for constructing cognitive-semantic matrices of English phrasal verbs. The main focus is on the capabilities of the Sketch Engine corpus system, in particular its tools (Collocations, Word sketch, Thesaurus, Clustering, Sketch diff, etc.), which make it possible to integrate the classical (structural) method of distributional-statistical analysis of phrasal-verb collocations and colligations, the method of lexico-semantic clustering, and the method of combinatorial syntagmatics. The article formulates the hypothesis that these and other procedural methods together will help reveal the cognitive-semantic connections between the units under study, supported by quantitative and statistical calculations of their performance. The article shows that the corpus-oriented principle of combinatorial syntagmatics is becoming the leading methodological principle of modern cognitive-interpretative semantics.
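
As a rough illustration of the distributional-statistical side of collocation analysis mentioned above, the following sketch scores adjacent word pairs with pointwise mutual information (PMI). The toy corpus, the function name `pmi_bigrams`, and the choice of PMI are assumptions made for illustration; this is not Sketch Engine's API, and the abstract does not say which association measure the article relies on.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus containing the phrasal verb "give up".
corpus = (
    "she will give up soon . he did not give up . "
    "they give up too early . we give money to the fund ."
).split()

for pair, score in pmi_bigrams(corpus):
    print(pair, round(score, 2))
```

On a real corpus the `min_count` threshold would need to be raised, since PMI is known to favour rare pairs.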

Pohľad Na Pomenovanie Cez Prizmu Teoretických Rámcov A Slovníkového Hesla [A View of Naming through the Prism of Theoretical Frameworks and the Dictionary Entry]

Journal of Linguistics/Jazykovedný časopis ◽

10.2478/jazcas-2019-0011 ◽

2018 ◽

Vol 69 (3) ◽

pp. 277-301

Author(s):

Alexandra Jarošová

Keyword(s):

Corpus Linguistics ◽

Cognitive Linguistics ◽

Linguistic Meaning ◽

Extended Model ◽

Theoretical Frameworks ◽

Departure Point ◽

Lexical Meaning ◽

Linguistic Pragmatics ◽

Situational Contexts ◽

Methodological Procedures

The first part of this paper outlines the relevant aspects of functional structuralism that serve lexicographers as a departure point for building a model of lexical meaning usable in the Dictionary of Contemporary Slovak Language. This section also points to some aspects of Klára Buzássyová’s research on lexis and word-formation that have enriched the functional-structuralist paradigm. The second section presents other theoretical and methodological frameworks, such as linguistic pragmatics, cognitive linguistics and corpus linguistics (all of them departing in some respects from structuralism and, in others, complementing it), that can enhance the structuralist basis of the model. The third section outlines an extended model of lexical meaning that represents a synthesis of all those theoretical frameworks and, at the same time, reflects three language constituents: 1. The social constituent is present in the consideration of communicative functions of utterances, naming functions of lexical units, functional styles and registers, language norms, and situational contexts; 2. The psychological component takes the form of consideration of the prototype effect and the abolition of boundaries between linguistic meaning and other parts of cognition; 3. Thanks to the structural/systematic component, the paradigmatic and syntagmatic behaviour of words can be described, and an inventory of formal-content units and categories (lexemes, lexies, word-forming and grammatical structures) can be provided. In our dictionary practice, the above-mentioned model is reflected in the following methodological procedures: 1. Systemization of repetitive (regular, standardized) phenomena; 2. Prototypicalization of meaning description; 3. Contextualization/encyclopedization of meaning description; 4. Pragmatization of meaning description; 5. Continualized presentation of language phenomena, i.e., introduction of numerous phenomena of transient and indeterminate nature, indicating the existence of a semantic-pragmatic and lexical-grammatical continuum; 6. “Discretization” of the combinatorial continuum, i.e., identification and description of entrenched word combinations with naming functions.

Do we teach the real language?

Dutch Journal of Applied Linguistics ◽

10.1075/dujal.2.2.07mat ◽

2013 ◽

Vol 2 (2) ◽

pp. 224-241

Author(s):

Yevgen Matusevych ◽

Ad Backus ◽

Martin Reynaert

Keyword(s):

Foreign Language ◽

Communication Skills ◽

Corpus Linguistics ◽

Language Use ◽

Native Speakers ◽

Oral Communication ◽

Spoken Language ◽

Frequency Of Occurrence ◽

Real Language ◽

Recurrent Patterns

This article examines the type of language offered to learners in textbooks, using Russian as an example. Many modern textbooks of Russian as a foreign language aim at the efficient development of oral communication skills. However, some expressions used in the textbooks are not typical of everyday language. We claim that textbook content should be reassessed on the basis of actual language use, following theoretical and methodological models of cognitive and corpus linguistics. We extracted language patterns from three textbooks and compared them with alternative patterns that carry similar meaning by (1) calculating the frequency of occurrence of each pattern in a corpus of spoken language, and (2) using Russian native speakers’ intuitions about what is more common. The results demonstrated that, for 39 to 53 percent of all the recurrent patterns in the textbooks, better alternatives could be found. We further investigated the typical shortcomings of the extracted patterns.
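
The first comparison step described above, counting how often a textbook pattern and an alternative occur in a spoken corpus, might look like the sketch below. The function name `per_million`, the English stand-in phrases, and the tiny corpus are hypothetical; the study itself worked with Russian data and its own extraction procedure.

```python
import re


def per_million(pattern, text):
    """Return the frequency of a literal phrase per million word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    phrase = re.findall(r"\w+", pattern.lower())
    n = len(phrase)
    # Slide a window of the phrase length over the corpus tokens.
    hits = sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase
    )
    return hits / len(tokens) * 1_000_000 if tokens else 0.0


# Invented miniature "spoken corpus" for illustration only.
spoken_corpus = "how are you doing . how do you do . how are you . how are you today"
textbook = "how do you do"
alternative = "how are you"

print(per_million(textbook, spoken_corpus))
print(per_million(alternative, spoken_corpus))
```

Normalising per million tokens makes counts comparable across corpora of different sizes; because `\w` matches Unicode letters in Python 3, the same function should also work on Cyrillic text.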

The Common European Framework of Reference for Languages

ITL - International Journal of Applied Linguistics ◽

10.1075/itl.165.1.01hul ◽

2014 ◽

Vol 165 (1) ◽

pp. 3-18 ◽

Cited By ~ 5

Author(s):

Jan H. Hulstijn

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Language Proficiency ◽

Corpus Linguistics ◽

Second Language Learners ◽

Native Speakers ◽

Theoretical Perspective ◽

Empirical Support ◽

Proficiency Levels ◽

The Common

The Common European Framework of Reference for Languages (CEFR, Council of Europe, 2001) currently functions as an instrument for educational policy and practice. The view of language proficiency on which it is based and the six proficiency levels it defines lack empirical support from language-use data. Several issues need to be investigated collaboratively by researchers working in the fields of first and second language acquisition, corpus linguistics and language assessment. These issues concern (i) the CEFR’s failure to consistently distinguish between levels of language proficiency (static aspect) and language development (dynamic aspect), (ii) the CEFR’s confounding of levels of language proficiency and intellectual abilities, and (iii) the potential problem of mismatches between second-language learners’ communicative and linguistic competences. Furthermore, from a more theoretical perspective, this paper proposes (iv) to investigate which CEFR proficiency levels are attainable by native speakers and (v) to empirically delineate the lexical, morpho-syntactic and pragmatic knowledge shared by all native speakers (called Basic Language Cognition).

Discourse Markers in the Academic Writing of Arab Students of English: A Corpus-based Approach

Theory and Practice in Language Studies ◽

10.17507/tpls.1005.10 ◽

2020 ◽

Vol 10 (5) ◽

pp. 569

Author(s):

Abeer Q. Taweel

Keyword(s):

Second Language ◽

Academic Writing ◽

Corpus Linguistics ◽

Native Speakers ◽

Discourse Markers ◽

Detailed Account ◽

General Tendency ◽

L1 Influence ◽

Arab Students ◽

Shed Light

This study aims to shed light on the discourse markers used in the academic writing of Arab students of English as a second language within the framework of corpus linguistics. It examines the use of discourse markers expressing attitude, sequence, cause and result, addition, and comparison and contrast. For comparison purposes, a similar-sized authentic corpus is used to examine the learners’ use, overuse, and underuse of the target markers. Moreover, the study provides a detailed account of the possible reasons for the disparity between the two corpora in the use of the target markers. Results show that learners use more discourse markers than native speakers. While this is a general tendency, the disparity between the two corpora can plausibly be attributed to the influence of the learners’ L1, in which some of the overused markers arise naturally because they have rhetorical functions in the learners’ native tongue.
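
Overuse and underuse between a learner corpus and a reference corpus are often quantified in corpus linguistics with a two-corpus log-likelihood (G2) statistic. The sketch below shows one such calculation; the counts are invented for illustration, the function names are hypothetical, and the abstract does not state which statistic the study actually used.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for a single item.

    freq_*: occurrences of the item; size_*: total tokens in each corpus.
    """
    total_freq = freq_a + freq_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total_freq / (size_a + size_b)
    expected_b = size_b * total_freq / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def direction(freq_a, size_a, freq_b, size_b):
    """Label corpus A as overusing or underusing the item relative to B."""
    rate_a, rate_b = freq_a / size_a, freq_b / size_b
    if rate_a == rate_b:
        return "equal"
    return "overuse" if rate_a > rate_b else "underuse"


# Invented counts: one marker in a learner corpus (A) vs a native corpus (B).
g2 = log_likelihood(120, 100_000, 60, 100_000)
print(direction(120, 100_000, 60, 100_000), round(g2, 2))
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < 0.05, so the statistic indicates whether a difference is likely to be more than chance, while `direction` indicates which way it points.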

A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model

Journal of Computing and Information Technology ◽

10.20532/cit.2020.1004899 ◽

2020 ◽

Vol 28 (1) ◽

pp. 19-31

Author(s):

Jian Feng ◽

Ying Zhang ◽

Yuqiang Qiao

Keyword(s):

Semantic Information ◽

Detection Method ◽

Structural Characteristics ◽

Web Pages ◽

Web Page ◽

Clustering Method ◽

Semantic Clustering ◽

Dom Tree ◽

Structural Semantics ◽

Linguistic Approach

Detecting phishing web pages is a challenging task. Existing DOM-based (Document Object Model) detection methods for phishing web pages mainly aim at obtaining structural characteristics but ignore the overall representation of web pages and the semantic information that HTML tags may carry. This paper treats DOMs as a natural language, using the Doc2Vec model to learn structural semantics automatically in order to detect phishing web pages. First, the DOM structure of the obtained web page is parsed to construct the DOM tree; then the Doc2Vec model is used to vectorize the DOM tree and to measure the semantic similarity of web pages by the distance between different DOM vectors. Finally, a hierarchical clustering method is used to cluster the web pages. Experiments show that the proposed method achieves higher recall and precision for phishing classification than a DOM-based structural clustering method and a TF-IDF-based semantic clustering method. The result shows that applying Paragraph Vector to the DOM in a linguistic approach is effective.
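
The first step of the pipeline described above, turning a page's DOM into a sequence of "words", can be sketched with Python's standard `html.parser` module. To keep the example dependency-free, the Doc2Vec embedding is replaced here by a plain bag-of-tags cosine similarity; that is a deliberate simplification and not the paper's model. The class name, the function names, the depth-prefixed token format, and the sample pages are all hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collect opening tags in document order, prefixed with their depth."""

    # A few common void elements, which never have a closing tag.
    VOID = {"br", "img", "input", "meta", "link", "hr"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"{self.depth}:{tag}")
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.depth > 0:
            self.depth -= 1


def dom_tokens(html):
    """Return the depth-prefixed tag sequence for an HTML string."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tokens


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of DOM tokens."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Invented sample pages: two login forms and one article page.
login_a = "<html><body><form><input name='u'><input name='p'></form></body></html>"
login_b = "<html><body><form><input name='user'><input name='pw'></form></body></html>"
article = "<html><body><h1>News</h1><p>Text</p><p>More</p></body></html>"

print(cosine(dom_tokens(login_a), dom_tokens(login_b)))
print(cosine(dom_tokens(login_a), dom_tokens(article)))
```

In a fuller version, the token sequences would be fed to a document-embedding model and the resulting vectors passed to hierarchical clustering; the similarity function here only stands in for the "distance between DOM vectors" idea.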

Development dynamics and cognitive-semantic parameters of English ditransitive construction: verification from the perspective of corpus linguistics

RESEARCH RESULT Theoretical and Applied Linguistics ◽

10.18413/2313-8912-2021-7-4-0-7 ◽

2021 ◽

Vol 7 (4) ◽

Keyword(s):

Corpus Linguistics ◽

Cognitive Semantic ◽

Semantic Parameters

A Corpus-Based Comparative Study of Malaysian ESL Learners and Native English Speakers in Compliment Patterns

International Journal of Linguistics ◽

10.5296/ijl.v9i5.12070 ◽

2017 ◽

Vol 9 (5) ◽

pp. 232

Author(s):

Paramasivam Muthusamy ◽

Atieh Farashaiyan

Keyword(s):

Language Learners ◽

Corpus Linguistics ◽

Native Speakers ◽

Native English Speakers ◽

Data Driven ◽

English Speakers ◽

Esl Students ◽

Authentic Language ◽

Learner Language ◽

Esl Learners

Although many language learners strive to master the target language, and despite years of meticulous study, immersion in TL environments, access to multimedia and educational amenities, the availability of rich resources, or simply God-given language talent, many seldom move beyond conspicuous learner language and may never produce authentic language in either speech or writing. In recent years, however, as corpus linguistics has gained currency in academia, there is growing hope that corpus-based materials and data-driven language instruction can actively and consciously engage learners and acquaint them with what authentic language is rather than what textbooks prescribe it to be. A growing body of research around the world has already been dedicated to data-driven learning, surveying the effectiveness of incorporating corpora in ELT. The purpose of this research is to investigate the patterns of compliments in the writing of Malay ESL students and to compare the findings with those of native English speakers. The results showed that the Malay ESL learners used a rather different number of syntactic patterns from the native English speakers, and that the frequency of their patterns exceeded that of the native speakers.

Formulaic language in native speakers: Triangulating psycholinguistics, corpus linguistics, and education

Corpus Linguistics and Linguistic Theory ◽

10.1515/cllt.2009.003 ◽

2009 ◽

Vol 5 (1) ◽

Cited By ~ 26

Author(s):

Nick C. Ellis ◽

Rita Simpson-Vlach

Keyword(s):

Corpus Linguistics ◽

Native Speakers ◽

Formulaic Language

Corpus-based approach to forming communication skills in the use of idioms

Revista EntreLinguas ◽

10.29051/el.v7iesp.3.15702 ◽

2021 ◽

pp. e021044

Author(s):

Liya F. Shangaraeva ◽

Luiza R. Zakirova ◽

Natalya A. Deputatova ◽

Elina K. Kuznetsova

Keyword(s):

Foreign Language ◽

Language Learning ◽

Communication Skills ◽

Corpus Linguistics ◽

Native Speakers ◽

Opportunity To Learn ◽

Teaching Techniques ◽

The Past ◽

Promising Tool ◽

National Corpus

The paper presents a corpus-based approach to forming communication skills, an approach that is now widely accepted. The methodological apparatus of corpus linguistics is a promising tool for language learning. The purpose of the present research is to study the potential of the Tatar National Corpus in forming communication skills in the use of Tatar idioms. The corpus-based approach has many applications in language learning, from extending teaching techniques to arousing learners’ curiosity and improving communication skills. Traditionally, idioms are considered to be fixed expressions whose meaning is not immediately obvious from the meanings of their parts. It has become evident over the past decades that creative modifications of idioms of many kinds are quite frequent. Most idioms are not totally opaque and are thus open to the corpus-based approach. Moreover, idioms are typically based on metaphors, and metaphors, as mental images, are easily modifiable. Native speakers adapt them, combine them and can change parts of them. A corpus presents an opportunity to study idioms as they are authentically used, without anyone’s prior selection or interpretation. Learning a foreign language on the basis of corpus data allows students to analyze lexical, grammatical and syntactic variations of idioms, to comprehend their semantics, and to explore new variants of idioms not yet recorded in dictionaries.

Iconic Encoding of Corporeality in Modern English

Germanic Philology Journal of Yuriy Fedkovych Chernivtsi National University ◽

10.31861/gph2021.833.38-47 ◽

2021 ◽

pp. 38-47

Author(s):

Anna Zaslonkina

Keyword(s):

Key Words ◽

Native Speakers ◽

Word Formation ◽

Semantic Context ◽

Cognitive Semantic ◽

Semantic Shift ◽

Derivatives Of ◽

Basic Level

The literature on the unity of emotional, volitional, intellectual, and physical states within the holistic cognitive-semantic context of corporeality shows a variety of approaches. The originality of our solution lies in the fact that the object of the present study is the domain of Greimassian semiotic theory (including the so-called thymic category), which has been further developed. Considering that people use basic-level concepts regularly, we hypothesised that thymic category members can be selected, given that these category members yield information on the semantics of perception in the elementary concepts of Modern English. The data obtained suggest that the information on the thymic category is conveyed by the conceptual triad SENSE : FEELING : EMOTION. Furthermore, cognitive and onomasiological features of the basic-level concepts have been analysed. Thus, previous research has been extended by clarifying the semiotic structure of the thymic category in Modern English and by presenting results on the distribution of cognitive-onomasiological capability among the derivatives of the verbalized conceptual triad SENSE : FEELING : EMOTION. The iconic character of this conceptual complex is one of the means of naïve worldview reconstruction in word-formation. Notably, the iconic aspect is marked by a cognitive-semantic shift from the thymic-neuter indices of the conceptual thymic information to their thymic-extremal analogues. This could result from the fact that a shift of this type is based on the correlation between the evaluation of the sign-motivator and the expressive-gnoseological functions of perception performed by native speakers. The reconstruction of the thymic composites domain reveals that motivators of the sensory type prevail. In addition, a cognitive-semantic shift was detected: the motivators are represented by the derivatives of the verbalized concept SENSE, while the concept EMOTION is lacunary; a fortiori, composite words with the constituents feel and sensation are semantically more mobile and expressive. Key words: concept, corporeality, iconicity, semiosis, sign.
