word token
Recently Published Documents



2021 ◽  
Author(s):  
José Antonio Sánchez Fajardo

This paper explores the pragmatic functions of the Spanish-induced loanwords, or hispanicisms, used in the novel Death in the Afternoon by Hemingway. The borrowed words were extracted manually, and each occurrence, or word token, was then examined with the AntConc software toolkit to determine the prevalent pragmatic motivation in each text string: ‘ideational’, ‘expressive’ or ‘textual’. Findings suggest that unadapted borrowings are the most widespread, and that the vast majority of them are ideationally or referentially motivated loanwords. The assimilation of new referents (i.e. those nonexistent in English cultural frames), particularly those related to bullfighting jargon, is linked to the general stylistics of travelogues. Expressive and interpersonal motivations are less frequent, but they might reflect the vernacularization of travel writing and the extended use of euphemisms through lexical borrowing. Textual motivations, in turn, are regularly found in the use of synonyms, co-hyponyms and paraphrases, which are intended to ensure text clarity and coherence.



2020 ◽  
pp. 7-21
Author(s):  
Svitlana YERMOLENKO

The ambiguity of the token word is evidenced by the explanatory dictionaries of the Ukrainian language and by the linguistic and artistic discourse of the 19th–21st centuries. The explanatory dictionary of the Ukrainian language makes an unmotivated separation of lexical-semantic variants that are actually shades of a single meaning of the word. At the same time, the dictionary does not capture the lexical-semantic variant “instrument of linguistic creativity”, which is actualized in artistic discourse. Compared with the dictionary interpretation, poetic language represents the lexical-semantic variants of the studied token more widely: as a unit of language structure (the definition of a linguistic term), the main means of national identity, a manifestation of the spiritual life of the nation, and an instrument of linguistic creativity. The main attention is focused on the functioning of the word in its lexical and associative relations, on its symbolization, and on its function as a linguistic and aesthetic sign of Ukrainian culture. Such signs are recorded in the works of T. Shevchenko, P. Kulish, Lesya Ukrainka, Oleksandr Oles, M. Rylskyi, Lina Kostenko and M. Vinhranovskyi. The semantic-associative connections of the word token in texts of different times reveal the specifics of the civic and lyrical motives of the authors' linguistic thinking. Poets turn to the word, talk to it, and convey, in different modal assessments, both their own emotional state and the symbolic semantics of the token word, aestheticized by the accumulated experience of mankind. Using poetic texts of the 19th–21st centuries as examples, the growth of anthropocentric semantics in the signs of the polysemous token word is traced. The echo of generations is revealed in verbalized and preverbal structures of the lexical-semantic variant “word as a tool of creativity”.



2019 ◽  
Vol 24 (1) ◽  
pp. 173-187
Author(s):  
Roland Mittmann ◽  
Ralf Plate

Working with Old High German and Old Saxon texts has become rare in German language and literature studies. Nowadays, the brief introductions to historical linguistics and philology do not suffice to enable students to explore the oldest German texts in significant depth. A new tool could at least help move closer to this depth of textual exploration: an on-line publication of all Old High German and Old Saxon texts with a morphological word-by-word annotation. The data has been collected by the research project ‘Old German Reference Corpus’ (‘Referenzkorpus Altdeutsch’ / ReA) and can already be used for complex database queries. Within the framework of the ‘eHumanities Centre for Historical Lexicography’ (‘eHumanities-Zentrum für Historische Lexikographie’ / ZHistLex), it has been converted into a clearly arranged format that focuses on the information needed for comprehending the texts: the ‘Lesekorpus Altdeutsch’ (LeA). This tool cannot, of course, replace work with text editions, translations and commentaries, but it will help pave the way from a first insight into the language to a profound understanding. Additionally, two-way linking between every word token and the corresponding entry in the on-line dictionaries of historical German is being prepared.



2018 ◽  
Vol 23 (4) ◽  
pp. 494-508
Author(s):  
Agnes Tellings ◽  
Nelleke Oostdijk ◽  
Iris Monster ◽  
Franc Grootjen ◽  
Antal van den Bosch

This short paper introduces BasiScript, a 9-million-word corpus of contemporary Dutch texts written by primary school children. The data were collected over three years with 17,216 children contributing texts throughout this period. Each word token in the corpus is annotated with the correct orthographical form, the associated lemma and the part of speech. The most frequent polysemous words have been annotated for word meaning, while all words in the lexicon that was derived from the BasiScript corpus have been annotated for corpus and subcorpora frequency, dispersion, length, family size, family frequency, orthographic neighborhood size, and orthographic neighborhood frequency. Images of the texts are available to researchers. The present article describes the corpus and presents a comparison of BasiScript with BasiLex (a Dutch corpus with texts primary school children are likely to read, completed in 2015) by means of frequency profiling.



Author(s):  
Tomonari Masada

This paper introduces a new approach for large-scale unsupervised segmentation of bibliographic elements. The problem is to segment a citation, given as an untagged word token sequence, into subsequences so that each subsequence corresponds to a different bibliographic element (e.g., authors, paper title, journal name, publication year). The same bibliographic element should be referred to by contiguous word tokens; this is called the contiguity constraint. The author meets this constraint by using generalized Mallows models, which were effectively applied to document structure learning by Chen, Branavan, Barzilay, and Karger (2009). However, the method works for this problem only after modification, so the author proposes strategies to make it applicable.
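The contiguity constraint described in this abstract can be illustrated with a small check. The following is only a sketch of the constraint itself, not of the generalized Mallows model or of the author's method; the function names and the label strings (`author`, `title`, `year`) are hypothetical and chosen for illustration.

```python
from itertools import groupby


def is_contiguous(labels):
    """Return True if every label occupies exactly one unbroken run.

    `labels` holds one (hypothetical) element label per word token.
    """
    runs = [label for label, _ in groupby(labels)]
    # A label that starts a second run would appear twice in `runs`.
    return len(runs) == len(set(runs))


def segments(tokens, labels):
    """Split a token sequence into (label, tokens) subsequences."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    result = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        result.append((label, tokens[start:start + length]))
        start += length
    return result
```

A labeling such as `author, author, title, year` satisfies the constraint, while `author, title, author` does not, because the `author` element is split into two separate runs.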



2007 ◽  
Vol 33 (4) ◽  
pp. 553-590 ◽  
Author(s):  
Diana McCarthy ◽  
Rob Koeling ◽  
Julie Weeds ◽  
John Carroll

There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy of word sense disambiguation systems is strongly dependent on the quantity of manually sense-tagged data available, and even the best systems, when tagging every word token in a document, perform little better than a simple heuristic that guesses the first, or predominant, sense of a word in all contexts. The success of this heuristic is due to the skewed nature of word sense distributions. Data for the heuristic can come from either dictionaries or a sample of sense-tagged data. However, there is a limited supply of the latter, and the sense distributions and predominant sense of a word can depend on the domain or source of a document. (The first sense of “star”, for example, would be different in the popular press and in scientific journals.) In this article, we expand on a previously proposed method for determining the predominant sense of a word automatically from raw text. We look at a number of different data sources and parameterizations of the method, using evaluation results and error analyses to identify where the method performs well and also where it does not. In particular, we find that the method does not work as well for verbs and adverbs as for nouns and adjectives, but produces more accurate predominant sense information than the widely used SemCor corpus for nouns with low coverage in that corpus. We further show that the method is able to adapt successfully to domains when using domain-specific corpora as input, where the input can either be hand-labeled for domain or automatically classified.
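The predominant-sense heuristic mentioned in this abstract can be sketched in a few lines. This is a minimal illustration of the baseline heuristic only, estimated from a sense-tagged sample; it is not the authors' method for deriving predominant senses from raw text. The function names and the sense labels in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def predominant_senses(tagged_tokens):
    """Map each word to its most frequent sense in a sense-tagged sample.

    `tagged_tokens` is an iterable of (word, sense) pairs.
    """
    counts = defaultdict(Counter)
    for word, sense in tagged_tokens:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_with_predominant(tokens, senses, default=None):
    """Tag every word token with its predominant sense, ignoring context."""
    return [(token, senses.get(token, default)) for token in tokens]
```

For instance, given a sample in which `star` is tagged twice as `celebrity` and once as `celestial_body`, every occurrence of `star` would be tagged `celebrity`. A sample drawn from scientific journals could yield the opposite choice, which is the domain dependence the abstract points out.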



Nordlyd ◽  
10.7557/12.6 ◽  
2004 ◽  
Vol 31 (2) ◽  
Author(s):  
Hannele Nicholson ◽  
Andreas Hilmo Teig

East Norwegian employs pitch accent contours in order to make lexical distinctions. This paper investigates listeners' ability to make lexical distinctions in the absence of f0 (i.e., in whispered speech) as the listener attempts to determine which pitch accent word token best fits into a whispered ambiguous utterance in spoken Norwegian. The results confirm that local syntactic context alone is not a reliable cue to assist in lexical selection and concur with Fintoft (1970) in suggesting that listeners utilise a separate prosodic cue, possibly syllable duration or intensity, to make the pitch accent distinction in whispered speech.



1989 ◽  
Vol 42 (4) ◽  
Author(s):  
Christina Schäffner ◽  
Eberhard Fischer ◽  
Beate Herting

Summary: Research on political vocabulary has a long tradition in GDR linguistics. Early works, however, could not provide a convincing explanation, due mainly to the stage of development of lexicon theory. From a textual and discursive perspective, two topical questions are discussed: (i) the question of “ideological naming polarities” in the present time, and (ii) the role of intertextuality for understanding the specific use of a word. Consequences for describing political word meanings are presented. They are taken as temporary semantic consolidations. Text word-token, text word-type and lexicon-word are distinguished as three types of meaning, i.e. as differently enriched representations of knowledge which are closely interrelated.


