semantic tagging
Recently Published Documents


Total documents: 101 (five years: 19)
H-index: 9 (five years: 1)

ICAME Journal ◽  
2021 ◽  
Vol 45 (1) ◽  
pp. 37-86
Author(s):  
Jonathan Culpeper ◽  
Andrew Hardie ◽  
Jane Demmen ◽  
Jennifer Hughes ◽  
Matt Timperley

Abstract. This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and of Early Modern English more generally, with a particular focus on elaborating possible solutions and the benefits they bring. It gives an account of work carried out within the Encyclopedia of Shakespeare’s Language Project (2016–2019), focusing on the development of the project’s data resources, specifically the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays, but in the context of Shakespeare’s language they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus, which often combine automated manipulation with manual intervention and are always principled, offer a way through.
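The word-level annotation layers listed above (spelling regularisation, part-of-speech tagging, lemmatisation, semantic tagging) can be pictured as a chain in which each layer builds on the previous one. The following is a minimal Python sketch under stated assumptions, not the project's actual tooling: the variant dictionary, the lexicon, the CLAWS-style POS tags, the USAS-style semantic tags and the `UNK` placeholder are all tiny hypothetical stand-ins, and every step is a context-free lookup, whereas real taggers use context (for example, "love" could equally be a verb).

```python
from dataclasses import dataclass

# Hypothetical toy resources for illustration only; real projects use much
# larger dictionaries and context-sensitive taggers rather than bare lookups.
VARIANTS = {"loue": "love", "vpon": "upon", "beene": "been"}  # EModE -> modern
LEXICON = {  # regularised form -> (CLAWS-style POS tag, lemma)
    "love": ("NN1", "love"),
    "upon": ("II", "upon"),
    "been": ("VBN", "be"),
}
SEMTAGS = {"love": "E2+", "upon": "Z5", "be": "A3+"}  # lemma -> USAS-style tag


@dataclass
class Token:
    original: str     # spelling as it appears in the source text
    regularised: str  # modernised spelling used by the later layers
    pos: str
    lemma: str
    semtag: str


def annotate(word: str) -> Token:
    # 1. Spelling regularisation; the original form is kept alongside it.
    reg = VARIANTS.get(word.lower(), word.lower())
    # 2-3. POS tagging and lemmatisation, reduced here to one lookup;
    #      "UNK" is a hypothetical placeholder for unknown forms.
    pos, lemma = LEXICON.get(reg, ("UNK", reg))
    # 4. Semantic tagging keyed on the lemma; Z99 marks an unmatched item.
    semtag = SEMTAGS.get(lemma, "Z99")
    return Token(word, reg, pos, lemma, semtag)


if __name__ == "__main__":
    for w in ["Loue", "vpon", "beene", "quoth"]:
        print(annotate(w))
```

Keeping the original spelling next to the regularised one reflects the general idea that regularisation should feed the later layers without overwriting the historical text.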


2021 ◽  
Author(s):  
Wenxi Li ◽  
Yiyang Hou ◽  
Yajie Ye ◽  
Li Liang ◽  
Weiwei Sun

2020 ◽  
Vol 4 (6) ◽  
pp. 85-91
Author(s):  
Dildora Bahodirovna Akhmedova

Background. Semantic markup has been studied thoroughly by specialists. Whereas the first generation of language corpora were simply collections of electronic texts, later corpora combined a query interface with linguistic and extralinguistic markup. Linguistically annotated corpora were at first only morphologically annotated, then morpho-syntactically annotated; in recent years the most complete form of linguistic annotation, a corpus with morphological, syntactic and semantic markup, has been under development. The introduction of semantic markup into corpora was initially grounded in theory, as the problems of semantic annotation were explored. Work of this kind includes research by Yu.D. Apresyan, I.M. Boguslavskiy, B.L. Iomdin, E.V. Biryaltsev, A.M. Elizarov, N.G. Jiltsov, V.V. Ivanov, O.A. Nevzorova, V.D. Solovev, I.S. Kononenko, E.A. Sidorova, E.I. Yakovchuk, E.V. Rakhilina, G.I. Kustova, O.N. Lyashevskaya, T.I. Reznikova, O.Yu. Shemanaeva and A.A. Kretov.


Author(s):  
Kimmo Kettunen

This study continues work in progress on implementing a full-text lexical semantic tagger for Finnish, FiST. The tagger is based on a 46,226-lexeme semantic lexicon of Finnish published in 2016 [1]. Kettunen [2], [3] describes the basic working version of FiST, which is built from freely available components: the first implementation uses Omorfi and FinnPos for morphological analysis and disambiguation of Finnish words. The current paper describes compound splitting for semantic tagging and its effect on the lexical coverage of the tagger. For an improved version of FiST, FiSTComp, we try out two different approaches to morphological analysis and disambiguation: FinnPos [4] and the Turku Dependency Parser [5], [6] (UD1). Both tools disambiguate the morphological interpretations of words and mark compound boundaries, but the detail and granularity of constituent decomposition differ. Our results with two-, three- and four-part compounds show that analysing compounds through their constituents with UD1 may improve the lexical coverage of the tagger by about 6.6 percentage points at best. Although we make progress on the basic problems of compound splitting, the results are still preliminary, and further work is needed because compounds are a complex phenomenon.
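Why constituent analysis can raise lexical coverage is easiest to see with a toy lookup. The sketch below is a hypothetical illustration rather than FiSTComp itself. It assumes the analyser returns lemmas whose compound boundaries are marked with a `#` character (the marker is an assumption) and that an unknown compound can fall back to its last constituent, since Finnish compounds are generally right-headed. It also uses a four-entry made-up lexicon with illustrative USAS-style tags, where `Z99` marks an unmatched word. Coverage is the share of tokens that receive a tag other than `Z99`.

```python
# Hypothetical mini-lexicon: lemma -> illustrative USAS-style semantic tag.
LEXICON = {
    "auto": "M3",      # car: vehicles and transport on land
    "talo": "H1",      # house: architecture, houses and buildings
    "kirja": "Q4.1",   # book: the media, books
    "kauppa": "I2.2",  # shop, trade: business, selling
}
BOUNDARY = "#"     # hypothetical compound-boundary marker in analyser lemmas
UNMATCHED = "Z99"  # tag for words the lexicon cannot cover


def tag(lemma: str, use_compounds: bool = True) -> str:
    """Return a semantic tag for a lemma, optionally via its compound head."""
    whole = lemma.replace(BOUNDARY, "")
    # First try the full word, which covers lexicalised compounds.
    if whole in LEXICON:
        return LEXICON[whole]
    # Fall back to the last constituent, treated as the compound's head.
    if use_compounds:
        head = lemma.split(BOUNDARY)[-1]
        if head in LEXICON:
            return LEXICON[head]
    return UNMATCHED


def coverage(lemmas: list[str], use_compounds: bool = True) -> float:
    """Share of tokens that receive a tag other than Z99."""
    if not lemmas:
        return 0.0
    hits = sum(tag(lemma, use_compounds) != UNMATCHED for lemma in lemmas)
    return hits / len(lemmas)


if __name__ == "__main__":
    sample = ["talo", "kirja#kauppa", "auto#talli", "kissa"]
    print(coverage(sample, use_compounds=False))  # 0.25
    print(coverage(sample))                       # 0.5
```

On this made-up sample, the head fallback lifts coverage from 0.25 to 0.5 because `kirja#kauppa` ("bookshop") inherits the tag of `kauppa`, while `auto#talli` stays unmatched because its head is not in the lexicon. Tagging a compound by its head only approximates its meaning, which is one reason the granularity of constituent decomposition matters.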

