Multilingualism in Greater Poland court records (1386–1448): tagging discourse boundaries and code-switching

Corpora, 2020, Vol. 15 (3), pp. 273–290
Author(s): Matylda Włodarczyk, Joanna Kopaczyk, Michał Kozak

This paper introduces the Electronic Repository of Greater Poland Oaths, eROThA (1386–1446), a project digitising a diplomatic edition of medieval land court oaths recorded in Latin and Old Polish, which has resulted in a small, lightly tagged, specialised bilingual corpus. We present the background, aims, design and methodology of the project. We also discuss the problems and limitations inherent in turning a printed diplomatic edition into a machine-readable one equipped with a new interpretative layer that is sensitive to switches between Latin and Old Polish. In addition to the automatic annotation of code-switched items based on the typographic characteristics of the printed edition, flexible coding of recurrent language and discourse boundary phenomena has been introduced manually to account for linguistically ambiguous or neutral forms. The project offers a fully multilingual corpus, as well as customised Polish-only and Latin-only datasets, and enables filtered metadata searches in the online front end. Overall, the report presents a methodology for constructing multilingual corpora in the context of medieval Central European legal cultures that may be extended to datasets from other periods and regions.
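
The tagging idea described above can be pictured as a simple rule: tokens carrying the edition's typographic marker are tagged Old Polish, the rest Latin, and a hand-coded list overrides both for ambiguous or neutral forms. The Python sketch below illustrates only that logic. The `Token` class, the boolean `marked` flag standing in for the typographic cue, the `NEUTRAL_FORMS` list and the sample words (in normalised spelling) are all hypothetical and are not taken from eROThA.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    # Hypothetical typographic cue: True if the printed edition sets the word
    # in the type style assumed here to mark Old Polish material.
    marked: bool


# Hypothetical hand-coded list of forms belonging to neither language
# unambiguously (e.g. personal names); a real project codes these manually.
NEUTRAL_FORMS = {"Jan"}


def tag_language(tokens):
    """Return (text, tag) pairs with tag in {'pol', 'lat', 'neutral'}."""
    tagged = []
    for tok in tokens:
        if tok.text in NEUTRAL_FORMS:
            tag = "neutral"  # manual coding overrides the typographic cue
        elif tok.marked:
            tag = "pol"
        else:
            tag = "lat"
        tagged.append((tok.text, tag))
    return tagged


def switch_points(tagged):
    """Indices where the language changes; neutral tokens are skipped."""
    points, prev = [], None
    for i, (_, tag) in enumerate(tagged):
        if tag == "neutral":
            continue
        if prev is not None and tag != prev:
            points.append(i)
        prev = tag
    return points


def monolingual_subset(tagged, lang):
    """Build a Polish-only or Latin-only view of the tagged tokens."""
    return [text for text, tag in tagged if tag == lang]


# Invented sample record: a Latin formula followed by an Old Polish oath.
sample = [Token("Item", False), Token("Jan", False), Token("iuravit", False),
          Token("tako", True), Token("mi", True), Token("pomóż", True),
          Token("Bóg", True)]

if __name__ == "__main__":
    tagged = tag_language(sample)
    print(tagged)
    print(switch_points(tagged))
```

Skipping neutral tokens when looking for switch points means that a name standing between a Latin formula and a Polish oath does not itself count as a boundary, and `monolingual_subset` mirrors, in miniature, the idea of deriving Polish-only and Latin-only datasets from one tagged corpus.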

Author(s): Xiaoqing Wu, Marjan Mernik, Barrett R. Bryant, Jeff Gray

Unlike natural languages, programming languages are strictly stylized entities created to facilitate human communication with computers. To make programming languages recognizable by computers, one of the key challenges is to describe and implement their syntax and semantics so that programs can be translated into machine-readable code. This process is normally regarded as the front end of a compiler, which depends mainly on the programming language rather than on the target machine. This article addresses the most important aspects of building a compiler front end, namely syntax and semantic analysis, including the related theories, technologies and tools, as well as existing problems and future trends. The main focus is a detailed discussion of formal syntax and semantic specifications. The article gives the reader a high-level overview of the language implementation process, along with commonly used terms and development practices.
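
To make the two front-end phases concrete, here is a toy Python sketch built on an invented grammar (`let` declarations, `print` statements, `+`, `*` and parentheses). Syntax analysis is a hand-written recursive-descent parser that produces a tuple-based abstract syntax tree; semantic analysis is reduced to a single declared-before-use check. It does not reproduce the article's approach, which centres on formal syntax and semantic specifications, and every name in it is hypothetical.

```python
import re

# Lexical rule for the toy language: integers, identifiers, single-char operators.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[=+*();])")


def tokenize(src):
    src, pos, tokens = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: one method per rule of the invented grammar."""

    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def program(self):  # program := stmt*
        stmts = []
        while self.peek() is not None:
            stmts.append(self.statement())
        return stmts

    def statement(self):  # stmt := 'let' NAME '=' expr ';' | 'print' expr ';'
        if self.peek() == "let":
            self.i += 1
            name = self.peek()
            if name is None or not name.isidentifier():
                raise SyntaxError(f"expected a name, got {name!r}")
            self.i += 1
            self.expect("=")
            node = ("let", name, self.expr())
        else:
            self.expect("print")
            node = ("print", self.expr())
        self.expect(";")
        return node

    def expr(self):  # expr := term ('+' term)*
        node = self.term()
        while self.peek() == "+":
            self.i += 1
            node = ("add", node, self.term())
        return node

    def term(self):  # term := factor ('*' factor)*
        node = self.factor()
        while self.peek() == "*":
            self.i += 1
            node = ("mul", node, self.factor())
        return node

    def factor(self):  # factor := NUMBER | NAME | '(' expr ')'
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok.isidentifier():
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def check_declared(stmts):
    """Semantic pass: every variable must be declared by `let` before use."""
    declared, errors = set(), []

    def visit(node):
        if node[0] == "var" and node[1] not in declared:
            errors.append(f"undeclared variable {node[1]!r}")
        elif node[0] in ("add", "mul"):
            visit(node[1])
            visit(node[2])

    for stmt in stmts:
        if stmt[0] == "let":
            visit(stmt[2])  # the right-hand side cannot use the new name
            declared.add(stmt[1])
        else:
            visit(stmt[1])
    return errors
```

Each grammar rule maps to one method, and `*` binds tighter than `+` because `term` sits below `expr` in the call chain; the semantic pass walks the finished tree instead of the token stream, which is the usual split between the two phases.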


2017, Vol. 01 (01), pp. 1630020
Author(s): Pierpaolo Basile, Annalina Caputo

Named Entity Linking (NEL) is the task of semantically annotating entity mentions in a portion of text with links to a knowledge base. Automatic annotation requires both recognizing and disambiguating each entity mention, and it usually exploits contextual clues such as the context of usage and coherence with the other entities in the text. On Twitter, the 140-character limit produces very short, noisy messages that pose new challenges for entity linking. We present an overview of NEL methods, focusing on approaches developed specifically for short messages such as tweets. NEL is a fundamental task for extracting and annotating concepts in tweets, which is necessary to make Twitter's huge volume of interconnected user-generated content machine-readable and to enable intelligent information access.
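
A toy Python sketch can make the recognition/disambiguation split concrete. The knowledge base, aliases and keywords are invented; recognition is a greedy dictionary lookup of aliases, and disambiguation picks the candidate whose keywords overlap most with the words of the message, a crude stand-in for the contextual clues mentioned above. The coherence signal between entities is deliberately left out, and none of this reproduces a specific method from the overview.

```python
import re

# Hypothetical toy knowledge base: entity id -> (aliases, descriptive keywords).
KB = {
    "Apple_Inc.": ({"apple"}, {"iphone", "company", "technology", "mac"}),
    "Apple_(fruit)": ({"apple"}, {"fruit", "tree", "pie", "eat"}),
    "Paris": ({"paris"}, {"france", "city", "capital"}),
    "Paris_Hilton": ({"paris hilton", "paris"}, {"celebrity", "hotel", "heiress"}),
}

# Candidate generation: surface form -> candidate entity ids (in KB order).
ALIAS_INDEX = {}
for entity_id, (aliases, _) in KB.items():
    for alias in aliases:
        ALIAS_INDEX.setdefault(alias, []).append(entity_id)


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def find_mentions(tokens):
    """Recognition: greedy longest match of KB aliases (bigrams first)."""
    mentions, i = [], 0
    while i < len(tokens):
        for n in (2, 1):
            span = " ".join(tokens[i:i + n])
            if i + n <= len(tokens) and span in ALIAS_INDEX:
                mentions.append((span, ALIAS_INDEX[span]))
                i += n
                break
        else:  # no alias starts here
            i += 1
    return mentions


def link(text):
    """Disambiguation: choose the candidate sharing most words with the text."""
    tokens = tokenize(text)
    context = set(tokens)
    links = {}
    for span, candidates in find_mentions(tokens):
        links[span] = max(candidates, key=lambda eid: len(context & KB[eid][1]))
    return links


if __name__ == "__main__":
    print(link("New iPhone from Apple announced in Paris, France"))
```

Ties are resolved in knowledge-base order because `max` returns the first maximal candidate. In very short messages the overlap is often zero, so this kind of context scoring has little to work with, which hints at why tweets call for dedicated methods.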


Author(s): Dietmar Willoweit

Crime and ostracism in the medieval court records of Kulm. As a High Court (Oberhof) of the State of the Teutonic Order in Prussia, the town of Kulm on the Vistula was a center of Magdeburg law in eastern Central Europe. The article discusses some aspects of criminal practice revealed by the recently edited court records of the city, which span the 14th to the 16th century. Above all, these records document the ostracisms (Verfestungen) of fugitive offenders. Because such offenders could be apprehended by anyone, not just the accuser, they were forced either to reach a settlement with the victim or to surrender to the court. The records give no information about offenders who were seized and judged immediately after committing the crime (handhafte Tat). Therefore, although the sources contain interesting facts on the history of crime, they do not allow complete crime statistics to be compiled for the town.


1989, Vol. 15 (4–5), pp. 261–267
Author(s): Alan Griffiths

For over eight years, the author has been preparing machine-readable data on Sardinia collected from a wide variety of sources. The data, spread across more than 300 ASCII data files, cover topics including medieval legislation, demographic statistics, bibliographic references and environmental indicators. This paper examines the problem of integrating data whose required structures push the system designer towards using a range of software packages. As the bulk of the database is static, a front-end solution is proposed that accesses the packages through a single inverted file.
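
The general idea of an inverted file, one index of terms pointing into many separate data files, can be sketched in a few lines of Python. The file names and contents below are invented placeholders, postings record (file name, line number) pairs, and queries are plain Boolean AND. This is a generic illustration, not a reconstruction of the system the paper describes.

```python
import re
from collections import defaultdict


def terms(text):
    return re.findall(r"\w+", text.lower())


def build_inverted_file(files):
    """Map each term to the sorted (file name, line number) pairs containing it."""
    index = defaultdict(set)
    for name, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for term in terms(line):
                index[term].add((name, line_no))
    return {term: sorted(postings) for term, postings in index.items()}


def search(index, query):
    """Boolean AND query: lines that contain every query term."""
    query_terms = terms(query)
    if not query_terms:
        return []
    hits = set(index.get(query_terms[0], []))
    for term in query_terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)


# Invented placeholder files standing in for separately maintained datasets.
files = {
    "legislation.txt": "statute on grazing rights\nstatute on water",
    "bibliography.txt": "study of grazing in the highlands",
    "environment.txt": "rainfall index\nwater table index",
}
index = build_inverted_file(files)

if __name__ == "__main__":
    print(search(index, "grazing"))
```

This layout suits a mostly static collection: the index can be built once and rebuilt only when the underlying files change, while each file stays in whatever format its own package expects.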


Author(s): Penelope Gardner-Chloros

2018, Vol. 68 (2), pp. 498–516
Author(s): Neil O'Sullivan

Of the hundreds of Greek common nouns and adjectives preserved in our MSS of Cicero, about three dozen are found written in the Latin alphabet as well as in the Greek. So we find, alongside συμπάθεια, also sympathia, and ἱστορικός as well as historicus. This sort of variation has been termed alphabet-switching; it has received little attention in connection with Cicero, even though it is relevant to subjects of current interest such as his bilingualism and the role of code-switching and loanwords in his works. Rather than addressing these issues directly, this discussion sets out information about the way in which the words are written in our surviving MSS of Cicero and takes further some recent work on the presentation of Greek words in Latin texts. It argues that, for the most part, coherent patterns and explanations can be found in the alphabetic choices exhibited by them, or at least by the earliest of them when there is conflict in the paradosis, and that this coherence is evidence for a generally reliable transmission of Cicero's original choices. While a lack of coherence might indicate unreliable transmission, or even an indifference on Cicero's part, a consistent pattern can only really be explained as an accurate record of coherent alphabet choice made by Cicero when writing Greek words.


2009
Author(s): Katherine J. Midgley, Kaitlyn A. Litcofsky, Tali Ditman-Brunye, Phillip J. Holcomb
