Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin

2020 ◽  
Vol 2020 (Towards a Digital Ecosystem:...) ◽  
Author(s):  
Thibault Clérice

Tokenization of modern and old Western European languages seems fairly simple, as it relies mostly on the presence of markers such as spaces and punctuation. However, when dealing with old sources such as manuscripts written in scripta continua, ancient epigraphy, or medieval manuscripts, (1) such markers are mostly absent and (2) spelling variation and rich morphology make dictionary-based approaches difficult. Applying convolutional encoding to characters, followed by linear categorization into word-boundary or in-word-sequence classes, is shown to be effective at tokenizing such inputs. Additionally, the software is released with a simple interface for tokenizing a corpus or generating a training set.
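The labelling scheme behind this kind of tokenizer can be illustrated with a minimal sketch. It does not implement the convolutional model or reproduce the released software's interface; the function names `to_training_pair` and `apply_labels` are hypothetical, and the convention of marking the last character of each word as the boundary is an assumption made for illustration.

```python
def to_training_pair(spaced_text):
    """Strip spaces and label each kept character.

    Label 1 marks the last character of a word (word boundary),
    label 0 marks a character inside a word (in-word sequence).
    This labelling convention is an assumption for illustration.
    """
    chars, labels = [], []
    for word in spaced_text.split():
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == len(word) - 1 else 0)
    return "".join(chars), labels


def apply_labels(continuous_text, labels):
    """Re-insert a space after every character labelled as a boundary."""
    out = []
    for ch, label in zip(continuous_text, labels):
        out.append(ch)
        if label == 1:
            out.append(" ")
    return "".join(out).strip()


# Build one training example from a spaced sentence, then invert it.
continuous, labels = to_training_pair("arma virumque cano")
restored = apply_labels(continuous, labels)
```

In a real pipeline, a trained classifier would predict `labels` from `continuous`; here the gold labels are simply round-tripped to show that the scheme loses no information.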

Revue Romane ◽  
2013 ◽  
Vol 48 (2) ◽  
pp. 284-306 ◽  
Author(s):  
Mikołaj Nkollo

The paper traces evolutionary pathways of various methods of expressing reciprocity found in Old French (12th century) and Old Portuguese (13th–15th century) texts. Most of the grammaticalization patterns responsible for the emergence of reciprocal markers and documented in human languages are demonstrated to have been active in Old Romance, too. Medieval reciprocal exponents are first compared with their Latin ancestors to show what the Romance innovations consisted of and which formal means were retained throughout. Then, an account is given of how the two languages differed from each other. Finally, two suggestions are made on how current grammaticalization theory can be modified so as to grasp more efficiently the origin of reciprocal markers found in European languages.


2021 ◽  
pp. 1-13
Author(s):  
Jiawen Shi ◽  
Hong Li ◽  
Chiyu Wang ◽  
Zhicheng Pang ◽  
Jiale Zhou

Short text matching is one of the fundamental technologies in natural language processing. In previous studies, most text matching networks were initially designed for English text. The common approach to applying them to Chinese is segmenting each sentence into words, and then taking these words as input. However, this method often results in word segmentation errors. Chinese short text matching faces the challenges of constructing effective features and understanding the semantic relationship between two sentences. In this work, we propose a novel lexicon-based pseudo-siamese model (CL2N), which can fully mine the information expressed in Chinese text. Instead of utilizing a character-sequence or a single word-sequence, CL2N augments the text representation with multi-granularity information in characters and lexicons. Additionally, it integrates sentence-level features through single-sentence features as well as interactive features. Experimental studies on two Chinese text matching datasets show that our model has better performance than the state-of-the-art short text matching models, and the proposed method can solve the error propagation problem of Chinese word segmentation. Particularly, the incorporation of single-sentence features and interactive features allows the network to capture the contextual semantics and co-attentive lexical information, which contributes to our best result.


2007 ◽  
Vol Volume 6, april 2007, joint... ◽  
Author(s):  
Oleksiy Mazhelis

One-class classifiers, which are trained only on data from one class, are justified when data from other classes are difficult to obtain. In particular, their use is justified in mobile-masquerader detection, where user characteristics are classified as belonging to the legitimate user class or to the impostor class, and where collecting data originating from impostors is problematic. This paper systematically reviews various one-class classification methods and analyses their suitability in the context of mobile-masquerader detection. For each classification method, its sensitivity to errors in the training set, computational requirements, and other characteristics are considered. After that, for each category of features used in masquerader detection, suitable classifiers are identified.


2019 ◽  
Vol 33 ◽  
pp. 210-250
Author(s):  
Brigitte L. M. Bauer

This study investigates the potential influence of Latin syntax on the development of analytic verb forms in a well-defined and concrete instance of language contact, the Old French translation of a Latin Gospel. The data show that the formation of verb forms in the Old French was remarkably independent from the Latin original. While the Old French text closely follows the narrative of the Latin Gospel, its usage of compound verb forms is not dictated by the source text, as reflected e.g. in the quasi-omnipresence of the relative sequence finite verb + pp, instances of which – with a few exceptions – all trace back to a different structure in the Latin text. Another important innovative difference in the Old French is the widespread use of aveir ‘have’ as an auxiliary, unknown in Latin. The article examines in detail the relation between the verbal forms in the two texts, showing that the translation is in line with the grammar of Old French. The usage of compound verb forms in the Old French Gospel is therefore autonomous rather than contact stimulated, let alone contact induced. The results challenge Blatt’s (1957) assumption identifying compound verb forms as a shared feature in European languages that should be ascribed to Latin influence.


2010 ◽  
Vol 278 (1713) ◽  
pp. 1794-1803 ◽  
Author(s):  
Shijulal Nelson-Sathi ◽  
Johann-Mattis List ◽  
Hans Geisler ◽  
Heiner Fangerau ◽  
Russell D. Gray ◽  
...  

Language evolution is traditionally described in terms of family trees with ancestral languages splitting into descendent languages. However, it has long been recognized that language evolution also entails horizontal components, most commonly through lexical borrowing. For example, the English language was heavily influenced by Old Norse and Old French; eight per cent of its basic vocabulary is borrowed. Borrowing is a distinctly non-tree-like process—akin to horizontal gene transfer in genome evolution—that cannot be recovered by phylogenetic trees. Here, we infer the frequency of hidden borrowing among 2346 cognates (etymologically related words) of basic vocabulary distributed across 84 Indo-European languages. The dataset includes 124 (5%) known borrowings. Applying the uniformitarian principle to inventory dynamics in past and present basic vocabularies, we find that 1373 (61%) of the cognates have been affected by borrowing during their history. Our approach correctly identified 117 (94%) known borrowings. Reconstructed phylogenetic networks that capture both vertical and horizontal components of evolutionary history reveal that, on average, eight per cent of the words of basic vocabulary in each Indo-European language were involved in borrowing during evolution. Basic vocabulary is often assumed to be relatively resistant to borrowing. Our results indicate that the impact of borrowing is far more widespread than previously thought.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Marianne Van Remoortel ◽  
Julie M. Birkholz ◽  
Maria Alesina ◽  
Christina Bezari ◽  
Charlotte D'Eer ◽  
...  

This special issue of the Journal of European Periodical Studies contains a selection of eleven papers presented at the 2019 Women Editors in Europe conference at Ghent University. It explores women’s editorship in a wide range of national and transnational contexts in five full-length articles by Judit Acsády, Lola Alvarez-Morales and Amelia Sanz-Cabrerizo, Aisha Bazlamit, Andrea Penso, and Joanne Shattock, and five shorter pieces by Petra Bozsoki, Zsolt Mészáros, Marie Nedregotten Sørbø, Zsuzsa Török, and Alicja Walczyna, headed by a provocative essay by the conference keynote speaker, Fionnuala Dillane. Spanning three centuries and seven European languages, the special issue not only offers insight into the breadth and diversity of women’s editorial work for the press; it also draws together different national and language traditions in periodical scholarship and makes them accessible to an international audience.


Author(s):  
Ekawat Chaowicharat ◽  
Kanlaya Naruedomkul

A number of word segmentation algorithms have been offered in the past; however, there is still room for improvement. Co-occurrence-Based Error Correction (CBEC), the proposed approach in this chapter, is a novel Thai word segmentation approach that was designed to provide accurate segmentation results based on context and purpose. CBEC quickly segments the input string using any available algorithm; maximal matching was used in the experiment. Next, CBEC checks its segmentation output against an error risk data bank to determine if there is any error risk. The error risk data bank is developed based on a training corpus. The current version of the error risk bank was based on the training corpus available at BEST 2009. Then, CBEC re-segments the input string using the co-occurrence score of the word sequence to ensure the accuracy of the segmentation result.
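The first pass described above, maximal matching, can be sketched as a greedy longest-match lookup. This is only an illustration of that first step, not of CBEC itself: the error risk data bank and the co-occurrence rescoring are not implemented, the function name `maximal_matching` is hypothetical, and a toy English lexicon stands in for a Thai dictionary.

```python
def maximal_matching(text, lexicon):
    """Greedy forward segmentation: at each position take the longest
    lexicon entry that matches; fall back to a single character."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Toy lexicon (an assumption for illustration, not data from the chapter).
lexicon = {"a", "an", "ant", "eat", "eats", "sand"}
tokens = maximal_matching("anteatsand", lexicon)
```

On this input the greedy pass yields `ant | eats | an | d` instead of the intended `ant | eat | sand`, which is the kind of error a later context-based correction step would need to catch.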


2020 ◽  
Vol Special issue on... ◽  
Author(s):  
Martti Mäkinen

Automated approaches to identifying the authorship of a text have become commonplace in stylometric studies. The current article applies an unsupervised stylometric approach to Middle English documents using the script Stylo in R, in an attempt to distinguish between texts from different dialectal areas. The approach is based on the distribution of character 3-grams generated from the texts of the corpus of Middle English Local Documents (MELD). The article adopts the middle ground in the study of Middle English spelling variation, between the concept of relational linguistic space and the real linguistic continuum of medieval England. Stylo can distinguish between Middle English dialects by using the less frequent character 3-grams.
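A character 3-gram distribution of the kind this approach relies on can be computed as in the sketch below. It is written in Python for illustration and is not the Stylo implementation (Stylo is an R package); the function name `char_ngram_profile` is hypothetical, and counting overlapping n-grams over the raw string, spaces included, is an assumption.

```python
from collections import Counter


def char_ngram_profile(text, n=3):
    """Return the relative frequency of each overlapping character n-gram."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    # An input shorter than n yields no n-grams and an empty profile.
    return {gram: count / total for gram, count in counts.items()}


# Tiny example: four overlapping 3-grams, one of which occurs twice.
profile = char_ngram_profile("abcabc")
```

Profiles built this way for each document can then be compared with a distance measure or clustered, restricting the feature set to a chosen frequency band of 3-grams.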


Author(s):  
Veronika Burmeister ◽  
R. Swaminathan

Porphyria cutanea tarda (PCT) is a disorder of porphyrin metabolism which occurs most often during middle age. The disease is characterized by excessive production of uroporphyrin, which causes photosensitivity and skin eruptions on the hands and arms due to minor trauma and exposure to sunlight. Although the pathology of the blister is well known, being subepidermal with epidermodermal separation, it is not always clear whether the basal lamina is attached to the epidermis or the dermis. The purpose of our investigation was to study the attachment of the basement membrane in the blister by comparing scanning with transmission electron microscopy.


2020 ◽  
Vol 29 (3) ◽  
pp. 419-428
Author(s):  
Jasleen Singh ◽  
Karen A. Doherty

Purpose: The aim of the study was to assess how the use of a mild-gain hearing aid can affect hearing handicap, motivation, and attitudes toward hearing aids for middle-age, normal-hearing adults who do and do not self-report trouble hearing in background noise. Method: A total of 20 participants (45–60 years of age) with clinically normal-hearing thresholds (< 25 dB HL) were enrolled in this study. Ten self-reported difficulty hearing in background noise, and 10 did not self-report difficulty hearing in background noise. All participants were fit with mild-gain hearing aids, bilaterally, and were asked to wear them for 2 weeks. Hearing handicap, attitudes toward hearing aids and hearing loss, and motivation to address hearing problems were evaluated before and after participants wore the hearing aids. Participants were also asked if they would consider purchasing a hearing aid before and after 2 weeks of hearing aid use. Results: After wearing the hearing aids for 2 weeks, hearing handicap scores decreased for the participants who self-reported difficulty hearing in background noise. No changes in hearing handicap scores were observed for the participants who did not self-report trouble hearing in background noise. The participants who self-reported difficulty hearing in background noise also reported greater personal distress from their hearing problems, were more motivated to address their hearing problems, and had higher levels of hearing handicap compared to the participants who did not self-report trouble hearing in background noise. Only 20% (2/10) of the participants who self-reported trouble hearing in background noise reported that they would consider purchasing a hearing aid after 2 weeks of hearing aid use. Conclusions: The use of mild-gain hearing aids has the potential to reduce hearing handicap for normal-hearing, middle-age adults who self-report difficulty hearing in background noise. However, this may not be the most appropriate treatment option for their current hearing problems given that only 20% of these participants would consider purchasing a hearing aid after wearing hearing aids for 2 weeks.

