Mesure de la productivité morphologique des créoles : au-delà des méthodes quantitatives [Measuring the morphological productivity of creoles: beyond quantitative methods]

Author(s):  
Anne-Marie Brousseau

Abstract Most recent measures of morphological productivity are reliable only if they are based on a large corpus of the language. This article presents a detailed demonstration of a method for establishing an inventory of productive affixes in a language for which a large corpus is not available. This method evaluates the productivity of an affix first and foremost on the basis of its threshold of profitability (the number of different words derived via the affix) in correlation with other diagnostics to bolster reliability. These other diagnostics are the semantic and phonological transparency of derived words and the decomposability of such words. The application of the method is illustrated step-by-step with data from St. Lucian.
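To make the core quantitative diagnostic concrete, here is a minimal Python sketch (not the article's procedure) that counts, for each affix, the number of distinct word types derived with it; the suffix inventory and word list are hypothetical, and the transparency and decomposability diagnostics would have to be assessed separately.

```python
from collections import defaultdict

# Minimal sketch of the quantitative diagnostic only: count how many distinct
# word types each affix derives in a word list. The suffix inventory and word
# list are hypothetical; semantic/phonological transparency and decomposability
# would have to be assessed separately, as the article describes.

def derived_types_per_affix(words, suffixes):
    """Map each suffix to the set of distinct word types formed with it."""
    types = defaultdict(set)
    for suffix in suffixes:
        s = suffix.lstrip("-")
        for w in set(words):              # count types, not tokens
            if w.endswith(s) and len(w) > len(s):
                types[suffix].add(w)
    return types

word_list = ["kindness", "darkness", "kindness", "readable", "sanity"]
for suffix, forms in derived_types_per_affix(word_list, ["-ness", "-able", "-ity"]).items():
    print(f"{suffix}: {len(forms)} derived types -> {sorted(forms)}")
```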

2004, Vol. 143-144, pp. 109-119
Author(s):  
Giao Quynh Tran

Abstract Interlanguage pragmatics research has spanned a number of different areas in second language acquisition and pragmatics. In the large corpus of interlanguage pragmatics studies, basic terms such as “interlanguage pragmatics”, “speech acts” and “pragmatic transfer” have been referred to more often than not. But rarely have we stopped to re-evaluate the applicability and appropriateness of these terms. This paper aims to properly interpret or redefine their meanings and to propose more appropriate terms where possible.


Author(s):  
Ryan Cotterell
Hinrich Schütze

Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word’s meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituent segments and the synthesis of the meaning of w from the meanings of those segments. Our model jointly learns to segment words into morphemes and compose distributional semantic vectors of those morphemes. We experiment with the model on English CELEX data and German DErivBase (Zeller et al., 2013) data. We show that jointly modeling semantics increases both segmentation accuracy and morpheme F1 by between 3% and 5%. Additionally, we investigate different models of vector composition, showing that recurrent neural networks yield an improvement over simple additive models. Finally, we study the degree to which the representations correspond to a linguist’s notion of morphological productivity.
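As a rough illustration of the composition step only (not the joint probabilistic model), the sketch below builds a word vector from hypothetical morpheme vectors in the two ways the abstract contrasts: simple addition and a minimal, untrained recurrence.

```python
import numpy as np

# Sketch of the composition step only, with hypothetical morpheme vectors;
# in the paper these vectors are distributional and learned jointly with
# the segmentation of the word.
rng = np.random.default_rng(0)
DIM = 8
morpheme_vecs = {m: rng.normal(size=DIM) for m in ["question", "able", "ly"]}

def compose_additive(morphemes):
    """Additive composition: the word vector is the sum of its morpheme vectors."""
    return np.sum([morpheme_vecs[m] for m in morphemes], axis=0)

def compose_recurrent(morphemes):
    """A minimal, untrained Elman-style recurrence over the morpheme sequence."""
    W_h = rng.normal(scale=0.1, size=(DIM, DIM))
    W_x = rng.normal(scale=0.1, size=(DIM, DIM))
    h = np.zeros(DIM)
    for m in morphemes:
        h = np.tanh(W_h @ h + W_x @ morpheme_vecs[m])
    return h

segmentation = ["question", "able", "ly"]   # analysis of "questionably"
print("additive composition: ", compose_additive(segmentation))
print("recurrent composition:", compose_recurrent(segmentation))
```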


2021, Vol. 12 (1)
Author(s):  
Weiwei Gu
Aditya Tandon
Yong-Yeol Ahn
Filippo Radicchi

Abstract Network embedding is a general-purpose machine learning technique that encodes network structure in vector spaces with tunable dimension. Choosing an appropriate embedding dimension – small enough to be efficient and large enough to be effective – is challenging but necessary to generate embeddings applicable to a multitude of tasks. Existing strategies for the selection of the embedding dimension rely on performance maximization in downstream tasks. Here, we propose a principled method for selecting the dimension such that all structural information of a network is parsimoniously encoded. The method is validated on various embedding algorithms and a large corpus of real-world networks. The embedding dimensions selected by our method for real-world networks suggest that efficient encoding in low-dimensional spaces is usually possible.
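The underlying idea, choosing the smallest dimension that still captures a network's structure, can be sketched as follows. This is an illustrative stand-in rather than the authors' method: it uses a plain spectral decomposition of a toy graph and keeps the smallest dimension whose link-reconstruction AUC stays within 95% of the best observed; the graph, embedding, and threshold are all assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative stand-in for the general idea (not the paper's algorithm):
# embed a toy graph with a plain spectral decomposition, score how well a
# rank-d reconstruction recovers the adjacency structure, and keep the
# smallest d that is close to the best score observed.

G = nx.karate_club_graph()            # toy network
A = nx.to_numpy_array(G)
n = A.shape[0]

eigvals, eigvecs = np.linalg.eigh(A)  # A is symmetric
order = np.argsort(-np.abs(eigvals))  # strongest eigenpairs first

def reconstruction_auc(d):
    """AUC of rank-d adjacency reconstruction scores for predicting links."""
    idx = order[:d]
    scores = eigvecs[:, idx] @ np.diag(eigvals[idx]) @ eigvecs[:, idx].T
    iu = np.triu_indices(n, k=1)      # score every unordered node pair once
    return roc_auc_score(A[iu], scores[iu])

dims = [1, 2, 4, 8, 16, 32]
aucs = {d: reconstruction_auc(d) for d in dims}
best = max(aucs.values())
chosen = min(d for d in dims if aucs[d] >= 0.95 * best)  # smallest "good enough" d
print(aucs)
print("chosen embedding dimension:", chosen)
```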


2020, Vol. 8, pp. 199-214
Author(s):  
Xi (Leslie) Chen
Sarah Ita Levitan
Michelle Levine
Marko Mandic
Julia Hirschberg

Humans rarely perform better than chance at lie detection. To better understand human perception of deception, we created a game framework, LieCatcher, to collect ratings of perceived deception using a large corpus of deceptive and truthful interviews. We analyzed the acoustic-prosodic and linguistic characteristics of language trusted and mistrusted by raters and compared these to characteristics of actual truthful and deceptive language to understand how perception aligns with reality. With this data we built classifiers to automatically distinguish trusted from mistrusted speech, achieving an F1 of 66.1%. We next evaluated whether the strategies raters said they used to discriminate between truthful and deceptive responses were in fact useful. Our results show that, although several prosodic and lexical features were consistently perceived as trustworthy, they were not reliable cues. Also, the strategies that judges reported using in deception detection were not helpful for the task. Our work sheds light on the nature of trusted language and provides insight into the challenging problem of human deception detection.
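For a generic picture of the classification setup (not the LieCatcher corpus, features, or model), the sketch below trains a logistic-regression classifier on purely synthetic feature vectors and reports F1, the metric quoted in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Generic sketch with synthetic data (not the LieCatcher corpus, features,
# or model): predict trusted vs. mistrusted responses from a feature vector
# and report F1.

rng = np.random.default_rng(42)
n_samples, n_features = 500, 12   # e.g. pitch, speaking rate, hedge counts...
X = rng.normal(size=(n_samples, n_features))
# Synthetic labels weakly tied to two of the features, plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.5, size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("F1 on held-out data:", round(f1_score(y_test, clf.predict(X_test)), 3))
```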


2013, Vol. 3 (1), pp. 77-99
Author(s):  
Aletta G. Dorst
W. Gudrun Reijnierse
Gemma Venhuizen

The manual annotation of large corpora is time-consuming and brings about issues of consistency. This paper aims to demonstrate how general rules for determining basic meanings can be formulated in large-scale projects involving multiple analysts applying MIP(VU) to authentic data. Three sets of problematic lexical units — chemical processes, colours, and sharp objects — are discussed in relation to the question of how the basic meaning of a lexical unit can be determined when human and non-human senses compete as candidates for the basic meaning; these analyses can therefore be considered a detailed case study of problems encountered during step 3.b. of MIP(VU). The analyses show how these problematic cases were tackled in a large corpus clean-up project in order to streamline the annotations and ensure a greater consistency of the corpus. In addition, this paper will point out how the formulation of general identification rules and guidelines could provide a first step towards the automatic detection of linguistic metaphors in natural discourse.


2004, Vol. 30 (1), pp. 75-93
Author(s):  
Haodi Feng
Kang Chen
Xiaotie Deng
Weimin Zheng

We are interested in the problem of word extraction from Chinese text collections. We define a word to be a meaningful string composed of several Chinese characters. For example, the strings meaning ‘percent’ and ‘more and more’ are not recognized as traditional Chinese words from the viewpoint of some people; in our work, however, they are words because they are very widely used and have specific meanings. We start with the viewpoint that a word is a distinguished linguistic entity that can be used in many different language environments. We consider the characters that are directly before a string (predecessors) and the characters that are directly after a string (successors) as important factors for determining the independence of the string. We call such characters accessors of the string, count the number of distinct predecessors and successors of a string in a large corpus (TREC 5 and TREC 6 documents), and use these counts as a measure of the context independence of the string from the rest of the sentences in the document. Our experiments confirm our hypothesis and show that this simple rule gives quite good results for Chinese word extraction and is comparable to, and for long words outperforms, other iterative methods.
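The accessor counting described above can be sketched directly; the toy corpus below uses English characters in place of the TREC Chinese collections, and the candidate strings are arbitrary. The counting logic is the same idea: candidates with many distinct accessors on both sides are more context-independent and therefore better word candidates.

```python
# Sketch of the accessor counting the abstract describes: for a candidate
# string, collect the distinct characters immediately before it (predecessors)
# and immediately after it (successors) across the corpus. Corpus and
# candidates here are toy English examples, not the TREC collections.

def accessor_counts(corpus, candidate):
    predecessors, successors = set(), set()
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            if start > 0:
                predecessors.add(sentence[start - 1])
            end = start + len(candidate)
            if end < len(sentence):
                successors.add(sentence[end])
            start = sentence.find(candidate, start + 1)
    return len(predecessors), len(successors)

corpus = ["the cat sat on the mat", "a cat and a dog", "that cat ran"]
for cand in ["cat", "at "]:
    pred, succ = accessor_counts(corpus, cand)
    print(f"{cand!r}: {pred} distinct predecessors, {succ} distinct successors")
```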


Lampas, 2021, Vol. 54 (1), pp. 119-136
Author(s):  
Robert Pitt

Abstract Most well-known inscriptions are monumental texts carved on stone. In this contribution, by contrast, we focus on small, often informal texts scratched or stamped on rocks, metal surfaces and pottery. To this type of so-called ‘little epigraphy’ belong, for instance, graffiti, ostraca, weights and measures, curse tablets, et cetera. Although the texts themselves are usually very short, together they constitute a large corpus.


2021
Author(s):  
Mathilde Hutin
Yaru Wu
Adèle Jatteau
Ioana Vasilescu
Lori Lamel
...  
