Mesure de la productivité morphologique des créoles : au-delà des méthodes quantitatives [Measuring the morphological productivity of creoles: beyond quantitative methods]

Author(s):  
Anne-Marie Brousseau

Abstract Most recent measures of morphological productivity are reliable only if they are based on a large corpus of the language. This article presents a detailed demonstration of a method for establishing an inventory of productive affixes in a language for which a large corpus is not available. This method evaluates the productivity of an affix first and foremost on the basis of its threshold of profitability (the number of different words derived via the affix) in correlation with other diagnostics to bolster reliability. These other diagnostics are the semantic and phonological transparency of derived words and the decomposability of such words. The application of the method is illustrated step-by-step with data from St. Lucian.
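To make the core quantitative diagnostic concrete, here is a minimal Python sketch (not the article's procedure) that counts, for each affix, the number of distinct word types derived with it; the suffix inventory and word list are hypothetical, and the transparency and decomposability diagnostics would have to be assessed separately.

```python
from collections import defaultdict

# Minimal sketch of the quantitative diagnostic only: count how many distinct
# word types each affix derives in a word list. The suffix inventory and word
# list are hypothetical; semantic/phonological transparency and decomposability
# would have to be assessed separately, as the article describes.

def derived_types_per_affix(words, suffixes):
    """Map each suffix to the set of distinct word types formed with it."""
    types = defaultdict(set)
    for suffix in suffixes:
        s = suffix.lstrip("-")
        for w in set(words):              # count types, not tokens
            if w.endswith(s) and len(w) > len(s):
                types[suffix].add(w)
    return types

word_list = ["kindness", "darkness", "kindness", "readable", "sanity"]
for suffix, forms in derived_types_per_affix(word_list, ["-ness", "-able", "-ity"]).items():
    print(f"{suffix}: {len(forms)} derived types -> {sorted(forms)}")
```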

2004, Vol. 143-144, pp. 109-119
Author(s):  
Giao Quynh Tran

Abstract Interlanguage pragmatics research has spanned a number of different areas in second language acquisition and pragmatics. In the large corpus of interlanguage pragmatics studies, basic terms such as “interlanguage pragmatics”, “speech acts” and “pragmatic transfer” have been referred to more often than not. But rarely have we stopped to re-evaluate the applicability and appropriateness of these terms. This paper aims to properly interpret or redefine their meanings and to propose more appropriate terms where possible.


Author(s):  
Ryan Cotterell
Hinrich Schütze

Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word’s meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituent segments and the synthesis of the meaning of w from the meanings of those segments. Our model jointly learns to segment words into morphemes and compose distributional semantic vectors of those morphemes. We experiment with the model on English CELEX data and German DErivBase (Zeller et al., 2013) data. We show that jointly modeling semantics increases both segmentation accuracy and morpheme F1 by between 3% and 5%. Additionally, we investigate different models of vector composition, showing that recurrent neural networks yield an improvement over simple additive models. Finally, we study the degree to which the representations correspond to a linguist’s notion of morphological productivity.
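As a rough illustration of the composition step only (not the joint probabilistic model), the sketch below builds a word vector from hypothetical morpheme vectors in the two ways the abstract contrasts: simple addition and a minimal, untrained recurrence.

```python
import numpy as np

# Sketch of the composition step only, with hypothetical morpheme vectors;
# in the paper these vectors are distributional and learned jointly with
# the segmentation of the word.
rng = np.random.default_rng(0)
DIM = 8
morpheme_vecs = {m: rng.normal(size=DIM) for m in ["question", "able", "ly"]}

def compose_additive(morphemes):
    """Additive composition: the word vector is the sum of its morpheme vectors."""
    return np.sum([morpheme_vecs[m] for m in morphemes], axis=0)

def compose_recurrent(morphemes):
    """A minimal, untrained Elman-style recurrence over the morpheme sequence."""
    W_h = rng.normal(scale=0.1, size=(DIM, DIM))
    W_x = rng.normal(scale=0.1, size=(DIM, DIM))
    h = np.zeros(DIM)
    for m in morphemes:
        h = np.tanh(W_h @ h + W_x @ morpheme_vecs[m])
    return h

segmentation = ["question", "able", "ly"]   # analysis of "questionably"
print("additive composition: ", compose_additive(segmentation))
print("recurrent composition:", compose_recurrent(segmentation))
```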


2021, Vol. 12 (1)
Author(s):  
Weiwei Gu
Aditya Tandon
Yong-Yeol Ahn
Filippo Radicchi

Abstract Network embedding is a general-purpose machine learning technique that encodes network structure in vector spaces with tunable dimension. Choosing an appropriate embedding dimension – small enough to be efficient and large enough to be effective – is challenging but necessary to generate embeddings applicable to a multitude of tasks. Existing strategies for the selection of the embedding dimension rely on performance maximization in downstream tasks. Here, we propose a principled method for selecting the dimension such that all structural information of a network is parsimoniously encoded. The method is validated on various embedding algorithms and a large corpus of real-world networks. The embedding dimensions selected by our method for real-world networks suggest that efficient encoding in low-dimensional spaces is usually possible.
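The underlying idea, choosing the smallest dimension that still captures a network's structure, can be sketched as follows. This is an illustrative stand-in rather than the authors' method: it uses a plain spectral decomposition of a toy graph and keeps the smallest dimension whose link-reconstruction AUC stays within 95% of the best observed; the graph, embedding, and threshold are all assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative stand-in for the general idea (not the paper's algorithm):
# embed a toy graph with a plain spectral decomposition, score how well a
# rank-d reconstruction recovers the adjacency structure, and keep the
# smallest d that is close to the best score observed.

G = nx.karate_club_graph()            # toy network
A = nx.to_numpy_array(G)
n = A.shape[0]

eigvals, eigvecs = np.linalg.eigh(A)  # A is symmetric
order = np.argsort(-np.abs(eigvals))  # strongest eigenpairs first

def reconstruction_auc(d):
    """AUC of rank-d adjacency reconstruction scores for predicting links."""
    idx = order[:d]
    scores = eigvecs[:, idx] @ np.diag(eigvals[idx]) @ eigvecs[:, idx].T
    iu = np.triu_indices(n, k=1)      # score every unordered node pair once
    return roc_auc_score(A[iu], scores[iu])

dims = [1, 2, 4, 8, 16, 32]
aucs = {d: reconstruction_auc(d) for d in dims}
best = max(aucs.values())
chosen = min(d for d in dims if aucs[d] >= 0.95 * best)  # smallest "good enough" d
print(aucs)
print("chosen embedding dimension:", chosen)
```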


2020, Vol. 8, pp. 199-214
Author(s):  
Xi (Leslie) Chen
Sarah Ita Levitan
Michelle Levine
Marko Mandic
Julia Hirschberg

Humans rarely perform better than chance at lie detection. To better understand human perception of deception, we created a game framework, LieCatcher, to collect ratings of perceived deception using a large corpus of deceptive and truthful interviews. We analyzed the acoustic-prosodic and linguistic characteristics of language trusted and mistrusted by raters and compared these to characteristics of actual truthful and deceptive language to understand how perception aligns with reality. With this data we built classifiers to automatically distinguish trusted from mistrusted speech, achieving an F1 of 66.1%. We next evaluated whether the strategies raters said they used to discriminate between truthful and deceptive responses were in fact useful. Our results show that, although several prosodic and lexical features were consistently perceived as trustworthy, they were not reliable cues. Also, the strategies that judges reported using in deception detection were not helpful for the task. Our work sheds light on the nature of trusted language and provides insight into the challenging problem of human deception detection.
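For a generic picture of the classification setup (not the LieCatcher corpus, features, or model), the sketch below trains a logistic-regression classifier on purely synthetic feature vectors and reports F1, the metric quoted in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Generic sketch with synthetic data (not the LieCatcher corpus, features,
# or model): predict trusted vs. mistrusted responses from a feature vector
# and report F1.

rng = np.random.default_rng(42)
n_samples, n_features = 500, 12   # e.g. pitch, speaking rate, hedge counts...
X = rng.normal(size=(n_samples, n_features))
# Synthetic labels weakly tied to two of the features, plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.5, size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("F1 on held-out data:", round(f1_score(y_test, clf.predict(X_test)), 3))
```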


2013, Vol. 3 (1), pp. 77-99
Author(s):  
Aletta G. Dorst
W. Gudrun Reijnierse
Gemma Venhuizen

The manual annotation of large corpora is time-consuming and brings about issues of consistency. This paper aims to demonstrate how general rules for determining basic meanings can be formulated in large-scale projects involving multiple analysts applying MIP(VU) to authentic data. Three sets of problematic lexical units — chemical processes, colours, and sharp objects — are discussed in relation to the question of how the basic meaning of a lexical unit can be determined when human and non-human senses compete as candidates for the basic meaning; these analyses can therefore be considered a detailed case study of problems encountered during step 3.b. of MIP(VU). The analyses show how these problematic cases were tackled in a large corpus clean-up project in order to streamline the annotations and ensure a greater consistency of the corpus. In addition, this paper will point out how the formulation of general identification rules and guidelines could provide a first step towards the automatic detection of linguistic metaphors in natural discourse.


2004, Vol. 30 (1), pp. 75-93
Author(s):  
Haodi Feng
Kang Chen
Xiaotie Deng
Weimin Zheng

We are interested in the problem of word extraction from Chinese text collections. We define a word to be a meaningful string composed of several Chinese characters. For example, the strings meaning ‘percent’ and ‘more and more’ are not recognized as traditional Chinese words from the viewpoint of some people; in our work, however, they are words because they are very widely used and have specific meanings. We start with the viewpoint that a word is a distinguished linguistic entity that can be used in many different language environments. We consider the characters that are directly before a string (predecessors) and the characters that are directly after a string (successors) as important factors for determining the independence of the string. We call such characters accessors of the string, count the number of distinct predecessors and successors of a string in a large corpus (TREC 5 and TREC 6 documents), and use these counts as a measure of the context independence of the string from the rest of the sentences in the document. Our experiments confirm our hypothesis and show that this simple rule gives quite good results for Chinese word extraction and is comparable to, and for long words outperforms, other iterative methods.
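The accessor counting described above can be sketched directly; the toy corpus below uses English characters in place of the TREC Chinese collections, and the candidate strings are arbitrary. The counting logic is the same idea: candidates with many distinct accessors on both sides are more context-independent and therefore better word candidates.

```python
# Sketch of the accessor counting the abstract describes: for a candidate
# string, collect the distinct characters immediately before it (predecessors)
# and immediately after it (successors) across the corpus. Corpus and
# candidates here are toy English examples, not the TREC collections.

def accessor_counts(corpus, candidate):
    predecessors, successors = set(), set()
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            if start > 0:
                predecessors.add(sentence[start - 1])
            end = start + len(candidate)
            if end < len(sentence):
                successors.add(sentence[end])
            start = sentence.find(candidate, start + 1)
    return len(predecessors), len(successors)

corpus = ["the cat sat on the mat", "a cat and a dog", "that cat ran"]
for cand in ["cat", "at "]:
    pred, succ = accessor_counts(corpus, cand)
    print(f"{cand!r}: {pred} distinct predecessors, {succ} distinct successors")
```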


Lampas, 2021, Vol. 54 (1), pp. 119-136
Author(s):  
Robert Pitt

Abstract Most well-known inscriptions are monumental texts carved on stone. In this contribution, by contrast, we focus on small, often informal texts scratched or stamped on rocks, metal surfaces and pottery. To this type of so-called ‘little epigraphy’ belong, for instance, graffiti, ostraca, weights and measures, curse tablets, et cetera. Although the texts themselves are usually very short, together they constitute a large corpus.


2021
Author(s):  
Mathilde Hutin
Yaru Wu
Adèle Jatteau
Ioana Vasilescu
Lori Lamel
...  
