Tutorial Dialogue Modes in a Large Corpus of Online Tutoring Transcripts

Author(s):  
Donald M. Morrison ◽  
Benjamin Nye ◽  
Vasile Rus ◽  
Sarah Snyder ◽  
Jennifer Boller ◽  
...  
2004 ◽  
Vol 143-144 ◽  
pp. 109-119
Author(s):  
Giao Quynh Tran

Abstract Interlanguage pragmatics research has spanned a number of different areas in second language acquisition and pragmatics. In the large corpus of interlanguage pragmatics studies, basic terms such as “interlanguage pragmatics”, “speech acts” and “pragmatic transfer” have been referred to more often than not. But rarely have we stopped to re-evaluate the applicability and appropriateness of these terms. This paper aims to properly interpret or redefine their meanings and to propose more appropriate terms where possible.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Weiwei Gu ◽  
Aditya Tandon ◽  
Yong-Yeol Ahn ◽  
Filippo Radicchi

Abstract Network embedding is a general-purpose machine learning technique that encodes network structure in vector spaces with tunable dimension. Choosing an appropriate embedding dimension – small enough to be efficient and large enough to be effective – is challenging but necessary to generate embeddings applicable to a multitude of tasks. Existing strategies for the selection of the embedding dimension rely on performance maximization in downstream tasks. Here, we propose a principled method such that all structural information of a network is parsimoniously encoded. The method is validated on various embedding algorithms and a large corpus of real-world networks. The embedding dimension selected by our method in real-world networks suggests that efficient encoding in low-dimensional spaces is usually possible.


2020 ◽  
Vol 8 ◽  
pp. 199-214
Author(s):  
Xi (Leslie) Chen ◽  
Sarah Ita Levitan ◽  
Michelle Levine ◽  
Marko Mandic ◽  
Julia Hirschberg

Humans rarely perform better than chance at lie detection. To better understand human perception of deception, we created a game framework, LieCatcher, to collect ratings of perceived deception using a large corpus of deceptive and truthful interviews. We analyzed the acoustic-prosodic and linguistic characteristics of language trusted and mistrusted by raters and compared these to characteristics of actual truthful and deceptive language to understand how perception aligns with reality. With this data we built classifiers to automatically distinguish trusted from mistrusted speech, achieving an F1 of 66.1%. We next evaluated whether the strategies raters said they used to discriminate between truthful and deceptive responses were in fact useful. Our results show that, although several prosodic and lexical features were consistently perceived as trustworthy, they were not reliable cues. Also, the strategies that judges reported using in deception detection were not helpful for the task. Our work sheds light on the nature of trusted language and provides insight into the challenging problem of human deception detection.


2013 ◽  
Vol 3 (1) ◽  
pp. 77-99 ◽  
Author(s):  
Aletta G. Dorst ◽  
W. Gudrun Reijnierse ◽  
Gemma Venhuizen

The manual annotation of large corpora is time-consuming and brings about issues of consistency. This paper aims to demonstrate how general rules for determining basic meanings can be formulated in large-scale projects involving multiple analysts applying MIP(VU) to authentic data. Three sets of problematic lexical units — chemical processes, colours, and sharp objects — are discussed in relation to the question of how the basic meaning of a lexical unit can be determined when human and non-human senses compete as candidates for the basic meaning; these analyses can therefore be considered a detailed case study of problems encountered during step 3.b. of MIP(VU). The analyses show how these problematic cases were tackled in a large corpus clean-up project in order to streamline the annotations and ensure a greater consistency of the corpus. In addition, this paper will point out how the formulation of general identification rules and guidelines could provide a first step towards the automatic detection of linguistic metaphors in natural discourse.


2011 ◽  
Vol 49 (5) ◽  
pp. 260-260
Author(s):  
Kenneth W. Ford

2004 ◽  
Vol 30 (1) ◽  
pp. 75-93 ◽  
Author(s):  
Haodi Feng ◽  
Kang Chen ◽  
Xiaotie Deng ◽  
Weimin Zheng

We are interested in the problem of word extraction from Chinese text collections. We define a word to be a meaningful string composed of several Chinese characters. For example, ‘percent’ and ‘more and more’ are not recognized as traditional Chinese words by some people. However, in our work, they are words because they are very widely used and have specific meanings. We start with the viewpoint that a word is a distinguished linguistic entity that can be used in many different language environments. We consider the characters that are directly before a string (predecessors) and the characters that are directly after a string (successors) as important factors for determining the independence of the string. We call such characters accessors of the string, consider the number of distinct predecessors and successors of a string in a large corpus (TREC 5 and TREC 6 documents), and use them as the measurement of the context independence of a string from the rest of the sentences in the document. Our experiments confirm our hypothesis and show that this simple rule gives quite good results for Chinese word extraction and is comparable to, and for long words outperforms, other iterative methods.
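
The predecessor and successor counts described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the boundary markers `<S>` and `<E>`, the choice to count a sentence boundary as a single accessor, and the use of the minimum of the two counts as the combined score are all assumptions made for illustration.

```python
def accessor_counts(corpus, candidate):
    """Return (distinct predecessors, distinct successors) of `candidate`.

    `corpus` is an iterable of sentences (strings). Sentence boundaries are
    represented by the hypothetical markers "<S>" and "<E>" (an assumption).
    """
    predecessors, successors = set(), set()
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            end = start + len(candidate)
            # Character directly before the occurrence, or the start marker.
            predecessors.add(sentence[start - 1] if start > 0 else "<S>")
            # Character directly after the occurrence, or the end marker.
            successors.add(sentence[end] if end < len(sentence) else "<E>")
            # Continue the search one position later to catch overlaps.
            start = sentence.find(candidate, start + 1)
    return len(predecessors), len(successors)


def context_independence(corpus, candidate):
    """Combine both counts by taking the smaller one (an assumed choice)."""
    return min(accessor_counts(corpus, candidate))


toy_corpus = ["abcd", "xbcy", "bcz"]
bc_counts = accessor_counts(toy_corpus, "bc")
b_score = context_independence(toy_corpus, "b")
```

In the toy corpus, "bc" appears after three different contexts and before three different characters, so it looks independent of its surroundings. "b" is always followed by "c", so its score is low and it would not be extracted as a word on its own.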


2017 ◽  
Vol 29 (4) ◽  
pp. 4-19
Author(s):  
Steven R. Sligar ◽  
Christopher D. Pelletier ◽  
Heidi Stone Bonner ◽  
Elizabeth Coghill ◽  
Daniel Guberman ◽  
...  

Lampas ◽  
2021 ◽  
Vol 54 (1) ◽  
pp. 119-136
Author(s):  
Robert Pitt

Abstract Most well-known inscriptions are monumental texts carved on stone. In this contribution, on the other hand, we focus on small, often informal texts scratched or stamped on rocks, metal surfaces and pottery. This so-called ‘little epigraphy’ includes, for instance, graffiti, ostraca, weights and measures, and curse tablets. Although the texts themselves are usually very short, together they constitute a large corpus.

