Modelling crosslinguistic n-gram correspondence in typologically different languages

Languages in Contrast ◽

10.1075/lic.19018.mil ◽

2021 ◽

Author(s):

Jiří Milička ◽

Václav Cvrček ◽

Lucie Lukešová

Keyword(s):

Popular Method ◽

Language Patterns ◽

General Guide ◽

N Gram

Abstract: N-gram analysis (popularized e.g. by Biber et al., 1999) has become a popular method for the identification of recurrent language patterns. Although the extraction of n-grams from a corpus may seem straightforward, it proves to be very challenging when applied cross-linguistically (cf. e.g. Ebeling and Ebeling, 2013; Granger and Lefer, 2013; Čermáková and Chlumská, 2017). The major issue is that the quantities of n-grams of a certain length in typologically different languages do not correspond. Consequently, n-grams of a given length may function differently across languages, rendering a direct comparison inadequate. Our paper introduces a function capable of modelling the relation between the quantities of n-grams in typologically distant languages, using the example of Czech and English (and some other language pairs). Based on our model, we can suggest what n-gram lengths should be contrasted to better reflect the size of n-gram inventories in each language. The correspondence may not be intuitive (e.g. a Czech 2-gram may best correspond to an English 2.5-gram), but it still provides researchers with a general guide as to what might be useful to include in their analysis (e.g. in this case 2-grams in Czech and 2- and 3-grams in English).
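
The quantity this abstract compares across languages, the size of the n-gram inventory at a given length, can be illustrated with a minimal Python sketch. This is not the authors' model: the function names (`ngrams`, `inventory_size`) and the toy token list are assumptions made for illustration, and whitespace tokenization stands in for whatever corpus processing the paper actually uses.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-token tuples from a token list."""
    return list(zip(*(tokens[i:] for i in range(n))))


def inventory_size(tokens, n):
    """Number of distinct n-gram types of length n (hypothetical helper)."""
    return len(set(ngrams(tokens, n)))


# Toy data only; a real comparison would use large parallel corpora.
tokens = "the cat sat on the mat and the cat slept".split()

bigram_counts = Counter(ngrams(tokens, 2))

for n in (1, 2, 3):
    print(n, inventory_size(tokens, n))
```

Running the same count over comparable corpora in two languages gives, for each n, the pair of inventory sizes whose mismatch the abstract describes; how those sizes are then related to one another is the subject of the paper and is not reproduced here.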

“Because... I was told... so much”: Linguistic Indicators of Mental Health Status on Twitter

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0063 ◽

2019 ◽

Vol 2019 (4) ◽

pp. 152-171 ◽

Cited By ~ 1

Author(s):

Janith Weerasinghe ◽

Kediel Morales ◽

Rachel Greenstadt

Keyword(s):

Mental Health ◽

Machine Learning ◽

Mental Illnesses ◽

Control Group ◽

Privacy Concerns ◽

Part Of Speech ◽

Machine Learning Model ◽

Language Patterns ◽

N Gram ◽

Applications Of Machine Learning

Abstract: Recent studies have shown that machine learning can identify individuals with mental illnesses by analyzing their social media posts. Topics and words related to mental health are some of the top predictors. These findings have implications for early detection of mental illnesses. However, they also raise numerous privacy concerns. To fully evaluate the implications for privacy, we analyze the performance of different machine learning models in the absence of tweets that talk about mental illnesses. Our results show that machine learning can be used to make predictions even if the users do not actively talk about their mental illness. To fully understand the implications of these findings, we analyze the features that make these predictions possible. We analyze bag-of-words, word clusters, part of speech n-gram features, and topic models to understand the machine learning model and to discover language patterns that differentiate individuals with mental illnesses from a control group. This analysis confirmed some of the known language patterns and uncovered several new patterns. We then discuss the possible applications of machine learning to identify mental illnesses, the feasibility of such applications, the associated privacy implications, and the feasibility of potential mitigations.

Preservation of Plant Cell Surfaces For Scanning Electron Microscopy

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100072800 ◽

1973 ◽

Vol 31 ◽

pp. 472-473

Author(s):

Linda M. Sicko ◽

Thomas E. Jensen

Keyword(s):

Electron Microscopy ◽

Scanning Electron Microscopy ◽

Critical Point ◽

Filter Paper ◽

Freeze Drying ◽

Cell Surfaces ◽

Air Drying ◽

Popular Method ◽

Critical Point Drying ◽

Scanning Electron

The use of critical point drying is rapidly becoming a popular method of preparing biological samples for scanning electron microscopy. The procedure is rapid, and produces consistent results with a variety of samples. The preservation of surface details is much greater than that of air drying, and the procedure is less complicated than that of freeze drying. This paper will present results comparing conventional air-drying of plant specimens to critical point drying, both of fixed and unfixed material. The preservation of delicate structures which are easily damaged in processing and the use of filter paper as a vehicle for drying will be discussed.

N-gram based Language Model for the QWERTY Keyboard Input Errors in a Touch Screen Environment

Korean Institute of Smart Media ◽

10.30693/smj.2018.7.2.54 ◽

2018 ◽

Vol 7 (2) ◽

pp. 54-59

Author(s):

Yoon Gee Ong ◽

Seung Shik Kang

Keyword(s):

Language Model ◽

Touch Screen ◽

Keyboard Input ◽

N Gram

Dealing with imperfect elicitation results

10.31234/osf.io/unwcz ◽

2019 ◽

Author(s):

Rens van de Schoot ◽

Elian Griffioen ◽

Sonja Désirée Winter

Keyword(s):

Statistical Parameter ◽

Research Practices ◽

Popular Method ◽

Questionable Research Practices ◽

Present Information

The trial-and-roulette method is a popular method to extract experts’ beliefs about a statistical parameter. However, most studies examining the validity of this method only use ‘perfect’ elicitation results. In practice, it is sometimes hard to obtain such neat elicitation results. In our project about predicting fraud and questionable research practices among PhD candidates, we ran into issues with imperfect elicitation results. The goal of the current chapter is to provide an overview of the solutions we used for dealing with these imperfect results, so that others can benefit from our experience. We present information about the nature of our project, the reasons for the imperfect results, and how we resolved them, supported by annotated R syntax.

Control charts. General guide and introduction

10.3403/00344481u ◽

2015 ◽

Keyword(s):

Control Charts ◽

General Guide

Educar la mirada: Un proyecto de creación audiovisual para sujetos no productivos

AVANCA | CINEMA ◽

10.37390/ac.v0i0.11 ◽

2020 ◽

Author(s):

Mariano García Plaza ◽

Marisa Víctor Crespo ◽

Jesús Ramé López

Keyword(s):

Public School ◽

Technological Development ◽

Careful Attention ◽

Educational Community ◽

Audiovisual Communication ◽

Creative People ◽

Daily Lives ◽

The Public ◽

Historical Landscape ◽

Language Patterns

Multiscreen society bombards us with images about which we cannot think. Added to this is a technological development that has turned us into issuers and receivers of images in our daily lives. Hence the need to explore more deeply the possibilities of emancipation that the current socio-historical landscape can offer. “Educar la mirada” is a group of professionals in education and audiovisual communication who aim, through film art and new audiovisual creation devices, to encourage audiovisual literacy and creation for life. We start from work with collectives whose artistic motive has no lucrative interest, such as the public school; hence our interest in non-productive subjects. This project arises from the work carried out by the Trabenco Educational Community (Public School) concerning the environment that exists between childhood and the audiovisual media. The theories of audiovisual literacy and the creative practices that carry the clearest meaning for our approach are: F.P.R Bergala, the work of CineSinAutor, the proposals of Medvedkin, the language patterns of Alexander, and creative crystallizations by authors such as Trier, Rossellini, Rodari, Vigotsky or Svankmajer. This project aims at careful attention to the audiovisual, with the intention of giving it a use beyond stagnant paradigms, where the possibilities we seek are those that serve the needs and purposes set by the collectives themselves.
