Modelling crosslinguistic n-gram correspondence in typologically different languages

2021 ◽  
Author(s):  
Jiří Milička ◽  
Václav Cvrček ◽  
Lucie Lukešová

Abstract N-gram analysis (popularized e.g. by Biber et al., 1999) has become a popular method for the identification of recurrent language patterns. Although the extraction of n-grams from a corpus may seem straightforward, it proves to be very challenging when applied cross-linguistically (cf. e.g. Ebeling and Ebeling, 2013; Granger and Lefer, 2013; Čermáková and Chlumská, 2017). The major issue is that the quantities of n-grams of a certain length in typologically different languages do not correspond. Consequently, n-grams of a given length may function differently across languages, rendering a direct comparison inadequate. Our paper introduces a function capable of modelling the relation between the quantities of n-grams in typologically distant languages, using the example of Czech and English (and some other language pairs). Based on our model, we can suggest what n-gram lengths should be contrasted to better reflect the size of n-gram inventories in each language. The correspondence may not be intuitive (e.g. a Czech 2-gram may best correspond to an English 2.5-gram), but it still provides researchers with a general guide as to what might be useful to include in their analysis (e.g. in this case 2-grams in Czech and 2- and 3-grams in English).
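To make the notion of an n-gram inventory concrete, the following minimal Python sketch counts the distinct n-grams of each length in a tokenized text. It is an illustration only, not the modelling function the paper introduces; the helper names `ngram_inventory` and `inventory_sizes` are hypothetical, and the two sentences are made-up toy inputs rather than corpus data.

```python
from collections import Counter


def ngram_inventory(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def inventory_sizes(tokens, max_n=3):
    """Map each n in 1..max_n to the number of distinct n-gram types."""
    return {n: len(ngram_inventory(tokens, n)) for n in range(1, max_n + 1)}


# Toy, invented sentences (assumption: whitespace tokenization is good enough here).
english = "the cat sat on the mat and the cat slept".split()
czech = "kočka seděla na rohožce a kočka spala".split()

print("English:", inventory_sizes(english))
print("Czech:  ", inventory_sizes(czech))
```

On real corpora, comparing such per-length inventory sizes across two languages is the kind of quantity the abstract says does not line up at equal n.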

2019 ◽  
Vol 2019 (4) ◽  
pp. 152-171 ◽  
Author(s):  
Janith Weerasinghe ◽  
Kediel Morales ◽  
Rachel Greenstadt

Abstract Recent studies have shown that machine learning can identify individuals with mental illnesses by analyzing their social media posts. Topics and words related to mental health are some of the top predictors. These findings have implications for early detection of mental illnesses. However, they also raise numerous privacy concerns. To fully evaluate the implications for privacy, we analyze the performance of different machine learning models in the absence of tweets that talk about mental illnesses. Our results show that machine learning can be used to make predictions even if the users do not actively talk about their mental illness. To fully understand the implications of these findings, we analyze the features that make these predictions possible. We analyze bag-of-words, word clusters, part-of-speech n-gram features, and topic models to understand the machine learning model and to discover language patterns that differentiate individuals with mental illnesses from a control group. This analysis confirmed some of the known language patterns and uncovered several new patterns. We then discuss the possible applications of machine learning to identify mental illnesses, the feasibility of such applications, the associated privacy implications, and the feasibility of potential mitigations.


Author(s):  
Linda M. Sicko ◽  
Thomas E. Jensen

The use of critical point drying is rapidly becoming a popular method of preparing biological samples for scanning electron microscopy. The procedure is rapid, and produces consistent results with a variety of samples. The preservation of surface details is much greater than that of air drying, and the procedure is less complicated than that of freeze drying. This paper will present results comparing conventional air-drying of plant specimens to critical point drying, both of fixed and unfixed material. The preservation of delicate structures which are easily damaged in processing and the use of filter paper as a vehicle for drying will be discussed.


2019 ◽  
Author(s):  
Rens van de Schoot ◽  
Elian Griffioen ◽  
Sonja Désirée Winter

The trial-and-roulette method is a popular method to extract experts’ beliefs about a statistical parameter. However, most studies examining the validity of this method only use ‘perfect’ elicitation results. In practice, it is sometimes hard to obtain such neat elicitation results. In our project about predicting fraud and questionable research practices among PhD candidates, we ran into issues with imperfect elicitation results. The goal of the current chapter is to provide an overview of the solutions we used for dealing with these imperfect results, so that others can benefit from our experience. We present information about the nature of our project, the reasons for the imperfect results, and how we resolved these issues, supported by annotated R syntax.


2020 ◽  
Author(s):  
Mariano García Plaza ◽  
Marisa Víctor Crespo ◽  
Jesús Ramé López

Multiscreen society bombards us with images about which we cannot think; added to this is a technological development that has turned us into issuers and receivers of images in our daily lives. Thus arises a need to explore more deeply the possibilities of emancipation that the current socio-historical landscape can offer. “Educar la mirada” is a group of professionals in education and audiovisual communication who aim, through film art and new audiovisual creation devices, to encourage literacy and audiovisual creation for life. We start from work with collectives whose artistic motive has no lucrative interest, such as the public school; hence our interest in non-productive subjects. This project arises from the work carried out by the Trabenco Educational Community (Public School) in relation to the environment that exists between childhood and the audiovisual media. The theories of audiovisual literacy and the creative practices that bear most clearly on our approach are: F.P.R Bergala, the work of CineSinAutor, the proposals of Medvedkin, the pattern language of Alexander, and creative crystallizations by authors such as Trier, Rossellini, Rodari, Vigotsky or Svankmajer. This project aims at careful attention to the audiovisual with the intention of giving it a use beyond stagnant paradigms, where the possibilities we seek are those that make effective the needs and purposes that are given by the collectives themselves.


Author(s):  
Vitaly Kuznetsov ◽  
Hank Liao ◽  
Mehryar Mohri ◽  
Michael Riley ◽  
Brian Roark

2020 ◽  
Author(s):  
Grant P. Strimel ◽  
Ariya Rastrow ◽  
Gautam Tiwari ◽  
Adrien Piérard ◽  
Jon Webb

2019 ◽  
Vol 1193 ◽  
pp. 012032
Author(s):  
D Purwantoro ◽  
H Akbar ◽  
A Hidayati ◽  
Sfenrianto

2020 ◽  
Vol 12 (1) ◽  
pp. 1-24 ◽  
Author(s):  
Al Hafiz Akbar Maulana Siagian ◽  
Masayoshi Aritsugi
