Multilingual Facilitation
Latest Publications


TOTAL DOCUMENTS

25
(FIVE YEARS 25)

H-INDEX

0
(FIVE YEARS 0)

Published By University Of Helsinki

9789515150257

2021 ◽  
pp. 275-288
Author(s):  
Khalid Alnajjar

Large languages such as English and Finnish have many natural language processing (NLP) resources and models, but low-resourced and endangered languages do not: such resources are scarce despite the great advantages they would provide for the language communities. The most common types of resources available for low-resourced and endangered languages are translation dictionaries and universal dependencies. In this paper, we present a method for constructing word embeddings for endangered languages using existing word embeddings of different resource-rich languages and the translation dictionaries of resource-poor languages. Thereafter, the embeddings are fine-tuned using the sentences in the universal dependencies and aligned to match the semantic spaces of the large languages, resulting in cross-lingual embeddings. The endangered languages we work with here are Erzya, Moksha, Komi-Zyrian and Skolt Sami. Furthermore, we build a universal sentiment analysis model for all the languages that are part of this study, whether endangered or not, by utilizing cross-lingual word embeddings. The evaluation shows that our word embeddings for endangered languages are well aligned with the resource-rich languages and that they are suitable for training task-specific models, as demonstrated by our sentiment analysis models, which achieved high accuracies. All our cross-lingual word embeddings and sentiment analysis models will be released openly via an easy-to-use Python library.


2021 ◽  
pp. 208-227
Author(s):  
Rogier Blokland ◽  
Niko Partanen ◽  
Michael Rießler

In this paper we analyse an epic song recorded in 1966 by the Hungarian-Australian Erik Vászolyi in Kolva in the Komi ASSR and discuss its background and wider historical context. We discuss the ways in which such materials can contribute to data-driven and sociolinguistically oriented research, specifically in connection with contemporary documentary linguistics, and point to directions for further research.


2021 ◽  
pp. 248-262
Author(s):  
Jörg Tiedemann

This paper presents our ongoing efforts to develop a comprehensive data set and benchmark for machine translation beyond high-resource languages. The current release includes 500GB of compressed parallel data for almost 3,000 language pairs covering over 500 languages and language variants. We present the structure of the data set and demonstrate its use for systematic studies based on baseline experiments with multilingual neural machine translation between Finno-Ugric languages and other language groups. Our initial results show that effective multilingual translation models can be trained with skewed training data, but they also highlight the shortcomings in low-resource settings and the difficulty of obtaining sufficient information through straightforward transfer from related languages.


2021 ◽  
pp. 104-127
Author(s):  
Markus Juutinen ◽  
Jukka Mettovaara

We provide an overview of indefinite pronouns in Saami languages that have been borrowed or calqued from Finnic, Scandinavian or Russian. We define indefinite pronouns in the traditional way, i.e. as encompassing all pronouns not belonging to any other pronoun class. The treatment of Saami indefinite pronouns in earlier literature varies, but generally they have not received as much attention as other pronouns. From Finnic sources, Saami languages have borrowed e.g. the pronouns harva ‘few’, joku ‘some(one)’, kaikki ‘all’, moni ‘many’ and muu ‘other’ as well as the pronominal elements ikänänsä ‘-ever’, saati ‘let alone’ and vaikka ‘even (if)’. Loans from Scandinavian include e.g. mange ‘many’, noen ~ någon ‘some’ and same ~ samma ‘same’. Russian loans include the pronominal elements ни- ‘not (even)’ and хоть ‘even (if)’. Indefinite pronouns in Saami prove to be a rather open class, and elements with similar meanings have been borrowed time after time. The variation is especially abundant in pronouns of indifference and free choice. Most of the pronouns in our data have been noted as loans before, but there are some previously unnoticed cases. These in particular warrant further study.


2021 ◽  
pp. 197-207
Author(s):  
Trond Trosterud ◽  
Sjur Moshagen

The article discusses the correction of typos caused by erroneous use of the so-called soft sign in Skolt Sami, one of the most common orthographic symbols and the most common source of typographic errors. The discussion is based upon the suggestion mechanism of an existing open source Skolt Sami speller. The discussion shows that with an improved suggestion mechanism, the speller is able to restore a single soft sign error in over 97 % of the cases, and remove a hypercorrect soft sign as first correction in 90 % of the cases. Allowing the target form to be within top-5, the correction performance is well above 99 %. Improving the suggestion mechanism also had a positive impact on the speller's overall performance, raising the percentage of target forms within top-5 from 74.1 % to 84.7 %.


2021 ◽  
pp. 128-132
Author(s):  
Мария З. Левина

The systematization of the linguistic material and the creation of an electronic corpus not only contribute to the preservation of the Mordvin (Moksha and Erzyan) languages but also have special significance for Finno-Ugric linguistics in general. Electronic resources will create wider opportunities for conducting comparative-historical, contrastive and typological research.


2021 ◽  
pp. 147-152
Author(s):  
Paula Kokkonen

The article discusses the Komi-language poems found in the songbook Heimolasten laulukirja and the booklet Sukukansain lauluja, their Finnish translations and the musical settings composed for them. The study focuses on the poems of Mihail Lebedev and Ivan Kuratov and on V. I. Lytkin's translations of poems by J. H. Erkko.


2021 ◽  
pp. 289-298
Author(s):  
Janne Saarikivi

The question of how linguistic and archaeological data can be combined to create a comprehensive account of the prehistory of present ethnicities is debated around the globe. In particular, identifying new language groups in the material remnants of a particular area, or discerning in the material culture correlates for the language contact periods reflected in the loan word layers, are complex and often probably unsolvable questions. Regarding the early history of the Finns and related peoples, Valter Lang’s new monograph on the archaeology of Estonia and the “arrivals of the Finnic people” (Läänemeresoome tulemised, 2018) has been considered a paradigm-changing work in this respect. In my article I argue that despite undisputed progress in this oeuvre, many of the old questions regarding time, place and method are still in place.


2021 ◽  
pp. 43-53
Author(s):  
Галина Пунегова

In a written text, the voice itself and its acoustics cannot be conveyed directly. For this reason, the distinctive features of a character's manner of speaking are depicted in writing through the writer's words, and quite often the type of speech can also be inferred from the characters' own speech and from how it is evaluated. The article examines and presents the sound of a character's speech and its loudness, the tempo and rhythm of the speech, and everything else that affects the type of the character's speech. It concludes that the author's writing skill is of great importance in conveying spoken speech in writing accurately, fully and beautifully.


2021 ◽  
pp. 61-73
Author(s):  
Tommi Jantunen ◽  
Rebekah Rousi ◽  
Päivi Rainò ◽  
Markku Turunen ◽  
Mohammad Moeen Valipoor ◽  
...  

This article discusses the prerequisites for the machine translation of sign languages. The topic is complex, including questions relating to technology, interaction design, linguistics and culture. At the moment, despite the affordances provided by the technology, automated translation between signed and spoken languages – or between sign languages – is not possible. The very need for such translation and its associated technology can also be questioned. Yet, we believe that contributing to the improvement of sign language detection, processing and even sign language translation to spoken languages in the future is a matter that should not be abandoned. However, we argue that this work should focus on all necessary aspects of sign languages and sign language user communities. Thus, a more diverse and critical perspective on these issues is needed in order to avoid the generalisations and biases that are often manifested within dominant research paradigms, particularly in the fields of spoken language research and speech community.

