Utilizing Language Technology in the Documentation of Endangered Uralic Languages

Author(s):  
Ciprian Gerstenberger ◽  
Niko Partanen ◽  
Michael Rießler ◽  
Joshua Wilbur

The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which record new spoken language data, digitize available recordings and annotate these multimedia data in order to provide comprehensive language corpora as databases for future research on and for endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Specifically, we describe a script providing interactivity between different morphosyntactic analysis modules implemented as Finite State Transducers and ELAN, a Graphical User Interface tool for annotating and presenting multimodal corpora. Ultimately, the spoken corpora created in our projects will be useful for scientifically significant quantitative investigations on these languages in the future.
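As a rough illustration of the kind of script described above, the following Python sketch reads a word tier from an ELAN-style XML document, passes each token through an analyser, and writes the analyses to a new dependent tier. It is a minimal sketch under stated assumptions, not the authors' implementation: the tier names `word` and `morph`, the linguistic type names, the function names, and the toy lexicon with its placeholder tokens are all hypothetical, and the dictionary lookup merely stands in for a real Finite State Transducer call.

```python
import xml.etree.ElementTree as ET

# Minimal EAF-like document; real ELAN files carry more metadata (assumption: simplified).
EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="word" LINGUISTIC_TYPE_REF="wordT">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>foo</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>bar</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>baz</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Placeholder tokens and tags, not real language data (hypothetical).
TOY_LEXICON = {
    "foo": ["foo+N+Sg+Nom"],
    "bar": ["bar+V+Ind+Prs+Sg3", "bar+N+Sg+Nom"],  # ambiguous token
}


def analyse(token):
    """Hypothetical stand-in for an FST lookup; unknown tokens get a '+?' marker."""
    return TOY_LEXICON.get(token, [token + "+?"])


def add_analysis_tier(root, source_tier="word", target_tier="morph"):
    """Append a tier whose annotations refer back to the tokens of source_tier."""
    src = root.find(f"./TIER[@TIER_ID='{source_tier}']")
    if src is None:
        raise ValueError(f"no tier named {source_tier!r}")
    tier = ET.SubElement(root, "TIER", TIER_ID=target_tier,
                         PARENT_REF=source_tier, LINGUISTIC_TYPE_REF="morphT")
    for n, ann in enumerate(src.iter("ALIGNABLE_ANNOTATION"), start=1):
        token = (ann.findtext("ANNOTATION_VALUE") or "").strip()
        wrapper = ET.SubElement(tier, "ANNOTATION")
        ref = ET.SubElement(wrapper, "REF_ANNOTATION", ANNOTATION_ID=f"m{n}",
                            ANNOTATION_REF=ann.get("ANNOTATION_ID"))
        # Keep every candidate analysis so a later step can disambiguate.
        ET.SubElement(ref, "ANNOTATION_VALUE").text = " | ".join(analyse(token))
    return tier


root = ET.fromstring(EAF)
add_analysis_tier(root)
print(ET.tostring(root, encoding="unicode"))
```

Keeping all candidate analyses on the new tier, rather than picking one, leaves disambiguation to a later automatic or manual step inside the annotation tool.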

Author(s):  
Tanmai Khanna ◽  
Jonathan N. Washington ◽  
Francis M. Tyers ◽  
Sevilay Bayatlı ◽  
Daniel G. Swanson ◽  
...  

Abstract This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Fabiane Florencio de Souza ◽  
Alana Corsi ◽  
Regina Negri Pagani ◽  
Giles Balbinotti ◽  
João Luiz Kovaleski

Purpose The purpose of this article is to explore the new concept of TQM 4.0 as a way of adapting quality management (QM) in Industry 4.0 (I4.0), guiding industries to this new phase, which has generated adaptations in numerous areas, one of which is QM and human resources. Design/methodology/approach A systematic review of the literature was carried out. Methodi Ordinatio was applied to build the portfolio of articles with scientific relevance, which is the source of data collections and content analysis. To help out in the analysis, NVivo 12 and VOSviewer software programs were used. Findings The results demonstrate that when adapting the QM to the technologies of I4.0, the result is an ecosystem that supports the integration between technology, quality and people in the industrial scenario. Research limitations/implications This article presents a systematic review of the literature, but without delving into specific issues such as the different industrial sectors and the culture of countries in which industries may be inserted, for example, which characterizes a limitation of this research. Practical implications This study provides an ecosystem model that can guide future research, regarding the concept of TQM 4.0, in addition to pointing out some ways of combining technologies, quality and people in the industrial context. Originality/value This is one of the first articles to employ a systematic review of the literature using Methodi Ordinatio to build a bibliographic panorama on the intertwining of the themes total QM (TQM) and I4.0, focusing on the emerging concept of TQM 4.0.


2019 ◽  
Vol 24 (4) ◽  
pp. 278-296
Author(s):  
Diane Lawong ◽  
Gerald R. Ferris ◽  
Wayne Hochwarter ◽  
Liam Maher

Purpose Researchers have identified various recruiter and organization characteristics that individually influence staffing effectiveness. In extending contemporary research, the purpose of this paper is to address a straightforward question unexamined in previous research, namely, does recruiter political skill interact with organization reputation to influence applicant attraction in the recruitment process? Specifically, the authors hypothesized that for recruiters high in political skill, as organization reputation increases, applicant attraction to the organization increases. Alternatively, for recruiters low in political skill, as organization reputation increases, there is no change in applicant attraction to the organization. Design/methodology/approach Three studies were conducted to create the experimental manipulation materials, pilot test them and then conduct tests of the hypotheses. Study 1 created and tested the content validity of the recruiter political skill script. Study 2 reported on the effectiveness of the recruiter political skill experimental manipulation, whereby a male actor was hired to play the part of a recruiter high in political skill and one low in political skill. Finally, Study 3 was the primary hypothesis testing investigation. Findings Results from a 2×2 between-subjects experimental study (N=576) supported the hypotheses. Specifically, high recruiter political skill and favorable organization reputation each demonstrated significant main effects on applicant attraction to the organization. Additionally, the authors hypothesized, and confirmed, a significant organization reputation × recruiter political skill interaction. Specifically, findings demonstrated that increases in organization reputation resulted in increased applicant attraction to the organization for those exposed to a recruiter high in political skill. However, the effect was absent for those exposed to a recruiter low in political skill.
Research limitations/implications Despite the single source nature of data collections, the authors took steps to minimize potential biasing factors (e.g. time separation, including affectivity). Future research will benefit from gathering multiple sources of data. In addition, no experimental research to date has examined political skill in a laboratory context. This finding has important implications for the growing research base on political skill in organizations. Practical implications First impressions are lasting impressions, and it is very costly to organizations when recruiters lose good candidates due to the failure to make a memorable and favorable impression. This paper supports the use of political skill in the recruitment process and highlights its capability to influence and attract job applicants to organizations successfully. Originality/value Despite its scientific and practical appeal, the causal effects of political skill on important work outcomes in an experimental setting have not been formally investigated. As the first experimental investigation of political skill, the authors can see more clearly and precisely what political skill behaviors of recruiters tend to influence applicant attraction to organizations in the recruitment process.


Author(s):  
Nicolas Zhou ◽  
Erin M. Corsini ◽  
Shida Jin ◽  
Gregory R. Barbosa ◽  
Trey Kell ◽  
...  

In the first part of this series, we introduced the tools of Big Data, including Not Only Structured Query Language (NoSQL) data warehouse, natural language processing (NLP), optical character recognition (OCR), and Internet of Things (IoT). There are nuances to the utilization of these analytics tools, which must be well understood by clinicians seeking to take advantage of these innovative research strategies. One must recognize technical challenges to NLP, such as unintended search outcomes and variability in the expression of human written texts. Other caveats include dealing with written texts in image formats, which may ultimately be handled with transformation to text format by OCR, though this technology is still under development. IoT is beginning to be used in cardiac monitoring, medication adherence alerts, lifestyle monitoring, and saving traditional labs from equipment failure catastrophes. These technologies will become more prevalent in the future research landscape, and cardiothoracic surgeons should understand the advantages of these technologies to propel our research to the next level. Experience and understanding of technology are needed in building a robust NLP search result, and effective communication with the data management team is a crucial step in successful utilization of these technologies. In this second installment of the series, we provide examples of published investigations utilizing the advanced analytic tools introduced in Part I. We will explain our processes in developing the research question, barriers to achieving the research goals using traditional research methods, tools used to overcome the barriers, and the research findings.


2021 ◽  
Author(s):  
Diana Sofia Ovalle Lopez ◽  
Robert Vann

This report considers linguistic analyses as matters of ethical practice and quality assurance in the anonymization of recordings of spoken language for deposit in a digital language archive. Ethically, researchers must be committed to protecting the identities of primary data providers. Accordingly, conducting pragmatic analyses before initiating technical anonymization procedures can aid in determining exactly what discourse, in what contexts, might constitute identifying information in need of anonymization. Qualitatively, one of the main goals of language documentation is to preserve as much primary data as possible for future research. Accordingly, conducting phonotactic analyses with the help of computer software can aid in determining precise chronometer readings for each tonal insertion to excise as little primary data as possible during anonymizations. These findings warrant further research on anonymization protocols in digital language archive projects.


2019 ◽  
Vol 9 (2) ◽  
pp. 99
Author(s):  
Niladri Sekhar Dash ◽  
Kesavan Vadakalur Elumalai ◽  
Mufleh Salem M. Alqahtani ◽  
May Abdulaziz Abumelha

In this paper, we have made an attempt to portray a perceivable sketch of extratextual documentative annotation which, in the present frame of text annotation, is considered as one of the indispensable processes through which we can add representational information to the texts included in a written corpus. This becomes more important when a corpus is made with a large number of texts obtained from different genres and text types. To develop a workable frame for extratextual annotation, at each stage, we have broadly classified the existing processes of corpus annotation into two broad types. Moreover, we have tried to explain different layers that are embedded with extratextual annotation of texts as well as marked out the applications which can substantially enhance the accessibility of language data from a corpus for the works of text file management, information retrieval, lexical items extraction, and language processing. The techniques that we have proposed and described in this paper are unique in the sense that these are highly useful for expanding the utility of data of a written text corpus beyond the immediate horizons of language processing to the realms of theoretical, descriptive, and applied linguistics. In this paper, we have also argued that we should try to annotate all kinds of written text corpora so far developed in different natural languages at the extratextual level in a uniform manner so that the text samples stored in corpora can be uniformly used for various works of descriptive linguistics, theoretical linguistics, language technology, and applied linguistics including grammar writing, dictionary compilation, and language teaching. The annotation scheme proposed here is applied on a sample Bangla text corpus and we have noted that the accessibility of data and information from this kind of corpus is far easier than that of an un-annotated raw corpus.


2021 ◽  
Author(s):  
Steffen Willwacher ◽  
Markus Kurz ◽  
Johanna Robbin ◽  
Matthias Thelen ◽  
Joseph Hamill ◽  
...  

Objective To identify and evaluate the evidence of the most relevant running-related risk factors (RRRFs) for running-related overuse injuries (ROIs) and to suggest future research directions. Design Systematic review considering prospective and retrospective studies. (PROSPERO_ID: 236832) Data sources Pubmed. Connected Papers. The search was performed in February 2021. Eligibility criteria English language. Studies on participants whose primary sport is running addressing the risk for the seven most common ROIs and at least one kinematic, kinetic (including pressure measurements), or electromyographic RRRF. An RRRF needed to be identified in at least one prospective or two retrospective studies. Results Sixty-two articles fulfilled our eligibility criteria. Levels of evidence for specific ROIs ranged from conflicting to moderate evidence. Running populations and methods applied varied considerably between studies. While some RRRFs appeared for several ROIs, most RRRFs were specific for a particular ROI. The biomechanical measurements performed in many studies would have allowed for consideration of many more RRRFs than have been reported, highlighting a potential for more effective data usage in the future. Conclusion This study offers a comprehensive overview of RRRFs for the most common ROIs, which might serve as a starting point to develop ROI-specific risk profiles of individual runners. Future work should use macroscopic (big data) approaches involving long-term data collections in the real world and microscopic approaches involving precise stress calculations using recent developments in biomechanical modelling. However, consensus on data collection standards (including the quantification of workload and stress tolerance variables and the reporting of injuries) is warranted.


Author(s):  
Tobias Weber

The South Estonian Kraasna subdialect was spoken until the first half of the 20th century by a now vanished community in Krasnogorodsk, Russia. All linguistic descriptions to date are based on textual sources, mostly manuscripts from Heikki Ojansuu’s 1911/12 and 1914 fieldwork. Ojansuu’s phonograph recordings were thought to be lost by previous researchers and remained unused. The rediscovery of these recordings allows for the first analysis of Kraasna based on spoken language data, closing gaps in the description and enabling further research. This description follows a theory-neutral and framework-free approach, while respecting traditions in Estonian linguistics and linking the results to research in Estonian dialectology. It provides key information on the Kraasna subdialect based on the corpus – phonology, morphology, syntax – despite being restricted to the phonograph recordings. Future research can expand on these points and build on the present description.
Summary (translated from Estonian). Tobias Weber: A linguistic analysis of Heikki Ojansuu's phonograph recordings of the Kraasna subdialect. The Kraasna people (maarahvas), who lived around Krasnogorodsk in Pskov Oblast, Russia, spoke the South Estonian Kraasna subdialect until the first half of the 20th century. All linguistic treatments of the Kraasna subdialect have so far used written sources, mostly manuscripts collected by Heikki Ojansuu in 1911–12 and 1914. The phonograph recordings made by Ojansuu were thought to be lost before the present study and were therefore not used by earlier researchers. With the help of the rediscovered sound recordings, this article describes the Kraasna subdialect for the first time on the basis of spoken language data, filling gaps in previous analyses. The description follows a theoretically neutral descriptive approach, while respecting the traditions of Estonian linguistics and taking into account earlier results of Estonian dialect research. The article presents key information on Kraasna phonology, morphology and syntax, while being restricted, as a corpus-based study, to the linguistic material of the phonograph recordings. It serves as a basis for subsequent research projects, which can take the present description as a starting point and develop the analysis further by expanding and deepening it.


2006 ◽  
Vol 9 (6) ◽  
pp. 971-977 ◽  
Author(s):  
Kelly L. Klump ◽  
S. Alexandra Burt

Abstract The primary aim of the Michigan State University Twin Registry (MSUTR) is to examine developmental differences in genetic, environmental, and neurobiological influences on internalizing and externalizing symptoms, with disordered eating and antisocial behavior representing particular areas of interest. Twin participants span several developmental stages (i.e., childhood, adolescence, and young adulthood). Assessments include comprehensive, multiinformant measures of psychiatric and behavioral phenotypes, buccal swab and salivary DNA samples, assays of adolescent and adult steroid hormone levels (e.g., estradiol, progesterone, testosterone, cortisol), and videotaped parent–child interactions of child and adolescent twin families. To date, we have collected data on over 1000 twins, with additional data collections underway. This article provides an overview of the newly developed MSUTR and describes current and future research directions.

