The IMP historical Slovene language resources

2015 ◽  
Vol 49 (3) ◽  
pp. 753-775 ◽  
Author(s):  
Tomaž Erjavec
2020 ◽  
Vol 46 (2) ◽  
pp. 465-482
Author(s):  
Špela Arhar Holdt

The Thesaurus of Modern Slovene is a responsive dictionary: it is compiled automatically from existing language resources, while its further development involves user participation. Many of the features introduced by the responsive model are new to the Slovene language community (e.g. data is extracted automatically and includes some errors; non-experts are involved in dictionary compilation; the resource is never truly finished). With financial support from the Slovene Ministry of Culture, a survey was conducted to gauge (potential) user opinions on the new features. The paper presents the results of the survey (n = 671), including statistical analyses of dependencies between the respondents’ opinions and their reported familiarity with the new dictionary, their age, and their professional occupation.


2018 ◽  
Vol 69 (3) ◽  
pp. 572-580
Author(s):  
Irena Stramljič Breznik

Abstract: The paper focuses on new verbal formations in Slovene coined from borrowed nouns ending in -ing with the Slovenian morpheme -irati (e.g. šoping-irati) on the basis of analogous phonological and semantic structures in the language, and examines their spread in the sphere of informal language use. The word-formational potential of such verbs is further examined with the basic categories of cognitive grammar, such as morphemic transparency, schematicity of the word-formational pattern and the established status of the phonemic sequence *ingira* in the previously existing lexical units of the Slovene language.


2010 ◽  
Author(s):  
Kartik Bhavsar ◽  
Reanna Poncheri Harman ◽  
Amber Harris ◽  
Kathryn Nelson ◽  
Eric A. Surface ◽  
...  

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pilar López-Úbeda ◽  
Alexandra Pomares-Quimbaya ◽  
Manuel Carlos Díaz-Galiano ◽  
Stefan Schulz

Abstract

Background: Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain, such as the UMLS metathesaurus or SNOMED CT, are widely used for this purpose, but with limitations such as lexical ambiguity of clinical terms. However, most of them are unambiguous within text limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong.

Results: This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical sub-domains. We use PubMed queries that generate sub-domain specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, obtaining improvements of 6 percentage points in the F-measure compared to the baseline using Multilayer Perceptron, thus supporting the hypothesis that a specialized term set improves NLP tasks.

Conclusion: The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent and broad-scope vocabulary.
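The n-gram collection step described in this abstract can be illustrated with a minimal sketch. The abstract does not define its metrics, so everything here is an assumption made for illustration: the function names, the whitespace tokenizer, the toy corpora, and the two simple stand-in measures (`broadness` as the number of sub-domains containing a term, `discriminatory_power` as the share of a term's occurrences that fall in one sub-domain). It is not the SCOVACLIS method itself.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all token n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_terms(corpora, max_n=2):
    """Count 1..max_n token n-grams per sub-domain.

    corpora: dict mapping a sub-domain name to a list of text strings.
    Returns a dict mapping each sub-domain to a Counter of n-gram strings.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    counts = {}
    for subdomain, docs in corpora.items():
        counter = Counter()
        for doc in docs:
            tokens = doc.lower().split()
            for n in range(1, max_n + 1):
                counter.update(" ".join(g) for g in ngrams(tokens, n))
        counts[subdomain] = counter
    return counts


def broadness(counts, term):
    """Illustrative stand-in: number of sub-domains in which the term occurs."""
    return sum(1 for c in counts.values() if c[term] > 0)


def discriminatory_power(counts, term, subdomain):
    """Illustrative stand-in: share of the term's occurrences in one sub-domain."""
    total = sum(c[term] for c in counts.values())
    return counts[subdomain][term] / total if total else 0.0


# Hypothetical toy corpora; real input would be Spanish MEDLINE titles and abstracts.
corpora = {
    "cardiology": ["infarto agudo de miocardio", "dolor toracico agudo"],
    "neurology": ["dolor de cabeza agudo"],
}
counts = count_terms(corpora)

for term in ("infarto", "agudo"):
    print(term, broadness(counts, term),
          round(discriminatory_power(counts, term, "cardiology"), 2))
```

In this toy data, a term that appears in only one sub-domain gets the maximum share for that sub-domain, while a term spread across sub-domains is broader and less discriminating, which is the intuition behind weighting terms per specialty.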


ZDM ◽  
2021 ◽  
Author(s):  
Sandra Crespo ◽  
Diana Bowen ◽  
Tarik Buli ◽  
Nicole Bannister ◽  
Crystal Kalinec-Craig
