Semantic transparency effects in German compounds: A large dataset and multiple-task investigation

2019 ◽  
Author(s):  
Fritz Guenther ◽  
Marco Marelli ◽  
Jens Bölte

In the present study, we provide a comprehensive analysis and a multi-dimensional dataset of semantic transparency measures for 1,810 German compound words. Compound words are considered semantically transparent when the contribution of the constituents’ meaning to the compound meaning is clear (as in airport), but the degree of semantic transparency varies between compounds (compare strawberry or sandman). Our dataset includes both compositional and relatedness-based semantic transparency measures, also differentiated by constituents. The measures are obtained from a computational and fully implemented semantic model based on distributional semantics. We validate the measures using data from four behavioral experiments: explicit transparency ratings, two different lexical decision tasks using different nonwords, and an eye-tracking study. We demonstrate that different semantic effects emerge in different behavioral tasks, which can only be captured using a multi-dimensional approach to semantic transparency. We further provide the semantic transparency measures derived from the model for a dataset of 40,475 additional German compounds, as well as for 2,061 novel German compounds.

2020 ◽  
Vol 52 (3) ◽  
pp. 1208-1224
Author(s):  
Fritz Günther ◽  
Marco Marelli ◽  
Jens Bölte

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Tian Shen ◽  
R. Harald Baayen

Abstract In structuralist linguistics, compounds are argued not to constitute morphological categories, due to the absence of systematic form-meaning correspondences. This study investigates subsets of compounds for which systematic form-meaning correspondences are present: adjective–noun compounds in Mandarin. We show that there are substantial differences in the productivity of these compounds. One set of productivity measures (the count of types, the count of hapax legomena, and the estimated count of unseen types) reflects compounds’ profitability. By contrast, the category-conditioned degree of productivity is found to correlate with the internal semantic transparency of the words belonging to a morphological category. Greater semantic transparency, gauged by distributional semantics, predicts greater category-conditioned productivity. This dovetails well with the hypothesis that semantic transparency is a prerequisite for a word formation process to be productive.
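As a rough illustration of the count-based measures named in this abstract, the sketch below computes the type count, the hapax legomena count, and a category-conditioned degree of productivity from a list of tokens belonging to one morphological category. The abstract gives no formulas, so the definition used here (hapax count divided by token count, a common one in the productivity literature) is an assumption, as are the function name `productivity_measures` and the toy data. The estimated count of unseen types is left out because it requires a fitted frequency model.

```python
from collections import Counter


def productivity_measures(tokens):
    """Count-based productivity measures for one morphological category.

    `tokens` is a list of word tokens that all belong to the category.
    The formula for `category_conditioned_p` is an assumption:
    hapax legomena divided by the category's token count.
    """
    freq = Counter(tokens)
    n_tokens = sum(freq.values())
    n_types = len(freq)
    # Hapax legomena are the types that occur exactly once.
    n_hapax = sum(1 for count in freq.values() if count == 1)
    p = n_hapax / n_tokens if n_tokens else 0.0
    return {
        "types": n_types,
        "hapax": n_hapax,
        "tokens": n_tokens,
        "category_conditioned_p": p,
    }


# Hypothetical toy sample standing in for corpus tokens of one category.
sample = ["a", "a", "b", "c", "c", "c", "d"]
measures = productivity_measures(sample)
print(measures)
```

With real data, `tokens` would hold every corpus occurrence of the compounds in a single adjective–noun category, so that the measures can be compared across categories.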


Morphology ◽  
2021 ◽  
Author(s):  
Rossella Varvara ◽  
Gabriella Lapesa ◽  
Sebastian Padó

Abstract We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, ‘the evaluation’) and nominal infinitives (e.g., das Evaluieren, ‘the evaluating’). Among the many available event nominalization patterns for German, we selected these two because they are both highly productive and challenging from the semantic point of view. Both patterns are known to keep a tight relation with the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import. The key notion of our comparison is that of semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency which highlight different nuances: the first one, cosine, detects nominalizations which are semantically similar to their bases; the second one, distributional inclusion, detects nominalizations which are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps in characterizing the difference between the two types of nominalizations, in relation with the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison in the broader coordinates of the inflection vs. derivation cline.
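The contrast between the two transparency measures described in this abstract can be sketched on sparse context-count vectors. This is a minimal illustration, not the authors' implementation: the abstract does not specify which inclusion measure was used, so the `inclusion` function below is an assumed, simple variant (the share of the narrower word's context weight that falls on contexts also seen with the broader word), and the context counts are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inclusion(narrow, broad):
    """Share of `narrow`'s context weight on contexts that `broad` also has.

    Assumed definition for illustration; 1.0 means every context of
    `narrow` is also a context of `broad`. The measure is asymmetric.
    """
    total = sum(narrow.values())
    if total == 0:
        return 0.0
    shared = sum(w for k, w in narrow.items() if broad.get(k, 0.0) > 0)
    return shared / total


# Invented context counts for a base verb and a derived nominal.
verb = {"report": 4, "study": 3, "team": 2, "quickly": 1}
noun = {"report": 2, "study": 1}

print(cosine(noun, verb))     # symmetric similarity
print(inclusion(noun, verb))  # are the noun's contexts a subset of the verb's?
print(inclusion(verb, noun))  # the reverse direction gives a different score
```

The asymmetry is the point of the second measure: a nominalization can be fully included in its base verb's contexts while the verb is only partly included in the nominalization's, a distinction that cosine cannot express.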


2020 ◽  
Vol 92 (1) ◽  
pp. 261-274
Author(s):  
Jie Zhang ◽  
Huiyu Zhu ◽  
Siwei Yu ◽  
Jianwei Ma

Abstract The ability to calculate the seismogram of an earthquake at a local or regional scale is critical but challenging for many seismological studies because detailed knowledge about the 3D heterogeneities in the Earth’s subsurface, although essential, is often insufficient. Here, we present an application of compressed sensing technology that can help predict the seismograms of earthquakes at any position using data from past events randomly distributed in the same area in Jinggu County, Yunnan, China. This first data-driven approach for calculating seismograms generates a large dataset in 3D with a volume encompassing an active fault zone. The input number of earthquakes comprises only 1.27% of the total output events. We use the output data to create a database intended to find the best-matching waveform of a new event by applying an earthquake search engine, which instantly reveals the hypocenter and focal-mechanism solution.


2021 ◽  
Vol 14 (7) ◽  
pp. 681
Author(s):  
Mashiro Okunaka ◽  
Daisuke Kano ◽  
Reiko Matsui ◽  
Toshikatsu Kawasaki ◽  
Yoshihiro Uesawa

Chemotherapy-induced neutropenia (CIN) has been associated with a risk of infections and chemotherapy dose reductions and delays. The chemotherapy regimen remains one of the primary determinants of the risk of neutropenia, with some regimens being more myelotoxic than others. Although a number of clinical trials have highlighted the risk of CIN with individual chemotherapy regimens, only a few have comprehensively examined the risk associated with all chemotherapeutic agents. Therefore, this study aimed to investigate the risk factors and characteristics of CIN caused by each neoplastic agent using data from the large voluntary reporting Food and Drug Administration Adverse Event Reporting System database. Initially, univariate analysis showed that an age ≥ 65 years, female sex, and treatment with chemotherapeutic agents were factors that caused CIN. Then, cluster and component analyses showed that cytotoxic agents (i.e., alkylating agents, antimetabolic agents, antineoplastic antibiotics, platinating agents, and plant-derived alkaloids) were associated with infection following neutropenia. This comprehensive analysis comparing CIN risk suggests that elderly or underweight patients treated with cytotoxic drugs require particularly careful monitoring.


2018 ◽  
Author(s):  
Miguel Á. Collado ◽  
Daniel Sol ◽  
Ignasi Bartomeus

ABSTRACT Habitat loss and alteration is widely considered one of the main drivers of the current loss of pollinator diversity. Unfortunately, we still lack a comprehensive analysis of habitat importance, use and preference for major groups of pollinators. Here, we address this gap by analysing a large dataset of 15,762 bee specimens (more than 400 species) across northeast USA. We found that natural habitats sustain the highest bee diversity, with many species strongly depending on such habitats. By characterizing habitat use and preference for the 45 most abundant species, we also show that many bee species can use human-altered habitats despite exhibiting strong and clear preferences for forested habitats. However, only a few species appear to do well when the habitat has been drastically modified. We conclude that although altered environments may harbor a substantial number of species, preserving natural areas is still essential to guarantee the conservation of bee biodiversity.


2015 ◽  
Vol 143 (16) ◽  
pp. 3538-3545 ◽  
Author(s):  
M. TREMBLAY ◽  
J. S. DAHM ◽  
C. N. WAMAE ◽  
W. A. DE GLANVILLE ◽  
E. M. FÈVRE ◽  
...  

SUMMARY Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most meaningful predictors for a health outcome. We extracted predictors for Plasmodium falciparum infection from a large covariate dataset while facing limited numbers of observations, using data from the People, Animals, and their Zoonoses (PAZ) project to demonstrate these techniques: the data, collected from 415 homesteads in western Kenya, contained over 1500 variables that describe the health, environment, and social factors of the humans, livestock, and the homesteads in which they reside. The wide, sparse dataset was simplified to 42 predictors of P. falciparum malaria infection, and wealth rankings were produced for all homesteads. The 42 predictors make biological sense and are supported by previous studies. The systematic data-mining approach we used would make many large datasets more manageable and informative for decision-making processes and health policy prioritization.


Psihologija ◽  
2017 ◽  
Vol 50 (4) ◽  
pp. 503-520 ◽  
Author(s):  
Marco Marelli

Distributional semantics has long been a source of successful models in psycholinguistics, making it possible to obtain semantic estimates for a large number of words quickly and automatically. However, such resources remain scarce or hard to access for languages other than English. The present paper describes WEISS (Word-Embeddings Italian Semantic Space), a distributional semantic model based on Italian. WEISS includes models of semantic representations that are trained using state-of-the-art word-embedding methods, applying neural networks to induce distributed representations for lexical meanings. The resource is evaluated against two test sets, demonstrating that WEISS performs better than a baseline encoding word associations. Moreover, an extensive qualitative analysis of the WEISS output provides examples of the model's potential for capturing several semantic phenomena. Two variants of WEISS are released and made easily accessible via the web through the SNAUT graphic interface.

