Semantic transparency effects in German compounds: A large dataset and multiple-task investigation

In the present study, we provide a comprehensive analysis and a multi-dimensional dataset of semantic transparency measures for 1,810 German compound words. Compound words are considered semantically transparent when the contribution of the constituents’ meaning to the compound meaning is clear (as in airport), but the degree of semantic transparency varies between compounds (compare strawberry or sandman). Our dataset includes both compositional and relatedness-based semantic transparency measures, each also computed separately for the individual constituents. The measures are obtained from a fully implemented computational semantic model based on distributional semantics. We validate the measures using data from four behavioral experiments: explicit transparency ratings, two lexical decision tasks that use different nonwords, and an eye-tracking study. We demonstrate that different semantic effects emerge in different behavioral tasks, and that these effects can only be captured using a multi-dimensional approach to semantic transparency. We further provide the semantic transparency measures derived from the model for a dataset of 40,475 additional German compounds, as well as for 2,061 novel German compounds.
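The two families of measures named in the abstract can be illustrated with a minimal sketch. Relatedness-based measures compare the compound's vector with each constituent's vector; compositional measures compare the compound's vector with a vector composed from its constituents. The sketch assumes plain vector addition as the composition function and uses invented three-dimensional vectors; it is not the authors' model, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def transparency_measures(compound, modifier, head):
    """Relatedness- and composition-based transparency for one compound.

    Relatedness: similarity of the compound to each constituent separately.
    Composition: similarity of the compound to a vector built from both
    constituents. Plain addition is a hypothetical stand-in for the
    composition model actually used in the study.
    """
    composed = [m + h for m, h in zip(modifier, head)]
    return {
        "modifier_relatedness": cosine(compound, modifier),
        "head_relatedness": cosine(compound, head),
        "compositional": cosine(compound, composed),
    }


# Toy vectors invented purely for illustration (not from any trained space).
flughafen = [0.9, 0.8, 0.1]  # "Flughafen" (airport)
flug = [1.0, 0.2, 0.0]       # "Flug" (flight), the modifier
hafen = [0.1, 1.0, 0.2]      # "Hafen" (port), the head

print(transparency_measures(flughafen, flug, hafen))
```

With real data, the vectors would come from a distributional space trained on a large German corpus, and the three scores would serve as separate predictors rather than being collapsed into one transparency value.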



Behavior Research Methods ◽

10.3758/s13428-019-01311-4 ◽

2020 ◽

Vol 52 (3) ◽

pp. 1208-1224

Author(s):

Fritz Günther ◽

Marco Marelli ◽

Jens Bölte

Keyword(s):

Semantic Transparency ◽

Large Dataset ◽

Multiple Task


Faculty Opinions recommendation of Prevalence of major comorbidities in subjects with COPD and incidence of myocardial infarction and stroke: a comprehensive analysis using data from primary care.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.6288956.6377054 ◽

2010 ◽

Author(s):

Jadwiga Wedzicha ◽

Anant Patel

Keyword(s):

Myocardial Infarction ◽

Primary Care ◽

Comprehensive Analysis ◽

Using Data


Adjective–noun compounds in Mandarin: a study on productivity

Corpus Linguistics and Linguistic Theory ◽

10.1515/cllt-2020-0059 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Tian Shen ◽

R. Harald Baayen

Keyword(s):

Formation Process ◽

Word Formation ◽

Distributional Semantics ◽

Semantic Transparency ◽

Noun Compounds ◽

Hapax Legomena

Abstract In structuralist linguistics, compounds are argued not to constitute morphological categories, due to the absence of systematic form-meaning correspondences. This study investigates subsets of compounds for which systematic form-meaning correspondences are present: adjective–noun compounds in Mandarin. We show that there are substantial differences in the productivity of these compounds. One set of productivity measures (the count of types, the count of hapax legomena, and the estimated count of unseen types) reflects compounds’ profitability. By contrast, the category-conditioned degree of productivity is found to correlate with the internal semantic transparency of the words belonging to a morphological category. Greater semantic transparency, gauged by distributional semantics, predicts greater category-conditioned productivity. This dovetails well with the hypothesis that semantic transparency is a prerequisite for a word formation process to be productive.
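The counting-based measures mentioned in this abstract can be sketched from a list of corpus tokens belonging to one morphological category. The sketch computes the number of types V, the number of hapax legomena V1, the token count N, and the category-conditioned degree of productivity P = V1 / N. The estimated count of unseen types requires fitting a statistical vocabulary model and is not shown. The token list and function name are invented for illustration.

```python
from collections import Counter


def productivity_measures(tokens):
    """Productivity measures for the tokens of one morphological category.

    V  : number of types (distinct words) in the category
    V1 : number of hapax legomena (types that occur exactly once)
    N  : number of tokens in the category
    P  : category-conditioned degree of productivity, V1 / N
    """
    freqs = Counter(tokens)
    n_tokens = sum(freqs.values())
    n_types = len(freqs)
    n_hapax = sum(1 for f in freqs.values() if f == 1)
    return {"V": n_types, "V1": n_hapax, "N": n_tokens, "P": n_hapax / n_tokens}


# Invented corpus tokens for one hypothetical adjective-noun category (pinyin).
tokens = ["hongcha", "hongcha", "hongcha", "heiban", "dadou", "dadou", "baicai"]
print(productivity_measures(tokens))
```

Here two of the seven tokens are hapaxes, so P = 2/7; a category with many once-only formations relative to its total token frequency scores higher on P.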


Grounding semantic transparency in context

Morphology ◽

10.1007/s11525-021-09382-w ◽

2021 ◽

Author(s):

Rossella Varvara ◽

Gabriella Lapesa ◽

Sebastian Padó

Keyword(s):

Large Scale ◽

Point Of View ◽

Distributional Semantics ◽

Semantic Transparency ◽

Inclusion Measure ◽

The Difference ◽

Semantic Point ◽

The Many ◽

The Relationship

Abstract We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, ‘the evaluation’) and nominal infinitives (e.g., das Evaluieren, ‘the evaluating’). Among the many available event nominalization patterns for German, we selected these two because they are both highly productive and semantically challenging. Both patterns are known to maintain a close relation to the event denoted by the base verb, but with different nuances. Our study targets a better understanding of the differences in their semantic import. The key notion of our comparison is that of semantic transparency, and we propose a usage-based characterization of the relationship between derived nominals and their bases. Using methods from distributional semantics, we bring to bear two concrete measures of transparency that highlight different nuances: the first, cosine, detects nominalizations that are semantically similar to their bases; the second, distributional inclusion, detects nominalizations that are used in a subset of the contexts of the base verb. We find that only the inclusion measure helps characterize the difference between the two types of nominalizations, in relation to the traditionally considered variable of relative frequency (Hay, 2001). Finally, the distributional analysis allows us to frame our comparison within the broader inflection vs. derivation cline.
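The contrast between the two measures can be sketched on sparse context vectors (context word mapped to a non-negative association weight). Cosine is symmetric, whereas an inclusion score is directional: it asks how much of the nominal's context weight falls within the verb's contexts. The abstract does not name the exact inclusion measure, so this sketch assumes a Weeds-precision-style score as one standard option; the weights and context words are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {context: weight}."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


def weeds_inclusion(narrow, broad):
    """Share of the narrow word's context weight found among the broad word's
    contexts (Weeds precision). Assumes non-negative weights; not symmetric."""
    shared = sum(w for c, w in narrow.items() if c in broad)
    return shared / sum(narrow.values())


# Invented association weights for illustration only.
evaluieren = {"Daten": 2.0, "Studie": 1.5, "Ergebnis": 1.0, "schnell": 0.5}
evaluierung = {"Daten": 1.0, "Studie": 2.0}

print(cosine(evaluierung, evaluieren))           # similarity, same in both directions
print(weeds_inclusion(evaluierung, evaluieren))  # nominal's contexts inside the verb's
print(weeds_inclusion(evaluieren, evaluierung))  # lower in the reverse direction
```

In this toy case the nominal occurs only in contexts shared with the verb, so its inclusion score is maximal even though its cosine with the verb is below 1; this is the kind of difference the two measures are meant to separate.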


Constructing the Seismograms of Future Earthquakes in Yunnan, China, Using Compressed Sensing

Seismological Research Letters ◽

10.1785/0220190382 ◽

2020 ◽

Vol 92 (1) ◽

pp. 261-274

Author(s):

Jie Zhang ◽

Huiyu Zhu ◽

Siwei Yu ◽

Jianwei Ma

Keyword(s):

Compressed Sensing ◽

Active Fault ◽

Regional Scale ◽

Focal Mechanism Solution ◽

Total Output ◽

Detailed Knowledge ◽

Large Dataset ◽

Sensing Technology ◽

Data Driven Approach ◽

Using Data

Abstract The ability to calculate the seismogram of an earthquake at a local or regional scale is critical but challenging for many seismological studies, because detailed knowledge of the 3D heterogeneities in the Earth’s subsurface, although essential, is often insufficient. Here, we present an application of compressed sensing technology that can help predict the seismograms of earthquakes at any position using data from past events randomly distributed in the same area in Jinggu County, Yunnan, China. This first data-driven approach to calculating seismograms generates a large 3D dataset covering a volume that encompasses an active fault zone. The input earthquakes amount to only 1.27% of the total number of output events. We use the output data to create a database in which an earthquake search engine finds the best-matching waveform for a new event, instantly revealing its hypocenter and focal-mechanism solution.
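The final step, retrieving the best-matching stored waveform for a new event, can be sketched as a nearest-neighbour search. This is a simplification under stated assumptions: it scores candidates by zero-lag normalized cross-correlation of single traces, whereas a practical search engine would also handle time shifts, multiple stations and components, and fast indexing. The abstract does not specify the similarity measure, and the function names and event labels are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length traces.
    Both traces must vary (a constant trace would divide by zero)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den


def best_match(query, database):
    """Return (label, score) of the stored trace most correlated with the query.

    database: list of (label, trace) pairs; in practice each label would
    point to the stored event's hypocenter and focal mechanism.
    """
    scored = ((label, normalized_correlation(query, trace)) for label, trace in database)
    return max(scored, key=lambda pair: pair[1])


# Invented toy traces for illustration only.
database = [
    ("event_A", [0.0, 1.0, 0.0, -1.0]),
    ("event_B", [1.0, 0.0, -1.0, 0.0]),
]
print(best_match([0.0, 2.0, 0.0, -2.0], database))
```

Because the correlation is normalized, the query matches event_A perfectly even though its amplitude is twice as large; only waveform shape drives the match.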


Comprehensive Analysis of Chemotherapeutic Agents That Induce Infectious Neutropenia

Pharmaceuticals ◽

10.3390/ph14070681 ◽

2021 ◽

Vol 14 (7) ◽

pp. 681

Author(s):

Mashiro Okunaka ◽

Daisuke Kano ◽

Reiko Matsui ◽

Toshikatsu Kawasaki ◽

Yoshihiro Uesawa

Keyword(s):

Chemotherapy Regimen ◽

Alkylating Agents ◽

Univariate Analysis ◽

Adverse Event Reporting System ◽

Adverse Event Reporting ◽

Chemotherapeutic Agents ◽

Comprehensive Analysis ◽

Cytotoxic Agents ◽

Underweight Patients ◽

Using Data

Chemotherapy-induced neutropenia (CIN) has been associated with a risk of infection and with chemotherapy dose reductions and delays. The chemotherapy regimen remains one of the primary determinants of the risk of neutropenia, with some regimens being more myelotoxic than others. Although a number of clinical trials have highlighted the risk of CIN for individual chemotherapy regimens, only a few studies have comprehensively examined the risk associated with all chemotherapeutic agents. Therefore, this study aimed to investigate the risk factors and characteristics of CIN caused by each antineoplastic agent, using data from the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS), a large voluntary reporting database. First, univariate analysis showed that age ≥ 65 years, female sex, and treatment with chemotherapeutic agents were factors associated with CIN. Then, cluster and component analyses showed that cytotoxic agents (i.e., alkylating agents, antimetabolic agents, antineoplastic antibiotics, platinating agents, and plant-derived alkaloids) were associated with infection following neutropenia. This comprehensive analysis comparing CIN risk suggests that elderly or underweight patients treated with cytotoxic drugs require particularly careful monitoring.
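The abstract does not say which statistic the univariate analysis used. As a hedged illustration only, a common choice for a single binary factor such as age ≥ 65 years is an odds ratio from a 2x2 table of reports with and without CIN, shown here with a Wald 95% confidence interval on the log scale. The function name and the counts are invented.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: reports with the factor and with CIN
    b: reports with the factor and without CIN
    c: reports without the factor and with CIN
    d: reports without the factor and without CIN
    All four cells must be non-zero for this simple version.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Invented counts, e.g. age >= 65 vs. < 65 among reports with and without CIN.
print(odds_ratio(120, 380, 90, 410))
```

A factor is usually flagged when the whole interval lies above 1; in reporting databases this signals a disproportionate association, not a causal effect.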


Bees use anthropogenic habitats despite strong natural habitat preferences

10.1101/278812 ◽

2018 ◽

Cited By ~ 1

Author(s):

Miguel Á. Collado ◽

Daniel Sol ◽

Ignasi Bartomeus

Keyword(s):

Habitat Loss ◽

Natural Habitat ◽

Habitat Preferences ◽

Abundant Species ◽

Comprehensive Analysis ◽

Natural Habitats ◽

Large Dataset ◽

Anthropogenic Habitats ◽

Bee Diversity ◽

Northeast Usa

ABSTRACT Habitat loss and alteration are widely considered among the main drivers of the current loss of pollinator diversity. Unfortunately, we still lack a comprehensive analysis of habitat importance, use and preference for major groups of pollinators. Here, we address this gap by analysing a large dataset of 15,762 bee specimens (more than 400 species) across the northeastern USA. We found that natural habitats sustain the highest bee diversity, with many species strongly depending on such habitats. By characterizing habitat use and preference for the 45 most abundant species, we also show that many bee species can use human-altered habitats despite exhibiting strong and clear preferences for forested habitats. However, only a few species appear to do well when the habitat has been drastically modified. We conclude that although altered environments may harbor a substantial number of species, preserving natural areas is still essential to guarantee the conservation of bee biodiversity.


Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya

Epidemiology and Infection ◽

10.1017/s0950268815000710 ◽

2015 ◽

Vol 143 (16) ◽

pp. 3538-3545 ◽

Cited By ~ 3

Author(s):

M. TREMBLAY ◽

J. S. DAHM ◽

C. N. WAMAE ◽

W. A. DE GLANVILLE ◽

E. M. FÈVRE ◽

...

Keyword(s):

Plasmodium Falciparum ◽

Linear Models ◽

Principal Component ◽

Large Datasets ◽

Single Step ◽

Large Dataset ◽

Western Kenya ◽

The People ◽

Increased Risk ◽

Using Data

SUMMARY Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most meaningful predictors of a health outcome. To demonstrate these techniques, we extracted predictors of Plasmodium falciparum infection from a large covariate dataset with a limited number of observations, using data from the People, Animals, and their Zoonoses (PAZ) project: the data, collected from 415 homesteads in western Kenya, contained over 1500 variables describing the health, environment, and social factors of the humans, livestock, and the homesteads in which they reside. The wide, sparse dataset was reduced to 42 predictors of P. falciparum malaria infection, and wealth rankings were produced for all homesteads. The 42 predictors make biological sense and are supported by previous studies. The systematic data-mining approach we used could make many large datasets more manageable and informative for decision-making processes and health policy prioritization.
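The variable-selection role of the elastic net can be sketched with cyclic coordinate descent. This minimal version assumes a squared-error loss on a centered outcome with standardized predictors, minimizing (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2), with parameter names borrowed from the glmnet convention. It is not the authors' pipeline: a binary infection outcome would call for a logistic loss, and the imputation and PCA steps are omitted. The data are invented.

```python
def soft_threshold(x, t):
    """S(x, t) = sign(x) * max(|x| - t, 0); this is what zeroes out weak predictors."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def standardize(columns):
    """Center each predictor column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
        out.append([(x - mean) / sd for x in col])
    return out


def elastic_net(columns, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for a Gaussian elastic net.

    columns: standardized predictor columns; y: centered outcome.
    Returns one coefficient per column.
    """
    n = len(y)
    b = [0.0] * len(columns)
    resid = list(y)  # residual y - Xb, with all coefficients starting at zero
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # (1/n) * sum x_ij * (r_i + x_ij * b_j), using (1/n) * sum x_ij^2 = 1
            rho = sum(x * r for x, r in zip(xj, resid)) / n + b[j]
            new_bj = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(xj, resid)]
                b[j] = new_bj
    return b


# Invented data: y depends on x1 only; x2 is unrelated noise.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 2, 1, 2]
y = [2.0 * v for v in x1]
mean_y = sum(y) / len(y)
y_centered = [v - mean_y for v in y]

coefs = elastic_net(standardize([x1, x2]), y_centered, lam=0.1, alpha=0.5)
print(coefs)  # x2's coefficient stays at 0.0; x1's is shrunk slightly toward 0
```

The L1 part of the penalty sets irrelevant coefficients exactly to zero, which is what makes the elastic net useful for reducing a wide covariate set to a short list of predictors; the L2 part keeps groups of correlated predictors from being dropped arbitrarily.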


The Role of Semantic Transparency in the Processing of Compound Words in Polish: Evidence from a Masked Priming Experiment

Studies in Polish Linguistics ◽

10.4467/23005920spl.17.008.7200 ◽

2017 ◽

Vol 12 (3) ◽

Keyword(s):

Masked Priming ◽

Semantic Transparency ◽

Compound Words


Word-embeddings Italian semantic spaces: A semantic model for psycholinguistic research

Psihologija ◽

10.2298/psi161208011m ◽

2017 ◽

Vol 50 (4) ◽

pp. 503-520 ◽

Cited By ~ 8

Author(s):

Marco Marelli

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Semantic Space ◽

Distributional Semantics ◽

Semantic Model ◽

Word Embeddings ◽

Psycholinguistic Research ◽

Graphic Interface ◽

Test Sets ◽

Semantic Spaces

Distributional semantics has long been a source of successful models in psycholinguistics, making it possible to obtain semantic estimates for a large number of words automatically and quickly. However, such resources remain scarce or of limited accessibility for languages other than English. The present paper describes WEISS (Word-Embeddings Italian Semantic Space), a distributional semantic model for Italian. WEISS includes models of semantic representations trained with state-of-the-art word-embedding methods, which apply neural networks to induce distributed representations of lexical meanings. The resource is evaluated against two test sets, on which WEISS outperforms a baseline that encodes word associations. Moreover, an extensive qualitative analysis of the WEISS output provides examples of the model's potential for capturing several semantic phenomena. Two variants of WEISS are released and made easily accessible on the web through the SNAUT graphic interface.
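The abstract does not state how performance on the two test sets is scored. A common way to evaluate a semantic space, assumed here for illustration, is the Spearman rank correlation between model similarity scores (for example, cosines between word vectors) and human judgments for the same word pairs. The sketch handles tied values with average ranks; the function names and numbers are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values (ties are contiguous after sorting).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented model cosines and human similarity ratings for five word pairs.
model_scores = [0.82, 0.41, 0.67, 0.15, 0.41]
human_ratings = [9.1, 4.0, 6.5, 2.2, 5.0]
print(spearman(model_scores, human_ratings))
```

Rank correlation is the usual choice because it only asks whether the model orders word pairs the way humans do, without assuming that cosines and rating scales are linearly related.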
