Mapping the semantic organization of the English odor vocabulary using natural language data

2020 ◽  
Author(s):  
Thomas Hörberg ◽  
Maria Larsson ◽  
Jonas Olofsson

Olfactory experiences are hard to verbalize, partly because most languages lack dedicated odor vocabularies. A standardized odor vocabulary is needed, yet no descriptive system covering the full range of odor experiences has been agreed upon. Many studies of the English odor vocabulary have relied on perceptual data such as odor-descriptor ratings, and are therefore limited to a small set of pre-selected descriptors. In the present study, we present a data-driven approach that automatically identifies odor descriptors in English and then derives their semantic organization from their distributions in natural texts. Olfactory descriptors are automatically identified by their degree of olfactory association, and their semantic organization is derived with a distributional-semantic word embedding model. We identify and derive the semantic organization of the descriptors most frequently used to describe odors and flavors in English, both within and across source-based, abstract, and evaluative descriptor categories. Our method largely captures semantic differences between descriptors related to aroma and flavor qualities rather than, for example, functional or linguistic aspects: it primarily differentiates descriptors with respect to valence and edibility, and the semantic space it derives is qualitatively similar to a space derived from perceptual data.
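The distributional-semantic step can be illustrated with a minimal sketch: descriptors that occur in similar textual contexts receive similar vectors, and cosine similarity then places them near each other in the derived semantic space. The vectors below are invented toy values, not the study's actual embeddings.

```python
import numpy as np

# Toy distributional vectors for a few odor descriptors (hypothetical values;
# in practice these would come from a word embedding model trained on corpora).
vectors = {
    "fruity": np.array([0.9, 0.8, 0.1]),
    "sweet":  np.array([0.8, 0.9, 0.2]),
    "rotten": np.array([0.1, 0.0, 0.9]),
}

def cosine(a, b):
    """Cosine similarity, the standard proximity metric in distributional semantics."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Descriptors with similar distributions in text end up close in semantic space;
# here the pleasant descriptors cluster, separated from "rotten" along valence.
sim_pleasant = cosine(vectors["fruity"], vectors["sweet"])
sim_valence = cosine(vectors["fruity"], vectors["rotten"])
```

With these toy vectors, `sim_pleasant` exceeds `sim_valence`, mirroring how valence can emerge as a primary axis of the space.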

2016 ◽  
Author(s):  
Geoffrey Fouad ◽  
André Skupin ◽  
Christina L. Tague

Abstract. Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percent of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US). However, it remains unclear how to best represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US, and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of the regional regression models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and two larger sets of variables for the data-driven approach were each applied for a regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. 
The small set of independent variables selected through expert assessment performed as well as, if not better than, the two larger sets of variables. The parsimonious set consisted only of mean annual precipitation, potential evapotranspiration, and baseflow index; additional variables in the two larger sets added little to no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins, and were converted into a tool for predicting 13 percentile flows in the contiguous US. The Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
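The percentile-flow statistic itself is straightforward to compute from a flow record: the flow equaled or exceeded p percent of the time is the (100 − p)th ordinary percentile of the flows. A minimal sketch with synthetic data (the flow series is invented for illustration):

```python
import numpy as np

# Hypothetical daily streamflow record (m^3/s), ~10 years of lognormal flows.
rng = np.random.default_rng(0)
flows = rng.lognormal(mean=1.0, sigma=0.8, size=3650)

def percentile_flow(q, p):
    """Flow equaled or exceeded p percent of the time, read off the FDC.
    An exceedance percentile p maps to the (100 - p)th ordinary percentile."""
    return np.percentile(q, 100 - p)

q95 = percentile_flow(flows, 95)  # low flow, exceeded 95% of the time
q5 = percentile_flow(flows, 5)    # high flow, exceeded only 5% of the time
```

By construction, Q95 falls below the median flow and Q5 above it, which is the ordering a flow duration curve encodes.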


2020 ◽  
Vol 12 (1) ◽  
pp. 182-202 ◽  
Author(s):  
BILL THOMPSON ◽  
MARCUS PERLMAN ◽  
GARY LUPYAN ◽  
ZED SEVCIKOVA SEHYR ◽  
KAREN EMMOREY

Abstract. A growing body of research shows that both signed and spoken languages display regular patterns of iconicity in their vocabularies. We compared iconicity in the lexicons of American Sign Language (ASL) and English by combining previously collected ratings of ASL signs (Caselli, Sevcikova Sehyr, Cohen-Goldberg, & Emmorey, 2017) and English words (Winter, Perlman, Perry, & Lupyan, 2017) with the use of data-driven semantic vectors derived from English. Our analyses show that models of spoken language lexical semantics drawn from large text corpora can be useful for predicting the iconicity of signs as well as words. Compared to English, ASL has a greater number of regions of semantic space with concentrations of highly iconic vocabulary. There was an overall negative relationship between semantic density and the iconicity of both English words and ASL signs. This negative relationship disappeared for highly iconic signs, suggesting that iconic forms may be more easily discriminable in ASL than in English. Our findings contribute to an increasingly detailed picture of how iconicity is distributed across different languages.
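Semantic density can be operationalized as a word's mean cosine similarity to its nearest neighbours in an embedding space. A minimal sketch with randomly generated vectors (the words and vectors are illustrative placeholders, not the study's data):

```python
import numpy as np

# Hypothetical embedding vectors for a handful of English words; real analyses
# would use data-driven vectors learned from a large text corpus.
rng = np.random.default_rng(42)
words = ["dog", "cat", "wolf", "idea", "theory"]
emb = {w: rng.normal(size=50) for w in words}

def semantic_density(word, k=3):
    """Mean cosine similarity of a word to its k nearest neighbours:
    a proxy for how crowded its region of semantic space is."""
    v = emb[word]
    sims = []
    for other, u in emb.items():
        if other == word:
            continue
        sims.append(float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u))))
    return float(np.mean(sorted(sims, reverse=True)[:k]))

densities = {w: semantic_density(w) for w in words}
```

A density estimate of this kind is what the iconicity ratings would be regressed against to test the reported negative relationship.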


2021 ◽  
Author(s):  
Russell J Jarvis ◽  
Patrick M. McGurrin ◽  
Rebecca Featherston ◽  
Marc Skov Madsen ◽  
Shivam Bansal ◽  
...  

Here we present a new text analysis tool that consists of a text analysis service and an author search service. These services were created by using or extending many existing free and open-source tools, including Streamlit, Requests, WordCloud, TextStat, and the Natural Language Toolkit (NLTK). The tool can retrieve journal hosting links and journal article content from APIs and journal hosting websites. Together, these services allow the user to review the complexity of a scientist's published work relative to other online text repositories. Rather than providing feedback on the complexity of a single text, as previous tools have done, the tool presented here shows the relative complexity across many texts from the same author, while also comparing the readability of the author's body of work to a variety of other scientific and lay text types. The goal of this work is to apply a data-driven approach that provides established academic authors with statistical insights into their body of published peer-reviewed work. By monitoring these readability metrics, scientists may be able to tailor their writing to reach broader audiences, contributing to improved global communication and understanding of complex topics.
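Readability metrics like those reported by TextStat can be sketched in a few lines: the Flesch Reading Ease formula combines average sentence length with syllables per word. The syllable counter below is a deliberately naive stand-in for TextStat's heuristic, and the sample texts are invented:

```python
import re

def naive_syllables(word):
    """Very rough syllable count: runs of vowels (TextStat uses a better heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Higher scores mean easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(naive_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

simple = "The cat sat. The dog ran."
dense = "Multidimensional readability quantification necessitates lexicostatistical analysis."
```

Scoring many texts by one author this way, and comparing the distribution to reference corpora, is the kind of aggregate feedback the tool provides.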


Author(s):  
Kangqi Luo ◽  
Xusheng Luo ◽  
Xianyang Chen ◽  
Kenny Q. Zhu

This paper studies the problem of discovering structured knowledge representations of binary natural language relations. The representation, known as a schema, generalizes the traditional predicate path to support more complex semantics. We present a search algorithm that generates schemas over a knowledge base, and propose a data-driven learning approach to discover the most suitable representation for a given relation. Evaluation results show that the inferred schemas represent precise semantics and can be used to enrich manually crafted knowledge bases.


Author(s):  
Sena Assaf ◽  
Mohamad Awada ◽  
Issam Srour

2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies of Natural Language Processing applications for many indigenous East African languages. As a contribution toward closing this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and usually correspond to the source-language input.
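BLEU, the automatic metric reported above, scores a candidate translation by clipped n-gram precision against a reference, combined with a brevity penalty. A simplified single-reference sketch with invented example sentences (real evaluations use libraries such as NLTK or sacreBLEU):

```python
import math
from collections import Counter

def modified_ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the core quantity in BLEU."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / max(1, sum(cand.values()))

def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of n-gram precisions with a brevity penalty (BLEU sketch)."""
    precisions = [modified_ngram_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
good = "the cat is on the mat".split()
bad = "mat the on cat".split()
```

A perfect match scores 1.0, while a scrambled candidate with no matching bigrams scores 0.0; word-order sensitivity is what makes higher-order n-grams informative for comparing model architectures.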


2012 ◽  
Author(s):  
Michael Ghil ◽  
Mickael D. Chekroun ◽  
Dmitri Kondrashov ◽  
Michael K. Tippett ◽  
Andrew Robertson ◽  
...  
