Linking norms, ratings, and relations of words and concepts across multiple language varieties

Behavior Research Methods

10.3758/s13428-021-01650-1

2021

Author(s):

Annika Tjuka

Robert Forkel

Johann-Mattis List

Keyword(s):

Case Studies

Web Application

Data Curation

Data Sets

Linguistic Information

Language Varieties

Advance Research

Combine Information

And Linguistics

Psychologists and linguists collect various data on word and concept properties. In psychology, scholars have accumulated norms and ratings for a large number of words in languages with many speakers. In linguistics, scholars have accumulated cross-linguistic information about the relations between words and concepts. Until now, however, there have been no efforts to combine information from the two fields, which would allow comparison of psychological and linguistic properties across different languages. The Database of Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts (NoRaRe) is the first attempt to close this gap. Building on a reference catalog that standardizes the concepts used in historical and typological language comparison, it integrates data from psychology and linguistics, collected from 98 data sets, covering 65 unique properties for 40 languages. The database is curated with the help of manual, automated, and semi-automated workflows and uses a software API to control and access the data. It is accessible via a web application, the software API, or scripting languages. In this study, we present how the database is structured, how it can be extended, and how we control the quality of the data curation process. To illustrate its application, we present three case studies that test the validity of our approach, the accuracy of our workflows, and the integrative potential of the database. Due to regular version updates, the NoRaRe database has the potential to advance research in psychology and linguistics by offering researchers an integrated perspective on both fields.

Linking Norms, Ratings, and Relations of Words and Concepts Across Multiple Language Varieties

10.31234/osf.io/tgw3z

2020

Author(s):

Annika Tjuka

Robert Forkel

Johann-Mattis List

Keyword(s):

Web Application

Age Of Acquisition

Data Curation

Data Sets

Data Types

Word Meanings

Language Varieties

Diverse Data

Multiple Languages

Word Frequencies

Psychologists and linguists have collected a great diversity of data on word and concept properties. In psychology, many studies accumulate norms and ratings, such as word frequencies or age of acquisition, often for a large number of words. Linguistics, on the other hand, provides valuable insights into the relations between word meanings. We present a collection of data sets of norms, ratings, and relations that covers different languages: ‘NoRaRe.’ To enable comparison across these diverse data types, we established workflows that facilitate the expansion of the database. A web application allows convenient access to the data (https://digling.org/norare/). Furthermore, a software API ensures consistent data curation by providing tests to validate the data sets. The NoRaRe collection is linked to the database curated by the Concepticon project (https://concepticon.clld.org), which offers a reference catalog of unified concept sets. The link between words in the data sets and the Concepticon concept sets makes cross-linguistic comparison possible. In three case studies, we test the validity of our approach, the accuracy of our workflow, and the applicability of our database. The results indicate that the NoRaRe database can be applied to the study of word properties across multiple languages. Psychologists and linguists can use the data to benefit from the knowledge rooted in both disciplines.

Mapping Quality Problems in Linked Data

Journal on Advances in Theoretical and Applied Informatics

10.26729/jadi.v1i1.1043

2015

Vol 1 (1)

pp. 38

Author(s):

Jessica Oliveira De Souza

Jose Eduardo Santarem Segundo

Keyword(s):

Semantic Web

User Experience

Web Application

Linked Data

Data Sets

Quality Of Data

Related Quality

Primary Means

Basic Semantic

Since the Semantic Web was created to improve the current web user experience, Linked Data is the primary means by which Semantic Web applications can, in theory, be fully realized, provided appropriate criteria and requirements are respected. Therefore, the quality of the data and information stored in linked data sets is essential to meeting the basic objectives of the Semantic Web. Hence, this article aims to describe and present specific quality dimensions and their related quality issues.

Teaching the Scientific Study of International Processes

10.1093/acrefore/9780190846626.013.314

2017

Author(s):

D. Scott Bennett

Keyword(s):

Game Theory

Case Studies

Teaching Methods

Finite Time

International Politics

Scientific Study

Data Sets

Scientific Approach

Comparative Case Studies

The Scientific Study of International Processes (SSIP) is an approach aimed at teaching international politics scientifically. Teaching scientifically means teaching students how to use evidence to support or disprove a particular logical argument or hypothesis that reaches some level of generalization about relationships between concepts. Closely related to simply asking what evidence exists is teaching students to address the breadth, depth, and quality of that evidence. The scientific approach may also draw attention to the logic of arguments and policies. Are policies, positions, and the arguments behind them logical? Or is a policy or position based on assumptions that are not logically related, or true only if certain auxiliary assumptions hold? Teaching methods for SSIP include comparative case studies, experiments and surveys, data sets, and game theory and simulation. Instructors also face several challenges when seeking to teach scientifically, particularly when they try to make time to teach methodology as part of an international politics course. Some problems are relatively easily overcome just by focusing on effective teaching. Others are unique to SSIP and cannot be dealt with so easily. Among these are the need to appeal to a broad audience, students' negative reactions to the term “science”, and the constraint of finite time in a course.

Composite Time Lines: A Means to Leverage Resolving Power from Radioisotopic Dates and Biostratigraphy

The Paleontological Society Papers

10.1017/s108933260000139x

2006

Vol 12

pp. 145-170

Cited By ~ 5

Author(s):

Peter M. Sadler

Keyword(s):

Resolving Power

Data Sets

Time Line

Relative Age

Large Numbers

Extinction Events

Combine Information

Best Fit

Sequencing Procedure

Species origination and extinction events far outnumber radioisotopically dated events in the ancient stratigraphic record. To calibrate rapid rates of Mesozoic and Paleozoic change and to estimate the ages of paleobiologic events, it would be ideal to have multiple dated events in single stratigraphic sections. This condition is rarely realized, and the practical alternative is to build composite sections that combine information from many different locations. The compositing process takes advantage of all available evidence of relative age to produce high-resolution time lines, i.e., ordered sequences of individual events whose average spacing is much finer than the duration of biostratigraphic zones and can approach the uncertainty intervals of the highest-precision radioisotopic dates. Dated events are included in the compositing process from the outset. As a result, the sequencing procedure is more efficient, and the dated events find their optimal positions in the time line independently of any biostratigraphic zonal scheme. The sequencing procedures follow simple logical rules that may be learned from tiny data sets. When usefully large numbers of events are involved, however, the sequencing must be undertaken by computer, and there is seldom a unique solution that best fits the field data. The range of positions in the sequence that an event may occupy across the full set of equally best-fit solutions is a measure of the resolving power of the event. As new high-precision dates and detailed range charts continue to become available, the quality of the time lines will improve, and they will become increasingly viable alternatives to zonal time scales in the older parts of the Phanerozoic.

Trust in the Police: Cross-country Comparisons

Voprosy Ekonomiki

10.32609/0042-8736-2012-11-24-47

2012

pp. 24-47

Author(s):

V. Gimpelson

G. Monusova

Keyword(s):

Public Opinion

Public Attitudes

Crime Rates

Authoritarian Regimes

Data Sets

The Public

Positive Attitudes

Cross Country

Police Activity

Using different cross-country data sets and simple econometric techniques, we study public attitudes towards the police. More positive attitudes are more likely to emerge in countries that have better-functioning democratic institutions, are less prone to corruption, and enjoy more transparent and accountable police activity. These factors have a stronger impact on public opinion (trust and attitudes) than objective crime rates or police density. Citizens tend to place more trust in police officers with whom they share common values and over whom they have some control. The latter is a function of democracy. In authoritarian countries (“police states”), this tendency may not work directly. When we move from semi-authoritarian countries to openly authoritarian ones, trust in the police as measured by surveys can also rise. As a result, trust appears to be U-shaped along the quality-of-government axis. This phenomenon can be explained by two simple facts. First, publicly available information concerning police activity in authoritarian countries is strongly controlled; second, the police itself is more tightly controlled by authoritarian regimes, which fear a dangerous (for them) erosion of this institution.

Plasma Cleaning Improves the Image Quality of Serial Block-face Scanning Electron Microscopy (SBFSEM) Volumetric Data Sets

Microscopy and Microanalysis

10.1017/s1431927617006997

2017

Vol 23 (S1)

pp. 1266-1267

Cited By ~ 1

Author(s):

Barbara Armbruster

Christopher Booth

Stuart Searle

Michael Cable

Ronald Vane

Keyword(s):

Electron Microscopy

Scanning Electron Microscopy

Image Quality

Data Sets

Plasma Cleaning

Volumetric Data

Face Scanning

Block Face

Scanning Electron

Improvements for research data repositories: The case of text spam

Journal of Information Science

10.1177/0165551521998636

2021

pp. 016555152199863

Author(s):

Ismael Vázquez

María Novo-Lourés

Reyes Pavón

Rosalía Laza

José Ramón Méndez

...

Keyword(s):

Web Application

Research Data

Data Sets

Data Repositories

Software Applications

Public Data

Protection Mechanisms

Experimental Protocols

Learning Research

Processing Steps

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure that the results can be reproduced and compared with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are among the most important elements for developing experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following functionalities that repositories should provide: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities, and (4) providing protection mechanisms for licensing issues and user rights. To demonstrate these functionalities, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at https://rdata.4spam.group to facilitate understanding of this study.

Integration of Satellite Data with High Resolution Ratio: Improvement of Spectral Quality with Preserving Spatial Details

Sensors

10.3390/s18124418

2018

Vol 18 (12)

pp. 4418

Cited By ~ 4

Author(s):

Aleksandra Sekrecka

Michal Kedzierski

Keyword(s):

High Ratio

Data Sets

Spectral Quality

Local Statistics

Panchromatic Image

Urbanized Area

Fusion Methods

Spatial Quality

Different Levels

Commonly used image fusion techniques generally produce good results for images obtained from the same sensor with a standard spatial resolution ratio (1:4). However, an atypically high resolution ratio reduces the effectiveness of fusion methods, decreasing the spectral or spatial quality of the sharpened image. An important issue is therefore the development of a method that maintains high spatial and spectral quality simultaneously. The authors propose to strengthen pan-sharpening methods through prior modification of the panchromatic image. Local statistics of the differences between the original panchromatic image and the intensity of the multispectral image are used to detect spatial details. Euler's number and the distance of each pixel from the nearest pixel classified as a spatial detail determine the weight of the information collected from each integrated image. The research was carried out for several pan-sharpening methods and for data sets with different levels of spectral matching. The proposed solution yields a greater improvement in spectral fusion quality while identifying the same spatial details for most pan-sharpening methods. It is mainly intended for Intensity-Hue-Saturation-based methods, for which spectral quality improved by about 30% for the urbanized area and about 15% for the non-urbanized area.

AN INTELLIGENT TECHNOLOGY BASED METHOD FOR INTERPRETING SENSORY EVALUATION DATA PROVIDED BY MULTIPLE PANELS

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems

10.1142/s021848850800556x

2008

Vol 16 (05)

pp. 683-698

Author(s):

BIN ZHOU

XIANYI ZENG

LUDOVIC KOEHL

YONGSHENG DING

Keyword(s):

Fuzzy Numbers

Data Sets

Sensory Data

Fabric Hand

Different Populations

Intelligent Technology

Fuzzy Distances

Data Variation

Fabric Hand Evaluation

This paper presents an intelligent-technology-based method for analyzing and interpreting sensory data provided by multiple panels in the evaluation of industrial products. To process the uncertainty in these sensory data, we first transform all sensory data onto a unified optimal scale. Based on these normalized data sets, we compute the dissimilarities, or distances, between different panels and between the different evaluation terms they use, defined according to the degree of consistency of data variation. The obtained distances are then transformed into fuzzy numbers for physical interpretation. These fuzzy distances make it possible to characterize the evaluation behaviour of each panel and the quality of the evaluation terms used. In addition, based on a genetic algorithm with a punishment policy and the dissimilarity between terms, we develop a procedure for interpreting the terms of one panel using those of another. This method has been applied to fabric hand evaluation for a number of knitted cotton samples in order to identify the preferences of consumers from different populations.

A MODEL BASED ON FUZZY LINGUISTIC INFORMATION TO EVALUATE THE QUALITY OF DIGITAL LIBRARIES

International Journal of Information Technology & Decision Making

10.1142/s0219622010003907

2010

Vol 09 (03)

pp. 455-472

Cited By ~ 31

Author(s):

F. J. CABRERIZO

J. LÓPEZ-GIJÓN

A. A. RUÍZ

E. HERRERA-VIEDMA

Keyword(s):

Digital Libraries

User Satisfaction

Information Access

Great Influence

Linguistic Information

Model Based

Global Quality

Fuzzy Linguistic

The Web

The Web is changing information access processes and is one of the most important information media. Thus, developments on the Web are having a great influence on the development of other information access instruments, such as digital libraries. Because digital libraries are developed to satisfy user needs, user satisfaction is essential to their success. The aim of this paper is to present a model based on fuzzy linguistic information to evaluate the quality of digital libraries. The quality evaluation of digital libraries is defined using users' perceptions of the quality of the digital services provided through their websites. We adopt fuzzy linguistic modeling to represent users' perceptions and apply automatic tools of fuzzy computing with words, based on the LOWA and LWA operators, to compute global quality evaluations of digital libraries. Additionally, we show an example application of this model in which three Spanish academic digital libraries are evaluated by fifty users.
