Lexibank: A public repository of standardized wordlists with computed phonological and lexical features

Author(s):  
Johann-Mattis List ◽  
Robert Forkel ◽  
Simon J. Greenhill ◽  
Christoph Rzymski ◽  
Johannes Englisch ◽  
...  

Abstract The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, the majority of published datasets lack standardization, which makes their comparison difficult. Here, we present the first step to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that increase the FAIRness of linguistic data. We test the Lexibank workflow on a collection of 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.

2018 ◽  
Vol 22 (2) ◽  
pp. 277-306 ◽  
Author(s):  
Johann-Mattis List ◽  
Simon J. Greenhill ◽  
Cormac Anderson ◽  
Thomas Mayer ◽  
Tiago Tresoldi ◽  
...  

Abstract The Database of Cross-Linguistic Colexifications (CLICS) has established a computer-assisted framework for the interactive representation of cross-linguistic colexification patterns. In its current form, it has proven to be a useful tool for various kinds of investigation into cross-linguistic semantic associations, including studies on semantic change, patterns of conceptualization, and linguistic paleontology. But CLICS has also been criticized for obvious shortcomings, ranging from the underlying dataset, which still contains many errors, to the limits of cross-linguistic colexification studies in general. Building on recent standardization efforts reflected in the Cross-Linguistic Data Formats initiative (CLDF) and novel approaches for fast, efficient, and reliable data aggregation, we have created a new database for cross-linguistic colexifications, which not only supersedes the original CLICS database in terms of coverage but also offers a much more principled procedure for the creation, curation and aggregation of datasets. The paper presents the new database and discusses its major features.


Author(s):  
Noam Sagiv ◽  
Monika Sobczak-Edmans ◽  
Adrian L. Williams

Defining synaesthesia has proven to be a challenging task as the number of synaesthesia variants and associated phenomena reported by synaesthetes has increased over the past decade or so. This chapter discusses the inclusion of non-sensory concurrents in the category of synaesthesia. For example, many grapheme-colour synaesthetes also attribute gender and personality to letters and numbers consistently and involuntarily. Here we assess the question of including synaesthetic personification as a type of synaesthesia. We also discuss the relationship between synaesthetic personification and other instances of personification and mentalizing. We hope to convince readers that whether or not they embrace atypical forms of personification as a synaesthesia variant, studying the phenomenon is a worthwhile effort that could yield novel insights into human cognition and brain function.


Author(s):  
Stephanie Kirschbaum ◽  
Thilo Kakzhad ◽  
Fabian Granrath ◽  
Andrzej Jasina ◽  
Jakub Oronowicz ◽  
...  

Abstract Purpose: This study aimed to evaluate both publication and authorship characteristics in Knee Surgery, Sports Traumatology, Arthroscopy journal (KSSTA) regarding knee arthroplasty over the past 15 years. Methods: PubMed was searched for articles published in KSSTA between January 1, 2006, and December 31, 2020, utilising the search term ‘knee arthroplasty’. 1288 articles met the inclusion criteria. The articles were evaluated using the following criteria: type of article, type of study, main topic and special topic, use of patient-reported outcome scores, number of references and citations, level of evidence (LOE), number of authors, gender of the first author and continent of origin. Three time intervals were compared: 2006–2010, 2011–2015 and 2016–2020. Results: Between 2016 and 2020, publications peaked at 670 articles (52%) compared with 465 (36%) published between 2011 and 2015 and 153 articles (12%) between 2006 and 2010. While the percentage of reviews (2006–2010: 0% vs. 2011–2015: 5% vs. 2016–2020: 5%) and meta-analyses (1% vs. 6% vs. 5%) increased, fewer case reports were published (13% vs. 3% vs. 1%) (p < 0.001). Interest in navigation and computer-assisted surgery decreased, whereas interest in perioperative management, robotic and individualized surgery increased over time (p < 0.001). There was an increasing number of references [26 (2–73) vs. 30 (2–158) vs. 31 (1–143), p < 0.001] while the number of citations decreased [30 (0–188) vs. 22 (0–264) vs. 6 (0–106), p < 0.001]. LOE showed no significant changes (p = 0.439). The number of authors increased between each time interval (p < 0.001), while the percentage of female authors was comparable between the first and last interval (p = 0.252). Europe published significantly fewer articles over time (56% vs. 47% vs. 52%), whereas the number of articles from Asia increased (35% vs. 45% vs. 37%, p = 0.005).
Conclusion: Increasing interest in the field of knee arthroplasty-related surgery arose within the last 15 years in KSSTA. The investigated topics showed a significant trend towards the latest techniques at each time interval. With the rising number of authors, the share of female first authors also increased, but not significantly. Furthermore, publishing characteristics showed an increasing number of publications from Asia and a slightly decreasing number in Europe. Level of evidence: IV.


2018 ◽  
Vol 18 (1) ◽  
pp. 124-154
Author(s):  
Dimitris Papazachariou ◽  
Anna Fterniati ◽  
Argiris Archakis ◽  
Vasia Tsami

Abstract Over the past decades, contemporary sociolinguistics has challenged the existence of fixed and rigid linguistic boundaries, thus focusing on how the speakers themselves define language varieties and how specific linguistic choices end up being perceived as language varieties. In this light, the present paper explores the influence of metapragmatic stereotypes on elementary school pupils’ attitudes towards geographical varieties. Specifically, we investigate children’s beliefs as to the acceptability of geographical varieties and their perception of the overt and covert prestige of geographical varieties and dialectal speakers. Furthermore, we explore the relationship between the children’s specific beliefs and factors such as gender, the social stratification of the school location and the pupils’ performance in language subjects. The data of the study was collected via questionnaires with closed questions. The research findings indicate that the children of our sample associate geographical varieties with rural settings and informal communicative contexts. Moreover, children recognize a lack of overt prestige in geographical variation; at the same time, they evaluate positively the social attractiveness and the personal reliability of the geographical varieties and their speakers. Our research showed that pupils’ beliefs are in line with the dominant metapragmatic stereotypes which promote language homogeneity.


2021 ◽  
Vol 9 ◽  
Author(s):  
Mario Locati ◽  
Roberto Vallone ◽  
Matteo Ghetta ◽  
Nyall Dawson

An increasing number of web services providing convenient access to seismological data have become available in recent years. A huge effort at multiple levels was required to achieve this goal and the seismological community was engaged in the standardization of both data formats and web services. Although access to seismological data is much easier than in the past, users encounter problems because of the large number of web services and the complexity of the discipline-specific data encodings. In addition, instead of adopting cross-disciplinary standards such as those by the Open Geospatial Consortium (OGC), most seismological web services created their own standards, primarily those by the International Federation of Digital Seismograph Networks (FDSN). This article introduces “QQuake,” a plugin for QGIS (the Open Source Geographic Information System) that aims at making access to seismological data easier. The plugin is based on open-source code available on GitHub, and it is designed in a modular and customizable way, allowing users to easily include new web services.


2019 ◽  
Author(s):  
Tobias Heycke ◽  
Lisa Spitzer

Recently in psychological science and many related fields, a surprisingly large number of experiments could not be replicated by independent researchers. A non-replication could indicate that a previous finding might have been a false positive statistical result and the effect does not exist. However, it could also mean that a specific detail of the experimental procedure is essential for the effect to emerge, which might not have been included in the replication attempt. Therefore, any replication attempt that does not replicate the original effect is most informative when the original procedure is closely adhered to. One proposed solution to facilitate the empirical reproducibility of the experimental procedures in psychology is to upload the experimental script and materials to a public repository. However, we believe that merely providing the materials of an experimental procedure is not sufficient, as many software solutions are not freely available, software solutions might change, and it is time-consuming to set up the procedure. We argue that there is a simple solution to these problems when an experiment is conducted using computers: recording an example procedure with screen capture software and providing the video in an online repository. We therefore provide a brief tutorial on screen recordings using open-source screen recording software. With this information, individual researchers should be able to record their experimental procedures, and we hope to facilitate the use of screen recordings in computer-assisted data collection procedures.


2010 ◽  
Vol 6 (4) ◽  
pp. 20-37 ◽  
Author(s):  
Rubén A. Mendoza ◽  
T. Ravichandran

XML-based vertical standards are compatibility standards for describing business processes and data formats in specific industries that have emerged in the past decade. Vertical standards, typically implemented using eXtensible Markup Language (XML), are incomplete products in constant evolution, continually adding functionality to reflect changing business needs. Vertical standards are public goods because they are freely obtained from sponsoring organizations without investing resources in their development, which gives rise to linked collective action dilemmas at the development and diffusion stages. Firms must be persuaded to invest in development without being able to profit from the output, and a commitment to ensure the diffusion of the standard must be secured from enough potential adopters to guarantee success. In this paper, the authors explore organizational drivers for participation in vertical standards development activities for supply- and demand-side organizations (i.e., vendors and end-user firms) in light of the restrictions imposed by these dilemmas.


BMJ Open ◽  
2019 ◽  
Vol 9 (11) ◽  
pp. e033237 ◽  
Author(s):  
Owen Taylor ◽  
Sandrine Loubiere ◽  
Aurelie Tinland ◽  
Maria Vargas-Moniz ◽  
Freek Spinnewijn ◽  
...  

Objectives: To examine the lifetime, 5-year and past-year prevalence of homelessness among European citizens in eight European nations. Design: A nationally representative telephone survey using trained bilingual interviewers and computer-assisted telephone interview software. Setting: The study was conducted in France, Ireland, Italy, the Netherlands, Poland, Portugal, Spain and Sweden. Participants: European adult citizens, selected from opt-in panels from March to December 2017. Total desired sample size was 5600, with 700 per country. Expected response rates of approximately 30% led to initial sample sizes of 2500 per country. Main outcome measures: History of homelessness was assessed for lifetime, past 5 years and past year. Sociodemographic data were collected to assess correlates of homelessness prevalence using generalised linear models for clustered and weighted samples. Results: Response rates ranged from 30.4% to 33.5% (n=5631). Homelessness prevalence was 4.96% for lifetime (95% CI 4.39% to 5.59%), 1.92% in the past 5 years (95% CI 1.57% to 2.33%) and 0.71% for the past year (95% CI 0.51% to 0.98%) and varied significantly between countries (pairwise comparison difference test, p<0.0001). Time spent homeless ranged between less than a week (21%) and more than a year (18%), with high contrasts between countries (p<0.0001). Male gender, age 45–54, lower secondary education, single status, unemployment and an urban environment were all independently strongly associated with lifetime homelessness (all OR >1.5). Conclusions: The prevalence of homelessness among the surveyed nations is significantly higher than might be expected from point-in-time and homeless service use statistics. There was substantial variation in estimated prevalence across the eight nations. Coupled with the well-established health impacts of homelessness, medical professionals need to be aware of the increased health risks of those with experience of homelessness. These findings support policies aiming to improve health services for people exposed to homelessness.


1971 ◽  
Vol 11 ◽  
pp. 185-187
Author(s):  
J. B. Hutchings

In the past several years many, if not most, observatory microphotometers have been made or adapted to give digital output. At the same time, astronomers have been faced with the task of specifying exactly all the folklore and experience necessary to reduce microphotometer output, so that the task can be performed by a computer. In addition to the speed and accuracy gains, therefore, we have also been obliged to separate the subjective from the objective in this processing, and this may be the most important improvement of all. Knowing that many other observatories have developed similar programming systems, this paper presumably does not contain much that is startlingly new, but it is hoped that it may promote the exchange of ideas desirable to overcome common difficulties and avoid duplication (or multiplication) of effort.


2016 ◽  
Vol 9 (9) ◽  
pp. 139 ◽  
Author(s):  
Katsunori Kotani ◽  
Takehiko Yoshimi ◽  
Hiroaki Nanjo ◽  
Hitoshi Isahara

In order to develop effective teaching methods and computer-assisted language teaching systems for learners of English as a foreign language who need to study the basic linguistic competences for writing, pronunciation, reading, and listening, it is necessary to first investigate which vocabulary and grammar they have or have not yet learned. Identifying such vocabulary and grammar requires a learner corpus for analyzing the accuracy and fluency of learners’ linguistic competences. However, it is difficult to use previous learner corpora for this purpose because they have not compiled all the types of linguistic data that we need. Therefore, this study aimed to solve this problem by designing and developing a new learner corpus that compiles linguistic data regarding the accuracy and fluency of the four basic linguistic competences of writing, pronunciation, reading, and listening. The reliability and validity of the learner corpus were partially confirmed, and practical application of the learner corpus is reported here as case studies.

