Morpheme Ordering Across Languages Reflects Optimization for Processing Efficiency

Open Mind ◽  
2022 ◽  
pp. 1-25
Author(s):  
Michael Hahn ◽  
Rebecca Mathew ◽  
Judith Degen

Abstract The ordering of morphemes in a word displays well-documented regularities across languages. Previous work has explained these in terms of notions such as semantic scope, relevance, and productivity. Here, we test a recently formulated processing theory of the ordering of linguistic units, the efficient tradeoff hypothesis (Hahn et al., 2021). The claim of the theory is that morpheme ordering can partly be explained by the optimization of a tradeoff between memory and surprisal. This claim has received initial empirical support from two languages. In this work, we test this idea more extensively using data from four additional agglutinative languages with significant amounts of morphology, and by considering nouns in addition to verbs. We find that the efficient tradeoff hypothesis predicts ordering in most cases with high accuracy, and accounts for cross-linguistic regularities in noun and verb inflection. Our work adds to a growing body of work suggesting that many ordering properties of language arise from a pressure for efficient language processing.
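The memory-surprisal tradeoff at the heart of this abstract can be illustrated with a toy sketch (this is an assumed simplification, not the authors' implementation): estimate the conditional entropy of the next morpheme given k morphemes of context. Orders for which entropy drops quickly as k grows allow low surprisal with little memory.

```python
import math
from collections import Counter

def conditional_entropy(corpus, k):
    """Empirical H(next morpheme | previous k morphemes), in bits, with <s> padding."""
    ctx_counts, joint_counts = Counter(), Counter()
    for seq in corpus:
        padded = ("<s>",) * k + tuple(seq)
        for i in range(k, len(padded)):
            ctx = padded[i - k:i]
            ctx_counts[ctx] += 1
            joint_counts[ctx + (padded[i],)] += 1
    total = sum(joint_counts.values())
    h = 0.0
    for gram, n in joint_counts.items():
        # p(ctx, next) * -log2 p(next | ctx), summed over observed n-grams
        h -= (n / total) * math.log2(n / ctx_counts[gram[:k]])
    return h

# Hypothetical agglutinative verb forms: stem + causative + passive + tense.
corpus = [
    ("stem", "caus", "pass", "past"),
    ("stem", "caus", "past"),
    ("stem", "pass", "past"),
    ("stem", "past"),
]
h0 = conditional_entropy(corpus, 0)  # surprisal with no memory of context
h1 = conditional_entropy(corpus, 1)  # surprisal with one morpheme of memory
```

Comparing such curves across candidate morpheme orders is the kind of computation the efficient tradeoff hypothesis relies on: an order is favored when a small amount of remembered context already yields a large surprisal reduction.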

2021 ◽  
pp. 001872672110029
Author(s):  
Yuying Lin ◽  
Mengxi Yang ◽  
Matthew J Quade ◽  
Wansi Chen

How do supervisors who treat the bottom line as more important than anything else influence team success? Drawing from social information processing theory, we explore how and when supervisor bottom-line mentality (i.e. an exclusive focus on bottom-line outcomes at the expense of other priorities) exerts influence on the bottom line itself, in the form of team performance. We argue that a supervisor’s bottom-line mentality provides significant social cues for the team that securing bottom-line objectives is of sole importance, which stimulates team performance avoidance goal orientation and thus decreases team performance. Further, we argue that performing tension (i.e. tension between contradictory needs, demands, and goals), serving as team members’ mutual perception of the confusing environment, will strengthen the indirect negative relationship between supervisor bottom-line mentality and team performance through team performance avoidance goal orientation. We conduct a path analysis using data from 258 teams in a Chinese food-chain company, which provides support for our hypotheses. Overall, our findings suggest that a supervisor’s exclusive focus on the bottom line can impede team performance. Theoretical contributions and practical implications are discussed.


Author(s):  
Falk Schwendicke ◽  
Akhilanand Chaurasia ◽  
Lubaina Arsiwala ◽  
Jae-Hong Lee ◽  
Karim Elhennawy ◽  
...  

Abstract Objectives Deep learning (DL) has been increasingly employed for automated landmark detection, e.g., for cephalometric purposes. We performed a systematic review and meta-analysis to assess the accuracy and underlying evidence for DL for cephalometric landmark detection on 2-D and 3-D radiographs. Methods Diagnostic accuracy studies published in 2015–2020 in Medline/Embase/IEEE/arXiv and employing DL for cephalometric landmark detection were identified and extracted by two independent reviewers. Random-effects meta-analysis, subgroup analysis, and meta-regression were performed, and study quality was assessed using QUADAS-2. The review was registered (PROSPERO no. 227498). Data From 321 identified records, 19 studies (published 2017–2020) were included, all employing convolutional neural networks, mainly on 2-D lateral radiographs (n=15), using data from publicly available datasets (n=12), and testing the detection of a mean of 30 (SD: 25; range: 7–93) landmarks. The reference test was established by two experts (n=11), one expert (n=4), three experts (n=3), or a set of annotators (n=1). Risk of bias was high, and applicability concerns were detected for most studies, mainly regarding data selection and reference test conduct. Landmark prediction error centered around the 2-mm error threshold (mean difference: –0.581 mm; 95% CI: –1.264 to 0.102 mm). The proportion of landmarks detected within this 2-mm threshold was 0.799 (95% CI: 0.770 to 0.824). Conclusions DL shows relatively high accuracy for detecting landmarks on cephalometric imagery. The overall body of evidence is consistent but suffers from high risk of bias. Demonstrating the robustness and generalizability of DL for landmark detection is needed. Clinical significance Existing DL models show consistent and largely high accuracy for automated detection of cephalometric landmarks. The majority of studies so far focused on 2-D imagery; data on 3-D imagery are sparse but promising.
Future studies should focus on demonstrating generalizability, robustness, and clinical usefulness of DL for this objective.
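The random-effects pooling this review reports can be sketched with the standard DerSimonian-Laird estimator (a generic textbook sketch; the review does not specify its analysis software, and the input numbers below are illustrative):

```python
def dersimonian_laird(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird random-effects model.

    Returns (pooled effect, between-study variance tau^2, standard error).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]             # fixed-effect weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)           # between-study heterogeneity
    ws = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    sws = sum(ws)
    pooled = sum(wi * yi for wi, yi in zip(ws, effects)) / sws
    return pooled, tau2, (1.0 / sws) ** 0.5

# Two hypothetical studies reporting mean landmark error offsets (mm):
pooled, tau2, se = dersimonian_laird([-0.5, -0.7], [0.04, 0.04])
```

When study effects agree, tau^2 collapses to zero and the estimator reduces to a fixed-effect inverse-variance average; heterogeneous studies inflate tau^2 and widen the pooled confidence interval.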


2021 ◽  
pp. 1-14
Author(s):  
Nicholas M. Watanabe ◽  
Hanhan Xue ◽  
Joshua I. Newman ◽  
Grace Yan

With the expansion of the esports industry, there is a growing body of literature examining the motivations and behaviors of consumers and participants. The current study advances this line of research by considering esports consumption through an economic framework, which has been underutilized in this context. Specifically, the “attention economy” is introduced as a theoretical approach—which operates with the understanding that due to increased connectivity and availability of information, it is the attention of consumers that becomes a scarce resource for which organizations must compete. Using data from the Twitch streaming platform, the results of econometric analysis further highlight the importance of structural factors in drawing attention from online viewers. As such, this research advances the theoretical and empirical understanding of online viewership behaviors, while also providing important ramifications for both esports and traditional sport organizations attempting to capture the attention of users in the digital realm.


Author(s):  
Pouneh Shabani-Jadidi

Psycholinguistics encompasses the psychology of language as well as linguistic psychology. Although they might sound similar, they are actually distinct. The first is a branch of linguistics, while the latter is a subdivision of psychology. In the psychology of language, the means are the research tools adopted from psychology and the end is the study of language. However, in linguistic psychology, the means are the data derived from linguistic studies and the end is psychology. This chapter focuses on the first of these two components; that is, the psychology of language. The goal of this chapter is to give a state-of-the-art perspective on the small but growing body of research using psycholinguistic tools to study Persian with a focus on two areas: presenting longstanding debates about the mental lexicon, language impairments and language processing; and introducing a source of data for the linguistic analysis of Persian.


2021 ◽  
Vol 72 ◽  
pp. 1385-1470
Author(s):  
Alexandra N. Uma ◽  
Tommaso Fornaciari ◽  
Dirk Hovy ◽  
Silviu Paun ◽  
Barbara Plank ◽  
...  

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation. 
All datasets and models employed in this paper are freely available as supplementary materials.
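The contrast between training from aggregated gold labels and training directly with soft labels can be made concrete with a minimal sketch (assumed illustration, not the survey's code): the soft label keeps the full distribution of annotator judgments, while aggregation keeps only the majority vote.

```python
import math
from collections import Counter

def soft_label(judgments):
    """Normalized distribution over labels from raw annotator judgments."""
    counts = Counter(judgments)
    n = len(judgments)
    return {label: c / n for label, c in counts.items()}

def aggregated_label(judgments):
    """Majority-vote ("gold") label, as used in single-label training."""
    return Counter(judgments).most_common(1)[0][0]

def soft_cross_entropy(pred_probs, target_dist):
    """Loss against a soft target; reduces to standard CE for one-hot targets."""
    return -sum(p * math.log(pred_probs.get(label, 1e-12))
                for label, p in target_dist.items())

# Hypothetical NLI item rated by five annotators:
judgments = ["entail", "entail", "neutral", "entail", "neutral"]
target = soft_label(judgments)       # keeps the 60/40 disagreement
gold = aggregated_label(judgments)   # discards it
```

Training with `target` rather than `gold` is what the survey means by learning from soft labels: the model is rewarded for reproducing the annotator distribution rather than a single adjudicated answer.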


2021 ◽  
Author(s):  
Jiaming Zeng ◽  
Michael F. Gensheimer ◽  
Daniel L. Rubin ◽  
Susan Athey ◽  
Ross D. Shachter

Abstract In medicine, randomized clinical trials (RCT) are the gold standard for informing treatment decisions. Observational comparative effectiveness research (CER) is often plagued by selection bias, and expert-selected covariates may not be sufficient to adjust for confounding. We explore how the unstructured clinical text in electronic medical records (EMR) can be used to reduce selection bias and improve medical practice. We develop a method based on natural language processing to uncover interpretable potential confounders from the clinical text. We validate our method by comparing the hazard ratio (HR) from survival analysis with and without the confounders against the results from established RCTs. We apply our method to four study cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute Research Database and show that our method adjusts the HR estimate towards the RCT results. We further confirm that the uncovered terms can be interpreted by an oncologist as potential confounders. This research helps enable more credible causal inference using data from EMRs, offers a transparent way to improve the design of observational CER, and could inform high-stake medical decisions. Our method can also be applied to studies within and beyond medicine to extract important information from observational data to support decisions.
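One ingredient of such a pipeline can be sketched in a highly simplified form (illustrative only; the paper's actual method combines richer NLP with survival analysis): flag clinical-text terms whose prevalence differs sharply between treatment arms, since imbalanced terms are candidate confounders worth adjusting for.

```python
from collections import Counter

def doc_freq(docs):
    """Fraction of documents containing each term; each doc is a token set."""
    counts = Counter()
    for d in docs:
        counts.update(set(d))
    return {t: c / len(docs) for t, c in counts.items()}

def candidate_confounders(docs_arm_a, docs_arm_b, min_gap=0.5):
    """Terms whose document frequency differs by at least min_gap between arms."""
    fa, fb = doc_freq(docs_arm_a), doc_freq(docs_arm_b)
    terms = set(fa) | set(fb)
    return sorted(t for t in terms
                  if abs(fa.get(t, 0.0) - fb.get(t, 0.0)) >= min_gap)

# Hypothetical tokenized notes: "metastasis" appears mostly in one arm,
# suggesting the arms differ in disease severity.
arm_a = [{"pain", "metastasis"}, {"metastasis", "fatigue"}, {"pain"}]
arm_b = [{"pain"}, {"fatigue"}, {"pain", "fatigue"}]
flagged = candidate_confounders(arm_a, arm_b)
```

In the paper, terms surfaced this way would then enter the survival model as covariates, which is how including text-derived confounders can move the estimated HR toward the RCT benchmark.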


2021 ◽  
Author(s):  
Fabian Braesemann ◽  
Fabian Stephany ◽  
Leonie Neuhäuser ◽  
Niklas Stoehr ◽  
Philipp Darius ◽  
...  

Abstract The global spread of Covid-19 has caused major economic disruptions. Governments around the world provide considerable financial support to mitigate the economic downturn. However, effective policy responses require reliable data on the economic consequences of the corona pandemic. We propose the CoRisk-Index: a real-time economic indicator of Covid-19-related risk assessments by industry. Using data mining, we analyse all reports from US companies filed since January 2020, representing more than a third of all US employees. We construct two measures, the number of 'corona' words in each report and the average text negativity of the sentences mentioning corona in each industry, which are aggregated in the CoRisk-Index. The index correlates with U.S. unemployment data and preempted the stock market losses of February 2020. Moreover, thanks to topic modelling and natural language processing techniques, the CoRisk data provide unique granularity with regard to the particular contexts of the crisis and the concerns of individual industries about them. The data presented here help researchers and decision makers to measure the previously unobserved risk awareness of industries with regard to Covid-19, bridging the quantification gap between highly volatile stock market dynamics and long-term macroeconomic figures. For immediate access, we provide all findings and raw data on an interactive online dashboard in real time.
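The two report-level measures can be sketched as follows (an assumed simplification: the authors' exact preprocessing and sentiment lexicon are not specified here, and `NEGATIVE_WORDS` is a tiny stand-in lexicon):

```python
import re

# Stand-in negativity lexicon (hypothetical, for illustration only).
NEGATIVE_WORDS = {"loss", "decline", "risk", "disruption", "uncertainty"}

def corisk_features(report_text):
    """Return (corona word count, mean negativity of corona-mentioning sentences)."""
    sentences = re.split(r"(?<=[.!?])\s+", report_text.lower())
    corona_sents = [s for s in sentences if "corona" in s or "covid" in s]
    n_corona = sum(s.count("corona") + s.count("covid") for s in sentences)
    if not corona_sents:
        return n_corona, 0.0

    def negativity(sentence):
        words = re.findall(r"[a-z']+", sentence)
        return sum(w in NEGATIVE_WORDS for w in words) / max(len(words), 1)

    return n_corona, sum(map(negativity, corona_sents)) / len(corona_sents)

report = ("Corona has caused major disruption to our supply chain. "
          "We expect revenue decline due to covid. "
          "Our board met in March.")
n_mentions, mean_negativity = corisk_features(report)
```

Aggregating these per-report features by industry, as the abstract describes, yields the index's two components: mention intensity and the tone of the sentences doing the mentioning.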


Author(s):  
Waleed Shakeel ◽  
Ming Lu

Deriving a reliable earthwork job cost estimate entails analysing the interaction of numerous variables defined in a highly complex and dynamic system. Using simulation to plan earthwork haul jobs delivers high accuracy in cost estimating. However, given practical limitations of time and expertise, simulation remains prohibitively expensive and is rarely applied in the construction field. The development of a pragmatic tool for field applications that mimics simulation-derived results while consuming less time was thus warranted. In this research, a spreadsheet-based analytical tool was developed using data from industry benchmark databases (such as the CAT Handbook and RSMeans). Based on a case study, the proposed methodology outperformed commonly used estimating methods and compared closely to the results obtained from simulation in controlled experiments.
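A back-of-envelope calculation of the kind such a spreadsheet tool encodes can be sketched as follows (all figures and the efficiency factor are hypothetical, not taken from the paper):

```python
def haul_unit_cost(payload_m3, load_min, haul_min, dump_min, return_min,
                   hourly_rate, n_trucks, efficiency=0.83):
    """Estimated cost per m3 hauled by a truck fleet.

    efficiency is a working-minutes-per-hour factor covering idle and
    queuing time (0.83 ~ a 50-minute hour, a common handbook assumption).
    """
    cycle_min = load_min + haul_min + dump_min + return_min
    trips_per_hour = (60.0 / cycle_min) * efficiency
    production_m3_per_hour = trips_per_hour * payload_m3 * n_trucks
    fleet_cost_per_hour = hourly_rate * n_trucks
    return fleet_cost_per_hour / production_m3_per_hour

# Hypothetical job: 10 m3 trucks, 18-minute cycle, $120/hr per truck.
cost_per_m3 = haul_unit_cost(10, 2, 8, 1, 7, 120, 3)
```

Note that in this simple model the unit cost is independent of fleet size, because loader-truck queuing is absorbed into a single static efficiency factor; capturing that interaction dynamically is precisely what simulation adds and what the paper's benchmark-calibrated tool approximates.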

