Benchmarking Text Understanding Systems to Human Performance: An Exploration.

1990 ◽  
Author(s):  
Frances A. Butler ◽  
Eva L. Baker ◽  
Tine Falk ◽  
Howard Herl ◽  
Younghee Jang ◽  
...  
2020 ◽  
Vol 34 (05) ◽  
pp. 7554-7561
Author(s):  
Pengxiang Cheng ◽  
Katrin Erk

Recent progress in NLP has seen the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and on a range of end tasks such models have achieved state-of-the-art results, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning, where surface-level cues are not enough, there is still a large gap between pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting a new state of the art while containing only a tiny fraction of GPT-2's parameters. We also conduct a thorough analysis of different model architecture variants and supervision configurations, suggesting future directions for applying similar techniques to other problems.
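The abstract names coreference as auxiliary supervision for self-attention but does not give the loss. One common way to express that idea, in the spirit of the supervised attention of Strubell et al. (2018), is to penalize an attention head for placing little probability mass on a token's coreferent antecedents. The following is a minimal, framework-free sketch under that assumption. The function names, the "summed antecedent mass" objective, and the `weight` parameter are hypothetical and are not taken from the authors' model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coref_attention_loss(scores, antecedents):
    """Auxiliary loss for one query token in one supervised attention head.

    scores: unnormalized attention logits from the query token to each
        visible position (hypothetical input).
    antecedents: indices of positions that the coreference annotation marks
        as mentions of the same entity (hypothetical input).
    Returns the negative log of the total attention mass on those positions,
    or 0.0 when the token has no annotated antecedent.
    """
    if not antecedents:
        return 0.0
    probs = softmax(scores)
    mass = sum(probs[i] for i in antecedents)
    return -math.log(mass)


def total_loss(lm_loss, aux_losses, weight=1.0):
    """Language-modeling loss plus the weighted mean auxiliary loss (assumed combination)."""
    if not aux_losses:
        return lm_loss
    return lm_loss + weight * sum(aux_losses) / len(aux_losses)


# Toy usage: uniform logits over four positions, two of which corefer,
# so half the attention mass is on antecedents and the loss is log(2), about 0.693.
example = coref_attention_loss([0.0, 0.0, 0.0, 0.0], [0, 3])
```

Summing probability over all gold antecedents, rather than requiring a single target position, lets the head spread attention across any earlier mention of the entity; this is a design assumption of the sketch.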


Author(s):  
Johannes Welbl ◽  
Pontus Stenetorp ◽  
Sebastian Riedel

Most reading comprehension methods limit themselves to queries that can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently no resources exist to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence, effectively performing multihop (also called multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise strategies to circumvent them. We evaluate two previously proposed competitive models and find that one of them can integrate information across documents. However, both models struggle to select relevant information, and providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 54.5% on an annotated test set, compared to human performance at 85.0%, leaving ample room for improvement.
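The abstract does not spell out how documents are paired with a query. One natural framing is a bipartite graph in which each document is linked to the entities it mentions. A search can start from the query's subject entity and stop once a visited document mentions the answer, so the answer is reached only by chaining documents. The following sketch assumes that framing. The function name, the `max_docs` cap, and the toy entity sets are illustrative, and the sketch should not be read as the authors' exact pipeline.

```python
from collections import deque


def collect_support_documents(subject, answer, doc_entities, max_docs=10):
    """Gather documents that link a query subject to its answer (illustrative).

    subject: entity named in the query (hypothetical input).
    answer: gold answer entity.
    doc_entities: mapping from document id to the set of entities it mentions.
    Runs breadth-first search over the bipartite document-entity graph and
    returns the visited documents in order if a document mentioning the
    answer is reached within max_docs documents; otherwise returns None,
    meaning the sample would be discarded.
    """
    # Invert the mapping so every document mentioning an entity can be found.
    entity_docs = {}
    for doc, ents in doc_entities.items():
        for ent in ents:
            entity_docs.setdefault(ent, set()).add(doc)

    visited_docs = []
    seen_docs = set()
    seen_entities = {subject}
    queue = deque([subject])
    while queue and len(visited_docs) < max_docs:
        entity = queue.popleft()
        # Sorting keeps the traversal deterministic.
        for doc in sorted(entity_docs.get(entity, ())):
            if doc in seen_docs:
                continue
            seen_docs.add(doc)
            visited_docs.append(doc)
            if answer in doc_entities[doc]:
                return visited_docs
            if len(visited_docs) >= max_docs:
                break
            # Expand to entities mentioned in this document (the next hop).
            for ent in sorted(doc_entities[doc]):
                if ent not in seen_entities:
                    seen_entities.add(ent)
                    queue.append(ent)
    return None
```

As a toy illustration, take `{"d1": {"Hanging Gardens", "Mumbai"}, "d2": {"Mumbai", "India"}, "d3": {"Paris", "France"}}`. With that input, `collect_support_documents("Hanging Gardens", "India", ...)` returns `["d1", "d2"]`: neither document alone links the subject to the answer.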


2008 ◽  
Vol 44 ◽  
pp. 11-26 ◽  
Author(s):  
Ralph Beneke ◽  
Dieter Böning

Human performance, defined by mechanical resistance and distance per time, includes human, task and environmental factors, all of which are interrelated. It requires metabolic energy provided by anaerobic and aerobic sources. These sources have specific limitations in the capacity and rate at which they provide re-phosphorylation energy, which determines individual ratios of aerobic and anaerobic metabolic power and their sustainability. In healthy athletes, the limits to providing and utilizing metabolic energy are multifactorial and carefully matched, and they include a safety margin that protects the integrity of the human organism under maximal effort. Perception of afferent input associated with effort leads to conscious or unconscious decisions to modulate or terminate performance; however, the underlying mechanisms of cerebral control are not fully understood. The idea of pushing the limits of performance with biochemicals is two millennia old. Biochemical findings have produced highly effective substances that are widely used to increase performance in daily life, during preparation for sport events and during competition, but many of them must be considered doping and are therefore illegal. Supplements and food have ergogenic potential; however, many of these concepts remain controversial with respect to legality and, in particular, to the evidence on their usefulness and risks. The effect of evidence-based nutritional strategies on the adaptations in gene and protein expression that occur in skeletal muscle during and after exercise training sessions is largely unknown. Biochemical research is essential for a better understanding of the basic mechanisms that cause fatigue and of the regulation of the dynamic adaptation to physical and mental training.


2004 ◽  
Vol 171 (4S) ◽  
pp. 496-497
Author(s):  
Edward D. Matsumoto ◽  
George V. Kondraske ◽  
Lucas Jacomides ◽  
Kenneth Ogan ◽  
Margaret S. Pearle ◽  
...  

2015 ◽  
Vol 31 (1) ◽  
pp. 20-30 ◽  
Author(s):  
William S. Helton ◽  
Katharina Näswall

Conscious appraisals of stress, or stress states, are an important aspect of human performance. This article presents evidence supporting the validity and measurement characteristics of a short multidimensional self-report measure of stress state, the Short Stress State Questionnaire (SSSQ; Helton, 2004). The SSSQ measures task engagement, distress, and worry. A confirmatory factor analysis of the SSSQ using data pooled from multiple samples suggests that the SSSQ does have a three-factor structure and that post-task changes are due not to changes in factor structure but to mean-level changes (state changes). In addition, the SSSQ demonstrates sensitivity to task stressors in line with hypotheses. Different task conditions elicited unique patterns of stress state on the three factors of the SSSQ, in line with prior predictions. The 24-item SSSQ is a valid measure of stress state that may be useful to researchers interested in conscious appraisals of task-related stress.
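The abstract attributes post-task differences to mean-level (state) changes on the three SSSQ factors. A minimal way to compute such changes is to average each factor's items before and after a task and then subtract. The following sketch assumes a hypothetical mapping of eight consecutive items per factor and one numeric rating per item. The published questionnaire defines the actual item assignments and response scale. Any reverse-keyed items in the real instrument would need recoding first, and this sketch does not handle them.

```python
from statistics import mean

# Hypothetical item-to-factor mapping (0-based item indices). The abstract
# says the 24-item SSSQ measures three factors but does not list which
# items load on which factor.
FACTORS = {
    "engagement": range(0, 8),
    "distress": range(8, 16),
    "worry": range(16, 24),
}


def factor_scores(responses):
    """Average the item ratings belonging to each factor for one administration."""
    if len(responses) != 24:
        raise ValueError("expected 24 item responses")
    return {name: mean(responses[i] for i in items) for name, items in FACTORS.items()}


def state_change(pre, post):
    """Post-task minus pre-task factor means (positive values mean an increase)."""
    before, after = factor_scores(pre), factor_scores(post)
    return {name: after[name] - before[name] for name in FACTORS}
```

For example, consider a respondent who rates every item 2 before the task. After the task, that respondent rates the engagement items 3, the distress items 2, and the worry items 1. `state_change` then reports +1 engagement, 0 distress, and -1 worry.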

