A Data-Driven Metric of Hardness for WSC Sentences

Mapping Intimacies ◽

10.29007/398z ◽

2018 ◽

Author(s):

Nicos Isaak ◽

Loizos Michael

Keyword(s):

Large Scale ◽

Native Speakers ◽

Turing Test ◽

Data Driven ◽

Training Material ◽

Commonsense Knowledge ◽

Shallow Parsing ◽

Scale Experiment ◽

Large Corpus

The Winograd Schema Challenge (WSC), the task of resolving pronouns in certain sentences where shallow parsing techniques seem not to be directly applicable, has been proposed as an alternative to the Turing Test. According to Levesque, having access to a large corpus of text would likely not help much in the WSC. Among a number of attempts to tackle this challenge, one particular approach has demonstrated the plausibility of using commonsense knowledge automatically acquired from raw text in English Wikipedia. Here, we present the results of a large-scale experiment that shows how the performance of that particular automated approach varies with the availability of training material. We compare the results of this experiment with two studies: one from the literature that investigates how adult native speakers tackle the WSC, and one that we design and undertake to investigate how teenage non-native speakers tackle the WSC. We find that the performance of the automated approach correlates positively with the performance of humans, suggesting that the performance of the particular automated approach could be used as a metric of hardness for WSC instances.

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽

10.3390/a14050154 ◽

2021 ◽

Vol 14 (5) ◽

pp. 154

Author(s):

Marcus Walldén ◽

Masao Okita ◽

Fumihiko Ino ◽

Dimitris Drikakis ◽

Ioannis Kokkinakis

Keyword(s):

Large Scale ◽

Data Driven ◽

Data Sets ◽

Output Constraints ◽

Data Driven Approach ◽

Scientific Simulations ◽

Multiple Metrics ◽

In Transit ◽

Multiple Compression ◽

Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can expeditiously identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

International Journal of Artificial Intelligence in Education ◽

10.1007/s40593-021-00267-x ◽

2021 ◽

Author(s):

Ekaterina Kochmar ◽

Dung Do Vu ◽

Robert Belfer ◽

Varun Gupta ◽

Iulian Vlad Serban ◽

...

Keyword(s):

Machine Learning ◽

Student Performance ◽

Language Processing ◽

Intelligent Tutoring Systems ◽

Large Scale ◽

Intelligent Tutoring ◽

Performance Outcomes ◽

Data Driven ◽

Personalized Feedback ◽

Tutoring Systems

Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need for expert intervention and the design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.

Data-Driven Energy Use Estimation in Large Scale Transportation Networks

Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities - SCC '19 ◽

10.1145/3357492.3358632 ◽

2019 ◽

Author(s):

Bin Wang ◽

Cy Chan ◽

Divya Somasi ◽

Jane Macfarlane ◽

Eric Rask

Keyword(s):

Large Scale ◽

Energy Use ◽

Transportation Networks ◽

Data Driven

Improving the management of type 2 diabetes through large-scale general practice: the role of a data-driven and technology-enabled education programme

BMJ Open Quality ◽

10.1136/bmjoq-2020-001087 ◽

2021 ◽

Vol 10 (1) ◽

pp. e001087

Author(s):

Tarek F Radwan ◽

Yvette Agyako ◽

Alireza Ettefaghian ◽

Tahira Kamran ◽

Omar Din ◽

...

Keyword(s):

Type 2 Diabetes ◽

Primary Care ◽

Large Scale ◽

Education Programme ◽

Educational Programme ◽

Data Driven ◽

Treatment Targets ◽

Care Processes ◽

Data Driven Approach

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices working with a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions was co-designed by a team of primary care clinicians, educationalists and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at scale and using methods which ensured sustainability. This programme was used to optimise the management of T2D through improving the eight care processes and three treatment targets which form part of the annual national diabetes audit for patients with T2D. With the implemented improvement interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes. Achievement of all eight care processes improved by 46.0% (p<0.001) while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven large-scale multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.

Data-Driven Lightweight Interest Point Selection for Large-Scale Visual Search

IEEE Transactions on Multimedia ◽

10.1109/tmm.2018.2818012 ◽

2018 ◽

Vol 20 (10) ◽

pp. 2774-2787 ◽

Cited By ~ 2

Author(s):

Feng Gao ◽

Xinfeng Zhang ◽

Yicheng Huang ◽

Yong Luo ◽

Xiaoming Li ◽

...

Keyword(s):

Visual Search ◽

Large Scale ◽

Data Driven ◽

Interest Point ◽

Point Selection ◽

Selection For

One small step for MIP towards automated metaphor identification?

Metaphor and the Social World ◽

10.1075/msw.3.1.04dor ◽

2013 ◽

Vol 3 (1) ◽

pp. 77-99 ◽

Cited By ~ 7

Author(s):

Aletta G. Dorst ◽

W. Gudrun Reijnierse ◽

Gemma Venhuizen

Keyword(s):

Large Scale ◽

Chemical Processes ◽

Small Step ◽

Lexical Unit ◽

Basic Meaning ◽

General Rules ◽

Natural Discourse ◽

Authentic Data ◽

Large Corpus

The manual annotation of large corpora is time-consuming and brings about issues of consistency. This paper aims to demonstrate how general rules for determining basic meanings can be formulated in large-scale projects involving multiple analysts applying MIP(VU) to authentic data. Three sets of problematic lexical units — chemical processes, colours, and sharp objects — are discussed in relation to the question of how the basic meaning of a lexical unit can be determined when human and non-human senses compete as candidates for the basic meaning; these analyses can therefore be considered a detailed case study of problems encountered during step 3.b. of MIP(VU). The analyses show how these problematic cases were tackled in a large corpus clean-up project in order to streamline the annotations and ensure a greater consistency of the corpus. In addition, this paper will point out how the formulation of general identification rules and guidelines could provide a first step towards the automatic detection of linguistic metaphors in natural discourse.

Marsh rabbit mortalities tie pythons to the precipitous decline of mammals in the Everglades

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2015.0120 ◽

2015 ◽

Vol 282 (1805) ◽

pp. 20150120 ◽

Cited By ~ 39

Author(s):

Robert A. McCleery ◽

Adia Sovie ◽

Robert N. Reed ◽

Mark W. Cunningham ◽

Margaret E. Hunter ◽

...

Keyword(s):

National Park ◽

Large Scale ◽

Ongoing Debate ◽

Faunal Communities ◽

Python Molurus ◽

Scale Experiment ◽

Precipitous Decline ◽

Python Molurus Bivittatus ◽

The Impact ◽

Ecological Functioning

To address the ongoing debate over the impact of invasive species on native terrestrial wildlife, we conducted a large-scale experiment to test the hypothesis that invasive Burmese pythons (Python molurus bivittatus) were a cause of the precipitous decline of mammals in Everglades National Park (ENP). Evidence linking pythons to mammal declines has been indirect and there are reasons to question whether pythons, or any predator, could have caused the precipitous declines seen across a range of mammalian functional groups. Experimentally manipulating marsh rabbits, we found that pythons accounted for 77% of rabbit mortalities within 11 months of their translocation to ENP and that python predation appeared to preclude the persistence of rabbit populations in ENP. On control sites, outside of the park, no rabbits were killed by pythons and 71% of attributable marsh rabbit mortalities were classified as mammal predations. Burmese pythons pose a serious threat to the faunal communities and ecological functioning of the Greater Everglades Ecosystem, which will probably spread as python populations expand their range.
