Automatically Measuring Question Authenticity in Real-World Classrooms

2018 ◽  
Vol 47 (7) ◽  
pp. 451-464 ◽  
Author(s):  
Sean Kelly ◽  
Andrew M. Olney ◽  
Patrick Donnelly ◽  
Martin Nystrand ◽  
Sidney K. D’Mello

Analyzing the quality of classroom talk is central to educational research and improvement efforts. In particular, the presence of authentic teacher questions, where answers are not predetermined by the teacher, helps constitute and serves as a marker of productive classroom discourse. Further, authentic questions can be cultivated to improve teaching effectiveness and, consequently, student achievement. Unfortunately, current methods to measure question authenticity do not scale because they rely on human observations or coding of teacher discourse. To address this challenge, we set out to use automatic speech recognition, natural language processing, and machine learning to train computers to detect authentic questions in real-world classrooms automatically. Our methods were iteratively refined using classroom audio and human-coded observational data from two sources: (a) a large archival database of text transcripts of 451 observations from 112 classrooms; and (b) a newly collected sample of 132 high-quality audio recordings from 27 classrooms, obtained under technical constraints that anticipate large-scale automated data collection and analysis. Correlations between human-coded and computer-coded authenticity at the classroom level were sufficiently high (r = .602 for archival transcripts and r = .687 for audio recordings) to provide a valuable complement to human coding in research efforts.
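As a minimal sketch of the validation step reported above, the classroom-level agreement between human and machine coding can be computed as a Pearson correlation; the per-classroom proportions below are invented placeholders, not the study's data.

```python
# Sketch: classroom-level correlation between human-coded and computer-coded
# question authenticity. The proportions are hypothetical illustrations.
from scipy.stats import pearsonr

human_coded    = [0.32, 0.18, 0.45, 0.27, 0.51, 0.09, 0.38]  # per classroom
computer_coded = [0.29, 0.22, 0.41, 0.31, 0.47, 0.14, 0.35]

r, p_value = pearsonr(human_coded, computer_coded)
print(f"classroom-level correlation: r = {r:.3f} (p = {p_value:.3f})")
```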

Author(s):  
Cao Liu ◽  
Shizhu He ◽  
Kang Liu ◽  
Jun Zhao

Because they provide natural language responses, natural answers are favored in real-world Question Answering (QA) systems. Generative models learn to automatically generate natural answers from large-scale question-answer pairs (QA-pairs). However, they suffer from the uncontrollable and uneven quality of QA-pairs crawled from the Internet. To address this problem, we propose a curriculum learning based framework for natural answer generation (CL-NAG), which is able to take full advantage of the valuable learning data in a noisy, uneven-quality corpus. Specifically, we employ two practical measures to automatically assess the quality (complexity) of QA-pairs. Based on these measurements, CL-NAG first uses simple, low-quality QA-pairs to learn a basic model, and then gradually learns to produce better answers, with richer content and more complete syntax, from more complex, higher-quality QA-pairs. In this way, all valuable information in the noisy, uneven-quality corpus can be fully exploited. Experiments demonstrate that CL-NAG outperforms the state of the art, improving accuracy by 6.8% and 8.7% on simple and complex questions, respectively.
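The abstract does not detail the paper's quality measures or generation model; the sketch below only illustrates the curriculum idea under stated assumptions: score each QA-pair with a stand-in complexity proxy, order the corpus from simple to complex, and train on progressively larger slices. `score` and `train_one_epoch` are hypothetical placeholders.

```python
# Illustrative curriculum-learning loop over QA-pairs; not CL-NAG's actual
# implementation. Answer length stands in for the paper's quality measures.
def score(pair):
    question, answer = pair
    return len(answer.split())  # crude proxy: longer answer = more complex

def train_one_epoch(model, pairs):
    """Hypothetical placeholder: plug in any seq2seq training step."""
    pass

def curriculum_train(model, qa_pairs, stages=3, epochs_per_stage=2):
    ordered = sorted(qa_pairs, key=score)  # simple -> complex
    for stage in range(1, stages + 1):
        subset = ordered[: len(ordered) * stage // stages]  # grow the slice
        for _ in range(epochs_per_stage):
            train_one_epoch(model, subset)
    return model
```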


1992 ◽  
Vol 338 (1285) ◽  
pp. 329-334

The purpose of this contribution is to summarize the papers and discussions, to bring out the highlights, and to focus on outstanding problems and uncertainties. Sixteen years ago Sir Vivian Fuchs and I organized a similar meeting on research in the Antarctic. Since then there has been an explosion of interest in all branches of environmental science in this region. There have been major advances in theory, and improved technology made possible by the rapid development of electronics has made data collection and analysis easier; but above all the difference between the two meetings is in the development of large-scale numerical modelling as a tool. There has also been an increasing realization of the value of comparisons between the two polar regions, which is brought out by the contributions to this meeting. The meeting has been distinguished by the quality of the science, the clarity of exposition and the excellent visual presentations. It is also striking how much cross-fertilization between disciplines has occurred.


2007 ◽  
Vol 3 (6) ◽  
pp. 603-606
Author(s):  
Dale Joachim ◽  
Eben Goodale

Playback is an important method of surveying animals, assessing habitats and studying animal communication. However, conventional playback methods require on-site observers and therefore become labour-intensive when covering large areas. Such limitations could be circumvented by the use of cellular telephony, a ubiquitous technology with increasing biological applications. In addressing concerns about the low audio quality of cellular telephones, this paper presents experimental data to show that owls of two species (Strix varia and Megascops asio) respond similarly to calls played through cellular telephones as to calls played through conventional playback technology. In addition, the telephone audio recordings are of sufficient quality to detect most of the two owl species' responses. These findings are a first important step towards large-scale applications where networks of cellular phones conduct real-time monitoring tasks.


Author(s):  
Carla Marchetti ◽  
Massimo Mecella ◽  
Monica Scannapieco ◽  
Antonino Virgillito

A Cooperative Information System (CIS) is a large-scale information system that interconnects the systems of different, autonomous organizations that are geographically distributed and share common objectives (De Michelis et al., 1997). Among the resources that organizations share, data are fundamental; in real-world scenarios, organization A may not request data from organization B if it does not trust B's data (i.e., if A does not know that the quality of the data B can provide is high). As an example, in an e-government scenario in which public administrations cooperate to fulfill service requests from citizens and enterprises (Batini & Mecella, 2001), administrations very often prefer asking citizens for data rather than requesting it from other administrations that store the same data, because the quality of those data is not known. Therefore, a lack of cooperation may occur due to a lack of quality certification.


2019 ◽  
Vol 3 (1) ◽  
pp. 63-86 ◽  
Author(s):  
Yanan Wang ◽  
Jianqiang Li ◽  
Sun Hongbo ◽  
Yuan Li ◽  
Faheem Akhtar ◽  
...  

Purpose: Simulation is a well-known technique for using computers to imitate the operations of various kinds of real-world facilities or processes. The facility or process of interest is usually called a system, and to study it scientifically we often have to make a set of assumptions about how it works. These assumptions, which usually take the form of mathematical or logical relationships, constitute a model that is used to gain some understanding of how the corresponding system behaves; the quality of that understanding depends essentially on the credibility of the assumptions or model, the assessment of which is known as VV&A (verification, validation and accreditation). The main purpose of this paper is to present an in-depth theoretical review and analysis of the application of VV&A to large-scale simulations.
Design/methodology/approach: After summarizing related VV&A research, the relevant standards, frameworks, techniques, methods and tools are discussed in light of the characteristics of large-scale simulations (such as crowd network simulations).
Findings: The contributions of this paper will be useful to both academics and practitioners formulating VV&A for large-scale simulations.
Originality/value: This paper helps researchers by providing recommendations for formulating VV&A in large-scale simulations (such as crowd network simulations).


10.2196/20545 ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. e20545
Author(s):  
Paul J Barr ◽  
James Ryan ◽  
Nicholas C Jacobson

COVID-19 cases are increasing exponentially worldwide; however, the clinical phenotype of the disease remains unclear. Natural language processing (NLP) and machine learning approaches may yield key methods to rapidly identify individuals at high risk of COVID-19 and to understand key symptoms upon clinical manifestation and presentation. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. In this scenario, clinicians may focus on documenting widely reported symptoms that indicate a confirmed diagnosis of COVID-19, at the expense of infrequently reported symptoms. While NLP solutions can play a key role in generating clinical phenotypes of COVID-19, they are constrained by the limitations of data in electronic health records (EHRs). A comprehensive record of clinic visits is required, and audio recordings may be the answer: a recording of a clinic visit represents a more complete record of patient-reported symptoms. If done at scale, a combination of EHR data and recordings of clinic visits can be used to power NLP and machine learning models, thus rapidly generating a clinical phenotype of COVID-19. We propose a pipeline extending from audio or video recordings of clinic visits to a model that factors in clinical symptoms and predicts COVID-19 incidence. With vast amounts of available data, we believe that a prediction model can be rapidly developed to promote the accurate screening of individuals at high risk of COVID-19 and to identify patient characteristics that predict a greater risk of more severe infection. If clinical encounters are recorded and our NLP model is adequately refined, benchtop virologic findings would be better informed. While clinic visit recordings are not a panacea for this pandemic, they are a low-cost option with many potential benefits, which have recently begun to be explored.
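The proposed pipeline is described only at a high level; as a hedged sketch of its NLP stage alone, once visits are transcribed, even a simple bag-of-words classifier over transcript text could serve as a baseline risk screen. The transcripts and labels below are invented placeholders, not the authors' model or data.

```python
# Sketch of a baseline NLP stage: classify already-transcribed clinic-visit
# text for COVID-19 risk. All data here are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

transcripts = [
    "patient reports fever, dry cough, and loss of smell",
    "follow-up for knee pain after a running injury",
    "shortness of breath and fatigue for three days",
    "routine blood pressure check, no new complaints",
]
labels = [1, 0, 1, 0]  # 1 = high COVID-19 risk (hypothetical labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(transcripts, labels)
print(model.predict(["new loss of taste and persistent cough"]))
```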


2021 ◽  
pp. 193229682110008
Author(s):  
Alexander Turchin ◽  
Luisa F. Florez Builes

Background: Real-world evidence research plays an increasingly important role in diabetes care. However, a large fraction of real-world data are "locked" in narrative format. Natural language processing (NLP) technology offers a solution for the analysis of narrative electronic data.
Methods: We conducted a systematic review of studies of NLP technology focused on diabetes. Articles published prior to June 2020 were included.
Results: We included 38 studies in the analysis. The majority (24; 63.2%) described only the development of NLP tools; the remainder used NLP tools to conduct clinical research. A large fraction (17; 44.7%) of the studies focused on identification of patients with diabetes; the rest covered a broad range of subjects, including hypoglycemia, lifestyle counseling, diabetic kidney disease, insulin therapy, and others. The mean F1 score across all studies where it was reported was 0.882. It tended to be lower (0.817) in studies of more linguistically complex concepts. Seven studies reported findings with potential implications for improving the delivery of diabetes care.
Conclusion: Research in NLP technology to study diabetes is growing quickly, although challenges (e.g., in the analysis of more linguistically complex concepts) remain. Its potential to deliver evidence on treatment and on improving the quality of diabetes care is demonstrated by a number of studies. Further growth in this area would be aided by deeper collaboration between the developers and end users of NLP tools, as well as by broader sharing of the tools themselves and related resources.
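For reference, the F1 score reported throughout this review is the harmonic mean of precision and recall; a short worked example with invented counts, chosen only to land near the review's mean of 0.882:

```python
# F1 = harmonic mean of precision and recall. The counts are hypothetical:
# a tool flags 90 of 100 true diabetes patients, with 78 of 90 flags correct.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

precision = 78 / 90   # ~0.867: fraction of flagged patients truly positive
recall = 90 / 100     # 0.90: fraction of true positives that were flagged
print(f"F1 = {f1(precision, recall):.3f}")  # -> F1 = 0.883
```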


2020 ◽  
Vol 34 (05) ◽  
pp. 9346-9353
Author(s):  
Bingcong Xue ◽  
Sen Hu ◽  
Lei Zou ◽  
Jiashu Cheng

Paraphrases, i.e., differing textual realizations of the same meaning, have proven useful for many natural language processing (NLP) applications. Collecting paraphrases for predicates in knowledge bases (KBs) is key to comprehending the RDF triples in KBs. Existing works have published paraphrase datasets automatically extracted from large corpora, but these contain too many redundant pairs or do not cover enough predicates, shortcomings that cannot be remedied by automatic methods alone and require human input. This paper presents a full process for collecting large-scale, high-quality paraphrase dictionaries for predicates in knowledge bases, one that takes advantage of existing datasets and combines machine mining with crowdsourcing. Our dataset comprises 2284 distinct predicates in DBpedia and 31130 paraphrase pairs in total, a substantial leap in quality over previous works. We then demonstrate that such paraphrase dictionaries can greatly help natural language processing tasks such as question answering and language generation. We also publish our dictionary for further research.
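The paper's mining and crowdsourcing pipeline is not reproduced here; as an assumed illustration of the machine-mining side, one simple step is to drop near-duplicate paraphrase candidates for a predicate before sending them for crowd validation:

```python
# Illustrative pre-filter (an assumed step, not the paper's procedure):
# remove near-duplicate paraphrase candidates for a predicate.
from difflib import SequenceMatcher

def dedupe(candidates, threshold=0.8):
    kept = []
    for phrase in candidates:
        # Keep a phrase only if it is not too similar to anything kept so far.
        if all(SequenceMatcher(None, phrase, k).ratio() < threshold
               for k in kept):
            kept.append(phrase)
    return kept

candidates = ["was born in", "is born in", "birthplace is", "was born at"]
print(dedupe(candidates))  # -> ['was born in', 'birthplace is']
```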


AI Magazine ◽  
2020 ◽  
Vol 41 (3) ◽  
pp. 18-27
Author(s):  
Mikhail Burtsev ◽  
Varvara Logacheva

Development of conversational systems is one of the most challenging tasks in natural language processing, and it is especially hard in the case of open-domain dialogue. The main factors that hinder progress in this area are the lack of training data and the difficulty of automatic evaluation. Thus, to reliably evaluate the quality of such models, one needs to resort to time-consuming and expensive human evaluation. We tackle these problems by organizing the Conversational Intelligence Challenge (ConvAI), an open competition of dialogue systems. Our goals are threefold: to work out a good design for human evaluation of open-domain dialogue, to grow an open-source code base for conversational systems, and to harvest and publish new datasets. Over the course of the ConvAI1 and ConvAI2 competitions, we developed a framework for evaluating chatbots in messaging platforms and used it to evaluate over 30 dialogue systems in two conversational tasks: discussion of short text snippets from Wikipedia and personalized small talk. These large-scale evaluation experiments were performed by recruiting volunteers as well as paid workers. As a result, we succeeded in collecting a dataset of around 5,000 long, meaningful human-to-bot dialogues and gained many insights into the organization of human evaluation. This dataset can be used to train an automatic evaluation model or to improve the quality of dialogue systems. Our analysis of the ConvAI1 and ConvAI2 competitions shows that future work in this area should center on more active participation of volunteers in the assessment of dialogue systems. To achieve that, we plan to make the evaluation setup more engaging.

