Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining

2020 ◽  
Vol 8 ◽  
pp. 810-827
Author(s):  
Ananya B. Sai ◽  
Akash Kumar Mohankumar ◽  
Siddhartha Arora ◽  
Mitesh M. Khapra

There is an increasing focus on model-based dialog evaluation metrics such as ADEM, RUBER, and the more recent BERT-based metrics. These models aim to assign a high score to all relevant responses and a low score to all irrelevant responses. Ideally, such models should be trained using multiple relevant and irrelevant responses for any given context. However, no such data is publicly available, and hence existing models are usually trained using a single relevant response and multiple randomly selected responses from other contexts (random negatives). To allow for better training and robust evaluation of model-based metrics, we introduce the DailyDialog++ dataset, consisting of (i) five relevant responses for each context and (ii) five adversarially crafted irrelevant responses for each context. Using this dataset, we first show that even in the presence of multiple correct references, n-gram-based metrics and embedding-based metrics do not perform well at separating relevant responses from even random negatives. While model-based metrics perform better than n-gram-based and embedding-based metrics on random negatives, their performance drops substantially when evaluated on adversarial examples. To check if large-scale pretraining could help, we propose a new BERT-based evaluation metric called DEB, which is pretrained on 727M Reddit conversations and then finetuned on our dataset. DEB significantly outperforms existing models, showing better correlation with human judgments and better performance on random negatives (88.27% accuracy). However, its performance again drops substantially when evaluated on adversarial responses, thereby highlighting that even large-scale pretrained evaluation models are not robust to the adversarial examples in our dataset. The dataset and code are publicly available.
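
The accuracy figures quoted above treat a metric as a classifier that should score relevant responses high and irrelevant ones low. The sketch below shows one way such an accuracy could be computed. The 0.5 decision threshold, the function name, and all score values are assumptions made for illustration; they are not taken from the paper and are not its results.

```python
def classification_accuracy(relevant_scores, irrelevant_scores, threshold=0.5):
    """Fraction of responses a metric places on the correct side of a threshold.

    A relevant response counts as correct when its score is >= threshold,
    an irrelevant one when its score is < threshold. The threshold value
    is an assumption of this sketch.
    """
    total = len(relevant_scores) + len(irrelevant_scores)
    if total == 0:
        raise ValueError("at least one score is required")
    correct = sum(s >= threshold for s in relevant_scores)
    correct += sum(s < threshold for s in irrelevant_scores)
    return correct / total


# Hypothetical metric scores for one context: five relevant responses,
# five random negatives, and five adversarial irrelevant responses.
relevant = [0.9, 0.8, 0.7, 0.4, 0.95]
random_negatives = [0.1, 0.2, 0.05, 0.3, 0.15]
adversarial = [0.6, 0.7, 0.2, 0.8, 0.55]

acc_random = classification_accuracy(relevant, random_negatives)
acc_adversarial = classification_accuracy(relevant, adversarial)
print(acc_random, acc_adversarial)  # 0.9 0.5
```

With these made-up scores, the same metric looks strong against random negatives and weak against adversarial ones, which is the kind of gap the abstract describes.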

2009 ◽  
Vol 69-70 ◽  
pp. 675-679
Author(s):  
D.S. Liu ◽  
Chun Hua Ju

To address the problem of customer churn in customer relationship management (CRM) in the manufacturing industry, this paper proposes a prediction model based on a Support Vector Machine (SVM). Because churn data are large-scale and imbalanced, principal component analysis (PCA) is adopted to reduce dimensionality and eliminate redundant information, which makes the sample space for the SVM more compact and reasonable. PCA is first applied to the 17-dimensional feature vectors of the customer churn data, and an improved SVM is then used to predict customer churn. An application in the manufacturing industry verifies that the model based on both PCA and SVM performs better than the model based on SVM only and other traditional models.
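
A PCA-then-SVM pipeline of the kind described above might be sketched as follows with scikit-learn. Everything specific here is an assumption: the data are synthetic (only the 17-feature dimensionality comes from the abstract), a standard RBF-kernel `SVC` with balanced class weights stands in for the paper's "improved SVM", and the choice of 8 principal components is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for churn data: 17 features, ~10% churners.
X, y = make_classification(
    n_samples=600, n_features=17, n_informative=6, n_redundant=6,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_model(use_pca):
    """Scale features, optionally project with PCA, then classify with an SVM."""
    steps = [StandardScaler()]
    if use_pca:
        steps.append(PCA(n_components=8))  # assumed number of components
    # class_weight="balanced" is one common way to handle class imbalance.
    steps.append(SVC(kernel="rbf", class_weight="balanced"))
    return make_pipeline(*steps)


pca_svm = build_model(use_pca=True).fit(X_train, y_train)
svm_only = build_model(use_pca=False).fit(X_train, y_train)

score_pca_svm = balanced_accuracy_score(y_test, pca_svm.predict(X_test))
score_svm_only = balanced_accuracy_score(y_test, svm_only.predict(X_test))
print(f"PCA + SVM: {score_pca_svm:.3f}  SVM only: {score_svm_only:.3f}")
```

Balanced accuracy is used because plain accuracy can look high on imbalanced data even when few churners are found. Which variant scores higher on synthetic data says nothing about the paper's findings.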


2016 ◽  
Vol 25 (02) ◽  
pp. 1650006
Author(s):  
Aleksander Smywinski-Pohl ◽  
Bartosz Ziółko

In this paper we investigate the usefulness of morphosyntactic information as well as clustering in modeling Polish for automatic speech recognition. Polish is an inflectional language, so we investigate the usefulness of an N-gram model based on morphosyntactic features. We show how individual types of features influence the model and which types are best suited for building a language model for automatic speech recognition. We compare the results of applying them with a class-based model that is automatically derived from the training corpus. We show that our approach to clustering performs significantly better than the frequently used SRI LM clustering method. However, this difference is apparent only for smaller corpora.


2020 ◽  
Vol 140 (4) ◽  
pp. 272-280
Author(s):  
Wataru Ohnishi ◽  
Hiroshi Fujimoto ◽  
Koichi Sakata

2018 ◽  
Vol 16 (1) ◽  
pp. 67-76
Author(s):  
Disyacitta Neolia Firdana ◽  
Trimurtini Trimurtini

This research aimed to determine the suitability and effectiveness of big book media for teaching equivalent fractions to fourth-grade students. The research method is Research and Development (R&D). The study was conducted in the fourth grade of SDN Karanganyar 02 Kota Semarang. Data sources were media validation, material validation, learning outcomes, and teacher and student responses to the developed media. The study used a pre-experimental, one-group pretest-posttest design. The big book consists of equivalent fractions material, student learning activity sheets with rectangle- and circle-shaped pictures, and questions about equivalent fractions, and it was developed based on student and teacher needs. The big book received a media validity score of 3.75 (very good criteria) and a score of 3 from material experts (good criteria). In the large-scale trial, the students' posttest results showed learning-outcome completeness of 82.14%. The N-gain calculation gave 0.55, which falls in the “medium” criterion. The t-test result of 9.6320 > 2.0484 means that the average posttest outcome is better than the average pretest outcome. Based on these data, the study produced big book media that are suitable and effective for teaching equivalent fractions to fourth-grade elementary school students.
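
For readers unfamiliar with N-gain, it is commonly computed as the achieved improvement divided by the maximum possible improvement, g = (post − pre) / (max − pre). The sketch below assumes that standard formula and the widely used cut-offs of 0.3 and 0.7; the abstract reports only the value 0.55 and the label "medium". The pretest and posttest means in the example are hypothetical, chosen so that the gain equals 0.55.

```python
def normalized_gain(pre_mean, post_mean, max_score=100.0):
    """Normalized gain: achieved improvement over maximum possible improvement."""
    if not 0 <= pre_mean < max_score:
        raise ValueError("pre_mean must satisfy 0 <= pre_mean < max_score")
    return (post_mean - pre_mean) / (max_score - pre_mean)


def gain_category(g):
    """Map a gain to a category using the commonly used 0.3 / 0.7 cut-offs."""
    if g >= 0.7:
        return "high"
    if g >= 0.3:
        return "medium"
    return "low"


# Hypothetical class means on a 0-100 scale (not taken from the study).
g = normalized_gain(pre_mean=40.0, post_mean=73.0)
print(round(g, 2), gain_category(g))  # 0.55 medium
```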


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1261
Author(s):  
Christopher Gradwohl ◽  
Vesna Dimitrievska ◽  
Federico Pittino ◽  
Wolfgang Muehleisen ◽  
András Montvay ◽  
...  

Photovoltaic (PV) technology allows large-scale investments in a renewable power-generating system at a competitive levelized cost of electricity (LCOE) and with a low environmental impact. Large-scale PV installations operate in a highly competitive market environment where even small performance losses have a high impact on profit margins. Therefore, operation at maximum performance is the key to long-term profitability. This can be achieved by advanced performance monitoring and instant or gradual failure detection methodologies. In this paper we present a combined approach to model-based fault detection by means of physical and statistical models and failure diagnosis based on physics of failure. Both approaches contribute to optimized PV plant operation and maintenance based on typically available supervisory control and data acquisition (SCADA) data. The failure detection and diagnosis capabilities were demonstrated in a case study based on six years of SCADA data from a PV plant in Slovenia. In this case study, underperforming values of the inverters of the PV plant were reliably detected and possible root causes were identified. We conclude that the combined approach can contribute to efficient, long-term operation of photovoltaic power plants with maximum energy yield and can be applied to the monitoring of photovoltaic plants.
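
The general idea of model-based underperformance detection can be illustrated with a very simple residual check: compare each measured inverter output with the output a model expects for the same conditions, and flag large shortfalls. This is only a sketch of that idea, not the authors' models. The linear irradiance model, the performance ratio of 0.85, the 15% tolerance, the low-irradiance cut-off, and all sample values are assumptions.

```python
def expected_power_kw(irradiance_w_m2, rated_kw, performance_ratio=0.85):
    """Very simple physical model: output scales linearly with irradiance.

    1000 W/m^2 is the standard test condition at which rated power is defined.
    The performance ratio is an assumed lumped loss factor.
    """
    return rated_kw * (irradiance_w_m2 / 1000.0) * performance_ratio


def flag_underperformance(samples, rated_kw, tolerance=0.15, min_irradiance=200.0):
    """Return one boolean per (irradiance, measured_kw) sample.

    A sample is flagged when measured power is more than `tolerance` below the
    modelled expectation. Low-irradiance samples are skipped (never flagged)
    because relative errors are unreliable there.
    """
    flags = []
    for irradiance, measured_kw in samples:
        if irradiance < min_irradiance:
            flags.append(False)
            continue
        expected = expected_power_kw(irradiance, rated_kw)
        flags.append(measured_kw < (1.0 - tolerance) * expected)
    return flags


# Hypothetical SCADA-like samples for a 100 kW inverter: (W/m^2, kW).
samples = [(800.0, 66.0), (800.0, 50.0), (100.0, 2.0), (1000.0, 84.0)]
flags = flag_underperformance(samples, rated_kw=100.0)
print(flags)  # [False, True, False, False]
```

A practical system would add temperature effects, statistical baselines, and persistence rules before raising an alarm; the abstract indicates that both physical and statistical models are combined, but does not give their form.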


2021 ◽  
Vol 9 (3) ◽  
pp. 264
Author(s):  
Shanti Bhushan ◽  
Oumnia El Fajri ◽  
Graham Hubbard ◽  
Bradley Chambers ◽  
Christopher Kees

This study evaluates the capability of Navier–Stokes solvers in predicting forward and backward plunging breaking, including assessment of the effect of grid resolution, turbulence model, and the VoF and CLSVoF interface models on predictions. For this purpose, 2D simulations are performed for four test cases: dam break, solitary wave run-up on a slope, flow over a submerged bump, and solitary wave over a submerged rectangular obstacle. Plunging wave breaking involves a high wave crest, plunger formation, and splash-up, followed by a second plunger and chaotic water motions. Coarser grids reasonably predict the wave breaking features, but finer grids are required for accurate prediction of the splash-up events. However, instabilities are triggered at the air–water interface (primarily for the air flow) on very fine grids, which induce surface peel-off or kinks and roll-up of the plunger tips. Reynolds-averaged Navier–Stokes (RANS) turbulence models result in high eddy viscosity in the air–water region, which decays the fluid momentum and adversely affects the predictions. Both VoF and CLSVoF methods predict the large-scale plunging breaking characteristics well; however, they vary in the prediction of the finer details. The CLSVoF solver predicts the splash-up event and secondary plunger better than the VoF solver; however, the latter predicts the plunger shape better than the former for the solitary wave run-up on a slope case.

