Empirical Study on Human Evaluation of Complex Argumentation Frameworks

Author(s): Marcos Cramer, Mathieu Guillaume

Author(s): Alexandra Constantin, Maja Matarić

In this paper, we present a metric for assessing the quality of arm movement imitation. We develop a joint-rotational-angle-based segmentation and comparison algorithm that rates the pairwise similarity of arm movement trajectories on a scale of 1 to 10. We describe an empirical study designed to validate the algorithm by comparing its ratings to human evaluations of imitation. The results provide evidence that the automatic metric's ratings did not differ significantly from the human evaluations.
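The abstract does not spell out the algorithm beyond joint-angle-based segmentation and pairwise trajectory comparison. The following is a minimal sketch, assuming a dynamic-time-warping distance over joint-rotation-angle trajectories mapped onto a 1-10 rating; the function names, the exponential distance-to-score mapping, and the `scale` parameter are illustrative assumptions, not the authors' method.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two joint-angle trajectories.

    a, b: arrays of shape (frames, n_joints), joint rotation angles in radians.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame joint-angle distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)  # length-normalised alignment cost

def similarity_score(a: np.ndarray, b: np.ndarray, scale: float = 1.0) -> float:
    """Map trajectory distance onto a 1-10 similarity rating (10 = identical).

    The exponential mapping and `scale` are assumptions for illustration only.
    """
    d = dtw_distance(a, b)
    return 1.0 + 9.0 * np.exp(-d / scale)

# Example: a demonstrated movement vs. a slightly noisy imitation (3 arm joints).
t = np.linspace(0, 2 * np.pi, 100)
demo = np.stack([np.sin(t), np.cos(t), 0.5 * np.sin(2 * t)], axis=1)
imitation = demo + np.random.normal(0, 0.05, demo.shape)
print(f"imitation score: {similarity_score(demo, imitation):.1f} / 10")
```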


Author(s): Ahrii Kim, Jinhyun Kim

SacreBLEU, by incorporating a text-normalizing step in its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot produce a meaningful result without customized pre-tokenization. This paper therefore examines the influence of different pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation against manually constructed human evaluation data for translation into Korean. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment varies considerably with the token type. The reliability of the metric even deteriorates under some tokenizations, and MeCab is no exception. Offering guidance on the proper choice of tokenizer, we stress the significance of character-level tokenization and the insignificance of Jamo-level tokenization in MT evaluation.
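As a concrete illustration of how pre-tokenization changes the metric, the sketch below scores the same Korean hypothesis/reference pair with SacreBLEU's built-in character tokenizer and with externally pre-tokenized input scored under `tokenize="none"`. The morpheme-style segmentation shown is a hand-made stand-in for a MeCab analysis, and the sentences are invented examples; neither comes from the paper's evaluation data.

```python
import sacrebleu

ref = "나는 어제 친구와 영화를 봤다"    # reference translation (illustrative)
hyp = "나는 어제 친구랑 영화를 보았다"  # system output (illustrative)

# (1) Character-level tokenization built into SacreBLEU.
char_bleu = sacrebleu.corpus_bleu([hyp], [[ref]], tokenize="char")

# (2) External (e.g., morpheme-style) pre-tokenization, then tokenize="none"
#     so SacreBLEU scores the space-separated tokens exactly as provided.
#     The segmentation below is hand-made for illustration, not MeCab output.
ref_morph = "나 는 어제 친구 와 영화 를 보 았 다"
hyp_morph = "나 는 어제 친구 랑 영화 를 보 았 다"
morph_bleu = sacrebleu.corpus_bleu([hyp_morph], [[ref_morph]], tokenize="none")

print(f"char-level BLEU:     {char_bleu.score:.1f}")
print(f"morpheme-level BLEU: {morph_bleu.score:.1f}")
```

The two scores generally differ for the same translation pair, which is the point the meta-evaluation investigates: which tokenization yields scores that track human judgment most closely.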


1996, Vol. 81 (1), pp. 76-87
Author(s): Connie R. Wanberg, John D. Watt, Deborah J. Rumsey
