Insight into Multiple References in an MT Evaluation Metric

Author(s):  
Ying Qin ◽  
Lucia Specia


Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 557 ◽  
Author(s):  
Rui Zhang ◽  
Oliver Amft

We present an eating detection algorithm for wearable sensors that first detects chewing cycles and subsequently estimates eating phases. We term this algorithm class a bottom-up approach. We evaluated the algorithm on electromyographic (EMG) recordings from diet-monitoring eyeglasses in free-living conditions and compared the bottom-up approach against two top-down algorithms. We show that the F1 score is no longer the primary relevant evaluation metric once retrieval rates exceed approximately 90%. Instead, detection timing errors provide more important insight into detection performance. In 122 hours of free-living EMG data from 10 participants, a total of 44 eating occasions were detected, with a maximum F1 score of 99.2%. Average detection timing errors of the bottom-up algorithm were 2.4 ± 0.4 s and 4.3 ± 0.4 s for the start and end of eating occasions, respectively. Our bottom-up algorithm has the potential to work with different wearable sensors that provide chewing cycle data. We suggest that the research community report timing errors (e.g., using the metrics described in this work).
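
As an illustration of the timing-error metrics suggested above, here is a minimal Python sketch, assuming eating occasions are represented as (start, end) intervals in seconds; the interval representation, the overlap-based matching rule, and all names are our assumptions rather than the authors' exact protocol.

```python
# Hypothetical sketch: start/end timing errors for detected eating occasions.
from typing import List, Tuple

def timing_errors(detected: List[Tuple[float, float]],
                  truth: List[Tuple[float, float]]):
    """Return absolute start/end timing errors (in seconds) for each
    detected occasion that overlaps a ground-truth occasion."""
    start_errs, end_errs = [], []
    for d_start, d_end in detected:
        # Match each detection to the first overlapping ground-truth occasion.
        for t_start, t_end in truth:
            if d_start < t_end and d_end > t_start:  # intervals overlap
                start_errs.append(abs(d_start - t_start))
                end_errs.append(abs(d_end - t_end))
                break
    return start_errs, end_errs

detected = [(10.0, 310.0), (900.0, 1205.0)]
truth = [(12.5, 305.0), (898.0, 1200.0)]
s, e = timing_errors(detected, truth)
print(sum(s) / len(s), sum(e) / len(e))  # mean start/end errors in seconds
```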


Author(s):  
Samiksha Tripathi ◽  
Vineet Kansal

Machine Translation (MT) evaluation metrics such as BiLingual Evaluation Understudy (BLEU) and Metric for Evaluation of Translation with Explicit Ordering (METEOR) are known to perform poorly for word-order-flexible and morphologically rich languages. Applying linguistic knowledge to evaluate MT into a morphologically rich target language such as Hindi has been shown to be more effective and accurate [S. Tripathi and V. Kansal, Using linguistic knowledge for machine translation evaluation with Hindi as a target language, Comput. Sist. 21(4) (2017) 717–724]. Leveraging recent progress in word and sentence vector embeddings [T. Mikolov and J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 2 (2013) 3111–3119], the authors trained word and sentence vector embeddings for Hindi on a large corpus of pre-processed Hindi text ([Formula: see text] million tokens). The training was performed on a high-end system configuration using Google Cloud platform resources. The sentence vector embeddings are further used to corroborate the findings obtained through linguistic knowledge in the evaluation metric, which is considered an optimal solution for morphologically rich target languages. In this paper, the authors demonstrate that a sentence-embedding-based MT evaluation approach closely mirrors the linguistic evaluation technique. The code used to generate the vector embeddings for Hindi has been uploaded to the code-sharing platform GitHub.
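
As an illustration of the sentence-embedding-based scoring described above, here is a minimal Python sketch that averages word vectors into a sentence vector and scores a hypothesis by cosine similarity to the reference; the toy random vectors stand in for embeddings trained on the Hindi corpus, and the averaging scheme is our assumption, not necessarily the authors' trained model.

```python
# Illustrative sketch: embedding-based MT evaluation via cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["यह", "एक", "अच्छा", "अनुवाद", "है"]
vectors = {w: rng.normal(size=50) for w in vocab}  # stand-in for trained vectors

def sentence_vector(tokens):
    """Mean of the word vectors of in-vocabulary tokens."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

def embedding_score(hypothesis: str, reference: str) -> float:
    """Cosine similarity between hypothesis and reference sentence vectors."""
    h = sentence_vector(hypothesis.split())
    r = sentence_vector(reference.split())
    return float(np.dot(h, r) / (np.linalg.norm(h) * np.linalg.norm(r)))

print(embedding_score("यह एक अच्छा अनुवाद है", "यह अनुवाद अच्छा है"))
```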


Author(s):  
Petr Homola ◽  
Vladislav Kuboň ◽  
Pavel Pecina

2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Aaron L.-F. Han ◽  
Derek F. Wong ◽  
Lidia S. Chao ◽  
Liangye He ◽  
Yi Lu

With the rapid development of machine translation (MT), MT evaluation has become very important for telling us in a timely manner whether an MT system is making progress. Conventional MT evaluation methods calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. First, incomprehensive design factors lead to a language-bias problem: the metrics perform well on some language pairs but poorly on others. Second, they tend to use either no linguistic features or too many; using none draws criticism from linguists, while using too many makes the model hard to reproduce. Third, the required reference translations are expensive to produce and are sometimes unavailable in practice. In this paper, the authors propose an unsupervised MT evaluation metric based on a universal part-of-speech tagset that does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation with human judgments.
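
The abstract does not give the exact formulation, so the following is only an illustrative Python sketch of a reference-free score over universal POS tag sequences, comparing the hypothesis against the source sentence; the POS bigram-overlap measure and all names here are our assumptions, not the paper's metric.

```python
# Illustrative sketch: reference-free scoring via universal POS n-gram overlap.
from collections import Counter

def pos_ngrams(tags, n=2):
    """Multiset of POS n-grams in a tag sequence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def pos_overlap_score(src_tags, hyp_tags, n=2):
    """F1-style overlap of POS n-grams between source and hypothesis."""
    src, hyp = pos_ngrams(src_tags, n), pos_ngrams(hyp_tags, n)
    matched = sum((src & hyp).values())
    if not matched:
        return 0.0
    p = matched / sum(hyp.values())  # precision w.r.t. hypothesis n-grams
    r = matched / sum(src.values())  # recall w.r.t. source n-grams
    return 2 * p * r / (p + r)

# Universal POS tags for a source sentence and an MT hypothesis.
src = ["PRON", "VERB", "DET", "ADJ", "NOUN", "PUNCT"]
hyp = ["PRON", "VERB", "DET", "NOUN", "PUNCT"]
print(pos_overlap_score(src, hyp))
```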


2021 ◽  
Author(s):  
Seyed Shayan Sajjadinia ◽  
Bruno Carpentieri ◽  
Gerhard A. Holzapfel

Numerical simulation is widely used to study physical systems, although it can be computationally too expensive. To counter this limitation, a surrogate may be used: a high-performance model that replaces the main numerical model, e.g., a machine learning (ML) regressor trained on a previously generated subset of the possible inputs and outputs of the numerical model. In this context, inspired by the definition of the mean squared error (MSE) metric, we introduce the pointwise MSE (PMSE) metric, which gives better insight into the performance of such ML models over the test set by focusing on every point that forms the physical system. To show the merits of the metric, we create a dataset for a physics problem, use it to train an ML surrogate, and then evaluate the surrogate with the metrics. In our experiment, the PMSE contour demonstrates how the model learns the physics in different regions of the model; in particular, the correlation between the characteristics of the numerical model and the learning progress can be observed. We therefore conclude that this simple and efficient metric can provide complementary and potentially interpretable information regarding the performance and functionality of the surrogate.
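
The abstract does not spell out the formula, so here is a minimal sketch of how such a pointwise MSE could be computed, assuming the test outputs are stored as an array of shape (n_samples, n_points); the function name `pmse` and the data layout are our assumptions.

```python
# Minimal sketch of a pointwise MSE (PMSE): the squared error is averaged
# over test samples separately at every point of the physical system,
# yielding one error value per point instead of a single scalar MSE.
import numpy as np

def pmse(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Per-point MSE over the test set; returns shape (n_points,)."""
    return np.mean((y_true - y_pred) ** 2, axis=0)

rng = np.random.default_rng(0)
y_true = rng.normal(size=(200, 64))                    # 200 samples, 64 points
y_pred = y_true + rng.normal(scale=0.1, size=(200, 64))
per_point = pmse(y_true, y_pred)
print(per_point.shape, per_point.mean())  # averaging over points recovers MSE
```

Plotted over the geometry of the physical system, such a per-point array is exactly what produces the PMSE contour discussed in the abstract.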


Author(s):  
Ahrii Kim ◽  
Jinhyun Kim

SacreBLEU, which incorporates a text-normalizing step in its pipeline, has been well received as an automatic evaluation metric in recent years. For agglutinative languages such as Korean, however, the metric cannot produce a meaningful result without customized pre-tokenization. This paper therefore examines the influence of diversified pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment varies consistently with the token type. Some tokenizations even deteriorate the reliability of the metric, and MeCab is no exception. To guide proper tokenizer usage for the metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
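
As a usage illustration, the following sketch shows how a tokenization option is passed to the sacrebleu Python API, contrasting the default 13a tokenizer with the character level discussed above; the Korean sentences are toy examples and the resulting scores are for illustration only.

```python
# Sketch: how the tokenize option changes SacreBLEU scores for Korean.
from sacrebleu.metrics import BLEU

hyps = ["나는 오늘 아침에 학교에 갔다"]
refs = [["나는 오늘 아침 학교에 갔었다"]]

for tok in ("13a", "char"):  # default word-ish tokenizer vs. character level
    bleu = BLEU(tokenize=tok)
    print(tok, bleu.corpus_score(hyps, refs).score)
```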


1966 ◽  
Vol 24 ◽  
pp. 322-330
Author(s):  
A. Beer

The investigations which I should like to summarize in this paper concern recent photo-electric luminosity determinations of O and B stars. Their final aim has been the derivation of new stellar distances, and some insight into certain patterns of galactic structure.

