Taking MT Evaluation Metrics to Extremes: Beyond Correlation with Human Judgments

2019 ◽  
Vol 45 (3) ◽  
pp. 515-558
Author(s):  
Marina Fomicheva ◽  
Lucia Specia

Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new metrics devised every year. Evaluation metrics are generally benchmarked against manual assessment of translation quality, with performance measured in terms of overall correlation with human scores. Much work has been dedicated to the improvement of evaluation metrics to achieve a higher correlation with human judgments. However, little insight has been provided regarding the weaknesses and strengths of existing approaches and their behavior in different settings. In this work we conduct a broad meta-evaluation study of the performance of a wide range of evaluation metrics focusing on three major aspects. First, we analyze the performance of the metrics when faced with different levels of translation quality, proposing a local dependency measure as an alternative to the standard, global correlation coefficient. We show that metric performance varies significantly across different levels of MT quality: Metrics perform poorly when faced with low-quality translations and are not able to capture nuanced quality distinctions. Interestingly, we show that evaluating low-quality translations is also more challenging for humans. Second, we show that metrics are more reliable when evaluating neural MT than the traditional statistical MT systems. Finally, we show that the difference in the evaluation accuracy for different metrics is maintained even if the gold standard scores are based on different criteria.
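The contrast between a single global correlation and metric behavior at different quality levels can be illustrated with a minimal sketch. It is not the local dependency measure proposed in the paper, which the abstract does not define; it simply computes Pearson correlation within bands of human scores. The function names and all scores are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def correlation_by_band(human, metric, n_bands=2):
    """Sort segments by human score, split them into bands, correlate within each band."""
    pairs = sorted(zip(human, metric))
    size = len(pairs) // n_bands
    results = []
    for b in range(n_bands):
        # The last band absorbs any remainder.
        chunk = pairs[b * size:] if b == n_bands - 1 else pairs[b * size:(b + 1) * size]
        h, m = zip(*chunk)
        results.append(pearson(h, m))
    return results

# Hypothetical segment-level scores (not data from the paper).
human = [10, 20, 30, 40, 60, 70, 80, 90]
metric = [0.30, 0.20, 0.35, 0.25, 0.60, 0.70, 0.80, 0.90]

global_r = pearson(human, metric)
low_r, high_r = correlation_by_band(human, metric)
print(f"global={global_r:.2f} low-quality band={low_r:.2f} high-quality band={high_r:.2f}")
```

In this toy data the global coefficient is high even though the metric carries no information within the low-quality band, which is the kind of effect a global coefficient can hide.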

1992 ◽  
Vol 24 (3) ◽  
pp. 301-315 ◽  
Author(s):  
L. Rosetta

There is a wide range of duration of post-partum amenorrhoea and resumption of ovulation between individuals, within an individual or between populations. Several extraneous variables, such as parity, mother's age, sex of the breast-fed baby, socioeconomic status and cultural level of the family, can be controlled; then the remaining variables will probably explain a part of the total variability in post-partum amenorrhoea duration but say nothing about the physiological process. In attempting to question physiological aspects of the return of fertility several observational studies have tended to favour one of the different factors which are supposed to play a major role in the regulation and have compared different levels of it, such as body composition of the mother (Frisch & McArthur, 1974), breast-feeding pattern (Jones, 1989) or the life style of the women. Life style can be related to women's physical activity in normal life (Ellison, 1991), the difference between urban and rural life (Carael, 1981) or the environment (Laurenson et al., 1985). Prolactin as a possible mediator of the central regulation has been carefully considered (Lunn, Austin & Whitehead, 1984; Howie et al., 1982). These studies were mainly observational rather than experimental, supplementing mothers during the lactating period or during the pregnancy. If this information is added to what is known of other animal species (Loudon, 1987) or animal experimentation (Plant et al., 1989; Williams et al., 1990a; Williams et al., 1990b), the combination of several of the main factors believed to have a major role in the human species can be clarified and the aetiology of the resumption of fertility in nursing women investigated.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Ch Ram Anirudh ◽  
Kavi Narayana Murthy

Machine-translated texts are often far from perfect, and post-editing is essential to reach publishable quality. Post-editing may not always be a pleasant task. However, modern machine translation (MT) approaches like Statistical MT (SMT) and Neural MT (NMT) seem to hold greater promise. In this work, we present a quantitative method for scoring translations and computing the post-editability of MT system outputs. We show that the scores we get correlate well with MT evaluation metrics as well as with the actual time and effort required for post-editing. We compare the outputs of three modern MT systems, namely phrase-based SMT (PBMT), NMT, and Google Translate, for their post-editability in English-to-Hindi translation. Further, we explore the effect of various kinds of errors in MT outputs on post-editing time and effort. Including an Indian language in this kind of post-editability study and analyzing the influence of errors on post-editing time and effort for NMT are highlights of this work.


2019 ◽  
Vol 50 (4) ◽  
pp. 693-702 ◽  
Author(s):  
Christine Holyfield ◽  
Sydney Brooks ◽  
Allison Schluterman

Purpose Augmentative and alternative communication (AAC) is an intervention approach that can promote communication and language in children with multiple disabilities who are beginning communicators. While a wide range of AAC technologies are available, little is known about the comparative effects of specific technology options. Given that engagement can be low for beginning communicators with multiple disabilities, the current study provides initial information about the comparative effects of 2 AAC technology options—high-tech visual scene displays (VSDs) and low-tech isolated picture symbols—on engagement. Method Three elementary-age beginning communicators with multiple disabilities participated. The study used a single-subject, alternating treatment design with each technology serving as a condition. Participants interacted with their school speech-language pathologists using each of the 2 technologies across 5 sessions in a block randomized order. Results According to visual analysis and nonoverlap of all pairs calculations, all 3 participants demonstrated more engagement with the high-tech VSDs than the low-tech isolated picture symbols as measured by their seconds of gaze toward each technology option. Despite the difference in engagement observed, there was no clear difference across the 2 conditions in engagement toward the communication partner or use of the AAC. Conclusions Clinicians can consider measuring engagement when evaluating AAC technology options for children with multiple disabilities and should consider evaluating high-tech VSDs as 1 technology option for them. Future research must explore the extent to which differences in engagement to particular AAC technologies result in differences in communication and language learning over time as might be expected.
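The nonoverlap of all pairs (NAP) calculation mentioned in the Results can be sketched as follows. This is the standard pairwise definition: each cross-condition pair counts 1 when the second condition is higher and 0.5 for a tie. The gaze durations are hypothetical and are not data from the study.

```python
def nonoverlap_of_all_pairs(condition_a, condition_b):
    """Share of (a, b) pairs in which b exceeds a; ties count as half a pair."""
    pairs = [(a, b) for a in condition_a for b in condition_b]
    score = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return score / len(pairs)

# Hypothetical seconds of gaze per session for one participant.
low_tech = [12, 15, 10, 18, 14]
high_tech = [25, 30, 18, 28, 35]

nap = nonoverlap_of_all_pairs(low_tech, high_tech)
print(f"NAP = {nap:.2f}")
```

A value near 1 means almost every high-tech session exceeded every low-tech session, while 0.5 corresponds to chance-level overlap.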


2020 ◽  
Vol 7 (2) ◽  
pp. 34-41
Author(s):  
VLADIMIR NIKONOV ◽  
ANTON ZOBOV ◽  

The construction and selection of a suitable bijective function, that is, a substitution, is becoming an important applied task, particularly for building block encryption systems. Many articles have suggested different approaches to determining the quality of a substitution, but most of them are computationally expensive. Solving this problem would significantly expand the range of methods for constructing and analyzing schemes in information protection systems. The purpose of this research is to find easily measurable characteristics of substitutions that make it possible to evaluate their quality, as well as measures of the proximity of a particular substitution to a random one, or its distance from it. For this purpose, several characteristics are proposed in this work, namely the difference and polynomial characteristics, and their mathematical expectation was found, as well as the variance of the difference characteristic. This makes it possible to draw a conclusion about the quality of a particular substitution by comparing the value of the characteristic calculated for it with the calculated mathematical expectation. From a computational point of view, the theses of the article are of exceptional interest because of the simplicity of the algorithm for quantifying the quality of bijective substitutions. By its nature, calculating the difference characteristic amounts to a simple summation of integer terms in a fixed and small range. Such an operation, in both current and prospective hardware, is embedded in the logic of a wide range of functional elements, especially when computations are implemented in the optical range or on other carriers related to the field of nanotechnology.


2019 ◽  
Author(s):  
Le Wang ◽  
Devon Jakob ◽  
Haomin Wang ◽  
Alexis Apostolos ◽  
Marcos M. Pires ◽  
...  

Infrared chemical microscopy through mechanical probing of light-matter interactions by atomic force microscopy (AFM) bypasses the diffraction limit. One increasingly popular technique is photo-induced force microscopy (PiFM), which utilizes the mechanical heterodyne signal detection between cantilever mechanical resonant oscillations and the photo-induced force from light-matter interaction. So far, photo-induced force microscopy has been operated in only one heterodyne configuration. In this article, we generalize heterodyne configurations of photo-induced force microscopy by introducing two new schemes: harmonic heterodyne detection and sequential heterodyne detection. In harmonic heterodyne detection, the laser repetition rate matches integer fractions of the difference between the two mechanical resonant modes of the AFM cantilever. The high harmonic of the beating from the photothermal expansion mixes with the AFM cantilever oscillation to provide the PiFM signal. In sequential heterodyne detection, the combination of the repetition rate of laser pulses and the polarization modulation frequency matches the difference between two AFM mechanical modes, leading to detectable PiFM signals. These two generalized heterodyne configurations for photo-induced force microscopy deliver new avenues for chemical imaging and broadband spectroscopy at ~10 nm spatial resolution. They are suitable for a wide range of heterogeneous materials across various disciplines: from structured polymer film and polaritonic boron nitride materials to isolated bacterial peptidoglycan cell walls. The generalized heterodyne configurations introduce flexibility for the implementation of PiFM and related tapping mode AFM-IR, and provide possibilities for an additional modulation channel in PiFM for targeted signal extraction with nanoscale spatial resolution.
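The two frequency-matching conditions described above can be written out as a small arithmetic sketch. The cantilever mode frequencies, the tolerance, and the function names are hypothetical. Reading "combination" as the sum or the difference of the two modulation frequencies is an assumption, since the abstract does not specify it.

```python
def harmonic_repetition_rates(f1_hz, f2_hz, max_order=4):
    """Laser repetition rates equal to integer fractions of the mode difference |f2 - f1|."""
    diff = abs(f2_hz - f1_hz)
    return [diff / n for n in range(1, max_order + 1)]

def is_sequential_match(f_rep_hz, f_pol_hz, f1_hz, f2_hz, tol_hz=1.0):
    """Check whether f_rep + f_pol or |f_rep - f_pol| equals |f2 - f1| (assumed reading)."""
    diff = abs(f2_hz - f1_hz)
    candidates = (f_rep_hz + f_pol_hz, abs(f_rep_hz - f_pol_hz))
    return any(abs(c - diff) <= tol_hz for c in candidates)

# Hypothetical cantilever modes: 250 kHz and 1.55 MHz.
rates = harmonic_repetition_rates(250e3, 1.55e6)
print([round(r) for r in rates])
print(is_sequential_match(1.0e6, 0.3e6, 250e3, 1.55e6))
```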


2021 ◽  
Vol 11 (10) ◽  
pp. 4589
Author(s):  
Ivan Duvnjak ◽  
Domagoj Damjanović ◽  
Marko Bartolac ◽  
Ana Skender

The main principle of vibration-based damage detection in structures is to interpret the changes in dynamic properties of the structure as indicators of damage. In this study, the mode shape damage index (MSDI) method was used to identify discrete damages in plate-like structures. This damage index is based on the difference between modified modal displacements in the undamaged and damaged state of the structure. In order to assess the advantages and limitations of the proposed algorithm, we performed experimental modal analysis on a reinforced concrete (RC) plate under 10 different damage cases. The MSDI values were calculated through considering single and/or multiple damage locations, different levels of damage, and boundary conditions. The experimental results confirmed that the MSDI method can be used to detect the existence of damage, identify single and/or multiple damage locations, and estimate damage severity in the case of single discrete damage.


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
R Bosco ◽  
S Gambelli ◽  
V Urbano ◽  
G Cevenini ◽  
G Messina

Abstract Background Sanitizing operating theatres (OT) is important to minimize the risk of post-operative infections. Disinfection procedures between one operation and the next are less aggressive than the final cleaning procedures at the end of the day. The aim was to assess the difference in contamination: i) between different levels of disinfection; ii) before and after the use of a UVC device (UVC-D). Methods Between December 2019 and February 2020 a cross-sectional study was conducted in OT in a real clinical context. 94 Petri dishes (PD) were used in 3 OT. Three different sanitation levels (SL1-3) were compared pre- and post-use of UVC-D: i) no cleaning after surgery (SL1); ii) after in-between cleaning (SL2); iii) after terminal cleaning (SL3). UVC-D was employed for 6 minutes, 3 minutes per bed side. PD were incubated at 36 °C and colony forming units (CFU) were counted at 48 h. Descriptive statistics, Wilcoxon and Mann-Whitney tests were performed to assess the contamination levels in total, pre/post use of UVC-D, and between different sanitation levels, respectively. Results In total we had a mean of 3.39 CFU/PD (C.I. 2.05 - 4.74) and a median of 1 CFU/PD (Min. 0 - Max. 39); after UVC-D use we had a mean of 2.20 CFU/PD (C.I. 0.69 - 5.09) and a median of 0 CFU/PD (Min. 0 - Max. 133). The UVC-D led to a significant reduction of CFU (p < 0.001). Without UVC-D we had a significant CFU drop (p < 0.05) between SL1 and SL3. Using UVC-D, we observed significant reductions of contamination (p < 0.05) between SL3 and SL1. Comparing SL1 post UVC-D use (median 0) vs SL2 pre UVC-D use (median 0.5), and SL2 post UVC-D use (median 0) vs SL3 pre UVC-D use (median 1), we had a significant reduction of contamination (p < 0.05). Conclusions UVC-D reduced environmental contamination at all three sanitation levels. Furthermore, the use of UVC-D alone was better than in-between and terminal cleaning. Despite these encouraging results, the cleaning procedures executed by dedicated staff have to be considered. Key messages UVC is efficient at decreasing contamination in operating theatres regardless of sanitation level. Adding UVC technology to standard cleaning procedures significantly improves sanitation levels.


2021 ◽  
Vol 7 (1) ◽  
pp. 331-341
Author(s):  
Urszula Chwesiuk

Abstract This study aimed to verify whether Polish speakers of English insert a vowel in word-final clusters containing a consonant and a syllabic /l/ or /n/ due to L1–L2 transfer. L1 Polish speakers are mostly unaware of the existence of syllabic consonants; hence, they use Polish phonotactics and articulate a vocalic sound before a final sonorant, which is thereby deprived of its syllabicity. This phenomenon was examined among L1 Polish speakers, first-year students of English studies, and the recording sessions were repeated a year later. Since, over that time, they received instruction in phonetics and phonology as well as overall practical language training, the results demonstrated the occurrence of vowel insertion at different levels of advanced command of English. If vowels were inserted, their quality and length were monitored and analysed. With regard to the English system, pronouncing the vowel /ə/ before a syllabic consonant is possible, yet not usual. That is why another aim of this study is to examine to what extent the vowels articulated by the subjects differ from the standard pronunciation of non-final /ə/. The quality differences between the vowels articulated in the words ending with /l/ and /n/ were examined, as well as the potential influence of the difference between /l/ and /n/ on the occurrence of vowel reduction. Even though Polish phonotactics permit numerous consonantal combinations in all word positions, it proved to be challenging for L1 Polish speakers to pronounce word-final consonantal clusters containing either syllabic sonorant. This result carries practical implications for the teaching methodology of English phonetics.


Religions ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 389
Author(s):  
James Robert Brown

Religious notions have long played a role in epistemology. Theological thought experiments, in particular, have been effective in a wide range of situations in the sciences. Some of these are merely picturesque, others have been heuristically important, and still others, as I will argue, have played a role that could be called essential. I will illustrate the difference between heuristic and essential with two examples. One of these stems from the Newton–Leibniz debate over the nature of space and time; the other is a thought experiment of my own constructed with the aim of making a case for a more liberal view of evidence in mathematics.


Author(s):  
A Jodat ◽  
M Moghiman

In the present study, the applicability of widely used evaporation models (Dalton approach-based correlations) is experimentally investigated for natural, forced, and combined convection regimes. A series of experimental measurements are carried out over a wide range of water temperatures and air velocities for 0.01 ≤ Gr/Re² ≤ 100 in a heated rectangular pool. The investigations show that the evaporation rate strongly depends on the convection regime's Gr/Re² value. The results show that the evaporation rate increases with the difference in vapour pressures over both forced convection (0.01 ≤ Gr/Re² ≤ 0.1) and turbulent mixed convection regimes (0.15 ≤ Gr/Re² ≤ 25). However, the escalation rate of evaporation decreases with Gr/Re² in the forced convection regime whereas in the turbulent mixed convection it increases. In addition, over the range of the free convection regime (Gr/Re² ≥ 25), the evaporation rate is affected not only by the vapour pressure difference but also by the density variation. A dimensionless correlation using the experimental data of all convection regimes (0.01 ≤ Gr/Re² ≤ 100) is proposed to cover different water surface geometries and airflow conditions.
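The regime boundaries quoted in the abstract can be turned into a small classifier, shown here as a sketch. The function name and the example Grashof and Reynolds numbers are hypothetical. The abstract does not name the interval between 0.1 and 0.15, and it lists the value 25 under both the mixed and the free regime, so assigning 25 to free convection is an assumption.

```python
def convection_regime(grashof, reynolds):
    """Classify the convection regime from Gr/Re^2 using the ranges given in the abstract."""
    ratio = grashof / reynolds ** 2
    if ratio < 0.01 or ratio > 100:
        return "outside studied range"
    if ratio <= 0.1:
        return "forced"
    if ratio < 0.15:
        # Interval not named in the abstract.
        return "unspecified"
    if ratio < 25:
        return "turbulent mixed"
    # Assumption: the shared boundary value 25 is treated as free convection.
    return "free"

# Hypothetical operating points at Re = 1e4.
for gr in (5e6, 1e8, 5e9):
    print(f"Gr={gr:.0e}: {convection_regime(gr, 1e4)}")
```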

