Taking MT Evaluation Metrics to Extremes: Beyond Correlation with Human Judgments

Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new metrics devised every year. Evaluation metrics are generally benchmarked against manual assessment of translation quality, with performance measured in terms of overall correlation with human scores. Much work has been dedicated to the improvement of evaluation metrics to achieve a higher correlation with human judgments. However, little insight has been provided regarding the weaknesses and strengths of existing approaches and their behavior in different settings. In this work we conduct a broad meta-evaluation study of the performance of a wide range of evaluation metrics focusing on three major aspects. First, we analyze the performance of the metrics when faced with different levels of translation quality, proposing a local dependency measure as an alternative to the standard, global correlation coefficient. We show that metric performance varies significantly across different levels of MT quality: Metrics perform poorly when faced with low-quality translations and are not able to capture nuanced quality distinctions. Interestingly, we show that evaluating low-quality translations is also more challenging for humans. Second, we show that metrics are more reliable when evaluating neural MT than the traditional statistical MT systems. Finally, we show that the difference in the evaluation accuracy for different metrics is maintained even if the gold standard scores are based on different criteria.
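
The contrast between a single global correlation and metric behavior at different quality levels can be illustrated with a small sketch. This is not the local dependency measure proposed in the paper, which the abstract does not define; it simply computes Pearson's r within bands of human score. The function names and the toy scores are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def correlation_by_quality_band(human, metric, n_bands=2):
    """Sort segments by human score, split into bands, return r per band."""
    pairs = sorted(zip(human, metric))
    size = len(pairs) // n_bands
    results = []
    for band in range(n_bands):
        start = band * size
        # The last band absorbs any remainder.
        end = (band + 1) * size if band < n_bands - 1 else len(pairs)
        chunk = pairs[start:end]
        results.append(pearson([h for h, _ in chunk], [m for _, m in chunk]))
    return results


# Hypothetical scores: the metric tracks human judgments only at the high end.
human_scores = [1, 2, 3, 4, 5, 6, 7, 8]
metric_scores = [0.3, 0.1, 0.4, 0.2, 0.5, 0.6, 0.7, 0.8]

global_r = pearson(human_scores, metric_scores)
low_r, high_r = correlation_by_quality_band(human_scores, metric_scores)
print(f"global r = {global_r:.2f}, low band r = {low_r:.2f}, high band r = {high_r:.2f}")
```

With these toy numbers the global coefficient looks strong even though the metric carries no information about the low-quality band, which is the kind of effect a global correlation can hide.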

Aetiological approach of female reproductive physiology in lactational amenorrhoea

Journal of Biosocial Science ◽

10.1017/s0021932000019866 ◽

1992 ◽

Vol 24 (3) ◽

pp. 301-315 ◽

Cited By ~ 4

Author(s):

L. Rosetta

Keyword(s):

Life Style ◽

Animal Species ◽

Post Partum ◽

Wide Range ◽

The Family ◽

Total Variability ◽

The Difference ◽

Mother’s Age ◽

Different Levels ◽

Breast Fed

There is a wide range of duration of post-partum amenorrhoea and resumption of ovulation between individuals, within an individual or between populations. Several extraneous variables, such as parity, mother's age, sex of the breast-fed baby, socioeconomic status and cultural level of the family, can be controlled; then the remaining variables will probably explain a part of the total variability in post-partum amenorrhoea duration but say nothing about the physiological process. In attempting to question physiological aspects of the return of fertility several observational studies have tended to favour one of the different factors which are supposed to play a major role in the regulation and have compared different levels of it, such as body composition of the mother (Frisch & McArthur, 1974), breast-feeding pattern (Jones, 1989) or the life style of the women. Life style can be related to women's physical activity in normal life (Ellison, 1991), the difference between urban and rural life (Carael, 1981) or the environment (Laurenson et al., 1985). Prolactin as a possible mediator of the central regulation has been carefully considered (Lunn, Austin & Whitehead, 1984; Howie et al., 1982). These studies were mainly observational rather than experimental, supplementing mothers during the lactating period or during the pregnancy. If this information is added to what is known of other animal species (Loudon, 1987) or animal experimentation (Plant et al., 1989; Williams et al., 1990a; Williams et al., 1990b), the combination of several of the main factors believed to have a major role in the human species can be clarified and the aetiology of the resumption of fertility in nursing women investigated.

On Post-Editability of Machine Translated Texts

Translation Today ◽

10.46623/tt/2021.15.1.ar4 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Ch Ram Anirudh ◽

Kavi Narayana Murthy

Keyword(s):

Machine Translation ◽

Quantitative Method ◽

Evaluation Metrics ◽

Indian Language ◽

Actual Time ◽

MT Evaluation ◽

Statistical MT ◽

Modern Machine

Machine-translated texts are often far from perfect, and post-editing is essential to reach publishable quality. Post-editing may not always be a pleasant task. However, modern machine translation (MT) approaches like Statistical MT (SMT) and Neural MT (NMT) seem to hold greater promise. In this work, we present a quantitative method for scoring translations and computing the post-editability of MT system outputs. We show that the scores we get correlate well with MT evaluation metrics as well as with the actual time and effort required for post-editing. We compare the outputs of three modern MT systems, namely phrase-based SMT (PBMT), NMT, and Google Translate, for their post-editability for English to Hindi translation. Further, we explore the effect of various kinds of errors in MT outputs on post-editing time and effort. Including an Indian language in this kind of post-editability study and analyzing the influence of errors on post-editing time and effort for NMT are highlights of this work.

Comparative Effects of High-Tech Visual Scene Displays and Low-Tech Isolated Picture Symbols on Engagement From Students With Multiple Disabilities

Language Speech and Hearing Services in Schools ◽

10.1044/2019_lshss-19-0007 ◽

2019 ◽

Vol 50 (4) ◽

pp. 693-702 ◽

Cited By ~ 1

Author(s):

Christine Holyfield ◽

Sydney Brooks ◽

Allison Schluterman

Keyword(s):

Language Learning ◽

Augmentative And Alternative Communication ◽

Visual Analysis ◽

Multiple Disabilities ◽

Visual Scene ◽

Future Research ◽

High Tech ◽

Single Subject ◽

Wide Range ◽

The Difference

Purpose Augmentative and alternative communication (AAC) is an intervention approach that can promote communication and language in children with multiple disabilities who are beginning communicators. While a wide range of AAC technologies are available, little is known about the comparative effects of specific technology options. Given that engagement can be low for beginning communicators with multiple disabilities, the current study provides initial information about the comparative effects of 2 AAC technology options—high-tech visual scene displays (VSDs) and low-tech isolated picture symbols—on engagement. Method Three elementary-age beginning communicators with multiple disabilities participated. The study used a single-subject, alternating treatment design with each technology serving as a condition. Participants interacted with their school speech-language pathologists using each of the 2 technologies across 5 sessions in a block randomized order. Results According to visual analysis and nonoverlap of all pairs calculations, all 3 participants demonstrated more engagement with the high-tech VSDs than the low-tech isolated picture symbols as measured by their seconds of gaze toward each technology option. Despite the difference in engagement observed, there was no clear difference across the 2 conditions in engagement toward the communication partner or use of the AAC. Conclusions Clinicians can consider measuring engagement when evaluating AAC technology options for children with multiple disabilities and should consider evaluating high-tech VSDs as 1 technology option for them. Future research must explore the extent to which differences in engagement to particular AAC technologies result in differences in communication and language learning over time as might be expected.
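
The nonoverlap of all pairs (NAP) statistic mentioned in the Results compares every data point in one condition with every data point in the other. A minimal sketch of that calculation follows; the gaze durations are invented for illustration and are not the study's data, and the function name is an assumption.

```python
def nonoverlap_of_all_pairs(condition_a, condition_b):
    """Share of (a, b) pairs in which b exceeds a; ties count as half a pair."""
    if not condition_a or not condition_b:
        raise ValueError("both conditions need at least one observation")
    score = 0.0
    for a in condition_a:
        for b in condition_b:
            if b > a:
                score += 1.0
            elif b == a:
                score += 0.5
    return score / (len(condition_a) * len(condition_b))


# Hypothetical seconds of gaze per session for one participant.
low_tech_gaze = [10, 12, 8, 11, 9]
high_tech_gaze = [20, 18, 12, 25, 19]

nap = nonoverlap_of_all_pairs(low_tech_gaze, high_tech_gaze)
print(f"NAP = {nap:.2f}")
```

A value of 0.5 means the two conditions overlap completely, while values near 1 mean that sessions in the second condition almost always exceed sessions in the first.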

COMBINATORIAL POLYNOMIALLY COMPUTABLE CHARACTERISTICS OF SUBSTITUTIONS AND THEIR PROPERTIES

Computational nanotechnology ◽

10.33693/2313-223x-2020-7-2-34-41 ◽

2020 ◽

Vol 7 (2) ◽

pp. 34-41

Author(s):

VLADIMIR NIKONOV ◽

ANTON ZOBOV

Keyword(s):

Mathematical Expectation ◽

Point Of View ◽

Small Range ◽

Computational Point ◽

Wide Range ◽

Bijective Function ◽

The Difference ◽

Element Base ◽

Selection Of

The construction and selection of a suitable bijective function, that is, a substitution, is now becoming an important applied task, particularly for building block encryption systems. Many articles have suggested different approaches to determining the quality of a substitution, but most of them are highly computationally complex. The solution of this problem will significantly expand the range of methods for constructing and analyzing schemes in information protection systems. The purpose of the research is to find easily measurable characteristics of substitutions that allow their quality to be evaluated, as well as measures of the proximity of a particular substitution to a random one, or its distance from it. For this purpose, several characteristics were proposed in this work: difference and polynomial, and their mathematical expectation was found, as well as variance for the difference characteristic. This allows us to draw a conclusion about the quality of a particular substitution by comparing the calculated value of the characteristic with its mathematical expectation. From a computational point of view, the theses of the article are of exceptional interest due to the simplicity of the algorithm for quantifying the quality of bijective function substitutions. By its nature, the operation of calculating the difference characteristic carries out a simple summation of integer terms in a fixed and small range. Such an operation, both in the modern and in the prospective element base, is embedded in the logic of a wide range of functional elements, especially when implementing computational actions in the optical range, or on other carriers related to the field of nanotechnology.

Generalized Heterodyne Configurations for Photo-induced Force Microscopy

10.26434/chemrxiv.9633407 ◽

2019 ◽

Author(s):

Le Wang ◽

Devon Jakob ◽

Haomin Wang ◽

Alexis Apostolos ◽

Marcos M. Pires ◽

...

Keyword(s):

Spatial Resolution ◽

Repetition Rate ◽

Modulation Frequency ◽

High Harmonic ◽

Chemical Imaging ◽

Heterodyne Detection ◽

Force Microscopy ◽

Wide Range ◽

The Difference ◽

AFM Cantilever

Infrared chemical microscopy through mechanical probing of light-matter interactions by atomic force microscopy (AFM) bypasses the diffraction limit. One increasingly popular technique is photo-induced force microscopy (PiFM), which utilizes the mechanical heterodyne signal detection between cantilever mechanical resonant oscillations and the photo-induced force from light-matter interaction. So far, photo-induced force microscopy has been operated in only one heterodyne configuration. In this article, we generalize heterodyne configurations of photo-induced force microscopy by introducing two new schemes: harmonic heterodyne detection and sequential heterodyne detection. In harmonic heterodyne detection, the laser repetition rate matches integer fractions of the difference between the two mechanical resonant modes of the AFM cantilever. The high harmonic of the beating from the photothermal expansion mixes with the AFM cantilever oscillation to provide PiFM signal. In sequential heterodyne detection, the combination of the repetition rate of laser pulses and polarization modulation frequency matches the difference between two AFM mechanical modes, leading to detectable PiFM signals. These two generalized heterodyne configurations for photo-induced force microscopy deliver new avenues for chemical imaging and broadband spectroscopy at ~10 nm spatial resolution. They are suitable for a wide range of heterogeneous materials across various disciplines: from structured polymer film, polaritonic boron nitride materials, to isolated bacterial peptidoglycan cell walls. The generalized heterodyne configurations introduce flexibility for the implementation of PiFM and related tapping mode AFM-IR, and provide possibilities for additional modulation channel in PiFM for targeted signal extraction with nanoscale spatial resolution.
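
The frequency condition stated for harmonic heterodyne detection (a repetition rate equal to an integer fraction of the difference between two cantilever modes) amounts to simple arithmetic, sketched below. The mode frequencies, the function name and the `max_divisor` parameter are hypothetical values chosen for illustration; the abstract gives no numbers.

```python
def harmonic_repetition_rates(f_mode_1_hz, f_mode_2_hz, max_divisor=5):
    """Candidate laser repetition rates: |f2 - f1| / n for n = 1..max_divisor."""
    if max_divisor < 1:
        raise ValueError("max_divisor must be at least 1")
    difference_hz = abs(f_mode_2_hz - f_mode_1_hz)
    return [difference_hz / n for n in range(1, max_divisor + 1)]


# Hypothetical cantilever modes at 300 kHz and 1.8 MHz.
rates = harmonic_repetition_rates(300e3, 1800e3, max_divisor=3)
for n, rate in enumerate(rates, start=1):
    print(f"n = {n}: repetition rate = {rate / 1e3:.0f} kHz")
```

The sequential scheme is not sketched here because the abstract does not say how the repetition rate and the polarization modulation frequency are combined.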

Mode Shape-Based Damage Detection Method (MSDI): Experimental Validation

Applied Sciences ◽

10.3390/app11104589 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4589

Author(s):

Ivan Duvnjak ◽

Domagoj Damjanović ◽

Marko Bartolac ◽

Ana Skender

Keyword(s):

Boundary Conditions ◽

Damage Detection ◽

Mode Shape ◽

Experimental Validation ◽

Detection Method ◽

Dynamic Properties ◽

Damage Index ◽

Main Principle ◽

The Difference ◽

Different Levels

The main principle of vibration-based damage detection in structures is to interpret the changes in dynamic properties of the structure as indicators of damage. In this study, the mode shape damage index (MSDI) method was used to identify discrete damages in plate-like structures. This damage index is based on the difference between modified modal displacements in the undamaged and damaged state of the structure. In order to assess the advantages and limitations of the proposed algorithm, we performed experimental modal analysis on a reinforced concrete (RC) plate under 10 different damage cases. The MSDI values were calculated through considering single and/or multiple damage locations, different levels of damage, and boundary conditions. The experimental results confirmed that the MSDI method can be used to detect the existence of damage, identify single and/or multiple damage locations, and estimate damage severity in the case of single discrete damage.

Six ultraviolet minutes for cleaner operating theatres

European Journal of Public Health ◽

10.1093/eurpub/ckaa166.580 ◽

2020 ◽

Vol 30 (Supplement_5) ◽

Author(s):

R Bosco ◽

S Gambelli ◽

V Urbano ◽

G Cevenini ◽

G Messina

Keyword(s):

Cross Sectional Study ◽

Cross Sectional ◽

Before And After ◽

Operating Theatres ◽

Cleaning Procedures ◽

The Difference ◽

Contamination Levels ◽

Different Levels ◽

Better Than ◽

Final Cleaning

Background Sanitizing the operating theatres (OT) is important to minimize risk of post-operative infections. Disinfection procedures between one operation and another are less aggressive than final cleaning procedures at the end of the day. The aim was to assess the difference in contamination: i) between different levels of disinfection; ii) before and after the use of a UVC Device (UVC-D). Methods Between December 2019 and February 2020 a cross-sectional study was conducted in OT in a real clinical context. 94 Petri dishes (PD) were used in 3 OT. Three different sanitation levels (SL1-3) were compared pre- and post-use of UVC-D: i) no cleaning after surgery (SL1); ii) after in-between cleaning (SL2); iii) after terminal cleaning (SL3). UVC-D was employed for 6 minutes, 3 minutes per bed side. PD were incubated at 36 °C and colony forming units (CFU) counted at 48h. Descriptive statistics, Wilcoxon and Mann-Whitney tests were performed to assess the contamination levels in total, pre/post use of UVC-D, and between different sanitation levels, respectively. Results In total we had a mean of 3.39 CFU/PD (C.I. 2.05 - 4.74) and a median of 1 CFU/PD (Min. 0 - Max. 39), after UVC-D use we had a mean of 2.20 CFU/PD (C.I. 0.69 - 5.09) and a median of 0 CFU/PD (Min. 0 - Max. 133). The UVC-D led to a significant reduction of CFU (p < 0.001). Without UVC-D we had a significant CFU drop (p < 0.05) between SL1 and SL3. Using UVC-D, we observed significant reductions of contamination (p < 0.05) between SL3 and SL1. Comparing SL1 (median 0) post UVC-D use vs SL2 pre UVC-D use (median 0.5), and SL2 post UVC-D use (median 0) vs SL3 pre UVC-D use (median 1) we had a significant reduction of contamination (p < 0.05). Conclusions UVC-D improved environmental contamination in any of the three sanitation levels. Furthermore, the use of UVC-D alone was better than in-between and terminal cleaning. Despite these encouraging results, the cleaning procedures carried out by dedicated staff still have to be considered. Key messages UVC is efficient at decreasing contamination in operating theatres regardless of sanitation levels. The additional use of UVC technology to standard cleaning procedures significantly improves sanitation levels.

Insertion of vowels in English syllabic consonantal clusters pronounced by L1 Polish speakers

Open Linguistics ◽

10.1515/opli-2021-0014 ◽

2021 ◽

Vol 7 (1) ◽

pp. 331-341

Author(s):

Urszula Chwesiuk

Keyword(s):

Language Learning ◽

Teaching Methodology ◽

English Studies ◽

Vowel Reduction ◽

Potential Influence ◽

English System ◽

The Difference ◽

Practical Implications ◽

Different Levels

The aim of this study was to verify whether Polish speakers of English insert a vowel in word-final clusters containing a consonant and a syllabic /l/ or /n/ due to L1–L2 transfer. L1 Polish speakers are mostly unaware of the existence of syllabic consonants; hence, they use the Polish phonotactics and articulate a vocalic sound before a final sonorant which is deprived of its syllabicity. This phenomenon was examined among L1 Polish speakers, first-year students of English studies, and the recording sessions were repeated a year later. Since, over that time, they were instructed with regard to phonetics and phonology but also the overall practical language learning, the results demonstrated the occurrence of the phenomenon of vowel insertion on different levels of advanced command of English. If the vowels were inserted, their quality and length were monitored and analysed. With regard to the English system, pronouncing vowel /ə/ before a syllabic consonant is possible, yet not usual. That is why another aim of this study is to examine to what extent the vowels articulated by the subjects differ from the standard pronunciation of non-final /ə/. The quality differences between the vowels articulated in the words ending with /l/ and /n/ were examined as well as the potential influence from the difference between /l/ and /n/ on the occurrence of vowel reduction. Even though Polish phonotactics permit numerous consonantal combinations in all word positions, it proved to be challenging for L1 Polish speakers to pronounce word-final consonantal clusters containing both syllabic sonorants. This result carries practical implications for the teaching methodology of English phonetics.

Natural Science and Supernatural Thought Experiments

Religions ◽

10.3390/rel10060389 ◽

2019 ◽

Vol 10 (6) ◽

pp. 389

Author(s):

James Robert Brown

Keyword(s):

Natural Science ◽

Thought Experiment ◽

Thought Experiments ◽

The Other ◽

Space And Time ◽

Wide Range ◽

The Difference ◽

Theological Thought

Religious notions have long played a role in epistemology. Theological thought experiments, in particular, have been effective in a wide range of situations in the sciences. Some of these are merely picturesque, others have been heuristically important, and still others, as I will argue, have played a role that could be called essential. I will illustrate the difference between heuristic and essential with two examples. One of these stems from the Newton–Leibniz debate over the nature of space and time; the other is a thought experiment of my own constructed with the aim of making a case for a more liberal view of evidence in mathematics.

An experimental assessment of the evaporation correlations for natural, forced and combined convection regimes

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science ◽

10.1177/0954406211413961 ◽

2011 ◽

Vol 226 (1) ◽

pp. 145-153 ◽

Cited By ~ 2

Author(s):

A Jodat ◽

M Moghiman

Keyword(s):

Mixed Convection ◽

Forced Convection ◽

Evaporation Rate ◽

Density Variation ◽

Convection Regime ◽

Combined Convection ◽

Wide Range ◽

The Difference ◽

Rate Of Evaporation ◽

Evaporation Models

In the present study, the applicability of widely used evaporation models (Dalton approach-based correlations) is experimentally investigated for natural, forced, and combined convection regimes. A series of experimental measurements are carried out over a wide range of water temperatures and air velocities for 0.01 ≤ Gr/Re² ≤ 100 in a heated rectangular pool. The investigations show that the evaporation rate strongly depends on the convection regime's Gr/Re² value. The results show that the evaporation rate increases with the difference in vapour pressures over both forced convection (0.01 ≤ Gr/Re² ≤ 0.1) and turbulent mixed convection regimes (0.15 ≤ Gr/Re² ≤ 25). However, the escalation rate of evaporation decreases with Gr/Re² in the forced convection regime whereas in the turbulent mixed convection it increases. In addition, over the range of the free convection regime (Gr/Re² ≥ 25), the evaporation rate is affected not only by the vapour pressure difference but also by the density variation. A dimensionless correlation using the experimental data of all convection regimes (0.01 ≤ Gr/Re² ≤ 100) is proposed to cover different water surface geometries and airflow conditions.
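
The regime boundaries quoted in the abstract can be collected into a small lookup, sketched below under stated assumptions. The thresholds come from the text; the function name, the treatment of the unreported gap between 0.1 and 0.15, and the choice to assign the shared boundary value of 25 to the mixed regime are assumptions made for illustration.

```python
def classify_convection_regime(grashof, reynolds):
    """Label the regime from Gr/Re^2 using the ranges quoted in the abstract."""
    if reynolds <= 0 or grashof < 0:
        raise ValueError("expected Re > 0 and Gr >= 0")
    ratio = grashof / reynolds ** 2
    if ratio < 0.01 or ratio > 100:
        return "outside the studied range"
    if ratio <= 0.1:
        return "forced convection"
    if ratio < 0.15:
        # The abstract reports no regime between 0.1 and 0.15.
        return "not specified in the abstract"
    if ratio <= 25:
        # Assumption: the shared boundary value 25 is assigned to this regime.
        return "turbulent mixed convection"
    return "free convection"


# Hypothetical Grashof and Reynolds numbers.
for gr, re in [(5e6, 1e4), (1e8, 1e4), (5e9, 1e4)]:
    print(f"Gr/Re^2 = {gr / re ** 2:g}: {classify_convection_regime(gr, re)}")
```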
