evaluation scores
Recently Published Documents


TOTAL DOCUMENTS

293
(FIVE YEARS 132)

H-INDEX

19
(FIVE YEARS 3)

2022 ◽  
Vol 40 (3) ◽  
pp. 1-47
Author(s):  
Ameer Albahem ◽  
Damiano Spina ◽  
Falk Scholer ◽  
Lawrence Cavedon

In many search scenarios, such as exploratory, comparative, or survey-oriented search, users interact with dynamic search systems to satisfy multi-aspect information needs. These systems use different dynamic approaches that exploit user feedback at various levels of granularity. Although studies have provided insights into the role of many components of these systems, they used black-box and isolated experimental setups. Therefore, the effects of these components and their interactions are still not well understood. We address this by following a methodology based on Analysis of Variance (ANOVA). We built a Grid of Points that consists of systems based on different ways to instantiate three components: initial rankers, dynamic rerankers, and user feedback granularity. Using evaluation scores based on the TREC Dynamic Domain collections, we built several ANOVA models to estimate the effects. We found that (i) although all components significantly affect search effectiveness, the initial ranker has the largest effect size, (ii) the effect sizes of these components vary with the length of the search session and the effectiveness metric used, and (iii) initial rankers and dynamic rerankers have more prominent effects than user feedback granularity. To improve effectiveness, we recommend improving the quality of initial rankers and dynamic rerankers. This does not require eliciting detailed user feedback, which might be expensive or invasive.
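To make the notion of a component's effect size concrete, here is a minimal sketch of how a main-effect eta-squared could be computed over a balanced grid of system configurations. It is an illustration under stated assumptions, not the authors' procedure: the function name `eta_squared`, the level names, and every score are hypothetical, and the toy scores are synthetic and purely additive, so no interaction effects appear.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def eta_squared(scores, factor_index):
    """Share of total variance explained by one factor's main effect.

    `scores` maps a tuple of factor levels to one evaluation score.
    Assumes a balanced, fully crossed grid (every combination present once).
    """
    grand_mean = mean(scores.values())
    ss_total = sum((y - grand_mean) ** 2 for y in scores.values())
    if ss_total == 0:
        raise ValueError("all scores are identical; effect size is undefined")

    # Group the scores by the level of the chosen factor.
    groups = defaultdict(list)
    for levels, y in scores.items():
        groups[levels[factor_index]].append(y)

    # Between-level sum of squares for this factor.
    ss_factor = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    return ss_factor / ss_total


# Synthetic toy grid (NOT data from the paper): score = base + additive offsets.
ranker_effect = {"ranker_a": 0.0, "ranker_b": 0.2}
reranker_effect = {"rerank_x": 0.0, "rerank_y": 0.1}
feedback_effect = {"document": 0.0, "passage": 0.02}

scores = {
    (r, d, f): 0.3 + ranker_effect[r] + reranker_effect[d] + feedback_effect[f]
    for r, d, f in product(ranker_effect, reranker_effect, feedback_effect)
}

effects = [eta_squared(scores, i) for i in range(3)]
for name, value in zip(["initial ranker", "dynamic reranker", "feedback granularity"], effects):
    print(f"{name}: eta^2 = {value:.3f}")
```

A real analysis would fit a full ANOVA model with interaction terms and significance tests over many topics and sessions; this sketch only shows how the variance attributed to each component can be compared.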


Author(s):  
Rupjyoti Baruah ◽  
Rajesh Kumar Mundotiya ◽  
Anil Kumar Singh

Machine translation (MT) systems have been built using numerous techniques for bridging language barriers. These techniques are broadly categorized into approaches such as Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). End-to-end NMT systems significantly outperform SMT in translation quality on many language pairs, especially those with adequate parallel corpora. We report comparative experiments on baseline MT systems for Assamese to other Indo-Aryan languages (in both translation directions) using the traditional Phrase-Based SMT as well as some more successful NMT architectures, namely a basic sequence-to-sequence model with attention, Transformer, and finetuned Transformer. The results are evaluated using the most prominent standard automatic metric, BLEU (BiLingual Evaluation Understudy), as well as other well-known metrics for exploring the performance of different baseline MT systems, since this is the first such work involving Assamese. The evaluation scores of the SMT and NMT models are compared across bi-directional language pairs involving Assamese and other Indo-Aryan languages (Bangla, Gujarati, Hindi, Marathi, Odia, Sinhalese, and Urdu). The highest BLEU scores obtained are for Assamese to Sinhalese for SMT (35.63) and Assamese to Bangla for NMT systems (seq2seq is 50.92, Transformer is 50.01, and finetuned Transformer is 50.19). We also try to relate the results to language characteristics, distances, family trees, domains, data sizes, and sentence lengths. We find that the domain is the most important factor affecting the results for the given data domains and sizes. We compare our results with the only existing MT system for Assamese (Bing Translator) and also with pairs involving Hindi.
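For readers unfamiliar with BLEU, the following is a simplified sketch of the metric: the geometric mean of modified n-gram precisions multiplied by a brevity penalty. It is a sentence-level, single-reference, unsmoothed variant written for illustration; it is not the evaluation tooling used in the study above, and the function names and example tokens are hypothetical. Published BLEU scores are normally computed at corpus level and reported on a 0 to 100 scale, whereas this function returns a value between 0 and 1.

```python
from collections import Counter
from math import exp, log


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate against one reference (no smoothing)."""
    if not candidate:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += log(overlap / total) / max_n

    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else exp(1 - r / c)
    return brevity_penalty * exp(log_precision_sum)


# Hypothetical English tokens, used only to exercise the function.
reference = "the cat sat on the mat today".split()
exact = sentence_bleu(reference, reference)
partial = sentence_bleu("the cat sat on the mat".split(), reference)
unrelated = sentence_bleu("a dog ran in a park now".split(), reference)
print(exact, partial, unrelated)
```

The shorter candidate matches every n-gram it contains but is still scored below the exact match because of the brevity penalty.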


2022 ◽  
Author(s):  
Valery Semenovich Lukashenko ◽  
Irina Pavlovna Saleeva ◽  
Victor Grigorievich Volik ◽  
Dilaram Yuldashevna Ismailova ◽  
Evgenia Vladimirovna Zhuravchuk

The aim of this research was to study the biochemical properties of a new protein-rich feed additive produced by short-term intense thermal treatment and subsequent enzymatic hydrolysis of the wastes of poultry slaughter and primary processing (feathers and fluff). It was found that this feather-based fermented feed additive contained a high amount of crude protein (86.52%), and the content of easily digestible low-molecular peptides in the additive was 9% higher than in fishmeal. The amino acid profiles of the additive and fishmeal were compared. The effectiveness of substituting the additive for fishmeal in the diet of broiler chicks was demonstrated by in vivo experiments. The results showed that the digestibility of the dietary nutrients was higher in broilers that were fed the new additive than in those fed fishmeal, which resulted in higher meat productivity: the average daily weight gain in additive-fed broilers was 3.82% higher (p < 0.01) than in fishmeal-fed control broilers, the dressing was 1.4% higher, the muscle in the carcass was 2.1% higher, and the feed conversion ratio was 3.57% lower. The sensory evaluation scores of the meat and broth were also higher in the additive-fed broilers. Keywords: feed additive, feather wastes of poultry slaughter, enzymatic hydrolysis, distribution of molecular peptide weights, digestibility, productive performance in broilers


Technologies ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 7
Author(s):  
Christos Sevastopoulos ◽  
Stasinos Konstantopoulos ◽  
Keshav Balaji ◽  
Mohammad Zaki Zadeh ◽  
Fillia Makedon

Training on simulation data has proven invaluable in applying machine learning in robotics. However, when looking at robot vision in particular, simulated images cannot be directly used no matter how realistic the image rendering is, as many physical parameters (temperature, humidity, wear-and-tear in time) vary and affect texture and lighting in ways that cannot be encoded in the simulation. In this article we propose a different approach for extracting value from simulated environments: although neither of the trained models can be used nor are any evaluation scores expected to be the same on simulated and physical data, the conclusions drawn from simulated experiments might be valid. If this is the case, then simulated environments can be used in early-stage experimentation with different network architectures and features. This will expedite the early development phase before moving to (harder to conduct) physical experiments in order to evaluate the most promising approaches. In order to test this idea we created two simulated environments for the Unity engine, acquired simulated visual datasets, and used them to reproduce experiments originally carried out in a physical environment. The comparison of the conclusions drawn in the physical and the simulated experiments is promising regarding the validity of our approach.


2022 ◽  
Vol 9 ◽  
Author(s):  
Joseph Ollier ◽  
Marcia Nißen ◽  
Florian von Wangenheim

Background: Conversational agents (CAs) are a novel approach to delivering digital health interventions. In human interactions, terms of address often change depending on the context or relationship between interlocutors. In many languages, this encompasses T/V distinction (formal and informal forms of the second-person pronoun “You”) that conveys different levels of familiarity. Yet, few research articles have examined whether CAs' use of T/V distinction across language contexts affects users' evaluations of digital health applications. Methods: In an online experiment (N = 284), we manipulated a public health CA prototype to use either informal or formal T/V distinction forms in French (“tu” vs. “vous”) and German (“du” vs. “Sie”) language settings. A MANCOVA and post-hoc tests were performed to examine the effects of the independent variables (i.e., T/V distinction and Language) and the moderating role of users' demographic profile (i.e., Age and Gender) on eleven user evaluation variables. These were related to four themes: (i) Sociability, (ii) CA-User Collaboration, (iii) Service Evaluation, and (iv) Behavioral Intentions. Results: Results showed a four-way interaction between T/V Distinction, Language, Age, and Gender, influencing user evaluations across all outcome themes. For French speakers, when the informal “T form” (“Tu”) was used, higher user evaluation scores were generated for younger women and older men (e.g., the CA felt more humanlike or individuals were more likely to recommend the CA), whereas when the formal “V form” (“Vous”) was used, higher user evaluation scores were generated for younger men and older women. For German speakers, when the informal T form (“Du”) was used, younger users' evaluations were comparable regardless of Gender; however, as individuals' Age increased, the use of “Du” resulted in lower user evaluation scores, with this effect more pronounced in men. When the formal V form (“Sie”) was used, user evaluation scores were relatively stable regardless of Gender, increasing only slightly with Age. Conclusions: Results highlight how user evaluations of CAs vary based on the T/V distinction used and the language setting. Moreover, even within a culturally homogeneous language group, evaluations vary based on user demographics, highlighting the importance of personalizing CA language.


2022 ◽  
Vol 2022 ◽  
pp. 1-9
Author(s):  
Jiahui Gu

The traditional mixed oral English teaching model has many obvious shortcomings, such as the inability to correct students’ pronunciation errors and give them feedback in time, which slows the improvement of students’ English proficiency. For this reason, this paper proposes a guided teaching model based on core literacy. Following the structure of the oral English mixed teaching model, the paper determines the application plan of the model, designs the development environment, obtains the corpus, designs the oral training model, extracts the oral features, and identifies mispronunciations and corrects them in time. For the evaluation, it clarifies the evaluation purpose, obtains preliminary evaluation indicators, reduces the indicators and determines their weights, obtains indicator feature information, generates fuzzy rules, obtains fuzzy matrices, achieves quantitative evaluation, and synthesizes all evaluation scores to construct a result vector matrix, completing the study of the mixed spoken language teaching mode. The research shows that the mixed teaching method is effective and feasible and can effectively improve the accuracy of the evaluation results of the mixed oral English teaching model.
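The evaluation steps listed above (indicator weights, fuzzy matrices, result vector) resemble a fuzzy comprehensive evaluation. The sketch below shows one common form of that calculation, in which the result vector is the weight vector multiplied by the fuzzy membership matrix. The abstract does not say which composition operator was used, so the weighted-average operator here is an assumption, and the indicator names, weights, membership degrees, and grade values are all hypothetical.

```python
def fuzzy_evaluate(weights, membership, grade_values):
    """Weighted-average fuzzy comprehensive evaluation.

    weights:      one weight per indicator (assumed to sum to 1)
    membership:   one row per indicator; each row gives the degree of
                  membership of that indicator in each grade
    grade_values: a numeric score attached to each grade
    Returns (result_vector, overall_score).
    """
    if len(weights) != len(membership):
        raise ValueError("need exactly one membership row per indicator")
    if any(len(row) != len(grade_values) for row in membership):
        raise ValueError("each membership row needs one entry per grade")

    # Result vector B = W . R (weight vector times fuzzy matrix).
    result_vector = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(len(grade_values))
    ]
    # Collapse the result vector into a single quantitative score.
    overall_score = sum(b * g for b, g in zip(result_vector, grade_values))
    return result_vector, overall_score


# Hypothetical indicators: pronunciation accuracy, fluency, feedback timeliness.
weights = [0.5, 0.3, 0.2]
membership = [
    [0.6, 0.3, 0.1],  # pronunciation accuracy: good / fair / poor
    [0.2, 0.5, 0.3],  # fluency
    [0.1, 0.4, 0.5],  # feedback timeliness
]
grade_values = [90, 75, 60]  # illustrative scores for good / fair / poor

result_vector, overall_score = fuzzy_evaluate(weights, membership, grade_values)
print(result_vector, overall_score)
```

Other composition operators (for example max-min) are also used in fuzzy evaluation and would give different result vectors for the same inputs.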


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Shishir Ram Shetty ◽  
Satyavrat Arya ◽  
Vinayak Kamath ◽  
Saad Al-Bayatti ◽  
Hesham Marei ◽  
...  

Objectives. Radiography-based indices can help surgeons perform detailed examinations of the surgical site and predict the surgical difficulty of cases. We aimed to develop and validate a novel CBCT-based index that can predict the surgical difficulty of sinus-augmentation procedures. Materials and Methods. In the first stage, five experienced dental specialists performed a review of the literature and closed group discussions and designed the novel index. In the next stage, the index was validated. CBCT scans of 30 patients scheduled for sinus-augmentation procedures were evaluated and assigned presurgical CBCT evaluation scores (PSCESs) by five examiners. Subsequently, one oral surgeon performed sinus augmentation using the lateral antrostomy technique and assigned surgical difficulty scores (SDSs) to each of the 30 cases along with 2 observers. The PSCESs and SDSs were statistically analysed to determine the interrater reliability and validity of the index. Results. The interrater agreement of the PSCES among the five presurgical evaluators was 0.85. The PSCES of the five evaluators had a highly significant correlation (P < 0.001, r = 0.68 to 0.76) with the SDS. Regression analysis revealed that for every unit increase in the PSCES, there is a 0.46 to 0.57 increase in the SDS value. Conclusion. The results of this pilot study revealed that a novel CBCT-based index can be used as a reliable tool for predicting the surgical difficulty of sinus-augmentation procedures. However, the novel index needs to be tested on a larger sample of patients and evaluators to establish its validity and reliability more firmly.


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8368
Author(s):  
Kouki Fujioka

Cheese aroma is known to affect consumer preference. One of the methods to measure cheese aroma is the use of an electronic nose (e-nose), which has been used to classify cheese types, production areas, and cheese ages. However, few studies have directly compared the aroma intensity scores derived from sensory evaluations with the values of metal oxide semiconductor sensors that can easily measure the aroma intensity. This pilot study aimed to investigate the relationship between sensory evaluation scores and e-nose values with respect to cheese aroma. Five types of processed cheese (two types of normal processed cheese, one type containing aged cheese, and two types containing blue cheese), and one type of natural cheese were used as samples. The sensor values obtained using the electronic nose, which measured sample aroma non-destructively, and five sensory evaluation scores related to aroma (aroma intensity before intake, during mastication, and after swallowing; taste intensity during mastication; and remaining flavor after swallowing (lasting flavor)) determined by six panelists, were compared. The e-nose values of many of the tested cheese types were significantly different, whereas the sensory scores of the one or two types of processed cheese containing blue cheese and those of the natural cheese were significantly different. Significant correlations were observed between the means of e-nose values and the medians of aroma intensity scores derived from the sensory evaluation testing before intake, during mastication, and after swallowing. In particular, the aroma intensity score during mastication was found to have a linear relationship with the e-nose values (Pearson’s R = 0.983). In conclusion, the e-nose values correlated with the sensory scores with respect to cheese aroma intensity and could be helpful in predicting them.
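The linear relationship reported above is summarized with Pearson's correlation coefficient. As a minimal sketch of how that coefficient is computed from paired measurements, the function below implements the standard formula. The variable names and all numbers are made up for illustration and are not the study's measurements.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical values: mean sensor readings and median panel scores
# for six samples (illustrative only).
sensor_means = [0.12, 0.18, 0.25, 0.31, 0.47, 0.52]
panel_medians = [1.0, 1.5, 2.0, 2.5, 4.0, 4.0]

r = pearson_r(sensor_means, panel_medians)
print(f"Pearson r = {r:.3f}")
```

With only a handful of samples, as in a pilot study, a high r should be read together with its significance test or confidence interval, since a few points can produce a strong correlation by chance.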


2021 ◽  
Vol 13 (23) ◽  
pp. 13453
Author(s):  
Hyunmin Oh ◽  
Sambock Park

This study empirically analyzes the relationship between cost stickiness and earnings transparency. Additionally, this study examines the effect of corporate sustainable management (CSM) on the relationship between cost stickiness and earnings transparency. The evaluation scores of Korea Corporate Governance Service (KCGS) are employed to measure CSM activities. The empirical results show that the relationship between cost stickiness and earnings transparency is significant in the negative direction. This means that the more sticky the costs of a firm, the lower the earnings transparency of the firm. In addition, the relationship between the interaction variables of CSM and cost stickiness and earnings transparency is significant in the positive direction. This indicates that CSM activities act as a mechanism to mitigate the negative relationship between cost stickiness and earnings transparency. The findings of this study, which presented the effects of cost stickiness on earnings transparency and the fact that CSM activities act as a device to suppress the opportunistic cost behavior of managers, are expected to provide important implications to investors, external auditors, and supervisors.


Author(s):  
Danielle Werle ◽  
Courtney T. Byrd

Purpose: The purpose of this study was to examine the perceptual ratings and performance evaluations of students who do and do not stutter by professors who require oral presentations. Additionally, this study sought to investigate the influence of behaviors related to communication competence on perceptual and evaluative ratings. Method: One hundred fifty-eight college instructors who require oral presentations in their classes participated in this study. Participants viewed one video from four possible randomized conditions: (a) presence of stuttering + low communication competence, (b) absence of stuttering + low communication competence, (c) presence of stuttering + high communication competence, and (d) absence of stuttering + high communication competence. Participants evaluated student performance against a standardized rubric and rated the student along 16 personality traits. Results: Results of separate 2 × 2 analyses of variance revealed that professors view and evaluate students presenting with high communication competence more positively overall, regardless of whether stuttering is present. Significant interactions between fluency (i.e., presence vs. absence of stuttering) and communication competence (i.e., high vs. low) were found for negative personality traits, as well as delivery evaluation scores. The video in which the student stuttered and presented with low communication competence was rated more positively than the video in which the student did not stutter and presented with low communication competence. Conclusions: Professors perceive and evaluate students who stutter differently from their nonstuttering peers, and those ratings are moderated by levels of communication competence. High-communication-competence behaviors improved perceptual and evaluation scores; however, in the presence of low-communication-competence behaviors, professors overcorrect in the form of positive feedback bias, which may have negative long-term academic consequences.

