evaluation measure
Recently Published Documents



Author(s):  
Jos Hornikx ◽  
Annemarie Weerman ◽  
Hans Hoeken

According to Mercier and Sperber (2009, 2011, 2017), people have an immediate and intuitive feeling about the strength of an argument. These intuitive evaluations are not captured by current evaluation methods of argument strength, yet they could be important to predict the extent to which people accept the claim supported by the argument. In an exploratory study, therefore, a newly developed intuitive evaluation method to assess argument strength was compared to an explicit argument strength evaluation method (the PAS scale; Zhao et al., 2011), on their ability to predict claim acceptance (predictive validity) and on their sensitivity to differences in the manipulated quality of arguments (construct validity). An experimental study showed that the explicit argument strength evaluation performed well on the two validity measures. The intuitive evaluation measure, on the other hand, was not found to be valid. Suggestions for other ways of constructing and testing intuitive evaluation measures are presented.


2021 ◽  
Vol 11 (21) ◽  
pp. 10267
Author(s):  
Puri Phakmongkol ◽  
Peerapon Vateekul

Question Answering (QA) is a natural language processing task that enables a machine to understand a given context and answer a given question. Many QA studies draw on the abundant resources available for English. However, Thai is one of the languages with few labeled corpora for QA. According to previous studies, English QA models can achieve F1 scores above 90%, whereas our Thai QA baseline obtained only 70%. In this study, we aim to improve the performance of Thai QA models by generating additional question-answer pairs with the Multilingual Text-to-Text Transfer Transformer (mT5), together with data preprocessing methods for Thai. With this method, more than 100 thousand question-answer pairs can be synthesized from the provided Thai Wikipedia articles. Using the synthesized data, we investigated several fine-tuning strategies to achieve the highest model performance. Furthermore, we show that syllable-level F1 is a more suitable evaluation measure than Exact Match (EM) and word-level F1 for Thai QA corpora. Experiments were conducted on two Thai QA corpora: Thai Wiki QA and iApp Wiki QA. The results show that our augmented model outperforms other modern transformer models, RoBERTa and mT5, on both datasets.
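The difference between EM, word-level F1 and syllable-level F1 comes down to which units are compared. A minimal sketch, assuming the standard SQuAD-style bag-of-units F1 (the abstract does not spell out the authors' exact formula) and assuming the answers have already been segmented by some external Thai word or syllable tokenizer (not shown). The romanized strings below are hypothetical placeholders for Thai text.

```python
from collections import Counter
from typing import List


def unit_f1(prediction: List[str], reference: List[str]) -> float:
    """SQuAD-style F1 over bags of units (words or syllables)."""
    # Multiset intersection counts how many units the two answers share.
    common = Counter(prediction) & Counter(reference)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: List[str], reference: List[str]) -> float:
    """1.0 only if the concatenated answers are identical (no normalization)."""
    return float("".join(prediction) == "".join(reference))


# Hypothetical answer: the model predicts only the first part of a long name.
ref_words, pred_words = ["krungthepmahanakhon"], ["krungthep"]
ref_syll = ["krung", "thep", "ma", "ha", "na", "khon"]
pred_syll = ["krung", "thep"]

print(exact_match(pred_syll, ref_syll))  # 0.0
print(unit_f1(pred_words, ref_words))    # 0.0: no whole word matches
print(unit_f1(pred_syll, ref_syll))      # 0.5: partial credit at syllable level
```

Because Thai is written without spaces, a single word-segmentation disagreement can zero out word-level F1, while syllable-level units still give partial credit for overlapping answers.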


2021 ◽  
Vol 39 (4) ◽  
pp. 1-22
Author(s):  
Aldo Lipani ◽  
Ben Carterette ◽  
Emine Yilmaz

As conversational agents like Siri and Alexa gain in popularity and use, conversation is becoming an increasingly important mode of interaction for search. Conversational search shares some features with traditional search, but differs in some important respects: conversational search systems are less likely to return ranked lists of results (a SERP), more likely to involve iterated interactions, and more likely to feature longer, well-formed user queries in the form of natural language questions. Because of these differences, traditional methods for search evaluation (such as the Cranfield paradigm) do not translate easily to conversational search. In this work, we propose a framework for offline evaluation of conversational search, which includes a methodology for creating test collections with relevance judgments, an evaluation measure based on a user interaction model, and an approach to collecting user interaction data to train the model. The framework is based on the idea of “subtopics”, often used to model novelty and diversity in search and recommendation, and the user model is similar to the geometric browsing model introduced by rank-biased precision (RBP) and used in expected reciprocal rank (ERR). As far as we know, this is the first work to combine these ideas into a comprehensive framework for offline evaluation of conversational search.
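For reference, the two standard measures whose browsing models the abstract mentions can be sketched as follows. This is not the subtopic-based conversational measure proposed in the paper; it only illustrates the geometric "continue with probability p" user model of RBP and the cascade model of ERR. The default persistence `p = 0.8` is an arbitrary choice for illustration.

```python
from typing import Sequence


def rbp(relevance: Sequence[float], p: float = 0.8) -> float:
    """Rank-biased precision: (1 - p) * sum_i r_i * p^(i-1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("persistence p must be in [0, 1)")
    # enumerate starts at 0, so p ** i equals p^(rank - 1).
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevance))


def err(grades: Sequence[int], max_grade: int) -> float:
    """Expected reciprocal rank with stop probability (2^g - 1) / 2^max_grade."""
    p_continue = 1.0  # probability the user reaches the current rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        stop = (2 ** g - 1) / 2 ** max_grade
        score += p_continue * stop / rank
        p_continue *= 1 - stop
    return score


print(rbp([1, 0, 1], p=0.5))     # 0.625
print(err([1, 1], max_grade=1))  # 0.625
```

Both measures discount lower ranks through a model of how far a user is likely to read, which is the property the paper adapts to conversational responses.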


2021 ◽  
Author(s):  
Tsukasa Fukunaga ◽  
Wataru Iwasaki

Motivation: Phylogenetic profiling is a powerful computational method for revealing the functions of genes of unknown function. Although conventional similarity evaluation measures in phylogenetic profiling showed high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. Existing studies have focused on the evolutionary bias, but the spurious correlation bias has not been analyzed. Results: To eliminate the spurious correlation bias, we applied an evaluation measure based on the inverse Potts model (IPM) to phylogenetic profiling. We also proposed an evaluation measure to remove both the evolutionary and spurious correlation biases using the IPM. In an empirical dataset analysis, we demonstrated that these IPM-based evaluation measures improved the prediction performance of phylogenetic profiling. In addition, we found that integrating several evaluation measures, including the IPM-based ones, performed better than a single evaluation measure.


2021 ◽  
Author(s):  
Jeffrey Smith ◽  
Alexander Whalley ◽  
Nathaniel Wilcox

Managers of workforce training programs are often unable to afford costly, full-fledged experimental or nonexperimental evaluations to determine their programs’ impacts. Therefore, many rely on the survey responses of program participants to gauge program impacts. Smith, Whalley, and Wilcox present the first attempt to assess such measures despite their already widespread use in program evaluations. They develop a multidisciplinary framework for addressing the issue and apply it to three case studies: the National Job Training Partnership Act Study, the U.S. National Supported Work Demonstration, and the Connecticut Jobs First Program. Each of these studies was subjected to an experimental evaluation that included a survey-based participant evaluation measure. The authors apply econometric methods specifically developed to obtain estimates of program impacts among individuals in the studies and then compare these estimates with the survey-based participant evaluation measures to assess the surveys’ efficacy. The authors also discuss how their findings fit into the broader literatures in economics, psychology, and survey research.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mira Park ◽  
Hoe-Bin Jeong ◽  
Jong-Hyun Lee ◽  
Taesung Park

Abstract Background Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions that has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling’s T² statistic to evaluate interaction models, but it is well known that Hotelling’s T² statistic is highly sensitive to heavily skewed distributions and outliers. Results We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suitable for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering to classify multi-locus genotypes into two groups. Then, MR-MDR calculates a spatial rank-sum statistic as an evaluation measure and selects the interaction model with the largest statistic. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Extensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR. Conclusions Extensive simulation studies comparing MR-MDR with several current methods showed that MR-MDR performed outstandingly for skewed distributions. Additionally, for symmetric distributions, MR-MDR showed comparable power. Therefore, we conclude that MR-MDR is a useful multivariate nonparametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size.
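MR-MDR scores a candidate interaction model by splitting samples into two genotype groups and computing a spatial rank-sum statistic on the multivariate phenotypes. The sketch below illustrates the nonparametric ingredients: the spatial sign S(x) = x / ||x||, the spatial rank R(x_i) = (1/n) Σ_j S(x_i − x_j), and a simplified, unstandardized between-group statistic Σ_g n_g ||mean rank in group g||². Assumptions: the group labels are given (MR-MDR obtains them with fuzzy k-means, not shown), and `rank_sum_statistic` is a hypothetical illustrative form, not necessarily the exact statistic in the authors' R code, which may additionally standardize by a rank covariance matrix.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def spatial_sign(v: Vector) -> List[float]:
    """Unit vector in the direction of v; the zero vector maps to zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [x / norm for x in v]


def spatial_ranks(data: Sequence[Vector]) -> List[List[float]]:
    """Spatial rank of each observation relative to the whole sample."""
    n, d = len(data), len(data[0])
    ranks = []
    for xi in data:
        acc = [0.0] * d
        for xj in data:
            s = spatial_sign([a - b for a, b in zip(xi, xj)])
            acc = [a + b for a, b in zip(acc, s)]
        ranks.append([a / n for a in acc])
    return ranks


def rank_sum_statistic(data: Sequence[Vector], labels: Sequence[int]) -> float:
    """Hypothetical unstandardized statistic: sum_g n_g * ||mean rank of g||^2."""
    ranks = spatial_ranks(data)
    total = 0.0
    for group in set(labels):
        members = [r for r, g in zip(ranks, labels) if g == group]
        mean = [sum(col) / len(members) for col in zip(*members)]
        total += len(members) * sum(m * m for m in mean)
    return total


# Toy one-dimensional phenotype: separated groups score higher than mixed ones.
phenotypes = [[0.0], [1.0], [2.0], [3.0]]
print(rank_sum_statistic(phenotypes, [0, 0, 1, 1]))  # 1.0
print(rank_sum_statistic(phenotypes, [0, 1, 0, 1]))  # 0.25
```

Because the statistic depends on the data only through directions of pairwise differences, a single extreme outlier shifts it far less than it would shift a mean-based statistic such as Hotelling's T².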


2021 ◽  
Vol 11 (17) ◽  
pp. 8051
Author(s):  
Chengxiao Shen ◽  
Liping Qian ◽  
Ningning Yu

In an era of big data, face images collected from social media, forensic investigations, and similar sources generally lack labels, while the number of identities (clusters) may range from a few dozen to thousands. Therefore, it is of practical importance to cluster a large number of unlabeled face images into a reasonable range of identities, or even the exact identities, which avoids labeling images by hand. Here, we propose adaptive facial imagery clustering that involves face representations, spectral clustering, and reinforcement learning (Q-learning). First, we use a deep convolutional neural network (DCNN) to generate face representations, and we adopt a spectral clustering model to construct a similarity matrix and obtain a clustering partition. Then, we use an internal evaluation measure (the Davies–Bouldin index) to evaluate the clustering quality. Finally, we adopt Q-learning as the feedback module to build a dynamic multiparameter tuning process. Experimental results on the ORL Face Database show the effectiveness of our method: it finds an optimal number of 39 clusters, close to the actual number of 40, and achieves 99.2% clustering accuracy. Subsequent studies should focus on reducing the computational complexity of handling more face images.
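The Davies–Bouldin index used here as the internal evaluation measure can be computed without ground-truth labels. A minimal sketch of the standard definition, DB = (1/k) Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j), where s_i is the mean Euclidean distance of cluster i's points to its centroid c_i. Assumption: the DCNN face embeddings are given as plain numeric vectors; the spectral clustering and Q-learning steps are not shown.

```python
import math
from typing import Dict, List, Sequence


def davies_bouldin(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Davies-Bouldin index (lower means more compact, better separated clusters)."""
    clusters: Dict[int, List[Sequence[float]]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Centroid of each cluster, coordinate by coordinate.
    centroids = {
        label: [sum(coord) / len(ps) for coord in zip(*ps)]
        for label, ps in clusters.items()
    }
    # Scatter s_i: mean distance from a cluster's points to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in ps) / len(ps)
        for label, ps in clusters.items()
    }

    ids = list(clusters)
    total = 0.0
    for i in ids:
        # Worst-case similarity of cluster i to any other cluster.
        # Coinciding centroids would raise ZeroDivisionError.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in ids
            if j != i
        )
    return total / len(ids)


pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # about 0.2: tight, well separated
print(davies_bouldin(pts, [0, 1, 0, 1]))  # about 5.0: poor partition
```

In a parameter search over candidate partitions (for example, different cluster counts), a lower index would be preferred, which is the kind of feedback signal the abstract describes passing to the Q-learning module.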


Hand ◽  
2021 ◽  
pp. 155894472110289
Author(s):  
Simo Mattila ◽  
Eero Waris

Background Implant arthroplasties for trapeziometacarpal osteoarthritis are often associated with high complication and revision surgery rates. No previous studies have reported revision outcomes of failed interposition implant arthroplasty. Methods A patient register search was done for all implant arthroplasties performed for trapeziometacarpal osteoarthritis during a 10-year period in a single hand surgical unit. Altogether, 32 patients had primary interposition implant arthroplasty (Artelon 22, Pyrosphere 6, Ortosphere 2, and Pyrodisk 2), and 19 of these patients had revision surgery, with 23 revision procedures performed. In all, 15 of the 19 revised patients were reexamined clinically (Connolly-Rath score, Quick Disabilities of the Arm, Shoulder and Hand, Patient Evaluation Measure, visual analog scale for pain, and thumb range of motion and strength measurements) and radiographically. Results The indication for revision surgery was pain alone or implant dislocation accompanied by pain in all cases. Thirteen of the 15 reexamined patients reported functional deficit and pain after revision. There was no statistically significant difference in revision outcomes between patients operated on primarily with the Artelon implant and those with pyrocarbon/ceramic implants. Compared with previous studies on revision surgery and primary trapeziometacarpal arthroplasty, our results showed slightly higher pain and poorer functional scores. Conclusions Interposition implant arthroplasty may yield high revision rates. The results after revision surgery may be worse than previously described, and there may also be a tendency toward worse results than those of primary arthroplasty. Interposition implant arthroplasty should always be considered carefully.


Author(s):  
Nelson Loyola Lopez ◽  
Carlos Acuña Carrasco ◽  
Leonardo Arenas Bravo ◽  
Mariela Arriola Herrera

The eggplant (Solanum melongena L.) is an edible fruit plant rich in vitamins, minerals and phenolic compounds, so its consumption brings health benefits. The objective of this work was to evaluate the nutritional quality, sensory attributes and hygiene of a beverage based on eggplant juice with CO2 injection. The study had four treatments: T0, eggplant juice (330 mL); T1, eggplant juice (330 mL) + CO2 (1.94 g); T2, eggplant juice (330 mL) + CO2 (1.94 g) + benzoate (1 g); and T3, eggplant juice (330 mL) + CO2 (1.94 g) + sucralose (1 g). Evaluations were carried out at 24 hours and at 30 and 60 days of storage (0 °C and 95% RH), and at each of these times acidity, pH, reducing sugars, soluble solids and vitamin C were determined. The sensory evaluation measures were color, flavor, texture, aroma, acceptability and appearance. For microbiological analysis, total coliforms were measured 24 hours after the beverage was made. Appearance differed significantly between beverages made with treatment T3 and those made with T0, T1 and T2 at 24 hours and 30 days of storage. The beverage made with treatment T3 received better acceptability from the panelists, mainly at 30 days of storage, whereas beverages made with treatment T0 had lower appearance and acceptability at 24 hours and 30 days of storage. The vitamin C content was 25 mg (23 mg standard) in the beverages made with treatment T3. The pasteurization process ensured the absence of total coliforms in the beverages, and good manufacturing practices yielded an innocuous product fit for consumption.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alessandro Taurino ◽  
Linda A. Antonucci ◽  
Paolo Taurisano ◽  
Domenico Laera

Abstract Background Substance Use Disorder (SUD) causes a great deal of personal suffering for patients. Recent evidence highlights how defenses and emotion regulation may play a crucial part in the onset and development of this disorder. The aim of this study was to investigate potential differences in defensive functioning between SUD patients and non-clinical controls. Secondly, we aimed to investigate the relationships between alexithymia and maladaptive/assimilation defenses. Methods The authors assessed defensive functioning (Response Evaluation Measure-71, REM-71), personality (MMPI-II), and alexithymia (TAS-20) of 171 SUD patients (17% female; mean age = 36.5), compared with 155 controls. The authors performed a series of ANOVAs to investigate the defensive array in SUD patients compared with that of non-clinical controls. Student’s t test for independent samples was used to compare clinical characteristics between the SUD group and the controls. To investigate the role of single defenses in explaining alexithymia subscores, stepwise multiple regression analyses were carried out including socio-demographic characteristics of participants (gender, age, and years of education), with REM-71 defenses as predictors. Results SUD patients presented a more maladaptive/assimilation (Factor 1) defensive array (p < .001). Among SUD sub-groups, Alcohol Use Disorder patients showed more dysfunctional defenses. Factor 1 defenses were related to worse psychological functioning. In addition, alexithymia (particularly DIF, difficulty identifying feelings) was strongly related to Factor 1 defenses, especially Projection (38% of variance explained, β = .270, p < .001). Conclusion The REM-71 and the TAS-20 might be useful screening instruments among SUD patients.

