human evaluation
Recently Published Documents


TOTAL DOCUMENTS: 251 (FIVE YEARS: 124)
H-INDEX: 19 (FIVE YEARS: 5)

2022
Author(s):
Tong Guo

In industrial deep learning applications, our manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method that finds the noisy data and has humans relabel it, using the model predictions as references during labeling. In this paper, we illustrate our idea on a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.
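The abstract does not say exactly how noisy examples are detected. The sketch below assumes that an example is "noisy" when the model's prediction disagrees with its human label, and that the prediction is shown to the annotator only as a reference; the final label is the human's decision. `build_review_queue`, `apply_decisions`, and the keyword "model" are hypothetical names for illustration, not the author's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    text: str
    label: str  # current (possibly noisy) human label


@dataclass
class ReviewItem:
    index: int        # position in the dataset
    text: str
    human_label: str
    model_label: str  # shown to the annotator as a reference only


def build_review_queue(examples: List[Example],
                       predict: Callable[[str], str]) -> List[ReviewItem]:
    """Flag examples whose human label disagrees with the model prediction."""
    queue = []
    for i, ex in enumerate(examples):
        pred = predict(ex.text)
        if pred != ex.label:
            queue.append(ReviewItem(i, ex.text, ex.label, pred))
    return queue


def apply_decisions(examples: List[Example],
                    decisions: Dict[int, str]) -> List[Example]:
    """Return a new dataset with the annotators' final labels applied by index."""
    return [Example(ex.text, decisions.get(i, ex.label))
            for i, ex in enumerate(examples)]


if __name__ == "__main__":
    data = [Example("great product", "pos"),
            Example("terrible service", "pos"),  # likely mislabeled
            Example("works fine", "pos")]
    # Hypothetical stand-in for a model trained on the labeled data.
    toy_model = lambda text: "neg" if "terrible" in text else "pos"
    queue = build_review_queue(data, toy_model)
    for item in queue:
        print(item.index, item.text, item.human_label, "->", item.model_label)
    # Here the annotator accepts every suggestion; in practice they may reject some.
    cleaned = apply_decisions(data, {item.index: item.model_label for item in queue})
    print([ex.label for ex in cleaned])
```

Keeping the human in the loop (rather than overwriting labels automatically) is the point of using predictions only as references.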


Author(s):
Ahrii Kim, Jinhyun Kim

SacreBLEU, which incorporates a text normalization step in its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot provide a meaningful result without the help of customized pre-tokenization. This paper therefore examines the influence of diverse pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment varies consistently with the token type. Some tokenizations even degrade the reliability of the metric, and MeCab is no exception. Providing guidance on the proper use of a tokenizer for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
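To make "token type" concrete, here is a stdlib-only sketch of three of the granularities discussed (word, character, and Jamo) and how a single n-gram statistic shifts with them. It does not use SacreBLEU; `unigram_precision` is BLEU's clipped 1-gram precision only, and morpheme (MeCab) and subword tokenization are omitted because they need external tools. The Jamo split uses the standard Unicode arithmetic for precomposed Hangul syllables.

```python
from collections import Counter
from typing import List

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllable block


def word_tokens(s: str) -> List[str]:
    return s.split()


def char_tokens(s: str) -> List[str]:
    # Character level: every non-space character is a token.
    return [c for c in s if not c.isspace()]


def jamo_tokens(s: str) -> List[str]:
    """Split each Hangul syllable into leading consonant, vowel, optional final consonant."""
    out = []
    for c in s:
        if c.isspace():
            continue
        code = ord(c)
        if HANGUL_FIRST <= code <= HANGUL_LAST:
            lead, rest = divmod(code - HANGUL_FIRST, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(chr(0x1100 + lead))
            out.append(chr(0x1161 + vowel))
            if tail:
                out.append(chr(0x11A7 + tail))
        else:
            out.append(c)  # non-Hangul characters pass through unchanged
    return out


def unigram_precision(hyp: List[str], ref: List[str]) -> float:
    """Clipped unigram precision: matched hypothesis tokens / hypothesis length."""
    if not hyp:
        return 0.0
    ref_counts = Counter(ref)
    matched = sum(min(n, ref_counts[t]) for t, n in Counter(hyp).items())
    return matched / len(hyp)


if __name__ == "__main__":
    hyp, ref = "나는 학교에 갔다", "나는 학교에 간다"  # toy sentence pair
    for name, tok in [("word", word_tokens), ("char", char_tokens), ("jamo", jamo_tokens)]:
        print(name, round(unigram_precision(tok(hyp), tok(ref)), 3))
```

A one-syllable difference costs a whole word at the word level but only one of 17 units at the Jamo level, which illustrates why the choice of token type changes how a metric behaves.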


Author(s):
G. Deena

This paper proposes a new rule-based approach to automated question generation. The proposed approach focuses on analyzing both sentence syntax and semantic structure, and its design and implementation are described in detail. Although the primary purpose of the designed system is to generate questions from sentences, automated evaluation results show that it also performs well on reading comprehension datasets that focus on generating questions from paragraphs. In human evaluation, the designed system performs better than all other systems and generates the most natural (human-like) questions. We present a fresh approach to automatic question generation that significantly increases the percentage of acceptable questions compared to prior state-of-the-art systems. Our system takes data on a particular topic from various sources and summarizes it for users' convenience, so that they do not have to search multiple sites for relevant data.
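As an illustration of what "rule-based" question generation means, the sketch below maps surface patterns of a declarative sentence to question templates. The two rules are hypothetical toy examples using regular expressions; the paper's own rules analyze syntax and semantic structure and are not reproduced here.

```python
import re
from typing import List

# Hypothetical toy rules: (pattern over a declarative sentence, question template).
RULES = [
    # "<Subject> is/are/was/were <complement>." -> "What <verb> <subject>?"
    (re.compile(r"^(?P<subj>[A-Z][\w\s]*?) (?P<verb>is|are|was|were) (?P<comp>.+?)\.$"),
     "What {verb} {subj}?"),
    # "<Subject> invented/discovered/wrote <object>." -> "Who <verb> <object>?"
    (re.compile(r"^(?P<subj>[A-Z][\w\s]*?) (?P<verb>invented|discovered|wrote) (?P<obj>.+?)\.$"),
     "Who {verb} {obj}?"),
]


def generate_questions(sentence: str) -> List[str]:
    """Apply every matching rule; str.format ignores unused named groups."""
    sentence = sentence.strip()
    questions = []
    for pattern, template in RULES:
        match = pattern.match(sentence)
        if match:
            questions.append(template.format(**match.groupdict()))
    return questions


if __name__ == "__main__":
    for s in ["Python is a programming language.", "Marie Curie discovered polonium."]:
        print(s, "->", generate_questions(s))
```

Pattern rules like these are transparent and easy to audit, but they only cover the constructions someone wrote a rule for, which is why richer syntactic and semantic analysis matters.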


2021
Author(s):
Yvonne W Leung, Bomi Park, Rachel Heo, Achini Adikari, Suja Chackochan, ...

BACKGROUND The negative psychosocial impacts of cancer diagnoses and treatments are well documented. Virtual care has become an essential mode of care delivery during the COVID-19 pandemic, and online support groups (OSGs) have been shown to improve access to psychosocial and supportive care. The de Souza Institute offers CancerChatCanada, a therapist-led OSG service in which sessions are monitored by an artificial intelligence-based co-facilitator (AICF). AICF is equipped with a recommender system that uses natural language processing to tailor online resources to patients according to their psychosocial needs.
OBJECTIVE To outline the development protocol of AICF and to evaluate its precision and recall in recommending resources to cancer OSG members.
METHODS Human input informed the design of AICF and the evaluation of its ability to (1) appropriately identify keywords indicating a psychosocial concern and (2) recommend the most appropriate online resource to the OSG member expressing each concern. Three rounds of human evaluation and algorithm improvement were performed iteratively.
RESULTS We evaluated 7,190 outputs and achieved a precision of .797, a recall of .981, and an F1 score of .880 by the third round of evaluation. Resources were recommended to 48 patients, and 25 (52.1%) accessed at least one resource. Of those who accessed the resources, 75.4% found them useful.
CONCLUSIONS The preliminary findings suggest that AICF can help provide tailored support for cancer OSG members with high precision, recall, and satisfaction. AICF has undergone rigorous human evaluation, and the results provide much-needed evidence while outlining potential strengths and weaknesses for future applications in supportive care.
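The reported figures can be cross-checked with simple arithmetic. F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded values given in the abstract, along with the access rate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Third-round values as reported (already rounded to three decimals).
    print(f"F1 from rounded P and R: {f1_score(0.797, 0.981):.4f}")
    # 25 of the 48 patients who received recommendations accessed a resource.
    print(f"access rate: {100 * 25 / 48:.1f}%")
```

This prints an F1 of 0.8795, which agrees with the reported .880 once the rounding of the inputs is taken into account, and an access rate of 52.1%.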


2021
Author(s):
Tong Guo

In industrial deep learning applications, our manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method that finds the noisy data and has humans relabel it, using the model predictions as references during labeling. In this paper, we illustrate our idea on a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.


2021
Author(s):
Tereza Novotná

In this article, I present the results of a human evaluation experiment on three commonly used methods in legal information retrieval and on a new “multilayered” approach. I use the doc2vec model, citation network analysis, and two topic modelling algorithms to retrieve Czech Supreme Court decisions and evaluate their performance. To improve the accuracy of these methods, I combine them in a “multilayered” way and perform a subsequent evaluation. Both evaluation experiments are conducted with a group of legal experts to assess the applicability and usability of the methods for legal information retrieval. The combination of doc2vec and citations is found to be sufficiently accurate for practical use in retrieving Czech court decisions.
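The abstract does not define how the "multilayered" combination works. One plausible reading, sketched here purely as an assumption, is a two-stage pipeline: a first layer shortlists decisions by document-embedding similarity (as doc2vec vectors would provide), and a second layer re-ranks that shortlist by the overlap of cited decisions. The vectors and citation sets are hypothetical toy inputs, and `layered_rank` is an illustrative name, not the author's implementation.

```python
import math
from typing import Dict, List, Set


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def layered_rank(query_id: str,
                 vectors: Dict[str, List[float]],
                 citations: Dict[str, Set[str]],
                 shortlist: int = 10,
                 top_n: int = 5) -> List[str]:
    q_vec = vectors[query_id]
    q_cites = citations.get(query_id, set())
    # Layer 1: shortlist candidates by document-embedding similarity.
    candidates = sorted((d for d in vectors if d != query_id),
                        key=lambda d: cosine(q_vec, vectors[d]),
                        reverse=True)[:shortlist]
    # Layer 2: re-rank the shortlist by overlap of cited decisions; ties fall back to cosine.
    reranked = sorted(candidates,
                      key=lambda d: (jaccard(q_cites, citations.get(d, set())),
                                     cosine(q_vec, vectors[d])),
                      reverse=True)
    return reranked[:top_n]


if __name__ == "__main__":
    vectors = {"q": [1.0, 0.0, 0.0], "d1": [0.8, 0.6, 0.0],
               "d2": [1.0, 0.1, 0.0], "d3": [0.0, 0.0, 1.0]}
    citations = {"q": {"A", "B"}, "d1": {"A", "B"}, "d2": {"C"}, "d3": {"A", "B"}}
    print(layered_rank("q", vectors, citations, shortlist=2))
```

With a shortlist of two, `d3` is filtered out by the first layer despite sharing all citations, and `d1` overtakes the more similar-looking `d2` because of its citation overlap. A weighted sum of the two scores would be an equally reasonable alternative design.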


2021
Author(s):
Tong Guo

In industrial NLP applications, our manually labeled data contains a certain amount of noisy data. We present a simple method to find the noisy data and relabel it with the model's predictions. We select as noisy the examples whose human label is not among the model's top-K predictions, where the model is trained on the original dataset. The experimental results show that our method works: in an industrial deep learning application, it improves text classification accuracy on the dev dataset from 80.5% to 90.6% and human-evaluation accuracy from 83.2% to 90.5%.
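The top-K selection rule described above can be sketched in a few lines. This assumes the model outputs a probability per class and that a flagged example is relabeled with the model's top-1 prediction; the abstract says only "the result of model prediction", so the top-1 choice, the function names, and the toy probabilities are assumptions.

```python
from typing import List, Sequence, Tuple


def top_k(probs: Sequence[float], k: int) -> List[int]:
    """Class indices sorted by predicted probability, highest first, truncated to k."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]


def find_and_relabel(labels: Sequence[int],
                     probs: Sequence[Sequence[float]],
                     k: int = 2) -> Tuple[List[int], List[int]]:
    """Flag examples whose human label is outside the model's top-k predictions
    and replace their label with the model's top-1 prediction."""
    noisy: List[int] = []
    new_labels = list(labels)
    for i, (label, p) in enumerate(zip(labels, probs)):
        ranked = top_k(p, k)
        if label not in ranked:
            noisy.append(i)
            new_labels[i] = ranked[0]
    return noisy, new_labels


if __name__ == "__main__":
    # Hypothetical class probabilities from a model trained on the original labels.
    probs = [[0.7, 0.2, 0.1],
             [0.6, 0.3, 0.1],
             [0.1, 0.5, 0.4]]
    labels = [0, 2, 1]
    noisy, fixed = find_and_relabel(labels, probs, k=2)
    print(noisy, fixed)  # [1] [0, 0, 1]
```

K controls how aggressive the filter is: with k = 1 every disagreement is flagged, while a larger k flags only labels the model considers very unlikely.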


Author(s):
Julia Garbaruk, Doina Logofatu, Costin Badica, Florin Leon

Whether for optimizing the speed of microprocessors or for sequence analysis in molecular biology, evolutionary algorithms are used in a remarkably wide range of fields. Art has also been influenced by evolutionary algorithms: using the principles of natural evolution, works of art can be created or imitated by putting initially generated art through an iterated process of selection and modification. This paper covers an application, which became known as “Evolution of Mona Lisa”, in which given images are emulated evolutionarily using a finite number of semi-transparent overlapping polygons. In this context, different approaches to the problem are tested and presented. In particular, we investigate whether a Hill Climbing algorithm combined with Delaunay triangulation and a Canny edge detector, which extracts the initial population directly from the original image, performs better than conventional Hill Climbing and a Genetic Algorithm, where the initial population is generated randomly.
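A minimal sketch of the conventional Hill Climbing baseline with a random initial solution may help make the setup concrete. To stay short and dependency-free it makes several simplifying assumptions: an 8x8 grayscale target, axis-aligned semi-transparent rectangles instead of polygons, sum of squared pixel differences as the fitness, and no Delaunay triangulation or Canny edge detection. All names are illustrative.

```python
import random
from typing import List, Tuple

W, H = 8, 8  # tiny grayscale canvas, pixel values in [0, 1]
# Each shape: [x0, y0, x1, y1, gray, alpha]; rectangles stand in for polygons.
LIMITS = [(0, W - 1), (0, H - 1), (0, W - 1), (0, H - 1), (0.0, 1.0), (0.0, 1.0)]

Image = List[List[float]]
Shape = List[float]


def render(shapes: List[Shape]) -> Image:
    """Alpha-blend semi-transparent rectangles, in order, onto a black canvas."""
    img = [[0.0] * W for _ in range(H)]
    for x0, y0, x1, y1, gray, alpha in shapes:
        xa, xb = sorted((int(x0), int(x1)))
        ya, yb = sorted((int(y0), int(y1)))
        for y in range(ya, yb + 1):
            for x in range(xa, xb + 1):
                img[y][x] = (1 - alpha) * img[y][x] + alpha * gray
    return img


def error(img: Image, target: Image) -> float:
    """Sum of squared pixel differences (lower is better)."""
    return sum((img[y][x] - target[y][x]) ** 2 for y in range(H) for x in range(W))


def random_shape(rng: random.Random) -> Shape:
    return [rng.uniform(0, W - 1), rng.uniform(0, H - 1),
            rng.uniform(0, W - 1), rng.uniform(0, H - 1),
            rng.random(), rng.uniform(0.2, 0.8)]


def mutate(shapes: List[Shape], rng: random.Random) -> List[Shape]:
    """Copy the shapes and nudge one parameter of one shape, clamped to its range."""
    child = [s[:] for s in shapes]
    shape = rng.choice(child)
    i = rng.randrange(len(LIMITS))
    lo, hi = LIMITS[i]
    shape[i] = min(hi, max(lo, shape[i] + rng.gauss(0, 0.2 * (hi - lo))))
    return child


def hill_climb(target: Image, n_shapes: int = 10, steps: int = 1000,
               seed: int = 0) -> Tuple[List[Shape], float]:
    """Accept a mutation only if it does not increase the error."""
    rng = random.Random(seed)
    best = [random_shape(rng) for _ in range(n_shapes)]  # random initial solution
    best_err = error(render(best), target)
    for _ in range(steps):
        cand = mutate(best, rng)
        cand_err = error(render(cand), target)
        if cand_err <= best_err:
            best, best_err = cand, cand_err
    return best, best_err


if __name__ == "__main__":
    # Target: white left half, black right half.
    target = [[1.0 if x < W // 2 else 0.0 for x in range(W)] for _ in range(H)]
    _, err = hill_climb(target)
    print(f"final squared error: {err:.3f}")
```

Because a candidate is kept only when it is no worse, the error never increases; the variant studied in the paper would replace `random_shape` with shapes derived from the source image.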

