human evaluation Latest Research Papers

Achieving 90% In Data-Centric Industry Deep Learning Task

10.36227/techrxiv.17128475.v2 ◽

2022 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Object Detection ◽

Noisy Data ◽

Learning Task ◽

Simple Method ◽

Sequence Generation ◽

Learning Tasks ◽

Human Evaluation ◽

Model Predictions ◽

Click Through Rate

In industry deep learning applications, our manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method to find the noisy data and have humans re-label it, using the model predictions as references during labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.

Guidance to Pre-tokeniztion for SacreBLEU: Meta-Evaluation in Korean

10.20944/preprints202201.0018.v1 ◽

2022 ◽

Author(s):

Ahrii Kim ◽

Jinhyun Kim

Keyword(s):

Empirical Study ◽

Automatic Evaluation ◽

Human Judgment ◽

Evaluation Data ◽

Human Evaluation ◽

Mt Evaluation ◽

Evaluation Metric ◽

Agglutinative Languages

SacreBLEU, by incorporating a text normalizing step in the pipeline, has been well-received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot provide a meaningful result without the help of customized pre-tokenization. In this regard, this paper examines the influence of diversified pre-tokenization schemes (word, morpheme, character, and subword) on the aforementioned metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates consistently by token type. The reliability of the metric even deteriorates under some tokenization schemes, and MeCab is no exception. Offering guidance on the proper usage of a tokenizer for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
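
To make the effect of the token type concrete, here is a minimal sketch in Python. It is not SacreBLEU and not the paper's evaluation setup: it computes only clipped n-gram precision (one ingredient of BLEU) on a made-up Korean sentence pair, and the helper names `word_tokens`, `char_tokens`, and `ngram_precision` are hypothetical.

```python
from collections import Counter


def word_tokens(text):
    """Split on whitespace (word-level tokens)."""
    return text.split()


def char_tokens(text):
    """One token per non-space character (character-level tokens)."""
    return [ch for ch in text if not ch.isspace()]


def ngram_precision(hyp, ref, n=1):
    """Clipped n-gram precision of hyp against a single reference."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


# Made-up example pair that differs only in one particle (assumption, not paper data).
reference = "나는 학교에 갔다"
hypothesis = "나는 학교로 갔다"

word_p = ngram_precision(word_tokens(hypothesis), word_tokens(reference))
char_p = ngram_precision(char_tokens(hypothesis), char_tokens(reference))
print(f"word-level precision: {word_p:.3f}")
print(f"char-level precision: {char_p:.3f}")
```

At the word level a single differing particle discards the whole word, while at the character level most of it still matches, which is one reason the score of an overlap-based metric depends on the tokenization chosen.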

A Study on Text Rank Algorithm for Automatic Text Summarization and Question Generation

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-2302 ◽

2021 ◽

pp. 24-28

Author(s):

G. Deena

Keyword(s):

Design System ◽

Question Generation ◽

Automated Evaluation ◽

Automatic Text Summarization ◽

Human Evaluation ◽

The People ◽

Rule Based Approach ◽

Automatic Question Generation ◽

Automatic Text ◽

Better Than

This paper proposes a new rule-based approach to automated question generation. The proposed approach focuses on the analysis of both sentence syntax and semantic structure. The design and implementation of the proposed approach are also described in detail. Although the primary purpose of the designed system is to generate questions from sentences, automated evaluation results show that it also performs well on reading comprehension datasets that focus on generating questions from paragraphs. With regard to human evaluation, the designed system performs better than all other systems and generates the most natural (human-like) questions. We present a fresh approach to automatic question generation that significantly increases the percentage of acceptable questions compared to prior state-of-the-art systems. In our system, we take data from various sources on a particular topic and summarize it for the convenience of readers, so that they do not have to go through multiple sites for relevant data.
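
The title of this entry names the TextRank algorithm, which the abstract does not spell out. As a rough illustration only, and not the author's implementation, the sketch below ranks sentences with a word-overlap similarity and a PageRank-style iteration; the function names, the damping factor of 0.85, and the sample sentences are all assumptions.

```python
import math


def similarity(a, b):
    """Word overlap between two sentences, normalized by log sentence lengths."""
    words_a, words_b = a.lower().split(), b.lower().split()
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:  # both sentences have a single word
        return 0.0
    return len(set(words_a) & set(words_b)) / denom


def text_rank(sentences, damping=0.85, iterations=50):
    """Score sentences by iterating a weighted PageRank update on the similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


# Illustrative input: two related sentences and one unrelated sentence.
sentences = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
scores = text_rank(sentences)
for sentence, score in zip(sentences, scores):
    print(f"{score:.3f}  {sentence}")
```

For an extractive summary, the highest-scoring sentences would be kept; the unrelated sentence shares no words with the others and so receives only the baseline score.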

Providing care beyond the therapy session — a natural language processing–based recommender system that identifies cancer patients who experience psychosocial challenges and provides self-care support (Preprint)

10.2196/preprints.35893 ◽

2021 ◽

Author(s):

Yvonne W Leung ◽

Bomi Park ◽

Rachel Heo ◽

Achini Adikari ◽

Suja Chackochan ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Supportive Care ◽

Support Groups ◽

Language Processing ◽

Recommender System ◽

Care Delivery ◽

Online Support Groups ◽

Psychosocial Needs ◽

Human Evaluation

BACKGROUND: The negative psychosocial impacts of cancer diagnoses and treatments are well documented. Virtual care has become an essential mode of care delivery during the COVID-19 pandemic and online support groups (OSGs) are shown to improve accessibility to psychosocial and supportive care. The de Souza Institute offers CancerChatCanada, a therapist-led OSG service where sessions are monitored by an artificial intelligence-based co-facilitator (AICF). AICF is equipped with a recommender system that uses natural language processing to tailor online resources to patients according to their psychosocial needs.

OBJECTIVE: To outline the development protocol and to evaluate AICF on its precision and recall in recommending resources to cancer OSG members.

METHODS: Human input informed the design and evaluation on its ability to 1) appropriately identify key words indicating a psychosocial concern and 2) recommend the most appropriate online resource to the OSG member expressing each concern. Three rounds of human evaluation and algorithm improvement were performed iteratively.

RESULTS: We evaluated 7,190 outputs and achieved .797 precision, .981 recall, and an F1 score of .880 by the third round of evaluation. Resources were recommended to 48 patients and 25 (52.1%) accessed at least one resource. Of those who accessed the resources, 75.4% found them useful.

CONCLUSIONS: The preliminary findings suggest that AICF can help provide tailored support for cancer OSG members with high precision, recall, and satisfaction. AICF has undergone rigorous human evaluation and the results provide much-needed evidence, while outlining potential strengths and weaknesses for future applications in supportive care.
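
The reported figures can be checked against the standard definition of F1 as the harmonic mean of precision and recall. The sketch below uses only the numbers quoted in the abstract; the function name `f1_score` is illustrative. Recomputing from the rounded precision and recall gives roughly 0.879, which agrees with the reported .880 to within the rounding of the inputs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the third evaluation round.
precision, recall = 0.797, 0.981
f1 = f1_score(precision, recall)

# Share of the 48 patients who accessed at least one recommended resource.
access_rate = 25 / 48

print(f"F1 from rounded inputs: {f1:.4f}")
print(f"access rate: {access_rate:.1%}")
```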

Achieving 90% In Data-Centric Industry Deep Learning Task

10.36227/techrxiv.17128475 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Object Detection ◽

Noisy Data ◽

Learning Task ◽

Simple Method ◽

Sequence Generation ◽

Learning Tasks ◽

Human Evaluation ◽

Model Predictions ◽

Click Through Rate

In industry deep learning applications, our manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method to find the noisy data and have humans re-label it, using the model predictions as references during labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.

Achieving 90% In Data-Centric Industry Deep Learning Task

10.36227/techrxiv.17128475.v1 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Object Detection ◽

Noisy Data ◽

Learning Task ◽

Simple Method ◽

Sequence Generation ◽

Learning Tasks ◽

Human Evaluation ◽

Model Predictions ◽

Click Through Rate

In industry deep learning applications, our manually labeled data contains a certain amount of noisy data. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method to find the noisy data and have humans re-label it, using the model predictions as references during labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.

Human Evaluation Experiment of Legal Information Retrieval Methods

10.3233/faia210328 ◽

2021 ◽

Author(s):

Tereza Novotná

Keyword(s):

Information Retrieval ◽

Network Analysis ◽

Supreme Court ◽

Citation Network ◽

Court Decisions ◽

Topic Modelling ◽

Legal Information ◽

Human Evaluation ◽

Citation Network Analysis ◽

Information Retrieval Methods

In this article, I present the results of a human evaluation experiment on three commonly used methods in legal information retrieval and a new “multilayered” approach. I use the doc2vec model, citation network analysis and two topic modelling algorithms to retrieve Czech Supreme Court decisions and evaluate their performance. To improve the accuracy of the results of these methods, I combine the methods in a “multilayered” way and perform a subsequent evaluation. Both evaluation experiments are conducted with a group of legal experts to assess the applicability and usability of the methods for legal information retrieval. The combination of doc2vec and citations is found to be satisfactorily accurate for practical use in retrieving Czech court decisions.

Self-Refine Learning For Data-Centric Text Classification

10.36227/techrxiv.16610629.v3 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Model Prediction ◽

Classification Accuracy ◽

Noisy Data ◽

Simple Method ◽

Human Evaluation ◽

Evaluation Accuracy

In industry NLP applications, our manually labeled data contains a certain amount of noisy data. We present a simple method to find the noisy data and re-label it with the model's prediction. We treat as noisy the samples whose human label is not contained in the model's top-K predictions. The model is trained on the original dataset. The experimental results show that our method works. For an industry deep learning application, our method improves the text classification accuracy from 80.5% to 90.6% on the dev dataset, and improves the human-evaluation accuracy from 83.2% to 90.5%.
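
The selection rule described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the names `top_k_labels` and `refine_labels` are hypothetical, the class scores are toy values, and re-labeling to the top-1 prediction is one reading of "the result of model prediction".

```python
from typing import List, Sequence, Tuple


def top_k_labels(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring classes, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def refine_labels(
    human_labels: Sequence[int],
    class_scores: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[int], List[int]]:
    """Return (refined labels, indices of samples flagged as noisy)."""
    refined, flagged = [], []
    for idx, (label, scores) in enumerate(zip(human_labels, class_scores)):
        top_k = top_k_labels(scores, k)
        if label in top_k:
            refined.append(label)      # human label is plausible: keep it
        else:
            refined.append(top_k[0])   # assumption: re-label to the top-1 prediction
            flagged.append(idx)
    return refined, flagged


# Toy data (assumption): three samples, three classes.
human_labels = [0, 2, 1]
class_scores = [
    [0.7, 0.2, 0.1],  # label 0 is the top prediction: kept
    [0.6, 0.3, 0.1],  # label 2 is outside the top 2: flagged and re-labeled
    [0.5, 0.4, 0.1],  # label 1 is within the top 2: kept
]
refined, flagged = refine_labels(human_labels, class_scores, k=2)
print(refined, flagged)
```

A larger K flags fewer samples, so K controls how aggressively human labels are overridden.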

First in‐human evaluation of a novel intravascular ultrasound and optical coherence tomography system for intracoronary imaging

Catheterization and Cardiovascular Interventions ◽

10.1002/ccd.30001 ◽

2021 ◽

Author(s):

Elie Akl ◽

Natalia Pinilla‐Echeverri ◽

Hector M. Garcia‐Garcia ◽

Shamir R. Mehta ◽

Kazuhiro Dan ◽

...

Keyword(s):

Optical Coherence Tomography ◽

Intravascular Ultrasound ◽

Optical Coherence ◽

Intracoronary Imaging ◽

Optical Coherence Tomography System ◽

Tomography System ◽

Human Evaluation

Digital Image Evolution of Artwork Without Human Evaluation Using the Example of the Evolving Mona Lisa Problem

Vietnam Journal of Computer Science ◽

10.1142/s2196888822500075 ◽

2021 ◽

pp. 1-13

Author(s):

Julia Garbaruk ◽

Doina Logofatu ◽

Costin Badica ◽

Florin Leon

Keyword(s):

Evolutionary Algorithms ◽

Hill Climbing ◽

Initial Population ◽

Natural Evolution ◽

Hill Climbing Algorithm ◽

Human Evaluation ◽

Canny Edge ◽

Iterated Process ◽

Mona Lisa ◽

Better Than

Whether for optimizing the speed of microprocessors or for sequence analysis in molecular biology, evolutionary algorithms are used in astoundingly many fields. Art has also been influenced by evolutionary algorithms: with the principles of natural evolution, works of art can be created or imitated, whereby initially generated art is put through an iterated process of selection and modification. This paper covers an application in which given images are emulated evolutionarily using a finite number of semi-transparent overlapping polygons, a problem that became known as the “Evolution of Mona Lisa”. In this context, different approaches to solving the problem are tested and presented here. In particular, we investigate whether the Hill Climbing Algorithm, combined with Delaunay Triangulation and a Canny Edge Detector that extracts the initial population directly from the original image, performs better than conventional Hill Climbing and the Genetic Algorithm, where the initial population is generated randomly.
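
The conventional hill climbing baseline mentioned here follows a simple loop: mutate the current candidate, keep the mutation only if it is closer to the target, and repeat. The sketch below is a deliberately simplified illustration, not the paper's method: it mutates the pixels of a tiny one-dimensional grayscale "image" directly instead of overlapping polygons, it starts from a random candidate, and it omits Delaunay Triangulation and Canny edge detection. All names and parameter values are assumptions.

```python
import random


def fitness(candidate, target):
    """Sum of squared pixel differences; lower is better."""
    return sum((c - t) ** 2 for c, t in zip(candidate, target))


def mutate(candidate, rng, step=16):
    """Copy the candidate and nudge one random pixel, clamped to 0..255."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] = max(0, min(255, child[i] + rng.randint(-step, step)))
    return child


def hill_climb(target, iterations=5000, seed=0):
    """Return (best candidate, initial error, final error)."""
    rng = random.Random(seed)
    current = [rng.randrange(256) for _ in target]  # random initial candidate
    current_err = fitness(current, target)
    initial_err = current_err
    for _ in range(iterations):
        child = mutate(current, rng)
        child_err = fitness(child, target)
        if child_err < current_err:  # accept strict improvements only
            current, current_err = child, child_err
    return current, initial_err, current_err


# Toy target "image" (assumption): eight grayscale pixels.
target = [0, 64, 128, 192, 255, 192, 128, 64]
best, start_err, final_err = hill_climb(target)
print(f"error before: {start_err}, after: {final_err}")
```

Because only improvements are accepted, the error never increases; the quality of the starting candidate therefore matters, which is the motivation for extracting the initial population from the image itself rather than generating it randomly.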
