Self-Refine Learning For Data-Centric Text Classification

10.36227/techrxiv.16610629 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Model Prediction ◽

Classification Accuracy ◽

Noisy Data ◽

Simple Method ◽

Human Evaluation ◽

Evaluation Accuracy

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them with the model's prediction. We treat an example as noisy when its human label is not among the model's top-K predictions. The model is trained on the original dataset. Experimental results show that the method works: in our industrial deep learning application, it improves text classification accuracy from 80.5% to 90.6% on the dev dataset and human-evaluation accuracy from 83.2% to 90.5%.
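The selection rule in this abstract can be illustrated with a minimal sketch. It assumes the trained model outputs one probability per class for each example, and that "the result of model prediction" means the top-1 class; the abstract does not confirm either detail. The function name `relabel_noisy` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np


def relabel_noisy(probs, human_labels, k):
    """Flag examples whose human label is outside the model's top-k classes.

    probs:        (n_examples, n_classes) array of predicted class probabilities
                  (assumed to come from a model trained on the original dataset).
    human_labels: (n_examples,) array of integer class ids assigned by annotators.
    k:            how many top predictions to compare against.

    Returns (noisy_mask, new_labels). Flagged examples receive the model's
    top-1 prediction (an assumption); all other labels are left unchanged.
    """
    probs = np.asarray(probs, dtype=float)
    human_labels = np.asarray(human_labels)
    if not 1 <= k <= probs.shape[1]:
        raise ValueError("k must be between 1 and the number of classes")

    # Indices of the k most probable classes for every example.
    top_k = np.argsort(-probs, axis=1)[:, :k]

    # An example is treated as noisy when its human label is not among them.
    noisy_mask = ~(top_k == human_labels[:, None]).any(axis=1)

    # Replace only the flagged labels with the top-1 prediction.
    new_labels = np.where(noisy_mask, probs.argmax(axis=1), human_labels)
    return noisy_mask, new_labels
```

A larger K flags fewer examples, because the human label only has to appear somewhere in a longer list of candidates, so K trades how many labels are overwritten against how much the model's judgment is trusted.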


Self-Refine Learning For Data-Centric Text Classification

10.36227/techrxiv.16610629.v1 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Model Prediction ◽

Classification Accuracy ◽

Noisy Data ◽

Simple Method ◽

Human Evaluation ◽

Evaluation Accuracy

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them with the model's prediction. We treat an example as noisy when its human label is not among the model's top-K predictions. The model is trained on the original dataset. Experimental results show that the method works: in our industrial deep learning application, it improves text classification accuracy from 80.5% to 90.6% on the dev dataset and human-evaluation accuracy from 83.2% to 90.5%.


Self-Refine Learning For Data-Centric Text Classification

10.36227/techrxiv.16610629.v3 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Model Prediction ◽

Classification Accuracy ◽

Noisy Data ◽

Simple Method ◽

Human Evaluation ◽

Evaluation Accuracy

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them with the model's prediction. We treat an example as noisy when its human label is not among the model's top-K predictions. The model is trained on the original dataset. Experimental results show that the method works: in our industrial deep learning application, it improves text classification accuracy from 80.5% to 90.6% on the dev dataset and human-evaluation accuracy from 83.2% to 90.5%.


Learning From How Human Correct

10.36227/techrxiv.13647974.v2 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Classification Accuracy ◽

Noisy Data ◽

Learning Model ◽

Simple Method ◽

Know How ◽

Novel Method ◽

Deep Learning Model

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information as we do so. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in it for our industrial application. Experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline comes from BERT trained on the corrected dataset, which is hard to surpass.


Learning From How Human Correct

10.36227/techrxiv.13647974.v1 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Classification Accuracy ◽

Noisy Data ◽

Learning Model ◽

Simple Method ◽

Know How ◽

Novel Method ◽

Deep Learning Model

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information as we do so. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in it for our industrial application. Experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline comes from BERT trained on the corrected dataset, which is hard to surpass.


Learning From How Human Correct

10.36227/techrxiv.13647974 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Classification Accuracy ◽

Noisy Data ◽

Learning Model ◽

Simple Method ◽

Know How ◽

Novel Method ◽

Deep Learning Model

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information as we do so. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in it for our industrial application. Experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline comes from BERT trained on the corrected dataset, which is hard to surpass.


Learning From Human Correction For Data-Centric Deep Learning

10.36227/techrxiv.13647974.v6 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Classification Accuracy ◽

Noisy Data ◽

Learning Model ◽

Simple Method ◽

Know How ◽

Novel Method ◽

Deep Learning Model

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information as we do so. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in it for our industrial application. Experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline comes from BERT trained on the corrected dataset, which is hard to surpass.


Learning From Human Correction For Data-Centric Deep Learning

10.36227/techrxiv.13647974.v5 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Classification Accuracy ◽

Noisy Data ◽

Learning Model ◽

Simple Method ◽

Know How ◽

Novel Method ◽

Deep Learning Model

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information as we do so. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in it for our industrial application. Experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline comes from BERT trained on the corrected dataset, which is hard to surpass.


Learning From How Human Correct

10.36227/techrxiv.13647974.v3 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Classification Accuracy ◽

Noisy Data ◽

Learning Model ◽

Simple Method ◽

Know How ◽

Novel Method ◽

Deep Learning Model

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information as we do so. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in it for our industrial application. Experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline comes from BERT trained on the corrected dataset, which is hard to surpass.


Learning From How Human Correct For Data-Centric Deep Learning

10.36227/techrxiv.13647974.v4 ◽

2021 ◽

Author(s):

Tong Guo

Keyword(s):

Deep Learning ◽

Text Classification ◽

Classification Accuracy ◽

Noisy Data ◽

Learning Model ◽

Simple Method ◽

Know How ◽

Novel Method ◽

Deep Learning Model

In industrial NLP applications, manually labeled data contains a certain amount of noise. We present a simple method to find the noisy examples and relabel them manually, collecting the correction information as we do so. We then present a novel method to incorporate this human correction information into a deep learning model. Humans know how to correct noisy data, so the correction information can be injected into the model. We run the experiment on our own manually labeled text classification dataset, because we relabel the noisy data in it for our industrial application. Experimental results show that our method improves classification accuracy from 91.7% to 92.5%. The 91.7% baseline comes from BERT trained on the corrected dataset, which is hard to surpass.
