The Classification of Short Scientific Texts Using Pretrained BERT Model
Automated text classification is a natural language processing (NLP) technology that could significantly facilitate the selection of scientific literature. A topic-specific dataset of 630 article abstracts was obtained from the PubMed database. We proposed 27 parameterized variants of the PubMedBERT model and 4 ensemble models to solve a binary classification task on this dataset. Each classification approach was evaluated in 300 tests with resampling. The best PubMedBERT model achieved an F1-score of 0.857, while the best ensemble model reached an F1-score of 0.853. We concluded that the quality of short scientific text classification might be improved using the latest state-of-the-art approaches.
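The abstract compares single models and ensembles by F1-score over repeated resampled tests but does not say how the ensembles combine their members or how the resamples were drawn. The sketch below is one way to do such an evaluation, under stated assumptions. It assumes hard majority voting for the ensemble and bootstrap resampling of a fixed test set. The function names (`f1_score`, `majority_vote`, `bootstrap_f1`) and the toy labels are hypothetical illustrations, not the study's code or data.

```python
import random
from collections import Counter
from statistics import mean
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard-voting ensemble (an assumption): each sample takes the label most models predicted.
    With an odd number of binary classifiers there are no ties."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def bootstrap_f1(y_true: Sequence[int], y_pred: Sequence[int],
                 n_resamples: int = 300, seed: int = 0) -> list[float]:
    """Assumed resampling scheme: draw test samples with replacement and
    recompute F1 on each resample."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return scores


# Toy labels and predictions from three hypothetical classifiers (not study data).
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1]
model_b = [1, 1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print("ensemble predictions:", ensemble)
for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(f1_score(y_true, pred), 3))
print("mean bootstrap F1 (model b):", round(mean(bootstrap_f1(y_true, model_b)), 3))
```

In this toy case the vote happens to correct each member's individual mistakes. The study itself found its best ensemble (0.853) slightly below its best single PubMedBERT model (0.857), so voting is not guaranteed to help.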