Selecting Informative Contexts Improves Language Model Fine-tuning

Author(s):  
Richard Antonello ◽  
Nicole Beckage ◽  
Javier Turek ◽  
Alexander Huth
Author(s):  
Minghui Wu ◽  
Canghong Jin ◽  
Wenkang Hu ◽  
Yabo Chen

Understanding mathematical topics is important for both educators and students in order to capture the latent concepts of questions, evaluate study performance, and recommend content in online learning systems. Compared to traditional text classification, mathematical topic classification poses several main challenges: (1) mathematical questions are relatively short; (2) the same mathematical concept can be represented in several ways (e.g., as a calculation or an application problem); and (3) the content of questions is complex, spanning algebra, geometry, and calculus. To overcome these problems, we propose a framework that combines content tokens and mathematical knowledge concepts throughout the whole procedure. We embed entities from mathematical knowledge graphs, integrate entities into tokens in a masked language model, set up semantic-similarity-based tasks for next-sentence prediction, and fuse knowledge vectors and token vectors during the fine-tuning procedure. We also build a Chinese mathematical topic prediction dataset consisting of more than 70,000 mathematical questions annotated with topics. Our experiments on real data demonstrate that our knowledge-graph-based mathematical topic prediction model outperforms other state-of-the-art methods.
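The fusion step described in this abstract can be pictured with a minimal PyTorch sketch. This is not the authors' implementation: the KnowledgeFusionClassifier class, the dimensions, and the use of the [CLS] vector are illustrative assumptions; the entity vector here is a random stand-in for a knowledge-graph entity embedding.

```python
# Hypothetical sketch: fusing BERT token vectors with knowledge-graph entity
# vectors for topic classification. Names and dimensions are illustrative only.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class KnowledgeFusionClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-chinese", kg_dim=128, num_topics=50):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Fuse the pooled token representation with an entity vector by concatenation.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + kg_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_topics),
        )

    def forward(self, input_ids, attention_mask, entity_vec):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]           # [CLS] token vector
        fused = torch.cat([pooled, entity_vec], dim=-1)
        return self.classifier(fused)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = KnowledgeFusionClassifier()
batch = tokenizer(["求函数 f(x)=x^2 的导数"], return_tensors="pt", padding=True)
entity_vec = torch.randn(1, 128)                       # stand-in for a learned entity embedding
logits = model(batch["input_ids"], batch["attention_mask"], entity_vec)
```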


2021 ◽  
pp. 1-55
Author(s):  
Daniel Loureiro ◽  
Kiamehr Rezaee ◽  
Mohammad Taher Pilehvar ◽  
Jose Camacho-Collados

Abstract Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, little is still known about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter is more robust to sense bias and can better exploit limited training data. In fact, the simple feature extraction strategy of averaging contextualized embeddings proves robust even when using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.
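The feature-extraction strategy mentioned in the abstract can be sketched in a few lines: average the contextualized embeddings of a handful of sense-tagged examples and disambiguate new occurrences by nearest centroid. The toy sentences, sense labels, and helper function below are assumptions for illustration, not the authors' code or data.

```python
# Illustrative sketch of feature-extraction WSD: average contextualized
# embeddings per sense, then match new occurrences by cosine similarity.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

def word_embedding(sentence, word):
    """Return the contextualized embedding of `word` in `sentence` (first match)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        states = encoder(**enc).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return states[tokens.index(tokenizer.tokenize(word)[0])]

# Build a sense inventory from only a few tagged sentences per sense.
sense_examples = {
    "bank_river": ["We sat on the bank of the river.",
                   "The boat drifted toward the muddy bank."],
    "bank_finance": ["She deposited the check at the bank.",
                     "The bank approved the loan yesterday."],
}
sense_vectors = {s: torch.stack([word_embedding(x, "bank") for x in xs]).mean(0)
                 for s, xs in sense_examples.items()}

# Disambiguate a new occurrence by the nearest sense centroid.
query = word_embedding("He opened a savings account at the bank.", "bank")
best = max(sense_vectors,
           key=lambda s: float(torch.cosine_similarity(query, sense_vectors[s], dim=0)))
print(best)  # expected: bank_finance
```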


2020 ◽  
Vol 10 (4) ◽  
pp. 1340
Author(s):  
Heewoong Park ◽  
Jonghun Park

The task of sentence completion, which aims to infer the missing text of a given sentence, has been used to assess the reading comprehension of machines as well as humans. In this work, we conducted a comprehensive study of various approaches to sentence completion based on neural language models, which have advanced considerably in recent years. First, we revisited the recurrent neural network language model (RNN LM), achieving highly competitive results with an appropriate network structure and hyper-parameters. This paper presents a bidirectional version of the RNN LM, which surpassed the previous best results on the Microsoft Research (MSR) Sentence Completion Challenge and the Scholastic Aptitude Test (SAT) sentence completion questions. In parallel with directly applying the RNN LM to sentence completion, we also employed a supervised learning framework that fine-tunes a large pre-trained transformer-based LM with a few sentence-completion examples. By fine-tuning a pre-trained BERT model, this work established state-of-the-art results on the MSR and SAT sets. Furthermore, we performed similar experiments on newly collected cloze-style questions in the Korean language. The experimental results reveal that simply applying the multilingual BERT models to the Korean dataset was not satisfactory, which leaves room for further research.
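One simple way to apply a pre-trained LM directly to multiple-choice sentence completion, in the spirit of the abstract, is to score each completed sentence by its pseudo-log-likelihood under a masked language model and pick the highest-scoring candidate. The stem and candidate words below are made up, and this scoring scheme is a generic sketch rather than the paper's exact method.

```python
# Minimal sketch: rank sentence-completion candidates by the masked-LM
# pseudo-log-likelihood of each completed sentence.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence):
    """Sum of log-probabilities of each token when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):                 # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = mlm(input_ids=masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

stem = "The scientist's findings were {} by the new experiment."
candidates = ["confirmed", "invented", "eaten", "painted", "frozen"]
scores = {c: pseudo_log_likelihood(stem.format(c)) for c in candidates}
print(max(scores, key=scores.get))                   # highest-scoring candidate
```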


2020 ◽  
Vol 34 (05) ◽  
pp. 9628-9635
Author(s):  
Zhuosheng Zhang ◽  
Yuwei Wu ◽  
Hai Zhao ◽  
Zuchao Li ◽  
Shuailiang Zhang ◽  
...  

The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT, and BERT, only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which could provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor, requiring only light fine-tuning and no substantial task-specific modifications. Compared with BERT, SemBERT is just as simple in concept but more powerful. It obtains new state-of-the-art results, or substantially improves on existing results, on ten reading comprehension and language inference tasks.
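The core SemBERT idea of absorbing contextual semantics over a BERT backbone can be pictured as concatenating an embedding of each token's semantic-role label with its BERT representation. The sketch below is an assumption-laden simplification: the SemanticsAwareEncoder class, dimensions, and the zero-valued role ids standing in for a real SRL tagger's output are all illustrative.

```python
# Illustrative-only sketch: concatenate BERT token representations with
# embeddings of semantic-role labels from an external SRL tagger.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SemanticsAwareEncoder(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_roles=20, role_dim=32, num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.role_emb = nn.Embedding(num_roles, role_dim)      # one embedding per SRL tag
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + role_dim, num_labels)

    def forward(self, input_ids, attention_mask, role_ids):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        fused = torch.cat([tokens, self.role_emb(role_ids)], dim=-1)
        return self.classifier(fused[:, 0])                    # classify from the fused [CLS] slot

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SemanticsAwareEncoder()
batch = tokenizer(["The cat chased the mouse."], return_tensors="pt")
role_ids = torch.zeros_like(batch["input_ids"])                # stand-in for real SRL tag ids
logits = model(batch["input_ids"], batch["attention_mask"], role_ids)
```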


2021 ◽  
Author(s):  
Federico Siano ◽  
Peter Wysocki

We introduce and apply machine transfer learning methods to analyze accounting disclosures. We use the examples of the new BERT language model and sentiment analysis of quarterly earnings disclosures to demonstrate the key transfer-learning concepts of (i) pre-training on generic "Big Data", (ii) fine-tuning on small accounting datasets, and (iii) using a language model that captures context rather than stand-alone words. Overall, we show that this new approach is easy to implement, uses widely available and low-cost computing resources, and has superior performance relative to existing textual analysis tools in accounting. We conclude with suggestions for opportunities to apply transfer learning to address important accounting research questions.
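The pre-train/fine-tune recipe the abstract outlines can be illustrated with a short, hedged sketch: start from a generically pre-trained BERT and fine-tune a sentiment head on a small labeled set of disclosure sentences. The example texts, labels, and hyperparameters below are invented for illustration and are not the authors' data or code.

```python
# Sketch of transfer learning for disclosure sentiment: fine-tune a
# pre-trained BERT classifier on a tiny labeled dataset.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["Revenue grew 12% year over year, exceeding guidance.",
         "The company recorded an impairment charge and lowered its outlook."]
labels = torch.tensor([1, 0])                         # 1 = positive tone, 0 = negative tone
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                                    # a few passes over the tiny dataset
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```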


Author(s):  
Keno K Bressem ◽  
Lisa C Adams ◽  
Robert A Gaudin ◽  
Daniel Tröltzsch ◽  
Bernd Hamm ◽  
...  

Abstract Motivation: The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to improved performance on several Natural Language Processing (NLP) benchmarks. Especially in radiology, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for the generation of labels in machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed to enable accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then only require fine-tuning on a small amount of manually labelled data to achieve even better results. Results: Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation, and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports. Availability and implementation: We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology. Supplementary information: Supplementary data are available at Bioinformatics online.
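The released code linked above is the authoritative reference; as a generic illustration only, the following sketch shows what multi-label fine-tuning on report findings with a subsequent AUC evaluation can look like. The model name, the toy German report snippets, and the labels are assumptions, not the study's data.

```python
# Generic sketch: multi-label fine-tuning of BERT on radiograph report
# findings (congestion, effusion, consolidation, pneumothorax).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

FINDINGS = ["congestion", "effusion", "consolidation", "pneumothorax"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased",
    num_labels=len(FINDINGS),
    problem_type="multi_label_classification",       # BCE loss over the four findings
)

reports = ["Zunehmende Ergussbildung beidseits, kein Pneumothorax.",
           "Kein Anhalt für Stauung oder Infiltrat."]
labels = torch.tensor([[0., 1., 0., 0.],             # effusion present
                       [0., 0., 0., 0.]])            # no findings
batch = tokenizer(reports, return_tensors="pt", padding=True, truncation=True)

out = model(**batch, labels=labels)                  # one fine-tuning step (optimizer omitted)
out.loss.backward()

with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)     # per-finding probabilities
# On a real validation split, per-finding AUC would be computed with
# sklearn.metrics.roc_auc_score(y_true[:, i], probs[:, i]) for each finding.
```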


2021 ◽  
Vol 13 (4) ◽  
pp. 2276
Author(s):  
Taejin Kim ◽  
Yeoil Yun ◽  
Namgyu Kim

Many attempts have been made to construct new domain-specific knowledge graphs using the existing knowledge bases of various domains. However, traditional "dictionary-based" or "supervised" knowledge graph building methods rely on predefined human-annotated resources of entities and their relationships. The cost of creating human-annotated resources is high in terms of both time and effort. This means that relying on human-annotated resources does not allow rapid adaptation to new knowledge when domain-specific information is added or updated very frequently, as during the recent coronavirus disease 2019 (COVID-19) pandemic. Therefore, in this study, we propose an Open Information Extraction (OpenIE) system based on unsupervised learning without a pre-built dataset. The proposed method obtains knowledge from a vast amount of text documents about COVID-19 rather than from a general knowledge base, and adds it to the existing knowledge graph. First, we constructed a COVID-19 entity dictionary, and then we scraped a large text dataset related to COVID-19. Next, we constructed a COVID-19-perspective language model by fine-tuning the Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model. Finally, we defined a new COVID-19-specific knowledge base by extracting connecting words between COVID-19 entities using BERT self-attention weights from COVID-19 sentences. Experimental results demonstrated that the proposed Co-BERT model outperforms the original BERT in terms of mask prediction accuracy and the metric for evaluation of translation with explicit ordering (METEOR) score.
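The final step, reading connecting words off the self-attention weights, can be pictured roughly as follows: given two known entity positions, rank the tokens between them by the attention they receive from both entities. This is a loose, assumption-heavy sketch (generic BERT, a toy sentence, layer- and head-averaged attention), not the Co-BERT procedure itself.

```python
# Rough sketch: rank candidate connecting words between two entity mentions
# by the self-attention they receive from both entity positions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True).eval()

sentence = "the virus often causes fever in elderly patients"
enc = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])

with torch.no_grad():
    attentions = model(**enc).attentions             # tuple: one tensor per layer
# Average attention over all layers and heads: shape (seq_len, seq_len).
attn = torch.stack(attentions).mean(dim=(0, 2))[0]

head_idx = tokens.index("virus")                      # first entity position
tail_idx = tokens.index("fever")                      # second entity position
# Score each token between the entities by the attention it receives from both.
scores = {tokens[i]: float(attn[head_idx, i] + attn[tail_idx, i])
          for i in range(head_idx + 1, tail_idx)}
print(max(scores, key=scores.get))                    # a candidate connecting word
```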


2021 ◽  
Vol 11 (13) ◽  
pp. 6007
Author(s):  
Muzamil Hussain Syed ◽  
Sun-Tae Chung

Entity-based information extraction is one of the main applications of Natural Language Processing (NLP). Recently, deep transfer learning utilizing contextualized word embeddings from pre-trained language models has shown remarkable results on many NLP tasks, including named-entity recognition (NER). BERT (Bidirectional Encoder Representations from Transformers) has gained prominent attention among contextualized word embedding models as a state-of-the-art pre-trained language model. However, training a BERT model from scratch for a new application domain is quite expensive, since it requires a huge dataset and enormous computing time. In this paper, we focus on menu entity extraction from online restaurant reviews and propose a simple but effective approach to NER in a new domain where a large dataset is rarely available or difficult to prepare, such as the food menu domain. The approach is based on domain adaptation of the word embeddings and fine-tuning of the popular 'Bi-LSTM+CRF' NER network with extended feature vectors. The proposed NER approach (named 'MenuNER') consists of a two-step process: (1) domain adaptation to the target domain, i.e., further pre-training of the off-the-shelf BERT language model (BERT-base) in a semi-supervised fashion on a domain-specific dataset; and (2) supervised fine-tuning of the Bi-LSTM+CRF network for the downstream task with extended feature vectors, obtained by concatenating the word embeddings from the domain-adapted pre-trained BERT model of the first step with character embeddings and POS-tag features. Experimental results on a hand-crafted food menu corpus built from a customer review dataset show that our proposed approach to domain-specific NER, i.e., food menu named-entity recognition, performs significantly better than one based on the baseline off-the-shelf BERT-base model. The proposed approach achieves a 92.5% F1 score on the YELP dataset for the MenuNER task.
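The extended feature vector from step (2) can be sketched as a concatenation of BERT word embeddings with character-level and POS-tag embeddings feeding a BiLSTM tagger. The sketch below is a simplification under stated assumptions: a generic BERT stands in for the domain-adapted model, the character features are random placeholders rather than a char-CNN/LSTM, and the CRF decoding layer used in the paper is omitted.

```python
# Simplified sketch of the extended feature vector: BERT word embeddings
# concatenated with character and POS-tag embeddings, feeding a BiLSTM tagger.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ExtendedFeatureTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-cased", char_dim=25, pos_dim=16,
                 num_pos=20, num_tags=3, lstm_hidden=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # domain-adapted BERT in the paper
        self.char_proj = nn.Linear(char_dim, char_dim)            # stand-in for a char encoder
        self.pos_emb = nn.Embedding(num_pos, pos_dim)
        feat_dim = self.encoder.config.hidden_size + char_dim + pos_dim
        self.bilstm = nn.LSTM(feat_dim, lstm_hidden, batch_first=True, bidirectional=True)
        self.tag_head = nn.Linear(2 * lstm_hidden, num_tags)      # B/I/O menu-entity tags

    def forward(self, input_ids, attention_mask, char_feats, pos_ids):
        words = self.encoder(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
        feats = torch.cat([words, self.char_proj(char_feats), self.pos_emb(pos_ids)], dim=-1)
        hidden, _ = self.bilstm(feats)
        return self.tag_head(hidden)                               # per-token tag logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = ExtendedFeatureTagger()
batch = tokenizer(["The truffle fries were fantastic."], return_tensors="pt")
seq_len = batch["input_ids"].shape[1]
char_feats = torch.randn(1, seq_len, 25)          # placeholder character-level features
pos_ids = torch.zeros(1, seq_len, dtype=torch.long)
logits = model(batch["input_ids"], batch["attention_mask"], char_feats, pos_ids)
```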

