Semantics-Aware BERT for Language Understanding

2020, Vol 34 (05), pp. 9628-9635
Author(s): Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, ...

The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in various machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT and BERT, exploit only plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor through light fine-tuning, without substantial task-specific modifications. Compared with BERT, semantics-aware BERT is conceptually just as simple but more powerful. It obtains new state-of-the-art results or substantially improves on prior results on ten reading comprehension and language inference tasks.
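As a rough illustration of the idea (not the authors' released implementation), the sketch below fuses token-level semantic-role-label embeddings with contextual token states from a BERT-style encoder; the tag vocabulary size, hidden sizes, and fusion-by-concatenation choice are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class SemanticFusion(nn.Module):
    """Toy sketch: concatenate BERT token states with SRL tag embeddings,
    then project back to the hidden size for downstream task heads."""
    def __init__(self, hidden_size=768, num_srl_tags=30, tag_dim=64):
        super().__init__()
        self.tag_embed = nn.Embedding(num_srl_tags, tag_dim)  # embeddings for predicted SRL tags
        self.proj = nn.Linear(hidden_size + tag_dim, hidden_size)

    def forward(self, bert_states, srl_tag_ids):
        # bert_states: (batch, seq_len, hidden_size) from a BERT encoder
        # srl_tag_ids: (batch, seq_len) tag ids produced by an external SRL tagger
        fused = torch.cat([bert_states, self.tag_embed(srl_tag_ids)], dim=-1)
        return torch.tanh(self.proj(fused))

# Smoke test with random tensors standing in for real encoder output and SRL tags.
fusion = SemanticFusion()
states = torch.randn(2, 16, 768)
tags = torch.randint(0, 30, (2, 16))
print(fusion(states, tags).shape)  # torch.Size([2, 16, 768])
```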

2020, Vol 34 (10), pp. 13901-13902
Author(s): Xingkai Ren, Ronghua Shi, Fangfang Li

Recently, unsupervised representation learning has been extremely successful in the field of natural language processing. More and more pre-trained language models have been proposed and have achieved state-of-the-art results, especially in machine reading comprehension. However, these pre-trained language models are huge, with hundreds of millions of parameters that must be trained, which makes them time-consuming to use in real industrial settings. We therefore propose a method that distills the pre-trained language model into a traditional reading comprehension model, so that the distilled model has faster inference speed and higher accuracy in machine reading comprehension. We evaluate the proposed method on the Chinese machine reading comprehension dataset CMRC2018 and greatly improve the accuracy of the original model. To the best of our knowledge, we are the first to apply distillation of a pre-trained language model to Chinese machine reading comprehension.
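The abstract does not give the exact distillation objective, but a common formulation (assumed here only for illustration, not necessarily the one used in the paper) combines a soft-target term against the teacher's temperature-scaled logits with the usual hard-label loss:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels,
                      temperature=2.0, alpha=0.5):
    """Generic knowledge-distillation objective (an assumed formulation).
    student_logits, teacher_logits: (batch, num_positions) span scores
    gold_labels: (batch,) index of the gold answer position"""
    # Soft targets: KL between temperature-softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the gold answer position.
    hard = F.cross_entropy(student_logits, gold_labels)
    return alpha * soft + (1 - alpha) * hard

# Example usage with random tensors in place of real model outputs.
s, t = torch.randn(4, 128), torch.randn(4, 128)
y = torch.randint(0, 128, (4,))
print(distillation_loss(s, t, y).item())
```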


2021, Vol 1955 (1), pp. 012072
Author(s): Ruiheng Li, Xuan Zhang, Chengdong Li, Zhongju Zheng, Zihang Zhou, ...

2021, Vol 11 (7), pp. 3095
Author(s): Suhyune Son, Seonjeong Hwang, Sohyeun Bae, Soo Jun Park, Jang-Hwan Choi

Multi-task learning (MTL) approaches are actively used for various natural language processing (NLP) tasks. The Multi-Task Deep Neural Network (MT-DNN) has contributed significantly to improving the performance of natural language understanding (NLU) tasks. However, one drawback is that confusion among the language representations of the various tasks arises during training of the MT-DNN model. Inspired by the internal-transfer weighting of MTL in medical imaging, we introduce a Sequential and Intensive Weighted Language Modeling (SIWLM) scheme. SIWLM consists of two stages: (1) sequential weighted learning (SWL), which trains the model to learn all tasks sequentially and concentrically, and (2) intensive weighted learning (IWL), which enables the model to focus on the central task. We apply this scheme to the MT-DNN model and call the resulting model MTDNN-SIWLM. Our model achieves higher performance than the existing reference algorithms on six of the eight GLUE benchmark tasks, and it outperforms MT-DNN by 0.77 points on average across all tasks. Finally, we conduct a thorough empirical investigation to determine the optimal weight for each GLUE task.
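The paper's exact weighting schedule is not reproduced here, but the general mechanism of a stage-dependent weighted sum of per-task losses can be sketched as below; the weight values, task names, and the two-stage switch are illustrative assumptions only, not the tuned weights reported in the paper.

```python
import torch

def weighted_mtl_loss(task_losses, stage, central_task="mnli"):
    """Illustrative stage-dependent weighting of multi-task losses.
    task_losses: dict mapping task name -> scalar loss tensor
    stage: "swl" (weight spread over all tasks) or "iwl" (emphasize the central task)
    The weights below are placeholders, not the paper's values."""
    if stage == "swl":
        # Sequential weighted learning: every task contributes, uniformly here.
        weights = {t: 1.0 / len(task_losses) for t in task_losses}
    else:
        # Intensive weighted learning: put most of the weight on the central task.
        weights = {t: (0.7 if t == central_task else 0.3 / (len(task_losses) - 1))
                   for t in task_losses}
    return sum(weights[t] * loss for t, loss in task_losses.items())

# Example with dummy per-task losses.
losses = {"mnli": torch.tensor(0.9), "qnli": torch.tensor(0.7), "sst2": torch.tensor(0.4)}
print(weighted_mtl_loss(losses, stage="swl"), weighted_mtl_loss(losses, stage="iwl"))
```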


2020, Vol 34 (05), pp. 8918-8927
Author(s): Saku Sugawara, Pontus Stenetorp, Kentaro Inui, Akiko Aizawa

Existing analysis work in machine reading comprehension (MRC) is largely concerned with evaluating the capabilities of systems. However, the capabilities of the datasets themselves are not assessed precisely enough for benchmarking language understanding. We propose a semi-automated, ablation-based methodology for this challenge: by checking whether questions can still be solved after removing features associated with a skill requisite for language understanding, we evaluate to what degree the questions do not actually require that skill. Experiments on 10 datasets (e.g., CoQA, SQuAD v2.0, and RACE) with a strong baseline model show, for example, that the relative scores of the baseline model provided with content words only, and with shuffled sentence words in the context, are on average 89.2% and 78.5% of the original scores, respectively. These results suggest that most of the questions already answered correctly by the model do not necessarily require grammatical and complex reasoning. For precise benchmarking, MRC datasets will need to take extra care in their design to ensure that questions correctly evaluate the intended skills.
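The two ablations quoted above (keeping only content words, and shuffling the words within each sentence of the context) can be approximated with simple transforms like the following; the part-of-speech-based notion of "content word" here is an assumption, since the paper's feature definitions are more careful.

```python
import random

CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "NUM", "PROPN"}  # assumed definition

def content_words_only(tagged_tokens):
    """Keep only tokens whose coarse POS tag marks them as content words.
    tagged_tokens: list of (token, pos_tag) pairs for one passage."""
    return [tok for tok, tag in tagged_tokens if tag in CONTENT_TAGS]

def shuffle_within_sentences(sentences, seed=0):
    """Randomly reorder the words inside each sentence, keeping sentence order."""
    rng = random.Random(seed)
    shuffled = []
    for sent in sentences:
        words = sent[:]
        rng.shuffle(words)
        shuffled.append(words)
    return shuffled

# Example: a two-sentence context, already tokenized.
ctx = [["The", "cat", "sat", "on", "the", "mat", "."], ["It", "was", "warm", "."]]
print(shuffle_within_sentences(ctx))
print(content_words_only([("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP")]))
```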


2020, Vol 34 (05), pp. 8705-8712
Author(s): Qiyu Ren, Xiang Cheng, Sen Su

Multi-passage machine reading comprehension (MRC) aims to answer a question from multiple passages. Existing multi-passage MRC approaches have shown that employing passages both with and without golden answers (i.e., labeled and unlabeled passages) for model training can improve prediction accuracy. In this paper, we present MG-MRC, a novel approach for multi-passage MRC via multi-task learning with generative adversarial training. MG-MRC adopts the extract-then-select framework, in which an extractor first predicts answer candidates and a selector then chooses the final answer. In MG-MRC, we adopt multi-task learning to train the extractor on both labeled and unlabeled passages. In particular, we use labeled passages to train the extractor by supervised learning, while using unlabeled passages to train it by generative adversarial training, where the extractor is regarded as the generator and a discriminator is introduced to evaluate the generated answer candidates. Moreover, to train the extractor by backpropagation through the generative adversarial training process, we propose a hybrid method that combines boundary-based and content-based extraction to produce the answer candidate set and its representation. Experimental results on three open-domain QA datasets confirm the effectiveness of our approach.
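The adversarial part of this training can be caricatured as follows: the extractor acts as the generator producing candidate-answer representations from unlabeled passages, and a small discriminator learns to score candidates. All shapes, the discriminator architecture, and the loss form are placeholder assumptions; the paper's hybrid boundary/content candidate representation is considerably more involved.

```python
import torch
import torch.nn as nn

class CandidateDiscriminator(nn.Module):
    """Tiny discriminator that scores an answer-candidate representation."""
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, cand_repr):
        return self.score(cand_repr).squeeze(-1)  # (batch,) logits

bce = nn.BCEWithLogitsLoss()
disc = CandidateDiscriminator()

def discriminator_step(real_cands, fake_cands):
    # Candidates from labeled passages are treated as "real",
    # extractor candidates from unlabeled passages as "fake".
    loss_real = bce(disc(real_cands), torch.ones(real_cands.size(0)))
    loss_fake = bce(disc(fake_cands.detach()), torch.zeros(fake_cands.size(0)))
    return loss_real + loss_fake

def extractor_adversarial_step(fake_cands):
    # The extractor (generator) is updated so its candidates fool the discriminator.
    return bce(disc(fake_cands), torch.ones(fake_cands.size(0)))

# Example with random candidate representations.
real, fake = torch.randn(8, 256), torch.randn(8, 256)
print(discriminator_step(real, fake).item(), extractor_adversarial_step(fake).item())
```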


2020, Vol 34 (05), pp. 9636-9643
Author(s): Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao, ...

For machine reading comprehension, the capacity to effectively model the linguistic knowledge in detail-riddled and lengthy passages, and to get rid of the noise, is essential for improving performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide text modeling by incorporating explicit syntactic constraints into the attention mechanism for better linguistically motivated word representations. In detail, for the Transformer-based encoder built on a self-attention network (SAN), we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. The syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistically inspired representations. To verify its effectiveness, the proposed SG-Net is applied to the typical pre-trained language model BERT, which is itself based on a Transformer encoder. Extensive experiments on popular benchmarks including SQuAD 2.0 and RACE show that the proposed SG-Net design helps achieve substantial performance improvements over strong baselines.
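A minimal way to read the SDOI idea is attention restricted by a syntactic mask: each token may only attend to positions licensed by its dependency structure. The sketch below applies such a mask inside scaled dot-product attention; the mask construction and the dimensions are illustrative assumptions, not the paper's exact design.

```python
import math
import torch
import torch.nn.functional as F

def syntax_guided_attention(q, k, v, syntax_mask):
    """Scaled dot-product attention restricted by a syntactic mask.
    q, k, v: (batch, seq_len, dim)
    syntax_mask: (batch, seq_len, seq_len), 1 where token i may attend to j
    (e.g. j is i itself or one of i's dependency ancestors), 0 elsewhere."""
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    scores = scores.masked_fill(syntax_mask == 0, float("-inf"))
    return torch.matmul(F.softmax(scores, dim=-1), v)

# Toy example: 4 tokens, each attending to itself and to token 0 (a pretend root).
q = k = v = torch.randn(1, 4, 8)
mask = torch.eye(4).unsqueeze(0)
mask[:, :, 0] = 1
print(syntax_guided_attention(q, k, v, mask).shape)  # torch.Size([1, 4, 8])
```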


2021, Vol 1 (2), pp. 18-22
Author(s): Strahil Sokolov, Stanislava Georgieva

This paper presents a new approach to the processing and categorization of text from patient documents in the Bulgarian language using Natural Language Processing and Edge AI. The proposed algorithm contains several phases: personal data anonymization, pre-processing and conversion of text to vectors, model training, and recognition. The experimental results, in terms of achieved accuracy, are comparable with modern approaches.
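The described pipeline (anonymization, text-to-vector conversion, training, recognition) can be approximated with standard components; the regular-expression anonymizer and the TF-IDF-plus-linear-classifier choice below are assumptions made for illustration, since the paper does not publish its exact models.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def anonymize(text):
    """Crude personal-data masking: replace long digit sequences (ID/phone numbers).
    A real system would also use named-entity recognition for names, addresses, etc."""
    return re.sub(r"\d{4,}", "<NUM>", text)

# Tiny illustrative corpus: Bulgarian symptom snippets with made-up category labels.
texts = [anonymize(t) for t in [
    "болка в гърдите",            # chest pain
    "главоболие и температура",   # headache and fever
    "болка и задух",              # pain and shortness of breath
    "висока температура",         # high fever
]]
labels = ["cardio", "neuro", "cardio", "neuro"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict([anonymize("задух и болка в гърдите")]))
```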


Author(s): Ichiro Kobayashi

At the annual conference of the Japan Society for Artificial Intelligence (JSAI), a special survival session called "Challenge for Realizing Early Profits (CREP)" is organized to support and promote excellent ideas in new AI technologies expected to be realized and to contribute to society within five years. Every year at the session, researchers propose their ideas and compete in being evaluated by conference participants. The Everyday Language Computing (ELC) project, started in 2000 at the Brain Science Institute, RIKEN, and ended in 2005, participated in the CREP program in 2001 to have the project evaluated by third parties, and held an organized session every year in which those interested in language-based intelligence and personalization participated. The project competed with other candidates, survived the session, and achieved the session's final goal of surviving for five years. Papers in this special issue selected for presentation at the session include the following. The first article, "Everyday-Language Computing Project Overview," by Ichiro Kobayashi et al., gives an overview and the basic technologies of the ELC Project. The second to sixth papers are related to the ELC Project. The second article, "Computational Models of Language Within Context and Context-Sensitive Language Understanding," by Noriko Ito et al., proposes a new database, called the "semiotic base," that compiles linguistic resources with contextual information, and an algorithm for achieving natural language understanding with the semiotic base. The third article, "Systemic-Functional Context-Sensitive Text Generation in the Framework of Everyday Language Computing," by Yusuke Takahashi et al., proposes an algorithm to generate texts with the semiotic base. The fourth article, "Natural Language-Mediated Software Agentification," by Michiaki Iwazume et al., proposes a method for agentifying and verbalizing existing software applications, together with a scheme for operating and running them. The fifth article, "Smart Help for Novice Users Based on Application Software Manuals," by Shino Iwashita et al., proposes a new framework for reusing the electronic manuals shipped with application software to provide tailor-made operation instructions to users. The sixth article, "Programming in Everyday Language: A Case for Email Management," by Toru Sugimoto et al., describes how a computer program can be written in natural language; rhetorical structure analysis is used to translate the natural-language command structure into the program structure. The seventh article, "Application of Paraphrasing to Programming with Linguistic Expressions," by Nozomu Kaneko et al., proposes a method for translating natural language commands into a computer program through a natural language paraphrasing mechanism. The eighth article, "A Human Interface Based on Linguistic Metaphor and Intention Reasoning," by Koichi Yamada et al., proposes a new human interface paradigm called Push Like Talking (PLT), which enables people to operate machines as they talk. The ninth article, "Automatic Metadata Annotation Based on User Preference Evaluation Patterns," by Mari Saito, proposes effective automatic metadata annotation for content recommendations matched to user preferences. The tenth article, "Dynamic Sense Representation Using Conceptual Fuzzy Sets," by Hiroshi Sekiya et al., proposes a method to represent word senses, which vary dynamically depending on context, using conceptual fuzzy sets. The eleventh article, "Common Sense from the Web? Naturalness of Everyday Knowledge Retrieved from WWW," by Rafal Rzepka et al., is a challenging work on acquiring common-sense knowledge from information on the Web. The twelfth article, "Semantic Representation for Understanding Meaning Based on Correspondence Between Meanings," by Akira Takagi et al., proposes a new semantic representation for handling the Japanese language in natural language processing. I thank the reviewers and contributors for their time and effort in making this special issue possible, and I wish to thank the JACIII editorial board, especially Professors Kaoru Hirota and Toshio Fukuda, the Editors-in-Chief, for inviting me to serve as Guest Editor of this Journal. Thanks also go to Kazuki Ohmori and Kenta Uchino of Fuji Technology Press for their sincere support.


Author(s): Keno K Bressem, Lisa C Adams, Robert A Gaudin, Daniel Tröltzsch, Bernd Hamm, ...

Motivation: The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to improved performance on several Natural Language Processing (NLP) benchmarks. In radiology especially, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for generating labels for machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed to enable accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require only fine-tuning on a small amount of manually labelled data to achieve even better results. Results: Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports. Availability and implementation: We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology. Supplementary information: Supplementary data are available at Bioinformatics online.
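The authors' fine-tuning code is available at the linked repository; the sketch below is only a generic illustration of multi-label fine-tuning of a BERT classifier on report texts with the Hugging Face transformers API. The checkpoint name, label set, example report, and target vector are assumptions for illustration, not taken from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["congestion", "effusion", "consolidation", "pneumothorax"]

# A German-language BERT checkpoint is an assumption; the paper's exact model may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid + BCE loss per finding
)

report = "Zunehmende Ergussbildung beidseits, kein Pneumothorax."
batch = tokenizer(report, truncation=True, padding=True, return_tensors="pt")
# Multi-hot target: effusion present, the other findings absent (illustrative only).
batch["labels"] = torch.tensor([[0.0, 1.0, 0.0, 0.0]])

outputs = model(**batch)            # returns loss and per-label logits
probs = torch.sigmoid(outputs.logits)
print(outputs.loss.item(), probs)
```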

