Limits of Detecting Text Generated by Large-Scale Language Models

AbstractWith the rapid development of hospital informatization and Internet medical service in recent years, most hospitals have launched online hospital appointment registration systems to remove patient queues and improve the efficiency of medical services. However, most of the patients lack professional medical knowledge and have no idea of how to choose department when registering. To instruct the patients to seek medical care and register effectively, we proposed CIDRS, an intelligent self-diagnosis and department recommendation framework based on Chinese medical Bidirectional Encoder Representations from Transformers (BERT) in the cloud computing environment. We also established a Chinese BERT model (CHMBERT) trained on a large-scale Chinese medical text corpus. This model was used to optimize self-diagnosis and department recommendation tasks. To solve the limited computing power of terminals, we deployed the proposed framework in a cloud computing environment based on container and micro-service technologies. Real-world medical datasets from hospitals were used in the experiments, and results showed that the proposed model was superior to the traditional deep learning models and other pre-trained language models in terms of performance.

Download Full-text

Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00032 ◽

2018 ◽

Vol 6 ◽

pp. 451-465 ◽

Cited By ~ 5

Author(s):

Daniela Gerz ◽

Ivan Vulić ◽

Edoardo Ponti ◽

Jason Naradowsky ◽

Roi Reichart ◽

...

Keyword(s):

Large Scale ◽

Language Modeling ◽

Language Models ◽

Data Sets ◽

High Type ◽

Word Level ◽

Level Information ◽

Character Sequences ◽

Novel Method ◽

Morphologically Rich Languages

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary, consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-level informed models should be particularly effective for morphologically-rich languages (MRLs) that exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for morphologically-rich languages. Our code and data sets are publicly available.

Download Full-text

Attending to Entities for Better Text Understanding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6254 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7554-7561

Author(s):

Pengxiang Cheng ◽

Katrin Erk

Keyword(s):

Large Scale ◽

Human Performance ◽

State Of The Art ◽

Syntactic Structure ◽

Semantic Knowledge ◽

Training Data ◽

Language Models ◽

Long Distance ◽

Future Directions ◽

Text Understanding

Recent progress in NLP witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular, coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting the new state-of-the-art, while only containing a tiny fraction of parameters compared to GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions on applying similar techniques to other problems.

Download Full-text

Importance-Aware Learning for Neural Headline Editing

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6467 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9282-9289

Author(s):

Qingyang Wu ◽

Lei Li ◽

Hao Zhou ◽

Ying Zeng ◽

Zhou Yu

Keyword(s):

Social Media ◽

Large Scale ◽

Network Models ◽

Language Models ◽

Neural Network Models ◽

Generation Task ◽

Social Media Platforms ◽

Editing Process ◽

Different Levels

Many social media news writers are not professionally trained. Therefore, social media platforms have to hire professional editors to adjust amateur headlines to attract more readers. We propose to automate this headline editing process through neural network models to provide more immediate writing support for these social media news writers. To train such a neural headline editing model, we collected a dataset which contains articles with original headlines and professionally edited headlines. However, it is expensive to collect a large number of professionally edited headlines. To solve this low-resource problem, we design an encoder-decoder model which leverages large scale pre-trained language models. We further improve the pre-trained model's quality by introducing a headline generation task as an intermediate task before the headline editing task. Also, we propose Self Importance-Aware (SIA) loss to address the different levels of editing in the dataset by down-weighting the importance of easily classified tokens and sentences. With the help of Pre-training, Adaptation, and SIA, the model learns to generate headlines in the professional editor's style. Experimental results show that our method significantly improves the quality of headline editing comparing against previous methods.

Download Full-text

Syntax Role for Neural Semantic Role Labeling

Computational Linguistics ◽

10.1162/coli_a_00408 ◽

2021 ◽

pp. 1-48

Author(s):

Zuchao Li ◽

Hai Zhao ◽

Shexia He ◽

Jiaxun Cai

Keyword(s):

Argument Structure ◽

Large Scale ◽

Language Models ◽

Semantic Role ◽

Semantic Role Labeling ◽

Empirical Survey ◽

Learning Framework ◽

Syntactic Information ◽

Feature Based ◽

Predicate Argument Structure

Abstract Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies in terms of traditional models have shown syntactic information can make remarkable contributions to SRL performance; however, the necessity of syntactic information was challenged by a few recent neural SRL studies that demonstrate impressive performance without syntactic backbones and suggest that syntax information becomes much less important for neural semantic role labeling, especially when paired with recent deep neural network and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation on the relevance of syntactic information in SRL, for both dependency and both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines), sequence-based, tree-based, and graph-based, which are accompanied by two categories of exploiting syntactic information: syntax pruningbased and syntax feature-based. Experiments are conducted on the CoNLL-2005, 2009, and 2012 benchmarks for all languages available, and results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax to neural SRL models together with a thorough empirical survey using existing models.

Download Full-text

An Applicative Survey on Few-Shot Learning

Recent Patents on Engineering ◽

10.2174/1872212115666210715121344 ◽

2021 ◽

Vol 15 ◽

Author(s):

Jianwei Zhang ◽

Xubin Zhang ◽

Lei Lv ◽

Yining Di ◽

Wei Chen

Keyword(s):

Large Scale ◽

Representation Learning ◽

Language Models ◽

Data Sets ◽

Research Directions ◽

Large Scale Data ◽

Cross Domain ◽

Meta Learning ◽

Definition Of ◽

Future Work

Background: Learning discriminative representation from large-scale data sets has made a breakthrough in decades. However, it is still a thorny problem to generate representative embedding from limited examples, for example, a class containing only one image. Recently, deep learning-based Few-Shot Learning (FSL) has been proposed. It tackles this problem by leveraging prior knowledge in various ways. Objective: In this work, we review recent advances of FSL from the perspective of high-dimensional representation learning. The results of the analysis can provide insights and directions for future work. Methods: We first present the definition of general FSL. Then we propose a general framework for the FSL problem and give the taxonomy under the framework. We survey two FSL directions: learning policy and meta-learning. Results: We review the advanced applications of FSL, including image classification, object detection, image segmentation and other tasks etc., as well as the corresponding benchmarks to provide an overview of recent progress. Conclusion: FSL needs to be further studied in medical images, language models, and reinforcement learning in future work. In addition, cross-domain FSL, successive FSL, and associated FSL are more challenging and valuable research directions.

Download Full-text

Information Extraction from Electronic Medical Records Using Multitask Recurrent Neural Network with Contextual Word Embedding

Applied Sciences ◽

10.3390/app9183658 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3658 ◽

Cited By ~ 6

Author(s):

Jianliang Yang ◽

Yuenan Liu ◽

Minghui Qian ◽

Chenghua Guan ◽

Xiangfei Yuan

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Large Scale ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Language Models ◽

Named Entity

Clinical named entity recognition is an essential task for humans to analyze large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries; machine learning-based solutions need laborious feature engineering. For the moment, deep learning solutions like Long Short-term Memory with Conditional Random Field (LSTM–CRF) achieved considerable performance in many datasets. In this paper, we developed a multitask attention-based bidirectional LSTM–CRF (Att-biLSTM–CRF) model with pretrained Embeddings from Language Models (ELMo) in order to achieve better performance. In the multitask system, an additional task named entity discovery was designed to enhance the model’s perception of unknown entities. Experiments were conducted on the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs (I2B2/VA) dataset. Experimental results show that our model outperforms the state-of-the-art solution both on the single model and ensemble model. Our work proposes an approach to improve the recall in the clinical named entity recognition task based on the multitask mechanism.

Download Full-text

Pushdown Automata in Statistical Machine Translation

Computational Linguistics ◽

10.1162/coli_a_00197 ◽

2014 ◽

Vol 40 (3) ◽

pp. 687-723 ◽

Cited By ~ 3

Author(s):

Cyril Allauzen ◽

Bill Byrne ◽

Adrià de Gispert ◽

Gonzalo Iglesias ◽

Michael Riley

Keyword(s):

Machine Translation ◽

Large Scale ◽

Complexity Analysis ◽

Statistical Machine Translation ◽

Language Model ◽

General Purpose ◽

Language Models ◽

Experimental Conditions ◽

Context Free ◽

Pushdown Automata

This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.

Download Full-text