On the Hierarchical Information in a Single Contextualised Word Representation (Student Abstract)

Dean L. Slack; Mariann Hardey; Noura Al Moubayed

doi:10.1609/aaai.v34i10.7231

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i10.7231

2020

Vol 34 (10)

pp. 13917-13918

Author(s): Dean L. Slack; Mariann Hardey; Noura Al Moubayed

Keyword(s): Language Processing; Fine Tuning; Language Models; Linguistic Features; Widespread Application; Linear Classifiers; Sentence Level; Level Information; Word Representation; And Performance

Contextual word embeddings produced by neural language models such as BERT or ELMo have been widely applied and have improved performance across many Natural Language Processing tasks, suggesting that their representations encode rich linguistic features. This work investigates to what extent hierarchical linguistic information is encoded in a single contextual embedding. Using labelled constituency trees, we train simple linear classifiers on top of single contextualised word representations for ancestor sentiment analysis tasks at multiple constituency levels of a sentence. To assess the presence of hierarchical information throughout the networks, the linear classifiers are trained on representations produced by each intermediate layer of BERT and ELMo variants. We show that, without fine-tuning, a single contextualised representation encodes enough syntactic and semantic sentence-level information to significantly outperform a non-contextual baseline at classifying the 5-class sentiment of its ancestor constituents at multiple levels of the constituency tree. Additionally, we show that LSTM and transformer architectures trained on similarly sized datasets achieve similar levels of performance on these tasks. Future work will extend the analysis to a wider range of NLP tasks and contextualisers.
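
The probing setup described above (a linear classifier trained on frozen, per-layer word representations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the per-layer embeddings have already been extracted, and the hypothetical `synthetic_layer` helper generates random vectors as stand-ins for BERT or ELMo layer outputs, one layer informative about the 5 sentiment classes and one not.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, b, x):
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]


def train_linear_probe(xs, ys, n_classes, epochs=30, lr=0.05):
    """Multinomial logistic regression trained with SGD on frozen features.

    Only W and b are learned; the vectors in xs are never updated, mirroring
    a probe that sits on top of an encoder that is not fine-tuned.
    """
    dim = len(xs[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = softmax(logits_for(W, b, x))
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                for i in range(dim):
                    W[c][i] -= lr * g * x[i]
                b[c] -= lr * g
    return W, b


def probe_accuracy(W, b, xs, ys):
    correct = 0
    for x, y in zip(xs, ys):
        scores = logits_for(W, b, x)
        correct += max(range(len(scores)), key=scores.__getitem__) == y
    return correct / len(ys)


def synthetic_layer(labels, signal, dim=8, n_classes=5, seed=0):
    """Hypothetical stand-in for one layer's embeddings: signal * class centre + noise."""
    rng = random.Random(seed)
    centres = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    return [[signal * centres[y][i] + rng.gauss(0.0, 1.0) for i in range(dim)] for y in labels]


label_rng = random.Random(42)
labels = [label_rng.randrange(5) for _ in range(400)]  # 5 sentiment classes
layers = {0: synthetic_layer(labels, signal=0.0, seed=1),  # uninformative layer
          1: synthetic_layer(labels, signal=3.0, seed=2)}  # informative layer

accs = {}
for layer, feats in layers.items():
    W, b = train_linear_probe(feats[:300], labels[:300], n_classes=5)
    accs[layer] = probe_accuracy(W, b, feats[300:], labels[300:])
    print(f"layer {layer}: held-out probe accuracy = {accs[layer]:.2f}")
```

Comparing held-out probe accuracy layer by layer is what lets a probing study say where in a network a property is linearly recoverable. With real models, the synthetic vectors would be replaced by each layer's extracted representations and the labels by the ancestor constituents' sentiment classes.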

Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

Applied Sciences

10.3390/app9183648

2019

Vol 9 (18)

pp. 3648

Author(s): Casper S. Shikali; Zhou Sijie; Liu Qihe; Refuoe Mokhosi

Keyword(s): Language Processing; Critical Role; Language Model; Central Africa; Spoken Language; Language Models; Word Embeddings; Word Representation

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. The same cannot be said of Swahili, a low-resource yet widely spoken language in East and Central Africa. This study proposes novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the problem of word representation for agglutinative, syllable-based languages. Inspired by how Swahili is taught in beginner classes, we encode the syllables of words, rather than their characters, character n-grams or morphemes, and generate quality word embeddings using a convolutional neural network. The quality of WEFSE is demonstrated by state-of-the-art results for the syllable-aware language model on both the small dataset (perplexity 31.229) and the medium dataset (perplexity 45.859), outperforming character-aware language models. We further evaluate the word embeddings on a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. The main contributions of the study are therefore a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
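
The language models above are compared by perplexity (lower is better). As a reading aid, here is a minimal sketch of the standard definition, the exponential of the mean per-token negative log-likelihood on held-out text; it assumes the model's probability for each actual next token is already available and is not taken from the paper's code.

```python
import math


def perplexity(token_probs):
    """exp(mean negative log-likelihood) of the probabilities a model
    assigned to each actual next token in held-out text."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# A model that always gives the correct next token probability 1/4 is as
# uncertain as a uniform choice among 4 tokens: perplexity 4 (up to rounding).
print(perplexity([0.25] * 8))
```

Read this way, a perplexity of about 31 means the model is, on average, roughly as uncertain as a uniform choice among about 31 tokens of whatever unit it predicts.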

Analysis of OWA operators for automatic keyphrase extraction in a semantic context

Intelligent Data Analysis

10.3233/ida-200008

2020

Vol 24

pp. 43-62

Author(s): Yamel Pérez-Guadarramas; Manuel Barreiro-Guerrero; Alfredo Simón-Cuevas; Francisco P. Romero; José A. Olivas

Keyword(s): Language Processing; Topic Modeling; Semantic Analysis; Keyphrase Extraction; Linguistic Features; New Approach; OWA Operators; Modeling Process; And Performance; Computational Systems

Automatic keyphrase extraction from texts is useful for many computational systems in natural language processing and text mining. Although a number of solutions to this problem have been described, semantic analysis is one of the least exploited linguistic features in the best-known proposals, which results in low accuracy and performance. This paper presents an unsupervised keyphrase extraction method based on lexico-syntactic patterns for extracting information from texts and on fuzzy topic modeling. An OWA operator combining several semantic measures was applied in the topic modeling process. The approach was evaluated on the Inspec and 500N-KPCrowd datasets. Several variants of the proposal were evaluated against each other, and a statistical analysis was performed to identify the best one. This best variant was also compared with other reported systems, giving promising results.
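
An OWA (ordered weighted averaging) operator, as introduced by Yager, sorts its inputs in descending order and then applies weights by rank position rather than by source. The sketch below shows only that aggregation step; the three similarity scores and the weight vectors are hypothetical and are not the measures or weights used in the paper.

```python
def owa(values, weights):
    """Ordered weighted averaging: sort values in descending order,
    then take the weighted sum with position-based weights."""
    if len(values) != len(weights):
        raise ValueError("need one weight per value")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical scores from three semantic similarity measures for one term pair.
scores = [0.2, 0.9, 0.5]
print(owa(scores, [1.0, 0.0, 0.0]))        # behaves like max
print(owa(scores, [0.0, 0.0, 1.0]))        # behaves like min
print(owa(scores, [1 / 3, 1 / 3, 1 / 3]))  # behaves like the arithmetic mean
print(owa(scores, [0.5, 0.3, 0.2]))        # leans towards the higher scores
```

Because the weights attach to rank positions, a single weight vector can tune how "optimistic" the combination is (closer to max) or how "pessimistic" (closer to min) without favouring any particular measure.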

Active Learning for Effectively Fine-Tuning Transfer Learning to Downstream Task

ACM Transactions on Intelligent Systems and Technology

10.1145/3446343

2021

Vol 12 (2)

pp. 1-24

Author(s): Md Abul Bashar; Richi Nayak

Keyword(s): Active Learning; Transfer Learning; Language Processing; State Of The Art; Language Model; Ensemble Classifier; Classification Performance; Fine Tuning; Linguistic Features; Better Than

Language models (LMs) have become a common method of transfer learning in Natural Language Processing (NLP) tasks when working with small labelled datasets. An LM is pretrained on a large, readily available unlabelled text corpus and then fine-tuned with the labelled data for the target (i.e., downstream) task. Because an LM is designed to capture the linguistic aspects of semantics, it can be biased towards linguistic features. We argue that exposing an LM during fine-tuning to instances that capture the diverse semantic aspects (e.g., topical, linguistic, semantic relations) present in the dataset will improve its performance on the underlying task. We propose a Mixed Aspect Sampling (MAS) framework that samples instances capturing different semantic aspects of the dataset and uses an ensemble classifier to improve classification performance. Experimental results show that MAS performs better than random sampling, as well as state-of-the-art active learning models, on abuse detection tasks, where it is hard to collect the labelled data needed to build an accurate classifier.

PatentNet: multi-label classification of patent documents using deep learning based language understanding

Scientometrics

10.1007/s11192-021-04179-4

2021

Author(s): Arousha Haghighian Roudsari; Jafar Afshar; Wookey Lee; Suan Lee

Keyword(s): Deep Learning; Language Processing; State Of The Art; Classification Performance; Fine Tuning; Language Models; Classification Task; Domain Experts; Patent Classification; Patent Documents

Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the growing number of filed patents and the complexity of the documents make the classification task challenging. The text of patent documents is not always written to convey knowledge efficiently. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Automating this expensive and laborious task is therefore essential for assisting domain experts in managing patent documents and for facilitating reliable search, retrieval and further patent analysis. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we investigate the effect of fine-tuning pre-trained language models, namely BERT, XLNet, RoBERTa and ELECTRA, for the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification, and use various word embeddings to enhance the performance of the baseline models. Experiments are conducted on the publicly available USPTO-2M patent classification benchmark and M-patent datasets. We conclude that fine-tuning pre-trained language models on patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves a new state-of-the-art classification performance in terms of precision, recall, F1 measure, coverage error and label ranking average precision (LRAP).
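
Coverage error and LRAP are less familiar than precision and recall, so a minimal sketch of both is given here. It assumes every sample has at least one true label and that ties are counted against the model (a tied label is treated as ranked above); scikit-learn's `sklearn.metrics.coverage_error` and `label_ranking_average_precision_score` are the usual library implementations, and this hand-written version is only intended to follow the same definitions on such inputs.

```python
def _relevant_scores(truth, scores):
    relevant = [s for t, s in zip(truth, scores) if t]
    if not relevant:
        raise ValueError("this sketch assumes every sample has at least one true label")
    return relevant


def coverage_error(y_true, y_score):
    """Average number of top-ranked labels needed to cover all true labels
    (lower is better; ties are counted against the model)."""
    total = 0
    for truth, scores in zip(y_true, y_score):
        lowest = min(_relevant_scores(truth, scores))
        total += sum(1 for s in scores if s >= lowest)
    return total / len(y_true)


def label_ranking_average_precision(y_true, y_score):
    """For each true label j: (true labels scored >= j) / (all labels scored >= j),
    averaged over the sample's true labels, then over samples (higher is better)."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = _relevant_scores(truth, scores)
        precisions = [
            sum(1 for r in relevant if r >= s_j) / sum(1 for s in scores if s >= s_j)
            for s_j in relevant
        ]
        total += sum(precisions) / len(precisions)
    return total / len(y_true)


print(coverage_error([[1, 0, 0], [0, 1, 1]], [[1, 0, 0], [0, 1, 1]]))  # 1.5
print(label_ranking_average_precision([[1, 0, 0], [0, 0, 1]],
                                      [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]))  # about 0.417
```

With many patent classes per document, both metrics reward a model that ranks all of a document's true classes near the top, even when its thresholded predictions are imperfect.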

Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study (Preprint)

10.2196/preprints.23357

2020

Author(s): Ying Xiong; Shuai Chen; Qingcai Chen; Jun Yan; Buzhou Tang

Keyword(s): Language Processing; Pearson Correlation; Language Model; Model Performance; Clinical Text; Sentence Level; Level Information; Semantically Enhanced; Copy And Paste; Semantic Textual Similarity

BACKGROUND With the popularity of electronic health records (EHRs), the quality of health care has improved. However, EHRs also bring problems, such as the growing use of copy-and-paste and templates, which results in EHRs with low-quality content. To minimize data redundancy across documents, Harvard Medical School and Mayo Clinic organized a national natural language processing (NLP) clinical challenge (n2c2) on clinical semantic textual similarity (ClinicalSTS) in 2019. The task of this challenge is to compute the semantic similarity between clinical text snippets.

OBJECTIVE In this study, we aim to investigate novel methods to model ClinicalSTS and analyze the results.

METHODS We propose a semantically enhanced text matching model for the 2019 n2c2/Open Health NLP (OHNLP) challenge on ClinicalSTS. The model includes 3 representation modules that encode clinical text snippet pairs at different levels: (1) a character-level representation module based on a convolutional neural network (CNN) to tackle the out-of-vocabulary problem in NLP; (2) a sentence-level representation module that adopts the pretrained language model bidirectional encoder representation from transformers (BERT) to encode clinical text snippet pairs; and (3) an entity-level representation module to model clinical entity information in clinical text snippets. For the entity-level representation, we compare 2 methods. One encodes entities by the entity-type label sequence corresponding to the text snippet (called entity I), whereas the other encodes entities by their representation in MeSH, a knowledge graph in the medical domain (called entity II).

RESULTS We conduct experiments on the ClinicalSTS corpus of the 2019 n2c2/OHNLP challenge to evaluate model performance. The model using only BERT for text snippet pair encoding achieved a Pearson correlation coefficient (PCC) of 0.848. When character-level representation and entity-level representation were individually added to our model, the PCC increased to 0.857 and 0.854 (entity I)/0.859 (entity II), respectively. When both character-level and entity-level representations were added, the PCC further increased to 0.861 (entity I) and 0.868 (entity II).

CONCLUSIONS Experimental results show that both character-level and entity-level information can effectively enhance the BERT-based STS model.
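
The PCC values above measure how linearly the model's predicted similarity scores track the gold ratings. A minimal sketch of the coefficient follows; the four score pairs are hypothetical, not ClinicalSTS data. On Python 3.10 or later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold similarity ratings and model predictions for four snippet pairs.
gold = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 3.0, 2.0, 4.0]
print(pearson(gold, pred))  # 0.8
```

Because PCC is invariant to shifting and scaling the predictions, a model is rewarded for ranking and spacing pairs consistently with the gold ratings, not for matching their exact scale.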

Deep learning for conflicting statements detection in text

10.7287/peerj.preprints.26589

2018

Author(s): Vijay Lingam; Simran Bhuria; Mayukh Nair; Divij Gurpreetsingh; Anjali Goyal; ...

Keyword(s): Deep Learning; Language Processing; Short Term Memory; Classification Problem; Linguistic Features; Error Matrix; Classification Framework; Word Representation; Series Of Experiments; Long Short Term Memory

Background. Automatic contradiction detection, or conflicting statements detection, consists of identifying discrepancy, inconsistency and defiance in text, and has several real-world applications in question answering systems, multi-document summarization, dispute detection in news, and detection of contradictions in opinions and sentiments on social media. Automatic contradiction detection is a technically challenging natural language processing problem. Contradiction detection between sources of text, or between two sentences, can be framed as a classification problem.

Methods. We propose an approach for detecting three different types of contradiction: negation, antonyms and numeric mismatch. We derive several linguistic features from text and use them in a classification framework for detecting contradictions. The novelty of our approach relative to existing work lies in the application of artificial neural networks and deep learning. Our approach uses techniques such as Long short-term memory (LSTM) and Global Vectors for Word Representation (GloVe). We conduct a series of experiments on three publicly available contradiction detection datasets: the Stanford dataset, the SemEval dataset and the PHEME dataset. In addition to the existing datasets, we create a further dataset and make it publicly available. We measure the performance of our proposed approach using confusion and error matrices and accuracy.

Results. We evaluate three feature combinations on our dataset: manual features, LSTM-based features, and a combination of manual and LSTM features. The accuracy of our classifier based on both LSTM and manual features is 91.2% on the SemEval dataset, where it correctly classifies 3204 of 3513 instances, and 71.9% on the Stanford dataset, where it correctly classifies 855 of 1189 instances. Accuracy is highest on the PHEME dataset; the accuracy for the contradiction class is 96.85%.

Discussion. The experimental analysis shows encouraging results supporting our hypothesis that deep learning with LSTM-based features can be used to identify contradictions in text. Our results show an accuracy improvement over manual features after applying LSTM-based features. Accuracy varies across datasets, and we observe different accuracies across types of contradiction. Feature analysis shows that the discriminatory power of the five features varies.
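
The evaluation above reports accuracy derived from a confusion matrix. A minimal sketch of that computation follows: correct predictions sit on the diagonal, so accuracy is the trace divided by the total. The toy labels are hypothetical; the last two lines only check that the reported instance counts match the reported percentages.

```python
from collections import Counter


def confusion_matrix(gold, pred, labels):
    """Rows are gold labels, columns are predicted labels, in the order of `labels`."""
    counts = Counter(zip(gold, pred))
    return [[counts[(g, p)] for p in labels] for g in labels]


def accuracy(matrix):
    """Correct predictions lie on the diagonal; accuracy = trace / total."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy example with hypothetical labels.
gold = ["contradiction", "contradiction", "no_contradiction", "no_contradiction"]
pred = ["contradiction", "no_contradiction", "no_contradiction", "no_contradiction"]
m = confusion_matrix(gold, pred, ["contradiction", "no_contradiction"])
print(m)            # [[1, 1], [0, 2]]
print(accuracy(m))  # 0.75

# The reported counts are consistent with the reported percentages.
print(f"SemEval: {3204 / 3513:.1%}")   # 91.2%
print(f"Stanford: {855 / 1189:.1%}")  # 71.9%
```

Keeping the full matrix, rather than only the accuracy, is what makes per-class figures such as the contradiction-class accuracy recoverable.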

Rare Words: A Major Problem for Contextualized Embeddings and How to Fix it by Attentive Mimicking

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i05.6403

2020

Vol 34 (05)

pp. 8766-8774

Cited By ~ 1

Author(s): Timo Schick; Hinrich Schütze

Keyword(s): Neural Network; Natural Language Processing; Language Processing; Deep Neural Network; Language Model; Language Modeling; Fine Tuning; Language Models; Network Architectures; Semantic Properties

Pretraining deep neural network architectures with a language modeling objective has brought large improvements on many natural language processing tasks. Taking BERT, a recently proposed architecture of this kind, as an example, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method designed to explicitly learn embeddings for rare words, to deep language models. To make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT substantially improves its understanding of rare words.

Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes

BMC Bioinformatics

10.1186/s12859-021-04421-z

2021

Vol 22 (1)

Author(s): Morteza Pourreza Shahri; Indika Kahanda

Keyword(s): Ensemble Learning; Language Processing; Human Protein; Language Models; Convolutional Networks; Relationship Extraction; Novel Approach; Sentence Level; Human Proteins; Automated Tools

Background. Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing because of its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required to train such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward.

Results. In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning and ensemble learning to classify human protein-phenotype co-mentions with the help of unlabeled data. This framework makes it possible to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes alongside a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks and recurrent networks. Our experimental results demonstrate that the proposed approach achieves new state-of-the-art performance in classifying human protein-phenotype co-mentions, outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.

Conclusions. This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators and the text mining community working on biomedical relationship extraction.

Pretrained Language Model for Text Generation: A Survey

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2021/612

2021

Author(s): Junyi Li; Tianyi Tang; Wayne Xin Zhao; Ji-Rong Wen

Keyword(s): Language Processing; Language Model; Fine Tuning; Language Models; Text Generation; Future Directions; The Core; Task Definition; Core Content; Challenging Tasks

Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pretrained language models (PLMs). In this paper, we present an overview of the major advances achieved in PLMs for text generation. As preliminaries, we present the general task definition and briefly describe the mainstream PLM architectures for text generation. As the core content, we discuss how to adapt existing PLMs to model different input data and to satisfy special properties in the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we present several future directions and conclude the paper. Our survey aims to provide text generation researchers with a synthesis of, and pointers to, related research.

A study of deep learning methods for de-identification of clinical notes in cross-institute settings

BMC Medical Informatics and Decision Making

10.1186/s12911-019-0935-4

2019

Vol 19 (S5)

Cited By ~ 5

Author(s): Xi Yang; Tianchen Lyu; Qian Li; Chih-Yin Lee; Jiang Bian; ...

Keyword(s): Deep Learning; Language Processing; Biomedical Literature; Fine Tuning; Word Embeddings; Patient Privacy; Learning Models; Linguistic Features; Clinical Text; Clinical Notes

Background. De-identification is a critical technology for facilitating the use of unstructured clinical text while protecting patient privacy and confidentiality. The clinical natural language processing (NLP) community has invested great effort in developing methods and corpora for de-identification of clinical notes. These annotated corpora are valuable resources for developing automated systems to de-identify clinical text at local hospitals. However, existing studies have often used training and test data collected from the same institution, and few studies have explored automated de-identification in cross-institute settings. The goal of this study is to examine deep learning-based de-identification methods in a cross-institute setting, identify the bottlenecks, and provide potential solutions.

Methods. We created a de-identification corpus using a total of 500 clinical notes from University of Florida (UF) Health, developed deep learning-based de-identification models using the 2014 i2b2/UTHealth corpus, and evaluated their performance on the UF corpus. We compared five different word embeddings trained on general English text, clinical text and biomedical literature, explored lexical and linguistic features, and compared two strategies for customizing the deep learning models using UF notes and resources.

Results. Pre-trained word embeddings from a general English corpus achieved better performance than embeddings from de-identified clinical text and biomedical literature. The performance of deep learning models trained only on the i2b2 corpus dropped significantly (strict and relaxed F1 scores dropped from 0.9547 and 0.9646 to 0.8568 and 0.8958) when applied to another corpus annotated at UF Health. Linguistic features could further improve de-identification performance in cross-institute settings. After customizing the models using UF notes and resources, the best model achieved strict and relaxed F1 scores of 0.9288 and 0.9584, respectively.

Conclusions. It is necessary to customize de-identification models using local clinical text and other resources when they are applied in cross-institute settings. Fine-tuning is a potential solution for re-using pre-trained parameters and reducing the training time needed to customize deep learning-based de-identification models trained on a clinical corpus from a different institution.
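
The results above distinguish strict from relaxed F1. The sketch below illustrates how two such scores can diverge on the same predictions. It is an assumption-laden illustration, not the study's evaluation: spans are hypothetical (start, end, type) tuples with end-exclusive character offsets, "strict" means an exact match of offsets and type, and "relaxed" is taken here to mean any overlap with a gold span of the same type. The official 2014 i2b2/UTHealth evaluation script may define relaxed matching differently.

```python
def _overlap(a, b):
    """True if two (start, end, type) spans share a type and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold, pred, relaxed=False):
    """F1 over PHI spans given as (start, end, type) with end-exclusive offsets.

    strict:  a span counts only if start, end and type all match exactly.
    relaxed: (assumption) a span counts if it overlaps one of the same type.
    """
    gold, pred = set(gold), set(pred)
    if relaxed:
        tp_pred = sum(any(_overlap(p, g) for g in gold) for p in pred)
        tp_gold = sum(any(_overlap(g, p) for p in pred) for g in gold)
    else:
        tp_pred = tp_gold = len(gold & pred)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 10, "NAME"), (20, 30, "DATE")]
pred = [(0, 10, "NAME"), (21, 30, "DATE")]  # second span misses one character
print(span_f1(gold, pred))                # 0.5
print(span_f1(gold, pred, relaxed=True))  # 1.0
```

A boundary error of a single character halves the strict score here while leaving the relaxed score untouched, which is why the two metrics are usually reported side by side.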
