Cross Domain Aspect Extraction Using Various Embedding Techniques and Language Models

Author(s):  
Bonson Sebastian Mampilli ◽  
Deepa Anand

2021 ◽
Vol 15 ◽  
Author(s):  
Jianwei Zhang ◽  
Xubin Zhang ◽  
Lei Lv ◽  
Yining Di ◽  
Wei Chen

Background: Learning discriminative representations from large-scale datasets has driven breakthroughs in recent decades. However, generating representative embeddings from limited examples, for instance a class containing only one image, remains a thorny problem. Recently, deep learning-based Few-Shot Learning (FSL) has been proposed, which tackles this problem by leveraging prior knowledge in various ways. Objective: In this work, we review recent advances in FSL from the perspective of high-dimensional representation learning. The results of the analysis can provide insights and directions for future work. Methods: We first present the definition of general FSL. We then propose a general framework for the FSL problem and give a taxonomy under that framework. We survey two FSL directions: learning policy and meta-learning. Results: We review advanced applications of FSL, including image classification, object detection, image segmentation, and other tasks, as well as the corresponding benchmarks, to provide an overview of recent progress. Conclusion: FSL needs further study in medical imaging, language models, and reinforcement learning. In addition, cross-domain FSL, successive FSL, and associated FSL are more challenging and valuable research directions.
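The meta-learning direction the review surveys includes metric-based methods; as a concrete illustration (a minimal sketch of one common approach in that family, not any single surveyed paper's method), a prototypical-network-style classifier represents each class by the mean embedding of its few support examples and assigns queries to the nearest prototype. Embedding dimensions and data below are made up.

```python
# Minimal prototypical-network-style few-shot classifier (illustrative only).
import numpy as np

def prototypes(support_embeddings, support_labels, n_classes):
    """Mean embedding per class from the few labeled support examples."""
    return np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])

def classify(query_embeddings, protos):
    """Assign each query to the class with the nearest prototype."""
    # dists[i, c] = squared Euclidean distance from query i to prototype c
    dists = ((query_embeddings[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# A toy 3-way 2-shot episode with 8-dimensional embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(6, 8))            # 3 classes x 2 shots
labels = np.array([0, 0, 1, 1, 2, 2])
queries = support[[0, 2, 4]] + 0.01 * rng.normal(size=(3, 8))
print(classify(queries, prototypes(support, labels, n_classes=3)))  # [0 1 2]
```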


Author(s):  
Lujun Zhao ◽  
Qi Zhang ◽  
Peng Wang ◽  
Xiaoyu Liu

Most existing Chinese word segmentation (CWS) methods are supervised and therefore require large-scale annotated domain-specific datasets for training. In this paper, we address the problem of CWS for resource-poor domains that lack annotated data. We propose a novel neural network model that incorporates both unlabeled and partially labeled data. To make use of unlabeled data, we combine a bidirectional LSTM segmentation model with two character-level language models using a gate mechanism; these language models capture character co-occurrence information. To make use of partially labeled data, we modify the original cross-entropy loss function of the RNN. Experimental results demonstrate that the method performs well on CWS tasks across a series of domains.
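The abstract leaves the gate mechanism unspecified; the following is a minimal sketch assuming a standard sigmoid gate over concatenated features (shapes and names are illustrative, not the paper's code). The idea is that, at each character position, the gate decides how much language-model evidence, learned from unlabeled text, to mix into the segmenter's hidden state.

```python
# Sketch of gated fusion of BiLSTM segmenter states with char-LM features.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_seg, h_lm, W, b):
    """Per-position gate mixing segmenter and language-model features.

    h_seg: (T, d) BiLSTM hidden states for a T-character sentence
    h_lm:  (T, d) concatenated forward/backward char-LM features
    """
    g = sigmoid(np.concatenate([h_seg, h_lm], axis=-1) @ W + b)  # (T, d)
    return g * h_seg + (1.0 - g) * h_lm

T, d = 5, 4
rng = np.random.default_rng(1)
fused = gated_fusion(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                     W=rng.normal(size=(2 * d, d)), b=np.zeros(d))
print(fused.shape)  # (5, 4)
```

For the partially labeled case, one common modification of the cross-entropy loss (the paper does not detail its exact variant here) is to sum the model's probability over all labels permitted at a position rather than over a single gold label.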


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 163219-163230 ◽  
Author(s):  
Batsergelen Myagmar ◽  
Jie Li ◽  
Shigetomo Kimura

2020 ◽  
Author(s):  
Patricia Balthazar ◽  
Scott Jeffery Lee ◽  
Daniel Rubin ◽  
Terry Dessar ◽  
Judy Gichoya ◽  
...  

Transfer learning is common practice in image classification with deep learning, where the available data are often too limited to train a complex model with millions of parameters from scratch. However, transferring language models requires special attention, since cross-domain vocabularies (e.g., between news articles and radiology reports) do not always overlap, whereas pixel intensity ranges largely overlap across images. We present a concept of similar-domain adaptation in which we transfer an inter-institutional language model between two different modalities (ultrasound to MRI) to capture liver abnormalities. Our experiments show that such a transfer is more effective for performing a shared target task than generic language-space transfer. We use MRI screening exam reports for hepatocellular carcinoma as the use case and apply the language-space transfer strategy to automatically label thousands of imaging exams.
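A hedged sketch of the general recipe (my own simplification, not the authors' exact pipeline): start from a language model already adapted to the source modality's reports and continue fine-tuning it on the target modality for the shared labeling task. The checkpoint name and labels below are placeholders; in the paper's setting the starting point would be an LM fine-tuned on ultrasound reports.

```python
# Continued fine-tuning of a report-domain LM on a new modality (sketch).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

SOURCE_CKPT = "bert-base-uncased"  # stand-in for an ultrasound-report LM
tokenizer = AutoTokenizer.from_pretrained(SOURCE_CKPT)
model = AutoModelForSequenceClassification.from_pretrained(
    SOURCE_CKPT, num_labels=2)  # e.g. HCC-suspicious vs. not (assumed labels)

mri_reports = ["LIVER: No suspicious enhancing lesion identified."]
labels = torch.tensor([0])

batch = tokenizer(mri_reports, padding=True, truncation=True,
                  return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**batch, labels=labels)  # cross-entropy loss computed internally
out.loss.backward()
optimizer.step()
```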


2021 ◽  
Vol 9 ◽  
pp. 621-640
Author(s):  
Aili Shen ◽  
Meladel Mistica ◽  
Bahar Salehi ◽  
Hang Li ◽  
Timothy Baldwin ◽  
...  

While pretrained language models (LMs) have driven impressive gains on morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards a better understanding of their discourse-modeling capabilities, we propose a sentence intrusion detection task. We examine the performance of a broad range of pretrained LMs on this detection task for English. Lacking a dataset for the task, we introduce INSteD, a novel intruder sentence detection dataset containing 170,000+ documents constructed from English Wikipedia and CNN news articles. Our experiments show that pretrained LMs perform impressively in in-domain evaluation but suffer a substantial drop in the cross-domain setting, indicating limited generalisation capacity. Further results on a novel linguistic probe dataset show that there is substantial room for improvement, especially in the cross-domain setting.
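To make the task concrete, here is an illustrative baseline (my own, not one of the paper's evaluated models): flag as the intruder the sentence whose next-sentence-prediction compatibility with its neighbours, under an off-the-shelf BERT, is lowest.

```python
# Naive intruder-sentence baseline using BERT's next-sentence head (sketch).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tok = BertTokenizer.from_pretrained("bert-base-uncased")
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
nsp.eval()

def nsp_prob(a, b):
    """P(b follows a) under BERT's next-sentence-prediction head."""
    with torch.no_grad():
        logits = nsp(**tok(a, b, return_tensors="pt")).logits
    return torch.softmax(logits, dim=-1)[0, 0].item()  # index 0 = "is next"

def intruder_index(sentences):
    """Index of the sentence least compatible with its neighbours."""
    scores = []
    for i, s in enumerate(sentences):
        nbrs = []
        if i > 0:
            nbrs.append(nsp_prob(sentences[i - 1], s))
        if i < len(sentences) - 1:
            nbrs.append(nsp_prob(s, sentences[i + 1]))
        scores.append(sum(nbrs) / len(nbrs))
    return min(range(len(scores)), key=scores.__getitem__)
```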


2021 ◽  
pp. 016555152110123
Author(s):  
Yueting Lei ◽  
Yanting Li

Sentiment classification aims to learn sentiment features from an annotated corpus and automatically predict the sentiment polarity of new text. However, people express feelings differently in different domains, so there are important differences in sentiment distributions across domains. At the same time, in certain domains the high cost of corpus collection means no annotated corpus is available for sentiment classification, making it necessary to leverage or reuse existing annotated corpora for training. In this article, we propose a new algorithm for extracting central sentiment sentences from product reviews and improve the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) to achieve domain transfer for cross-domain sentiment classification. We use various pre-trained language models to prove the effectiveness of the newly proposed joint algorithm for text ranking and emotional-word extraction, and utilise the Amazon product reviews dataset to demonstrate the effectiveness of our proposed domain-transfer framework. Experimental results on 12 different cross-domain pairs show that the new cross-domain classification method significantly outperforms several popular cross-domain sentiment classification methods.
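A hedged sketch in the spirit of the joint text-ranking and emotional-word idea (the details below are my own simplification, not the paper's algorithm): rank review sentences TextRank-style over a word-overlap graph, biasing the ranking toward sentences that contain sentiment words, then keep the top-ranked "central sentiment sentences". The sentiment lexicon here is a toy stand-in.

```python
# TextRank-style selection of sentiment-central sentences (sketch).
import itertools
import networkx as nx

SENTIMENT_WORDS = {"great", "terrible", "love", "hate", "poor", "excellent"}

def overlap(a, b):
    """Word-overlap similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (1 + min(len(wa), len(wb)))

def central_sentiment_sentences(sentences, k=2):
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        w = overlap(sentences[i], sentences[j])
        if w > 0:
            g.add_edge(i, j, weight=w)
    # Personalised PageRank biased toward sentiment-bearing sentences.
    bias = {i: 1 + sum(w in SENTIMENT_WORDS for w in s.lower().split())
            for i, s in enumerate(sentences)}
    ranks = nx.pagerank(g, personalization=bias, weight="weight")
    return sorted(ranks, key=ranks.get, reverse=True)[:k]

reviews = ["I love this phone.", "Shipping took a week.",
           "The battery life is great.", "Box was blue."]
print(central_sentiment_sentences(reviews))  # indices of top sentences
```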


2018 ◽  
Vol 114 ◽  
pp. 70-80 ◽  
Author(s):  
Ricardo Marcondes Marcacini ◽  
Rafael Geraldeli Rossi ◽  
Ivone Penque Matsuno ◽  
Solange Oliveira Rezende
