Cross Domain Aspect Extraction Using Various Embedding Techniques and Language Models

Author(s):  
Bonson Sebastian Mampilli ◽  
Deepa Anand

2021 ◽
Vol 15 ◽  
Author(s):  
Jianwei Zhang ◽  
Xubin Zhang ◽  
Lei Lv ◽  
Yining Di ◽  
Wei Chen

Background: Learning discriminative representations from large-scale datasets has driven breakthroughs in recent decades. However, generating representative embeddings from limited examples, for instance a class containing only one image, remains a thorny problem. Recently, deep learning-based Few-Shot Learning (FSL) has been proposed, which tackles this problem by leveraging prior knowledge in various ways. Objective: In this work, we review recent advances in FSL from the perspective of high-dimensional representation learning. The results of the analysis can provide insights and directions for future work. Methods: We first present the definition of general FSL. We then propose a general framework for the FSL problem and give a taxonomy under that framework. We survey two FSL directions: learning policy and meta-learning. Results: We review advanced applications of FSL, including image classification, object detection, image segmentation, and other tasks, as well as the corresponding benchmarks, to provide an overview of recent progress. Conclusion: FSL needs further study in medical imaging, language models, and reinforcement learning. In addition, cross-domain FSL, successive FSL, and associated FSL are more challenging and valuable research directions.
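The meta-learning direction the review surveys includes metric-based methods; as a concrete illustration (a minimal sketch of one common approach in that family, not any single surveyed paper's method), a prototypical-network-style classifier represents each class by the mean embedding of its few support examples and assigns queries to the nearest prototype. Embedding dimensions and data below are made up.

```python
# Minimal prototypical-network-style few-shot classifier (illustrative only).
import numpy as np

def prototypes(support_embeddings, support_labels, n_classes):
    """Mean embedding per class from the few labeled support examples."""
    return np.stack([
        support_embeddings[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])

def classify(query_embeddings, protos):
    """Assign each query to the class with the nearest prototype."""
    # dists[i, c] = squared Euclidean distance from query i to prototype c
    dists = ((query_embeddings[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# A toy 3-way 2-shot episode with 8-dimensional embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(6, 8))            # 3 classes x 2 shots
labels = np.array([0, 0, 1, 1, 2, 2])
queries = support[[0, 2, 4]] + 0.01 * rng.normal(size=(3, 8))
print(classify(queries, prototypes(support, labels, n_classes=3)))  # [0 1 2]
```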


Author(s):  
Lujun Zhao ◽  
Qi Zhang ◽  
Peng Wang ◽  
Xiaoyu Liu

Most existing Chinese word segmentation (CWS) methods are supervised and therefore require large-scale annotated domain-specific datasets for training. In this paper, we address the problem of CWS for resource-poor domains that lack annotated data. We propose a novel neural network model that incorporates both unlabeled and partially labeled data. To make use of unlabeled data, we combine a bidirectional LSTM segmentation model with two character-level language models using a gate mechanism; these language models capture character co-occurrence information. To make use of partially labeled data, we modify the original cross-entropy loss function of the RNN. Experimental results demonstrate that the method performs well on CWS tasks across a series of domains.
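The abstract leaves the gate mechanism unspecified; the following is a minimal sketch assuming a standard sigmoid gate over concatenated features (shapes and names are illustrative, not the paper's code). The idea is that, at each character position, the gate decides how much language-model evidence, learned from unlabeled text, to mix into the segmenter's hidden state.

```python
# Sketch of gated fusion of BiLSTM segmenter states with char-LM features.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_seg, h_lm, W, b):
    """Per-position gate mixing segmenter and language-model features.

    h_seg: (T, d) BiLSTM hidden states for a T-character sentence
    h_lm:  (T, d) concatenated forward/backward char-LM features
    """
    g = sigmoid(np.concatenate([h_seg, h_lm], axis=-1) @ W + b)  # (T, d)
    return g * h_seg + (1.0 - g) * h_lm

T, d = 5, 4
rng = np.random.default_rng(1)
fused = gated_fusion(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                     W=rng.normal(size=(2 * d, d)), b=np.zeros(d))
print(fused.shape)  # (5, 4)
```

For the partially labeled case, one common modification of the cross-entropy loss (the paper does not detail its exact variant here) is to sum the model's probability over all labels permitted at a position rather than over a single gold label.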


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 163219-163230 ◽  
Author(s):  
Batsergelen Myagmar ◽  
Jie Li ◽  
Shigetomo Kimura

2020 ◽  
Author(s):  
Patricia Balthazar ◽  
Scott Jeffery Lee ◽  
Daniel Rubin ◽  
Terry Dessar ◽  
Judy Gichoya ◽  
...  

Transfer learning is common practice in image classification with deep learning, where the available data are often too limited to train a complex model with millions of parameters from scratch. However, transferring language models requires special attention, since cross-domain vocabularies (e.g., between news articles and radiology reports) do not always overlap, whereas pixel intensity ranges largely overlap across images. We present a concept of similar-domain adaptation in which we transfer an inter-institutional language model between two different modalities (ultrasound to MRI) to capture liver abnormalities. Our experiments show that such a transfer is more effective for performing a shared target task than generic language-space transfer. We use MRI screening exam reports for hepatocellular carcinoma as the use case and apply the language-space transfer strategy to automatically label thousands of imaging exams.
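A hedged sketch of the general recipe (my own simplification, not the authors' exact pipeline): start from a language model already adapted to the source modality's reports and continue fine-tuning it on the target modality for the shared labeling task. The checkpoint name and labels below are placeholders; in the paper's setting the starting point would be an LM fine-tuned on ultrasound reports.

```python
# Continued fine-tuning of a report-domain LM on a new modality (sketch).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

SOURCE_CKPT = "bert-base-uncased"  # stand-in for an ultrasound-report LM
tokenizer = AutoTokenizer.from_pretrained(SOURCE_CKPT)
model = AutoModelForSequenceClassification.from_pretrained(
    SOURCE_CKPT, num_labels=2)  # e.g. HCC-suspicious vs. not (assumed labels)

mri_reports = ["LIVER: No suspicious enhancing lesion identified."]
labels = torch.tensor([0])

batch = tokenizer(mri_reports, padding=True, truncation=True,
                  return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**batch, labels=labels)  # cross-entropy loss computed internally
out.loss.backward()
optimizer.step()
```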


2021 ◽  
Vol 9 ◽  
pp. 621-640
Author(s):  
Aili Shen ◽  
Meladel Mistica ◽  
Bahar Salehi ◽  
Hang Li ◽  
Timothy Baldwin ◽  
...  

While pretrained language models (LMs) have driven impressive gains on morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards a better understanding of their discourse-modeling capabilities, we propose a sentence intrusion detection task. We examine the performance of a broad range of pretrained LMs on this detection task for English. Lacking a dataset for the task, we introduce INSteD, a novel intruder sentence detection dataset containing 170,000+ documents constructed from English Wikipedia and CNN news articles. Our experiments show that pretrained LMs perform impressively in in-domain evaluation but suffer a substantial drop in the cross-domain setting, indicating limited generalisation capacity. Further results on a novel linguistic probe dataset show that there is substantial room for improvement, especially in the cross-domain setting.
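To make the task concrete, here is an illustrative baseline (my own, not one of the paper's evaluated models): flag as the intruder the sentence whose next-sentence-prediction compatibility with its neighbours, under an off-the-shelf BERT, is lowest.

```python
# Naive intruder-sentence baseline using BERT's next-sentence head (sketch).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tok = BertTokenizer.from_pretrained("bert-base-uncased")
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
nsp.eval()

def nsp_prob(a, b):
    """P(b follows a) under BERT's next-sentence-prediction head."""
    with torch.no_grad():
        logits = nsp(**tok(a, b, return_tensors="pt")).logits
    return torch.softmax(logits, dim=-1)[0, 0].item()  # index 0 = "is next"

def intruder_index(sentences):
    """Index of the sentence least compatible with its neighbours."""
    scores = []
    for i, s in enumerate(sentences):
        nbrs = []
        if i > 0:
            nbrs.append(nsp_prob(sentences[i - 1], s))
        if i < len(sentences) - 1:
            nbrs.append(nsp_prob(s, sentences[i + 1]))
        scores.append(sum(nbrs) / len(nbrs))
    return min(range(len(scores)), key=scores.__getitem__)
```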


2021 ◽  
pp. 016555152110123
Author(s):  
Yueting Lei ◽  
Yanting Li

Sentiment classification aims to learn sentiment features from an annotated corpus and automatically predict the sentiment polarity of new text. However, people express feelings differently in different domains, so there are important differences in sentiment distributions across domains. At the same time, in certain domains the high cost of corpus collection means no annotated corpus is available for sentiment classification, making it necessary to leverage or reuse existing annotated corpora for training. In this article, we propose a new algorithm for extracting central sentiment sentences from product reviews and improve the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) to achieve domain transfer for cross-domain sentiment classification. We use various pre-trained language models to prove the effectiveness of the newly proposed joint algorithm for text ranking and emotional-word extraction, and utilise the Amazon product reviews dataset to demonstrate the effectiveness of our proposed domain-transfer framework. Experimental results on 12 different cross-domain pairs show that the new cross-domain classification method significantly outperforms several popular cross-domain sentiment classification methods.
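A hedged sketch in the spirit of the joint text-ranking and emotional-word idea (the details below are my own simplification, not the paper's algorithm): rank review sentences TextRank-style over a word-overlap graph, biasing the ranking toward sentences that contain sentiment words, then keep the top-ranked "central sentiment sentences". The sentiment lexicon here is a toy stand-in.

```python
# TextRank-style selection of sentiment-central sentences (sketch).
import itertools
import networkx as nx

SENTIMENT_WORDS = {"great", "terrible", "love", "hate", "poor", "excellent"}

def overlap(a, b):
    """Word-overlap similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (1 + min(len(wa), len(wb)))

def central_sentiment_sentences(sentences, k=2):
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        w = overlap(sentences[i], sentences[j])
        if w > 0:
            g.add_edge(i, j, weight=w)
    # Personalised PageRank biased toward sentiment-bearing sentences.
    bias = {i: 1 + sum(w in SENTIMENT_WORDS for w in s.lower().split())
            for i, s in enumerate(sentences)}
    ranks = nx.pagerank(g, personalization=bias, weight="weight")
    return sorted(ranks, key=ranks.get, reverse=True)[:k]

reviews = ["I love this phone.", "Shipping took a week.",
           "The battery life is great.", "Box was blue."]
print(central_sentiment_sentences(reviews))  # indices of top sentences
```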


2018 ◽  
Vol 114 ◽  
pp. 70-80 ◽  
Author(s):  
Ricardo Marcondes Marcacini ◽  
Rafael Geraldeli Rossi ◽  
Ivone Penque Matsuno ◽  
Solange Oliveira Rezende
