IDENTIFYING DOMAIN-SPECIFIC SENSES AND ITS APPLICATION TO TEXT CLASSIFICATION

2020 ◽  
Author(s):  
Xin Wu ◽  
Yi Cai ◽  
Yang Kai ◽  
Tao Wang ◽  
Qing Li

Author(s):  
Pratiksha Bongale

Today’s world is mostly data-driven. To deal with the humongous amount of data, Machine Learning and Data Mining strategies are put to use. Traditional ML approaches presume that the model is tested on a dataset drawn from the same domain as the training data. Nevertheless, some real-world situations require machines to produce good results with very little domain-specific training data. This creates room for models that can predict accurately after being trained on easily found data. Transfer Learning is the key: the art of applying knowledge gained while learning one task to another task that resembles it in some way. This article focuses on building a model capable of separating text data into two classes, one covering spam texts and the other non-spam texts, using BERT’s pre-trained model (bert-base-uncased). This pre-trained model was trained on Wikipedia and Book Corpus data, and the goal of this paper is to highlight the model’s ability to transfer the knowledge learned during pre-training (Wikipedia and Book Corpus) to separating spam texts from the rest.
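As a rough illustration of this transfer-learning setup, the sketch below fine-tunes bert-base-uncased for binary spam classification with the Hugging Face transformers library. The example texts, labels, and hyperparameters are placeholder assumptions, not the paper's dataset or training configuration.

    # Minimal sketch: fine-tuning bert-base-uncased for spam / not-spam.
    # Texts and labels here are illustrative stand-ins, not the paper's data.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    texts = ["WIN a FREE prize now!!!", "Are we still meeting at 3pm?"]
    labels = torch.tensor([1, 0])  # 1 = spam, 0 = not spam

    # Tokenize into BERT's input format (token ids + attention masks).
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")

    # One optimisation step; in practice this loop runs over a labelled corpus.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()

    # Inference: argmax over the two logits gives the class prediction.
    model.eval()
    with torch.no_grad():
        preds = model(**batch).logits.argmax(dim=-1)
    print(preds)  # predictions become reliable only after full training

Only the thin classification head on top of BERT starts untrained here; the encoder weights carry the knowledge acquired from Wikipedia and Book Corpus, which is what makes fine-tuning on a small spam corpus feasible.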


2016 ◽  
Vol 78 ◽  
pp. 70-79 ◽  
Author(s):  
Sebastian Schmidt ◽  
Steffen Schnitzer ◽  
Christoph Rensing

Author(s):  
Aleksandra Edwards ◽  
Jose Camacho-Collados ◽  
Hélène De Ribaupierre ◽  
Alun Preece

2020 ◽  
Vol 10 (17) ◽  
pp. 6052
Author(s):  
Attaporn Wangpoonsarp ◽  
Kazuya Shimura ◽  
Fumiyo Fukumoto

This paper focuses on the domain-specific senses of words and proposes a method for detecting the predominant sense in each domain. Our Domain-Specific Senses (DSS) model works in an unsupervised manner and detects predominant senses in each domain. We apply a simple Markov Random Walk (MRW) model to rank senses for each domain: it decides the importance of a sense within a graph by using the similarity between senses. The similarity between senses is obtained from distributional representations of words in the gloss texts of the thesaurus. This captures a large semantic context and thus does not require manual annotation of sense-tagged data. We used the Reuters corpus and WordNet in the experiments. We applied the resulting domain-specific senses to text classification and examined how DSS affects the overall performance of the text classification task. We compared our DSS model with a word sense disambiguation (WSD) technique, Context2vec, and the results demonstrate that our domain-specific sense approach gains a 0.053 F1 improvement on average over the WSD approach.
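The MRW ranking step can be sketched as a PageRank-style power iteration over a sense-similarity graph. In the sketch below, the sense ids and gloss embeddings are stand-in assumptions (random vectors rather than representations derived from actual WordNet glosses); only the graph construction and the walk mirror the description above.

    # Minimal sketch of MRW sense ranking over a gloss-similarity graph.
    # Gloss embeddings are random stand-ins, not real WordNet gloss vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    senses = ["bank.n.01", "bank.n.02", "bank.n.03"]  # illustrative sense ids
    gloss_vecs = rng.normal(size=(len(senses), 50))   # stand-in gloss embeddings

    # Cosine-similarity adjacency matrix; self-loops removed, negatives clipped.
    unit = gloss_vecs / np.linalg.norm(gloss_vecs, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    sim = np.clip(sim, 0.0, None)

    # Row-normalise into a transition matrix and run the random walk
    # (power iteration with damping, PageRank-style).
    P = sim / sim.sum(axis=1, keepdims=True)
    d = 0.85
    score = np.full(len(senses), 1.0 / len(senses))
    for _ in range(100):
        score = (1 - d) / len(senses) + d * (P.T @ score)

    # Senses ranked by importance; the top one is the predominant sense.
    for s, sc in sorted(zip(senses, score), key=lambda x: -x[1]):
        print(f"{s}: {sc:.3f}")

In the actual method the graph would be built per domain, so the stationary distribution of the walk differs across domains and yields a domain-specific predominant sense for each word.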


2021 ◽  
Author(s):  
Huihui Xu ◽  
Jaromir Savelka ◽  
Kevin D. Ashley

In this paper, we treat sentence annotation as a classification task. We employ sequence-to-sequence models to take sentence position information into account when identifying case law sentences as issues, conclusions, or reasons. We also compare legal domain-specific sentence embeddings with general-purpose sentence embeddings to gauge the effect of legal domain knowledge, captured during pre-training, on text classification. We deployed the models on both summaries and full-text decisions. We found that sentence position information is especially useful for full-text sentence classification. We also verified that legal domain-specific sentence embeddings perform better, and that meta-sentence embeddings can further enhance performance when sentence position information is included.
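The sketch below is not the paper's sequence-to-sequence architecture; it only illustrates one simple way to inject sentence position, by appending a normalised position feature to each sentence embedding before a linear classifier over the three labels. The embedding dimension, inputs, and classifier are assumptions for illustration.

    # Minimal sketch: sentence position as an extra feature for classification.
    # EMB_DIM and the random inputs are assumptions, not the paper's setup.
    import torch
    import torch.nn as nn

    EMB_DIM, NUM_LABELS = 384, 3  # labels: issue / conclusion / reason

    classifier = nn.Linear(EMB_DIM + 1, NUM_LABELS)

    def classify_document(sent_embs: torch.Tensor) -> torch.Tensor:
        """sent_embs: (num_sentences, EMB_DIM) embeddings, in document order."""
        n = sent_embs.size(0)
        # Relative position in [0, 1]; e.g. conclusions tend to appear
        # late in a full-text decision, so position is informative.
        pos = torch.arange(n, dtype=torch.float32).unsqueeze(1) / max(n - 1, 1)
        feats = torch.cat([sent_embs, pos], dim=1)
        return classifier(feats).argmax(dim=-1)  # one label index per sentence

    doc = torch.randn(10, EMB_DIM)  # stand-in for real legal sentence embeddings
    print(classify_document(doc))

A sequence-to-sequence model, as used in the paper, captures position implicitly by reading the sentences in order; the explicit feature above is merely the simplest way to show why position helps on full-text decisions.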

