Lifelong Learning of Topics and Domain-Specific Word Embeddings

Author(s):  
Xiaorui Qin ◽  
Yuyin Lu ◽  
Yufu Chen ◽  
Yanghui Rao

Lifelong Domain Word Embedding via Meta-Learning

Author(s):  
Hu Xu ◽  
Bing Liu ◽  
Lei Shu ◽  
Philip S. Yu

Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications, yet domain-specific tasks often lack large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding: when training an embedding for a new domain, the system has already seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora of those past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word across many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced by this process improve the performance of downstream tasks.
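The retrieval step described in this abstract can be approximated in a few lines. The sketch below is not the authors' implementation: the paper trains a meta-learner to score context similarity, whereas here a plain cosine similarity over averaged context vectors stands in for it, and the embedding lookup `emb` (word → vector) is an assumed input.

```python
# Minimal sketch of the past-domain retrieval idea (not the authors' code).
# Assumption: each corpus is a list of tokenized sentences, and a word's
# "context feature" is the average embedding of its neighbouring words.
import numpy as np

def context_vector(sentence, idx, emb, window=5):
    """Average embedding of the words surrounding position idx."""
    lo, hi = max(0, idx - window), min(len(sentence), idx + window + 1)
    ctx = [emb[w] for i, w in enumerate(sentence[lo:hi], start=lo)
           if i != idx and w in emb]
    return np.mean(ctx, axis=0) if ctx else None

def relevant_past_sentences(new_corpus, past_corpus, emb, threshold=0.7):
    """Keep past-domain sentences whose word contexts resemble the new domain's."""
    # Profile each word's typical context in the new domain.
    profiles = {}
    for sent in new_corpus:
        for i, w in enumerate(sent):
            v = context_vector(sent, i, emb)
            if v is not None:
                profiles.setdefault(w, []).append(v)
    profiles = {w: np.mean(vs, axis=0) for w, vs in profiles.items()}

    selected = []
    for sent in past_corpus:
        sims = []
        for i, w in enumerate(sent):
            if w in profiles:
                v = context_vector(sent, i, emb)
                if v is not None:
                    p = profiles[w]
                    sims.append(v @ p / (np.linalg.norm(v) * np.linalg.norm(p) + 1e-9))
        if sims and np.mean(sims) >= threshold:
            selected.append(sent)  # candidate for expanding the new in-domain corpus
    return selected
```

Sentences that survive the similarity threshold would be appended to the new in-domain corpus before the final embedding training.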


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 137309-137321
Author(s):  
Luca Cagliero ◽  
Moreno La Quatra

2019 ◽  
Author(s):  
José Padarian ◽  
Ignacio Fuentes

Abstract. A large amount of descriptive information is available in most disciplines of geosciences. This information is usually considered subjective and ill-favoured compared with its numerical counterpart. Considering the advances in natural language processing and machine learning, it is possible to utilise descriptive information and encode it as dense vectors. These word embeddings lie in a multi-dimensional space where angles and distances have a linguistic interpretation. We used 280 764 full-text scientific articles related to geosciences to train a domain-specific language model capable of generating such embeddings. To evaluate the quality of the numerical representations, we performed three intrinsic evaluations, namely: the capacity to generate analogies, term relatedness compared with the opinion of a human subject, and categorisation of different groups of words. Since this is the first attempt to evaluate word embeddings for tasks in the geosciences domain, we created a test suite specific to geosciences. We compared our results with general domain embeddings commonly used in other disciplines. As expected, our domain-specific embeddings (GeoVec) outperformed general domain embeddings in all tasks, with an overall performance improvement of 107.9 %. The resulting embeddings and test suite will be made available for other researchers to use and expand.
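As an illustration of the analogy-style intrinsic evaluation described above, the sketch below uses gensim's Word2Vec. The toy corpus and the example analogy are placeholders, not the GeoVec training data or the released test suite.

```python
# Toy sketch of training embeddings and answering an analogy question with gensim.
from gensim.models import Word2Vec

sentences = [["granite", "is", "an", "intrusive", "igneous", "rock"],
             ["basalt", "is", "an", "extrusive", "igneous", "rock"]]  # placeholder corpus
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, workers=4)

# Analogy: "granite is to intrusive as basalt is to ?" -> ideally "extrusive"
# (a real evaluation would iterate over a full analogy test set and score accuracy)
result = model.wv.most_similar(positive=["intrusive", "basalt"],
                               negative=["granite"], topn=1)
print(result)
```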


Symmetry ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 89 ◽  
Author(s):  
Hsiang-Yuan Yeh ◽  
Yu-Ching Yeh ◽  
Da-Bai Shen

Linking textual information in finance reports to stock return volatility offers a perspective for exploring useful insights for risk management. We introduce different kinds of word vector representations in the modeling of textual information: bag-of-words, pre-trained word embeddings, and domain-specific word embeddings. We apply linear and non-linear methods to establish a text regression model for volatility prediction. A large collection of annually published financial reports from the period 1996 to 2013 is used in the experiments. We demonstrate that the domain-specific word vectors learned from the data not only capture lexical semantics, but also perform better than pre-trained word embeddings and the traditional bag-of-words model. Our approach significantly outperforms state-of-the-art methods, with smaller prediction error in the regression task and a 4%–10% improvement in the ranking task. These improvements suggest that textual information may have a measurable effect on long-term volatility forecasting. In addition, we find that variations and regulatory changes in reports make older reports less relevant for volatility prediction. Our approach opens a new avenue of research in information economics and can be applied to a wide range of finance-related applications.
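A minimal sketch of the text-regression setup follows, under the assumption that each report is encoded as the average of its word vectors and the target is log volatility. The lookup `emb` (word → vector) is an assumed input, and Ridge regression stands in for the paper's linear models.

```python
# Sketch: averaged word vectors as document features for volatility regression.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def doc_vector(tokens, emb, dim=300):
    """Average the embeddings of in-vocabulary tokens; zero vector if none."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def fit_volatility_model(train_docs, train_vol, test_docs, test_vol, emb):
    X_train = np.stack([doc_vector(d, emb) for d in train_docs])
    X_test = np.stack([doc_vector(d, emb) for d in test_docs])
    reg = Ridge(alpha=1.0).fit(X_train, np.log(train_vol))  # log-volatility target
    mse = mean_squared_error(np.log(test_vol), reg.predict(X_test))
    return reg, mse
```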


2020 ◽  
Author(s):  
Derek Koehl ◽  
Carson Davis ◽  
Rahul Ramachandran ◽  
Udaysankar Nair ◽  
Manil Maskey

Word embeddings are numeric representations of text which capture meanings and semantic relationships. Embeddings can be constructed using different methods, such as one-hot encoding, frequency-based approaches, or prediction-based approaches. Prediction-based approaches such as Word2Vec can generate word embeddings that capture the underlying semantics and word relationships in a corpus. Word2Vec embeddings generated from a domain-specific corpus have been shown to both predict relationships and augment word vectors to improve classification. We describe results from two experiments utilizing word embeddings for Earth science, constructed from a corpus of over 20,000 journal papers using Word2Vec.

The first experiment explores the analogy-prediction performance of word embeddings built from the Earth science journal corpus and trained on domain-specific vocabulary. Our results demonstrate that the accuracy of domain-specific word embeddings in predicting Earth science analogy questions exceeds the ability of general corpus embeddings to predict general analogy questions. While the results were as anticipated, the substantial increase in accuracy, particularly in the lexicographical domain, was encouraging. The results point to the need for a comprehensive Earth science analogy test set that covers the full breadth of lexicographical and encyclopedic categories for validating word embeddings.

The second experiment utilizes the word embeddings to augment metadata keyword classification. Metadata describing NASA datasets include manually assigned science keywords, which can lead to errors and inconsistencies. These science keywords come from a controlled vocabulary and are used to aid data discovery via faceted search and relevancy ranking. Given the small number of metadata records with proper descriptions and keywords, word embeddings were used for augmentation. A fully connected neural network was trained to suggest keywords given a description text. This approach provided the best accuracy, at ~76%, compared to the other methods tested.
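A hedged sketch of the second experiment's classifier: descriptions are embedded as averaged Word2Vec vectors and a small fully connected network predicts science keywords as a multi-label problem. All names and shapes here are illustrative assumptions rather than the authors' code, and the ~76% accuracy is their reported figure, not something this sketch reproduces.

```python
# Sketch: multi-label keyword suggestion from averaged description embeddings.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MultiLabelBinarizer

def embed(tokens, emb, dim=300):
    """Average embedding of in-vocabulary tokens; zero vector if none."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def train_keyword_suggester(descriptions, keyword_sets, emb):
    X = np.stack([embed(d.lower().split(), emb) for d in descriptions])
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(keyword_sets)  # one indicator column per science keyword
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300).fit(X, Y)
    return clf, mlb

def suggest(description, clf, mlb, emb, top_k=3):
    """Return the top-k keywords ranked by the network's predicted probability."""
    x = embed(description.lower().split(), emb).reshape(1, -1)
    probs = clf.predict_proba(x)[0]
    ranked = np.argsort(probs)[::-1][:top_k]
    return [mlb.classes_[i] for i in ranked]
```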


Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1941
Author(s):  
Gordana Ispirova ◽  
Tome Eftimov ◽  
Barbara Koroušić Seljak

Being both a poison and a cure for many lifestyle and non-communicable diseases, food is moving into the prime focus of precision medicine. Monitoring a few groups of nutrients is crucial for some patients, and methods for easing their calculation are emerging. Our proposed machine learning pipeline deals with nutrient prediction based on vector representations learned from short texts (recipe names). In this study, we explored how the prediction results change when, instead of using the vector representation of the recipe description, we use the embeddings of the list of ingredients. The nutrient content of a food depends on its ingredients; therefore, the text of the ingredient list contains more relevant information. We define a domain-specific heuristic for merging the embeddings of the ingredients, which incorporates the quantity of each ingredient, in order to use them as features in machine learning models for nutrient prediction. The results from the experiments indicate that the prediction results improve when using the domain-specific heuristic. The models for protein prediction were highly effective, with accuracies up to 97.98%. Implementing a domain-specific heuristic for combining multi-word embeddings yields better results than conventional merging heuristics, with up to 60% higher accuracy in some cases.
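A minimal sketch of a quantity-weighted merging heuristic of the kind described above; the exact heuristic in the paper may differ, and `emb`, the ingredient names, and the quantities are illustrative assumptions.

```python
# Sketch: merge ingredient embeddings weighted by each ingredient's share of
# the recipe's total quantity, yielding one feature vector per recipe.
import numpy as np

def merge_ingredient_embeddings(ingredients, quantities, emb, dim=300):
    """Quantity-weighted average of ingredient embeddings -> recipe feature."""
    total = sum(quantities)
    vec = np.zeros(dim)
    for ing, qty in zip(ingredients, quantities):
        if ing in emb:
            vec += (qty / total) * emb[ing]
    return vec

# Usage: the resulting vector feeds a regressor predicting, e.g., protein content.
# recipe_vec = merge_ingredient_embeddings(["flour", "milk", "egg"], [500, 250, 60], emb)
```

Weighting by quantity lets a dominant ingredient (e.g., flour in bread) dominate the recipe representation, which is the intuition behind preferring this heuristic over a plain unweighted average.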

