Tag recommendation model using feature learning via word embedding

Author(s):  
Maryam Khanian Najafabadi ◽  
Madhavan Balan Nair ◽  
Azlinah Mohamed
10.2196/12310 ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. e12310 ◽  
Author(s):  
Emeric Dynomant ◽  
Romain Lelong ◽  
Badisse Dahamna ◽  
Clément Massonnaud ◽  
Gaétan Kerdelhué ◽  
...  

Background Word embedding technologies, a set of language modeling and feature learning techniques in natural language processing (NLP), are now used in a wide range of applications. However, no formal evaluation and comparison have been made of the ability of each of the 3 best-known unsupervised implementations (Word2Vec, GloVe, and FastText) to preserve the semantic similarities between words when trained on the same dataset. Objective The aim of this study was to compare embedding methods trained on a corpus of French health-related documents produced in a professional context. The best method will then help us develop a new semantic annotator. Methods Unsupervised embedding models were trained on 641,279 documents originating from the Rouen University Hospital. These data are not structured and cover a wide range of documents produced in a clinical setting (discharge summaries, procedure reports, and prescriptions). In total, 4 rated evaluation tasks were defined (cosine similarity, odd one out, analogy-based operations, and formal human evaluation) and applied to each model, along with embedding visualization. Results Word2Vec had the highest score on 3 of the 4 rated tasks (analogy-based operations, odd one out, and human validation), particularly with the skip-gram architecture. Conclusions Although this implementation best preserved semantic properties, each model has its own qualities and defects, such as training time, which is very short for GloVe, or the conservation of morphological similarity observed with FastText. The models and test sets produced by this study will be the first to be made publicly available through a graphical interface to help advance French biomedical research.
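Two of the rated tasks above, cosine similarity and odd one out, can be sketched with a few lines of plain Python. The toy 3-dimensional vectors and the French clinical terms below are hypothetical illustrations, not values from the study's models:

```python
import math

# Hypothetical toy embeddings for illustration only; the study trained
# full models on 641,279 French clinical documents.
vectors = {
    "fièvre":      [0.9, 0.1, 0.0],
    "température": [0.8, 0.2, 0.1],
    "scanner":     [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def odd_one_out(words):
    """Return the word whose summed similarity to the others is lowest."""
    return min(words, key=lambda w: sum(
        cosine(vectors[w], vectors[o]) for o in words if o != w))

# Semantically close terms should score higher than unrelated ones,
# and the unrelated term should be flagged as the odd one out.
assert cosine(vectors["fièvre"], vectors["température"]) > \
       cosine(vectors["fièvre"], vectors["scanner"])
assert odd_one_out(["fièvre", "température", "scanner"]) == "scanner"
```

The analogy-based task works the same way: compute `v(a) - v(b) + v(c)` and check that the nearest vector by cosine similarity is the expected answer.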


2015 ◽  
Author(s):  
Oren Melamud ◽  
Omer Levy ◽  
Ido Dagan

2018 ◽  
Author(s):  
Charles Kalish ◽  
Nigel Noll

Existing research suggests that adults and older children experience a tradeoff in which instruction and feedback help them solve a problem efficiently but lead them to ignore currently irrelevant information that might be useful in the future. It is unclear whether young children experience the same tradeoff. Eighty-seven children (ages five to eight years) and 42 adults participated in supervised feature prediction tasks, either with or without an instructional hint. Follow-up tasks assessed learning of feature correlations and feature frequencies. Younger children tended to learn the frequencies of both relevant and irrelevant features without instruction, but not the diagnostic feature correlation needed for the prediction task. With instruction, younger children did learn the diagnostic feature correlation, but then failed to learn the frequencies of irrelevant features. Instruction helped older children learn the correlation without limiting attention to frequencies. Adults learned the diagnostic correlation even without instruction, but with instruction no longer learned about irrelevant frequencies. These results indicate that young children do show some of the costs of learning with instruction characteristic of older children and adults. However, they also receive some of the benefits. The current study illustrates just what those tradeoffs might be and how they might change over development.


Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: As a well-known key phrase extraction algorithm, TextRank is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics in the form of co-occurrence analysis. Objective: This frequency-based character has become a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most improvements incorporate semantic information into the key phrase extraction algorithm and achieve gains. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improves the accuracy of the TextRank algorithm. Results: Applying our method to a self-made test set and a public test set, the results imply that the proposed unsupervised key phrase extraction algorithm outperforms the other algorithms to some extent.
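The frequency-based baseline that WESIA improves on can be sketched in a few lines: build a co-occurrence graph over a sliding window of tokens, then iterate the PageRank update until the scores stabilize. This is a minimal illustration of plain TextRank only; the syntactic-tree and embedding components of WESIA are not reproduced here, and the window size and damping factor are conventional defaults, not values from the paper:

```python
from collections import defaultdict

def textrank(words, window=2, d=0.85, iters=50):
    """Rank tokens by PageRank over an undirected co-occurrence graph.

    Minimal statistics-only TextRank baseline: edges connect tokens that
    co-occur within `window` positions; no syntax or embeddings are used.
    """
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != w:
                graph[w].add(other)
                graph[other].add(w)
    # PageRank update: S(w) = (1 - d) + d * sum over neighbors v of S(v)/deg(v)
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        scores = {w: (1 - d) + d * sum(scores[v] / len(graph[v])
                                       for v in graph[w])
                  for w in graph}
    return sorted(scores, key=scores.get, reverse=True)

# The most connected token ends up ranked first.
tokens = "deep learning improves deep parsing".split()
assert textrank(tokens)[0] == "deep"
```

In practice the input would first be filtered to content words (nouns and adjectives), and top-ranked adjacent tokens would be merged into multi-word key phrases.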

