Tag recommendation model using feature learning via word embedding

Author(s):  
Maryam Khanian Najafabadi ◽  
Madhavan Balan Nair ◽  
Azlinah Mohamed
10.2196/12310 ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. e12310 ◽  
Author(s):  
Emeric Dynomant ◽  
Romain Lelong ◽  
Badisse Dahamna ◽  
Clément Massonnaud ◽  
Gaétan Kerdelhué ◽  
...  

Background Word embedding technologies, a set of language modeling and feature learning techniques in natural language processing (NLP), are now used in a wide range of applications. However, no formal evaluation and comparison have been made of the ability of each of the 3 best-known unsupervised implementations (Word2Vec, GloVe, and FastText) to preserve the semantic similarities between words when trained on the same dataset. Objective The aim of this study was to compare embedding methods trained on a corpus of French health-related documents produced in a professional context. The best method will then help us develop a new semantic annotator. Methods Unsupervised embedding models were trained on 641,279 documents originating from the Rouen University Hospital. These data are not structured and cover a wide range of documents produced in a clinical setting (discharge summaries, procedure reports, and prescriptions). In total, 4 rated evaluation tasks were defined (cosine similarity, odd one out, analogy-based operations, and formal human evaluation) and applied to each model, along with embedding visualization. Results Word2Vec had the highest score on 3 of the 4 rated tasks (analogy-based operations, odd one out, and human validation), particularly with the skip-gram architecture. Conclusions Although this implementation best preserved semantic properties, each model has its own qualities and defects, such as training time, which is very short for GloVe, or the conservation of morphological similarity observed with FastText. The models and test sets produced by this study will be the first to be made publicly available through a graphical interface to help advance French biomedical research.
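Two of the rated tasks above, cosine similarity and odd one out, can be sketched with a few lines of plain Python. The toy 3-dimensional vectors and the French clinical terms below are hypothetical illustrations, not values from the study's models:

```python
import math

# Hypothetical toy embeddings for illustration only; the study trained
# full models on 641,279 French clinical documents.
vectors = {
    "fièvre":      [0.9, 0.1, 0.0],
    "température": [0.8, 0.2, 0.1],
    "scanner":     [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def odd_one_out(words):
    """Return the word whose summed similarity to the others is lowest."""
    return min(words, key=lambda w: sum(
        cosine(vectors[w], vectors[o]) for o in words if o != w))

# Semantically close terms should score higher than unrelated ones,
# and the unrelated term should be flagged as the odd one out.
assert cosine(vectors["fièvre"], vectors["température"]) > \
       cosine(vectors["fièvre"], vectors["scanner"])
assert odd_one_out(["fièvre", "température", "scanner"]) == "scanner"
```

The analogy-based task works the same way: compute `v(a) - v(b) + v(c)` and check that the nearest vector by cosine similarity is the expected answer.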


2015 ◽  
Author(s):  
Oren Melamud ◽  
Omer Levy ◽  
Ido Dagan

2018 ◽  
Author(s):  
Charles Kalish ◽  
Nigel Noll

Existing research suggests that adults and older children experience a tradeoff in which instruction and feedback help them solve a problem efficiently but lead them to ignore currently irrelevant information that might be useful in the future. It is unclear whether young children experience the same tradeoff. Eighty-seven children (ages five to eight years) and 42 adults participated in supervised feature prediction tasks, either with or without an instructional hint. Follow-up tasks assessed learning of feature correlations and feature frequencies. Younger children tended to learn the frequencies of both relevant and irrelevant features without instruction, but not the diagnostic feature correlation needed for the prediction task. With instruction, younger children did learn the diagnostic feature correlation, but then failed to learn the frequencies of irrelevant features. Instruction helped older children learn the correlation without limiting attention to frequencies. Adults learned the diagnostic correlation even without instruction, but with instruction no longer learned about irrelevant frequencies. These results indicate that young children do show some of the costs of learning with instruction characteristic of older children and adults. However, they also receive some of the benefits. The current study illustrates just what those tradeoffs might be and how they might change over development.


Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: As a well-known key phrase extraction algorithm, TextRank is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics in the form of co-occurrence analysis. Objective: This frequency-based character has become a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most improvements incorporate semantic information into the key phrase extraction algorithm and achieve gains. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improves the accuracy of the TextRank algorithm. Results: Applying our method to a self-made test set and a public test set, the results imply that the proposed unsupervised key phrase extraction algorithm outperforms the other algorithms to some extent.
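The frequency-based baseline that WESIA improves on can be sketched in a few lines: build a co-occurrence graph over a sliding window of tokens, then iterate the PageRank update until the scores stabilize. This is a minimal illustration of plain TextRank only; the syntactic-tree and embedding components of WESIA are not reproduced here, and the window size and damping factor are conventional defaults, not values from the paper:

```python
from collections import defaultdict

def textrank(words, window=2, d=0.85, iters=50):
    """Rank tokens by PageRank over an undirected co-occurrence graph.

    Minimal statistics-only TextRank baseline: edges connect tokens that
    co-occur within `window` positions; no syntax or embeddings are used.
    """
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != w:
                graph[w].add(other)
                graph[other].add(w)
    # PageRank update: S(w) = (1 - d) + d * sum over neighbors v of S(v)/deg(v)
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        scores = {w: (1 - d) + d * sum(scores[v] / len(graph[v])
                                       for v in graph[w])
                  for w in graph}
    return sorted(scores, key=scores.get, reverse=True)

# The most connected token ends up ranked first.
tokens = "deep learning improves deep parsing".split()
assert textrank(tokens)[0] == "deep"
```

In practice the input would first be filtered to content words (nouns and adjectives), and top-ranked adjacent tokens would be merged into multi-word key phrases.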

