Word Embedding-Based Topic Similarity Measures

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This frequency-based characteristic has become a bottleneck for performance improvement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm and achieve better results. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embeddings and proposed the Word Embedding and Syntactic Information Algorithm (WESIA), which improves the accuracy of the TextRank algorithm. Results: Applying our method to a self-made test set and a public test set showed that the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
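The abstract does not spell out how WESIA combines the syntactic tree with word embeddings. As a hedged illustration of the embedding half only, the sketch below runs weighted TextRank (a PageRank-style iteration over a word co-occurrence graph) with edge weights boosted by the cosine similarity of word vectors. The weighting rule `1 + max(cosine, 0)`, the function names, and the toy vectors are assumptions for illustration, not the authors' method, and the syntactic component is not modeled.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(tokens, embeddings, window=2):
    """Undirected co-occurrence graph over candidate words.

    Words co-occur if they fall within a window of `window` tokens (2 = adjacent).
    Each co-occurrence adds 1 + max(cosine, 0) to the edge weight; this
    reweighting rule is an illustrative assumption, not WESIA's actual rule.
    """
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if v == w:
                continue  # no self-loops
            sim = 0.0
            if w in embeddings and v in embeddings:
                sim = max(cosine(embeddings[w], embeddings[v]), 0.0)
            weights[tuple(sorted((w, v)))] += 1.0 + sim
    graph = defaultdict(dict)
    for (a, b), weight in weights.items():
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


def textrank(graph, d=0.85, max_iter=100, tol=1e-8):
    """Weighted TextRank: WS(i) = (1 - d) + d * sum_j w_ji / sum_k w_jk * WS(j)."""
    nodes = list(graph)
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    out_sum = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(max_iter):
        new = {
            n: (1 - d) + d * sum(graph[m][n] / out_sum[m] * score[m] for m in graph[n])
            for n in nodes
        }
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


if __name__ == "__main__":
    # Hypothetical candidate words (stop words already removed) and toy 3-d vectors.
    tokens = ["keyword", "extraction", "graph", "ranking",
              "keyword", "graph", "embedding", "semantic"]
    embeddings = {
        "keyword": [0.9, 0.1, 0.0],
        "extraction": [0.8, 0.2, 0.1],
        "graph": [0.1, 0.9, 0.2],
        "ranking": [0.2, 0.8, 0.1],
        "embedding": [0.1, 0.2, 0.9],
        "semantic": [0.2, 0.1, 0.8],
    }
    scores = textrank(build_graph(tokens, embeddings))
    for word, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{word:12s} {s:.3f}")
```

A full extractor would also filter candidates by part of speech and merge adjacent high-scoring words into multi-word key phrases; those steps are omitted here to keep the ranking step visible.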

MATHURA (MBI) - A NOVEL IMPUTATION MEASURE FOR IMPUTATION OF MISSING VALUES IN MEDICAL DATASETS

Recent Advances in Computer Science and Communications, Vol 13 (2019)
DOI: 10.2174/2666255813666191216123352

Author(s): B. Mathura Bai, N. Mangathayaru, B. Padmaja Rani, Shadi Aljawarneh

Keyword(s): Similarity Measure, Medical Records, Missing Values, Similarity Measures, Common Problems, Experiment Analysis

Missing attribute values are one of the most common problems faced when mining medical datasets. Estimating missing values is a major challenge in pre-processing datasets. An incorrect estimate of missing attribute values can lead to inefficient and improper classification, resulting in lower classifier accuracies. Similarity measures play a key role in the imputation process: an appropriate similarity measure can yield better imputation and improved classification accuracies. This paper proposes a novel imputation measure for finding the similarity between instances with and without missing values in medical datasets. Experiments were carried out by applying both the proposed imputation technique and popular existing benchmark imputation techniques. Classification was carried out using the KNN, J48, SMO and RBFN classifiers. Experimental analysis showed that, after the medical records were imputed with the proposed technique, the classification accuracies reported by the KNN, J48 and SMO classifiers improved compared with those obtained using the other existing benchmark imputation techniques.
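The abstract does not define the proposed MBI measure, so the sketch below substitutes a generic similarity measure to show where such a measure plugs into similarity-based imputation: each missing value is filled from the k most similar records that observe that attribute. The `distance` function (mean absolute difference over shared attributes), `knn_impute`, and the sample records are hypothetical; trying a different measure only requires replacing `distance`.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing attribute value


def distance(a: Row, b: Row) -> float:
    """Mean absolute difference over attributes observed in both rows.

    This is a generic stand-in, NOT the MBI measure proposed in the paper.
    Returns math.inf when the rows share no observed attribute.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing value with the mean of that attribute over the k most
    similar rows that observe it. Distances use the original (unimputed) data,
    so the result does not depend on the order in which rows are processed."""
    imputed = [list(row) for row in data]
    for i, row in enumerate(data):
        for col, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observe this attribute.
            donors = sorted(
                (distance(row, other), other[col])
                for j, other in enumerate(data)
                if j != i and other[col] is not None
            )
            nearest = [v for dist, v in donors[:k] if dist != math.inf]
            if nearest:  # otherwise leave the value missing
                imputed[i][col] = sum(nearest) / len(nearest)
    return imputed


if __name__ == "__main__":
    # Hypothetical records: three numeric attributes, None = missing.
    records = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],
        [5.0, 6.0, 7.0],
        [1.2, 1.9, 3.4],
    ]
    for row in knn_impute(records, k=2):
        print(row)
```

The sketch handles numeric attributes only; with real medical records, attributes would typically be normalized (for example, min-max scaled) first so that no single attribute dominates the distance, and categorical attributes would need their own matching rule.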

Enhanced word embedding similarity measures using fuzzy rules for query expansion

Assumed similarity measures as predictors of team effectiveness in surveying. (Tech. Rep. No. 6.).

DISTANCE AND SIMILARITY MEASURES FOR INTUITIONISTIC MULTIPLICATIVE PREFERENCE RELATION AND ITS APPLICATIONS

Text Genre Detection Using Doc2Vec Word-embedding Language Model

A Simple Word Embedding Model for Lexical Substitution

Word Embedding Based Knowledge Representation with Extracting Relationship Between Scientific Terminologies

Faculty Opinions recommendation of Exploiting disjointness axioms to improve semantic similarity measures.

Non-Metric Similarity Measures

Key phrase Extraction by Improving TextRank with an Integration of Word Embedding and Syntactic Information
