From frequency counts to contextualized word embeddings

2021 ◽  
pp. 366-385
Author(s):  
Gregor Wiedemann ◽  
Cornelia Fedtke

1972 ◽  
Vol 11 (03) ◽  
pp. 152-162 ◽  
Author(s):  
P. GAYNON ◽  
R. L. WONG

With the objective of providing easier access to pathology specimens, slides and Kodachromes, with linkage to X-rays and the remainder of the patient's medical record, an automated natural-language parsing routine based on dictionary look-up was written for Surgical Pathology document-pairs, each consisting of a Request for Examination (authored by clinicians) and its corresponding report (authored by pathologists). These documents were input to the system in free-text English without manual editing or coding.

Two types of indices were prepared. The first was an »inverted« file, available for on-line retrieval, for display of the content of the document-pairs, frequency counts of cases, or listings of cases in table format. Retrievable items include the patient's and specimen's identification data, the date of operation, the names of the clinician and pathologist, etc. The English content of the operative procedure, clinical findings and pathologic diagnoses can be retrieved through logical combinations of key words.

The second type of index was a catalog. Three catalog files (»operation«, »clinical« and »pathology«) were prepared by alphabetizing lines formed by rotating phrases so that each line is headed by a keyword. These keywords were automatically selected and standardized by the parsing routine, and the phrases were extracted from each sentence of each input document. Over 2,500 document-pairs have been entered and are currently being used for purposes of medical education.
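The two index types described above can be sketched in a few lines of modern code. This is a minimal illustration, not the original 1972 system: an inverted file mapping keywords to document identifiers, and a permuted (KWIC-style) catalog built by rotating each phrase so every keyword in turn heads an alphabetized line. The sample diagnoses are hypothetical.

```python
# Minimal sketch (not the original system) of the two index types described:
# an inverted file and a catalog of alphabetized phrase rotations.

from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def rotate_phrase(words):
    """Yield every rotation of a phrase, each headed by a different keyword."""
    for i in range(len(words)):
        yield " ".join(words[i:] + words[:i])


def build_catalog(phrases):
    """Alphabetized catalog of all phrase rotations, as in a permuted index."""
    lines = []
    for phrase in phrases:
        lines.extend(rotate_phrase(phrase.lower().split()))
    return sorted(lines)


# Hypothetical document-pairs keyed by accession number.
docs = {1: "chronic cholecystitis with cholelithiasis",
        2: "acute appendicitis"}
index = build_inverted_index(docs)
catalog = build_catalog(docs.values())
```

Looking up `index["cholelithiasis"]` returns the set of matching document IDs, and the catalog lets a reader scan for any keyword in its alphabetical position, with the rest of the phrase following as context.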


2017 ◽  
Author(s):  
Su-Youn Yoon ◽  
Chong Min Lee ◽  
Ikkyu Choi ◽  
Xinhao Wang ◽  
Matthew Mulholland ◽  
...  

2019 ◽  
Vol 3 (2) ◽  
pp. 159-183 ◽  
Author(s):  
Vijaya Kumari Yeruva ◽  
Sidrah Junaid ◽  
Yugyung Lee

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

This paper presents a comprehensive evaluation of data pre-processing and word-embedding techniques in the context of Arabic document classification, in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the traditional machine-learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep-learning architectures BLSTM and CNN for the same text-classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep-learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied.

To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four of the 26 pre-processing techniques improve classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to a BLSTM led to the most accurate model, with an F1 score of 75.2% and an accuracy of 90.7%, compared to an F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with a lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to that of the deep-learning methods on the first data set, but significantly worse on the second.
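The abstract does not enumerate its 26 pre-processing techniques, but a few normalization steps commonly applied to Arabic text can be sketched to show what such a pipeline looks like. The specific steps below (unifying alef variants, stripping diacritics, removing the tatweel elongation character) are illustrative assumptions, not the paper's actual variants.

```python
# Illustrative sketch of common Arabic text-normalization steps of the kind
# evaluated in studies like this one; the exact 26 variants are not reproduced.

import re

# Arabic diacritics (short vowels, shadda, sukun, etc.) and the dagger alef.
DIACRITICS = re.compile(r"[\u0610-\u061a\u064b-\u065f\u0670]")
TATWEEL = "\u0640"  # elongation character used for justification


def normalize_arabic(text):
    """Apply three typical normalization steps to an Arabic string."""
    # Unify alef-with-madda/hamza variants to the bare alef.
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Strip diacritical marks.
    text = DIACRITICS.sub("", text)
    # Remove the tatweel elongation character.
    return text.replace(TATWEEL, "")


sample = "الصِّحَّةُ"  # "health", fully vocalized
print(normalize_arabic(sample))  # the same word with diacritics stripped
```

Each such step collapses surface variants onto a single form, which shrinks the vocabulary a classifier or embedding lookup must handle; the paper's contribution is measuring which of these transformations actually help downstream accuracy.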

