Survey of Neural Text Representation Models

Karlo Babić; Sanda Martinčić-Ipšić; Ana Meštrović

doi:10.3390/info11110511

Survey of Neural Text Representation Models

Information ◽

10.3390/info11110511 ◽

2020 ◽

Vol 11 (11) ◽

pp. 511

Author(s):

Karlo Babić ◽

Sanda Martinčić-Ipšić ◽

Ana Meštrović

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Representation ◽

Neural Models ◽

Level Model ◽

Machine Readable ◽

Input Level ◽

Representation Level

In natural language processing, text needs to be transformed into a machine-readable representation before any processing. The quality of further natural language processing tasks greatly depends on the quality of those representations. In this survey, we systematize and analyze 50 neural models from the last decade. The models described are grouped by the architecture of neural networks as shallow, recurrent, recursive, convolutional, and attention models. Furthermore, we categorize these models by representation level, input level, model type, and model supervision. We focus on task-independent representation models, discuss their advantages and drawbacks, and subsequently identify the promising directions for future neural text representation models. We describe the evaluation datasets and tasks used in the papers that introduced the models and compare the models based on relevant evaluations. The quality of a representation model can be evaluated as its capability to generalize to multiple unrelated tasks. Benchmark standardization is visible amongst recent models and the number of different tasks models are evaluated on is increasing.

Sentiment of App with Word Vectors

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1416.0986s319 ◽

2019 ◽

Vol 8 (6S3) ◽

pp. 2156-2159

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Text Data ◽

Vector Representations ◽

Text Sentiment Analysis

Vector representations for language have been shown to be useful in a number of Natural Language Processing tasks. In this paper, we investigate the effectiveness of word vector representations for the problem of Sentiment Analysis. In particular, we target three sub-tasks, namely sentiment word extraction, polarity detection for sentiment words, and text sentiment prediction. We investigate the effectiveness of vector representations over different text data and evaluate the quality of domain-dependent vectors. Vector representations are used to compute various vector-based features, and systematic experiments are conducted to demonstrate their effectiveness. Simple vector-based features can achieve better results for text sentiment analysis of apps.
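As a rough illustration of what a "simple vector-based feature" could look like, the sketch below averages word vectors over a text and compares the result with two sentiment seed words by cosine similarity. The toy vectors, the seed words, and the function names are assumptions made for this example; the abstract does not specify which features the authors computed.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, for illustration only;
# real vectors would come from a trained embedding model).
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "slow": [-0.6, 0.3, 0.2],
    "app": [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_vector(tokens, vectors):
    """Mean of the vectors of all known tokens, or None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def polarity_score(text, vectors, positive_seed="good", negative_seed="bad"):
    """Similarity to the positive seed minus similarity to the negative seed."""
    avg = average_vector(text.lower().split(), vectors)
    if avg is None:
        return 0.0
    return cosine(avg, vectors[positive_seed]) - cosine(avg, vectors[negative_seed])
```

A positive score suggests the text lies closer to the positive seed word, a negative score closer to the negative one; in practice such a score would be one feature among several fed to a classifier.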

Natural Language Processing in Large-Scale Neural Models for Medical Screenings

Frontiers in Robotics and AI ◽

10.3389/frobt.2019.00062 ◽

2019 ◽

Vol 6 ◽

Cited By ~ 1

Author(s):

Catharina Marie Stille ◽

Trevor Bekolay ◽

Peter Blouw ◽

Bernd J. Kröger

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

Neural Models

Automatic Identification of Information Quality Metrics in Health News Stories

Frontiers in Public Health ◽

10.3389/fpubh.2020.515347 ◽

2020 ◽

Vol 8 ◽

Author(s):

Majed Al-Jefri ◽

Roger Evans ◽

Joon Lee ◽

Pietro Ghezzi

Keyword(s):

Machine Learning ◽

Health Care ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Information Quality ◽

Evaluation Process ◽

Health News ◽

News Stories

Objective: Many online and printed media publish health news of questionable trustworthiness and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-related news stories using natural language processing and machine learning. Materials and Methods: We used a database from the website HealthNewsReview.org that aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria to critically analyze health care interventions' claims. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the use of the pre-trained natural language model BERT. Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising with an F1 measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need for external knowledge, or the subjectivity in the evaluation process. Conclusion: The criteria used are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why it varied between criteria. For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process where human experts interpret text senses and make use of external knowledge in their assessment.
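For readers unfamiliar with the F1 measure reported above, the following minimal sketch computes precision, recall, and F1 for a single binary criterion. The function name and the toy labels are assumptions for illustration only; the abstract does not describe the authors' evaluation code.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 for binary labels (truthy means positive)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if not g and p)
    false_neg = sum(1 for g, p in pairs if g and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a stricter summary than accuracy on imbalanced criteria.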

Identifying Heart Failure Symptoms and Poor Self-Management in Home Healthcare: A Natural Language Processing Study

10.3233/shti210653 ◽

2021 ◽

Author(s):

Sena Chae ◽

Jiyoun Song ◽

Marietta Ojo ◽

Maxim Topaz

Keyword(s):

Heart Failure ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Symptom Management ◽

Home Healthcare ◽

Self Management ◽

Clinical Notes ◽

Patients With Heart Failure

The goal of this natural language processing (NLP) study was to identify patients in home healthcare with heart failure symptoms and poor self-management (SM). The preliminary lists of symptoms and poor SM status were identified, NLP algorithms were used to refine the lists, and NLP performance was evaluated using 2.3 million home healthcare clinical notes. The overall precision to identify patients with heart failure symptoms and poor SM status was 0.86. The study demonstrated the feasibility of these methods for identifying patients with heart failure symptoms and poor SM documented in home healthcare notes. This study facilitates utilizing key symptom information and patients’ SM status from unstructured data in electronic health records. The results of this study can be applied to better individualize symptom management to support heart failure patients’ quality of life.

Text Analysis of Assembly Work Instructions

Volume 1B: 35th Computers and Information in Engineering Conference ◽

10.1115/detc2015-47246 ◽

2015 ◽

Cited By ~ 1

Author(s):

Rahul Sharan Renu ◽

Gregory Mocko

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Lead Times ◽

Parts Of Speech ◽

Assembly Work ◽

And Performance ◽

Quality Of Products ◽

Speech Tagging

The objective of this research is to investigate the requirements and performance of parts-of-speech tagging of assembly work instructions. Natural Language Processing of assembly work instructions is required to perform data mining with the objective of knowledge reuse. Assembly work instructions are key process engineering elements that allow for predictable assembly quality of products and predictable assembly lead times. Authoring of assembly work instructions is a subjective process. It has been observed that most assembly work instructions are not grammatically complete sentences. It is hypothesized that this can lead to false parts-of-speech tagging (by Natural Language Processing tools). To test this hypothesis, two parts-of-speech taggers are used to tag 500 assembly work instructions (obtained from the automotive industry). The first parts-of-speech tagger is obtained from the Natural Language Toolkit (nltk.org) and the second is obtained from the Stanford Natural Language Processing Group (nlp.stanford.edu). For each of these taggers, two experiments are conducted. In the first experiment, the assembly work instructions are input to each tagger in raw form. In the second experiment, the assembly work instructions are preprocessed to make them grammatically complete, and then input to the tagger. It is found that the Stanford Natural Language Processing tagger with the preprocessed assembly work instructions produced the fewest false parts-of-speech tags.

Assessing Quality of Care (QC) Near End of Life (EOL): Use of Natural Language Processing (NLP) to Identify Brain Metastasis (BM) Cases for Evaluation of Radiation Therapy (RT) Receipt

International Journal of Radiation Oncology*Biology*Physics ◽

10.1016/j.ijrobp.2014.05.1740 ◽

2014 ◽

Vol 90 (1) ◽

pp. S577-S578

Author(s):

J.J. Ryoo ◽

C. Zheng ◽

M.K. Gould ◽

A.R. Kagan ◽

W.W. Lien

Keyword(s):

Radiation Therapy ◽

Quality Of Care ◽

Natural Language Processing ◽

Brain Metastasis ◽

Natural Language ◽

End Of Life ◽

Language Processing

Using Natural Language Processing to Classify Serious Illness Communication with Oncology Patients

10.1101/2021.08.20.21262082 ◽

2021 ◽

Author(s):

Anahita Davoudi ◽

Hegler Tissot ◽

Abigail Doucette ◽

Peter E Gabriel ◽

Ravi B. Parikh ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Institute Of Medicine ◽

Patient Centered ◽

Oncology Patients ◽

Serious Illness ◽

Patient Goals ◽

Core Measure

One core measure of healthcare quality set forth by the Institute of Medicine is whether care decisions match patient goals. High-quality "serious illness communication" about patient goals and prognosis is required to support patient-centered decision-making; however, current methods are not sensitive enough to measure the quality of this communication or determine whether care delivered matches patient priorities. Natural language processing offers an efficient method for identification and evaluation of documented serious illness communication, which could serve as the basis for future quality metrics in oncology and other forms of serious illness. In this study, we trained NLP algorithms to identify and characterize serious illness communication with oncology patients.

Electronic Corpus of 17th- and 18th-Century Polish Texts: Theoretical and Workshop Problems

Poradnik Językowy ◽

10.33896/porj.2020.8.3 ◽

2020 ◽

pp. 32-51

Author(s):

Włodzimierz Gruszczyński ◽

Dorota Adamiec ◽

Renata Bronikowska ◽

Aleksandra Wieczorek

Keyword(s):

Natural Language Processing ◽

Foreign Language ◽

Natural Language ◽

Language Processing ◽

18th Century ◽

Text Representation ◽

Grammatical Structure ◽

Historical Material ◽

Polish Language ◽

National Corpus

This paper presents the Electronic Corpus of 17th- and 18th-century Polish Texts (KorBa), a large (13.5-million), annotated historical corpus available online. Its creation was modelled on the assumptions of the National Corpus of Polish (NKJP), yet the specific nature of the historical material enforced certain modifications of the solutions applied in NKJP, e.g. two forms of text representation (transliteration and transcription) were introduced, the principle of designating foreign-language fragments was adopted, and the tagset was adapted to the description of the grammatical structure of the Middle Polish language. The texts collected in KorBa are diversified in chronological, geographical, stylistic, and thematic terms although, due to e.g. limited access to the material, the postulate of representativeness and sustainability of the corpus was not fully implemented. The work on the corpus was to a large extent automated as a result of using natural language processing tools. Keywords: electronic text corpus; historical corpus; 17th-18th-century Polish; natural language processing

Concept of TF-IDF, Common Bag of Word and Word Embedding for Effective Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4582.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2198-2201

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Sentiment Classification ◽

Word Embedding ◽

Text Representation ◽

Human Beings ◽

Text Data

Sentiment classification is one of the best-known and most popular domains of machine learning and natural language processing. In this domain, an algorithm is developed to understand the opinion of an entity in a way similar to human beings. This research article presents such an algorithm. Concepts from natural language processing are used for text representation, and a novel word embedding model is then proposed for effective classification of the data. TF-IDF and Common BoW representation models are considered for the representation of text data. The importance of these models is discussed in the respective sections. The proposed model is tested on IMDB datasets, using a 50% training and 50% testing split with three random shufflings of the datasets for evaluation.
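The representations and the evaluation protocol named in this abstract can be sketched in a few lines. The example below is an assumption-laden illustration, not the authors' implementation: it uses whitespace tokenization, one common TF-IDF variant (relative term frequency times the unsmoothed logarithm of N over document frequency), and hypothetical function names and seeds.

```python
import math
import random
from collections import Counter


def bag_of_words(doc):
    """Bag-of-words counts using lowercase whitespace tokenization."""
    return Counter(doc.lower().split())


def tf_idf(docs):
    """TF-IDF weights per document: (count / doc length) * log(N / df)."""
    counts = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each term counted once per document
    weighted = []
    for c in counts:
        total = sum(c.values())
        weighted.append(
            {t: (k / total) * math.log(n_docs / doc_freq[t]) for t, k in c.items()}
        )
    return weighted


def half_splits(items, seeds=(0, 1, 2)):
    """Yield one 50%/50% train/test split per seed, each after a reshuffle."""
    for seed in seeds:
        shuffled = list(items)
        random.Random(seed).shuffle(shuffled)
        mid = len(shuffled) // 2
        yield shuffled[:mid], shuffled[mid:]
```

With this variant, a term that occurs in every document receives weight zero, which is why TF-IDF tends to down-weight words that carry little class information; averaging a metric over the three splits would reduce the dependence on any single shuffle.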

Business Sentiment Quotient Analysis using Natural Language Processing

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d8721.049420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1350-1352

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Embedding Technique ◽

Computer Scientists ◽

Online Business ◽

New Research ◽

Python Programming ◽

Market Requirement

Online business has opened up several avenues for researchers and computer scientists to initiate new research models. Customers' business activities produce abundant data. Analysis of this data can yield useful inferences, which may help a system improve its quality of service and understand current market requirements, business trends, future needs of society, and so on. To this end, the current paper proposes a feature extraction technique named Business Sentiment Quotient (BSQ). BSQ uses the word2vec [1] word embedding technique from Natural Language Processing. A number of business-related tweets are collected from Twitter and processed to estimate BSQ using the Python programming language. BSQ may be utilized for further machine learning activities.