Representation of Words in Natural Language Processing: A Survey

Natural Language ◽

Language Processing ◽

Word Embeddings ◽

Vector Representation ◽

Advantages And Disadvantages ◽

Word Representation ◽

The Relationship

The article is devoted to research to the state-of-art vector representation of words in natural language processing. Three main types of vector representation of a word are described, namely: static word embeddings, use of deep neural networks for word representation and dynamic) word embeddings based on the context of the text. This is a very actual and much-demanded area in natural language processing, computational linguistics and artificial intelligence at all. Proposed to consider several different models for vector representation of the word (or word embeddings), from the simplest (as a representation of text that describes the occurrence of words within a document or learning the relationship between a pair of words) to the multilayered neural networks and deep bidirectional transformers for language understanding, are described chronologically in relation to the appearance of models. Improvements regarding previous models are described, both the advantages and disadvantages of the presented models and in which cases or tasks it is better to use one or another model.

Advances in Computer and Electrical Engineering - Handbook of Research on Engineering Innovations and Technology Management in Organizations ◽

Advances in Computational Linguistics and Text Processing Frameworks

10.4018/978-1-7998-2772-6.ch012 ◽

2020 ◽

pp. 217-244

Author(s):

Ayush Srivastav ◽

Hera Khan ◽

Amit Kumar Mishra

Keyword(s):

Neural Networks ◽

Natural Language ◽

Language Processing ◽

Text Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech

The chapter provides an eloquent account of the major methodologies and advances in the field of Natural Language Processing. The most popular models that have been used over time for the task of Natural Language Processing have been discussed along with their applications in their specific tasks. The chapter begins with the fundamental concepts of regex and tokenization. It provides an insight to text preprocessing and its methodologies such as Stemming and Lemmatization, Stop Word Removal, followed by Part-of-Speech tagging and Named Entity Recognition. Further, this chapter elaborates the concept of Word Embedding, its various types, and some common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing is provided next, followed by Neural Networks and its advanced forms such as Recursive Neural Networks and Seq2seq models that are used in Computational Linguistics. A brief description of chatbots and Memory Networks concludes the chapter.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

Getting in Shape: Word Embedding SubSpaces

10.24963/ijcai.2019/761 ◽

2019 ◽

Author(s):

Tianyuan Zhou ◽

João Sedoc ◽

Jordan Rodu

Keyword(s):

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Theoretical Framework ◽

Word Embedding ◽

Word Embeddings ◽

Empirical Results ◽

Linear Alignment ◽

The Relationship

Many tasks in natural language processing require the alignment of word embeddings. Embedding alignment relies on the geometric properties of the manifold of word vectors. This paper focuses on supervised linear alignment and studies the relationship between the shape of the target embedding. We assess the performance of aligned word vectors on semantic similarity tasks and find that the isotropy of the target embedding is critical to the alignment. Furthermore, aligning with an isotropic noise can deliver satisfactory results. We provide a theoretical framework and guarantees which aid in the understanding of empirical results.

Deep learning in clinical natural language processing: a methodical review

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz200 ◽

2019 ◽

Vol 27 (3) ◽

pp. 457-470 ◽

Cited By ~ 25

Author(s):

Stephen Wu ◽

Kirk Roberts ◽

Surabhi Datta ◽

Jingcheng Du ◽

Zongcheng Ji ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Natural Language ◽

Language Processing ◽

Recurrent Neural Networks ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

Abstract Objective This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research. Materials and Methods We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers. Results DL in clinical NLP publications more than doubled each year, through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a “long tail” of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific. Discussion Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference of recurrent neural networks for sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French language clinical NLP with deep learning). Conclusion Deep learning has not yet fully penetrated clinical NLP and is growing rapidly. This review highlighted both the popular and unique trends in this active field.

Natural language inference by deep learning method

MATEC Web of Conferences ◽

10.1051/matecconf/202235503028 ◽

2022 ◽

Vol 355 ◽

pp. 03028

Author(s):

Saihan Li ◽

Zhijie Hu ◽

Rong Cao

Keyword(s):

Deep Learning ◽

Natural Language ◽

Language Processing ◽

Learning Method ◽

Word Embeddings ◽

Learning Models ◽

The Relationship ◽

Inference Task

Natural Language inference refers to the problem of determining the relationships between a premise and a hypothesis, it is an emerging area of natural language processing. The paper uses deep learning methods to complete natural language inference task. The dataset includes 3GPP dataset and SNLI dataset. Gensim library is used to get the word embeddings, there are 2 methods which are word2vec and doc2vec to map the sentence to array. 2 deep learning models DNNClassifier and Attention are implemented separately to classify the relationship between the proposals from the telecommunication area dataset. The highest accuracy of the experiment is 88% and we found that the quality of the dataset decided the upper bound of the accuracy.

Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - TeachNLP '05

10.3115/1627291 ◽

2005 ◽

Keyword(s):

Natural Language ◽

Language Processing

Proceedings of the 2019 8th International Conference on Computing and Pattern Recognition ◽

Optimization of Recurrent Neural Networks on Natural Language Processing

10.1145/3373509.3373573 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jingyu Huang ◽

Yunfei Feng

Keyword(s):

Neural Networks ◽

Natural Language ◽

Language Processing ◽

Recurrent Neural Networks

A Survey on Bias in Deep NLP

Applied Sciences ◽

10.3390/app11073184 ◽

2021 ◽

Vol 11 (7) ◽

pp. 3184

Author(s):

Ismael Garrido-Muñoz ◽

Arturo Montejo-Ráez ◽

Fernando Martínez-Santiago ◽

L. Alfonso Ureña-López

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Probability Distribution ◽

Natural Language ◽

Network Design ◽

Language Processing ◽

Deep Neural Networks ◽

Learning Processes ◽

Relevant Issue

Deep neural networks are hegemonic approaches to many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora collections and the capability of deep architectures to shape internal language mechanisms in self-supervised learning processes (also known as “pre-training”), versatile and performing models are released continuously for every new network design. These networks, somehow, learn a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies and biases contained in such a collection. As pre-trained models have been found to be very useful approaches to transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of detection and correction. In addition, available resources are identified and a strategy to deal with bias in deep NLP is proposed.

Natural Language Processing with Subsymbolic Neural Networks

Neural Network Perspectives on Cognition and Adaptive Robotics ◽

10.1201/9780367813239-8 ◽

2019 ◽

pp. 120-139

Author(s):

Risto Miikkulainen

Keyword(s):

Neural Networks ◽

Natural Language ◽

Language Processing

Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks

Methods of Information in Medicine ◽

10.3414/me17-01-0024 ◽

2017 ◽

Vol 56 (05) ◽

pp. 377-389 ◽

Cited By ~ 21

Author(s):

Xingyu Zhang ◽

Joyce Kim ◽

Rachel E. Patzer ◽

Stephen R. Pitts ◽

Aaron Patzer ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Emergency Department ◽

Logistic Regression ◽

Natural Language ◽

Hospital Admission ◽

Language Processing ◽

Predictive Accuracy ◽

Free Text

SummaryObjective: To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements.Methods: Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient’s reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model and receiver operating curves (AUC) were then calculated for each model.Results: Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.7310.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.Conclusions: The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient’s reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.

Finite-State Technology

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.39 ◽

2018 ◽

Author(s):

Mans Hulden

Keyword(s):

Natural Language ◽

Language Processing ◽

Finite State Machines ◽

Regular Languages ◽

Finite State Automata ◽

State Machines ◽

Computational Phonology ◽

Finite State

Finite-state machines—automata and transducers—are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, computational phonology and morphology in particular, and this is illustrated through common practical use cases.