Arabic Poem Generation Incorporating Deep Learning and Phonetic CNNsubword Embedding Models

International Journal of Robotic Computing ◽

10.35708/tai1868-126246 ◽

2019 ◽

pp. 64-91

Author(s):

Sameerah Talafha ◽

Banafsheh Rekabdar

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Model Performance ◽

Arabic Language ◽

Generation Model ◽

Two Stage ◽

Human Evaluation ◽

Effective Contribution

Arabic poetry generation is a challenging task because the linguistic structure of Arabic poses severe difficulties for researchers and developers in the Natural Language Processing (NLP) field. In this paper, we propose a poetry generation model with extended phonetic and semantic embeddings (Phonetic CNNsubword embeddings). We show that Phonetic CNNsubword embeddings contribute more to overall model performance than FastTextsubword embeddings. Our poetry generation model uses a two-stage approach: (1) generating the first verse, which explicitly incorporates the theme-related phrase, and (2) generating the remaining verses with the proposed Hierarchy-Attention Sequence-to-Sequence model (HAS2S), which captures word, phrase, and verse information between contexts. A comprehensive human evaluation confirms that the poems generated by our model outperform those of the base models on criteria such as Meaning, Coherence, Fluency, and Poeticness. Extensive quantitative experiments using Bilingual Evaluation Understudy (BLEU) scores also demonstrate significant improvements over strong baselines.
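For readers unfamiliar with the BLEU metric mentioned in this abstract, the following is a minimal sketch of sentence-level BLEU against a single reference: clipped n-gram precisions combined by a geometric mean, multiplied by a brevity penalty. It is not the authors' evaluation code; the function names `ngrams` and `bleu`, the single-reference setup, and the absence of smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing (illustrative)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, and a candidate sharing no words with the reference scores 0.0. Published BLEU figures are usually computed at corpus level, often with smoothing, so scores from this sketch are not directly comparable to those reported in the paper.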

Forty-two Million Ways to Describe Pain: Topic Modeling of 200,000 PubMed Pain-Related Abstracts Using Natural Language Processing and Deep Learning–Based Text Generation

Pain Medicine ◽

10.1093/pm/pnaa061 ◽

2020 ◽

Vol 21 (11) ◽

pp. 3133-3160

Author(s):

Patrick J Tighe ◽

Bharadwaj Sannapaneni ◽

Roger B Fillingim ◽

Charlie Doyle ◽

Michael Kent ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Topic Modeling ◽

Taxonomic Structure ◽

Generation Model ◽

Pain Research ◽

Embedded Model ◽

Gated Recurrent Units

Objective: Recent efforts to update the definitions and taxonomic structure of concepts related to pain have revealed opportunities to better quantify topics of existing pain research subject areas. Methods: Here, we apply basic natural language processing (NLP) analyses to a corpus of >200,000 abstracts published on PubMed under the medical subject heading (MeSH) of “pain” to quantify the topics, content, and themes of pain-related research dating back to the 1940s. Results: The most common stemmed terms included “pain” (601,122 occurrences), “patient” (508,064 occurrences), and “studi-” (208,839 occurrences). In contrast, terms with the highest term frequency–inverse document frequency included “tmd” (6.21), “qol” (6.01), and “endometriosis” (5.94). Using the vector-embedded model of term definitions available via the “word2vec” technique, the most similar terms to “pain” included “discomfort,” “symptom,” and “pain-related.” For the term “acute,” the most similar terms in the word2vec vector space included “nonspecific,” “vaso-occlusive,” and “subacute”; for the term “chronic,” the most similar terms included “persistent,” “longstanding,” and “long-standing.” Topic modeling via latent Dirichlet allocation identified peak coherence (0.49) at 40 topics. Network analysis of these topic models identified three topics that were outliers from the core cluster, two of which pertained to women’s health and obstetrics and were closely connected to one another, yet considered distant from the third outlier pertaining to age. A deep learning–based gated recurrent units abstract generation model successfully synthesized several unique abstracts with varying levels of believability, with special attention to, and some confusion about, the roles of placebo in randomized controlled trials at lower temperatures. Conclusions: Quantitative NLP models of published abstracts pertaining to pain may point to trends and gaps within pain research communities.
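The contrast this abstract draws between the most frequent terms and the terms with the highest term frequency–inverse document frequency (TF-IDF) can be illustrated with a small sketch. The toy corpus, the function name `tf_idf`, and the particular weighting (relative term frequency times the natural log of inverse document frequency) are assumptions for illustration; the abstract does not state which TF-IDF variant the authors used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus of stemmed abstracts (not data from the study).
corpus = [
    "pain patient tmd tmd".split(),
    "pain patient studi".split(),
    "pain patient qol".split(),
]
raw_counts = Counter(term for doc in corpus for term in doc)
weights = tf_idf(corpus)
```

In this toy corpus "pain" occurs in every document, so its raw count is high but its TF-IDF weight is zero, while the rarer "tmd" receives the largest weight in the first document. That is the same pattern the abstract reports at scale.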

Daily estimates of individual discharge likelihood with deep learning natural language processing in general medicine: a prospective and external validation study

Internal and Emergency Medicine ◽

10.1007/s11739-021-02816-7 ◽

2021 ◽

Author(s):

Stephen Bacchi ◽

Toby Gilbert ◽

Samuel Gluck ◽

Joy Cheng ◽

Yiran Tan ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Validation Study ◽

External Validation ◽

General Medicine ◽

External Validation Study

Deep Learning on Graphs for Natural Language Processing

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3462809 ◽

2021 ◽

Author(s):

Lingfei Wu ◽

Yu Chen ◽

Heng Ji ◽

Bang Liu

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

Deep Learning Techniques on Text Classification Using Natural Language Processing (NLP) In Social Healthcare Network: A Comprehensive Survey

2021 3rd International Conference on Signal Processing and Communication (ICSPC) ◽

10.1109/icspc51351.2021.9451752 ◽

2021 ◽

Author(s):

PM. Lavanya ◽

E. Sasikala

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Classification ◽

Healthcare Network ◽

Learning Techniques ◽

Comprehensive Survey

A natural language processing approach based on embedding deep learning from heterogeneous compounds for quantitative structure–activity relationship modeling

Chemical Biology & Drug Design ◽

10.1111/cbdd.13742 ◽

2020 ◽

Vol 96 (3) ◽

pp. 961-972

Author(s):

Khalid Bouhedjar ◽

Abdelbasset Boukelia ◽

Abdelmalek Khorief Nacereddine ◽

Anouar Boucheham ◽

Amine Belaidi ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Quantitative Structure Activity Relationship ◽

Structure Activity Relationship ◽

Activity Relationship ◽

Quantitative Structure ◽

Structure Activity ◽

Processing Approach

Speech Master: Natural Language Processing and Deep Learning Approach for Automated Speech Evaluation

10.1109/iemcon53756.2021.9623163 ◽

2021 ◽

Author(s):

K.G.C.M Kooragama ◽

L.R.W.D. Jayashanka ◽

J.A. Munasinghe ◽

K.W. Jayawardana ◽

Muditha Tissera ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Approach ◽

Speech Evaluation

Deep Learning Approaches for Spoken and Natural Language Processing

10.1007/978-3-030-79778-2 ◽

2021 ◽

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Approaches

Use of Natural Language Processing and Deep Learning towards Guiding Healthy Cholesterol Free Life

10.1109/icac54203.2021.9671230 ◽

2021 ◽

Author(s):

Dilith Sasanka ◽

H. K. N Malshani ◽

Uchitha I. Wickramaratne ◽

Yashmitha Kavindi ◽

Muditha Tissera ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

Automatic ICD-10 Coding and Training System: Deep Neural Network Based on Supervised Learning

JMIR Medical Informatics ◽

10.2196/23230 ◽

2021 ◽

Vol 9 (8) ◽

pp. e23230

Author(s):

Pei-Fu Chen ◽

Ssu-Ming Wang ◽

Wei-Chih Liao ◽

Lu-Cheng Kuo ◽

Kuan-Chih Chen ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Deep Neural Network ◽

University Hospital ◽

Classification Model ◽

ICD-10 ◽

And Training

Background: The International Classification of Diseases (ICD) code is widely used as the reference for medical systems and billing purposes. However, classifying diseases into ICD codes still mainly relies on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and deep learning– and natural language processing–related approaches have been studied to assist disease coders. Objective: This paper aims to construct a deep learning model for ICD-10 coding that automatically determines the corresponding diagnosis and procedure codes based solely on free-text medical notes, to improve accuracy and reduce human effort. Methods: We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors (GloVe), word2vec, embeddings from language models (ELMo), bidirectional encoder representations from transformers (BERT), and single head attention recurrent neural network (SHA-RNN), on the deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced the attention mechanism into the classification model to extract the keywords from diagnoses and visualize the coding reference for training newcomers to ICD-10. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time by coders before and after using our model. Results: In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a BERT embedding approach in the Gated Recurrent Unit classification model. The trained models were deployed in an ICD-10 web service that supports coding and training for ICD-10 users. With this service, the coders' F1-score increased significantly from a median of 0.832 to 0.922 (P<.05), but coding time was not reduced. Conclusions: The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders.
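Because this abstract reports its results as F1-scores over assigned codes, a short sketch of one common way to compute F1 for multi-label coding may help. It uses micro-averaging over per-note sets of codes; the abstract does not say which averaging scheme the authors used, so that choice, the function name `micro_f1`, and the example code sets are all assumptions for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-note sets of codes (illustrative)."""
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        tp += len(gold_codes & pred_codes)   # codes correctly assigned
        fp += len(pred_codes - gold_codes)   # codes assigned but not in gold
        fn += len(gold_codes - pred_codes)   # gold codes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted code sets for two discharge notes.
gold = [{"I10", "E11.9"}, {"J18.9"}]
predicted = [{"I10"}, {"J18.9", "N39.0"}]
score = micro_f1(gold, predicted)
```

Here two codes are correct, one is spurious, and one is missed, so precision and recall are both 2/3 and the F1-score is 2/3.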

Towards a scientific workflow featuring Natural Language Processing for the digitisation of natural history collections

Research Ideas and Outcomes ◽

10.3897/rio.6.e55789 ◽

2020 ◽

Vol 6 ◽

Cited By ~ 3

Author(s):

David Owen ◽

Laurence Livermore ◽

Quentin Groom ◽

Alex Hardisty ◽

Thijs Leegwater ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural History ◽

Natural Language ◽

Language Processing ◽

Scientific Workflow ◽

Entity Recognition ◽

Research Activities ◽

Handwritten Text ◽

Segmented Images

We describe an effective approach to automated text digitisation with respect to natural history specimen labels. These labels contain much useful data about the specimen including its collector, country of origin, and collection date. Our approach to automatically extracting these data takes the form of a pipeline. Recommendations are made for the pipeline's component parts based on some of the state-of-the-art technologies. Optical Character Recognition (OCR) can be used to digitise text on images of specimens. However, recognising text quickly and accurately from these images can be a challenge for OCR. We show that OCR performance can be improved by prior segmentation of specimen images into their component parts. This ensures that only text-bearing labels are submitted for OCR processing as opposed to whole specimen images, which inevitably contain non-textual information that may lead to false positive readings. In our testing, Tesseract OCR version 4.0.0 offers promising text recognition accuracy with segmented images. Not all the text on specimen labels is printed. Handwritten text varies much more and does not conform to standard shapes and sizes of individual characters, which poses an additional challenge for OCR. Recently, deep learning has allowed for significant advances in this area. Google's Cloud Vision, which is based on deep learning, is trained on large-scale datasets, and is shown to be quite adept at this task. This may take us some way towards negating the need for humans to routinely transcribe handwritten text. Determining the countries and collectors of specimens has been the goal of previous automated text digitisation research activities. Our approach also focuses on these two pieces of information. An area of Natural Language Processing (NLP) known as Named Entity Recognition (NER) has matured enough to semi-automate this task. Our experiments demonstrated that existing approaches can accurately recognise location and person names within the text extracted from segmented images via Tesseract version 4.0.0. Potentially, NER could be used in conjunction with other online services, such as those of the Biodiversity Heritage Library, to map the named entities to entities in the biodiversity literature (https://www.biodiversitylibrary.org/docs/api3.html). We have highlighted the main recommendations for potential pipeline components. The document also provides guidance on selecting appropriate software solutions. These include automatic language identification, terminology extraction, and integrating all pipeline components into a scientific workflow to automate the overall digitisation process.
