Mixed Script Identification Using Automated DNN Hyperparameter Optimization

Mixed script identification is a hindrance for automated natural language processing systems. Mixing cursive scripts of different languages is a challenge because NLP methods like POS tagging and word sense disambiguation suffer from noisy text. This study tackles the challenge of mixed script identification for mixed-code dataset consisting of Roman Urdu, Hindi, Saraiki, Bengali, and English. The language identification model is trained using word vectorization and RNN variants. Moreover, through experimental investigation, different architectures are optimized for the task associated with Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU). Experimentation achieved the highest accuracy of 90.17 for Bi-GRU, applying learned word class features along with embedding with GloVe. Moreover, this study addresses the issues related to multilingual environments, such as Roman words merged with English characters, generative spellings, and phonetic typing.

Download Full-text

deepBioWSD: effective deep neural word sense disambiguation of biomedical text data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy189 ◽

2019 ◽

Vol 26 (5) ◽

pp. 438-446 ◽

Cited By ~ 3

Author(s):

Ahmad Pesaranghader ◽

Stan Matwin ◽

Marina Sokolova ◽

Ali Pesaranghader

Keyword(s):

Language Processing ◽

Short Term Memory ◽

Word Sense Disambiguation ◽

Training Data ◽

Biomedical Text ◽

Word Sense ◽

Vocabulary Size ◽

Unified Medical Language System ◽

Knowledge Based ◽

Sense Disambiguation

Abstract Objective In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform un- or semisupervised and knowledge-based methods; however, they train 1 separate classifier for each ambiguous term, necessitating a large number of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods Built on recent advances in deep learning, our deepBioWSD model leverages 1 single bidirectional long short-term memory network that makes sense prediction for any ambiguous term. In the model, first, the Unified Medical Language System sense embeddings will be computed using their text definitions; and then, after initializing the network with these embeddings, it will be trained on all (available) training data collectively. This method also considers a novel technique for automatic collection of training data from PubMed to (pre)train the network in an unsupervised manner. Results We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD by achieving the state-of-the-art performance of 96.82% for macro accuracy. Conclusions Apart from the disambiguation improvement and unsupervised training, deepBioWSD depends on considerably less number of expert-labeled data as it learns the target and the context terms jointly. These merit deepBioWSD to be conveniently deployable in real-time biomedical applications.

Download Full-text

Investigating the Feasibility of Deep Learning Methods for Urdu Word Sense Disambiguation

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3477578 ◽

2022 ◽

Vol 21 (2) ◽

pp. 1-16

Author(s):

Ali Saeed ◽

Rao Muhammad Adeel Nawab ◽

Mark Stevenson

Keyword(s):

Deep Learning ◽

Language Processing ◽

Short Term Memory ◽

Word Sense Disambiguation ◽

Word Sense ◽

Short Term ◽

Learning Methods ◽

Term Memory ◽

Sense Disambiguation ◽

Long Short Term Memory

Word Sense Disambiguation (WSD), the process of automatically identifying the correct meaning of a word used in a given context, is a significant challenge in Natural Language Processing. A range of approaches to the problem has been explored by the research community. The majority of these efforts has focused on a relatively small set of languages, particularly English. Research on WSD for South Asian languages, particularly Urdu, is still in its infancy. In recent years, deep learning methods have proved to be extremely successful for a range of Natural Language Processing tasks. The main aim of this study is to apply, evaluate, and compare a range of deep learning methods approaches to Urdu WSD (both Lexical Sample and All-Words) including Simple Recurrent Neural Networks, Long-Short Term Memory, Gated Recurrent Units, Bidirectional Long-Short Term Memory, and Ensemble Learning. The evaluation was carried out on two benchmark corpora: (1) the ULS-WSD-18 corpus and (2) the UAW-WSD-18 corpus. Results (Accuracy = 63.25% and F1-Measure = 0.49) show that a deep learning approach outperforms previously reported results for the Urdu All-Words WSD task, whereas performance using deep learning approaches (Accuracy = 72.63% and F1-Measure = 0.60) are low in comparison to previously reported for the Urdu Lexical Sample task.

Download Full-text

A critical analysis and explication of word sense disambiguation as approached by natural language processing

Lingua ◽

10.1016/j.lingua.2020.102896 ◽

2020 ◽

Vol 243 ◽

pp. 102896

Author(s):

Julie Mennes ◽

Stephan van der Waart van Gulik

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Critical Analysis ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

Download Full-text

Layered Multistep Bidirectional Long Short-Term Memory Networks for Biomedical Word Sense Disambiguation

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2018.8621383 ◽

2018 ◽

Cited By ~ 1

Author(s):

Daniel Bis ◽

Canlin Zhang ◽

Xiuwen Liu ◽

Zhe He

Keyword(s):

Short Term Memory ◽

Word Sense Disambiguation ◽

Word Sense ◽

Short Term ◽

Term Memory ◽

Sense Disambiguation ◽

Long Short Term Memory

Download Full-text

Hidden Markov model for POS tagging in word sense disambiguation

2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS) ◽

10.1109/csitss.2016.7779371 ◽

2016 ◽

Cited By ~ 4

Author(s):

Pooja Alva ◽

Vinay Hegde

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Word Sense Disambiguation ◽

Word Sense ◽

Pos Tagging ◽

Sense Disambiguation

Download Full-text

Ontology Matching using BabelNet Dictionary and Word Sense Disambiguation Algorithms

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i1.pp196-205 ◽

2017 ◽

Vol 5 (1) ◽

pp. 196 ◽

Cited By ~ 5

Author(s):

Mohamed Biniz ◽

Rachid El Ayachi ◽

Mohamed Fakir

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Word Sense Disambiguation ◽

Similarity Measures ◽

Ontology Matching ◽

Word Sense ◽

Sense Disambiguation ◽

Lesk Algorithm ◽

Reference Ontology ◽

Selection Of

<p>Ontology matching is a discipline that means two things: first, the process of discovering correspondences between two different ontologies, and second is the result of this process, that is to say the expression of correspondences. This discipline is a crucial task to solve problems merging and evolving of heterogeneous ontologies in applications of the Semantic Web. This domain imposes several challenges, among them, the selection of appropriate similarity measures to discover the correspondences. In this article, we are interested to study algorithms that calculate the semantic similarity by using Adapted Lesk algorithm, Wu & Palmer Algorithm, Resnik Algorithm, Leacock and Chodorow Algorithm, and similarity flooding between two ontologies and BabelNet as reference ontology, we implement them, and compared experimentally. Overall, the most effective methods are Wu & Palmer and Adapted Lesk, which is widely used for Word Sense Disambiguation (WSD) in the field of Automatic Natural Language Processing (NLP).</p>

Download Full-text

A Novel Approach to Word Sense Disambiguation Based on Topical and Semantic Association

The Scientific World JOURNAL ◽

10.1155/2013/586327 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Xin Wang ◽

Wanli Zuo ◽

Ying Wang

Keyword(s):

Language Processing ◽

Fundamental Problem ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Semantic Features ◽

Word Sense ◽

Semantic Association ◽

Data Set ◽

Novel Approach ◽

Sense Disambiguation

Word sense disambiguation (WSD) is a fundamental problem in nature language processing, the objective of which is to identify the most proper sense for an ambiguous word in a given context. Although WSD has been researched over the years, the performance of existing algorithms in terms of accuracy and recall is still unsatisfactory. In this paper, we propose a novel approach to word sense disambiguation based on topical and semantic association. For a given document, supposing that its topic category is accurately discriminated, the correct sense of the ambiguous term is identified through the corresponding topic and semantic contexts. We firstly extract topic discriminative terms from document and construct topical graph based on topic span intervals to implement topic identification. We then exploit syntactic features, topic span features, and semantic features to disambiguate nouns and verbs in the context of ambiguous word. Finally, we conduct experiments on the standard data set SemCor to evaluate the performance of the proposed method, and the results indicate that our approach achieves relatively better performance than existing approaches.

Download Full-text

Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2017.08.001 ◽

2017 ◽

Vol 73 ◽

pp. 137-147 ◽

Cited By ~ 19

Author(s):

Antonio Jimeno Yepes

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Word Sense Disambiguation ◽

Word Sense ◽

Word Embeddings ◽

Short Term ◽

Term Memory ◽

Sense Disambiguation ◽

Long Short Term Memory

Download Full-text

Comparing supervised learning algorithms for Spatial Nominal Entity recognition

AGILE: GIScience Series ◽

10.5194/agile-giss-1-15-2020 ◽

2020 ◽

Vol 1 ◽

pp. 1-18

Author(s):

Amine Medad ◽

Mauro Gaio ◽

Ludovic Moncla ◽

Sébastien Mustière ◽

Yannick Le Nir

Keyword(s):

Natural Language ◽

Supervised Learning ◽

Language Processing ◽

Word Sense Disambiguation ◽

Learning Algorithms ◽

Named Entity Recognition ◽

Entity Recognition ◽

Word Sense ◽

Sense Disambiguation ◽

Supervised Learning Algorithms

Abstract. Discourse may contain both named and nominal entities. Most common nouns or nominal mentions in natural language do not have a single, simple meaning but rather a number of related meanings. This form of ambiguity led to the development of a task in natural language processing known as Word Sense Disambiguation. Recognition and categorisation of named and nominal entities is an essential step for Word Sense Disambiguation methods. Up to now, named entity recognition and categorisation systems mainly focused on the annotation, categorisation and identification of named entities. This paper focuses on the annotation and the identification of spatial nominal entities. We explore the combination of Transfer Learning principle and supervised learning algorithms, in order to build a system to detect spatial nominal entities. For this purpose, different supervised learning algorithms are evaluated with three different context sizes on two manually annotated datasets built from Wikipedia articles and hiking description texts. The studied algorithms have been selected for one or more of their specific properties potentially useful in solving our problem. The results of the first phase of experiments reveal that the selected algorithms have similar performances in terms of ability to detect spatial nominal entities. The study also confirms the importance of the size of the window to describe the context, when word-embedding principle is used to represent the semantics of each word.

Download Full-text

Knowledge-Based Method for Word Sense Disambiguation by Using Hindi WordNet

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.2596 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3985-3989 ◽

Cited By ~ 1

Author(s):

P. Sharma ◽

N. Joshi

Keyword(s):

Natural Language ◽

Language Processing ◽

Speech Processing ◽

Text Processing ◽

Word Sense Disambiguation ◽

Problem Area ◽

Word Sense ◽

Knowledge Resources ◽

Knowledge Based ◽

Sense Disambiguation

The purpose of word sense disambiguation (WSD) is to find the meaning of the word in any context with the help of a computer, to find the proper meaning of a lexeme in the available context in the problem area and the relationship between lexicons. This is done using natural language processing (NLP) techniques which involve queries from machine translation (MT), NLP specific documents or output text. MT automatically translates text from one natural language into another. Several application areas for WSD involve information retrieval (IR), lexicography, MT, text processing, speech processing etc. Using this knowledge-based technique, we are investigating Hindi WSD in this article. It involves incorporating word knowledge from external knowledge resources to remove the equivocalness of words. In this experiment, we tried to develop a WSD tool by considering a knowledge-based approach with WordNet of Hindi. The tool uses the knowledge-based LESK algorithm for WSD for Hindi. Our proposed system gives an accuracy of about 71.4%.

Download Full-text