Building Related Words in Indonesian and English Translation of Al-Qur’an Vocabulary Based on Distributional Similarity

The Qur'an is the Muslim holy book as the main source and guide, consisting of 114 surahs, 30 juz and has 6200 fewer verses in it. The search for relationships or arrangements of meaning between words in the Qur'an takes a long time to find and summarize. Obtained from the dictionary, encyclopedia, or thesaurus of the Al-Qur'an vocabulary, which contains each word entry has links with other words. This final project discusses the interrelations and semantic correspondence between words in the Qur'an, which supports to help find inter-related words in it, using linking with distributions that involve important parts in the word embedding. Measurement of the relevance of the word measurement with semantic similarity which is one of the lessons learned in Natural Language Processing (NLP). Extraordinary similarity measures the proximity of a word vector using cosine similarity. The process of converting words in the form of vectors using the fasttext which is the development of the Word2vec algorithm. The dataset is used for translations of the word Al-Qur'an in English and Indonesian. This entry becomes an input into the system then produces a score that represents the interrelationship between words. Evaluation of system output results is to perform performance calculations using Pearson correlation involving the gold standard.

Download Full-text

Conceptual Graphs Based Approach for Subjective Answers Evaluation

Natural Language Processing ◽

10.4018/978-1-7998-0951-7.ch037 ◽

2020 ◽

pp. 770-790

Author(s):

Goonjan Jain ◽

D.K. Lobiyal

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Evaluation System ◽

Pearson Correlation ◽

Similarity Measures ◽

Conceptual Graphs ◽

Pearson Correlation Coefficient ◽

Evaluation Systems ◽

Automated Evaluation ◽

Processing Techniques

Automated evaluation systems for objective type tests already exist. However, it is challenging to make an automated evaluation system for subjective type tests. Therefore, focus of this paper is on evaluation of simple text based subjective answers using Natural Language Processing techniques. A student's answer is evaluated by comparing it with a model answer of the question. Model answers cannot exactly match with the students' answers due to variability in writing. Therefore, researchers create conceptual graphs for both student as well as model answer and compute similarity between these graphs using techniques of graph similarity measures. Based on the similarity, marks are assigned to an answer. Lastly, in this manuscript authors compare the results obtained by human graders and the proposed system using Pearson correlation coefficient. Also, comparison has been drawn between the results of proposed system with other existing evaluation systems. The experimental evaluation of the proposed system shows promising results.

Download Full-text

A Call to Action on Artificial Intelligence and Social Work Education: Lessons Learned from A Simulation Project Using Natural Language Processing

Journal of Teaching in Social Work ◽

10.1080/08841233.2020.1813234 ◽

2020 ◽

Vol 40 (5) ◽

pp. 501-518

Author(s):

Kenta Asakura ◽

Katherine Occhiuto ◽

Sarah Todd ◽

Cedar Leithead ◽

Robert Clapperton

Keyword(s):

Artificial Intelligence ◽

Natural Language Processing ◽

Social Work ◽

Natural Language ◽

Language Processing ◽

Social Work Education ◽

Lessons Learned ◽

Call To Action ◽

Work Education

Download Full-text

Natural language processing for web browsing analytics: Challenges, lessons learned, and opportunities

Computer Networks ◽

10.1016/j.comnet.2021.108357 ◽

2021 ◽

pp. 108357

Author(s):

Daniel Perdices ◽

Javier Ramos ◽

José L. García-Dorado ◽

Iván González ◽

Jorge E. López de Vergara

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Lessons Learned ◽

Web Browsing

Download Full-text

Natural Language Processing: Security- and Defense-Related Lessons Learned

10.7249/pe-a926-1 ◽

2021 ◽

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Lessons Learned

Download Full-text

Benchmarking Effectiveness and Efficiency of Deep Learning Models for Semantic Textual Similarity in the Clinical Domain: Validation Study

JMIR Medical Informatics ◽

10.2196/27386 ◽

2021 ◽

Vol 9 (12) ◽

pp. e27386

Author(s):

Qingyu Chen ◽

Alex Rankine ◽

Yifan Peng ◽

Elaheh Aghaarabi ◽

Zhiyong Lu

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Mean Squared Error ◽

Pearson Correlation ◽

Data Set ◽

Squared Error ◽

Real Time Applications ◽

Effectiveness And Efficiency ◽

Pearson Correlations

Background Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and called for the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 in an official test set during 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and achieved a second rank. Objective Although our models strongly correlate with manual annotations, annotator-level correlation was only moderate (weighted Cohen κ=0.60). We are cautious of the potential use of DL models in production systems and argue that it is more critical to evaluate the models in-depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models. We quantify their robustness and inference times to validate their usefulness in real-time applications. Methods We benchmarked five DL models, which are the top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times, using the official training and testing sets. We reported 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures. Results Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). BioSentVec also had the highest results in 3 of 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varies significantly. A particular observation is that BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs. They cannot capture highly similar sentence pairs effectively when they have different negation terms or word orders. In addition, time efficiency is dramatically different from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively. This results in challenges for real-time applications. Conclusions Despite the excitement of further improving Pearson correlations in this data set, our results highlight that evaluations of the effectiveness and efficiency of STS models are critical. In future, we suggest more evaluations on the generalization capability and user-level testing of the models. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence-relatedness.

Download Full-text

LIS4: Lesk Inspired Sense Specific Semantic Similarity using WordNet

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500064 ◽

2021 ◽

pp. 2150006

Author(s):

Saravanakumar Kandasamy ◽

Aswani Kumar Cherukuri

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Gold Standard ◽

Question Answering ◽

Knowledge Based ◽

Benchmark Datasets ◽

Processing Information

Semantic similarity quantification between concepts is one of the inevitable parts in domains like Natural Language Processing, Information Retrieval, Question Answering, etc. to understand the text and their relationships better. Last few decades, many measures have been proposed by incorporating various corpus-based and knowledge-based resources. WordNet and Wikipedia are two of the Knowledge-based resources. The contribution of WordNet in the above said domain is enormous due to its richness in defining a word and all of its relationship with others. In this paper, we proposed an approach to quantify the similarity between concepts that exploits the synsets and the gloss definitions of different concepts using WordNet. Our method considers the gloss definitions, contextual words that are helping in defining a word, synsets of contextual word and the confidence of occurrence of a word in other word’s definition for calculating the similarity. The evaluation based on different gold standard benchmark datasets shows the efficiency of our system in comparison with other existing taxonomical and definitional measures.

Download Full-text

Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems

Database ◽

10.1093/database/bay110 ◽

2018 ◽

Vol 2018 ◽

Cited By ~ 8

Author(s):

Wasila Dahdul ◽

Prashanti Manda ◽

Hong Cui ◽

James P Balhoff ◽

T Alexander Dececchi ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Gold Standard

Download Full-text

Textual entailment graphs

Natural Language Engineering ◽

10.1017/s1351324915000108 ◽

2015 ◽

Vol 21 (5) ◽

pp. 699-724 ◽

Cited By ~ 6

Author(s):

LILI KOTLERMAN ◽

IDO DAGAN ◽

BERNARDO MAGNINI ◽

LUISA BENTIVOGLI

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Gold Standard ◽

State Of The Art ◽

Text Analytics ◽

Joint Work ◽

Gold Standard Dataset ◽

Textual Entailment ◽

Interesting Task

AbstractIn this work, we present a novel type of graphs for natural language processing (NLP), namely textual entailment graphs (TEGs). We describe the complete methodology we developed for the construction of such graphs and provide some baselines for this task by evaluating relevant state-of-the-art technology. We situate our research in the context of text exploration, since it was motivated by joint work with industrial partners in the text analytics area. Accordingly, we present our motivating scenario and the first gold-standard dataset of TEGs. However, while our own motivation and the dataset focus on the text exploration setting, we suggest that TEGs can have different usages and suggest that automatic creation of such graphs is an interesting task for the community.

Download Full-text

Ontology Matching using BabelNet Dictionary and Word Sense Disambiguation Algorithms

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i1.pp196-205 ◽

2017 ◽

Vol 5 (1) ◽

pp. 196 ◽

Cited By ~ 5

Author(s):

Mohamed Biniz ◽

Rachid El Ayachi ◽

Mohamed Fakir

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Word Sense Disambiguation ◽

Similarity Measures ◽

Ontology Matching ◽

Word Sense ◽

Sense Disambiguation ◽

Lesk Algorithm ◽

Reference Ontology ◽

Selection Of

<p>Ontology matching is a discipline that means two things: first, the process of discovering correspondences between two different ontologies, and second is the result of this process, that is to say the expression of correspondences. This discipline is a crucial task to solve problems merging and evolving of heterogeneous ontologies in applications of the Semantic Web. This domain imposes several challenges, among them, the selection of appropriate similarity measures to discover the correspondences. In this article, we are interested to study algorithms that calculate the semantic similarity by using Adapted Lesk algorithm, Wu & Palmer Algorithm, Resnik Algorithm, Leacock and Chodorow Algorithm, and similarity flooding between two ontologies and BabelNet as reference ontology, we implement them, and compared experimentally. Overall, the most effective methods are Wu & Palmer and Adapted Lesk, which is widely used for Word Sense Disambiguation (WSD) in the field of Automatic Natural Language Processing (NLP).</p>

Download Full-text

Research on English Translation of Computer-aided Classics Based on Natural Language Processing

Journal of Physics Conference Series ◽

10.1088/1742-6596/1550/3/032036 ◽

2020 ◽

Vol 1550 ◽

pp. 032036

Author(s):

Shengqin Bi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

English Translation ◽

Computer Aided

Download Full-text