word similarity
Recently Published Documents


Total documents: 276 (five years: 86)
H-index: 18 (five years: 2)

Author(s):  
Ghazeefa Fatima ◽  
Rao Muhammad Adeel Nawab ◽  
Muhammad Salman Khan ◽  
Ali Saeed

Semantic word similarity is a quantitative measure of how contextually similar two words are. Evaluating semantic word similarity models requires a benchmark corpus. However, despite the millions of Urdu speakers and the large amount of digital Urdu text on the Internet, no benchmark corpus exists for the Cross-lingual Semantic Word Similarity task for Urdu. This article reports our efforts to develop such a corpus. The newly developed corpus is based on the SemEval-2017 Task 2 English dataset and contains 1,945 cross-lingual English–Urdu word pairs. Each pair was assigned semantic similarity scores by 11 native Urdu speakers. In addition to corpus generation, this article reports the evaluation results of a baseline approach, “Translation Plus Monolingual Analysis”, for automatically identifying semantic similarity between English–Urdu word pairs. The results showed that the path length similarity measure performed better for words translated with Google and Bing. The newly created corpus and evaluation results are freely available online for further research and development.
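The “Translation Plus Monolingual Analysis” baseline can be pictured as two steps: first translate the Urdu word into English, then score the resulting English pair with a monolingual measure such as path length. The Python sketch below is illustrative only. The three-word lookup table stands in for Google or Bing translation, the tiny is-a taxonomy stands in for WordNet, and all names are hypothetical. The path score uses the common 1 / (1 + shortest path) form that WordNet's path similarity also uses.

```python
from collections import deque

# Hypothetical toy resources, not from the paper:
# a lookup table standing in for Google/Bing translation ...
URDU_TO_ENGLISH = {"کتا": "dog", "بلی": "cat", "گاڑی": "car"}

# ... and a small is-a taxonomy (child -> parent) standing in for WordNet.
HYPERNYM = {
    "dog": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore",
    "carnivore": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "artifact", "artifact": "entity",
}


def neighbours(node):
    """Yield the parent and the children of a node (edges are undirected)."""
    if node in HYPERNYM:
        yield HYPERNYM[node]
    for child, parent in HYPERNYM.items():
        if parent == node:
            yield child


def shortest_path_length(a, b):
    """Breadth-first search: number of edges between two concepts, or None."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """Path-length similarity: 1 / (1 + shortest path length)."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1.0 / (1.0 + dist)


def cross_lingual_similarity(english_word, urdu_word):
    """Translate the Urdu word to English, then score the pair monolingually."""
    translated = URDU_TO_ENGLISH.get(urdu_word)
    if translated is None:
        return None
    return path_similarity(english_word, translated)
```

For example, `cross_lingual_similarity("dog", "بلی")` returns 0.2, because "dog" and "cat" are four edges apart in the toy taxonomy. An untranslatable word yields `None` rather than a score.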


2022 ◽  
Vol 16 (1) ◽  
pp. 0-0

One of the most critical activities in revealing terrorism-related information is classifying online documents. The internet provides consumers with a variety of useful knowledge, and the volume of web material is growing steadily. This makes finding potentially hazardous records extremely difficult. Merely extracting keywords from records is inadequate to define their contents. Many methods have been studied for building automatic document classification systems, mainly computational and knowledge-based approaches. Due to the complexity of natural languages, these approaches do not provide sufficient results. To address this shortcoming, we propose a structure-based approach that relies on the WordNet hierarchy and n-gram frequency data and employs word similarity. The approach was evaluated on a sample of NY Times articles using four different query terms from four different regions. According to the experimental findings, the proposed approach successfully removes background words and phrases from documents and recognizes texts connected to terrorism.
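One way to picture combining n-gram frequencies with word similarity is to drop background words, count the remaining unigrams and bigrams, and weight each n-gram's frequency by its similarity to a query term. The sketch below assumes a hypothetical stop list and a hand-filled similarity table that stands in for WordNet-derived scores. Scoring an n-gram by its most query-similar word is likewise an illustrative choice, not the authors' formula.

```python
import re
from collections import Counter

# Hypothetical stand-ins: a small background-word list and a hand-filled
# similarity table (a real system would derive scores from WordNet).
BACKGROUND = {"a", "an", "the", "of", "in", "on", "and", "to", "was", "were", "said"}
TOY_SIMILARITY = {
    ("bomb", "terrorism"): 0.9,
    ("militant", "terrorism"): 0.85,
    ("attack", "terrorism"): 0.8,
}


def word_similarity(w1, w2):
    """Look up a symmetric toy similarity; identical words score 1.0."""
    if w1 == w2:
        return 1.0
    return TOY_SIMILARITY.get((w1, w2), TOY_SIMILARITY.get((w2, w1), 0.0))


def ngram_counts(text, n_max=2):
    """Count unigrams and bigrams after removing background words."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in BACKGROUND]
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def relevance(text, query, threshold=0.5):
    """Sum of frequency * similarity over n-grams similar enough to the query.

    An n-gram is scored by its most query-similar word (an illustrative choice).
    """
    score = 0.0
    for gram, freq in ngram_counts(text).items():
        sim = max(word_similarity(tok, query) for tok in gram.split())
        if sim >= threshold:
            score += freq * sim
    return score
```

In this toy setup, "A bomb attack was reported in the city." scores about 3.4 against "terrorism". A sentence with no related words scores 0.0, so a threshold on the score could separate candidate documents.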


2021 ◽  
Vol 6 (3) ◽  
pp. 380-398
Author(s):  
Arief Dwi Saputra ◽  
Alfina Rahmatia ◽  
Sri Handari Wahyuningsih ◽  
Andi Azhar

The COVID-19 pandemic caused employees to become less enthusiastic due to declining competitiveness and the switch from offline to online systems. This study closely examined the role gamification strategies play in entrepreneurial behavior (attitudes, subjective norms, and behavioral control) and in entrepreneurial education (through self-efficacy, experience, and program involvement). Purposive sampling was used to select a sample of 442 informants for this qualitative study. The review was carried out through a literature study and reinforced by in-depth interviews. The data were coded in NVivo 12 using word similarity analysis at a maximum percentage of 100%. The word similarity results revealed similar relationships in the cluster analysis, which grouped the mutually supportive roles among the variables as a business strategy during the pandemic. Overall, applying gamification has an impact on motivation, behavior change, and psychological aspects of entrepreneurial behavior and education. The research contributes to addressing issues concerning the organization's role as a solution to the relationship between gamification strategies and employee performance. Gamification strategies also open up interesting avenues for future exploration. Further studies are expected to discuss relevant business strategies for dealing with unexpected events such as the COVID-19 pandemic.


2021 ◽  
Vol 72 ◽  
pp. 1281-1305
Author(s):  
Atefe Pakzad ◽  
Morteza Analoui

Distributional semantic models represent the meaning of words as vectors. We introduce a selection method to learn a vector space in which each dimension is a natural word. The selection method starts from the most frequent words and selects the subset with the best performance. This word-per-dimension property is the main advantage of the method compared with fusion methods such as NMF and with neural embedding models. We apply the method to the ukWaC corpus and train a vector space of N=1500 basis words. We report test results on word similarity tasks for the MEN, RG-65, SimLex-999, and WordSim353 gold datasets. The results also show that reducing the number of basis vectors from 5000 to 1500 reduces accuracy by about 1.5-2%, so good interpretability is achieved without a large penalty. Interpretability evaluation results indicate that the word vectors obtained by the proposed method with N=1500 are more interpretable than those of word embedding models and the baseline method. We report the top 15 of the 1500 selected basis words in this paper.
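A word-per-dimension space can be sketched by taking the N most frequent words as basis dimensions and counting how often each word co-occurs with them inside a small window. This minimal Python version assumes a toy tokenized corpus and a window of 2, and it chooses basis words by frequency alone. The paper's further step of selecting the best-performing subset is not reproduced here.

```python
import math
from collections import Counter


def top_frequent_words(sentences, n):
    """Return the n most frequent tokens; these serve as basis dimensions."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [w for w, _ in counts.most_common(n)]


def cooccurrence_vectors(sentences, basis, window=2):
    """Map each word to co-occurrence counts with the basis words.

    Dimension i of every vector is literally the basis word basis[i],
    which is what makes the space directly interpretable.
    """
    index = {w: i for i, w in enumerate(basis)}
    vectors = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            vec = vectors.setdefault(word, [0] * len(basis))
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in index:
                    vec[index[sent[j]]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)
```

On the corpus `[["the", "cat", "drinks", "milk"], ["the", "dog", "drinks", "water"], ["the", "cat", "chases", "the", "dog"]]` with N=3, the basis is `["the", "cat", "drinks"]`. The vectors for "milk" and "water" share the "drinks" dimension, giving a cosine of about 0.707.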


2021 ◽  
pp. 1-23
Author(s):  
Yerai Doval ◽  
Jose Camacho-Collados ◽  
Luis Espinosa-Anke ◽  
Steven Schockaert

Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Current state-of-the-art approaches learn these embeddings by aligning two disjoint monolingual vector spaces through an orthogonal transformation that preserves the structure of the monolingual counterparts. In this work, we propose to apply an additional transformation after this initial alignment step, which aims to bring the vector representations of a given word and its translations closer to their average. Since this additional transformation is non-orthogonal, it also affects the structure of the monolingual spaces. We show that our approach improves both the integration of the monolingual spaces and the quality of the monolingual spaces themselves. Furthermore, because our transformation can be applied to an arbitrary number of languages, we are able to effectively obtain a truly multilingual space. The resulting (monolingual and multilingual) spaces show consistent gains over the current state of the art in standard intrinsic tasks, namely dictionary induction and word similarity, as well as in extrinsic tasks such as cross-lingual hypernym discovery and cross-lingual natural language inference.
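The two-stage idea can be illustrated in two dimensions. First, find the rotation that best aligns source vectors with their translations (orthogonal Procrustes, here restricted to proper rotations so that a closed-form angle exists). Then fit an unconstrained linear map that moves each aligned vector toward the average of itself and its translation. This pure-Python sketch assumes a simplified form of the second step, an ordinary least-squares 2x2 map, and is not the authors' implementation. Real embeddings have hundreds of dimensions and would typically use an SVD instead of the 2-D closed form.

```python
import math


def procrustes_rotation_2d(src, tgt):
    """Rotation angle that best aligns src to tgt in the least-squares sense.

    In 2-D, maximising sum(y . R(theta) x) gives a closed form:
    theta = atan2(sum of cross products, sum of dot products).
    Reflections are excluded, so this is the rotation-only Procrustes case.
    """
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(src, tgt))
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(src, tgt))
    return math.atan2(cross, dot)


def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def fit_linear_map_2d(xs, ys):
    """Least-squares 2x2 matrix M with M x close to y: M = (Y^T X)(X^T X)^-1."""
    a = sum(x[0] * x[0] for x in xs)
    b = sum(x[0] * x[1] for x in xs)
    d = sum(x[1] * x[1] for x in xs)
    det = a * d - b * b  # assumes the xs are not all collinear
    inv = ((d / det, -b / det), (-b / det, a / det))
    yx = [[sum(y[i] * x[j] for x, y in zip(xs, ys)) for j in range(2)] for i in range(2)]
    return [[sum(yx[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def apply_map(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def align_then_average(src, tgt):
    """Step 1: orthogonal (rotation) alignment. Step 2: non-orthogonal maps
    that pull each space toward the average of aligned translation pairs."""
    theta = procrustes_rotation_2d(src, tgt)
    aligned = [rotate(x, theta) for x in src]
    averages = [((a[0] + t[0]) / 2, (a[1] + t[1]) / 2) for a, t in zip(aligned, tgt)]
    src_map = fit_linear_map_2d(aligned, averages)
    tgt_map = fit_linear_map_2d(tgt, averages)
    return theta, aligned, averages, src_map, tgt_map
```

When the targets are an exact 30-degree rotation of the sources, the recovered angle is π/6. The second-stage map is then the identity, because each aligned vector already equals its average. With noisy translation pairs, the second stage becomes a genuine non-orthogonal correction that changes the geometry of each space.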


2021 ◽  
pp. 1-29
Author(s):  
Dongqiang Yang ◽  
Yanqin Yin

Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity can also be predicted in a distributional vector space. Similarity calculation continues to be a challenging task, even with the latest breakthroughs in deep neural language models. We first examined popular methodologies for measuring taxonomic similarity, including edge-counting, which relies solely on the semantic relations in a taxonomy, and more complex methods that estimate concept specificity. We further extrapolated three weighting factors in modelling taxonomic similarity. To study the distinct mechanisms of taxonomic and distributional similarity measures, we ran head-to-head comparisons of each measure with human similarity judgements from the perspectives of word frequency, polysemy degree and similarity intensity. Our findings suggest that, without fine-tuning the uniform distance, taxonomic similarity measures can rely on the shortest path length as a prime factor in predicting semantic similarity; that, in contrast to distributional semantics, edge-counting is free from sense distribution bias in use and can measure word similarity both literally and metaphorically; and that the synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend of leveraging knowledge bases in transfer learning. A large gap still appears to exist in computing semantic similarity across different ranges of word frequency, polysemy degree and similarity intensity.
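Head-to-head comparisons with human judgements are typically reported as Spearman rank correlations between a measure's scores and the human scores. The helper below is a plain-Python sketch of Spearman's rho, in which tied values receive averaged ranks. The score lists in the usage note are made-up placeholders, not data from the study.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, with placeholder human scores `[3.2, 1.0, 2.5]` and measure scores `[0.9, 0.1, 0.4]`, `spearman` returns 1.0 because both lists order the three word pairs identically. Only the ranking matters, not the scale of the scores.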


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Qifeng Gong

Applying artificial intelligence to English requires processing large amounts of English text data, but deviations in English word similarity reduce overall English translation accuracy and data processing efficiency. Therefore, this paper proposes an accurate estimation of English word similarity based on a semantic network, which combines a variety of computing methods into a compound computing structure. The experimental results show that the error between the semantic network-based English word similarity calculation method and manual evaluation is small, and that the accuracy of English word similarity calculation is improved to a certain extent. In addition, compared with other English word similarity calculation methods, the semantic network-based method is more in line with people's cognition and understanding of knowledge, is more reliable, and has practical value in the field of English.
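A compound structure of this kind might combine several semantic-network measures into one score. The sketch below assumes a hypothetical toy taxonomy and an equal-weight mix of two standard measures: path length, 1 / (1 + edges), and Wu-Palmer, 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common subsumer. The paper's actual component methods and weights are not given in the abstract, so both are placeholders.

```python
# Hypothetical toy taxonomy (child -> parent); "entity" is the root.
PARENT = {
    "dog": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal",
    "mammal": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity",
}


def ancestors(word):
    """Return [word, parent, grandparent, ..., root]."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(word):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(word))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of both words (None if disconnected)."""
    anc_b = set(ancestors(b))
    for node in ancestors(a):
        if node in anc_b:
            return node
    return None


def path_sim(a, b):
    """Edge-counting similarity: 1 / (1 + edges via the common subsumer)."""
    lcs = lowest_common_subsumer(a, b)
    dist = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (1.0 + dist)


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def compound_similarity(a, b, weight=0.5):
    """Hypothetical compound score: weighted mix of two network-based measures."""
    return weight * path_sim(a, b) + (1 - weight) * wu_palmer(a, b)
```

Here "dog" and "cat" share "mammal", giving a path score of 0.2, a Wu-Palmer score of 0.6 and a compound score of 0.4. "dog" and "car" meet only at the root and score about 0.196. Mixing measures lets depth information soften the purely distance-based score.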


2021 ◽  
Vol 1 (1) ◽  
pp. 11-24
Author(s):  
Arief Dwi Saputra ◽  
Alfina Rahmatia ◽  
Muslimah Muslimah

Islamic philanthropy and social entrepreneurship have created solutions for addressing problems in maximizing economic, social, and religious activity. In this review, Islamic philanthropy links the elements of zakat, infaq, sadaqah, and waqf with the social entrepreneurship elements of social value, civil society, innovation, and economic activity. The data were obtained through literature studies and interviews with Lazismu Bengkulu, an Islamic philanthropic movement, and CV. Presidium, a social entrepreneurship movement. The data were then processed in NVivo, and conclusions were drawn through word similarity analysis. Findings: the synergy between employers and society plays a role in poverty alleviation, wealth equality, community welfare, creating social benefits, optimizing social capital, innovating in problem-solving efforts, and balancing social and business activities. The integration of these two movements shows a predominant increase rather than a decline, with impacts on production, consumption, investment, economic growth, and economic stability. The word similarity analysis of these synergy and integration efforts concluded that both movements can be implemented in practice because they support each other and are closely linked in achieving goals and increasing the dominant impact of social, economic, and religious activities.


Author(s):  
Fulian Yin ◽  
Yanyan Wang ◽  
Jianbo Liu ◽  
Marco Tosato

The word similarity task calculates the similarity of any pair of words and is a basic technology of natural language processing (NLP). Existing methods are based on word embeddings, which fail to capture polysemy and are strongly influenced by the quality of the corpus. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity based on a synonym knowledge base, consisting of a knowledge representation module and a word similarity module. For the first module, we propose a dual attention mechanism that combines semantic information to jointly learn word knowledge representations. The MP-CWR model uses synonyms as prior knowledge to supplement the relationships between words, which helps address the challenge of semantic expression caused by insufficient data. For the word similarity module, we propose a multi-prototype representation for each word, then calculate and fuse the conceptual similarities of two words to obtain the final result. Finally, we verify the effectiveness of our model against other baseline models on three public data sets. The experiments also demonstrate the stability and scalability of the MP-CWR model under different corpora.
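Once each word has several prototype vectors, the similarity of two words has to be fused from the pairwise prototype similarities. Two common fusion rules are MaxSim, which uses the closest prototype pair, and AvgSim, which averages over all pairs. The sketch below mixes them with a hypothetical weight `alpha`. It assumes toy 2-D prototypes and is not the MP-CWR fusion step, which the abstract does not spell out.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_sim(protos_a, protos_b):
    """Similarity of the closest pair of prototypes (senses)."""
    return max(cosine(u, v) for u in protos_a for v in protos_b)


def avg_sim(protos_a, protos_b):
    """Mean similarity over all prototype pairs."""
    pairs = [cosine(u, v) for u in protos_a for v in protos_b]
    return sum(pairs) / len(pairs)


def fused_similarity(protos_a, protos_b, alpha=0.5):
    """Hypothetical fusion: weighted mix of MaxSim and AvgSim."""
    return alpha * max_sim(protos_a, protos_b) + (1 - alpha) * avg_sim(protos_a, protos_b)


# Toy prototypes: 苹果 ("apple") has a fruit sense and a company sense,
# while 香蕉 ("banana") has only a fruit sense.
apple_protos = [(1.0, 0.0), (0.0, 1.0)]
banana_protos = [(1.0, 0.0)]
```

With these toy vectors, MaxSim is 1.0 because the fruit senses match, and AvgSim is 0.5 because the company sense is unrelated. The fused score is 0.75. A single-prototype model would blur the two senses into one vector before any comparison.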

