Word similarity metrics and multilateral comparison

This preliminary study investigates the acquisition of word stress in English as a second language (L2[1]) by native speakers of Brazilian Portuguese (BP, L1[2]). We report results of a multiple-choice, forced-choice perception test in which native speakers of American English and native speakers of Dutch judged productions of English words bearing pre-final stress, including both cognates and non-cognates of BP words. The tokens were produced by native speakers of American English and by Brazilians who speak English as a second language. The results show that American and Dutch listeners were consistent in their judgments of native and non-native stress productions, and that both speaker groups produced stress variation relative to the canonical pattern. However, the variability found in American English points to the prosodic patterns of English, whereas the variability found in Brazilian English points to the stress patterns of Portuguese. This occurs especially in words whose forms activate similar neighboring words in the L1. Transfer from the L1 appears at both the segmental and prosodic levels in BP English.

[1] L2 stands for second language, foreign language, target language.

[2] L1 stands for first language, mother tongue, source language.

Word Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization

Journal of Information Processing Systems, 2015. DOI: 10.3745/jips.04.0018

Keyword(s): Edit Distance, Distance Metrics, Word Similarity, Similarity Calculation
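As a rough illustration of the general technique named in this entry's title and keywords, the sketch below computes a word similarity score from Levenshtein edit distance. It is not the method of the listed paper: the function names, the length-based scaling of the score, and the `toy_normalize` mapping (which merges a few voiced and voiceless consonant pairs) are assumptions chosen only to show where a normalization step could plug in.

```python
from typing import Callable, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str,
               normalize: Optional[Callable[[str], str]] = None) -> float:
    """Similarity in [0, 1]: 1 minus distance divided by the longer length."""
    if normalize is not None:
        a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def toy_normalize(word: str) -> str:
    """Hypothetical normalization: collapse b/p, d/t, g/k into one symbol each."""
    return word.translate(str.maketrans("bdg", "ptk"))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("bad", "pat"))
    print(similarity("bad", "pat", normalize=toy_normalize))
```

Passing a normalization function makes words that differ only in the merged consonants score as identical, which is the kind of effect a consonant normalization step would be expected to have.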

Analysis Accuracy of Similar Word Based Clustering (EWSB) Algorithm on Machine Translator Bahasa Indonesia-Minang

Kinetik Game Technology Information System Computer Network Computing Electronics and Control, Vol 3 (3), 2018. DOI: 10.22219/kinetik.v3i3.241

Author(s): Herry Sujaini

Keyword(s): Machine Translation, Clustering Algorithm, Statistical Machine Translation, Target Language, Word Similarity, Similar Word, Word Clustering, Translation Accuracy, Bahasa Indonesia

Extended Word Similarity Based (EWSB) Clustering is a word clustering algorithm based on word similarity values computed from a corpus. One benefit of clustering with this algorithm is improved output from a statistical machine translation system. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, where the algorithm was applied to Indonesian as the target language. This paper discusses the results of applying the EWSB algorithm to an Indonesian-to-Minang statistical machine translator, where the algorithm is applied to Minang as the target language. The results show that the EWSB algorithm is quite effective when Minang is the target language, improving translation accuracy by 6.36%.

Phylogenetic relations and mitogenome‐wide similarity metrics reveal monophyly of Penaeus sensu lato

Ecology and Evolution, Vol 11 (5), pp. 2040-2049, 2021. DOI: 10.1002/ece3.7148

Author(s): Vinaya Kumar Katneni, Mudagandur S. Shekhar, Ashok Kumar Jangam, Balasubramanian C. Paran, Ashok Selvaraj, ...

Keyword(s): Similarity Metrics, Phylogenetic Relations

Cyberbullying Detection, Based on the FastText and Word Similarity Schemes

ACM Transactions on Asian and Low-Resource Language Information Processing, Vol 20 (1), pp. 1-15, 2020. DOI: 10.1145/3398191

Author(s): Kun Wang, Yanpeng Cui, Jianwei Hu, Yu Zhang, Wei Zhao, ...

Keyword(s): Word Similarity, Cyberbullying Detection

Intelligent recognition of semantic relationships based on antonymy

Multiagent and Grid Systems, Vol 16 (3), pp. 263-290, 2020. DOI: 10.3233/mgs-200332

Author(s): Hui Guan, Chengzhen Jia, Hongji Yang

Keyword(s): Semantic Similarity, New Approach, Word Similarity, Semantic Relationships, Proposed Model, Path Distance, The Hierarchical Structure, Thinking Process, Similarity Measuring, Intelligent Recognition

Since computing semantic similarity tends to simulate the human thinking process, semantic dissimilarity must play a part in that process. In this paper, we present a new approach to measuring semantic similarity that incorporates dissimilarity into the computation. Specifically, the proposed measures exploit potential antonymy in the hierarchical structure of WordNet to represent the dissimilarity between concepts, and then combine this dissimilarity with the results of existing methods to obtain semantic similarity scores. The relation between the parameters and the correlation value is discussed in detail. The proposed model is then applied at different levels of text granularity to validate its similarity measurements. Experimental results show that the proposed approach not only achieves a high correlation with human ratings but also improves on existing path-distance-based methods at the word similarity level, while also correcting an existing sentence similarity method in some cases on the Microsoft Research Paraphrase Corpus and the SemEval-2014 data set.
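To make the general idea concrete, the following minimal sketch combines a path-based similarity over a small hand-built taxonomy with an antonymy penalty. It does not reproduce the measures proposed in the paper: the toy taxonomy, the `ANTONYMS` set, the weight `alpha`, and the multiplicative combination rule are all assumptions introduced for illustration, and no real WordNet data is used.

```python
# Toy is-a hierarchy: child -> parent (hypothetical, not taken from WordNet).
PARENT = {
    "hot": "temperature",
    "warm": "temperature",
    "cold": "temperature",
    "temperature": "property",
    "property": None,
}

# Hypothetical antonym pairs, stored as unordered pairs.
ANTONYMS = {frozenset(("hot", "cold"))}


def ancestors(node, parent):
    """Return the chain from node up to the root, node first."""
    chain = [node]
    while parent.get(node) is not None:
        node = parent[node]
        chain.append(node)
    return chain


def path_length(a, b, parent):
    """Number of edges on the shortest path between a and b in the tree."""
    depth_from_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for i, n in enumerate(ancestors(a, parent)):
        if n in depth_from_b:
            return i + depth_from_b[n]
    raise ValueError("no common ancestor")


def path_similarity(a, b, parent):
    """Path-based similarity: 1 / (1 + shortest path length)."""
    return 1.0 / (1 + path_length(a, b, parent))


def adjusted_similarity(a, b, parent, antonyms, alpha=0.5):
    """Scale path similarity down when the pair is marked as antonyms.

    The rule base * (1 - alpha * dissimilarity) is an assumption made
    for this sketch, with dissimilarity 1 for antonyms and 0 otherwise.
    """
    base = path_similarity(a, b, parent)
    dissimilarity = 1.0 if frozenset((a, b)) in antonyms else 0.0
    return base * (1.0 - alpha * dissimilarity)


if __name__ == "__main__":
    print(adjusted_similarity("hot", "warm", PARENT, ANTONYMS))
    print(adjusted_similarity("hot", "cold", PARENT, ANTONYMS))
```

In this toy setup "hot" is equally close to "warm" and "cold" by path distance alone, so the antonymy penalty is what separates the two pairs.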

Methods for detecting and correcting contextual data quality problems

Intelligent Data Analysis, Vol 25 (4), pp. 763-787, 2021. DOI: 10.3233/ida-205282

Author(s): Alladoumbaye Ngueilbaye, Hongzhi Wang, Daouda Ahmat Mahamat, Ibrahim A. Elgendy, Sahalu B. Junaidu

Keyword(s): Data Quality, Quality Evaluation, Web Applications, Support Vector, Similarity Metrics, Distributed Data, Contextual Data, Formal Taxonomy, E Learning, High Scalability

Knowledge extraction, data mining, e-learning, and web application platforms use heterogeneous and distributed data. The proliferation of these multifaceted platforms raises many challenges, such as high scalability, the coexistence of complex similarity metrics, and the need for data quality evaluation. In this study, an extended, complete formal taxonomy and a set of algorithms for detecting and correcting contextual data quality anomalies were developed and implemented on structured data. Our methods detected and corrected more data anomalies than existing taxonomy techniques, and also highlighted a drawback of the Support Vector Machine (SVM). The proposed techniques will therefore be relevant to detecting and correcting errors in large contextual data (Big data).

An Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics

A comparison of perceptual word similarity metrics

USE OF SIMILARITY METRICS IN TEMPLATE-BASED DETECTION OF OBJECTS IN IMAGES

The role of cross-linguistic stress pattern frequency and word similarity on the acquisition of English stress pattern by native speakers of Brazilian Portuguese
