ERP correlates of orthographic similarity using the Levenshtein distance measure

2009
Author(s): Marta Vergara-Martinez, Tamara Y. Swaab
Author(s): Haydee Carrasco-Ortiz, Mark Amengual, Stefan Th. Gries

This study investigated the extent to which phonological and orthographic overlap between the two languages of bilinguals predicts word processing in their dominant and non-dominant languages. Forty-four English-dominant L1 English-L2 Spanish speakers and Spanish-dominant Spanish heritage speakers performed a lexical decision task on English and Spanish words. We calculated the orthographic and phonological similarity of cognate and noncognate words using the Levenshtein distance measure. Results showed that both bilingual groups benefited from orthographic similarity when reading Spanish and English words, whereas the facilitative effect of phonological similarity was restricted to Spanish words that shared phonology across languages. These findings suggest that phonological and orthographic similarity contribute differently to bilingual word recognition, independently of language dominance.
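Levenshtein distance counts the minimum number of single-symbol insertions, deletions and substitutions needed to turn one form into another. The sketch below is a standard dynamic-programming implementation, not the authors' code. Because it accepts any sequence, the same function can compare spellings (strings) and phonological transcriptions (here, tuples of phoneme symbols); the phoneme tuples are made-up placeholders, not the study's transcriptions.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev is the previous DP row: prev[j] = distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, sb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete sa
                curr[j - 1] + 1,           # insert sb
                prev[j - 1] + (sa != sb),  # substitute (free when the symbols match)
            ))
        prev = curr
    return prev[-1]


# Orthographic comparison of an English-Spanish cognate pair
print(levenshtein("family", "familia"))  # 2
# Phonological comparison on hypothetical phoneme tuples (placeholders only)
print(levenshtein(("k", "a", "t"), ("k", "a", "p")))  # 1
```

Working on generic sequences rather than only strings lets multi-character phoneme symbols count as a single unit, which character-level comparison of a transcription string would not.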


2021
Vol 26 (2)
pp. 155
Author(s): Maylton Silva Fernandes, Gustavo Lopez Estivalet, Márcio Martins Leitão

Cognate words are known to share formal and semantic similarities across two or more languages and may share representations in the mental lexicon. Cognates also differ in their degree of similarity, as in these Portuguese-English pairs: perfect cognates “banana”, high-degree cognates “momento-moment” and low-degree cognates “noite-night”. Focusing on the formal relationship, and regardless of bilingual knowledge, how are Portuguese-English cognate words recognized by monolinguals? This article investigates how monolinguals recognize Portuguese-English cognates as a function of their degree of orthographic similarity. To this end, we ran an acceptability judgment experiment on cognate word pairs and measured the degree of similarity within each pair with the Normalized Levenshtein Distance. The results showed a significant correlation between the acceptability judgments and this coefficient, indicating that even non-bilingual participants are able to perceive graded differences in orthographic similarity. In addition, an exploratory analysis made it possible to determine the coefficient value from which word pairs can be considered cognates. We hope that the present study leads to a better understanding of cognate words and prompts reflection on monolingualism.
Keywords: cognate; Levenshtein distance; acceptability judgement task; bilingualism.
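The abstract does not spell out how the distance is normalized. A common convention, assumed here, divides the Levenshtein distance by the length of the longer word, so values run from 0 (identical) to 1 (maximally different). A minimal sketch applied to the abstract's three example pairs:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


pairs = [("banana", "banana"), ("momento", "moment"), ("noite", "night")]
for pt, en in pairs:
    print(f"{pt}-{en}: {normalized_levenshtein(pt, en):.3f}")
```

This prints 0.000, 0.143 and 0.800, so the measure orders the pairs as perfect, high-degree and low-degree cognates. Dividing by the longer word keeps one edit in a long word from counting as much as one edit in a short word.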


Literator
2008
Vol 29 (1)
pp. 185-204
Author(s): P.N. Zulu, G. Botha, E. Barnard

Two methods for objectively measuring similarities and dissimilarities between the eleven official languages of South Africa are described. The first concerns the use of n-grams. The confusions between different languages in a text-based language identification system can be used to derive information on the relationships between the languages. Our classifier calculates n-gram statistics from text documents and then uses these statistics as features in classification. We show that the classification results of a validation test can be used as a similarity measure of the relationship between languages. Using the similarity measures, we were able to represent the relationships graphically. We also apply the Levenshtein distance measure to the orthographic word transcriptions from the eleven South African languages under investigation. Hierarchical clustering of the distances between the different languages shows the relationships between the languages in terms of regional groupings and closeness. Both multidimensional scaling and dendrogram analysis reveal results similar to well-known language groupings, and also suggest a finer level of detail on these relationships.
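A character n-gram language identifier can be sketched in a few lines. Build a trigram frequency profile for each language from training text, then assign a new text to the language whose profile is most similar (cosine similarity here). This is an illustrative stand-in: the abstract does not describe the paper's actual classifier or features, and the two toy training sentences are placeholders, not South African language data.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Counts of character trigrams, with the text padded by a space at each end."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two sparse count vectors (missing grams count as 0)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = math.sqrt(sum(c * c for c in p.values())) * math.sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


def identify(text: str, profiles: dict) -> str:
    """Return the language label whose trigram profile is closest to the text."""
    query = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy training data (placeholders, not the study's corpora)
profiles = {
    "english": trigram_profile("the cat sat on the mat with the hat"),
    "spanish": trigram_profile("el gato se sienta en la alfombra con el sombrero"),
}
print(identify("the hat on the cat", profiles))       # english
print(identify("el gato con el sombrero", profiles))  # spanish
```

Running such a classifier on held-out text and counting which language labels it confuses yields the kind of confusion-based similarity between languages that the abstract describes.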


2011
Vol 15 (1)
pp. 157-166
Author(s): Job Schepens, Ton Dijkstra, Franc Grootjen

Researchers on bilingual processing can benefit from computational tools developed in artificial intelligence. We show that a normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings. Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch. Semantic equivalence was determined using the conceptual structure of a translation database. Using a similarity threshold, large numbers of cognates could be selected, including nearly all of the stimulus materials of experimental studies. The identified numbers of form-similar and identical cognates correlated highly with branch lengths of phylogenetic language family trees, supporting the usefulness of the new measure for cross-language comparison. The normalized Levenshtein distance function can be considered a new formal model of cross-language orthographic similarity.
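The threshold-based selection can be sketched as follows: turn the distance into a similarity, 1 - distance / length of the longer word (an assumed normalization), and keep translation pairs whose similarity reaches the threshold. The 0.5 threshold and the word pairs are illustrative choices, not values from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 minus the normalized Levenshtein distance (1.0 = identical spelling)."""
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def select_form_similar(pairs, threshold=0.5):
    """Keep the translation pairs whose orthographic similarity reaches the threshold."""
    return [(a, b) for a, b in pairs if similarity(a, b) >= threshold]


# Hypothetical translation pairs: English-Spanish, English-German, English-Dutch
pairs = [("moment", "momento"), ("night", "nacht"), ("dog", "hond")]
print(select_form_similar(pairs))  # [('moment', 'momento'), ('night', 'nacht')]
```

The similarities are about 0.857, 0.6 and 0.25, so only the first two pairs pass. In a real pipeline, semantic equivalence would come from the translation database before this form-based filter is applied.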


2021
Ahead of print
Author(s): Putta Hemalatha, Geetha Mary Amalanathan

Purpose: Adequate resources for learning and training data are an important constraint on developing an efficient classifier with outstanding performance. Data usually follow a biased class distribution, in which classes are unequally represented within a dataset. This issue, known as the imbalance problem, is one of the most common issues in real-time applications, and learning from imbalanced datasets is a ubiquitous challenge in data mining. Imbalanced data degrade classifier performance by producing inaccurate results.
Design/methodology/approach: This work proposes a novel fuzzy-based Gaussian synthetic minority oversampling (FG-SMOTE) algorithm to process imbalanced data. The Gaussian SMOTE technique relies on nearest-neighbour search to balance the ratio between the minority and majority classes, and this ratio is balanced using a fuzzy-based Levenshtein distance measure.
Findings: The performance and accuracy of the proposed algorithm were evaluated with a deep belief network classifier. The fuzzy-based Gaussian SMOTE technique achieved an AUC of 93.7%, an F1 score of 94.2% and a geometric mean score of 93.6%, computed from the confusion matrix.
Research limitations/implications: Some challenges remain to be addressed, such as applying FG-SMOTE to multiclass imbalanced datasets and evaluating the dataset imbalance problem in a distributed environment.
Originality/value: The proposed algorithm fundamentally solves the data imbalance issue and the challenges involved in handling imbalanced data. FG-SMOTE helps balance the minority and majority class datasets.
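The SMOTE step that FG-SMOTE builds on can be sketched as follows: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is classic SMOTE only. The Gaussian and fuzzy Levenshtein-based components of FG-SMOTE are not reproduced, and the function and parameter names are illustrative.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE: create n_new synthetic points from minority-class feature tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), by Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10  # hypothetical majority-class count
new_points = smote(minority, n_new=majority_size - len(minority), k=2)
print(len(minority) + len(new_points))  # 10
```

Setting n_new to the gap between class sizes balances the two classes. Interpolating between real neighbours, instead of duplicating samples, keeps synthetic points inside the region the minority class already occupies.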


2012
Vol 57 (3)
pp. 829-835
Author(s): Z. Głowacz, J. Kozik

The paper describes a procedure for automatically selecting the symptoms that accompany a break in the armature winding coils of a synchronous motor. This procedure, called feature selection, chooses from the full set of features describing the problem the subset that best distinguishes between healthy and damaged states. The amplitudes of the spectral components of the motor current signals were used as features. The full spectra of the current signals are treated as multidimensional feature spaces, and their subspaces are tested. Candidate subspaces are chosen with a genetic algorithm, and their quality is assessed with the Mahalanobis distance measure: the algorithm searches for the subspaces for which this distance is greatest. The algorithm is very efficient and, as confirmed by research, leads to good results. The proposed technique is also applied successfully in many other fields of science and technology, including medical diagnostics.
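The fitness being maximized can be sketched as the squared Mahalanobis distance between the healthy and damaged class means on a candidate feature subset, using a pooled within-class covariance (an assumed estimator). For brevity, the sketch searches every subset of a fixed size exhaustively instead of running a genetic algorithm. That is only feasible for small feature sets, which is why a GA is used on full current spectra. The subset size is fixed because adding features never decreases this distance, so an unconstrained search would always return the full set. The toy feature vectors are made up.

```python
from itertools import combinations


def column_means(rows, feats):
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def pooled_covariance(a, b, feats):
    """Pooled within-class covariance of the selected features (assumed estimator)."""
    k = len(feats)
    cov = [[0.0] * k for _ in range(k)]
    for rows in (a, b):
        means = column_means(rows, feats)
        for r in rows:
            dev = [r[f] - m for f, m in zip(feats, means)]
            for i in range(k):
                for j in range(k):
                    cov[i][j] += dev[i] * dev[j]
    dof = len(a) + len(b) - 2
    return [[c / dof for c in row] for row in cov]


def solve(matrix, vec):
    """Solve matrix @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    m = [row[:] + [v] for row, v in zip(matrix, vec)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(a, b, feats):
    """Squared Mahalanobis distance between the two class means on the given features."""
    diff = [x - y for x, y in zip(column_means(a, feats), column_means(b, feats))]
    return sum(d * s for d, s in zip(diff, solve(pooled_covariance(a, b, feats), diff)))


def best_subset(a, b, n_features, size):
    """Exhaustive stand-in for the GA: the feature subset with the largest distance."""
    return max(combinations(range(n_features), size), key=lambda fs: mahalanobis_sq(a, b, fs))


# Hypothetical spectral-amplitude features: only feature 1 separates the states
healthy = [(1.0, 5.0, 2.0), (1.2, 5.1, 2.2), (0.9, 4.9, 1.9), (1.1, 5.2, 2.1)]
damaged = [(1.0, 8.0, 2.1), (1.1, 8.2, 2.0), (0.9, 7.9, 2.2), (1.2, 8.1, 1.9)]
print(best_subset(healthy, damaged, 3, 1))  # (1,)
```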


2017
Vol 3 (2)
pp. 1-6
Author(s): Ferly Gunawan, M. Ali Fauzi, Putra Pandu Adikara

The rapid growth of mobile applications has led to many apps being created for various purposes to meet users' needs. Each application allows users to post reviews of it, and the purpose of these reviews is to evaluate the product and improve its quality in the future. To this end, sentiment analysis can be used to classify reviews as positive or negative. App reviews usually contain misspellings that make them hard to understand. Misspelled words need to be normalized, that is, converted into standard words, so word normalization is needed to solve the misspelling problem. This study uses word normalization based on Levenshtein distance. In testing, the highest accuracy was obtained with a 70% training / 30% test data split. With an edit-distance threshold of <=2, accuracy reached 100%, the highest result; a threshold of <=1 gave the second highest, 96.4%; and thresholds of <=4 and <=5 gave the lowest, 66.6%. Naive Bayes combined with Levenshtein distance achieved the highest accuracy, 96.9%, compared with 94.4% for Naive Bayes without Levenshtein distance.
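Levenshtein-based word normalization can be sketched as follows: a token that is not a standard word is replaced by the closest dictionary entry, but only when its edit distance falls within a threshold (the abstract reports the best accuracy with a threshold of 2). The tiny Indonesian dictionary is a hypothetical placeholder. The study's dictionary and the downstream Naive Bayes classifier are not reproduced.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalize(token: str, dictionary: tuple, max_edit: int = 2) -> str:
    """Map a token to its nearest standard word if it lies within max_edit edits."""
    if token in dictionary:
        return token
    # ties are broken by dictionary order, since min() keeps the first minimum
    best = min(dictionary, key=lambda word: levenshtein(token, word))
    return best if levenshtein(token, best) <= max_edit else token


dictionary = ("aplikasi", "bagus", "jelek")  # hypothetical standard-word list
for token in ["bgus", "jelk", "aplikasi", "kdang"]:
    print(token, "->", normalize(token, dictionary))
```

Here "bgus" and "jelk" are corrected to "bagus" and "jelek". "kdang" is left unchanged because no entry in the toy dictionary lies within two edits of it. A larger threshold corrects more tokens but also maps more of them to the wrong word, consistent with the lower accuracy the abstract reports for thresholds of 4 and 5.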

