phonetic similarity
Recently Published Documents

Covert audio recordings feature in the criminal justice system in a variety of guises, either on their own or accompanied by video. If legally obtained, such recordings can provide important forensic evidence. However, the quality of these potentially valuable evidential recordings is often very poor and their content indistinct, to the extent that a jury requires an accompanying transcript. At present, in many international jurisdictions, these transcriptions are produced by investigating police officers involved in the case, but transcription is a highly complex, meticulous and onerous task, and police officers are untrained in it and have a vested interest in the influence of the transcript on a case, which gives rise to potential inaccuracy. This paper reports the design and results of a controlled transcription experiment in which eight linguistically trained professional transcribers produced transcripts for an audio recording of a conversation between five adults in a busy restaurant. This recording shares many of the typical features of covert forensic recordings, including the presence of multiple speakers, background noise and use of non-specialist recording equipment. We present a detailed qualitative and quantitative comparison of the transcripts, identifying areas of agreement and disagreement in (a) speaker attribution and (b) the representation of the linguistic content. We find that disagreement between the transcriptions is frequent and various in nature; the most common causes are identified as (i) omission of speech that is included in other transcripts, (ii) variation in the representation of turns, (iii) orthographic variation seemingly motivated by phonetic similarity, and (iv) orthographic variation seemingly not motivated by phonetic similarity. We argue that the variable nature of the transcription of “challenging” audio recordings must be considered in forensic contexts and make recommendations for improving practice in the production of forensic transcriptions.
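
The kind of word-level disagreement described above can be illustrated with a small sketch. It assumes Python's standard `difflib` module as the comparison tool; the abstract does not say what tooling the study used, and the function name `compare_transcripts` and the two sample utterances are invented for illustration.

```python
from difflib import SequenceMatcher


def compare_transcripts(a: str, b: str):
    """Return word-level differences between two transcripts.

    Each difference is (tag, words_in_a, words_in_b), where tag is
    'replace', 'delete' (words only in a) or 'insert' (words only in b).
    """
    words_a = a.lower().split()
    words_b = b.lower().split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example lines, not taken from the study's data.
t1 = "i think he said they went to the bar"
t2 = "i think he said they went to the car later"
diffs = compare_transcripts(t1, t2)
for tag, in_a, in_b in diffs:
    print(tag, in_a, in_b)
```

A 'delete' or 'insert' entry would correspond loosely to omitted speech, while a 'replace' entry flags a span where the two transcribers wrote different words. Deciding whether such a replacement is phonetically motivated would need a separate phonetic comparison.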

Say my name! An empirical study on the pronounceability of identifier names

10.5753/vem.2021.17218, 2021

Author(s): Remo Gresta, Elder Cirilo

Keyword(s): Empirical Study, Open Source, Source Code, Complexity Measure, Code Review, Phonetic Similarity, Complexity Score, Review Sessions, Naming Practices

Identifiers represent approximately 2/3 of the elements in source code, and their names directly impact code comprehension. Indeed, intention-revealing names make code easier to understand, especially in code review sessions, where developers examine each other's code for mistakes. However, we argue that names should be understandable and pronounceable to enable developers to review and discuss code effectively. Therefore, we carried out an empirical study based on 40 open-source projects to explore the naming practices of developers concerning word complexity and pronounceability. We applied the Word Complexity Measure (WCM) to discover complex names; and analyzed the phonetic similarity among names and hard-to-pronounce English words. As a result, we observed that most of the analyzed names are somewhat composed of hard-to-pronounce words. The overall word complexity score of the projects also tends to be significant. Finally, the results show that the code location impacts the word complexity: names in small scopes tend to be simpler than names declared in large scopes.
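
Before a name can be scored for complexity or pronounceability, it presumably has to be broken into its component words. The sketch below shows one common way to do that for camelCase and snake_case identifiers. It is an assumption made for illustration: the abstract does not describe the study's preprocessing, the function name `split_identifier` is hypothetical, and the Word Complexity Measure itself is not implemented here.

```python
import re

# Matches, in order: an acronym followed by a capitalized word ("HTTP" in
# "HTTPResponse"), an optionally capitalized lowercase word, a trailing
# acronym, or a run of digits. Underscores and other symbols are skipped.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name: str) -> list[str]:
    """Split a source-code identifier into lowercase word tokens."""
    return [token.lower() for token in _WORD.findall(name)]


print(split_identifier("parseHTTPResponse"))
print(split_identifier("user_id2"))
```

Each resulting token could then be passed to whatever word-level measure is of interest.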

Automated Analysis of Digitized Letter Fluency Data

Frontiers in Psychology, 10.3389/fpsyg.2021.654214, 2021, Vol 12

Author(s): Sunghye Cho, Naomi Nevler, Natalia Parjane, Christopher Cieri, Mark Liberman, ...

Keyword(s): Language Processing, Test Performance, Successful Implementation, Articulation Rate, Start Time, Phonetic Similarity, Word Duration, Language Characteristics, Semantic Distances, Fluency Task

The letter-guided naming fluency task is a measure of an individual’s executive function and working memory. This study employed a novel, automated, quantifiable, and reproducible method to investigate how language characteristics of words produced during a fluency task relate to fluency performance and inter-word response time (RT), and how they change over the task duration, using digitized F-letter-guided fluency recordings produced by 76 young healthy participants. Our automated algorithm counted the number of correct responses from the transcripts of the F-letter fluency data, and individual words were rated for concreteness, ambiguity, frequency, familiarity, and age of acquisition (AoA). Using a forced aligner, the transcripts were automatically aligned with the corresponding audio recordings. We measured inter-word RT, word duration, and word start time from the forced alignments. Articulation rate was also computed. Phonetic and semantic distances between two consecutive F-letter words were measured. We found that total F-letter score was significantly correlated with the mean values of word frequency, familiarity, AoA, word duration, phonetic similarity, and articulation rate; total score was also correlated with an individual’s standard deviation of AoA, familiarity, and phonetic similarity. RT was negatively correlated with frequency and ambiguity of F-letter words and was positively correlated with AoA, number of phonemes, and phonetic and semantic distances. Lastly, the frequency, ambiguity, AoA, number of phonemes, and semantic distance of words produced significantly changed over time during the task. The method employed in this paper demonstrates the successful implementation of our automated language processing pipelines in a standardized neuropsychological task. This novel approach captures subtle and rich language characteristics during test performance that enhance informativeness and cannot be extracted manually without massive effort. This work will serve as the reference for letter-guided category fluency production similarly acquired in neurodegenerative patients.
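
The timing measures mentioned above can be sketched from forced-alignment output. This is a minimal illustration under stated assumptions: the alignment is taken to be a list of (word, start, end) tuples in seconds, inter-word RT is defined here as the gap between the end of one word and the start of the next, and the function name `fluency_timing` and the sample timings are invented. The abstract does not give the study's exact definitions.

```python
from typing import List, Tuple

# (word, start_seconds, end_seconds), as a forced aligner might report.
Alignment = List[Tuple[str, float, float]]


def fluency_timing(words: Alignment):
    """Compute per-word durations and inter-word response times (RT).

    RT is taken as next word's start minus current word's end; this
    definition is an assumption for illustration.
    """
    durations = [end - start for _, start, end in words]
    rts = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    return durations, rts


# Invented timings for three F-letter responses.
aligned = [("fish", 1.0, 1.5), ("farm", 3.0, 3.5), ("fork", 4.0, 4.5)]
durations, rts = fluency_timing(aligned)
print(durations, rts)
```

Word start times are simply the second field of each tuple. Articulation rate would additionally need a syllable or phone count per word, which is not modeled here.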

Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison

PLoS ONE, 10.1371/journal.pone.0246645, 2021, Vol 16 (2), pp. e0246645

Author(s): Julio Cesar Cavalcanti, Anders Eriksson, Plinio A. Barbosa

Keyword(s): High Frequency, Acoustic Analysis, Low Frequency, Identical Twin, Vowel Quality, Male Adult, Formant Frequencies, Phonetic Similarity, Vowel Formant, Speaker Discrimination

The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels’ acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction and analysis of the first four speech formants F1-F4 were carried out using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, recorded directly through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 formant estimates were automatically extracted from the middle points of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant when following a psychoacoustic criterion. The results revealed consistent patterns regarding the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying a greater speaker-discriminatory power compared to low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. As for non-genetically related speakers, both F3 and F4 displayed a similar high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels. Moreover, stressed vowels displayed a higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found even more explanatory in terms of the observed differences. Although identical twins displayed a higher phonetic similarity, they were not found to be phonetically identical.
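
Since formant values were represented in both Hertz and Bark, a short conversion sketch may help. It uses Traunmüller's (1990) approximation, z = 26.81 f / (1960 + f) - 0.53, without the low- and high-end corrections. This is one common choice and an assumption here: the abstract does not say which Hertz-to-Bark formula the authors applied, and the formant values below are invented.

```python
def hertz_to_bark(f_hz: float) -> float:
    """Approximate Bark value for a frequency in Hz.

    Uses Traunmueller's (1990) formula without its end corrections;
    the study's own conversion may differ.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Invented formant estimates (Hz) for a single vowel token.
formants_hz = {"F1": 700.0, "F2": 1300.0, "F3": 2500.0, "F4": 3600.0}
formants_bark = {name: hertz_to_bark(f) for name, f in formants_hz.items()}
for name, z in formants_bark.items():
    print(f"{name}: {z:.2f} Bark")
```

Because the Bark scale compresses higher frequencies, a fixed difference in Hertz corresponds to a smaller Bark difference for F3 and F4 than for F1, which is why a psychoacoustic criterion can change which differences count as meaningful.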

Qur’an Search System for Handling Cross Verse Based on Phonetic Similarity

Jurnal Sisfokom (Sistem Informasi dan Komputer), 10.32736/sisfokom.v10i1.986, 2021, Vol 10 (1), pp. 46-51

Author(s): Intan Khairunnisa Fitriani, Moch Arif Bijaksana, Kemas Muslim Lhaksmana

Keyword(s): Muslim Community, Search System, Phonetic Similarity, Winkler Method

Searching the large number of verses in the Qur'an manually is difficult and time-consuming. A verse search system that accepts the Indonesian Arabic-Latin transliteration would be very helpful for the Muslim community in Indonesia, especially for those who are not familiar with Arabic writing. This study builds a Qur'an verse search system based on phonetic similarity, with particular attention to the handling of cross-verse queries. The system uses the Jaro-Winkler algorithm to calculate similarity values and the N-Grams algorithm to rank documents. An earlier system, Lafzi+, addressed the same problem and achieved 90% MAP and 93% recall. However, that system could not handle cases such as nun wiqoyah at the end of a verse, so it could not support search across the entire Qur'an, and it did not fully apply the Jaro-Winkler method to calculate similarity. To close these gaps, this study adds rules to the existing ones so that nun wiqoyah at the end of a verse can be handled. With Jaro-Winkler similarity, N-Grams ranking, and the added nun wiqoyah rules, the system achieves 94% MAP and 92% recall. The increase in MAP indicates that the system improves on the accuracy of the earlier one.
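
As a rough illustration of the similarity step, here is a minimal sketch of the standard Jaro-Winkler measure in plain Python. It is not the authors' implementation: the function names, the scaling factor p = 0.1, the four-character prefix cap, and the sample strings are the usual textbook defaults or invented examples. The N-Grams ranking and the nun wiqoyah rules are not shown.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# Classic textbook pair, plus an invented Latin-script query example.
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
print(jaro_winkler("bismillah", "bismilah"))
```

The prefix boost rewards strings that agree at the start, which suits transliterated queries where spelling variation tends to appear later in a word.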

Does the color of your letters depend on your language? The influence of regulatory factors in grapheme-color synesthesia across seven languages.

10.31234/osf.io/y8zuh, 2021

Author(s): Nicholas Root, Michiko Asano, Helena Melero, Chai-Youn Kim, Anton V. Sidoroff-Dorso, ...

Keyword(s): Effect Size, Regulatory Factors, Phonetic Similarity, Letter Frequency

Grapheme-color synesthetes experience graphemes as having a consistent color (e.g., “N is turquoise”). Synesthetes’ specific associations (which letter is which color) are influenced by linguistic properties (letter frequency, phonetic similarity, etc.), and plausibly reveal the characteristics of underlying letter representations. Despite their clear linguistic origin, these influences (termed “Regulatory Factors”, RFs) are almost always studied in a single language, typically English. Here, we measure the influence of three RFs (previously examined only in English) in Dutch, English, Greek, Japanese, Korean, Russian, and Spanish synesthetes. For two RFs, effect size significantly differed between languages. In contrast, an RF which we predicted to be universal (because it is shaped by prelinguistic shape-color associations) did not differ between languages. Our results suggest that monolingual synesthesia studies should be interpreted with caution. Furthermore, they show how synesthetic associations offer an exceptional opportunity to study linguistic and prelinguistic influences on letter representations in different languages.

A Cross-Linguistic Study of L3 Phonological Acquisition of Stop Contrasts

SAGE Open, 10.1177/2158244020985510, 2021, Vol 11 (1), pp. 215824402098551

Author(s): Jiaqi Liu, Jiayan Lin

Keyword(s): First Language, Target Language, Phonological Acquisition, Acoustic Feature, L2 Reading, Reading Task, Third Language, Phonetic Similarity, L3 Acquisition, Initial Stage

The research reported in this article investigated how students learning Japanese or Russian as a third language (L3) perceived and produced word-initial stops in their respective target language and the link between perception and production. The participants in the study were 39 Chinese university students who spoke Mandarin Chinese as their first language (L1), English as their second language (L2), and Japanese or Russian as their L3. An L3 identification task, an L3 reading task, and an L2 reading task were used to investigate the learners’ perception and production of word-initial stops. The results demonstrated that the phonetic similarity in different stop categories between L1, L2, and L3 contributed to learners’ confusion in perception. On the contrary, L3 learners could perceive the new acoustic feature voicing lead, but found it difficult to produce L3 voiced stops. In addition, the study found a positive relationship between the perception and production of voiceless stops in the initial stage of L3 acquisition, but there was no correlation between the perception and production of voiced stops. Pedagogical implications for L3 speech learning are discussed on the basis of the results.

Linguistic Harmony as a Means of Symbolization in Folklore and Poetic Texts

Slovene, 10.31168/2305-6754.2021.10.1.14, 2021, Vol 10 (1), pp. 322-346

Author(s): Alexander V. Gura

Keyword(s): Cultural Context, Traditional Culture, Symbolic Language, Complex Sound, Cultural Contexts, Phonetic Similarity, The World, The Aesthetic, Poetic Texts, Over Time

The article discusses the use of linguistic harmony in traditional culture as a means of symbolisation. In folklore texts, the phonetic similarity of the words heightens the semantic connections between them. This happens when homonyms, paronyms, and other similar-sounding words in the text along with anagrammatic coding of the meaning of the text (for example, riddles) merge, combining two words in one hybrid word paronymous with both of them; by means of phonetic strengthening, complex sound compilation of the text as a whole, as frequently seen in poetry, etc. Symbolic correlations based on verbal consonances usually occur in spells, conjurations, dream interpretations, superstitions, and rituals that have a magical function (prognostic, healing, etc.). In archaic elements of the poetry, the harmony of the words combines an aesthetic function with a magical one (merging more and more with the aesthetic one over time), which allows us to talk about their true syncretism and the magical origins of poetry. Sound and logical-conceptual methods of symbolisation often interact with each other. The symbolism generated by the harmony of words fits into a wide cultural context, revealing deep-semantic cultural parallels from different eras and communities, since the supraindividual memory operating in culture is able to store and bring to life the accumulated connotations. Symbolism arising from the consonances of words has the property of reviving the etymological memory of a word in cultural contexts. In some archaic Slavic zones, symbolism, based on consonant words, still retains its productivity. In the symbolic language of the culture, it also performs a structuring function, takes part in the formation of connections and relationships between the single elements of the traditional picture of the world setting up certain parameters for it, for example, it forms parallels in folk zoology in animal and bird codes, isolating single groups of characters.

Grammaticalization of Progressive Aspect in a Slavic Dialect in Albania

Journal of Language Contact, 10.1163/19552629-bja10012, 2020, Vol 13 (2), pp. 428-458

Author(s): Maxim Makartsev

Keyword(s): The Other, Language Attrition, Language Community, Progressive Aspect, Phonetic Similarity

The article focuses on two markers of progressive aspect that are emerging in a Balkan Slavic dialect in Albania, presumably under Albanian influence. One of them dates back to locative (ǵe ‘where’). Two processes intertwine on the grammaticalisation path of the other (toko): originally an adversative conjunction (‘but’), it was structurally mapped to its polysemic (adversative, but also affirmative, progressive, conditional) Albanian counterpart po. At the same time, its choice to mark progressive was additionally motivated by the phonetic similarity with another Albanian progressive marker duke. In the first third of the 20th century both markers were used as synonyms. However, during the subsequent process of language attrition the language community in question split into three groups regarding the use of the markers: of the last six remaining speakers one speaker used only ǵe as an optional marker; one speaker used toko as an optional marker; four other speakers used toko as a regular progressive marker.

DONBETTYR: THE ANCIENT NAME OF THE ANCIENT DEITY

Известия СОИГСИ, 10.46698/e1905-4145-3142-v, 2020

Author(s): Э.Т. Гутиева

Keyword(s): High Frequency, Phonetic Similarity, Frequency Character

The theonym Donbettyr, the name of the water deity of the Ossetian Nart epic, is traditionally considered an example of onomastic Christianization: assigning the name of a Christian saint to a pagan god. The paper considers V. I. Abaev's hypothesis that the ancient deity of the Ossetian Nart pantheon was renamed in honor of the Apostle Peter. The alternative etymologies proposed in this work not only indicate the archaic nature of the image of the deity but also allow us to assume that his name is archaic as well. The postpositive element of the theonym, *Bettyr, is interpreted through a common Indo-European root meaning "father". The main arguments in support of this approach are the high-frequency character of the element meaning "father, parent" among the names of divine entities and the phonetic similarity between the elements. Marked by the element *pəter-, many of the supreme gods of the Indo-Europeans are connected in one etymological nest. This approach suggests onomastic connections between Donbettyr and the Olympian gods and the supreme Roman god. The presumption that the name of this representative of the Nart pantheon may fossilize a Proto-Indo-European formula, one that denoted the supreme god and was preserved in other ancient epic systems of the Indo-European peoples, requires further research. It is also important to examine the theonyms of other epic systems and hagiographic literature, especially the epithets of saints, and to analyze a number of ethnonyms.


Specifying Challenges in Transcribing Covert Recordings: Implications for Forensic Transcription
