actual word
Recently Published Documents



2022 ◽  
Vol 23 (1) ◽  
pp. 82-94
Author(s):  
Febiarty Wulan Suci ◽  
Nur Hayatin ◽  
Yuda Munarko

Stemming plays an important role in text processing. Stemming differs from language to language and is strongly affected by the language of the text, since each language has its own rules for forming words with affixes. A large number of the words used in Indonesian are formed by combining root words with affixes and other combining forms. One of the problems in Indonesian stemming is that the language has several types of affixes, including some prefixes that change according to the first letter of the root word. Implementing the Idris stemmer for Indonesian text is of interest because Indonesian and Malay share the same language root. However, the results do not always produce the actual word, because the Idris algorithm first removes the prefix according to Rule 2. This removal directly affects the result of the Idris stemmer when it is applied to Indonesian text. In this study, we focus on modifying the Idris stemmer (from Malay) into IN-Idris for the Indonesian context. To test the proposed modification of the original algorithm, excerpts from Indonesian online novels are used to measure the performance of IN-Idris. A test was conducted to compare the proposed algorithm with other stemmers. In the experiment, IN-Idris had an accuracy of approximately 82.81%, an improvement of up to 5.25% over the accuracy of Idris. Moreover, the proposed stemmer also ran faster than Idris, with a speed gap of around 0.25 seconds.
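The point about prefixes that change with the first letter of the root can be illustrated with a minimal sketch. This is not the Idris or IN-Idris algorithm, whose rules are not given in the abstract; it is an assumed, simplified treatment of the Indonesian prefix meN- only, and the root dictionary `ROOTS`, the table `MEN_ALLOMORPHS`, and the function `stem` are hypothetical names introduced for illustration. The dictionary check shows why stripping a prefix blindly may fail to produce an actual word.

```python
# Toy root dictionary (hypothetical; a real stemmer would use a full lexicon).
ROOTS = {"tulis", "pukul", "sapu", "kirim", "baca", "ajar"}

# Allomorphs of the prefix meN-, longest first, each paired with the
# root-initial consonants that may have been dropped on attachment
# ("" means nothing was dropped).
MEN_ALLOMORPHS = [
    ("meny", ["s"]),
    ("meng", ["k", ""]),
    ("mem", ["p", ""]),
    ("men", ["t", ""]),
    ("me", [""]),
]


def stem(word, roots=ROOTS):
    """Return the dictionary root of a meN-prefixed word, else the word itself."""
    if word in roots:
        return word
    for prefix, restored in MEN_ALLOMORPHS:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            for consonant in restored:
                candidate = consonant + rest
                # Accept a candidate only if it is an actual word in the lexicon.
                if candidate in roots:
                    return candidate
    # No candidate is a known root: leave the word untouched.
    return word
```

With this toy lexicon, `stem("menulis")` restores the dropped "t" and yields "tulis", while a word such as "meja", which merely begins with "me", is returned unchanged because "ja" is not in the dictionary.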


Diachronica ◽  
2021 ◽  
Author(s):  
Timotheus A. Bodt ◽  
Johann-Mattis List

Abstract. While analysing lexical data of Western Kho-Bwa languages of the Sino-Tibetan or Trans-Himalayan family with the help of a computer-assisted approach for historical language comparison, we observed gaps in the data where one or more varieties lacked forms for certain concepts. We employed a new workflow, combining manual and automated steps, to predict the most likely phonetic realisations of the missing forms in our data, by making systematic use of the information on sound correspondences in words that were potentially cognate with the missing forms. This procedure yielded a list of hypothetical reflexes of previously identified cognate sets, which we first preregistered as an experiment on the prediction of unattested word forms and then compared with actual word forms elicited during secondary fieldwork. In this study we first describe the workflow which we used to predict hypothetical reflexes and the process of elicitation of actual word forms during fieldwork. We then present the results of our reflex prediction experiment. Based on this experiment, we identify four general benefits of reflex prediction in historical language comparison. These comprise (1) an increased transparency of linguistic research, (2) an increased efficiency of field and source work, (3) an educational aspect which offers teachers and learners a wide plethora of linguistic phenomena, including the regularity of sound change, and (4) the possibility of kindling speakers’ interest in their own linguistic heritage.


2021 ◽  
pp. 14-21
Author(s):  
С.А. Шаров

The paper discusses the issues in creating frequency dictionaries for language teaching, taking into account such parameters as the sources of corpora, actual word frequencies, dependence on document length, and the variation in topics and genres across corpora. It provides examples of problems with frequency lists and gives recommendations for the practical use of frequency dictionaries. In addition to the size of the corpus, the content of frequency dictionaries is influenced by words that are frequent within long documents, since they lead to frequency bursts, and by how well the topics and genres represented in a corpus match the learning objectives, since corpora from different subject areas and genres can produce radically different frequency profiles.
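The effect of frequency bursts can be shown with a small sketch that compares raw counts with document counts (the number of documents in which a word occurs). Document count is used here only as one common, simple dispersion measure; it is an assumption for illustration and not necessarily the measure the paper recommends. The function `frequency_profiles` and the three-document `corpus` are hypothetical.

```python
from collections import Counter


def frequency_profiles(documents):
    """Return (raw_counts, document_counts) for a list of tokenised documents."""
    raw = Counter()
    doc = Counter()
    for tokens in documents:
        raw.update(tokens)       # every occurrence counts
        doc.update(set(tokens))  # each word counts once per document
    return raw, doc


# Hypothetical corpus: the first document repeats one topical word.
corpus = [
    ["the", "hobbit", "met", "the", "hobbit", "and", "the", "hobbit"],
    ["the", "house", "is", "old"],
    ["a", "house", "and", "a", "garden"],
]

raw, doc = frequency_profiles(corpus)
```

Here "hobbit" outranks "house" by raw count (3 against 2) although it occurs in a single document, whereas by document count "house" (2) outranks "hobbit" (1). A frequency list for teaching could use such a comparison to flag words whose rank depends on one long or topical document.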


2020 ◽  
Vol 228 (4) ◽  
pp. 254-263 ◽  
Author(s):  
Pedro S. Mendes ◽  
Karlos Luna ◽  
Pedro B. Albuquerque

Abstract. The present study tested if word frequency effects on judgments of learning (JOLs) are exclusively due to beliefs or if the direct experience with the items also plays a role. Across four experiments, participants read prompts about the frequency of the words (high/low), which could be congruent/incongruent with the words’ actual frequency. They made pre-study JOLs (except Experiment 1b), immediate JOLs, and completed a recall test. If experience drives the effect, JOLs should be based on actual word frequency rather than the prompts. Results showed higher pre-study JOLs for prompts of high frequency, but higher immediate JOLs for high-frequency words regardless of the prompt, suggesting an effect of direct experience with the words. In Experiments 2 and 3, we manipulated participants’ beliefs, finding a small effect of beliefs on JOLs. We conclude that, regarding word frequency, direct experience with the items seems more relevant than beliefs when making immediate JOLs.


2020 ◽  
Vol 29 (1) ◽  
pp. 27-50
Author(s):  
DAVID MAW

Abstract. Word setting in Machaut's refrain songs poses a problem, for whilst it is clearly indicated in the manuscripts, it often does not comply with recognised principles or values. To understand the situation, a dualistic relationship of words and music is proposed. It is founded in the coordinated but independent operation of principles of musical mimesis and musico-poetic dislocation. The music is constructed at a primary level as an imitation of the poetic form; but it is fundamentally independent of this model and may thus be detached from it and displaced against it. Devices such as ‘cross-cadencing’, ‘quasi-declamation’, ‘complementary-cadence inversion’ and ‘dissonance’ between implied and actual word setting are manifestations of this technique. The proposal accounts on the same basis for both the close relationship of words and music observable in the virelais and for the more abstract connection apparent in the rondeaux. There is a technical unity at work across the genres in Machaut's song composition.


Author(s):  
Claudio Iacobini

The term parasynthesis is mainly used in modern theoretical linguistics in the meaning introduced by Arsène Darmesteter (1874) to refer to denominal or deadjectival prefixed verbs of the Romance languages (Fr. embarquer ‘to load, to board’) in which the non-prefixed verb (barquer) is not an actual word, and the co-radical nominal form (embarqu-) is not well formed. The Romance parasynthetic verb is characterized with reference to its nominal or adjectival base as the result of the co-occurrence of both a prefix and a suffix (typically of a conversion process, i.e., non-overt derivational marking). The co-occurrence or simultaneity of the two processes has been seen by some scholars as a circumfixation phenomenon, whereby two elements act in combination. The peculiar relationship existing between base and parasynthetic verb is particularly problematic for an Item and Process theoretical perspective, since this approach entails the application of one process at a time. Conversely, a Word and Paradigm framework deals more easily with parasynthetic patterns, as parasynthetic verbs are put in relation with prefixed verbs and verbs formed by conversion, without being undermined either by gaps in derivational patterns or by the possible concomitant addition of prefixes and suffixes. Due to their peculiar structure, parasynthetic verbs have been a matter of investigation even for non-specialists of Romance languages, especially from a synchronic (or, better said, achronic) point of view. Attention has also been paid to their diachronic development in that, despite being characteristic of the Romance languages, parasynthetic verbs were already present, although to a lesser extent, in Latin. 
The diachronic development of parasynthetic verbs is strictly connected with that of spatial verb prefixes from Latin to the Romance languages, with particular reference to their loss of productivity in the encoding of spatial meanings and their grammaticalization into actionality markers. Parasynthetic verbs have been in the Romance languages since their earliest stages and have shown constant productivity and diffusion in all the Romance varieties, thus differing from spatial prefixes, which underwent a strong reduction in productivity in combination with verbs. The term parasynthetic is sometimes also used to refer to nouns and adjectives derived from compounds or in which both a prefix and a suffix are attached to a lexical base. In the case of nominal and adjectival formation, there is much less consensus among scholars on the need to use this term, as well as on which processes should fall under this label. The common denominator of such cases consists either in the non-attestation of presumed intermediate stages (Sp. corchotaponero ‘relative to the industry of cork plugs’) or in the non-correspondence between sense and structure of the morphologically complex word (Fr. surnaturel ‘supernatural’).


The presence of Malaysian millennials on social media platforms, particularly Twitter, is gaining increasing attention. Language-wise, many of them predominantly use English and Malay in their tweets, but with a touch of their own “styles” in various morphological aspects. This trend eventually leads to a rampant use of distorted vocabulary, churning out many non-standard words. This study aims to classify the types of morphological distortion in words that are widely used among Malaysian millennials and to identify the reasons behind the trend. A total of 50 active Twitter users from Malaysia aged 18 to 30 were randomly chosen for this study. From each user, 20 tweets longer than 5 words were selected for lexical analysis, giving a total of 1000 tweets (8443 words). Interviews were then conducted with 30 participants to gauge the factors behind the use of those non-standard words. The findings revealed that the words were largely distorted in their inflections so as to fit certain sounds. Also, most distorted words were deliberately coined so that the millennials would appear trendy, while some users were merely following the usage without knowing the actual word. This study has shown that the use of distorted words among Malaysian Twitter users did not hinder effective communication.


2019 ◽  
Author(s):  
Нигина Башировна Тухтасинова

The article analyzes the formal and semantic features of French tourism terminology. It examines the productive models of its formation, as well as the role that native and borrowed elements play in it. The conclusion is drawn that the vocabulary belonging to this term system is predominantly authentic in character and reflects the current word-formation processes taking place in modern French.


2019 ◽  
Author(s):  
Sarah Trapp ◽  
Padraig O'Seaghdha

Preparation of unattached phonological fragments such as word-initial consonants has been conceived of as partial production and thus as revealing fundamental processes of phonological encoding in language production. However, recent evidence of flexible preparation (O’Séaghdha & Frazer, 2014) challenges the partial production view. Instead, preparation may be mediated by high-level attention to meta-linguistically accessible elements, such as starting points for phonological encoding. If so, preparation may occur even if it is often inapplicable. In a word naming experiment, preparation was equally present for sets containing three consistent items whether they were embedded with single or with multiple inconsistent exceptions. Unlike in previous work, preparation was not evident immediately after single inconsistent items. These results lead us to further specify the theory of attentional form preparation as follows: Preparation of starting points in word production not only involves sustained attention to an abstract symbolic construal of the current task situation (e.g., /b/s are relevant), but also opportunistic transient deployment of that construal in conjunction with target word selection. This proposal underscores the need for coordination among distinct processes of anticipatory preparation and actual word production. More broadly, phonological encoding provides a rich but underexplored domain for the study of complex attention.


2019 ◽  
Vol 10 ◽  
pp. 175-202
Author(s):  
Till Vogt

In the case of Breton, many attempts have been made to determine its historically developed word order. Proposals in this regard range from VSO (Timm) through V2 (Schafer) to SVO (Varin). This paper shows that traditional Breton has a preference for V2 positioning within a VSO-type framework. Lower Sorbian is a language with a rich morphology and consequently shows a relatively flexible word order. However, in unmarked declarative sentences it is normally the subject that occurs in sentence-initial position, whereas the verb does not seem to prefer any specific position. Having determined the word order in the traditional varieties of Breton and Lower Sorbian, the paper gives an outlook on potential changes in their actual word order under language contact.

