parts of speech
Recently Published Documents



Author(s):  
Pragya Katyayan ◽  
Nisheeth Joshi

Hindi is the third most-spoken language in the world (615 million speakers) and has the fourth-highest number of native speakers (341 million). It is an inflectionally rich language with relatively free word order and an immense vocabulary. Despite its global prominence, very few Natural Language Processing (NLP) applications and tools have been developed to support it computationally. Moreover, most of the existing ones are not efficient enough because they lack semantic information (or contextual knowledge). Hindi grammar is based on Paninian grammar and derives most of its rules from it. Paninian grammar strongly emphasizes the role of karaka theory in free word-order languages. In this article, we present an application that extracts all possible karakas from simple Hindi sentences with an accuracy of 84.2% and an F1 score of 88.5%. We consider features such as part-of-speech tags, post-position markers (vibhaktis), semantic tags for nouns, and syntactic structure to capture context in word windows of different sizes within a sentence. Using these features, we built a rule-based inference engine to extract karakas from a sentence. The application takes a text file of clean (punctuation-free) simple Hindi sentences as input and writes karaka-tagged sentences to a separate text file as output.
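To make the idea of rule-based karaka extraction concrete, the following is a minimal sketch, not the authors' inference engine. It assumes a simplified textbook mapping from post-position markers (vibhaktis) to karaka labels, a hypothetical tag set (`NOUN`, `PRON`, `PSP`, `VERB`), and a one-word look-ahead window; the abstract's semantic noun tags and larger windows are not modeled here.

```python
# Hypothetical, simplified mapping from vibhaktis to karaka labels.
# Real Hindi usage is more ambiguous (e.g. "से" can also mark apadana),
# which is why the paper also uses semantic tags and wider context.
VIBHAKTI_TO_KARAKA = {
    "ने": "karta",        # agent
    "को": "karma",        # object
    "से": "karana",       # instrument
    "में": "adhikarana",  # location
    "पर": "adhikarana",   # location
}


def tag_karakas(tagged_tokens):
    """Attach a karaka label to each noun or pronoun followed by a known vibhakti.

    tagged_tokens: list of (word, pos) pairs using the assumed tag set.
    Returns a list of (word, pos, karaka_or_None) triples.
    """
    result = []
    for i, (word, pos) in enumerate(tagged_tokens):
        karaka = None
        if pos in ("NOUN", "PRON") and i + 1 < len(tagged_tokens):
            next_word, next_pos = tagged_tokens[i + 1]
            if next_pos == "PSP":
                karaka = VIBHAKTI_TO_KARAKA.get(next_word)
        result.append((word, pos, karaka))
    return result


# "Ram cut the fruit with a knife."
sentence = [
    ("राम", "NOUN"), ("ने", "PSP"),
    ("चाकू", "NOUN"), ("से", "PSP"),
    ("फल", "NOUN"), ("काटा", "VERB"),
]
tagged = tag_karakas(sentence)
```

In this sketch an unmarked noun such as "फल" receives no label, which illustrates why a purely marker-based rule is insufficient and why additional features are needed.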


2022 ◽  
Vol 13 (1) ◽  
pp. 166-179
Author(s):  
Dac Phat Dinh

This study explores how the meanings of prepositions shift when they are translated from English into Vietnamese. Besides analyzing cases of such shifts, it discusses the general variety of meanings of English prepositions. Methods of analysis and synthesis of theories from the available data on prepositions, together with methods of classifying and systematizing prepositions, were applied to English-Vietnamese translation. From the collected data, the study identifies 6 cases of shift in the meanings of prepositions and describes the characteristics of their multiple meanings. In the course of translation, contextual meanings are used to convey the sense appropriately in Vietnamese style. The paper can contribute to the teaching of translation and serve as reference material for English learners.


2022 ◽  
Vol 14 (1) ◽  

POS (Parts of Speech) tagging, a vital step in diverse Natural Language Processing (NLP) tasks, has not drawn much attention in the case of Odia, a computationally under-developed language. This work proposes a hybrid method for building a robust POS tagger for Odia. Given the rich morphology of the language and the lack of sufficient annotated text corpora, the tagger combines machine learning with linguistic rules. The tagger is trained on a tagged text corpus from the tourism domain and obtains a perceptible improvement in results. Appreciable performance is also observed on news article texts from varied domains. In experiments on Odia, the proposed algorithm outperforms existing methods such as rule-based tagging, hidden Markov models (HMM), maximum entropy (ME), and conditional random fields (CRF).


2021 ◽  
pp. 587-595
Author(s):  
Alebachew Chiche ◽  
Hiwot Kadi ◽  
Tibebu Bekele

Natural language processing plays a major role in providing an interface for human-computer communication. It enables people to talk with the computer in their own language rather than machine language. This study presents a part-of-speech (POS) tagger that assigns a word class to each word in a given sentence. Researchers have developed POS taggers for languages such as English, Amharic, Afan Oromo, and Tigrigna. Many other languages, such as Shekki’noono, still lack a POS tagger. A POS tagger is a basic component of most natural language processing tools, such as machine translation and information extraction. Developing a POS tagger for a language is therefore a prerequisite for building more advanced natural language applications, which in turn enhance machine-to-machine, machine-to-human, and human-to-human communication. However, a POS tagger built for one language cannot be applied directly to another. To develop the Shekki’noono POS tagger, we used a stochastic Hidden Markov Model (HMM). We used 1500 sentences collected from different sources such as newspapers (covering social, economic, and political topics), modules, textbooks, radio programs, and bulletins. Language experts labeled each word in the collected sentences with its appropriate part of speech. The tagger was then trained on the training sets using the HMM. In our experiments, HMM-based POS tagging achieved 92.77% accuracy for Shekki’noono. The POS tagger model is also compared with previous HMM-based experiments reported in related work. As future work, the proposed approach can be evaluated on a larger corpus.
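As an illustration of how an HMM-based tagger of this kind can work, here is a minimal sketch of a bigram HMM with Viterbi decoding. It is not the authors' implementation: the toy English corpus, the tag names, the function names, and the add-one smoothing are all assumptions made for the example, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing is an assumption here)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for a non-empty word list."""
    trans, emit, tags, vocab = model
    n_words = len(vocab) + 1  # one extra slot for unknown words
    n_tags = len(tags)
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(corpus)
predicted = viterbi(["the", "cat", "barks"], model)
```

Token-level accuracy, the kind of figure reported in the abstract, would then be the fraction of words in held-out sentences whose predicted tag matches the expert label.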


2021 ◽  
pp. 35-45
Author(s):  
L. SHYTYK ◽  
D. KULINICH

The article is devoted to the study of lexical and grammatical features of epistolary addresses (based on “Letters to Oles Honchar”, compiled by M. Stepanenko). The address is interpreted as one of the manifestations of human communication needs: it serves to establish and maintain speech contact and to express the emotional and evaluative characteristics of the interlocutor. An epistolary address is a word or phrase by which the author of a letter names the addressee in the text of a written message to establish contact. We processed 895 letters to Oles Honchar, in which 1185 addresses in Ukrainian and about 200 units in other languages were recorded. Lexically, the addresses belong to the following semantic groups: addresses-anthroponyms (name, patronymic and surname); traditional etiquette forms (пан, товариш); general addresses (names of persons by generic or gender feature; names of persons by kinship in a figurative sense; names of persons by friendly relations); special addresses (names by profession, type of activity, position, academic titles); occasional addresses. Most often, senders address Oles Honchar by patronymic or by name, using it in full or in short form, and sometimes by surname. The lexical and semantic content of addresses depends on the intention of the speaker, his politeness, knowledge of language etiquette and the peculiarities of the relationship with the writer. To strengthen the address, attributive modifiers expressed by honorific and emotional-evaluative adjectives are used. Honorific adjectives (шановний, високошановний, найшанованіший, глибокошановний, вельмишановний, високоповажний, etc.) convey a polite attitude and perform an etiquette function. Emotional-evaluative adjectives (дорогий, славний, щирий, незабутній, рідний, любий, коханий, etc.) denote sincerity, friendliness, and affection, and perform an evaluative function.
We reveal a significant proportion of constructions in which adjectives of both groups are used. This causes a change in the tonality of the communicative situation and reduces interpersonal distance. Possessive pronouns мій, наш, which have partially lost the meaning of possessiveness, strengthen the intimacy, cordiality and sincerity of the relationship. Addresses in Russian, Belarusian, Polish and English are described. It is found that the grammatical differentiation of addresses directly depends on lexical and grammatical features (proper or common names and substantivized parts of speech) and morphological means of their expression. It is confirmed that the typical morphological form of addresses is the vocative case of the noun, as well as the homonymous nominative case in letters written during the Soviet period. Violations of morphological norms (different case forms of lexical phrase components, a non-normative form Олесе) and orthographic mistakes in spelling of the writer’s patronymic are revealed. The non-normative form of the nominative case as a means of expressing the address in letters dated 1990–1995 is substantiated. The results of the research show that the most frequent lexeme is Олесю Терентійовичу. Forms Олесь Терентійович and Олесю are less used. Quantitative indicators of addressing forms are summarized in the table. We see the prospect of further scientific research in deepening other vectors of analysis of addresses, in particular in the study of their functional and stylistic potential.


Author(s):  
E.N. Popova

The relevance of this work stems from the fact that conjunctions in the dialects of the Komi language are poorly studied. The study of conjunctions matters not only for the problem of how the conjunction formed as a part of speech, but also for problems related to the increasing complexity of the sentence and, with it, of the syntactic structure of the language. Because descriptions of conjunctions in the dialects of the Komi language are lacking, we continue our research on "Conjunctions in the Komi-Zyryan dialects" in order to create generalizing works based on such descriptions in the future. The object of the study is the temporal conjunctions that function in the dialects of the Komi language. The scientific novelty of the study is that it considers, for the first time, temporal conjunctions not previously described in the dialects of the Komi language. Their inventory in the dialects is identified; their origin, the methods and ways of their formation, and their genetic connection with other parts of speech are established; and their structural features and the peculiarities of their use in a sentence are determined. The study used descriptive-analytical, comparative, and etymological methods, as well as the method of lexicographic search.


2021 ◽  
Author(s):  
Zhiming Bao ◽  
Luwen Cao ◽  
Kunmei Han ◽  
Lin Li ◽  
Jia Wen Hing ◽  
...  

It is well-documented that patients with semantic dementia and Alzheimer’s disease present with difficulty in lexical retrieval and reversal of the concreteness effect in nouns and verbs. Little is known about the lexical phenomena before the onset of symptoms. We anticipate that there are linguistic signs in the speech of people who suffer from mild cognitive impairment (MCI), the prodromal stage of dementia. Here, we report the results of a novel corpus-linguistic approach to the early detection of cognitive impairment. We recorded 40 hours of natural, unconstrained speech of 188 English-speaking Singaporeans; 90 are diagnosed with MCI (51 amnestic, 39 nonamnestic), and 98 are cognitively healthy. The recordings yield 327,470 words, which are tagged for parts of speech. We calculate the per-minute speech rates and concreteness scores of nouns and verbs, and of all tagged words, in our dataset. Our analysis shows that the two measures of nouns and verbs identify different subtypes of MCI. Compared with healthy controls, subjects with amnestic MCI produce fewer but more abstract nouns, whereas subjects with nonamnestic MCI produce fewer but more concrete verbs. Cognitive impairment is manifested in ordinary language before the presentation of clinical symptoms, and can be detected through non-invasive corpus-based analysis of natural speech.
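The two measures described above, per-minute speech rate and concreteness score for a given part of speech, could be computed along the following lines. This is a sketch under stated assumptions rather than the study's pipeline: the function name, the tag labels, and the concreteness ratings in the toy lexicon are hypothetical, and the averaging of ratings over rated words is one plausible reading of "concreteness score".

```python
def speech_measures(tagged_words, minutes, concreteness, pos):
    """Return (words per minute, mean concreteness) for one part of speech.

    tagged_words: list of (word, pos_tag) pairs from a transcript.
    minutes: duration of the recording in minutes.
    concreteness: dict mapping a word to a concreteness rating.
    pos: the part-of-speech tag to measure, e.g. "NOUN" or "VERB".
    """
    selected = [word for word, tag in tagged_words if tag == pos]
    rate = len(selected) / minutes
    # Words missing from the lexicon are skipped rather than guessed.
    scores = [concreteness[w] for w in selected if w in concreteness]
    mean_score = sum(scores) / len(scores) if scores else None
    return rate, mean_score


# Hypothetical ratings on an assumed 1 (abstract) to 5 (concrete) scale.
toy_lexicon = {"dog": 5.0, "idea": 2.0, "table": 5.0, "run": 4.0}
transcript = [
    ("dog", "NOUN"), ("run", "VERB"), ("idea", "NOUN"), ("table", "NOUN"),
]
noun_rate, noun_concreteness = speech_measures(transcript, 2.0, toy_lexicon, "NOUN")
verb_rate, verb_concreteness = speech_measures(transcript, 2.0, toy_lexicon, "VERB")
```

Comparing such per-speaker values between groups (for example, amnestic MCI versus healthy controls) is the kind of analysis the abstract reports; the statistical comparison itself is not shown here.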


2021 ◽  
pp. 235-244
Author(s):  
Joanna Pieczonka

The article concerns the Latin Grammar by Emilia Kubicka, published in 2019. The book presents the rules of Latin pronunciation, conjugation, and the declension of nouns, adjectives, numerals, and pronouns, as well as the indeclinable parts of speech such as adverbs and prepositions. However, the grammar does not present all the principles that govern the structure of Latin sentences, and the book contains numerous errors.


Author(s):  
Kateryna Sheremeta

Based on a systematic approach, the article presents the author's thematic groups of English-language terminological units in the specialized language of higher education in the United States. It attempts to clarify, on a scholarly basis, the correlation between the concepts of “thematic group” and “lexical-semantic group”. The thematic classification of lexical units, the most common way of combining words, is grounded in the internal connections of objects and phenomena of reality and is determined by the subject-logical features and common functional purpose of these units. Thematic groups of a given branch terminological system can contain several nuclear lexical-semantic groups, and their units are characterized by a clear differentiation of features. In interpreting the concept of thematic group, modern linguistics aims to determine the ways and features of semantic development (extralinguistic aspects) not of individual words, but of groups of lexemes that share one semantic orientation. A thematic group is a group of words selected and combined on the basis of common subject-logical connections; these words either belong to the same part of speech or come from other parts of speech needed to reveal a common theme. In the systematic study of the English terminology of U.S. higher education at the conceptual level, the author arranges the terms in a subject-matter classification, which results in combining terms into thematic groups. Thematic classification involves a clear, logically sound organization of terminological vocabulary. The classification of the terminology of American higher education is based on determining its content and establishing the semantic scope of each term and of the concept that links it with other terminological units in a single terminology system of U.S. higher education.


Author(s):  
Valentyna Nagnybida ◽  
Olga Vashchylo

Abstract. The article analyzes the means of syntactic compression and the methods of their translation, and studies how frequently each means and each translation method is used in mechanical engineering texts. The research material consists of syntactic units selected from mechanical engineering texts that demonstrate syntactic compression, together with their translations into Ukrainian. Examples were selected from manuals on mechanical engineering, operating instructions for various devices, and bilingual sites of international engineering companies such as SKF and DENSO. The results show that the most productive means of syntactic compression in mechanical engineering texts are the infinitive (27%), gerund (18%), and adjective (16%). Less productive means are ellipsis (11%), replacement (10%), syntactic reduction (9%), and nominalization (9%). A study of how English compressed structures are translated, depending on the means of syntactic compression used, showed the following: the infinitive is most often translated by a complex sentence of purpose (65.4%) and least often by a complex sentence of cause and effect; the adjective is most often translated by a complex sentence with an attribute (72.8%) and least often by unfolding (1.2%); the gerund is most often translated by replacing parts of speech (78.7%) and least often by a complex sentence of mode of action (1.1%); ellipsis is most often translated by unfolding (80%) and least often by complex sentences of purpose (1.7%), time (1.7%), and admission (1.7%); nominalization is most often translated by a complex sentence of purpose (40.7%) and least often by a complex sentence with an attribute (1.9%); syntactic reduction is most often translated by unfolding (89.4%) and least often by explanatory translation (10.6%); replacement is most often translated by unfolding (75.5%) and least often by explanatory translation (24.5%).
Keywords: speech compression; syntactic compression; syntactic compression means; compressed structures; translation methods; mechanical engineering texts.

