Linguistic approach to the classification problem based on the multiset theory

2021, Vol 1047 (1), pp. 012083
Author(s): L A Demidova, Ju S Sokolova
2022, pp. 171-195
Author(s): Jale Bektaş

Natural language processing (NLP) for Turkish is considerably harder than for other languages written in the Latin alphabet, such as English. In this study, text mining techniques are used to build a pre-processing framework in which TF-IDF values are calculated, following a linguistic approach, for 7,731 tweets posted on Twitter by 13 well-known economists in Turkey. The classification results of four common machine learning methods (SVM, Naive Bayes, LR, and an integration of LR with SVM) are then compared. The TF-IDF feature representations are tested with different n-grams. The findings show that the success of text classification depends on the feature representation method, and that SVM outperforms the other ML methods with unigram features. The best result, an accuracy of 82.9%, is obtained by integrating SVM with LR. These results indicate that these methods perform satisfactorily for Turkish.
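TF-IDF weighting over different n-gram ranges, as described above, can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization of already lowercased text, raw-count TF normalized by document length, and IDF = log(N/df). The study's actual Turkish pre-processing (stemming, normalization) is not reproduced, and the function names `ngrams` and `tfidf` and the example tweets are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_range=(1, 1)):
    """TF-IDF weights per document for n-grams with lo <= n <= hi.

    TF  = count of the term in the document / total terms in that document.
    IDF = log(N / df), where df is the number of documents containing the term.
    """
    lo, hi = n_range
    features = []
    for doc in docs:
        tokens = doc.split()  # assumption: input is already lowercased and normalized
        terms = []
        for n in range(lo, hi + 1):
            terms.extend(ngrams(tokens, n))
        features.append(Counter(terms))

    # Document frequency of each term across the collection.
    df = Counter()
    for counts in features:
        df.update(counts.keys())

    n_docs = len(docs)
    weights = []
    for counts in features:
        total = sum(counts.values())
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights
```

For example, with `docs = ["faiz oranı arttı", "faiz oranı düştü", "enflasyon arttı"]` ("the interest rate rose", "the interest rate fell", "inflation rose"), `tfidf(docs, (1, 2))[0]` gives the bigram "oranı arttı", which occurs in one document, a higher weight than "faiz oranı", which occurs in two. Vectors like these would then be passed to classifiers such as SVM or LR.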


2013, Vol 6 (2), pp. 176-191
Author(s): Ester Vidović

The article explores how two cultural models that were dominant in Great Britain during the Victorian era, the model based on the philosophy of ‘technologically useful bodies’ and the Christian model of empathy, were connected with the understanding of disability. The two cultural models are metaphorically constituted and based on the ‘container’ and ‘up and down’ image schemas, respectively. The intersubjective character of cultural models is foregrounded, in particular, in the context of conceiving of abstract concepts such as emotions and attitudes. The issue of disability is addressed through a cognitive linguistic approach to literary analysis, by studying how the two cultural models are reflected in the portrayal of the main characters of Charles Dickens's A Christmas Carol. The studied cultural models appeared to be relatively stable, while their evaluative aspects proved to be subject to historical change. The article suggests directions for further study, which could include research on the connection between, on the one hand, readers' empathy with fictional characters, roused by reading Dickens's works and influenced by the cultural models dominant in Victorian Britain, and, on the other hand, the actual actions his contemporaries took to improve the social position of disabled people in Victorian Britain.


2015, Vol 11 (1), pp. 41-54
Author(s): Zsófia Demjén

This paper demonstrates how a range of linguistic methods can be harnessed in pursuit of a deeper understanding of the ‘lived experience’ of psychological disorders. It argues that such methods should be applied more widely in medical contexts, especially in the medical humanities. Key extracts from The Unabridged Journals of Sylvia Plath are examined as a case study of the experience of depression. Combinations of qualitative and quantitative linguistic methods, together with inter- and intra-textual comparisons, are used to identify distinctive patterns in the use of metaphor, personal pronouns and (the semantics of) verbs, as well as other relevant aspects of language. Qualitative techniques provide in-depth insights, while quantitative corpus methods make the analyses more robust and provide the breadth needed to gain insight into the individual experience. Depression emerges as a highly complex and sometimes potentially contradictory experience for Plath, involving both a sense of apathy and inner turmoil. It involves a sense of a split self, trapped in a state that one cannot overcome; intense self-focus, a turning in on oneself; and a view of the world that is both more negative and more polarized than the norm. It is argued that a linguistic approach is useful beyond this specific case.
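On the quantitative side, normalized frequencies (occurrences per 1,000 words) allow pronoun use to be compared across extracts of different lengths. The sketch below is a minimal illustration, assuming a simple regex tokenizer and a hypothetical first-person-singular word list; it is not the paper's actual corpus procedure, and `per_thousand` and `FIRST_PERSON_SINGULAR` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical target list for illustration; a real study would define its own.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def per_thousand(text, targets):
    """Occurrences of each target word per 1,000 word tokens of text.

    Tokens are maximal runs of ASCII letters and apostrophes, so "I'm" is one token.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tok for tok in tokens if tok in targets)
    scale = 1000 / len(tokens) if tokens else 0.0
    return {word: counts[word] * scale for word in sorted(targets)}
```

Calling `per_thousand` on two extracts and comparing the resulting dictionaries gives a rough inter-textual comparison. Normalizing per 1,000 tokens keeps a long entry from appearing more self-focused merely because it contains more words.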


Author(s): Sunitha .T, Shyamala .J, Annie Jesus Suganthi Rani.A

Data mining offers an innovative way of predicting patients' health risks. The large volume of electronic health records (EHRs) collected over the years provides a rich base for risk analysis and prediction. An EHR contains digitally stored healthcare information about an individual, such as observations, laboratory tests, diagnostic reports, medications, procedures, patient identifying information and allergies. A special type of EHR is the health examination record (HER) from annual general health check-ups. Identifying participants at risk based on their current and past HERs is important for early warning and preventive intervention. By “risk”, we mean unwanted outcomes such as mortality and morbidity. This approach is limited by its formulation as a classification problem, and consequently it is not informative about the specific disease area in which a person is at risk. Moreover, the limited amount of data extracted from health records is insufficient for accurate risk prediction. The main aim of this project is risk prediction that classifies a progressively developing situation when the majority of the data are unlabeled.
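A setting in which most records are unlabeled is often handled with semi-supervised learning. The sketch below shows one generic technique, self-training with a nearest-centroid classifier; it is not the method of this project. It assumes that records have already been converted to numeric feature vectors, that labels stand for hypothetical risk categories, and that the `margin` confidence rule is an illustrative choice.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Self-training with a nearest-centroid classifier.

    labeled   -- list of (vector, label) pairs
    unlabeled -- list of vectors
    Each round recomputes class centroids and pseudo-labels every unlabeled
    vector whose nearest centroid is closer than the runner-up by >= margin.
    Returns (labeled set including pseudo-labels, vectors left unlabeled).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        groups = {}
        for vec, lab in labeled:
            groups.setdefault(lab, []).append(vec)
        cents = {lab: centroid(vecs) for lab, vecs in groups.items()}

        remaining, added = [], 0
        for vec in pool:
            ranked = sorted((math.dist(vec, c), lab) for lab, c in cents.items())
            if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
                labeled.append((vec, ranked[0][1]))  # confident: accept pseudo-label
                added += 1
            else:
                remaining.append(vec)  # ambiguous: retry next round
        pool = remaining
        if added == 0 or not pool:
            break
    return labeled, pool
```

For instance, with labeled points `[0, 0]` ("low") and `[10, 10]` ("high"), the unlabeled points `[1, 1]` and `[9, 9]` are pseudo-labeled in the first round. The point `[5, 5]`, equidistant from both centroids, stays unlabeled, because the margin rule refuses to guess.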


Vestnik MEI, 2020, Vol 5 (5), pp. 132-139
Author(s): Ivan E. Kurilenko, Igor E. Nikonov

A method is considered for classifying short text messages, namely sentences uttered by customers during telephone calls to an organization. To solve this problem, a classifier was developed that combines two methods: a description of the subject area in the form of a hierarchy of entities, and plausible reasoning based on the case-based reasoning approach, which is actively used in artificial intelligence systems. In various problems of AI-based data analysis, these methods have shown a high degree of efficiency, scalability, and independence from data structure. For the case-based reasoning part of the classifier, a modification of the TF-IDF (Term Frequency - Inverse Document Frequency) measure of text content is proposed that takes into account known information about the distribution of documents by topic. The proposed modification improves classification quality in comparison with classical measures, since it takes into account the distribution of words not only in a separate document or topic but in the entire database of cases. Experimental results are presented that confirm the effectiveness of the proposed metric and of the developed classifier when applied to classifying customer sentences and providing customers with the necessary information depending on the classification result. The developed text classification service prototype is used as part of the voice interaction module, with the aim of automating the telephone call routing system and replacing button-based interaction between the user and the system with voice interaction.
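The case-based part of such a classifier can be sketched as nearest-case retrieval over weighted term vectors. The paper's exact TF-IDF modification is not given in the abstract, so the weight below is a hypothetical stand-in: a smoothed IDF multiplied by an inverse-topic factor, log(K / k_t) + 1, that favours terms concentrated in few topics. The functions, example phrases and topic names are illustrative only.

```python
import math
from collections import Counter


def build_weights(cases):
    """Hypothetical topic-aware term weights from (text, topic) cases.

    weight(t) = smoothed IDF * inverse-topic factor, where
      smoothed IDF = log((1 + N) / (1 + df_t)) + 1
      topic factor = log(K / k_t) + 1
    N = number of cases, df_t = cases containing t,
    K = number of topics, k_t = topics whose cases contain t.
    """
    n_cases = len(cases)
    n_topics = len({topic for _, topic in cases})
    df, term_topics = Counter(), {}
    for text, topic in cases:
        for term in set(text.lower().split()):
            df[term] += 1
            term_topics.setdefault(term, set()).add(topic)
    return {
        t: (math.log((1 + n_cases) / (1 + df[t])) + 1)
        * (math.log(n_topics / len(term_topics[t])) + 1)
        for t in df
    }


def vectorize(text, weights):
    """Term count times term weight; terms absent from the case base are dropped."""
    counts = Counter(t for t in text.lower().split() if t in weights)
    return {t: c * weights[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query, cases):
    """Return the topic of the stored case most similar to the query (1-NN)."""
    weights = build_weights(cases)
    q = vectorize(query, weights)
    _, best_topic = max(cases, key=lambda case: cosine(q, vectorize(case[0], weights)))
    return best_topic
```

With `cases = [("i want to pay my bill", "billing"), ("my bill is too high", "billing"), ("my internet is not working", "support"), ("the internet connection keeps dropping", "support")]`, `classify("why is my bill so high", cases)` returns `"billing"`. Retrieving only the single closest case keeps the sketch short; a fuller version would combine retrieval with the entity hierarchy described above.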


IdeBahasa, 2020, Vol 2 (2), pp. 121-132
Author(s): Shifa Nur Zakiyah, Susi Machdalena, Tb. Ace Fachrullah

This article discusses phonemic correspondence in Sundanese and Javanese using a historical comparative linguistic approach. The problem examined in this study is the form of phonemic correspondence in Sundanese and Javanese, and its purpose is to determine the phonemic correspondence sets that emerge when the two languages are compared. The data are analyzed with the phonemic correspondence method, which is used to find relationships between languages in the domain of speech sounds (phonology); phonemic correspondence is used to determine regular phonemic changes in the languages being compared. Data were collected using interview, note-taking and recording techniques. After collection, the data were classified according to the problem under study and grouped into more specific categories, and conclusions were then drawn from the results of the analysis. The data come from the 200-item Swadesh vocabulary list in Sundanese and Javanese. Of these 200 items, 49 show phonemic correspondences, which fall into 12 correspondence sets. The correspondences found between Sundanese and Javanese are (ɛ ~ i) and (i ~ ɛ), (a ~ ɔ) and (ɔ ~ a), (d ~ D), (t ~ T), (ɤ ~ ə), (b ~ w), (ɔ ~ u) and (ɔ ~ U), (i ~ I), (ø ~ h) and (h ~ ø), (ø ~ m), and (a ~ ə).
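The tallying step of the correspondence method can be sketched as follows, assuming each cognate pair has already been segmented and aligned position by position, with "ø" marking a zero segment. The function name `correspondence_sets` and the toy forms used below are hypothetical and are not data from the study.

```python
from collections import Counter


def correspondence_sets(pairs, min_count=1):
    """Tally segment correspondences across aligned cognate pairs.

    pairs: list of (segments_a, segments_b), each a list of phoneme symbols
    aligned position by position; "ø" marks an empty (zero) segment.
    Returns a Counter of differing (a, b) segment pairs seen >= min_count times.
    """
    tally = Counter()
    for seg_a, seg_b in pairs:
        if len(seg_a) != len(seg_b):
            raise ValueError("pairs must be aligned to equal length")
        for a, b in zip(seg_a, seg_b):
            if a != b:  # identical segments are not correspondences of interest
                tally[(a, b)] += 1
    return Counter({pair: n for pair, n in tally.items() if n >= min_count})
```

Raising `min_count` keeps only correspondences that recur, a rough proxy for the regularity requirement. For example, two toy pairs aligned as b-a-t-ø / w-a-t-h and b-ɔ-k / w-a-k yield (b ~ w) twice and (ø ~ h) and (ɔ ~ a) once each.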

