SUPPORT OF INFORMAL CARERS FOR PEOPLE AFTER A STROKE WITH CROWDSOURCING AND NATURAL LANGUAGE PROCESSING

2021 ◽ Vol 21 (3) ◽ pp. 3-10
Author(s): Petr ŠALOUN ◽ Barbora CIGÁNKOVÁ ◽ David ANDREŠIČ ◽ Lenka KRHUTOVÁ ◽ ...

For a long time, both professionals and the lay public showed little interest in informal carers, even though these people deal with multiple common issues in their everyday lives. As the population ages, this attitude is changing, and thanks to advances in computer science we can offer informal carers effective assistance and support by providing the necessary information and connecting them with both the professional and the lay community. In this work we describe a project called “Research and development of support networks and information systems for informal carers for persons after stroke”, which produces an information system visible to the public as a web portal. The portal does not provide just a simple set of information: using artificial intelligence, text document classification and crowdsourcing that further improves its accuracy, it also provides effective visualization of and navigation over content created mostly by the community itself and personalized to the informal carer’s phase of the care-taking timeline. This is beneficial for informal carers, as it allows them to find content specific to their current situation. This work describes our approach to the classification of text documents and its improvement through crowdsourcing. Its goal is to test a text document classifier based on document similarity measured by the N-grams method and to design an evaluation and crowdsourcing-based classification improvement mechanism. The crowdsourcing interface was created using the CMS WordPress. In addition to data collection, the purpose of the interface is to evaluate classification accuracy, which extends the classifier’s test data set and thus makes the classification more successful.
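As an illustration of the document-similarity idea described above, the sketch below classifies a document by the overlap of its word N-grams with labelled reference documents. The choice of word N-grams, Jaccard similarity, nearest-neighbour assignment and the phase labels is assumed for illustration; the abstract does not specify these details.

```python
# Sketch of classification by N-gram document similarity.
# Assumptions (not from the article): word n-grams, Jaccard similarity,
# and nearest-neighbour assignment over a labelled reference corpus.

def ngrams(text, n=3):
    """Return the set of word n-grams of a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two n-gram sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(document, labelled_corpus, n=3):
    """Assign the label of the most similar reference document."""
    doc_grams = ngrams(document, n)
    best_label, best_score = None, -1.0
    for reference_text, label in labelled_corpus:
        score = jaccard(doc_grams, ngrams(reference_text, n))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical reference documents labelled by care-taking phase.
corpus = [("first days after the stroke at the hospital", "acute phase"),
          ("long term home care and rehabilitation exercises", "home care")]
print(classify("rehabilitation exercises to do at home", corpus, n=2))
```

Crowdsourced corrections of such predictions can simply be appended to the labelled corpus, which is the mechanism by which the portal's classification improves over time.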

2020 ◽ Vol 38 (02)
Author(s): TẠ DUY CÔNG CHIẾN

Ontologies have been applied in many areas in recent years, such as information retrieval, information extraction, and text document classification. The purpose of a domain-specific ontology is to enrich the identification of concepts and their interrelationships. In our research, we use an ontology to specify a set of generic subjects (concepts) that characterizes the domain, as well as their definitions and interrelationships. This paper introduces a system for labeling the subjects of a text document based on the differential layers of a domain-specific ontology containing the information and vocabularies related to the computer domain. A document can contain several subjects, such as data science, database, and machine learning. The subjects in text document classification are determined based on the differential layers of the domain-specific ontology. We combine natural language processing methodologies with the domain ontology to determine the subjects in a text document. To increase performance, we use a graph database to store and access the ontology. The paper also evaluates the proposed algorithm against several other methods. Experimental results show that our proposed algorithm yields significantly better performance.
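A minimal sketch of the subject-labelling idea follows, assuming the ontology can be flattened into subject-to-vocabulary sets. The actual system walks differential ontology layers stored in a graph database, which is not reproduced here; the subjects, vocabularies and the `min_hits` rule below are invented placeholders.

```python
# Sketch of ontology-driven subject labelling.  TOY_ONTOLOGY and min_hits are
# illustrative assumptions; the paper uses layered ontology data in a graph DB.

TOY_ONTOLOGY = {
    "database":         {"sql", "query", "index", "transaction", "schema"},
    "machine learning": {"model", "training", "classifier", "feature", "accuracy"},
    "data science":     {"dataset", "analysis", "visualization", "statistics"},
}

def label_subjects(text, ontology=TOY_ONTOLOGY, min_hits=2):
    """Return every subject whose vocabulary overlaps the document enough."""
    tokens = set(text.lower().split())
    labels = []
    for subject, vocabulary in ontology.items():
        hits = len(tokens & vocabulary)
        if hits >= min_hits:
            labels.append((subject, hits))
    return sorted(labels, key=lambda pair: -pair[1])

print(label_subjects("we train a classifier and report model accuracy on the dataset"))
```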


2021 ◽ Vol 297 ◽ pp. 01071
Author(s): Sifi Fatima-Zahrae ◽ Sabbar Wafae ◽ El Mzabi Amal

Sentiment classification is one of the hottest research areas among Natural Language Processing (NLP) topics. While it aims to detect the sentiment polarity and classification of a given opinion, it requires a large amount of aspect extraction, and extracting aspects takes human effort and time. To reduce this, the Latent Dirichlet Allocation (LDA) method has recently emerged to deal with the issue. In this paper, an efficient preprocessing method for sentiment classification is presented and used to analyze users’ comments on the Twitter social network. For this purpose, different text preprocessing techniques were applied to the dataset to obtain an acceptable standard text. Latent Dirichlet Allocation was then applied to the data obtained from this fast and accurate preprocessing phase. Different sentiment analysis methods were implemented and their results compared and evaluated. The experimental results show that the combined use of the proposed preprocessing method and Latent Dirichlet Allocation achieves acceptable results compared to other baseline methods.
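A rough sketch of the preprocessing-plus-LDA stage is given below, assuming scikit-learn and a few toy tweets; the paper's exact cleaning rules, topic count and parameters are not given in the abstract.

```python
# Sketch of tweet cleaning followed by LDA topic (aspect) extraction.
# Cleaning rules, topic count and the example tweets are assumptions.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def clean_tweet(text):
    """Strip URLs, user mentions and non-letters, then lowercase."""
    text = re.sub(r"http\S+|@\w+", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    return text.lower()

tweets = ["Loving the new update! http://example.com",
          "@support the app keeps crashing, terrible experience",
          "Battery life is great, totally worth it"]

cleaned = [clean_tweet(t) for t in tweets]
counts = CountVectorizer(stop_words="english").fit_transform(cleaned)

# Two topics only to keep the toy example small.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # per-document topic (aspect) weights
print(doc_topics.round(2))
```

The resulting per-document topic weights can then be fed, alongside the cleaned text, into whichever sentiment classifier is being compared.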


Author(s): Irfan Ali Kandhro ◽ Sahar Zafar Jumani ◽ Kamlash Kumar ◽ Abdul Hafeez ◽ Fayyaz Ali

This paper presents an automated tool for the classification of text with respect to predefined categories. Text classification has long been considered a vital method for managing and processing the huge number of digital documents that are widespread and continuously increasing. Most research in text classification has been done for Urdu, English and other languages, but limited work has been carried out on Roman Urdu data. Technically, the text classification process follows two steps: the first step chooses the main features from all available features of the text documents using feature extraction techniques; the second step applies classification algorithms to those chosen features. The data set was collected with scraping tools from the popular news websites Awaji Awaze and Daily Jhoongar, and was split into training and testing sets of 70% and 30%, respectively. In this paper, deep learning models such as RNN, LSTM, and CNN are used for the classification of Roman Urdu headline news. The testing accuracies are 81% for the RNN, 82% for the LSTM, and 79% for the CNN, and the experimental results demonstrate that the LSTM outperforms the CNN and RNN.
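A minimal sketch of an LSTM headline classifier of the kind described above is shown below, assuming tf.keras; the vocabulary size, number of classes and layer sizes are placeholder guesses, not the authors' configuration.

```python
# Sketch of an LSTM classifier for Roman Urdu headlines (tf.keras assumed).
# VOCAB_SIZE, NUM_CLASSES and layer sizes are placeholder guesses.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, NUM_CLASSES = 20000, 4

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),               # dense word embeddings
    layers.LSTM(64),                                  # sequence encoder
    layers.Dense(NUM_CLASSES, activation="softmax"),  # category probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train / x_test would hold integer-encoded, padded headlines and
# y_train / y_test the category indices, split 70% / 30% as in the paper:
# model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)
```

Swapping the `LSTM` layer for `SimpleRNN` or a `Conv1D` + pooling stack gives the RNN and CNN variants being compared.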


Author(s): Adam Piotr Idczak

It is estimated that approximately 80% of all data gathered by companies are text documents. This article is devoted to one of the most common problems in text mining, i.e. text classification in sentiment analysis, which focuses on determining a document’s sentiment. The lack of a defined structure in text makes this problem more challenging and has led to the development of various techniques for determining a document’s sentiment. In this paper a comparative analysis of two sentiment classification methods, the naive Bayes classifier and logistic regression, was conducted. The analysed texts are written in Polish and come from banks. Classification was conducted by means of a bag-of-n-grams approach, where a text document is represented as a set of terms and each term consists of n words. The results show that logistic regression performed better.
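A small sketch of such a comparison follows, assuming scikit-learn and a bag-of-n-grams built with CountVectorizer; the texts below are English placeholders standing in for the Polish bank texts, and the n-gram range is an assumption.

```python
# Sketch of naive Bayes vs. logistic regression on bag-of-n-grams features.
# Texts are English placeholders for the Polish bank data; ngram_range is assumed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = ["the advisor was very helpful",
         "the app keeps crashing",
         "great bank, I recommend it",
         "terrible customer service"]
labels = [1, 0, 1, 0]   # 1 = positive sentiment

for name, clf in [("naive Bayes", MultinomialNB()),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    pipe = make_pipeline(CountVectorizer(ngram_range=(1, 2)), clf)
    score = cross_val_score(pipe, texts, labels, cv=2).mean()
    print(f"{name}: mean accuracy {score:.2f}")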


2020 ◽ Vol 10 (11) ◽ pp. 4009
Author(s): Asmaa M. Aubaid ◽ Alok Mishra

With the growth of online information and the sudden expansion in the number of electronic documents provided on websites and in electronic libraries, categorizing text documents has become difficult. A rule-based approach is one solution to this problem, and the purpose of this study is to classify documents using such an approach. This paper combines a rule-based approach with the document-to-vector (doc2vec) embedding technique. An experiment was performed on two data sets, Reuters-21578 and the 20 Newsgroups, to classify the top ten categories of each data set using the document-to-vector rule-based method (D2vecRule). The method provided good classification results according to the F-measure and implementation-time metrics. In conclusion, it was observed that our document-to-vector rule-based algorithm (D2vecRule) compared well with other algorithms such as JRip, OneR, and ZeroR applied to the same Reuters-21578 dataset.
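A sketch of how doc2vec embeddings might be paired with a simple similarity rule is given below, assuming gensim. The abstract does not spell out the actual D2vecRule rule set, so the centroid-plus-threshold rule, the toy documents and all parameters here are illustrative assumptions only.

```python
# Sketch: doc2vec embeddings plus a simple centroid-similarity rule.
# Not the paper's D2vecRule; documents, rule and parameters are assumptions.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [("wheat prices rise on export news", "grain"),
        ("crude oil output cut announced", "crude"),
        ("corn and wheat harvest forecasts improve", "grain"),
        ("pipeline outage lifts crude futures", "crude")]

tagged = [TaggedDocument(text.split(), [str(i)]) for i, (text, _) in enumerate(docs)]
model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

# One centroid vector per category, built from the training document vectors.
centroids = {cat: np.mean([model.dv[str(i)] for i, (_, c) in enumerate(docs) if c == cat], axis=0)
             for cat in {"grain", "crude"}}

def classify(text, threshold=0.0):
    """Rule: pick the most similar centroid if its cosine similarity passes the threshold."""
    v = model.infer_vector(text.lower().split())
    sims = {cat: float(np.dot(v, c) / (np.linalg.norm(v) * np.linalg.norm(c) + 1e-9))
            for cat, c in centroids.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else "unclassified"

print(classify("opec discusses oil production cuts"))
```

A fuller rule set would tune the threshold per category and combine several such conditions, which is where the rule-based part of the method does its work.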


Author(s): Fika Hastarita Rachman ◽ Riyanarto Sarno ◽ Chastine Fatichah

Music has lyrics and audio, and both components can serve as features for music emotion classification. Lyric features are extracted from text data and audio features are extracted from audio signal data. For emotion classification, an emotion corpus is required for lyric feature extraction. Corpus Based Emotion (CBE) has succeeded in increasing the F-measure for emotion classification of text documents. Music documents have an unstructured format compared with article text documents, so they require good preprocessing and conversion before the classification process. We used the MIREX dataset for this research. Psycholinguistic and stylistic features were used as lyric features. Psycholinguistic features are related to emotion categories; in this research, CBE is used to support the extraction of psycholinguistic features. Stylistic features relate to the use of distinctive words in lyrics, e.g. ‘ooh’, ‘ah’, ‘yeah’, etc. Energy, temporal and spectral features were extracted as audio features. The best result for music emotion classification was obtained by applying Random Forest to the combined lyric and audio features, with an F-measure of 56.8%.
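A minimal sketch of fusing lyric and audio features for a Random Forest emotion classifier, as in the best-performing setup reported above, is shown below; the feature matrices and labels are synthetic placeholders, not MIREX data, and the class count is illustrative.

```python
# Sketch of lyric + audio feature fusion with a Random Forest classifier.
# All data below are synthetic placeholders, not the MIREX dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_songs = 40
lyric_features = rng.random((n_songs, 5))   # e.g. psycholinguistic / stylistic scores
audio_features = rng.random((n_songs, 8))   # e.g. energy, temporal, spectral statistics
X = np.hstack([lyric_features, audio_features])
y = np.tile(np.arange(4), n_songs // 4)     # four hypothetical emotion classes

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("macro F-measure:", cross_val_score(clf, X, y, cv=4, scoring="f1_macro").mean())
```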


2018 ◽ Vol 6 (1) ◽ pp. 1-10
Author(s): Mohamed K. Elhadad ◽ Khaled M. Badran ◽ Gouda I. Salama

Extracting the feature vector used in mining tasks (classification, clustering, etc.) is considered the most important task for enhancing text processing capabilities. This paper proposes a novel approach to building the feature vector used in web text document classification by adding semantics to the generated feature vector. The approach utilizes the hierarchical structure of the WordNet ontology to eliminate meaningless words, those with no semantic relation to any WordNet lexical category, from the generated feature vector; this reduces the feature vector size without losing information about the text. It also enriches the feature vector by concatenating each word with its corresponding WordNet lexical category. For the mining tasks, the Vector Space Model (VSM) is used to represent text documents and Term Frequency Inverse Document Frequency (TFIDF) is used as the term weighting technique. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach, and against an ontology-based reduction technique without the step of adding semantics to the generated feature vector, in several experiments with five different classifiers (SVM, JRip, J48, Naive Bayes, and kNN). The experimental results reveal the effectiveness of the proposed approach against the other traditional approaches in achieving better classification accuracy, F-measure, precision, and recall.
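A small sketch of the filtering-and-enrichment step follows, assuming NLTK's WordNet interface; the full pipeline (VSM construction, TFIDF weighting and the five classifiers) is not reproduced, and the example tokens are arbitrary.

```python
# Sketch of WordNet-based filtering and enrichment of feature-vector tokens.
# Assumes NLTK with the WordNet corpus installed: nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def enrich(tokens):
    """Drop tokens with no WordNet sense; tag the rest with a lexical category."""
    features = []
    for token in tokens:
        synsets = wn.synsets(token)
        if not synsets:                  # no semantic relation to any lexical category
            continue
        category = synsets[0].lexname()  # e.g. 'noun.artifact', 'verb.motion'
        features.append(f"{token}_{category}")
    return features

# 'xyzzy' has no WordNet sense and is discarded; the rest are enriched.
print(enrich(["computer", "quickly", "xyzzy", "bank"]))
```

The enriched tokens then replace the raw words before TFIDF weighting, so both the filtering and the added semantics are reflected in the VSM.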


2017 ◽ Vol 5 (4) ◽ pp. 44-58
Author(s): Mohamed K. Elhadad ◽ Khaled M. Badran ◽ Gouda I. Salama

Dimensionality reduction of the feature vector plays a vital role in enhancing text processing capabilities; it aims at reducing the size of the feature vector used in the mining tasks (classification, clustering, etc.). This paper proposes an efficient approach to reducing the size of the feature vector for web text document classification. The approach uses the WordNet ontology, utilizing its hierarchical structure, to eliminate from the generated feature vector words that have no relation to any WordNet lexical category; this reduces the feature vector size without losing information about the text. For the mining tasks, the Vector Space Model (VSM) is used to represent text documents and Term Frequency Inverse Document Frequency (TFIDF) is used as the term weighting method. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach in several experiments. The experimental results reveal the effectiveness of the proposed approach against other traditional approaches in achieving better classification accuracy, F-measure, precision, and recall.
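For contrast with the ontology-based reduction, here is a minimal sketch of the PCA baseline it is evaluated against: TF-IDF vectors projected onto a few principal components. The documents and the target dimensionality are placeholders.

```python
# Sketch of the PCA baseline: TF-IDF vectors reduced to a few components.
# Documents and n_components are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA

docs = ["stock markets fell sharply on friday",
        "the team won the championship game",
        "central bank raises interest rates again",
        "injury forces the striker out of the match"]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
reduced = PCA(n_components=2).fit_transform(tfidf.toarray())
print(reduced.shape)   # (4, 2): same documents, far fewer features
```

Unlike the WordNet filtering, PCA produces dense components that no longer correspond to individual words, which is part of what the comparison in the paper probes.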


2020 ◽ Vol 39 (2) ◽ pp. 528-535
Author(s): I.I. Ayogu

The recent increase in the amount of Nigerian-language text online motivates this paper, which considers the problem of classifying text documents written in the Yorùbá language into one of a few pre-designated classes. Text document classification/categorization research is well established for English and many other languages, but not for Nigerian languages. This paper evaluates the performance of a multinomial naive Bayes model learned on a research dataset consisting of 100 text samples each from the business, sporting, entertainment, technology and political domains, separately on unigram, bigram and trigram features obtained using the bag-of-words representation. Results show that the performance of the model on unigram and bigram features is comparable, and significantly better than that of a model learned on trigram features. The results generally indicate that the NB algorithm can practically be applied to the classification of text documents written in the Yorùbá language.
Keywords: Supervised learning, text classification, Yorùbá language, text mining, BoW representation
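A minimal sketch of comparing unigram, bigram and trigram bag-of-words features with multinomial naive Bayes is shown below, assuming scikit-learn; the sentences are English placeholders standing in for the Yorùbá dataset, which is not reproduced here.

```python
# Sketch of a unigram / bigram / trigram comparison with multinomial naive Bayes.
# Sentences are English placeholders standing in for the Yoruba dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = ["the team won the league title last night",
         "the central bank raised interest rates today",
         "the striker scored two goals in the final",
         "markets closed higher after the earnings report"]
labels = ["sport", "business", "sport", "business"]

for n in (1, 2, 3):
    pipe = make_pipeline(CountVectorizer(ngram_range=(n, n)), MultinomialNB())
    score = cross_val_score(pipe, texts, labels, cv=2).mean()
    print(f"{n}-gram features: mean accuracy {score:.2f}")
```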

