Ukrainian Text Preprocessing in GRAC

Author(s):  
Vasyl Starko ◽  
Andriy Rysin ◽  
Maria Shvedova
2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Mehdi Srifi ◽  
Ahmed Oussous ◽  
Ayoub Ait Lahcen ◽  
Salma Mouline

Abstract: Various recommender systems (RSs) have been developed over recent years, and many of them have concentrated on English content. Thus, the majority of RSs from the literature were compared on English content. However, research on RSs applied to content in other languages, such as Arabic, is minimal; the field of Arabic RSs remains neglected. Therefore, we aim through this study to fill this research gap by leveraging recent advances in the English RSs field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content, and then we empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provided various text preprocessing techniques for preparing the constructed datasets. Third, our investigation derived well-argued conclusions about the usage of modern RSs in the Arabic context. The experimental results showed that these systems achieve high performance when applied to Arabic content.


2021 ◽  
Author(s):  
Zoriana Haladzhun ◽  
Nataliia Kunanets ◽  
Paraskoviya Dvorianyn ◽  
Olena Makarchuk ◽  
Nataliia Veretennikova

Author(s):  
Murugan Anandarajan ◽  
Chelsey Hill ◽  
Thomas Nolan

Author(s):  
Victoria Vysotska ◽  
Vitor Basto Fernandes ◽  
Vasyl Lytvyn ◽  
Michael Emmerich ◽  
Mariya Hrendus

Author(s):  
Iryna Bundza

This article discusses the peculiarities of the category of number of Polish and Ukrainian nouns. To indicate the problem areas related to the teaching of the category of number to Ukrainian-speaking persons, the author analysed Polish and Ukrainian lexemes in terms of their fulfilments of the grammatical category of number. The article presents the contexts which may trigger errors, which in turn may cause a comical effect or distort communication. The data were collected from Polish and Ukrainian dictionaries, as well as the National Corpus of Polish and the Ukrainian Text Corpus.


2020 ◽  
Vol 4 (4) ◽  
pp. 33
Author(s):  
Toni Pano ◽  
Rasha Kashef

During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, plays a significant role as a meaningful indicator in forecasting Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy for BTC tweets to develop an accurate machine learning prediction model for Bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we find that splitting sentences, removing Twitter-specific tags, or combining the two generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices only correlate well with sentiment scores over shorter timespans. Selecting the optimum preprocessing strategy would help machine learning prediction models achieve better accuracy relative to the actual prices.

