Assessing the Impact of Text Preprocessing in Sentiment Analysis of Short Social Network Messages in the Russian Language

Author(s): Egor Araslanov, Evgeniy Komotskiy, Ebenezer Agbozo
Author(s): Nadezhda G. Yarushkina, Vadim S. Moshkin, Andrei A. Konstantinov, ...

The paper proposes an original algorithm for forming a training sample for a neural network that performs sentiment analysis of text posts in social networks. A distinctive feature of the algorithm is its use of the extended Russian-language semantic thesaurus WordNetAffect and an expert dictionary of the symbols authors use to express emotions. The paper also describes the application of an LSTM-based neural network to determining the emotional coloring of social network text messages, using two text vectorization algorithms, word2vec and BERT. In the experiments, an accuracy of 87% in determining the emotional coloring of messages was achieved with lemmatization as the text preprocessing step and BERT as the vectorization algorithm.
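The abstract does not spell out the training-sample algorithm. As a rough illustration only, the Python sketch below shows one way a lexicon-based labeler could assign emotion labels to posts before training, combining an emotion word list (standing in for WordNetAffect) with a dictionary of emoticons. The tiny lexicons, the names `label_post` and `build_training_sample`, the exact token matching, and the majority-vote and tie-dropping rules are all hypothetical assumptions, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical toy lexicons. A real pipeline would load the extended Russian
# WordNetAffect thesaurus and the expert dictionary of emotion symbols instead.
EMOTION_WORDS = {
    "радость": "joy",
    "счастье": "joy",
    "грусть": "sadness",
    "тоска": "sadness",
    "злость": "anger",
}
EMOTICONS = {
    ":)": "joy",
    ":D": "joy",
    ":(": "sadness",
}

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware, so it matches Cyrillic words


def label_post(text):
    """Return the dominant emotion of a post, or None if there is no clear cue."""
    votes = Counter()
    # Word cues: exact match of lowercased tokens against the emotion lexicon.
    for token in TOKEN_RE.findall(text.lower()):
        if token in EMOTION_WORDS:
            votes[EMOTION_WORDS[token]] += 1
    # Symbol cues: count each emoticon in the raw text.
    for symbol, emotion in EMOTICONS.items():
        votes[emotion] += text.count(symbol)
    votes = +votes  # unary plus drops zero counts
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between emotions: too ambiguous to use as a label
    return ranked[0][0]


def build_training_sample(posts):
    """Keep only posts that received an unambiguous label, as (text, label) pairs."""
    sample = []
    for post in posts:
        label = label_post(post)
        if label is not None:
            sample.append((post, label))
    return sample


if __name__ == "__main__":
    posts = ["Какая радость :)", "Обычный день", "Сплошная грусть :("]
    print(build_training_sample(posts))
    # [('Какая радость :)', 'joy'), ('Сплошная грусть :(', 'sadness')]
```

Exact token matching is the weak point of this sketch: an inflected form such as “радости” would not hit the entry “радость”, which illustrates why a lemmatization step is useful for a heavily inflected language like Russian.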


The vocabulary of a language is not fixed: it changes constantly, responding to the needs of life and reflecting its new realities. The events in the South-East of Ukraine since March 2014 have significantly changed the familiar picture of the world held by the parties to this conflict. They have led to a new interpretation of reality and to the emergence of new mental constructs, objectified in the language through a number of lexical innovations, most of which fall under the definition of “hate speech”. This article examines the impact of the armed conflict in the South-East of Ukraine on the emergence of lexical innovations in the Russian language, identifying the ways new units are formed and their main thematic clusters. The material consists of neologisms recorded in electronic Russian and Russian-language Ukrainian mass media, as well as neologisms selected from social networks and videos. The analysis showed that, in the context of the armed conflict in the South-East of Ukraine, “hate speech” manifests itself mainly in numerous new labeling categories with a pronounced conflict potential. Foremost among them are offensive and derogatory names for members of the opposing camp, based on their worldview/ideological, national/ethnic, and territorial/regional characteristics. Military jargon has also been substantially updated, absorbing not only re-actualized slang from the Afghan campaign of 1979–89 but also lexical innovations prompted by the military and political realities of the current armed conflict in the Donbas. The neologisms are formed by the established methods of the Russian language (word formation, semantic derivation, borrowing), but non-standard word-formation techniques (language play, homophony, etc.) are also used.


Author(s): Natalia V. Kozlovskaya, Sz. Janurik

The article analyzes the composition and traces the growth dynamics of a representative group of compound neologisms whose first component is the stem II (ИИ), the Russian abbreviation for “artificial intelligence”. Language integration plays a significant role in the formation of compound nouns with the first component II: the currently widespread use of this pattern, as well as of similar patterns, results from the impact of analytism on the vocabulary and grammar of the Russian language. Research based on data from the Russian National Corpus and the “Integrum” mass media database shows that the component II is among the most productive formants in the Russian language of the 2010s. The article presents the main tendencies in the formation of lexical paradigms of “II-compounds” in modern Russian. Of particular quantitative significance is the hypernym-hyponym group of nouns containing the seme “the ability to perform functions traditionally considered a human prerogative”: II-advokat (artificial intelligence (AI) barrister), II-dermatolog (AI dermatologist), II-sekretar’ (AI secretary), II-yurist (AI lawyer). The article also addresses the discourse transition of scientific terms with the first component II into modern newspaper and magazine journalism. Based on an analysis of an expert sample, the article concludes that the formant II is heterogeneous and outlines principles for its lexicographic description, which will be applied in the annual neological dictionaries “Lexical innovations in the Russian language”, whose publication has been resumed at the Department of Modern Russian Lexicography of the Institute for Linguistic Studies of the RAS.


2020, Vol 4 (4), pp. 33
Author(s): Toni Pano, Rasha Kashef

During the COVID-19 pandemic, many studies have examined the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media such as Twitter serve as a meaningful indicator for forecasting Bitcoin (BTC) prices. However, the optimal strategy for preprocessing BTC tweets in order to build an accurate machine learning model for predicting Bitcoin prices has not yet been determined. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and data time spans on the correlation results. Out of 13 strategies, we find that splitting sentences, removing Twitter-specific tags, or combining the two generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices correlate well with sentiment scores only over shorter time spans. Selecting the optimal preprocessing strategy could help machine learning prediction models achieve better accuracy with respect to actual prices.
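The abstract singles out two preprocessing functions, splitting sentences and removing Twitter-specific tags, and measures how well the resulting sentiment scores correlate with Bitcoin prices. The Python sketch below illustrates those two steps plus a correlation computation. The regular expressions, word lists, and helper names are hypothetical assumptions; the toy `score` function stands in for whatever sentiment scorer the study actually used, and the abstract does not state that Pearson's coefficient was the correlation measure.

```python
import math
import re

# Hypothetical pattern for Twitter-specific tags: "RT" markers, @mentions
# (with an optional trailing colon), #hashtags and URLs.
TWITTER_TAGS = re.compile(r"\bRT\b|@\w+:?|#\w+|https?://\S+")
# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# Toy word lists standing in for a real sentiment model.
POSITIVE = {"up", "bullish", "gain", "rally"}
NEGATIVE = {"down", "bearish", "loss", "crash"}


def remove_twitter_tags(tweet):
    """Strip Twitter-specific tags and collapse the leftover whitespace."""
    return " ".join(TWITTER_TAGS.sub(" ", tweet).split())


def split_sentences(text):
    """Split text into sentences on the naive boundary above."""
    return [s for s in SENTENCE_BOUNDARY.split(text) if s]


def score(sentence):
    """Toy polarity score in [-1, 1] from positive/negative word counts."""
    words = re.findall(r"[a-z]+", sentence.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def tweet_sentiment(tweet):
    """Clean a tweet, split it into sentences and average the sentence scores."""
    sentences = split_sentences(remove_twitter_tags(tweet))
    if not sentences:
        return 0.0
    return sum(score(s) for s in sentences) / len(sentences)


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (e.g. daily sentiment vs. price)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

To compare strategies in the spirit of the study, one could average `tweet_sentiment` per day, with and without each cleaning step, and pass each daily series, aligned with the matching closing prices, to `pearson`.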


Author(s): N. Basko

The article discusses changes in Russian speech etiquette, using greeting formulas as an example. Speech etiquette is the most important element of a communicative act, and observing its rules largely determines success in solving communicative tasks. An analysis of lexicographic sources and modern Russian mass media reveals a shift in the use of greeting forms: old greetings are moving into the passive vocabulary, while new ones are emerging and coming into active use. The author concludes that the dynamics of change in greeting forms reflect general trends in the development of the Russian language at the present stage, namely (a) active neologization, (b) the influence of English, and (c) the impact of computer technology on the language.


Rusin, 2020, pp. 144-158
Author(s): Z.I. Rezanova, A.V. Dybo

The problem of the speech norm and deviations from it in various forms of communication will never lose its relevance. First, linguistic standardization falls within the sphere of state regulation aimed at stabilizing the norm; second, variation in the norm is an inevitable consequence of a dynamically developing speech practice. The article discusses the multiplicity of factors that cause deviations from the speech norm (standard) and their mutual influence. The focus is primarily on the interaction between interlingual interference and internal tendencies in the development of the language in its various forms. The problem is examined using deviations from the speech standard in case forms in a regional variety of Russian, the spoken language of Shor-Russian bilinguals. The variability of case forms is caused by the wide spread of bilingualism in the region and by the substrate influence of the native language on the Russian speech of bilinguals. The analysis model presented in the article can be applied to other language contacts and intralingual interactions. The material comes from the Shor subcorpus of the Corpus of Russian oral speech of Turkic-Russian bilinguals (RuTuBic). Besides the morphological annotation common to linguistically annotated corpora, the corpus marks deviations from the speech standard. The model was tested on interview recordings of 11 respondents (15 hours). The main sources of deviations from the speech standard are interference from the Shor language, the dialectal influence of Siberian Russian dialects, structures of colloquial speech, active tendencies of Russian grammar, and the discourse and genre features of speech.


2021, Vol 98, pp. 01016
Author(s): Yuliia Vyatleva, Natalia Grigorenko, Yuliia Pokrovskaya, Natalia Bal

The Russian language, as the state language, plays an honorable and important role in uniting all nationalities. To fulfil its mission of serving the unity, solidarity, and mutual understanding of all the peoples of Russia, a mandatory national educational program has been approved and operates throughout the Russian Federation. This article deals with the problem of mastering the Russian language. The goal of the research is to study the etiology and specifics of writing disorders in primary school children at general education schools and to develop differentiated strategies for teaching pupils with various manifestations of dysgraphia. The methods used include theoretical ones (study, generalization, analysis, synthesis, the axiomatic method) and empirical ones (observation and comparison). The results and novelty of the research consist in clarifying the state of the problem of writing disorders in contemporary schoolchildren; updating scientific ideas about the population of primary school children who need corrective assistance from specialists; applying an interdisciplinary approach to the study of the etiology, mechanisms, causes, and specifics of various manifestations of dysgraphia in primary school children; supplementing scientific data on the impact of didactogeny on the quality of learning to write and on the formation of dysgraphia in pupils who have difficulty mastering the Russian language curriculum; and developing high-performance speech technologies for the early detection and elimination of written language disorders and of difficulties in learning Russian language courses.


Literary Fact, 2020, pp. 322-336
Author(s): Elena Takho-Godi

The article is the first to compare the philosophical and aesthetic views on Russian literature and language of two prominent representatives of Russia Abroad: the critic Yu.I. Aykhenvald (1872–1928) and P.M. Bitsilli (1879–1953), medievalist and interpreter of the Russian classics. It gives a full overview of the factual materials identified to date that confirm the mutual interest of Yu.I. Aykhenvald and P.M. Bitsilli: documents from the P.M. Bitsilli collection at the Institute of Russian Literature (Pushkin House) of the RAS, Yu.I. Aykhenvald’s review in the Berlin newspaper “Rul’” of P.M. Bitsilli’s “Studies on Russian Poetry”, and the obituary of Yu.I. Aykhenvald that P.M. Bitsilli published in the Sofia newspaper “Golos”. The issues discussed include the influence of Aykhenvald’s immanent method of analyzing a literary text on P.M. Bitsilli’s aesthetically individualizing method; the two authors’ approaches to the morphology of Russian culture and the philosophy of the nation; and the connection, in their works, between the Pushkin theme and their thoughts on the fate of post-revolutionary Russia and the Russian language, which they articulated during the discussion of S. and A. Volkonsky’s book “In defense of the Russian language”.


2019, Vol 17 (4), pp. 475-486
Author(s): Tomas Hmira

Teaching Russian at universities in Slovakia has its own history. For decades, the Russian language has held a worthy place in the country’s universities, including in Ruzomberok. A new educational reform (in effect since September 2019) gives students freedom of choice in learning foreign languages. Interest in studying Russian in Slovakia is therefore expected to grow, which makes the chosen topic relevant. The purpose of the article is to analyze the system of teaching Russian at the Catholic University in Slovakia from the perspective of a young teacher with little pedagogical experience. The theoretical significance of the article lies in highlighting the problem of forming and developing interest in the Russian language in Ruzomberok through the culturological approach outlined by the prominent Slovak specialist in Russian philology Eva Kollarova. Its practical significance lies in the courses being developed, their analysis, and their further adjustment aimed at developing students’ personality and individuality through the Russian language. The study showed that a curriculum based on the culturological approach to teaching a foreign language helps build trust between teacher and students. University students are also motivated, and in the university environment they not only receive facts through the language but are also educated as individuals. A promising direction for further research is a more detailed study of the impact of teaching Russian according to the principle “language through culture, culture through language.” Given the growing interest in Russian in Slovakia and the free choice of foreign language, the readiness of teachers and students for a dialogue of cultures is becoming very important.

