Towards a Stylometric Authorship Recognition Model for the Social Media Texts in Arabic

2021 ◽  
Author(s):  
Haroon Nasser Alsager

Numerous studies have sought to develop new authorship recognition systems to address the rising rates of cybercrime associated with the anonymity of social media platforms, which still allow users to conceal their true identities. Nevertheless, identifying the real authors of offensive and inappropriate social media content remains challenging. Such content is usually very short, which makes it difficult for stylometric authorship systems to assign disputed texts to their real authors on the basis of salient, distinctive linguistic features and patterns. This research introduces a new stylometric authorship system that takes into account both the shortness of the data and the particular linguistic properties of Arabic. The study uses a corpus of 20,357 tweets from 134 Twitter users. Document clustering based on the Document Index Graph (DIG) model was used to classify input patterns in the tweets that shared common linguistic features. For comparison, a Vector Space Clustering (VSC) model based on the Bag of Words (BOW) model, conventionally used in authorship recognition applications, was also applied. Results indicate that the proposed system is more accurate than standard authorship systems based mainly on vector space clustering methods. The model also had the advantage of providing complete information about the documents and the degree of overlap between every pair of documents, which was useful in determining the similarity between documents.
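To illustrate the contrast the abstract draws between bag-of-words vectors and pairwise overlap information, here is a minimal Python sketch. It is not the author's DIG implementation, which the abstract does not describe in detail. The function names, the whitespace tokenization, the English sample texts and the use of shared word bigrams as a stand-in for overlap are all assumptions made for illustration; Arabic text would need its own tokenization and normalization.

```python
from collections import Counter
from math import sqrt


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two texts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    # An empty text has a zero vector; define its similarity as 0.0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shared_ngrams(a: str, b: str, n: int = 2) -> set:
    """Word n-grams occurring in both texts (a crude proxy for phrase overlap)."""
    def grams(text: str) -> set:
        toks = text.split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    return grams(a) & grams(b)


# Hypothetical sample texts: t3 uses the same words as t1 in a different order.
t1 = "the match was great tonight"
t2 = "tonight the match was boring"
t3 = "great was the match tonight"

print(bow_cosine(t1, t3), shared_ngrams(t1, t3))
print(bow_cosine(t1, t2), shared_ngrams(t1, t2))
```

Because bag-of-words ignores word order, `t1` and `t3` look identical to the cosine measure even though they share only one bigram, whereas `t1` and `t2` score lower on cosine but share more contiguous phrasing. On very short texts, this kind of overlap information can separate cases that word counts alone cannot.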

2020 ◽  
Vol 11 (4) ◽  
pp. 490-507
Author(s):  
Haroon Nasser Alsager


2019 ◽  
Vol 2 (2) ◽  
pp. e20-e29 ◽  
Author(s):  
Kalyan Gudaru ◽  
Leonardo Tortolero Blanco ◽  
Daniele Castellani ◽  
Hegel Trujillo Santamaria ◽  
Marcela Pelayo-Nieto ◽  
...  

Background and Objectives: Social media use is increasing among the urological community. However, it is difficult to identify urological content efficiently across social media platforms. We proposed a hashtag, #UroSoMe, to be used when posting urology-related content on social media platforms. The objectives of this article are to describe how #UroSoMe was developed and to report data from its first month. Material and Methods: The hashtag #UroSoMe was introduced to the urological community. The #UroSoMe working group was formed, and its members actively invited and encouraged people to use the hashtag when posting urology-related content. After the #UroSoMe (@so_uro) account on Twitter had grown to more than 300 users, the first live online case discussion event, #LiveCaseDiscussions, was conducted. Twitter activity for the hashtag #UroSoMe during its first month of use, from 14 December 2018 to 13 January 2019, was evaluated in a prospective observational study. Outcome measures included number of users, number of tweets, user location, top tweeters, top hashtags used and interactions. Analysis was performed using NodeXL (Social Media Research Foundation; California, USA; https://www.smrfoundation.org/nodexl/), Symplur (https://www.symplur.com) and Twitonomy (https://www.twitonomy.com). Results: The first month of #UroSoMe activity documented 1373 tweets/retweets by 1008 tweeters with 17698 mentions and 1003 replies. #LiveCaseDiscussions achieved a potential reach of 2,033,352 Twitter users. The top tweets mainly included cases presented by #UroSoMe working group members during #LiveCaseDiscussions. The Twitonomy map showed participation from 214 geographical locations. The major groups of participants using the hashtag #UroSoMe were ‘Researcher/Academic’ and ‘Doctor’. The Twitter account of #UroSoMe (@so_uro) has now grown to more than 1000 followers. Conclusions: Social media is an excellent platform for interaction among the urological community. The results demonstrated that #UroSoMe achieved widespread engagement from all over the world.


Author(s):  
Hyejin Park ◽  
J. Patrick Biddix ◽  
Han Woo Park

Social media platforms provide valuable insights into public conversations. They likewise aid in understanding current issues and events. Twitter has become an important virtual venue where global users hold conversations, share information, and exchange news and research. This study investigates social network structures among Twitter users with regard to the Covid-19 outbreak at its onset and during its spread. The data were derived from two Twitter datasets collected with the search query “coronavirus” on February 28th, 2020, when the coronavirus outbreak was at a relatively early stage. The first dataset is a collection of tweets used to investigate social network structures and for visualization. The second dataset comprises tweets that cite scientific research publications on coronavirus. The collected data were analyzed to examine numerical indicators of the social network structures, subgroups, influencers, and features of research citations. The analysis also served to measure the statistical relationships among social elements and research citations. The findings revealed that individuals tend to converse with specific people in clusters about daily coronavirus issues, without prominent or central voices among the tweeters. Tweets related to coronavirus were often associated with entertainment, politics, North Korea, and business. During their conversations, the users also responded to and mentioned the U.S. president, the World Health Organization (WHO), celebrities, and news channels. Meanwhile, people shared research articles about the outbreak, including its spread, symptoms related to the disease, and prevention strategies. These findings provide insight into information-sharing behaviors at the onset of the outbreak.


2019 ◽  
Author(s):  
Angela Leis ◽  
Francesco Ronzano ◽  
Miguel A. Mayer ◽  
Laura I. Furlong ◽  
Ferran Sanz

BACKGROUND Mental disorders have become a major concern in public health and are one of the main causes of the overall disease burden worldwide. Social media platforms allow us to observe the activities, thoughts and feelings of people’s daily lives, including those of patients suffering from mental disorders. Some studies have analyzed the influence of mental disorders, including depression, on the behavior of social media users, but they have usually focused on messages written in English. OBJECTIVE The aim of this study is to identify the linguistic features of tweets in Spanish, and the behavioral patterns of the Twitter users who generate them, that could suggest signs of depression. METHODS This study was developed in two steps. In the first step, users were selected and tweets compiled. Three datasets of tweets were created: a depressive users dataset (made up of the timelines of 90 users who explicitly mention that they suffer from depression), a depressive tweets dataset (a manually curated selection of tweets from those users that include expressions indicative of depression) and a control dataset (made up of the timelines of 450 randomly selected users). In the second step, the three datasets were compared and analyzed. RESULTS Compared with the control dataset, the depressive users are less active in posting tweets and post more frequently between 23:00 and 6:00 (P<.001). The percentage of nouns used in the control dataset is almost double that of the depressive users (P<.001). By contrast, verbs are more common in the depressive users dataset (P<.001). The first-person singular pronoun was by far the most used in the depressive users dataset (80%), and the first- and second-person plural were the least frequent (0.4% in both cases); this distribution differs from that of the control dataset (P<.001). Sadness and anger were the most common emotions in the depressive users and depressive tweets datasets, with significant differences between these datasets and the control one (P<.001). Negation words were detected in 34% and 46% of the tweets in the depressive users and depressive tweets datasets respectively, significantly different from the control dataset (P<.001). Negative polarity was more frequent in the depressive users (54%) and depressive tweets (65%) datasets than in the control one (43.5%) (P<.001). CONCLUSIONS Twitter users who are potentially suffering from depression modify the general characteristics of their language and the way they interact on social media. Based on these changes, these users can be monitored and supported, which opens new opportunities for the study of depression and for providing additional healthcare services to people with this disorder.


2021 ◽  
Vol 11 (12) ◽  
pp. 5489
Author(s):  
Ibrahim Riza Hallac ◽  
Betul Ay ◽  
Galip Aydin

Gathering useful insights from social media data has attracted great interest in recent years. User representation can be a key task in mining the rich, publicly available user-generated content offered by social media platforms. One way to automatically create meaningful observations about the users of a social network is to obtain real-valued vectors for them with user embedding representation learning models. In this study, we presented one of the most comprehensive studies in the literature on learning high-quality social media user representations by leveraging state-of-the-art text representation approaches. We proposed a novel doc2vec-based representation method, which can encode both textual and non-textual information about a social media user into a low-dimensional vector. In addition, various experiments were performed to investigate the performance of text representation techniques and concepts including word2vec, doc2vec, GloVe, Numberbatch, FastText, BERT, ELMo, and TF-IDF. We also shared a new social media dataset comprising data from 500 manually selected Twitter users from five predefined groups. The dataset contains activity data such as comments, retweets, likes and locations, as well as the actual tweets composed by the users.


Koneksi ◽  
2020 ◽  
Vol 4 (2) ◽  
pp. 338
Author(s):  
Faiz Zulia Maharany ◽  
Ahmad Junaidi

'Nightmare' is the title of a music video by the musician and singer Halsey. The video portrays women struggling against a patriarchal culture that has long been a barrier to their rights, justice and the equality they should have. This research uses a descriptive qualitative method. Data were collected through documentation, observation and a literature study, and then analyzed using Charles Sanders Peirce's semiotics. The results show that signs, symbols and messages representing feminism appear in the 'Nightmare' video through scenes of women resisting male domination and through sarcastic lines in the song's lyrics directed at patriarchy. YouTube, as one of the social media platforms to which the 'Nightmare' video was uploaded, is very effective for mass communication and for conveying the video's message to the viewing public.


2020 ◽  
Author(s):  
Ethan Kaji ◽  
Maggie Bushman

BACKGROUND Adolescents with depression often turn to social media to express their feelings, for support, and for educational purposes. Little is known about how Reddit, a forum-based platform, compares to Twitter, a newsfeed platform, when it comes to content surrounding depression. OBJECTIVE The purpose of this study is to identify differences between Reddit and Twitter in how depression is discussed and represented online. METHODS A content analysis of Reddit posts and Twitter posts, using r/depression and #depression, identified signs of depression using the DSM-IV criteria. Other youth-related topics, including School, Family, and Social Activity, and the presence of medical or promotional content were also coded. The relative frequency of each code and the average DSM-IV score were then compared between platforms. RESULTS A total of 102 posts were included in this study: 53 Reddit posts and 49 Twitter posts. Findings suggest that Reddit has more content with signs of depression (92% of posts) than Twitter (24%). Medical content appeared in 28.3% of Reddit posts compared with 18.4% of Twitter posts. Promotional content appeared in 53.1% of Twitter posts and in no Reddit posts. CONCLUSIONS Users with depression seem more willing to discuss their mental health on the subreddit r/depression than on Twitter. Twitter users also use #depression with a wider variety of topics, not all of which actually involve a case of depression.
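The reported percentages can be cross-checked against the stated sample sizes. The Python sketch below converts each percentage into the nearest whole number of posts. These counts are inferred by rounding and are not reported in the abstract itself, and the dictionary keys and function name are labels chosen here for illustration.

```python
# Sample sizes stated in the abstract.
N_REDDIT, N_TWITTER = 53, 49

# Percentages stated in the abstract, stored as (number of posts, percent).
reported = {
    "reddit_signs_of_depression": (N_REDDIT, 92.0),
    "twitter_signs_of_depression": (N_TWITTER, 24.0),
    "reddit_medical_content": (N_REDDIT, 28.3),
    "twitter_medical_content": (N_TWITTER, 18.4),
    "twitter_promotional_content": (N_TWITTER, 53.1),
}


def implied_count(n_posts: int, percent: float) -> int:
    """Nearest whole number of posts consistent with a reported percentage."""
    return round(n_posts * percent / 100)


# Inferred counts (an assumption based on rounding, not reported figures).
implied = {name: implied_count(n, pct) for name, (n, pct) in reported.items()}

for name, count in implied.items():
    print(f"{name}: about {count} posts")
```

With samples this small, each post moves a percentage by roughly two points, so the differences between platforms rest on a few dozen posts per category.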


2021 ◽  
pp. 016344372110158
Author(s):  
Opeyemi Akanbi

Moving beyond the current focus on the individual as the unit of analysis in the privacy paradox, this article examines the misalignment between privacy attitudes and online behaviors at the level of society as a collective. I draw on Facebook’s market performance to show how despite concerns about privacy, market structures drive user, advertiser and investor behaviors to continue to reward corporate owners of social media platforms. In this market-oriented analysis, I introduce the metaphor of elasticity to capture the responsiveness of demand for social media to the data (price) charged by social media companies. Overall, this article positions social media as inelastic, relative to privacy costs; highlights the role of the social collective in the privacy crises; and ultimately underscores the need for structural interventions in addressing privacy risks.
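For readers unfamiliar with the elasticity metaphor, the standard textbook definition of price elasticity of demand is sketched below. Reading $Q$ as the demand for a social media platform and $P$ as the privacy (data) cost borne by its users is an interpretation of the abstract, not a formula taken from the article.

```latex
E \;=\; \frac{\%\,\Delta Q}{\%\,\Delta P}
  \;=\; \frac{\Delta Q / Q}{\Delta P / P},
\qquad
\text{demand is inelastic when } |E| < 1 .
```

Under this reading, calling social media "inelastic, relative to privacy costs" means that a given proportional increase in the data demanded of users produces a smaller proportional drop in usage.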


Author(s):  
Giandomenico Di Domenico ◽  
Annamaria Tuan ◽  
Marco Visentin

Abstract In the wake of the COVID-19 pandemic, unprecedented amounts of fake news and hoaxes spread on social media. In particular, conspiracy theories made claims about the effects of specific new technologies like 5G, and misinformation tarnished the reputation of brands like Huawei. Language plays a crucial role in understanding the motivational determinants of social media users in sharing misinformation, as people extract meaning from information based on their discursive resources and their skillset. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether or not a tweet is retweeted and, if so, how much it is retweeted. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on whether a tweet is retweeted and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet on the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but do not necessarily understand or critically evaluate it before deciding to share it on the social media platform.


2021 ◽  
pp. 146144482110594
Author(s):  
Yiyi Yin ◽  
Zhuoxiao Xie

This study discusses the shifting dynamics of fan participatory cultures on social media platforms by introducing the concept of “platformized language games.” We conceive of a fan community as a “speech community” and propose that the language and discourses of fan participatory cultures are technological practices that only make sense in use and interactions as “games” on social media platforms. Based on an ethnography of communication in fan communities on Weibo, we analyze the technological-communicative acts of fan speech communities, including the platformized setting, participants, topics, norms, and key purposes. We argue that social media logic (programmability, connectivity, popularity, and datafication) articulates with fans’ language games, thus shifting the “form of life” of celebrity fans on social media. Empirically, fan participatory cultures continue to mutate in China, as fan communities create idiosyncratic platformized language games based on the selective appropriation of the social media logics of connectivity and data-driven metrics.

