Metrics for Personal Profiles of Social Network Users

Criminals use online social networks for various activities, including communication, planning, and execution of criminal acts. They often employ ciphered posts using slang expressions that are restricted to specific groups. Although the literature shows advances in the analysis of natural-language posts, such as hate speech, threats, and, most notably, sentiment analysis, research enabling intention analysis of posts that use slang expressions remains underexplored. We propose a framework and construct software prototypes for selecting social network posts that contain criminal slang expressions and for automatically classifying these posts according to illocutionary classes. The framework combines computational ontologies and machine learning (ML) techniques. Our Ontology of Criminal Expressions represents crime concepts in a formal and flexible model and associates them with criminal slang expressions. This ontology is used to select suspicious posts and decipher them. In our solution, the criminal intention in written posts is classified automatically using models learned from existing posts. A case study evaluates the framework with 8,835,290 tweets. The results show its viability, demonstrating the benefits of deciphering posts and the effectiveness of ML-based detection of users' intentions in written criminal posts.
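The selection and deciphering step described above can be pictured with a small sketch. It is not the authors' ontology or prototype: the dictionary `SLANG_TO_CONCEPT`, its two entries, and the function names are hypothetical stand-ins, and a flat lookup table is a much simpler structure than a formal ontology.

```python
import re

# Hypothetical slang-to-concept entries; the abstract does not list the
# contents of the Ontology of Criminal Expressions.
SLANG_TO_CONCEPT = {
    "snow": "cocaine",
    "piece": "firearm",
}

# Match any known slang term as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SLANG_TO_CONCEPT)) + r")\b",
    re.IGNORECASE,
)


def is_suspicious(post: str) -> bool:
    """Return True if the post contains at least one known slang term."""
    return _PATTERN.search(post) is not None


def decipher(post: str) -> str:
    """Replace each slang term with the concept it is mapped to."""
    return _PATTERN.sub(lambda m: SLANG_TO_CONCEPT[m.group(0).lower()], post)
```

A post selected by `is_suspicious` would then be passed, in deciphered form, to whatever classifier assigns the illocutionary class; that classifier is outside this sketch.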

I Know Where You Are Coming From: On the Impact of Social Media Sources on AI Model Performance (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7258 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13971-13972

Author(s):

Yang Qi ◽

Farseev Aleksandr ◽

Filchenkov Andrey

Keyword(s):

Machine Learning ◽

Social Networks ◽

Social Media ◽

Social Network ◽

Model Performance ◽

User Profiling ◽

Personalized Recommendation ◽

Modal Data ◽

Social Media Networks ◽

The Impact

Nowadays, social networks play a crucial role in everyday human life and are no longer associated purely with spare time. In fact, instant communication with friends and colleagues has become an essential component of our daily interaction, giving rise to multiple new types of social networks. By participating in such networks, individuals generate a multitude of data points that describe their activities from different perspectives and can be used for applications such as personalized recommendation or user profiling. However, the impact of different social media networks on machine learning model performance has not yet been studied comprehensively. In particular, the literature on modeling multi-modal data from multiple social networks is relatively sparse, which has inspired us to take a deeper dive into the topic in this preliminary study. Specifically, we study the performance of different machine learning models trained on multi-modal data from different social networks. Our initial experimental results reveal that the choice of social network affects performance and that proper selection of the data source is crucial.

Algorithm for detecting attacks on Web applications based on machine learning methods and attributes queries

Journal of Science and Technology on Information security ◽

10.54654/isj.v2i14.118 ◽

2022 ◽

Vol 2 (14) ◽

pp. 26-34

Author(s):

Nguyen Manh Thang ◽

Tran Thi Luong

Keyword(s):

Machine Learning ◽

Web Application ◽

Cloud Storage ◽

Web Applications ◽

Detection Algorithm ◽

Learning Methods ◽

Google Docs ◽

Effective Work ◽

Web Resources ◽

Machine Learning Methods

Most applications developed today aim to be as accessible as possible to users on the Internet. Many applications store their data in cyberspace to make work and entertainment more effective, for example Google Docs, email, cloud storage, maps, weather, and news. Attacks on Web resources most often occur at the application level, in the form of HTTP/HTTPS requests to the site, where traditional firewalls have limited capabilities for analyzing and detecting attacks. To protect Web resources from attacks at the application level, there are special tools: Web Application Firewalls (WAF). This article presents an anomaly detection algorithm and describes how it works in the open-source web application firewall ModSecurity; the algorithm uses machine learning methods with 8 proposed features to detect attacks on web applications.
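As an illustration of what turning an HTTP request into numeric attributes might look like, here is a minimal sketch. The abstract does not name its 8 features, so the five attributes below, the `SPECIAL` character set, and the function name are assumptions chosen for illustration only; they are not the paper's feature set and not part of ModSecurity.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical set of characters often seen in injection payloads.
SPECIAL = set("'\"<>;()%-")


def request_features(url: str) -> dict:
    """Compute a few illustrative numeric attributes of a request URL."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    return {
        "url_length": len(url),
        # Number of non-empty path segments.
        "path_depth": len([s for s in parts.path.split("/") if s]),
        "param_count": len(params),
        # Suspicious punctuation and digits inside the query string.
        "special_chars": sum(ch in SPECIAL for ch in parts.query),
        "digits": sum(ch.isdigit() for ch in parts.query),
    }
```

A vector like this could be fed to any anomaly detector or classifier; which model the article uses is not stated in the abstract.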

Retweet Prediction Based on User Behavior

10.32920/ryerson.14657001 ◽

2021 ◽

Author(s):

Syeda Nadia Firdaus

Keyword(s):

Machine Learning ◽

Social Networks ◽

Social Network ◽

Prediction Model ◽

Matrix Factorization ◽

Information Diffusion ◽

User Behavior ◽

Past Research ◽

The Difference ◽

The Impact

Social networks have been a topic of strong interest for computer science researchers in recent years. Social networks such as Facebook, Twitter, and Instagram play an important role in information diffusion. Social network data are created by the networks' users. Users' online activities and behavior have been studied in past research to better understand how information is diffused on social networks. In this study, we focus on Twitter and explore the impact of user behavior on retweet activity. To represent a user's behavior for predicting their retweet decision, we introduce 10-dimensional emotion features and 35-dimensional personality-related features. We consider how a user's behavior as an author differs from their behavior as a retweeter, and propose a machine learning based retweet prediction model that takes this difference into account. We also propose two approaches for a matrix factorization retweet prediction model, which learns the latent relation between users and tweets to predict the user's retweet decision. We tested the proposed models experimentally. Models based on user behavior related features improve on baseline models by 3% - 6% in terms of F1-score. When only the user's behavior as a retweeter is considered, data processing time is reduced while prediction accuracy remains comparable to the case where both retweeting and posting behaviors are considered. In the proposed matrix factorization models, we include tweet features in the basic factorization model through newly defined regularization terms and improve performance by 3% - 4% in terms of F1-score. Finally, we compare the performance of machine learning and matrix factorization models for retweet prediction and find that neither is superior to the other on all occasions. Therefore, different models should be used depending on how the prediction results will be used. A machine learning model is preferable when the model's performance quality is important, such as for tweet re-ranking and tweet recommendation. Matrix factorization is the preferred option when the model's positive retweet prediction capability is more important, such as for marketing campaigns and finding potential retweeters.
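To make the idea of a regularized factorization model concrete, the following is a minimal sketch of basic matrix factorization trained by stochastic gradient descent on (user, tweet, retweeted) triples. It uses a plain L2 penalty only; the tweet-feature regularization terms proposed in the thesis are not reproduced here, and the function names, hyperparameters, and toy data are assumptions.

```python
import random


def factorize(observed, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Learn latent user and tweet vectors from (user, tweet, label) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, y in observed:
            pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            err = y - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def score(P, Q, u, i):
    """Predicted retweet score: dot product of the latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))
```

With a toy log in which both users retweeted tweet 0 and ignored tweet 1, the learned score for tweet 0 should end up higher than for tweet 1. Extending the penalty with feature-based terms would change only the update lines inside the inner loop.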

Retweet Prediction Based on User Behavior

10.32920/ryerson.14657001.v1 ◽

2021 ◽

Author(s):

Syeda Nadia Firdaus

Application of Machine Learning Methods in the Task of Identifying User Accounts in Two Social Networks

Computer Tools in Education ◽

10.32603/2071-2340-2019-3-29-43 ◽

2019 ◽

pp. 29-43

Author(s):

Anastasiya A. Korepanova ◽

Valerii D. Oliseenko ◽

Maxim V. Abramov ◽

Alexander L. Tulupyev ◽

...

Keyword(s):

Machine Learning ◽

Social Networks ◽

Information System ◽

New Combination ◽

Practical Significance ◽

User Profiles ◽

Learning Models ◽

Machine Learning Methods ◽

The Social ◽

Machine Learning Models

The article describes an approach to comparing user profiles from different social networks and identifying those that belong to the same person. The proposed method is based on comparing the social environment and the values of account profile attributes in two different social networks. The results of applying various machine learning models to this problem are compared. The novelty of the approach lies in the new combination of methods and in its application to new social networks. The practical significance of the study is the automation of determining whether profiles in different social networks belong to one user. These results can be applied to constructing a meta-profile of an information system user, for the subsequent construction of a profile of that user's vulnerabilities, as well as in other studies devoted to social networks.
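A comparison of profile attributes and social environment can be sketched as a feature vector for a pair of accounts. The field names (`name`, `city`, `friends`), the choice of similarity measures, and the assumption that friend identifiers are comparable across the two networks are all hypothetical; the abstract does not specify them.

```python
from difflib import SequenceMatcher


def jaccard(a, b) -> float:
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def pair_features(p1: dict, p2: dict) -> dict:
    """Similarity features for two profiles from different networks."""
    city1, city2 = p1.get("city"), p2.get("city")
    return {
        # Character-level similarity of the display names.
        "name_sim": SequenceMatcher(
            None, p1["name"].lower(), p2["name"].lower()
        ).ratio(),
        # 1.0 only when both cities are present and equal.
        "same_city": float(city1 is not None and city1 == city2),
        # Overlap of the two accounts' social environments.
        "friends_jaccard": jaccard(p1["friends"], p2["friends"]),
    }
```

Vectors of this kind, labeled as "same person" or "different people", are what a binary classifier would be trained on when comparing machine learning models for the task.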

Automatic Misinformation Detection About COVID-19 in Brazilian Portuguese WhatsApp Messages

10.5753/sbbd_estendido.2021.18173 ◽

2021 ◽

Author(s):

Antônio Diogo Forte Martins ◽

José Maria Monteiro ◽

Javam Machado

Keyword(s):

Machine Learning ◽

Social Networks ◽

Brazilian Portuguese ◽

Primary Sources ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods

During the coronavirus pandemic, the problem of misinformation arose once again, quite intensely, through social networks. In Brazil, one of the primary sources of misinformation is the messaging application WhatsApp. However, because of WhatsApp's private messaging nature, there are still few misinformation detection methods developed specifically for this platform. In this context, automatic misinformation detection (MID) about COVID-19 in Brazilian Portuguese WhatsApp messages becomes a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled. We then investigate different machine learning methods in order to build an efficient MID for WhatsApp messages. So far, our best result is an F1 score of 0.774, which we attribute to the predominance of short texts. When texts with fewer than 50 words are filtered out, the F1 score rises to 0.85.
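For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for binary labels and shows a word-count filter of the kind described above; the function names and the whitespace-based word count are assumptions, not the authors' code.

```python
def f1_score(y_true, y_pred) -> float:
    """F1 for binary labels, with 1 as the positive (misinformation) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def keep_long_texts(texts, labels, min_words=50):
    """Drop messages with fewer than min_words whitespace-separated words."""
    kept = [(t, l) for t, l in zip(texts, labels) if len(t.split()) >= min_words]
    return [t for t, _ in kept], [l for _, l in kept]
```

Evaluating the same classifier before and after `keep_long_texts` is one way to reproduce the kind of comparison reported in the abstract.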

Using supervised machine learning methods to improve the selection of analogue sites for studying habitability of the sub-surface ocean of Europa

10.5194/epsc2020-474 ◽

2020 ◽

Author(s):

Alvaro del Moral ◽

Victoria Pearson ◽

Mark Fox-Powell ◽

Karen Olsson-Francis

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Learning Methods ◽

Surface Ocean ◽

Machine Learning Methods ◽

Selection Of

Examination of ‘Interests’ and ‘Activities’ of Social Network users

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7448.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 2688-2693

Keyword(s):

Machine Learning ◽

Social Media ◽

Social Network ◽

General Population ◽

Social Media Marketing ◽

Family Status ◽

Learning Methods ◽

Attribute Data ◽

Machine Learning Methods ◽

The Social

The present study analyzes attribute data of users of the social network VK. The general population of N = 52,614 users is the intersection of the audiences of two communities for social media marketing. Based on the collected statistics on the "interests" attribute, one can compile a generalized portrait of an IT specialist and online marketer: a man about 30 years old, unmarried or describing his family status as "everything is complicated". He speaks an average of two languages and works for an organization or studies at a university. He has about 370 followers on VK. The result based on data from the "activities" field is very close to that from the "interests" field and gives a similar generalized portrait of a specialist. As part of the study, the authors learned to segment users into those who identify themselves as "IT specialists or online marketers" and "other" users, using machine learning methods.

Learning Data Correction for Myoelectric Hand Based on “Survival of the Fittest”

Cyborg and Bionic Systems ◽

10.34133/2021/9875814 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yusuke Yamanoi ◽

Shunta Togo ◽

Yinlai Jiang ◽

Hiroshi Yokoi

Keyword(s):

Machine Learning ◽

Data Storage ◽

Learning Performance ◽

Huge Amount ◽

Learning Methods ◽

Machine Learning Methods ◽

Wearable Robots ◽

Processing Data ◽

Survival Of The Fittest ◽

Learning Data

In recent years, myoelectric hands have become multi-degree-of-freedom (DOF) devices, which are controlled via machine learning methods. However, currently, learning data for myoelectric hands are gathered manually and thus tend to be of low quality. Moreover, in the case of infants, gathering accurate learning data is nearly impossible because of the difficulty of communicating with them. Therefore, a method that automatically corrects errors in the learning data is necessary. Myoelectric hands are wearable robots and thus have volumetric and weight constraints that make it infeasible to store large amounts of data or apply complex processing methods. Compared with general machine learning methods such as image processing, those for myoelectric hands have limitations on the data storage, although the amount of data to be processed is quite large. If we can use this huge amount of processing data to correct the learning data without storing the processing data, the machine learning performance is expected to improve. We then propose a method for correcting the learning data through utilisation of the signals acquired during the use of the myoelectric hand. The proposed method is inspired by “survival of the fittest.” The effectiveness of the method was verified through offline analysis. The method reduced the amount of learning data and learning time by approximately a factor of 10 while maintaining classification rates. The classification rates improved for one participant but slightly deteriorated on average among all participants. To solve this problem, verifying the method via interactive learning will be necessary in the future.