Utilizing Blockchain Technology in Social Media Bot Identification

As bots become more prevalent and smarter in the modern age of the internet, it becomes ever more important that they be identified and removed. Recent research has dictated that machine learning methods are accurate and the gold standard of bot identification on social media. Unfortunately, machine learning models do not come without their negative aspects such as lengthy training times, difficult feature selection, and overwhelming pre-processing tasks. To overcome these difficulties, we are proposing a blockchain framework for bot identification. At the current time, it is unknown how this method will perform, but it serves to prove the existence of an overwhelming gap of research under this area.

Download Full-text

A Web Interface for Analyzing Hate Speech

Future Internet ◽

10.3390/fi13030080 ◽

2021 ◽

Vol 13 (3) ◽

pp. 80

Author(s):

Lazaros Vrysis ◽

Nikolaos Vryzas ◽

Rigas Kotsakis ◽

Theodora Saridou ◽

Maria Matsiola ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Graphical User Interface ◽

Hate Speech ◽

Web Interface ◽

Learning Models ◽

Speech Detection ◽

Media Services ◽

The Web ◽

Machine Learning Models

Social media services make it possible for an increasing number of people to express their opinion publicly. In this context, large amounts of hateful comments are published daily. The PHARM project aims at monitoring and modeling hate speech against refugees and migrants in Greece, Italy, and Spain. In this direction, a web interface for the creation and the query of a multi-source database containing hate speech-related content is implemented and evaluated. The selected sources include Twitter, YouTube, and Facebook comments and posts, as well as comments and articles from a selected list of websites. The interface allows users to search in the existing database, scrape social media using keywords, annotate records through a dedicated platform and contribute new content to the database. Furthermore, the functionality for hate speech detection and sentiment analysis of texts is provided, making use of novel methods and machine learning models. The interface can be accessed online with a graphical user interface compatible with modern internet browsers. For the evaluation of the interface, a multifactor questionnaire was formulated, targeting to record the users’ opinions about the web interface and the corresponding functionality.

Download Full-text

Triage and diagnosis of COVID-19 from medical social media (Preprint)

10.2196/preprints.30397 ◽

2021 ◽

Author(s):

Abul Hasan ◽

Mark Levene ◽

David Weston ◽

Renate Fromson ◽

Nicolas Koslover ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Models ◽

Rule Based ◽

Additional Information ◽

Processing Pipeline ◽

Machine Learning Models

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources, in order to assist decision makers. Social media is important in this respect, however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report that Macro- and Micro-averaged F_{1\ }scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.

Download Full-text

Application of Machine Learning Methods in the Task of Identifying User Accounts in Two Social Networks

Computer Tools in Education ◽

10.32603/2071-2340-2019-3-29-43 ◽

2019 ◽

pp. 29-43

Author(s):

Anastasiya A. Korepanova ◽

◽

Valerii D. Oliseenko ◽

Maxim V. Abramov ◽

Alexander L. Tulupyev ◽

...

Keyword(s):

Machine Learning ◽

Social Networks ◽

Information System ◽

New Combination ◽

Practical Significance ◽

User Profiles ◽

Learning Models ◽

Machine Learning Methods ◽

The Social ◽

Machine Learning Models

The article describes the approach to solving the problem of comparing user profiles of different social networks and identifying those that belong to one person. An appropriate method is proposed based on a comparison of the social environment and the values of account profile attributes in two different social networks. The results of applying various machine learning models to solving this problem are compared. The novelty of the approach lies in the proposed new combination of various methods and application to new social networks. The practical significance of the study is to automate the process of determining the ownership of profiles in various social networks to one user. These results can be applied in the task of constructing a meta-profile of a user of an information system for the subsequent construction of a profile of his vulnerabilities, as well as in other studies devoted to social networks.

Download Full-text

Automated Retraining of Machine Learning Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3322.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 445-452

Keyword(s):

Machine Learning ◽

Input Data ◽

Research Work ◽

Learning Models ◽

Machine Learning Methods ◽

Machine Learning Model ◽

Crucial Component ◽

Conventional Machine ◽

Over Time ◽

Machine Learning Models

Data is the most crucial component of a successful ML system. Once a machine learning model is developed, it gets obsolete over time due to presence of new input data being generated every second. In order to keep our predictions accurate we need to find a way to keep our models up to date. Our research work involves finding a mechanism which can retrain the model with new data automatically. This research also involves exploring the possibilities of automating machine learning processes. We started this project by training and testing our model using conventional machine learning methods. The outcome was then compared with the outcome of those experiments conducted using the AutoML methods like TPOT. This helped us in finding an efficient technique to retrain our models. These techniques can be used in areas where people do not deal with the actual working of a ML model but only require the outputs of ML processes

Download Full-text

TOPICAL ISSUES OF APPLICATION OF MACHINE LEARNING METHODS IN ECONOMY

Инновационные аспекты развития науки и техники. Сборник статей VIII Международной научно-практической конференции: сборник статей, [электронное издание сетевого распространения] / Под ред. Н.В. Емельянова. – М.: “КДУ”, “Добросвет”, 2021. – 149 с. ◽

10.31453/kdu.ru.978-5-7913-1176-4-2021-28-33 ◽

2021 ◽

Author(s):

Natalia Pavlovna Persteneva ◽

◽

Darya Dmitrievn Skryleva ◽

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Learning Model ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Model ◽

Popular Classes ◽

Machine Learning Models

The article discusses machine learning methods. Using the example of two popular classes: supervised learning and unsupervised learning. Variants of the main types of machine learning models for each method are presented. A generalized algorithm for building any machine learning model is formed.

Download Full-text

Telugu News Data Classification Using Machine Learning Approach

10.4018/978-1-7998-7685-4.ch014 ◽

2022 ◽

pp. 181-194

Author(s):

Bala Krishna Priya G. ◽

Jabeen Sultana ◽

Usha Rani M.

Keyword(s):

Machine Learning ◽

Social Media ◽

Research Work ◽

Learning Approach ◽

Fake News ◽

Learning Models ◽

Machine Learning Classifiers ◽

Proposed Model ◽

Machine Learning Approach ◽

Machine Learning Models

Mining Telugu news data and categorizing based on public sentiments is quite important since a lot of fake news emerged with rise of social media. Identifying whether news text is positive, negative, or neutral and later classifying the data in which areas they fall like business, editorial, entertainment, nation, and sports is included throughout this research work. This research work proposes an efficient model by adopting machine learning classifiers to perform classification on Telugu news data. The results obtained by various machine-learning models are compared, and an efficient model is found, and it is observed that the proposed model outperformed with reference to accuracy, precision, recall, and F1-score.

Download Full-text

Application of Traditional Machine Learning Models to Detect Abnormal Traffic in the Internet of Things Networks

10.1007/978-3-030-88081-1_55 ◽

2021 ◽

pp. 735-744

Author(s):

Evgeniya Istratova ◽

Mikhail Grif ◽

Dmitry Dostovalov

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

The Internet ◽

Learning Models ◽

The Internet Of Things ◽

Machine Learning Models

Download Full-text

Chained Anomaly Detection Models for Federated Learning: An Intrusion Detection Case Study

Applied Sciences ◽

10.3390/app8122663 ◽

2018 ◽

Vol 8 (12) ◽

pp. 2663 ◽

Cited By ~ 11

Author(s):

Davy Preuveneers ◽

Vera Rimmer ◽

Ilias Tsingenopoulos ◽

Jan Spooren ◽

Wouter Joosen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Intrusion Detection ◽

Anomaly Detection ◽

Training Data ◽

Learning Models ◽

Traditional System ◽

Blockchain Technology ◽

Malicious Behavior ◽

Machine Learning Models

The adoption of machine learning and deep learning is on the rise in the cybersecurity domain where these AI methods help strengthen traditional system monitoring and threat detection solutions. However, adversaries too are becoming more effective in concealing malicious behavior amongst large amounts of benign behavior data. To address the increasing time-to-detection of these stealthy attacks, interconnected and federated learning systems can improve the detection of malicious behavior by joining forces and pooling together monitoring data. The major challenge that we address in this work is that in a federated learning setup, an adversary has many more opportunities to poison one of the local machine learning models with malicious training samples, thereby influencing the outcome of the federated learning and evading detection. We present a solution where contributing parties in federated learning can be held accountable and have their model updates audited. We describe a permissioned blockchain-based federated learning method where incremental updates to an anomaly detection machine learning model are chained together on the distributed ledger. By integrating federated learning with blockchain technology, our solution supports the auditing of machine learning models without the necessity to centralize the training data. Experiments with a realistic intrusion detection use case and an autoencoder for anomaly detection illustrate that the increased complexity caused by blockchain technology has a limited performance impact on the federated learning, varying between 5 and 15%, while providing full transparency over the distributed training process of the neural network. Furthermore, our blockchain-based federated learning solution can be generalized and applied to more sophisticated neural network architectures and other use cases.

Download Full-text

Three machine learning models for the 2019 Solubility Challenge

ADMET & DMPK ◽

10.5599/admet.835 ◽

2020 ◽

Cited By ~ 1

Author(s):

John Mitchell

Keyword(s):

Machine Learning ◽

Random Forest ◽

Gold Standard ◽

Challenge Test ◽

Wisdom Of Crowds ◽

Learning Models ◽

The Third ◽

Aqueous Solubilities ◽

Machine Learning Models ◽

Better Than

We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers, with one model being based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum, and here discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set of 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. One the other, a high-variance set of 32 molecules, we find that Vox Machinarum and Random Forest both perform a little better than Extra Trees, and almost equally to one another. We also compare the gold standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.

Download Full-text