Fake Comments Detection with Sentiment Anatomy using Iterative Sequential Minimal Optimization Algorithm

Building electronic online markets, online communication networks, peer-to-peer applications, online social media services, and customer conveniences is important. In practice, web-based services are specifically designed to overcome the uncertainty and distrust inherent in e-commerce applications and to increase the robustness of the system and its resistance against fake clients and untrustworthy users. An e-commerce platform moreover aims to adopt the most efficient methods for understanding and evaluating user activity in order to expose fraudsters. Otherwise, the fundamental objective of e-commerce services, to maximize profit and purchase rate, will be endangered and undermined by fake and ill-intentioned users. Individuals and organizations need to detect fake comments. Because of their deceptive and hidden features, it is difficult to identify fake comments simply by looking at the text of a single comment, which is why identifying falsified comments is a difficult task. This paper uses sentiment anatomy (SA) to analyze online film comments and identify fake ones. The texts and the SA system are applied to a specific dataset of film comments. In particular, we compared the supervised machine-learning methods SVM and SMO for sentiment classification of the comments in two different cases, without stop words. The measured results show that the SMO process outperforms the SVM process in both cases, and that it reaches the highest precision not only in text classification but also in finding duplicate comments.
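
The stop-word cases mentioned in the abstract can be illustrated with a small sketch. It assumes the two cases are "stop words kept" and "stop words removed", which the abstract does not state explicitly; the stop-word list, the function names, and the sample comment are hypothetical and only show how the feature set changes before a classifier such as SVM or SMO is trained.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, remove_stop_words):
    """Return word counts, optionally dropping stop words first."""
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)


# Made-up film comment, not taken from the paper's dataset.
comment = "This film was a waste of time and the plot is dull"
with_stop = bag_of_words(comment, remove_stop_words=False)
without_stop = bag_of_words(comment, remove_stop_words=True)
```

Either `Counter` could then be turned into a feature vector; removing stop words shrinks the vocabulary, which is one reason results may differ between the two cases.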

Machine Learning Methods in E-mail Spam Classification

Studia Informatica ◽

10.34739/si.2019.23.04 ◽

2020 ◽

pp. 57-76

Author(s):

Piotr Świtalski ◽

Mateusz Kopówka

Keyword(s):

Machine Learning ◽

Social Media ◽

Classification Problem ◽

The Internet ◽

Classification Methods ◽

Spam Filtering ◽

Machine Learning Methods ◽

New Methods ◽

E Mail

The increasing number of unwanted e-mails affects users’ security on the Internet. Today, spam e-mails can carry potentially malicious messages which, for example, can redirect the user to fake sites. Such messages have recently appeared in social media as well. Filtering this content is important in order to minimize financial and branding costs. Traditional methods of spam filtering may not be sufficient for present threats. New methods are required for constructing more dependable and robust antispam filters. Machine learning has recently become a very popular technique in classification, and it has been successfully used in spam classification. In this paper we present some machine learning methods for spam detection. We also introduce ways to solve the spam classification problem and show that these methods can be useful in the classification of malicious messages. Finally, we compare the developed methods and present the results in the experimental section.
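
As a rough illustration of machine-learning spam classification, here is a minimal multinomial Naive Bayes sketch with add-one smoothing. The abstract does not name the methods the authors evaluated, so Naive Bayes is an assumption chosen because it is a common baseline; the training messages and function names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(messages, labels):
    """Collect per-class word counts and class frequencies."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_messages)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data for illustration only.
messages = [
    "win free prize now",
    "free money click now",
    "meeting agenda for monday",
    "lunch with the team monday",
]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(messages, labels)
prediction_spam = classify("free prize click", model)
prediction_ham = classify("team meeting monday", model)
```

A real filter would need a much larger labeled corpus and proper tokenization; the sketch only shows the scoring logic.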

News Classification Using Machine Learning

International Journal on Recent and Innovation Trends in Computing and Communication ◽

10.17762/ijritcc.v9i5.5464 ◽

2021 ◽

Vol 9 (5) ◽

pp. 23-27

Author(s):

SHWETA MAHAJAN

Keyword(s):

Machine Learning ◽

Social Media ◽

Performance Improvement ◽

Vital Role ◽

Learning Approach ◽

Entertainment Education ◽

Meaningful Information ◽

Textual Data ◽

Machine Learning Approach

Many social media webpages and platforms produce textual data. These different kinds of data need to be analysed and processed to extract meaningful information from the raw data. Text classification plays a vital role in extracting useful information, along with summarization and text retrieval. In our work we consider the problem of news classification using a machine learning approach. We currently have a news dataset containing various categories such as entertainment, education, sports, and politics. On this data we apply a classification algorithm with several word vectorizing techniques in order to get the best result. The results are compared on different parameters such as precision, recall, F1 score, and accuracy for performance improvement.
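
The evaluation measures named in the abstract can be computed as in the following sketch. It treats one news category as the positive class; the function name and the label lists are hypothetical and are not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = correct / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up gold labels and predictions for six news items.
y_true = ["sports", "sports", "politics", "sports", "education", "politics"]
y_pred = ["sports", "politics", "politics", "sports", "sports", "politics"]
metrics = binary_metrics(y_true, y_pred, positive="sports")
```

For a multi-class report, the same function can be called once per category and the per-class values averaged.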

A WEB-BASED FAST AND RELIABLE TEXT CLASSIFICATION TOOL

SOCIETY. TECHNOLOGY. SOLUTIONS. Proceedings of the International Scientific Conference ◽

10.35363/via.sts.2019.21 ◽

2019 ◽

Vol 1 ◽

pp. 24

Author(s):

Jānis Kapenieks

Keyword(s):

Machine Learning ◽

Social Media ◽

Data Storage ◽

Text Classification ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Web Based ◽

Opinion Analysis ◽

Classification Tool ◽

User Friendly

INTRODUCTION Opinion analysis in the big data analysis context has recently been a hot topic in science and the business world. Social media has become a key data source for opinions, generating a large amount of data every day and providing content for further analysis. In the big data age, unstructured data classification is one of the key tools for fast and reliable content analysis. I expect significant growth in the demand for content classification services in the near future. Many online text classification tools are available, but they provide limited functionality, such as automated text classification into predefined categories and sentiment analysis based on a pre-trained machine learning algorithm. This limited functionality does not include tools such as data mining support and/or a machine learning algorithm training interface. Only a limited number of tools provide the whole set of tools required for text classification, i.e. all the steps from data mining through building a machine learning algorithm to applying it to a data stream from a social network source. My goal is to create a tool able to generate a classified text stream directly from social media with a user-friendly set-up interface. METHODS AND MATERIALS The text classification tool will have a core-based modular structure (each module providing certain functionality) so that the system can be scaled in terms of technology and functionality. The tool will be built on open source libraries and programming languages running on a Linux-based server. The tool will be based on three key components: frontend, backend, and data storage. The backend will use Python and Node.js with machine learning and text filtering libraries (TensorFlow and Keras), MySQL 5.7/8 will be used for data storage, and the frontend will be based on web technologies built using PHP and JavaScript.
EXPECTED RESULTS The expected result of my work is a web-based text classification tool for opinion analysis using data streams from social media. The tool will provide a user-friendly interface for data collection, algorithm selection, and machine learning algorithm setup and training. Multiple text classification algorithms will be available: Linear SVM, Random Forest, Multinomial Naive Bayes, Bernoulli Naive Bayes, Ridge Regression, Perceptron, Passive Aggressive Classifier, and a deep machine learning algorithm. System users will be able to identify the most effective algorithm for their text classification task by comparing the algorithms on accuracy. The architecture of the text classification tool will be based on a frontend interface and backend services. The frontend interface will provide all the tools with which the system user interacts. This includes setting up data collection streams from multiple social networks and allocating them to pre-specified channels based on keywords. Data from each channel can be classified and assigned to a pre-defined cluster. The tool will provide a training interface for machine learning algorithms. This text classification tool is currently in active development for a client, with testing and implementation planned for April 2019.
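
The keyword-based allocation of collected posts to channels could look roughly like the sketch below. The abstract does not describe the matching rule, so exact lowercase word matching is an assumption, and the channel names, keywords, and posts are hypothetical.

```python
def allocate_to_channels(posts, channel_keywords):
    """Assign each post to every channel whose keyword set it mentions."""
    channels = {name: [] for name in channel_keywords}
    for post in posts:
        words = set(post.lower().split())
        for name, keywords in channel_keywords.items():
            if words & keywords:  # at least one keyword occurs in the post
                channels[name].append(post)
    return channels


# Hypothetical channel definitions and incoming posts.
channel_keywords = {
    "telecom": {"phone", "network"},
    "finance": {"bank", "loan"},
}
posts = [
    "New phone battery lasts two days",
    "Bank fees are too high",
    "My phone bill and bank charges went up",
]
channels = allocate_to_channels(posts, channel_keywords)
```

A post may land in several channels, so each channel can be classified independently afterwards.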

Exploring the Transition to Fatherhood: Feasibility Study Using Social Media and Machine Learning (Preprint)

10.2196/preprints.12371 ◽

2018 ◽

Author(s):

Samantha J Teague ◽

Adrian BR Shatte

Keyword(s):

Machine Learning ◽

Social Media ◽

Transition To Parenthood ◽

Learning Algorithm ◽

Clinical Care ◽

Group Discussion ◽

Web Based ◽

Additional Tool ◽

Discussion Threads ◽

Rich Data

BACKGROUND Fathers’ experiences across the transition to parenthood are underreported in the literature. Social media offers the potential to capture fathers’ experiences in real time and at scale while also removing the barriers that fathers typically face in participating in research and clinical care. OBJECTIVE This study aimed to assess the feasibility of using social media data to map the discussion topics of fathers across the fatherhood transition. METHODS Discussion threads from two Web-based parenting communities, r/Daddit and r/PreDaddit from the social media platform Reddit, were collected over a 2-week period, resulting in 1980 discussion threads contributed to by 5853 unique users. An unsupervised machine learning algorithm was then implemented to group discussion threads into topics within each community and across a combined collection of all discussion threads. RESULTS Results demonstrated that men use Web-based communities to share the joys and challenges of the fatherhood experience. Minimal overlap in discussions was found between the 2 communities, indicating that distinct conversations are held on each forum. A range of social support techniques was demonstrated, with conversations characterized by encouragement, humor, and experience-based advice. CONCLUSIONS This study demonstrates that rich data on fathers’ experiences can be sourced from social media and analyzed rapidly using automated techniques, providing an additional tool for researchers exploring fatherhood.

Suicide Prediction on Social Media by Implementing Sentiment Analysis along with Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3424.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 4833-4837

Keyword(s):

Machine Learning ◽

Social Media ◽

Semantic Analysis ◽

Machine Learning Techniques ◽

Suicide Prediction ◽

The People ◽

Learning Techniques ◽

Use Of Social Media ◽

Day By Day

Technology is growing day by day, and its influence on our day-to-day life is reaching new heights in the digitized world. Most people are inclined to use social media, and even minute details are posted every second. Some even go to the extent of posting suicide-related content. This paper addresses the issue of suicide by predicting suicide-related issues on social media through semantic analysis. The prediction and classification of suicide is done with the help of machine learning techniques and semantic analysis of sentiments. The approach is a four-tier model, which is beneficial as it uses Twitter4J data with the Weka tool and implements it on WordNet. Precision and accuracy are verified as the parameters for the performance efficiency of the procedure. We also give a solution for the lack of terminological resources by providing a phase for generating vocabulary records.

On the fly classification of traffic in Anonymous Communication Networks using a Machine Learning approach

2020 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS) ◽

10.1109/ants50601.2020.9342804 ◽

2020 ◽

Author(s):

Lalitha Chinmayee ◽

MaheshKumar Hurali ◽

Annapurna P Patil

Keyword(s):

Machine Learning ◽

Communication Networks ◽

Anonymous Communication ◽

Learning Approach ◽

Machine Learning Approach

FRI0417 IDENTIFICATION OF THE MOST IMPORTANT FEATURES OF KNEE OSTEOARTHRITIS PROGRESSORS USING MACHINE LEARNING METHODS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.1033 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 807.1-807

Author(s):

A. Jamshidi ◽

M. Leclercq ◽

A. Labbe ◽

J. P. Pelletier ◽

F. Abram ◽

...

Keyword(s):

Machine Learning ◽

Knee Osteoarthritis ◽

Prediction Models ◽

Joint Space ◽

Classification Methods ◽

Research Support ◽

Common Features ◽

Knee Oa ◽

Hospital Research

Background: Knee osteoarthritis (OA), a leading cause of disability worldwide, can be difficult to define as its development is often insidious and involves different subgroups. We still lack robust prediction models that are able to guide clinical decisions and stratify OA patients according to risk of disease progression. Objectives: This study aimed at identifying the most important features of knee OA progressors. To this end, we used machine learning (ML) algorithms on a large set of subjects and features to develop advanced prediction models that provide high classification and prediction performance. Methods: Participants, features and outcomes were from the Osteoarthritis Initiative. Features were from baseline (1107), including articular knee tissues (135) assessed by quantitative MRI. OA progressors were ascertained by four outcomes: cartilage volume loss in medial plateau at 48 and 96 months (Prop_CV_48M, 96M); Kellgren-Lawrence (KL) grade ≥2; and medial joint space narrowing (JSN) ≥1 at 48 months. Subjects’ numbers were as follows: 1598 for the outcome Prop_CV_96M, 1044 for the Prop_CV_48M, and 1468 for each KL grade ≥2 at 48 months and JSN ≥1 at 48 months. Six feature selection models were used to identify the common features in each outcome. Six classification methods were applied to measure the accuracy of the selected features in classifying the subjects into progressors and non-progressors. Classification of the best features was done using auto-ML interface and the area under the curve (AUC). To prioritize the top features, Sparse Partial Least Square (sPLS) method was used. Results: For the classification of the best common features in each outcome, Multi-Layer Perceptron (MLP) achieved the highest AUC in Prop_CV_96M, KL, and JSN (0.80, 0.88, 0.95), and Gradient Boosting Machine (GBM) for Prop_CV_48M (0.70). 
sPLS revealed that the baseline top five features to predict knee OA progressors are the joint space width (JSW), mean cartilage thickness of peripheral, medial, and central tibial plateau, and JSN. Conclusion: This is the first time that such a comprehensive study was performed for identifying the best features and classification methods for knee OA progressors. Data revealed that early prediction of knee OA progression can be done with high accuracy and based on only a few features. This study identifies the baseline X-ray and MRI-based features as the most important for predicting knee OA progressors. These results could be used for the development of a tool enabling prediction of knee OA progressors. Acknowledgments: This work was supported in part by the Osteoarthritis Research Unit of the University of Montreal Hospital Research Centre; the Chair in Osteoarthritis, University of Montreal, (both from Montreal, Quebec, Canada); and the Computational Biology Laboratory, Laval University Hospital Research Center, (Québec, Quebec, Canada). A Jamshidi received a bursary from the Canada First Research Excellence Fund through TransMedTech Institute, (Montreal, Quebec, Canada). Disclosure of Interests: Afshin Jamshidi: None declared, Mickaël Leclercq: None declared, Aurelie Labbe: None declared, Jean-Pierre Pelletier Shareholder of: ArthroLab Inc., Grant/research support from: TRB Chemedica, Speakers bureau: TRB Chemedica and Mylan, François Abram Employee of: ArthroLab Inc., Arnaud Droit: None declared, Johanne Martel-Pelletier Shareholder of: ArthroLab Inc., Grant/research support from: TRB Chemedica
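
Since the abstract reports classifier quality as AUC, the following sketch shows one standard way to compute it: the fraction of (progressor, non-progressor) pairs in which the progressor receives the higher score, with ties counted as one half. The labels and scores are made up and have no relation to the study data; the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = progressor, 0 = non-progressor.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for reported values such as 0.70 and 0.95.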

Characterizing and Identifying the Prevalence of Web-Based Misinformation Relating to Medication for Opioid Use Disorder: Machine Learning Approach

Journal of Medical Internet Research ◽

10.2196/30753 ◽

2021 ◽

Vol 23 (12) ◽

pp. e30753

Author(s):

Mai ElSherief ◽

Steven A Sumner ◽

Christopher M Jones ◽

Royal K Law ◽

Akadia Kacha-Ochana ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Large Scale ◽

Addiction Treatment ◽

Opioid Use Disorder ◽

Supervised Machine Learning ◽

Computational Techniques ◽

Opioid Use ◽

Web Based ◽

Health Communities

Background Expanding access to and use of medication for opioid use disorder (MOUD) is a key component of overdose prevention. An important barrier to the uptake of MOUD is exposure to inaccurate and potentially harmful health misinformation on social media or web-based forums where individuals commonly seek information. There is a significant need to devise computational techniques to describe the prevalence of web-based health misinformation related to MOUD to facilitate mitigation efforts. Objective By adopting a multidisciplinary, mixed methods strategy, this paper aims to present machine learning and natural language analysis approaches to identify the characteristics and prevalence of web-based misinformation related to MOUD to inform future prevention, treatment, and response efforts. Methods The team harnessed public social media posts and comments in the English language from Twitter (6,365,245 posts), YouTube (99,386 posts), Reddit (13,483,419 posts), and Drugs-Forum (5549 posts). Leveraging public health expert annotations on a sample of 2400 of these social media posts that were found to be semantically most similar to a variety of prevailing opioid use disorder–related myths based on representational learning, the team developed a supervised machine learning classifier. This classifier identified whether a post’s language promoted one of the leading myths challenging addiction treatment: that the use of agonist therapy for MOUD is simply replacing one drug with another. Platform-level prevalence was calculated thereafter by machine labeling all unannotated posts with the classifier and noting the proportion of myth-indicative posts over all posts. 
Results Our results demonstrate promise in identifying social media postings that center on treatment myths about opioid use disorder with an accuracy of 91% and an area under the curve of 0.9, including how these discussions vary across platforms in terms of prevalence and linguistic characteristics, with the lowest prevalence on web-based health communities such as Reddit and Drugs-Forum and the highest on Twitter. Specifically, the prevalence of the stated MOUD myth ranged from 0.4% on web-based health communities to 0.9% on Twitter. Conclusions This work provides one of the first large-scale assessments of a key MOUD-related myth across multiple social media platforms and highlights the feasibility and importance of ongoing assessment of health misinformation related to addiction treatment.
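
The platform-level prevalence described in the methods, the proportion of myth-indicative posts over all posts, can be sketched as follows. The keyword rule stands in for the trained classifier and is purely hypothetical, as are the platform names and posts.

```python
def platform_prevalence(posts_by_platform, is_myth_post):
    """Share of posts per platform that the classifier flags as myth-indicative."""
    prevalence = {}
    for platform, posts in posts_by_platform.items():
        if not posts:
            prevalence[platform] = 0.0
            continue
        flagged = sum(1 for post in posts if is_myth_post(post))
        prevalence[platform] = flagged / len(posts)
    return prevalence


def toy_classifier(post):
    """Hypothetical stand-in for the supervised classifier in the abstract."""
    return "replacing one drug" in post.lower()


# Made-up posts for two made-up platforms.
posts_by_platform = {
    "forum_a": [
        "Treatment is just replacing one drug with another",
        "Started treatment last week",
        "Looking for a clinic nearby",
        "Any advice on side effects",
    ],
    "forum_b": [
        "It is only replacing one drug with a different one",
        "My doctor explained how agonist therapy works",
    ],
}
prevalence = platform_prevalence(posts_by_platform, toy_classifier)
```

Multiplying each value by 100 gives a percentage comparable in form to the 0.4% and 0.9% figures reported above.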

Classification of Traffic Accident Information Using Machine Learning from Social Media

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2020/04832020 ◽

2020 ◽

Vol 8 (3) ◽

pp. 630-637

Author(s):

Dody Agung Saputro

Keyword(s):

Machine Learning ◽

Social Media ◽

Traffic Accident

Utilizing machine learning-based approaches for the detection and classification of human papillomavirus (HPV) vaccine misinformation: Infodemiology Study of Reddit Discussions (Preprint)

10.2196/preprints.26478 ◽

2020 ◽

Author(s):

Jingcheng Du ◽

Sharice Preston ◽

Hanxiao Sun ◽

Ross Shegog ◽

Rachel Cunningham ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Social Media ◽

Human Papillomavirus ◽

Convolutional Neural Network ◽

Hpv Vaccine ◽

Support Vector ◽

Safety Issues ◽

Vaccine Promotion

BACKGROUND The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information and thus create obstacles for vaccine promotion. OBJECTIVE To develop and evaluate an intelligent automated protocol to identify and classify HPV vaccine misinformation on social media, using machine learning (ML)-based methods. METHODS Reddit posts (2007-2017, n=28,121) were compiled that contained human papillomavirus (HPV) vaccine related keywords. A random subset (n=2200) was manually labeled for misinformation, serving as a gold standard corpus for evaluation. Five ML-based algorithms, including support vector machines (SVM), logistic regression (LR), extremely randomized trees (ET), convolutional neural network (CNN) and recurrent neural network (RNN), designed to identify vaccine misinformation, were evaluated for identification performance. Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation. RESULTS A convolutional neural network model achieved the highest AUC at 0.7943. Of 28,121 Reddit posts, 7,207 (25.63%) were classified as vaccine misinformation, with discussions about general safety issues identified as the leading type of misinformed posts (37%). CONCLUSIONS ML-based approaches are effective in the identification and classification of HPV vaccine misinformation from Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity and utility to meet the challenge for intelligent automated monitoring and classification of public health misinformation in social media networks. The timely identification of vaccine misinformation online is a first step for misinformation correction and vaccine promotion.
