Using Machine Learning to Detect Events on the Basis of Bengali and Banglish Facebook Posts

In modern times, ensuring social security has become the prime concern for security administrators. The widespread and recurrent use of social media sites is creating a huge risk for the lives of the general people, as these sites are frequently becoming potential sources of the organization of various types of immoral events. For protecting society from these dangers, a prior detection system which can effectively detect events by analyzing these social media data is essential. However, automating the process of event detection has been difficult, as existing processes must account for diverse writing styles, languages, dialects, post lengths, and et cetera. To overcome these difficulties, we developed an effective model for detecting events, which, for our purposes, were classified as either protesting, celebrating, religious, or neutral, using Bengali and Banglish Facebook posts. At first, the collected posts’ text were processed for language detection, and then, detected posts were pre-processed using stopwords removal and tokenization. Features were then extracted from these pre-processed texts using three sub-processes: filtering, phrase matching of specific events, and sentiment analysis. The collected features were ultimately used to train our Bernoulli Naive Bayes classification model, which was capable of detecting events with 90.41% accuracy (for Bengali-language posts) and 70% (for the Banglish-form posts). For evaluating the effectiveness of our proposed model more precisely, we compared it with two other classifiers: Support Vector Machine and Decision Tree.

Download Full-text

Heterogeneous Ensemble Structure based Universal Spam Profile Detection System for Social Media Networks

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2179.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1028-1039

Keyword(s):

Social Media ◽

Detection System ◽

User Profile ◽

Classification Model ◽

Support Vector ◽

Social Media Networks ◽

Online Social Media ◽

Linear Polynomial ◽

Proposed Model ◽

Social Media Network

The exponential rise in internet technology and online social media networks have revitalized human-being to connect and socialize globally irrespective of geographical and any demographic boundaries. Additionally, it has revitalized business communities to reach target audiences through social media networks. However, as parallel adverse up-surge the everincreasing presence of malicious users or spam has altered predominant intend of such social media network by propagating biased contents, malicious contents and fraud acts. Avoiding and neutralizing such malefic users on social media network has remained a critical challenge due to gigantically large size and user’s diversity such as Facebook, Twitter, and LinkedIn etc. Though exploiting certain user’s behavior and content types can help identifying malicious users, majority of the existing methods are limited due to confined parametric assessment, and inferior classification approaches. With intend to provide spam profile detection system in this paper a novel heterogeneous ensemblebased method is developed. The proposed model exploits user profile features, user’s activity features, location features and content features to perform spam user profile detection. To ensure optimality of computational significances, we applied multi-phased feature selection method employing Wilcoxon Rank Sum test, Significant Predictor test, and Pearson Correlation test, which assured retaining optimal feature sets for further classification. Subsequently, applying an array of machine learning methods, including Logistic regression, decision tree, Support Vector Machine variants with Linear, Polynomial and RBF kernels, Least Square SVM with linear, polynomial and RBF kernels, ANN with different kernels, etc we constituted a robust ensemble model for spam user profile classification. Simulations revealed that the proposed ensemble classification model achieves accuracy and F-score higher than 98%, which is the highest amongst major works done so far. It affirms suitability and robustness of the proposed model for real time spam profile detection and classification on social media platforms

Download Full-text

Sentiment classification of social media reviews using an ensemble classifier

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i1.pp355-363 ◽

2019 ◽

Vol 16 (1) ◽

pp. 355 ◽

Cited By ~ 1

Author(s):

Savita Sangam ◽

Subhash Shinde

Keyword(s):

Social Media ◽

Opinion Mining ◽

Ensemble Classifier ◽

Sentiment Classification ◽

Support Vector ◽

Business Organizations ◽

Text Data ◽

Proposed Model ◽

Show Business ◽

Use Of Social Media

<p>These days it has become a common practice for business organizations and individuals to make use of social media for sharing the opinions about the products or the services. Consumers are also ready to share their views on certain products or commodities. Thus huge amount of unstructured social media data gets generated day by day. Gradually heap of text data will be formed in many areas like automated business, education, health care, and show business and so on. Opinion mining also referred as sentiment analysis or sentiment classification, deals with mining of the review text and classifying the opinions or the sentiments of that text as positive or negative. In this paper we propose an ensemble classifier model consisting of Support Vector Machine and Artificial Neural Network. It combines the knowledge from two feature sets for sentiment classification. The proposed model shows the acceptable performance in terms of accuracy when compared with the baseline model.</p>

Download Full-text

Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1770 ◽

2020 ◽

Vol 4 (2) ◽

pp. 329-335

Author(s):

Rusydi Umar ◽

Imam Riadi ◽

Purwono

Keyword(s):

Social Media ◽

Gradient Descent ◽

Classification Model ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Svm Algorithm ◽

Vector Machines ◽

Performance Patterns ◽

A Company

The failure of most startups in Indonesia is caused by team performance that is not solid and competent. Programmers are an integral profession in a startup team. The development of social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This strategic tool is in the form of an automatic classification system of social media posting from prospective programmers. The classification results are expected to be able to predict the performance patterns of each candidate with a predicate of good or bad performance. The classification method with the best accuracy needs to be chosen in order to get an effective strategic tool so that a comparison of several methods is needed. This study compares classification methods including the Support Vector Machines (SVM) algorithm, Random Forest (RF) and Stochastic Gradient Descent (SGD). The classification results show the percentage of accuracy with k = 10 cross validation for the SVM algorithm reaches 81.3%, RF at 74.4%, and SGD at 80.1% so that the SVM method is chosen as a model of programmer performance classification on social media activities.

Download Full-text

Drug Target Group Prediction with Multiple Drug Networks

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666190702103927 ◽

2020 ◽

Vol 23 (4) ◽

pp. 274-284 ◽

Cited By ~ 12

Author(s):

Jingang Che ◽

Lei Chen ◽

Zi-Han Guo ◽

Shuaiqun Wang ◽

Aorigele

Keyword(s):

Drug Target ◽

Low Cost ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Multiple Drug ◽

Property A ◽

Multiple Networks ◽

Proposed Model ◽

The One

Background: Identification of drug-target interaction is essential in drug discovery. It is beneficial to predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are prompt and low-cost compared with traditional wet experiments. Methods: In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs into correct target groups. To make full use of the known drug properties, five networks were constructed, each of which represented drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from above-mentioned networks, based on which several machine learning algorithms, including RAndom k-labELsets (RAKEL) algorithm, Label Powerset (LP) algorithm and Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded the accuracy of 0.839, exact match of 0.816 and hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the network model with multiple networks was found to be superior to the one with a single network and classic model, indicating the superiority of the proposed model.

Download Full-text

Real-time spatio-temporal event detection on geotagged social media

Journal Of Big Data ◽

10.1186/s40537-021-00482-2 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yasmeen George ◽

Shanika Karunasekera ◽

Aaron Harwood ◽

Kwan Hui Lim

Keyword(s):

New York ◽

Social Media ◽

Event Detection ◽

Detection System ◽

Time And Space ◽

Social Media Data ◽

Event Time ◽

Spatio Temporal ◽

Geographical Space ◽

Media Data

AbstractA key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning for accident, protest, election or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the challenge related to the unknown spatial resolution of events, a quad-tree method is exploited in order to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach is performed that involves Poisson distribution and a smoothing method for highlighting regions with unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. A post processing stage is introduced to filter out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity, and accuracy of detected events. The proposed method is evaluated using different social media datasets: Twitter and Flickr for different cities: Melbourne, London, Paris and New York. To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms based on fixed split of geographical space and clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure named strength index, which automatically measures how accurate the reported event is.

Download Full-text

Perceiving Residents’ Festival Activities Based on Social Media Data: A Case Study in Beijing, China

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070474 ◽

2021 ◽

Vol 10 (7) ◽

pp. 474

Author(s):

Bingqing Wang ◽

Bin Meng ◽

Juan Wang ◽

Siyu Chen ◽

Jian Liu

Keyword(s):

Social Media ◽

Language Processing ◽

Topic Model ◽

Central Area ◽

Classification Model ◽

Social Media Data ◽

Ring Road ◽

Different Types ◽

Spatial Differences ◽

Media Data

Social media data contains real-time expressed information, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of the behavior of residents. In this study, a text classification model based on the BERT and Transformers framework was constructed, which was used to classify and extract more than 210,000 residents’ festival activities based on the 1.13 million Sina Weibo (Chinese “Twitter”) data collected from Beijing in 2019 data. On this basis, word frequency statistics, part-of-speech analysis, topic model, sentiment analysis and other methods were used to perceive different types of festival activities and quantitatively analyze the spatial differences of different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents in participating in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, expressing feelings during the festival is mainly distributed outside the Fifth Ring Road in Beijing. The research integrates natural language processing technology, topic model analysis, spatial statistical analysis, and other technologies. It can also broaden the application field of social media data, especially text data, which provides a new research paradigm for studying residents’ festival activities and adds residents’ perception of the festival. The research results provide a basis for the design and management of the Chinese festival system.

Download Full-text

A support vector machine with the tabu search algorithm for freeway incident detection

International Journal of Applied Mathematics and Computer Science ◽

10.2478/amcs-2014-0030 ◽

2014 ◽

Vol 24 (2) ◽

pp. 397-404 ◽

Cited By ~ 37

Author(s):

Baozhen Yao ◽

Ping Hu ◽

Mingheng Zhang ◽

Maoqing Jin

Keyword(s):

Support Vector Machine ◽

Tabu Search ◽

Traffic Management ◽

Search Algorithm ◽

Detection System ◽

Support Vector ◽

Incident Detection ◽

Tabu Search Algorithm ◽

Proposed Model ◽

Parameter Values

Abstract Automated Incident Detection (AID) is an important part of Advanced Trafﬁc Management and Information Systems (ATMISs). An automated incident detection system can effectively provide information on an incident, which can help initiate the required measure to reduce the inﬂuence of the incident. To accurately detect incidents in expressways, a Support Vector Machine (SVM) is used in this paper. Since the selection of optimal parameters for the SVM can improve prediction accuracy, the tabu search algorithm is employed to optimize the SVM parameters. The proposed model is evaluated with data for two freeways in China. The results show that the tabu search algorithm can effectively provide better parameter values for the SVM, and SVM models outperform Artiﬁcial Neural Networks (ANNs) in freeway incident detection.

Download Full-text

Sales Growth Rate Forecasting Using Improved PSO and SVM

Mathematical Problems in Engineering ◽

10.1155/2014/437898 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Xibin Wang ◽

Junhao Wen ◽

Shafiq Alam ◽

Xiang Gao ◽

Zhuo Jiang ◽

...

Keyword(s):

Growth Rate ◽

Decisive Role ◽

Convergence Speed ◽

Classification Model ◽

Support Vector ◽

Local Optimum ◽

Sales Growth ◽

Local Optima ◽

Proposed Model ◽

Forecasting Performance

Accurate forecast of the sales growth rate plays a decisive role in determining the amount of advertising investment. In this study, we present a preclassification and later regression based method optimized by improved particle swarm optimization (IPSO) for sales growth rate forecasting. We use support vector machine (SVM) as a classification model. The nonlinear relationship in sales growth rate forecasting is efficiently represented by SVM, while IPSO is optimizing the training parameters of SVM. IPSO addresses issues of traditional PSO, such as relapsing into local optimum, slow convergence speed, and low convergence precision in the later evolution. We performed two experiments; firstly, three classic benchmark functions are used to verify the validity of the IPSO algorithm against PSO. Having shown IPSO outperform PSO in convergence speed, precision, and escaping local optima, in our second experiment, we apply IPSO to the proposed model. The sales growth rate forecasting cases are used to testify the forecasting performance of proposed model. According to the requirements and industry knowledge, the sample data was first classified to obtain types of the test samples. Next, the values of the test samples were forecast using the SVM regression algorithm. The experimental results demonstrate that the proposed model has good forecasting performance.

Download Full-text

KLASIFIKASI TEKS SOSIAL MEDIA TWITTER MENGGUNAKAN SUPPORT VECTOR MACHINE (Studi Kasus Penusukan Wiranto)

Jurnal Informatika dan Rekayasa Elektronik ◽

10.36595/jire.v2i2.117 ◽

2019 ◽

Vol 2 (2) ◽

pp. 43

Author(s):

Lalu Mutawalli ◽

Mohammad Taufan Asri Zaen ◽

Wire Bagye

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Big Data ◽

Mass Communication ◽

Confusion Matrix ◽

Classification Model ◽

Support Vector ◽

Large Set ◽

Appearance Time ◽

Data Production

In the era of technological disruption of mass communication, social media became a reference in absorbing public opinion. The digitalization of data is very rapidly produced by social media users because it is an attempt to represent the feelings of the audience. Data production in question is the user posts the status and comments on social media. Data production by the public in social media raises a very large set of data or can be referred to as big data. Big data is a collection of data sets in very large numbers, complex, has a relatively fast appearance time, so that makes it difficult to handle. Analysis of big data with data mining methods to get knowledge patterns in it. This study analyzes the sentiments of netizens on Twitter social media on Mr. Wiranto stabbing case. The results of the sentiment analysis showed 41% gave positive comments, 29% commented neutrally, and 29% commented negatively on events. Besides, modeling of the data is carried out using a support vector machine algorithm to create a system capable of classifying positive, neutral, and negative connotations. The classification model that has been made is then tested using the confusion matrix technique with each result is a precision value of 83%, a recall value of 80%, and finally, as much as 80% obtained in testing the accuracy.

Download Full-text

Case Study 2: The Trade-Off between Reproducibility and Privacy in the Use of Social Media Data to Study Political Behavior

The Practice of Reproducible Research ◽

10.1525/9780520967779-011 ◽

2019 ◽

pp. 103-108

Keyword(s):

Social Media ◽

Political Behavior ◽

Social Media Data ◽

Trade Off ◽

Use Of Social Media ◽

Media Data

Download Full-text