An ensemble classification model for fake feedback detection using proposed labeled CloudArmor dataset

Abstract: Social networking services (SNSs) can significantly impact public life during important political events. Thus, it comes as no surprise that different political actors try to exploit these online platforms for their benefit. Bots constitute a popular tool on SNSs that appears to be able to shape public opinion and disrupt political processes. However, the role of bots during political events in a non-Western context remains largely under-studied. This article addresses the question of the involvement of Twitter bots during electoral campaigns in Japan. In our study, we collected Twitter data over a fourteen-day period in October 2017 using a set of hashtags related to the 2017 Japanese general election. Our dataset includes 905,215 tweets, 665,400 of which were unique tweets. Using a supervised machine learning approach, we first built a custom ensemble classification model for bot detection based on user profile features, with an area under the curve (AUC) for the test set of 0.998. Second, in applying our model, we estimate that the impact of Twitter bots in Japan was minor overall. In comparison with similar studies conducted during elections in the US and the UK, the deployment of Twitter bots involved in the 2017 Japanese general election seems to be significantly lower. Finally, given our results on the level of bots on Twitter during the 2017 Japanese general election, we provide various possible explanations for their underuse within a broader socio-political context.
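To make the AUC figure in this abstract concrete, the following is a minimal sketch of how an ensemble score and its AUC could be computed. The abstract does not say how the custom ensemble combines its base classifiers, so the soft-voting average here is an assumption, and the function names, probabilities, and labels are hypothetical toy values rather than data from the study.

```python
def soft_vote(probabilities_per_model):
    """Average the bot probabilities predicted by several base classifiers.

    Soft voting is an assumed combination rule, not the paper's stated method.
    """
    n_models = len(probabilities_per_model)
    return [sum(column) / n_models for column in zip(*probabilities_per_model)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical bot probabilities from three base classifiers for five accounts.
model_outputs = [
    [0.9, 0.2, 0.7, 0.1, 0.6],
    [0.8, 0.3, 0.4, 0.2, 0.7],
    [0.7, 0.1, 0.6, 0.3, 0.5],
]
labels = [1, 0, 1, 0, 1]  # 1 = bot, 0 = human (toy ground truth)

ensemble_scores = soft_vote(model_outputs)
auc = roc_auc(labels, ensemble_scores)
```

The pairwise definition is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be preferable on a dataset of the size described above.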


Concept Attribute Labeling and Context-Aware Named Entity Recognition in Electronic Health Records

International Journal of Reliable and Quality E-Healthcare ◽ 10.4018/ijrqeh.2018010101 ◽ 2018 ◽ Vol 7 (1) ◽ pp. 1-15 ◽ Cited By ~ 1

Author(s): Alexandra Pomares-Quimbaya ◽ Rafael A. Gonzalez ◽ Oscar Mauricio Muñoz Velandia ◽ Angel Alberto Garcia Peña ◽ Julián Camilo Daza Rodríguez ◽ ...

Keyword(s): Electronic Health Records ◽ Ad Hoc ◽ Named Entity Recognition ◽ Ensemble Classification ◽ Entity Recognition ◽ Classification Model ◽ Health Records ◽ Named Entity ◽ Electronic Health ◽ Concept Attribute

Extracting valuable knowledge from Electronic Health Records (EHR) is challenging because they combine structured and unstructured data, including codified fields, images and test results. Narrative text in particular contains a variety of notes that differ in language and detail and are full of ad hoc terminology, including acronyms and jargon. This is especially challenging in non-English EHR, where there is a dearth of annotated corpora or training case sets. This paper proposes an approach to named entity recognition (NER) and concept attribute labeling for EHR that takes into consideration the contextual words around the entity of interest to determine its sense. The approach combines three different NER methods and analyzes the context (neighboring words) using an ensemble classification model. This helps disambiguate NER results and label each concept as confirmed, negated, speculative, pending or antecedent. Results show an improvement in recall and a limited impact on precision for the NER process.
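The two ideas in this abstract, combining several NER methods and using neighboring words to label a concept, can be sketched as follows. This is only an illustration under stated assumptions: the paper analyzes context with an ensemble classification model, whereas this sketch swaps in a simple cue-word lookup, and the cue lists, tag names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical cue words; a real system would learn these from annotated notes.
CONTEXT_CUES = {
    "negated": {"no", "denies", "without"},
    "speculative": {"possible", "probable", "suspected"},
    "pending": {"pending", "scheduled"},
    "antecedent": {"history", "previous"},
}


def vote_entity(tags):
    """Majority vote over the tags proposed by several NER methods."""
    tag, count = Counter(tags).most_common(1)[0]
    # Accept a tag only if more than half of the methods agree on it.
    return tag if count > len(tags) / 2 else "O"


def label_attribute(tokens, index, window=3):
    """Label the entity at `index` using cue words within `window` tokens."""
    start = max(0, index - window)
    neighbors = tokens[start:index] + tokens[index + 1:index + 1 + window]
    context = {token.lower() for token in neighbors}
    for label, cues in CONTEXT_CUES.items():
        if context & cues:
            return label
    # No cue word nearby: treat the concept as confirmed.
    return "confirmed"


tokens = ["Patient", "denies", "fever", "but", "reports",
          "persistent", "severe", "cough"]

fever_tag = vote_entity(["SYMPTOM", "SYMPTOM", "O"])   # two of three agree
fever_label = label_attribute(tokens, 2)               # "denies" is in the window
cough_label = label_attribute(tokens, 7)               # no cue word in the window
```

A fixed window is a crude notion of context; it is used here only to show how neighboring words can change the label attached to the same entity type.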


Study of Financial Warning Ensemble Model for Listed Companies Based on Unbalanced Classification Perspective

International Journal of Intelligent Information Technologies ◽ 10.4018/ijiit.2020010103 ◽ 2020 ◽ Vol 16 (1) ◽ pp. 32-48

Author(s): Wei Cong

Keyword(s): Decision Tree ◽ Listed Companies ◽ Hellinger Distance ◽ Financial Data ◽ Ensemble Classification ◽ Classification Model ◽ Base Classifier ◽ C4.5 Decision Tree ◽ Unbalanced Classification ◽ Warning Model

Using ensemble learning to mine valuable information from the large volume of financial data accumulated on securities markets is an important data processing problem. Based on financial data from A-share companies listed on the Shanghai stock market, this article treats the identification of ST stocks as an unbalanced classification problem and studies the construction of a financial warning model for listed companies. The experiment compares several models in terms of both the area under the ROC curve and the F-measure: HDRF (Hellinger distance based random forest); Bagging, AdaBoost, and Rotation Forest ensembles that take the Hellinger distance decision tree (HDDT) as the base classifier; and an ensemble classification model that takes the C4.5 decision tree as the base classifier. The experimental results show that HDRF and the HDDT-based ensemble classifiers are effective for the financial data of listed companies.
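As a brief illustration of why Hellinger distance suits unbalanced data, the sketch below computes the split criterion commonly used in Hellinger distance decision trees: the distance between the distributions of the two classes over the branches of a candidate split. The abstract does not give the formula, so this follows the usual HDDT definition, and the function name and counts are hypothetical.

```python
import math


def hellinger_split_value(branch_counts):
    """Hellinger distance between the two class distributions over a split.

    branch_counts: list of (positives, negatives) pairs, one per branch.
    Each class is normalized by its own total, so the class ratio does not
    enter the value directly.
    """
    total_pos = sum(pos for pos, _ in branch_counts)
    total_neg = sum(neg for _, neg in branch_counts)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both classes must be present")
    squared = 0.0
    for pos, neg in branch_counts:
        squared += (math.sqrt(pos / total_pos) - math.sqrt(neg / total_neg)) ** 2
    return math.sqrt(squared)


# Toy example with 10 minority and 990 majority samples (hypothetical counts).
perfect_split = hellinger_split_value([(10, 0), (0, 990)])    # classes fully separated
useless_split = hellinger_split_value([(5, 495), (5, 495)])   # branches look identical
```

The value ranges from 0 for a split that does not separate the classes to the square root of 2 for a perfect separation, regardless of how rare the minority class is.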


Ensemble Classification Model for Diabetes Prediction in Data Mining

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v7i5.16431647 ◽ 2019 ◽ Vol 7 (5) ◽ pp. 1643-1647

Author(s): Munendra Kumar ◽ Anuj Kumar

Keyword(s): Data Mining ◽ Ensemble Classification ◽ Classification Model ◽ Diabetes Prediction


A stacking ensemble classification model for detection and classification of power quality disturbances in PV integrated power network

Measurement ◽ 10.1016/j.measurement.2021.109025 ◽ 2021 ◽ Vol 175 ◽ pp. 109025

Author(s): Padmavathi Radhakrishnan ◽ Kalaivani Ramaiyan ◽ Arangarajan Vinayagam ◽ Veerapandiyan Veerasamy

Keyword(s): Power Quality ◽ Ensemble Classification ◽ Classification Model ◽ Power Network ◽ Power Quality Disturbances


Handling imbalanced data with concept drift by applying dynamic sampling and ensemble classification model

Computer Communications ◽ 10.1016/j.comcom.2020.01.061 ◽ 2020 ◽ Vol 153 ◽ pp. 553-560

Author(s): S. Ancy ◽ D. Paulraj

Keyword(s): Concept Drift ◽ Imbalanced Data ◽ Ensemble Classification ◽ Classification Model ◽ Dynamic Sampling


Least Square Privacy Preserving Technique for Intrusion Detection System

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.b7447.129219 ◽ 2019 ◽ Vol 9 (2) ◽ pp. 2312-2319

Keyword(s): Intrusion Detection ◽ Network Architecture ◽ Privacy Preservation ◽ Detection System ◽ Least Square Method ◽ Least Square ◽ Ensemble Classification ◽ Classification Model ◽ User Privacy ◽ Network Intrusion

Network intrusion is a major and growing threat in cyberspace; it can damage the network architecture in multiple ways by modifying system configuration or parameters. Hackers and intruders are familiar with signature-based intrusion detection models and are making successful attempts to crash networks. Hence, it is necessary to preserve user privacy on intrusion data. Privacy-preserving data mining (PPDM) techniques are therefore necessary, but existing techniques such as encryption, perturbation, data transformation, normalization, L-diversity, and K-anonymity suffer from excessive generalization and suppression. This paper proposes LSPPM, a distortion technique that uses the least squares method with an ensemble classification model, to provide efficient privacy preservation on intrusion data. The proposed methodology is validated on the benchmark NSL_KDD intrusion dataset. A comparative analysis of NSL_KDD class attributes is performed for better classification in terms of accuracy, FAR, F-score and the time taken to build LSPPM-NIDS. The experimental results of state-of-the-art PPDM methods are also analyzed before and after distortion, together with privacy measures, to ascertain the degree of privacy offered.
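The evaluation measures named in this abstract can be illustrated with a short sketch that derives them from confusion-matrix counts. It assumes that FAR stands for the false alarm rate, defined as the share of normal records flagged as intrusions, which the abstract does not spell out. The function name and the counts are hypothetical and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, false alarm rate, and F-score from confusion-matrix counts.

    tp: intrusions correctly flagged    fp: normal records flagged as intrusions
    tn: normal records correctly passed fn: intrusions that were missed
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one record is required")
    accuracy = (tp + tn) / total
    # Assumed definition of FAR: false positives over all truly normal records.
    far = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "far": far, "f_score": f_score}


# Hypothetical counts for a classifier evaluated on 200 records.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

Comparing these values for a classifier trained on the original data and on the distorted data is one way to see how much utility a privacy-preserving transformation costs.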
