An ensemble classification model for fake feedback detection using proposed labeled CloudArmor dataset

2021 ◽  
Vol 93 ◽  
pp. 107217
Author(s):  
Harsh Taneja ◽  
Supreet Kaur
2019 ◽  
Vol 15 (3) ◽  
pp. 497-514
Author(s):  
Jozef Michal Mintal ◽  
Róbert Vancel

Abstract: Social networking services (SNSs) can significantly impact public life during important political events. Thus, it comes as no surprise that different political actors try to exploit these online platforms for their benefit. Bots constitute a popular tool on SNSs that appears to be able to shape public opinion and disrupt political processes. However, the role of bots during political events in a non-Western context remains largely under-studied. This article addresses the question of the involvement of Twitter bots during electoral campaigns in Japan. In our study, we collected Twitter data over a fourteen-day period in October 2017 using a set of hashtags related to the 2017 Japanese general election. Our dataset includes 905,215 tweets, 665,400 of which were unique tweets. Using a supervised machine learning approach, we first built a custom ensemble classification model for bot detection based on user profile features, with an area under curve (AUC) for the test set of 0.998. Second, in applying our model, we estimate that the impact of Twitter bots in Japan was minor overall. In comparison with similar studies conducted during elections in the US and the UK, the deployment of Twitter bots involved in the 2017 Japanese general election seems to be significantly lower. Finally, given our results on the level of bots on Twitter during the 2017 Japanese general election, we provide various possible explanations for their underuse within a broader socio-political context.
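For readers unfamiliar with the AUC figure quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the sketch relies only on the standard interpretation of AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as one half).

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Illustrative pairwise formulation: the fraction of (positive, negative)
    pairs in which the positive example receives the higher score.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = bot account, 0 = human account.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based formulation would be the usual choice for a dataset of the size described above.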


Author(s):  
Alexandra Pomares-Quimbaya ◽  
Rafael A. Gonzalez ◽  
Oscar Mauricio Muñoz Velandia ◽  
Angel Alberto Garcia Peña ◽  
Julián Camilo Daza Rodríguez ◽  
...  

Extracting valuable knowledge from Electronic Health Records (EHR) is a challenging task due to the presence of both structured and unstructured data, including codified fields, images and test results. Narrative text in particular contains a variety of notes that are diverse in language and detail and full of ad hoc terminology, including acronyms and jargon. This is especially challenging in non-English EHR, where there is a dearth of annotated corpora or trained case sets. This paper proposes an approach for named entity recognition (NER) and concept attribute labeling for EHR that takes into consideration the contextual words around the entity of interest to determine its sense. The approach combines three different NER methods with an analysis of the context (neighboring words) using an ensemble classification model. This helps to disambiguate NER output and to label each concept as confirmed, negated, speculative, pending or antecedent. Results show an improvement in recall and a limited impact on precision for the NER process.


2020 ◽  
Vol 16 (1) ◽  
pp. 32-48
Author(s):  
Wei Cong

Using ensemble learning to mine valuable information from the sea of financial data accumulated in the financial securities market is an important topic in data processing. On the basis of financial data from A-share companies listed on the Shanghai Stock Market, this article treats the identification of ST stocks as an unbalanced classification problem in order to construct a financial warning model for listed companies. In our experiment, HDRF (Hellinger Distance based Random Forest), the ensemble classification models of Bagging, AdaBoost, and Rotation Forest that take the Hellinger distance decision tree (HDDT) as the base classifier, and the ensemble classification model that takes the C4.5 decision tree as the base classifier are compared with respect to both the area under the ROC curve and the F-measure. The experimental results show that HDRF and the HDDT-based ensemble classifiers are effective for financial data of listed companies.
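As an illustration of the split criterion behind a Hellinger distance decision tree, the sketch below computes the Hellinger distance between the two class-conditional distributions induced by a binary split. It is a minimal sketch, not the article's implementation: the function name `hellinger_split_value` and the example counts are hypothetical, and it assumes a two-class problem with a two-way split. Because the criterion uses only the within-class proportions on each side of the split, its value does not change when the class ratio changes, which is why it is often used for unbalanced data.

```python
from math import sqrt


def hellinger_split_value(pos_left, pos_right, neg_left, neg_right):
    """Hellinger distance for a binary split of a two-class sample.

    pos_left / pos_right: positive (minority) examples sent to each branch.
    neg_left / neg_right: negative (majority) examples sent to each branch.
    Returns a value in [0, sqrt(2)]; larger means a more discriminative split.
    """
    n_pos = pos_left + pos_right
    n_neg = neg_left + neg_right
    if n_pos == 0 or n_neg == 0:
        return 0.0  # a class is absent, so the split cannot separate anything

    total = 0.0
    for p, n in ((pos_left, neg_left), (pos_right, neg_right)):
        # Compare the share of each class that falls into this branch.
        total += (sqrt(p / n_pos) - sqrt(n / n_neg)) ** 2
    return sqrt(total)


# Hypothetical counts: same within-class proportions, different class ratio.
balanced = hellinger_split_value(8, 2, 20, 80)
skewed = hellinger_split_value(8, 2, 200, 800)
```

In this toy example `balanced` and `skewed` agree even though the second sample has ten times as many negative examples, whereas criteria based on class priors (such as information gain) would generally change.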


Measurement ◽  
2021 ◽  
Vol 175 ◽  
pp. 109025
Author(s):  
Padmavathi Radhakrishnan ◽  
Kalaivani Ramaiyan ◽  
Arangarajan Vinayagam ◽  
Veerapandiyan Veerasamy

Network intrusion is a growing threat in cyberspace that can damage the network architecture in multiple ways by modifying system configuration or parameters. Hackers and intruders are familiar with signature-based intrusion detection models and are making successful attempts to crash networks. Hence, it is necessary to preserve user privacy in intrusion data. Privacy-preserving data mining (PPDM) techniques address this need, but existing techniques such as Encryption, Perturbation, Data Transformation, Normalization, L-Diversity, and K-Anonymity suffer from excessive generalization and suppression problems. In this paper, the LSPPM distortion technique, which uses the Least Square Method with an ensemble classification model, is proposed for providing efficient privacy preservation on intrusion data. The proposed methodology is validated on the benchmark NSL_KDD intrusion dataset. A comparative analysis of NSL_KDD class attributes is performed for better classification in terms of accuracy, FAR, F-Score and time taken to build LSPPM-NIDS. The experimental results of state-of-the-art PPDM methods are also analyzed before and after distortion, along with privacy measures to ascertain the degree of privacy offered.

