Adaptive spam filtering system using complement naive bayes model

2020 ◽  
Vol 26 (1) ◽  
Author(s):  
M.A. Adegoke ◽  
O. Abass

The naïve Bayes filter is a simple probabilistic filtering method based on Bayes' theorem. A crucial problem with the conventional naïve Bayes filter is the assumption of uniform priors in the computation of the posterior distribution. For online data such as email, where the training data are constantly updated to counter the tricks of spammers, the prior knowledge cannot be uniform. Skewness in the prior knowledge caused by the updated information has been reported to affect the accuracy, and hence the effectiveness, of the traditional naïve Bayes filter. In this study, the skewness is addressed using the complement naïve Bayes model. The complement naïve Bayes model was implemented and tested on benchmark data, and the results were compared with those obtained from the conventional naïve Bayes filter on the same dataset. The complement naïve Bayes filter outperforms the conventional naïve Bayes filter by 5.39%.

Keywords: Spam, Spam filtering, complement naïve Bayes, adaptive filtering, prior, bias, accuracy, filter, adaptive, skewedness

Vol. 26, No. 1, June 2019
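The complement idea in the abstract above can be illustrated with a minimal sketch. It follows the commonly used complement naïve Bayes formulation, in which each class is scored by the token statistics of all the other classes, so a heavily over-represented class does not dominate the estimate. This is not the authors' implementation: the function names, the Laplace smoothing parameter `alpha`, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_complement_nb(docs, labels, alpha=1.0):
    """Return per-class complement log-weights for tokenized documents.

    alpha is an assumed Laplace smoothing constant.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    weights = {}
    for c in sorted(set(labels)):
        # Count tokens in every document that does NOT belong to class c.
        complement_counts = Counter()
        for doc, y in zip(docs, labels):
            if y != c:
                complement_counts.update(doc)
        total = sum(complement_counts.values()) + alpha * len(vocab)
        weights[c] = {
            tok: math.log((complement_counts[tok] + alpha) / total)
            for tok in vocab
        }
    return weights


def predict_complement_nb(weights, doc):
    """Pick the class whose complement explains the document worst."""
    def complement_score(c):
        # Tokens outside the training vocabulary are ignored.
        return sum(weights[c][tok] for tok in doc if tok in weights[c])
    return min(weights, key=complement_score)


# Made-up toy data, deliberately skewed toward "ham".
docs = [
    ["meeting", "schedule", "today"],
    ["project", "report", "today"],
    ["lunch", "meeting", "report"],
    ["win", "free", "prize"],
]
labels = ["ham", "ham", "ham", "spam"]

weights = train_complement_nb(docs, labels)
print(predict_complement_nb(weights, ["free", "prize", "today"]))
```

Note that no class prior appears in the score, which is one reason this formulation is less sensitive to skewed class frequencies than the conventional filter.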

2021 ◽  
Author(s):  
Graeme Hart ◽  
Michael Woodburn ◽  
Nada Marhoon ◽  
Alan Pritchard ◽  
Jeff Feldman ◽  
...  

BACKGROUND Quality Assurance activities are frequently dependent on manual assessment of text-based records. Increasingly, these records have digital structures that may be amenable to computer analysis. We used the Australian Commission for Safety and Quality in Healthcare (ACSQHC) National Clinical Care Colonoscopy standard reporting requirement as a proof of concept for an analytics process to streamline and reduce manual reporting overheads. The endoscopy unit performs approximately 4,500 colonoscopies (mainly outpatient) per year. Quarterly reporting of colonoscopy outcomes requires approximately 30 hours of manual data abstraction, collation and combination from a variety of electronic databases. The most time-consuming step is the manual retrieval and abstraction of histopathology records from the EMR.

OBJECTIVE
1. To reduce the manual overheads of quarterly National Standards KPI reporting for colonoscopy compliance using an automated data pipeline and Artificial Intelligence tools.
2. To minimise the risk of failure to follow up new cancer diagnoses for outpatient colonoscopies.
3. To develop a data and analytic pipeline that could be easily re-purposed for additional standards, audit and research projects.

METHODS A data pipeline and analysis environment were established in the hospitals’ secure Microsoft Azure Databricks resource. A training data set of 1000 colonoscopies was extracted from the procedural Provation database using the ProvationMD ® reporting tool and linked to relevant histopathology reports provided from the Clinical Research Data Warehouse (CRDW). The Machine Learning (ML) training data set was created when histopathological reports were manually coded by Gastroenterology Registrars and nurses into the following categories: Adenoma; Clinically Significant Sessile Serrated Adenoma; Cancer; Adequate Bowel Preparation; Complete examination. A variety of Natural Language Processing (NLP) and ML models were assessed and refined to minimize error rate. Sensitivity was prioritised for the diagnosis of Cancer to minimize missed cases. Reporting to clinicians and quality co-ordinators was established using Microsoft Power BI.

RESULTS The Naïve Bayes model for multinomial data achieved high accuracy, but at the expense of recall. Sensitivity improved with a virtual ensemble approach that layered models within the processing pipeline, and was maximised by combining Microsoft’s ® Text Analytics – Healthcare NLP model with our custom Naïve Bayes model. F1 scores between 0.89 and 0.93 were achieved. The algorithm checks daily for new data and performs the analysis. Quarterly analysis and reporting time decreased from 30 hours to less than 5 minutes, and reports can now be continuously updated in the Microsoft Power BI reporting portal.

CONCLUSIONS Advanced analytic techniques can be deployed for mandatory quality reporting in a secure, cloud-based hospital data domain. The cost was far less than that of the manual processes it replaces. Reporting is more timely because it is automated. The potential for training such algorithms for other QA reporting is high. Text-based research and audit within the free-text domain of the EMR clinical documentation also become possible.

CLINICALTRIAL Not applicable
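The trade-off between accuracy, recall (sensitivity) and F1 reported above can be made concrete with a small sketch. The abstract does not say how the models were layered, so the OR-combination below (a case is flagged if either model flags it) is only an assumed stand-in for the ensemble, and the labels and predictions are invented for illustration, not study data.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall (sensitivity) and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example: 1 = report coded as cancer, 0 = not cancer.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical precise but insensitive model
model_b = [0, 1, 1, 0, 1, 0, 0, 0]  # hypothetical second model

# Assumed layering rule: flag a case if either model flags it.
layered = [int(a or b) for a, b in zip(model_a, model_b)]

print("model A:", precision_recall_f1(y_true, model_a))
print("layered:", precision_recall_f1(y_true, layered))
```

An OR-combination can never lower recall relative to either component, which suits a setting where missed cancers are the costlier error; the price is usually some loss of precision.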


2012 ◽  
Vol 6-7 ◽  
pp. 576-582
Author(s):  
Ping Li ◽  
Ming Liang Cui ◽  
Zhen Shan Hou ◽  
Liu Liu Wei ◽  
Wen Hao Ying ◽  
...  

Session segmentation not only contributes to deeper analysis of users’ search behavior but also serves as the foundation for other retrieval research based on users’ complex search behaviors. This paper proposes a session boundary discrimination model that uses time interval and query likelihood on the basis of the Naive Bayes model. Compared with previous studies, the proposed model shows a marked improvement in experiments on three measures: recall, precision and F-value. Owing to its advantage in session boundary discrimination, the model can serve as a tool in personalized information retrieval, query suggestion, search activity analysis and other fields related to improving search results.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 217917-217927
Author(s):  
Dashe Li ◽  
Jiajun Sun ◽  
Huanhai Yang ◽  
Xueying Wang

2020 ◽  
Vol 541 ◽  
pp. 316-331
Author(s):  
Si-Yuan Liu ◽  
Jing Xiao ◽  
Xiao-Ke Xu

2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Mengmeng Wang ◽  
Wanli Zuo ◽  
Ying Wang

Today, microblogging has increasingly become a means of information diffusion via users’ retweeting behavior. Because retweeting content, as context information of a microblog, reflects an understanding of that microblog, analysis of users’ retweeting sentiment tendency has gradually become a hot research topic. Targeting online microblogging, a dynamic social network, we investigate how to exploit dynamic retweeting sentiment features in retweeting sentiment tendency analysis. On the basis of time series of users’ network structure information and published text information, we first model dynamic retweeting sentiment features. Then we build Naïve Bayes models from profile-, relationship-, and emotion-based dimensions, respectively. Finally, we build a multilayer Naïve Bayes model based on the multidimensional Naïve Bayes models to analyze a user’s retweeting sentiment tendency towards a microblog. Experiments on a real-world dataset demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of dynamic retweeting sentiment features and temporal information in retweeting sentiment tendency analysis. In addition, we provide a new line of thinking for retweeting sentiment tendency analysis in dynamic social networks.


2012 ◽  
Vol 19B (3) ◽  
pp. 195-200
Author(s):  
Jae-Hoon Kim ◽  
Kil-Ho Jeon

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 57868-57880 ◽  
Author(s):  
Longjie Li ◽  
Shijin Xu ◽  
Mingwei Leng ◽  
Shiyu Fang ◽  
Xiaoyun Chen
