Machine Learning increases efficiency and timeliness of National Procedural Clinical Standards KPI reporting (Preprint)

BACKGROUND Quality assurance activities frequently depend on manual assessment of text-based records. Increasingly, these records have digital structures that may be amenable to computer analysis. We used the Australian Commission on Safety and Quality in Health Care (ACSQHC) National Clinical Care Colonoscopy standard reporting requirement as a proof of concept for an analytics process to streamline and reduce manual reporting overheads. The endoscopy unit performs approximately 4,500 colonoscopies (mainly outpatient) per year. Quarterly reporting of colonoscopy outcomes requires approximately 30 hours of manual data abstraction, collation and combination from a variety of electronic databases. The most time-consuming step is manual retrieval and abstraction of histopathology records from the EMR.

OBJECTIVE 1. To reduce the manual overheads of quarterly National Standards KPI reporting for colonoscopy compliance using an automated data pipeline and Artificial Intelligence tools. 2. To minimise the risk of failure to follow up new cancer diagnoses from outpatient colonoscopies. 3. To develop a data and analytic pipeline that could easily be re-purposed for additional standards, audit and research projects.

METHODS A data pipeline and analysis environment were established in the hospital's secure Microsoft Azure Databricks resource. A training data set of 1000 colonoscopies was extracted from the procedural Provation database using the ProvationMD® reporting tool and linked to relevant histopathology reports provided from the Clinical Research Data Warehouse (CRDW). The Machine Learning (ML) training data set was created by having Gastroenterology Registrars and nurses manually code the histopathology reports into the following categories: Adenoma; Clinically Significant Sessile Serrated Adenoma; Cancer; Adequate Bowel Preparation; Complete Examination. A variety of Natural Language Processing (NLP) and ML models were assessed and refined to minimise the error rate. Sensitivity was prioritised for the diagnosis of Cancer to minimise missed cases. Reporting to clinicians and quality co-ordinators was established using Microsoft Power BI.

RESULTS The Naïve Bayes model for multinomial data achieved high accuracy, but at the expense of recall. Sensitivity improved with a virtual ensemble approach that layers models within the processing pipeline, and was maximised by combining the Microsoft® Text Analytics Healthcare NLP model with our custom Naïve Bayes model. F1 scores between 0.89 and 0.93 were achieved. The algorithm checks daily for new data and performs the analysis. Quarterly analysis and reporting time decreased from 30 hours to less than 5 minutes, and reports can now be continuously updated in the Microsoft Power BI reporting portal.

CONCLUSIONS Advanced analytic techniques can be deployed for mandatory quality reporting in a secure, cloud-based hospital data domain. The cost was far less than that of the manual processes it replaces. Reporting is more timely because it is automated. The potential for training such algorithms for other QA reporting is high. Text-based research and audit within the free-text domain of EMR clinical documentation also become possible.

CLINICALTRIAL Not applicable
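
The trade-off described in the abstract, where a multinomial Naïve Bayes text classifier is tuned to favour sensitivity for the Cancer category, can be illustrated with a minimal sketch. This is not the authors' pipeline: the report snippets, the two-class labelling, the function names and the threshold values are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit per-class word likelihoods with Laplace smoothing (alpha)."""
    vocab = {w for d in docs for w in d.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    model = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(labels)),
            "log_lik": {w: math.log((word_counts[c][w] + alpha) / total)
                        for w in vocab},
        }
    return model

def posterior(model, doc):
    """Return normalised class probabilities for one document."""
    scores = {}
    for c, params in model.items():
        s = params["log_prior"]
        for w in doc.split():
            if w in params["log_lik"]:  # out-of-vocabulary words are ignored
                s += params["log_lik"][w]
        scores[c] = s
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def flag_cancer(model, doc, threshold=0.2):
    """Flag a report for review when P(cancer) reaches a deliberately low threshold."""
    return posterior(model, doc)["cancer"] >= threshold

# Invented toy snippets, not real histopathology reports.
docs = [
    "invasive adenocarcinoma identified",
    "tubular adenoma low grade dysplasia",
    "hyperplastic polyp no dysplasia",
    "adenocarcinoma moderately differentiated",
]
labels = ["cancer", "other", "other", "cancer"]
model = train_multinomial_nb(docs, labels)
```

Lowering `threshold` below 0.5 flags more borderline reports for human review, which raises sensitivity at the cost of more false positives; that is one simple way to prioritise sensitivity, and the abstract does not say whether the authors used it.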

Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.f7879.038620 ◽ 2020 ◽ Vol 8 (6) ◽ pp. 1623-1630

Keyword(s): Machine Learning ◽ Lung Cancer ◽ Random Forest ◽ Naïve Bayes ◽ Early Stage ◽ Training Data ◽ Random Forest Algorithm ◽ Data Set ◽ Wide Range

As data accumulate at an increasing rate, extracting the required information from what is available has become a challenge. Machine learning contributes to many fields. Rapid population growth has been accompanied by a wide range of diseases, which in turn has created a need for machine learning models that use patient datasets. Analyses of datasets from different sources indicate that cancer is among the most dangerous diseases and can be fatal to the patient. Surveys report that cancer can nearly be cured in its initial stages but may cause death in later stages. Lung cancer is one of the major types of cancer, and its early detection depends heavily on past data. The proposed work uses a machine learning algorithm to group individuals' details into categories in order to predict, at an early stage, whether they are likely to develop cancer. A random forest algorithm is implemented and achieves a reported efficiency of 97%, higher than KNN and Naive Bayes. KNN does not learn a model from the training data but uses the stored data directly for classification, and Naive Bayes yields inaccurate predictions. The proposed system predicts the likelihood of lung cancer at three levels: low, medium, and high. Thus, mortality rates can be reduced significantly.
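
The remark that KNN does not learn from the training data but only uses it for classification can be made concrete with a small sketch. The features (age, pack-years), the data points and the class name `KNNClassifier` are hypothetical and are not taken from the paper or its Kaggle data set.

```python
import math
from collections import Counter

class KNNClassifier:
    """k-nearest-neighbour classifier: a lazy learner with no fitted parameters."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorises the data; nothing is estimated here.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # All the work happens at prediction time: rank stored points by
        # Euclidean distance and take a majority vote among the k closest.
        ranked = sorted(zip(self.X, self.y),
                        key=lambda pair: math.dist(pair[0], x))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical (age, pack-years) records with invented risk levels.
X = [(30, 0), (35, 2), (32, 1),
     (50, 20), (55, 25), (52, 22),
     (65, 40), (70, 45), (68, 50)]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

knn = KNNClassifier(k=3).fit(X, y)
```

Calling `knn.predict_one((33, 1))` or `knn.predict_one((67, 44))` returns one of the three risk levels by majority vote. Because the whole training set is scanned for every query, prediction cost grows with the amount of stored data, unlike a random forest, which builds its trees once during training.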

Adaptive spam filtering system using complement naive Bayes model

Journal of Computer Science and Its Application ◽ 10.4314/jcsia.v26i1.12 ◽ 2020 ◽ Vol 26 (1)

Author(s): M.A. Adegoke ◽ O. Abass

Keyword(s): Prior Knowledge ◽ Adaptive Filtering ◽ Naïve Bayes ◽ Training Data ◽ Spam Filtering ◽ Online Data ◽ Bayes Model ◽ Crucial Problem ◽ Naïve Bayes Model

The naïve Bayes filter is a simple probabilistic filtering method based on Bayes' theorem. A crucial problem with the conventional naïve Bayes filter is the assumption of uniform priors in the computation of the posterior distribution. For online data such as email, where the training data are constantly updated to keep pace with spammers' tricks, the prior knowledge cannot be uniform. Skewness in the prior knowledge caused by the updated information has been reported to affect the accuracy, and hence the effectiveness, of the traditional naïve Bayes filter. In this study, the skewness is addressed using the complement naïve Bayes model. The complement naïve Bayes model was implemented and tested on benchmark data, and the results were compared with those obtained from the conventional naïve Bayes filter on the same dataset. The complement naïve Bayes filter outperforms the conventional naïve Bayes filter by 5.39%.

Keywords: Spam, Spam filtering, complement naïve bayes, adaptive filtering, prior, bias, accuracy, filter, adaptive, skewedness

Vol. 26, No 1, June, 2019
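
A minimal sketch of the complement idea may help: each class's word weights are estimated from all documents that do not belong to that class, which makes the estimates less sensitive to skewed class sizes. The code follows the standard complement naïve Bayes formulation, not necessarily the authors' implementation; the messages, labels and function names are invented for illustration.

```python
import math
from collections import Counter

def train_complement_nb(docs, labels, alpha=1.0):
    """Estimate smoothed log word weights from the complement of each class."""
    vocab = sorted({w for d in docs for w in d.split()})
    weights = {}
    for c in sorted(set(labels)):
        comp = Counter()
        for doc, label in zip(docs, labels):
            if label != c:  # count words in every class except c
                comp.update(doc.split())
        total = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {w: math.log((comp[w] + alpha) / total) for w in vocab}
    return weights

def predict_complement_nb(weights, doc):
    """Pick the class whose complement explains the document worst."""
    counts = Counter(doc.split())

    def complement_score(c):
        # Words outside the training vocabulary contribute nothing.
        return sum(n * weights[c].get(w, 0.0) for w, n in counts.items())

    return min(weights, key=complement_score)

# Invented, deliberately imbalanced toy corpus (4 ham, 1 spam).
docs = [
    "meeting agenda attached",
    "lunch tomorrow",
    "project report attached",
    "see you at the meeting",
    "win free prize now",
]
labels = ["ham", "ham", "ham", "ham", "spam"]
weights = train_complement_nb(docs, labels)
```

For example, `predict_complement_nb(weights, "free prize")` selects the class whose complement assigns the message the lowest score. The rule is an argmin rather than an argmax because a document that fits the complement of a class poorly is likely to belong to that class.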

Session Segmentation Method Based on Naïve Bayes Model

Advanced Engineering Forum ◽ 10.4028/www.scientific.net/aef.6-7.576 ◽ 2012 ◽ Vol 6-7 ◽ pp. 576-582

Author(s): Ping Li ◽ Ming Liang Cui ◽ Zhen Shan Hou ◽ Liu Liu Wei ◽ Wen Hao Ying ◽ ...

Keyword(s): Naïve Bayes ◽ Time Interval ◽ Segmentation Method ◽ Retrieval Process ◽ Search Activity ◽ Query Suggestion ◽ Bayes Model ◽ Discrimination Model ◽ Naïve Bayes Model

Session segmentation not only contributes to deeper analysis of users' search behavior but also serves as the foundation for other retrieval research based on users' complex search behaviors. This paper proposes a session boundary discrimination model, built on the Naïve Bayes model, that uses time interval and query likelihood. Compared with previous studies, the proposed model shows a marked experimental improvement in three respects: recall, precision, and F-value. Owing to its advantage in session boundary discrimination, the model can serve as a tool in fields such as personalized information retrieval, query suggestion, search activity analysis, and other areas related to improving search results.

Evaluation of Prognosis in Nasopharyngeal Cancer Using Machine Learning

Technology in Cancer Research & Treatment ◽ 10.1177/1533033820909829 ◽ 2020 ◽ Vol 19 ◽ pp. 153303382090982

Author(s): Melek Akcay ◽ Durmus Etiz ◽ Ozer Celik ◽ Alaattin Ozen

Keyword(s): Machine Learning ◽ Logistic Regression ◽ Nasopharyngeal Cancer ◽ Naïve Bayes ◽ Machine Learning Algorithms ◽ Support Vector ◽ Tumor Diameter ◽ Survival Prognosis ◽ Data Set

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: Correlation analysis and binary logistic regression analysis were applied to the data set. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Gaussian Naive Bayes was determined to be the best of the machine learning techniques for evaluating prognosis (accuracy rate: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for the evaluation of prognosis of nasopharyngeal cancer.
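
To show what a Gaussian Naive Bayes classifier and the reported sensitivity and specificity measures involve, here is a minimal sketch. It is not the study's model: the two features (age, tumor diameter in cm), the eight records, the labels and all function names are assumptions invented for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(y)), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior under independent Gaussians."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_lik
    return max(model, key=log_posterior)

def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Invented records: (age, tumor diameter in cm); 1 = died, 0 = survived.
X = [(45, 2.0), (50, 2.5), (48, 1.8), (52, 2.2),
     (68, 5.0), (72, 5.5), (65, 4.8), (70, 6.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

gnb = fit_gaussian_nb(X, y)
predictions = [predict_gaussian_nb(gnb, row) for row in X]
```

In practice the metrics would be computed on held-out patients rather than on the training records used here; with only 72 patients in the study, the wide confidence interval quoted in the abstract is what one would expect from a small test set.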

An Enhanced Naive Bayes Model for Dissolved Oxygen Forecasting in Shellfish Aquaculture

IEEE Access ◽ 10.1109/access.2020.3042180 ◽ 2020 ◽ Vol 8 ◽ pp. 217917-217927

Author(s): Dashe Li ◽ Jiajun Sun ◽ Huanhai Yang ◽ Xueying Wang

Keyword(s): Dissolved Oxygen ◽ Naïve Bayes ◽ Shellfish Aquaculture ◽ Bayes Model ◽ Naïve Bayes Model

Sign prediction by motif naive Bayes model in social networks

Information Sciences ◽ 10.1016/j.ins.2020.05.128 ◽ 2020 ◽ Vol 541 ◽ pp. 316-331

Author(s): Si-Yuan Liu ◽ Jing Xiao ◽ Xiao-Ke Xu

Keyword(s): Social Networks ◽ Naïve Bayes ◽ Bayes Model ◽ Naïve Bayes Model

A Multilayer Naïve Bayes Model for Analyzing User’s Retweeting Sentiment Tendency

Computational Intelligence and Neuroscience ◽ 10.1155/2015/510281 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-11 ◽ Cited By ~ 1

Author(s): Mengmeng Wang ◽ Wanli Zuo ◽ Ying Wang

Keyword(s): Information Diffusion ◽ Naïve Bayes ◽ Structure Information ◽ Bayes Model ◽ Dynamic Social Network ◽ Dynamic Social Networks ◽ Text Information ◽ Naïve Bayes Model ◽ Tendency Analysis

Microblogging has increasingly become a means of information diffusion through users' retweeting behavior. Because retweeted content, as context information for a microblog, reflects how the user understands that microblog, the analysis of users' retweeting sentiment tendency has gradually become a hot research topic. Targeting online microblogging as a dynamic social network, we investigate how to exploit dynamic retweeting sentiment features in retweeting sentiment tendency analysis. Using time series of each user's network structure information and published text information, we first model dynamic retweeting sentiment features. We then build Naïve Bayes models along profile-, relationship-, and emotion-based dimensions, respectively. Finally, we combine these multidimensional Naïve Bayes models into a multilayer Naïve Bayes model to analyze a user's retweeting sentiment tendency towards a microblog. Experiments on a real-world dataset demonstrate the effectiveness of the proposed framework. Further experiments examine the importance of dynamic retweeting sentiment features and temporal information in retweeting sentiment tendency analysis. In addition, we offer a new line of thinking for retweeting sentiment tendency analysis in dynamic social networks.

The application of naive Bayes model averaging to predict Alzheimer's disease from genome-wide data

Journal of the American Medical Informatics Association ◽ 10.1136/amiajnl-2011-000101 ◽ 2011 ◽ Vol 18 (4) ◽ pp. 370-375 ◽ Cited By ~ 42

Author(s): Wei Wei ◽ Shyam Visweswaran ◽ Gregory F Cooper

Keyword(s): Alzheimer's Disease ◽ Naïve Bayes ◽ Model Averaging ◽ Bayes Model ◽ Genome Wide ◽ Genome Wide Data ◽ Naïve Bayes Model

Future Prediction of Diabetics using XG Booster Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.c5144.029320 ◽ 2020 ◽ Vol 9 (3) ◽ pp. 2128-2132

Keyword(s): Machine Learning ◽ Decision Tree ◽ Naïve Bayes ◽ The Body ◽ Machine Learning Algorithms ◽ Support Vector ◽ Common Disease ◽ Data Set ◽ Glucose Content

Diabetes is one of the most common diseases today. This work proposes predicting the disease with machine learning techniques, so that its risk factors can be identified and its progression prevented. Early prediction allows the disease to be controlled and can save lives. For early prediction, we collected a data set of 200 patients with 8 diabetes-related attributes. The patient's blood sugar level is assessed from features such as glucose content and age. The main machine learning algorithms considered are support vector machine (SVM), naive Bayes (NB), k-nearest neighbor (KNN), and decision tree (DT). In the existing system, Naive Bayes reaches an accuracy of 66% and the decision tree 70 to 71%, which is not an adequate range. With XGBoost classifiers, the reported accuracy is 74% for Naïve Bayes and 89 to 90% for the decision tree. The proposed system therefore gives accuracy in an acceptable range and is preferred. A dataset of 729 patients is stored in MongoDB; the reports of 129 patients are used for prediction and the remainder are used for training. The model trained on these data is then used for prediction.

Prediction of Heart Disease using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.b1081.0982s1019 ◽ 2019 ◽ Vol 8 (2S10) ◽ pp. 474-477

Keyword(s): Machine Learning ◽ Heart Disease ◽ Support Vector Machines ◽ Naïve Bayes ◽ Support Vector ◽ Data Set ◽ Vector Machines ◽ Naïve Bayes Classification

Machine learning is one of the fastest-growing fields today. Machine learning (ML) and artificial neural networks (ANN) are helpful in the detection and diagnosis of various heart diseases. Naïve Bayes classification is an important classification approach in machine learning. Heart disease comprises a range of disorders affecting the heart, including blood vessel problems, irregular heartbeat, weak heart muscle, congenital heart defects, cardiovascular disease, and coronary artery disease. Coronary heart disease is a common type of heart disease; it reduces blood flow to the heart and can lead to a heart attack. In this paper, the UCI Machine Learning Repository data set of patients with heart disease is analyzed using Naïve Bayes classification and support vector machines, and the classification accuracy of each method is evaluated. The implementation is done in R.
