Mining the crime data using naïve Bayes model

Author(s):  
Lourdes M. Padirayon ◽  
Melvin S. Atayan ◽  
Jose Sherief Panelo ◽  
Carlito R. Fagela, Jr

Police departments worldwide handle a massive number of documents on crime, and today's criminals are becoming technologically sophisticated. One obstacle faced by law enforcement is the complexity of processing voluminous crime data. Approximately 439 crimes have been registered in Sanchez Mira municipality in the past seven years. Police officers have no clear view of the pattern of crimes in the municipality, the peak hours and months of commission, or the locations where crimes are concentrated. The naïve Bayes model, a classification algorithm, was applied through the RapidMiner Auto Model to analyze the crime data set. This approach helps to recognize crime trends; most of the crimes committed were violations of special penal laws. May had the highest number of index and non-index crimes, and Tuesday was the most common day for crimes. Hotspots were Barangay Centro 1 for non-index crimes and Barangay Centro 2 for index crimes. Most non-index crimes were violations of special laws; among index crimes, rape was the most frequently recorded and usually occurred at 2 o’clock in the afternoon. These findings can inform decisions that maximize the efficacy of crime solutions.

2021 ◽  
Author(s):  
Graeme Hart ◽  
Michael Woodburn ◽  
Nada Marhoon ◽  
Alan Pritchard ◽  
Jeff Feldman ◽  
...  

BACKGROUND Quality Assurance activities are frequently dependent on manual assessment of text-based records. Increasingly, these records have digital structures that may be amenable to computer analysis. We used the Australian Commission for Safety and Quality in Healthcare (ACSQHC) National Clinical Care Colonoscopy standard reporting requirement as a proof of concept for an analytics process to streamline and reduce manual reporting overheads. The endoscopy unit performs approximately 4,500 colonoscopies (mainly outpatient) per year. Quarterly reporting of colonoscopy outcomes requires approximately 30 hours of manual data abstraction, collation and combination from a variety of electronic databases. The most time-consuming step is manual retrieval and abstraction of histopathology records from the EMR.
OBJECTIVE 1. To reduce the manual overheads of quarterly National Standards KPI reporting for colonoscopy compliance using an automated data pipeline and Artificial Intelligence tools. 2. To minimise the risk of failure to follow up new cancer diagnoses for outpatient colonoscopies. 3. To develop a data and analytic pipeline that could be easily re-purposed for additional standards, audit and research projects.
METHODS A data pipeline and analysis environment were established in the hospitals’ secure Microsoft Azure Databricks resource. A training data set of 1000 colonoscopies was extracted from the procedural Provation database using the ProvationMD® reporting tool and linked to relevant histopathology reports provided from the Clinical Research Data Warehouse (CRDW). The Machine Learning (ML) training data set was created when histopathological reports were manually coded by Gastroenterology Registrars and nurses into the following categories: Adenoma; Clinically Significant Sessile Serrated Adenoma; Cancer; Adequate Bowel Preparation; Complete Examination. A variety of Natural Language Processing (NLP) and ML models were assessed and refined to minimize error rate. Sensitivity was prioritised for the diagnosis of Cancer to minimize missed cases. Reporting to clinicians and quality co-ordinators was established using Microsoft Power BI.
RESULTS The Naïve Bayes model for multinomial data achieved high accuracy, but at the cost of recall. Sensitivity improved with a virtual ensemble approach that layered models within the processing pipeline, and was maximised by combining Microsoft’s® Text Analytics – Healthcare NLP model with our custom Naïve Bayes model. F1 scores between 0.89 and 0.93 were achieved. The algorithm checks daily for new data and performs the analysis. Quarterly analysis and reporting time decreased from 30 hours to less than 5 minutes, and reports can now be continuously updated in the Microsoft Power BI reporting portal.
CONCLUSIONS Advanced analytic techniques can be deployed for mandatory quality reporting in a secure, cloud-based hospital data domain. The cost was far less than that of the manual processes it replaces. Reporting is more timely because it is automated. The potential for training such algorithms for other QA reporting is high. Text-based research and audit within the free-text domain of the EMR clinical documentation also become possible.
CLINICALTRIAL Not applicable
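
To make the "Naïve Bayes model for multinomial data" and the emphasis on sensitivity more concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline (which ran in Azure Databricks alongside Microsoft's Healthcare NLP model): the report snippets, the labels `cancer` / `not_cancer`, the helper names, and the 0.3 decision threshold are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a word-count naive Bayes model with add-alpha (Laplace) smoothing."""
    vocab = {w for d in docs for w in d.lower().split()}
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for label, n_docs in Counter(labels).items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_lik"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict_proba(model, doc):
    """Return the posterior probability of each class for one document."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for w in doc.lower().split():
            if w in model["vocab"]:  # words never seen in training are ignored
                score += model["log_lik"][label][w]
        scores[label] = score
    top = max(scores.values())
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    norm = sum(exp.values())
    return {label: v / norm for label, v in exp.items()}


def flag_cancer(model, doc, threshold=0.3):
    """Flag a report when P(cancer) reaches a deliberately low threshold,
    trading some precision for sensitivity (hypothetical cut-off)."""
    return predict_proba(model, doc)["cancer"] >= threshold


# Hypothetical toy reports, invented for this sketch only.
docs = [
    "invasive adenocarcinoma identified",
    "tubular adenoma low grade dysplasia",
    "adenocarcinoma moderately differentiated",
    "hyperplastic polyp no dysplasia",
]
labels = ["cancer", "not_cancer", "cancer", "not_cancer"]
model = train_multinomial_nb(docs, labels)
print(flag_cancer(model, "fragments of adenocarcinoma"))
```

Lowering the threshold below 0.5 is one simple way to favour sensitivity for the cancer class; the abstract's ensemble layering is a different and more elaborate mechanism.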


2013 ◽  
Vol 22 (04) ◽  
pp. 1350019 ◽  
Author(s):  
DIEGO VIDAURRE ◽  
CONCHA BIELZA ◽  
PEDRO LARRAÑAGA

The naïve Bayes model is a simple but often satisfactory supervised classification method. The original naïve Bayes scheme does, however, have a serious weakness, namely the harmful effect of redundant predictors. In this paper, we study how to apply a regularization technique to learn a computationally efficient classifier that is inspired by naïve Bayes. The proposed formulation, combined with an L1-penalty, is capable of discarding harmful, redundant predictors. A modification of the LARS algorithm is devised to solve this problem. We tackle both real-valued and discrete predictors, ensuring that our method is applicable to a wide range of data. In the experimental section, we empirically study the effect of redundant and irrelevant predictors. We also test the method on a high-dimensional data set from the neuroscience field, where there are many more predictors than data cases. Finally, we run the method on a real data set that combines categorical with numeric predictors. Our approach is compared with several naïve Bayes variants and other classification algorithms (SVM and kNN), and is shown to be competitive.


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 204
Author(s):  
Charlyn Villavicencio ◽  
Julio Jerison Macrohon ◽  
X. Alphonse Inbaraj ◽  
Jyh-Horng Jeng ◽  
Jer-Guang Hsieh

A year into the COVID-19 pandemic and one of the longest recorded lockdowns in the world, the Philippines received its first delivery of COVID-19 vaccines on 1 March 2021 through WHO’s COVAX initiative. A month into the inoculation of all frontline health professionals and other priority groups, the authors of this study gathered data on the sentiment of Filipinos regarding the Philippine government’s efforts using the social networking site Twitter. Natural language processing techniques were applied to understand the general sentiment, which can help the government in analyzing its response. The tweets were annotated, and a Naïve Bayes model was trained in the RapidMiner data science software to classify English and Filipino language tweets into positive, neutral, and negative polarities. The model achieved 81.77% accuracy, which exceeds the accuracy of recent sentiment analysis studies using Twitter data from the Philippines.


2012 ◽  
Vol 6-7 ◽  
pp. 576-582
Author(s):  
Ping Li ◽  
Ming Liang Cui ◽  
Zhen Shan Hou ◽  
Liu Liu Wei ◽  
Wen Hao Ying ◽  
...  

Session segmentation not only contributes to further and deeper analysis of users’ search behavior but also serves as the foundation for other retrieval research based on users’ complicated search behaviors. This paper proposes a session boundary discrimination model that uses time interval and query likelihood on the basis of the Naive Bayes model. Compared with previous studies, the proposed model shows a prominent experimental improvement in three respects: recall, precision, and F-value. Owing to its advantage in session boundary discrimination, the model can serve as a tool in fields such as personalized information retrieval, query suggestion, search activity analysis, and other fields related to improving search results.


2020 ◽  
pp. 107780122093082
Author(s):  
Laura Johnson ◽  
Elisheva Davidoff ◽  
Abigail R. DeSilva

In New Jersey, collaboration between police departments and advocates from domestic violence organizations is mandated by state policy, which requires law enforcement agencies to participate in domestic violence response teams (DVRTs). The purpose of this study is to examine factors that motivate police officers to implement DVRT. Twenty-four semi-structured interviews were conducted with DVRT coordinators and domestic violence liaison police officers. Findings suggest that police motivation for implementing the intervention is often influenced by perceived benefits to police response and investigation, perceived benefits to victims, the need to comply with mandates, and recognition of domestic violence as a serious crime.


2018 ◽  
Vol 246 ◽  
pp. 03027
Author(s):  
Manfu Ma ◽  
Wei Deng ◽  
Hongtong Liu ◽  
Xinmiao Yun

Because a single classification algorithm cannot meet the performance requirements of intrusion detection, an intrusion detection model, KNN-NB, based on a hybrid of the KNN and naive Bayes classification algorithms is proposed. It combines the strength of KNN in handling numerical values with the strength of naive Bayes in handling the structure of the data. The model first preprocesses the NSL-KDD intrusion detection data set. Then, exploiting the advantages of the KNN algorithm on data values, the model calculates the distance between samples according to the feature items and selects the K samples with the smallest distance. Finally, naive Bayes is used to obtain the final result. The experimental results on the NSL-KDD data set show that the KNN-NB algorithm achieves more balanced performance than the traditional KNN and naive Bayes algorithms in terms of accuracy, sensitivity, false detection rate, specificity, and missed detection rate.
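
The two-stage procedure described above (select the K nearest samples, then let naive Bayes decide) can be sketched as follows. This is only one possible reading of the abstract, not the authors' KNN-NB implementation: it assumes numeric features, Euclidean distance, and a Gaussian naive Bayes fitted on the selected neighbours. The function names, the `var_floor` smoothing term, and the toy points are hypothetical, and no NSL-KDD data is used.

```python
import math
from collections import defaultdict


def k_nearest(train_X, train_y, x, k):
    """Return the k training samples closest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return [(train_X[i], train_y[i]) for i in order[:k]]


def gaussian_nb_predict(samples, x, var_floor=1e-3):
    """Classify x with a Gaussian naive Bayes fitted on `samples` only.
    var_floor (assumed value) keeps variances positive for tiny classes."""
    by_class = defaultdict(list)
    for feats, label in samples:
        by_class[label].append(feats)
    best_label, best_score = None, -math.inf
    for label, rows in by_class.items():
        score = math.log(len(rows) / len(samples))  # log prior among neighbours
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + var_floor
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def knn_nb_predict(train_X, train_y, x, k=5):
    """Stage 1: pick the k nearest samples. Stage 2: naive Bayes on them."""
    return gaussian_nb_predict(k_nearest(train_X, train_y, x, k), x)


# Hypothetical two-cluster toy data standing in for preprocessed records.
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
print(knn_nb_predict(train_X, train_y, (0.1, 0.1), k=4))
print(knn_nb_predict(train_X, train_y, (5.0, 5.0), k=4))
```

Fitting naive Bayes only on the neighbourhood keeps the probabilistic decision local to the query; other hybrids (for example, using KNN votes as an extra naive Bayes feature) would also fit the abstract's wording.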


2020 ◽  
Vol 19 ◽  
pp. 153303382090982
Author(s):  
Melek Akcay ◽  
Durmus Etiz ◽  
Ozer Celik ◽  
Alaattin Ozen

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: Correlation analysis and binary logistic regression analysis were applied to the data set. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Among the machine learning techniques, Gaussian Naive Bayes was determined to be the best algorithm for evaluating prognosis (accuracy rate: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for the evaluation of prognosis of nasopharyngeal cancer.
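
For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a binary confusion matrix. The counts are hypothetical and are not the study's confusion matrix, which the abstract does not report; they were picked only because they yield the same rounded percentages (88%, 75%, 100%). The function name is likewise an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate (recall)
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts: 3 true positives, 1 false negative,
# 4 true negatives, 0 false positives.
metrics = binary_metrics(tp=3, fn=1, tn=4, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With test sets this small, each misclassified patient shifts the percentages by many points, which is consistent with the wide confidence interval quoted for the area under the curve.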


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 217917-217927
Author(s):  
Dashe Li ◽  
Jiajun Sun ◽  
Huanhai Yang ◽  
Xueying Wang

2020 ◽  
Vol 541 ◽  
pp. 316-331
Author(s):  
Si-Yuan Liu ◽  
Jing Xiao ◽  
Xiao-Ke Xu
