Supervised and Unsupervised Machine Learning Methodologies for Crime Pattern Analysis

Crime is a grave problem that affects all countries in the world. The level of crime in a country has a big impact on its economic growth and quality of life of citizens. In this paper, we provide a survey of trends of supervised and unsupervised machine learning methods used for crime pattern analysis. We use a spatiotemporal dataset of crimes in San Francisco, CA to demonstrate some of these strategies for crime analysis. We use classification models, namely, Logistic Regression, Random Forest, Gradient Boosting and Naive Bayes to predict crime types such as Larceny, Theft, etc. and propose model optimization strategies. Further, we use a graph based unsupervised machine learning technique called core periphery structures to analyze how crime behavior evolves over time. These methods can be generalized to use for different counties and can be greatly helpful in planning police task forces for law enforcement and crime prevention.

Download Full-text

Using Machine Learning to Provide Reliable Differentiated Services for IoT in SDN-Like Publish/Subscribe Middleware

Sensors ◽

10.3390/s19061449 ◽

2019 ◽

Vol 19 (6) ◽

pp. 1449 ◽

Cited By ~ 2

Author(s):

Yulong Shi ◽

Yang Zhang ◽

Hans-Arno Jacobsen ◽

Lulu Tang ◽

Geoffrey Elliott ◽

...

Keyword(s):

Machine Learning ◽

Loss Rate ◽

Differentiated Services ◽

Gradient Boosting ◽

Queuing Delay ◽

End To End Delay ◽

Extreme Gradient Boosting ◽

Real Value ◽

Difficult Issue

At present, most publish/subscribe middlewares suppose that there are equal Quality of Service (QoS) requirements for all users. However, in many real-world Internet of Things (IoT) service scenarios, different users may have different delay requirements. How to provide reliable differentiated services has become an urgent problem. The rise of Software-Defined Networking (SDN) provides endless possibilities to improve the QoS of publish/subscribe middlewares due to its greater programmability. We can encode event topics and priorities into flow entries of SDN switches directly to meet customized requirements. In this paper, we first propose an SDN-like publish/subscribe middleware architecture and describe how to use this architecture and priority queues supported by OpenFlow switches to realize differentiated services. Then we present a machine learning method using the eXtreme Gradient Boosting (XGBoost) model to solve the difficult issue of getting the queuing delay of switches accurately. Finally, we propose a reliable differentiated services guarantee mechanism according to the queuing delay and the programmability of SDN to improve QoS, namely, a two-layer queue management mechanism. Experimental evaluations show that the delay predicted by the XGBoost method is closer to the real value; our mechanism can save end-to-end delay, reduce packet loss rate, and allocate bandwidth more reasonably.

Download Full-text

Ammonoid Taxonomy with Supervised and Unsupervised Machine Learning Algorithms

10.31233/osf.io/ewkx9 ◽

2021 ◽

Author(s):

Floe Foxon

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Learning Algorithms ◽

Clustering Algorithms ◽

Measurement Data ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Unsupervised Machine Learning

Ammonoid identification is crucial to biostratigraphy, systematic palaeontology, and evolutionary biology, but may prove difficult when shell features and sutures are poorly preserved. This necessitates novel approaches to ammonoid taxonomy. This study aimed to taxonomize ammonoids by their conch geometry using supervised and unsupervised machine learning algorithms. Ammonoid measurement data (conch diameter, whorl height, whorl width, and umbilical width) were taken from the Paleobiology Database (PBDB). 11 species with ≥50 specimens each were identified providing N=781 total unique specimens. Naive Bayes, Decision Tree, Random Forest, Gradient Boosting, K-Nearest Neighbours, and Support Vector Machine classifiers were applied to the PBDB data with a 5x5 nested cross-validation approach to obtain unbiased generalization performance estimates across a grid search of algorithm parameters. All supervised classifiers achieved ≥70% accuracy in identifying ammonoid species, with Naive Bayes demonstrating the least over-fitting. The unsupervised clustering algorithms K-Means, DBSCAN, OPTICS, Mean Shift, and Affinity Propagation achieved Normalized Mutual Information scores of ≥0.6, with the centroid-based methods having most success. This presents a reasonably-accurate proof-of-concept approach to ammonoid classification which may assist identification in cases where more traditional methods are not feasible.

Download Full-text

Research on dairy products detection based on machine learning algorithm

MATEC Web of Conferences ◽

10.1051/matecconf/202235503008 ◽

2022 ◽

Vol 355 ◽

pp. 03008

Author(s):

Yang Zhang ◽

Lei Zhang ◽

Yabin Ma ◽

Jinsen Guan ◽

Zhaoxia Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Electronic Nose ◽

Milk Fat ◽

Dairy Products ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

In this study, an electronic nose model composed of seven kinds of metal oxide semiconductor sensors was developed to distinguish the milk source (the dairy farm to which milk belongs), estimate the content of milk fat and protein in milk, to identify the authenticity and evaluate the quality of milk. The developed electronic nose is a low-cost and non-destructive testing equipment. (1) For the identification of milk sources, this paper uses the method of combining the electronic nose odor characteristics of milk and the component characteristics to distinguish different milk sources, and uses Principal Component Analysis (PCA) and Linear Discriminant Analysis , LDA) for dimensionality reduction analysis, and finally use three machine learning algorithms such as Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF) to build a milk source (cow farm) Identify the model and evaluate and compare the classification effects. The experimental results prove that the classification effect of the SVM-LDA model based on the electronic nose odor characteristics is better than other single feature models, and the accuracy of the test set reaches 91.5%. The RF-LDA and SVM-LDA models based on the fusion feature of the two have the best effect Set accuracy rate is as high as 96%. (2) The three algorithms, Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost) and Random Forest (RF), are used to construct the electronic nose odor data for milk fat rate and protein rate. The method of estimating the model, the results show that the RF model has the best estimation performance( R2 =0.9399 for milk fat; R2=0.9301for milk protein). And it prove that the method proposed in this study can improve the estimation accuracy of milk fat and protein, which provides a technical basis for predicting the quality of dairy products.

Download Full-text

A Contributor-Focused Intrinsic Quality Assessment of OpenStreetMap in Mozambique Using Unsupervised Machine Learning

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030156 ◽

2021 ◽

Vol 10 (3) ◽

pp. 156

Author(s):

Aphiwe Madubedube ◽

Serena Coetzee ◽

Victoria Rautenbach

Keyword(s):

Machine Learning ◽

Valuable Insight ◽

Unsupervised Machine Learning ◽

Disaster Relief Operations ◽

Intrinsic Quality ◽

Relief Operations ◽

Ground Truthing ◽

Definition Of ◽

Level Of Experience

Anyone can contribute geographic information to OpenStreetMap (OSM), regardless of their level of experience or skills, which has raised concerns about quality. When reference data is not available to assess the quality of OSM data, intrinsic methods that assess the data and its metadata can be used. In this study, we applied unsupervised machine learning for analysing OSM history data to get a better understanding of who contributed when and how in Mozambique. Even though no absolute statements can be made about the quality of the data, the results provide valuable insight into the quality. Most of the data in Mozambique (93%) was contributed by a small group of active contributors (25%). However, these were less active than the OSM Foundation’s definition of active contributorship and the Humanitarian OpenStreetMap Team (HOT) definition for intermediate mappers. Compared to other contributor classifications, our results revealed a new class: contributors who were new in the area and most likely attracted by HOT mapping events during disaster relief operations in Mozambique in 2019. More studies in different parts of the world would establish whether the patterns observed here are typical for developing countries. Intrinsic methods cannot replace ground truthing or extrinsic methods, but provide alternative ways for gaining insight about quality, and they can also be used to inform efforts to further improve the quality. We provide suggestions for how contributor-focused intrinsic quality assessments could be further refined.

Download Full-text

Design of 1-year mortality forecast at hospital admission: A machine learning approach

Health Informatics Journal ◽

10.1177/1460458220987580 ◽

2021 ◽

Vol 27 (1) ◽

pp. 146045822098758

Author(s):

Vicent Blanes-Selva ◽

Vicente Ruiz-García ◽

Salvador Tortajada ◽

José-Miguel Benedí ◽

Bernardo Valdivieso ◽

...

Keyword(s):

Machine Learning ◽

Palliative Care ◽

Hospital Admission ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Standard Procedures ◽

Learning Techniques ◽

Machine Learning Approach ◽

Using Data

Palliative care is referred to a set of programs for patients that suffer life-limiting illnesses. These programs aim to maximize the quality of life (QoL) for the last stage of life. They are currently based on clinical evaluation of the risk of 1-year mortality. The main aim of this work is to develop and validate machine-learning-based models to predict the exitus of a patient within the next year using data gathered at hospital admission. Five machine-learning techniques were applied using a retrospective dataset. The evaluation was performed with five metrics computed by a resampling strategy: Accuracy, the area under the ROC curve, Specificity, Sensitivity, and the Balanced Error Rate. All models reported an AUC ROC from 0.857 to 0.91. Specifically, Gradient Boosting Classifier was the best model, producing an AUC ROC of 0.91, a sensitivity of 0.858, a specificity of 0.808, and a BER of 0.1687. Information from standard procedures at hospital admission combined with machine learning techniques produced models with competitive discriminative power. Our models reach the best results reported in the state of the art. These results demonstrate that they can be used as an accurate data-driven palliative care criteria inclusion.

Download Full-text

Sign language dactyl recognition based on machine learning algorithms

Eastern-European Journal of Enterprise Technologies ◽

10.15587/1729-4061.2021.239253 ◽

2021 ◽

Vol 4 (2(112)) ◽

pp. 58-72

Author(s):

Chingiz Kenshimov ◽

Zholdas Buribayev ◽

Yedilkhan Amirgaliyev ◽

Aisulyu Ataniyazova ◽

Askhat Aitimov

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sign Language ◽

Gesture Recognition ◽

Research Work ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

In the course of our research work, the American, Russian and Turkish sign languages were analyzed. The program of recognition of the Kazakh dactylic sign language with the use of machine learning methods is implemented. A dataset of 5000 images was formed for each gesture, gesture recognition algorithms were applied, such as Random Forest, Support Vector Machine, Extreme Gradient Boosting, while two data types were combined into one database, which caused a change in the architecture of the system as a whole. The quality of the algorithms was also evaluated. The research work was carried out due to the fact that scientific work in the field of developing a system for recognizing the Kazakh language of sign dactyls is currently insufficient for a complete representation of the language. There are specific letters in the Kazakh language, because of the peculiarities of the spelling of the language, problems arise when developing recognition systems for the Kazakh sign language. The results of the work showed that the Support Vector Machine and Extreme Gradient Boosting algorithms are superior in real-time performance, but the Random Forest algorithm has high recognition accuracy. As a result, the accuracy of the classification algorithms was 98.86 % for Random Forest, 98.68 % for Support Vector Machine and 98.54 % for Extreme Gradient Boosting. Also, the evaluation of the quality of the work of classical algorithms has high indicators. The practical significance of this work lies in the fact that scientific research in the field of gesture recognition with the updated alphabet of the Kazakh language has not yet been conducted and the results of this work can be used by other researchers to conduct further research related to the recognition of the Kazakh dactyl sign language, as well as by researchers, engaged in the development of the international sign language

Download Full-text

An unsupervised machine learning method for assessing quality of tandem mass spectra

Proteome Science ◽

10.1186/1477-5956-10-s1-s12 ◽

2012 ◽

Vol 10 (Suppl 1) ◽

pp. S12 ◽

Cited By ~ 3

Author(s):

Wenjun Lin ◽

Jianxin Wang ◽

Wen-Jun Zhang ◽

Fang-Xiang Wu

Keyword(s):

Machine Learning ◽

Mass Spectra ◽

Tandem Mass ◽

Machine Learning Method ◽

Learning Method ◽

Unsupervised Machine Learning ◽

Tandem Mass Spectra

Download Full-text

Applying Unsupervised Machine Learning in Continuous Integration, Security and Deployment Pipeline Automation for Application Software System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7387.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 1426-1430

Keyword(s):

Machine Learning ◽

Web Application ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Application Software ◽

Unsupervised Machine Learning ◽

Continuous Integration ◽

And Performance ◽

Agile Software

Continuous integration and Continuous Deployment (CICD) is a trending practice in agile software development. Using Continuous Integration helps the developers to find the bugs before it goes to the production by running unit tests, smoke tests etc. Deploying the components of the application in Production using Continuous Deployment, using this way, the new release of the application reaches the client faster. Continuous Security makes sure that the application is less prone to vulnerabilities by doing static scans on code and dynamic scans on the deployed releases. The goal of this study is to identify the benefits of adapting the Continuous Integration - Continuous Deployment in Application Software. The Pipeline involves Implementation of CI – CS - CD on a web application ClubSoc which is a Club Management Application and using unsupervised machine learning algorithms to detect anomalies in the CI-CS-CD process. The Continuous Integration is implemented using Jenkins CI Tool, Continuous Security is implemented using Veracode Tool and Continuous Deployment is done using Docker and Jenkins. The results have shown by adapting this methodology, the author is able to improve the quality of the code, finding vulnerabilities using static scans, portability and saving time by automation in deployment of applications using Docker and Jenkins. Applying machine learning helps in predicting the defects, failures and trends in the Continuous Integration Pipeline, whereas it can help in predicting the business impact in Continuous Delivery. Unsupervised learning algorithms such as K-means Clustering, Symbolic Aggregate Approximation (SAX) and Markov are used for Quality and Performance Regression analysis in the CICD Model. Using the CICD model, the developers can fix the bugs pre-release and this will impact the company as a whole by raising the profit and attracting more customers. The Updated Application reaches the client faster using Continuous Deployment. By Analyzing the failure trends using Unsupervised machine learning, the developers might be able to predict where the next error is likely to happen and prevent it in the pre-build stage

Download Full-text

Spatial and temporal variations in river water quality of the Middle Ganga Basin using unsupervised machine learning techniques

Environmental Monitoring and Assessment ◽

10.1007/s10661-020-08624-4 ◽

2020 ◽

Vol 192 (12) ◽

Author(s):

Ashwitha Krishnaraj ◽

Paresh Chandra Deka

Keyword(s):

Machine Learning ◽

Water Quality ◽

River Water ◽

Temporal Variations ◽

Machine Learning Techniques ◽

Spatial And Temporal Variations ◽

Ganga Basin ◽

Unsupervised Machine Learning ◽

Learning Techniques

Download Full-text

Prediction of Patient Readmission via Machine Learning Algorithms

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7770.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 3226-3232

Keyword(s):

Machine Learning ◽

Hospital Readmission ◽

Machine Learning Algorithms ◽

High Rate ◽

Machine Learning Techniques ◽

Healthcare Sector ◽

Gradient Boosting ◽

Support Vector ◽

Linear Discriminant

Predicting the probability of hospital readmission is one of the most vital issues and is considered to be an important research area in the healthcare sector. For curing any of the diseases that might arise, there shall be some essential resources such as medical staff, expertise, beds and rooms. This secures getting excellent medical service. For example, heart failure (HF) or diabetes is a syndrome that could reduce the living quality of patients and has a serious influence on systems of healthcare. The previously mentioned diseases can result in high rate of readmission and hence high rate of costs as well. In this case, algorithms of machine learning are utilized to curb readmissions levels and improve the life quality of patients. Unluckily, a comparatively few numbers of researches in the literature endeavored to address this issue while a large proportion of researches were interested in predicting the probability of detecting diseases. Despite there is a plainly visible shortage on this topic, this paper seeks to spot most of the studies related to predict the probability of hospital readmission by the usage of machine learning techniques such as such as Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Networks (ANNs), Linear Discriminant Analysis (LDA), Bayes algorithm, Random Forest (RF), Decision Trees (DTs), AdaBoost and Gradient Boosting (GB). Specifically, we explore the different techniques used in a medical area under the machine learning research field. In addition, we define four features that are used as criteria for an effective comparison among the employed techniques. These features include goal, data size, method, and performance. Furthermore, some recommendations are drawn from the comparison which is related to the selection of the best techniques in the medical field. Based on the outcomes of this research, it was found out that (bagging and DT) is the best technique to predict diabetes, whereas SVM is the best technique when it comes to prediction the breast cancer, and hospital readmission.

Download Full-text