A Contributor-Focused Intrinsic Quality Assessment of OpenStreetMap in Mozambique Using Unsupervised Machine Learning

2021
Vol 10 (3)
pp. 156
Author(s):
Aphiwe Madubedube
Serena Coetzee
Victoria Rautenbach

Anyone can contribute geographic information to OpenStreetMap (OSM), regardless of their level of experience or skills, which has raised concerns about quality. When reference data is not available to assess the quality of OSM data, intrinsic methods that assess the data and its metadata can be used. In this study, we applied unsupervised machine learning to OSM history data to better understand who contributed, when, and how in Mozambique. Even though no absolute statements can be made about the quality of the data, the results provide valuable insight into it. Most of the data in Mozambique (93%) was contributed by a small group of active contributors (25% of contributors). However, their activity levels fell below both the OSM Foundation’s definition of active contributorship and the Humanitarian OpenStreetMap Team (HOT) definition of an intermediate mapper. Compared to other contributor classifications, our results revealed a new class: contributors who were new to the area and most likely attracted by HOT mapping events during the 2019 disaster relief operations in Mozambique. More studies in different parts of the world would establish whether the patterns observed here are typical of developing countries. Intrinsic methods cannot replace ground truthing or extrinsic methods, but they provide alternative ways of gaining insight into quality, and they can also inform efforts to further improve it. We provide suggestions for how contributor-focused intrinsic quality assessments could be further refined.
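As a rough illustration of this kind of analysis, the following is a minimal sketch of clustering contributor-history features with k-means; the file name and feature columns (changeset counts, edit counts, days active, feature types edited) are assumptions for illustration, not the paper's actual feature set.

```python
# Minimal sketch: clustering OSM contributors by activity features
# extracted from history data. File and column names are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

contributors = pd.read_csv("mozambique_contributors.csv")  # hypothetical file
feature_cols = ["n_changesets", "n_edits", "days_active", "n_feature_types"]

# Standardize so that high-volume counts do not dominate the distance metric.
X = StandardScaler().fit_transform(contributors[feature_cols])

# Four clusters is an illustrative choice; the paper derives its own classes.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
contributors["cluster"] = kmeans.labels_

# Per-cluster means help assign labels such as "active" or "newcomer".
print(contributors.groupby("cluster")[feature_cols].mean())
```

Inspecting per-cluster means is one simple way to attach interpretable labels to the resulting contributor groups.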

Author(s):
Divya Sardana
Shruti Marwaha
Raj Bhatnagar

Crime is a grave problem that affects all countries in the world. The level of crime in a country has a big impact on its economic growth and on citizens' quality of life. In this paper, we provide a survey of trends in supervised and unsupervised machine learning methods used for crime pattern analysis. We use a spatiotemporal dataset of crimes in San Francisco, CA to demonstrate some of these strategies for crime analysis. We use classification models, namely Logistic Regression, Random Forest, Gradient Boosting, and Naive Bayes, to predict crime types such as larceny and theft, and we propose model optimization strategies. Further, we use a graph-based unsupervised machine learning technique, core-periphery structure analysis, to analyze how crime behavior evolves over time. These methods generalize to other counties and can be greatly helpful in planning police task forces for law enforcement and crime prevention.
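For a flavor of the supervised part of this workflow, here is a minimal sketch using Random Forest, one of the classifiers named above, on spatiotemporal features; the file and column names are assumptions, not the paper's exact preprocessing.

```python
# Minimal sketch: predicting crime category from spatiotemporal features,
# using Random Forest (one of the paper's classifiers).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical file in the spirit of the public San Francisco crime dataset.
crimes = pd.read_csv("sf_crimes.csv", parse_dates=["timestamp"])
crimes["hour"] = crimes["timestamp"].dt.hour
crimes["weekday"] = crimes["timestamp"].dt.dayofweek

X = crimes[["latitude", "longitude", "hour", "weekday"]]
y = crimes["category"]  # e.g. "LARCENY/THEFT"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```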


Author(s):
Joe Hoffert
Aniruddha Gokhale
Douglas C. Schmidt

Quality-of-service (QoS)-enabled publish/subscribe (pub/sub) middleware provides powerful support for scalable data dissemination. However, it is difficult to maintain key QoS properties (such as reliability and latency) for distributed real-time and embedded systems (such as disaster relief operations or power grids) operating in dynamic environments. Managing QoS manually is often not feasible in such environments due to slow response times, the complexity of managing multiple interrelated QoS settings, and the scale of the systems being managed. For certain domains, distributed real-time and embedded systems must be able to reflect on the conditions of their environment and adapt accordingly within a bounded amount of time. This paper describes an architecture for QoS-enabled middleware and corresponding algorithms to support specified QoS in dynamic environments.
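As a hedged illustration only (not the paper's middleware or algorithms), the sketch below shows the general shape of a bounded-time adaptation loop that monitors QoS metrics and reconfigures the transport when thresholds are violated; every name, threshold, and policy here is invented for the example.

```python
# Illustrative sketch of a bounded-time QoS adaptation loop.
# All names, thresholds, and the rule-based policy are assumptions.
import time

THRESHOLDS = {"latency_ms": 50.0, "loss_rate": 0.01}  # assumed QoS targets

def read_metrics():
    """Stand-in for middleware instrumentation; returns current QoS metrics."""
    return {"latency_ms": 42.0, "loss_rate": 0.002}

def reconfigure(transport):
    """Stand-in for applying a new transport/QoS configuration."""
    print(f"switching transport to {transport}")

def adaptation_loop(period_s=1.0):
    while True:
        start = time.monotonic()
        m = read_metrics()
        # Simple rule-based policy; the paper's algorithms are more involved.
        if m["loss_rate"] > THRESHOLDS["loss_rate"]:
            reconfigure("reliable-multicast")
        elif m["latency_ms"] > THRESHOLDS["latency_ms"]:
            reconfigure("best-effort-unicast")
        # Sleep for the remainder of the period so each reaction is bounded.
        time.sleep(max(0.0, period_s - (time.monotonic() - start)))
```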


2012
Vol 10 (Suppl 1)
pp. S12
Author(s):
Wenjun Lin
Jianxin Wang
Wen-Jun Zhang
Fang-Xiang Wu

2020
Vol 11 (1)
pp. 18
Author(s):
Diogo Cardoso
Luís Ferreira

The growing competitiveness of the market, coupled with the increase in automation driven by the advent of Industry 4.0, highlights the importance of maintenance within organizations. At the same time, the amount of data that can be extracted from industrial systems has grown exponentially due to the proliferation of sensors, transmission devices, and data storage via the Internet of Things. When processed and analyzed, these data can provide valuable information and knowledge about the equipment, allowing a move towards predictive maintenance. Maintenance is fundamental to a company’s competitiveness, since actions taken at this level have a direct impact on aspects such as the cost and quality of products; hence, equipment failures need to be identified and resolved. Artificial Intelligence tools, in particular Machine Learning, show enormous potential for analyzing the large amounts of data now readily available, with the aim of improving system availability, reducing maintenance costs, and increasing operational performance and decision support. In this dissertation, Machine Learning techniques are applied to a dataset made available online; the specifics of this implementation are analyzed and methodologies are defined in order to provide information and tools to the maintenance area.
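As a minimal sketch of the kind of pipeline such a study might evaluate, the example below trains a Gradient Boosting classifier to flag equipment at risk of near-term failure; the dataset, column names, and label definition are assumptions for illustration, not the dissertation's actual setup.

```python
# Minimal sketch of a predictive-maintenance classifier on per-cycle
# sensor readings. File, columns, and label are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

readings = pd.read_csv("equipment_sensors.csv")  # hypothetical file
features = ["temperature", "vibration", "pressure", "rpm"]
X = readings[features]
y = readings["fails_within_30_cycles"]  # binary label derived from failure logs

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```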


2021
Vol ahead-of-print (ahead-of-print)
Author(s):
Abhishek Gupta
Dwijendra Nath Dwivedi
Jigar Shah
Ashish Jain

Purpose: Good-quality input data is critical to developing a robust machine learning model for identifying possible money laundering transactions. McKinsey, at one of the ACAMS conferences, attributed the struggles of artificial intelligence use cases in compliance partly to data quality. Concerns are often raised about the data quality of predictors, such as wrong transaction codes or industry classifications. However, there has been little discussion of the most critical variable in such models: the definition of the event, i.e. the date on which the suspicious activity report (SAR) is filed.
Design/methodology/approach: The team analyzed the transaction behavior of four major banks spread across Asia and Europe. Based on the findings, the team created a synthetic database comprising 2,000 SAR customers, mimicking the time of investigation and case closure. In this paper, the authors focus on one very specific area of data quality: the definition of the event, i.e. the SAR/suspicious transaction report.
Findings: The analysis of a few banks in Asia and Europe suggests that refining the event definition can by itself improve the effectiveness of the model and reduce the prediction span, i.e. the time lag between a money laundering transaction being made and its being flagged as an alert for investigation.
Research limitations/implications: The analysis drew on existing experience of situations where the time between alert and case closure is long (anywhere between 15 days and 10 months). The team could not quantify the impact of this finding due to the lack of such actual cases observed so far.
Originality/value: The key finding suggests that, in the data, money launderers appear either to increase or to reduce their level of activity in the most recent quarter. This does not match their real behavior: they typically show a spike in activity through various means during money laundering, which in turn affects the quality of the insights the model is trained on. The authors believe that once financial institutions start speeding up investigations of high-risk cases, the scatter plot of SAR behavior will change significantly, leading to better capture of money laundering behavior and a faster, more precise “catch” rate.
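To make the "definition of an event" point concrete, here is a small sketch on synthetic data showing how anchoring labels on the SAR filing date, rather than the suspicious transaction date, stretches the prediction span; the data are invented for illustration and are not the authors' synthetic database.

```python
# Illustrative sketch (invented data): how the choice of event date
# shifts the labels a SAR-prediction model is trained on.
import pandas as pd

sars = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "suspicious_txn_date": pd.to_datetime(["2021-01-05", "2021-02-10", "2021-03-01"]),
    "sar_filed_date": pd.to_datetime(["2021-04-20", "2021-03-02", "2021-09-15"]),
})

# Prediction span: lag between the suspicious transaction and the SAR filing.
sars["prediction_span_days"] = (
    sars["sar_filed_date"] - sars["suspicious_txn_date"]
).dt.days
print(sars)

# Labeling on the filing date anchors the "event" weeks or months after
# the actual behavior; labeling on the transaction date aligns features
# with the quarter in which the laundering activity actually spiked.
```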


2019
Vol 8 (4)
pp. 1426-1430

Continuous Integration and Continuous Deployment (CI/CD) is a trending practice in agile software development. Continuous Integration helps developers find bugs before they reach production by running unit tests, smoke tests, and the like, while Continuous Deployment deploys the application's components to production so that new releases reach the client faster. Continuous Security makes the application less prone to vulnerabilities by running static scans on the code and dynamic scans on deployed releases. The goal of this study is to identify the benefits of adopting Continuous Integration and Continuous Deployment in application software. The pipeline implements CI-CS-CD on ClubSoc, a club management web application, and uses unsupervised machine learning algorithms to detect anomalies in the CI-CS-CD process. Continuous Integration is implemented with the Jenkins CI tool, Continuous Security with the Veracode tool, and Continuous Deployment with Docker and Jenkins. The results show that by adopting this methodology the author was able to improve code quality, find vulnerabilities through static scans, gain portability, and save time by automating application deployment with Docker and Jenkins. Applying machine learning helps predict defects, failures, and trends in the Continuous Integration pipeline, and can help predict business impact in Continuous Delivery. Unsupervised learning algorithms such as k-means clustering, Symbolic Aggregate approXimation (SAX), and Markov models are used for quality and performance regression analysis in the CI/CD model. With the CI/CD model, developers can fix bugs pre-release, which benefits the company as a whole by raising profit and attracting more customers, and the updated application reaches the client faster through Continuous Deployment. By analyzing failure trends with unsupervised machine learning, developers may be able to predict where the next error is likely to occur and prevent it in the pre-build stage.
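As a sketch of the anomaly-detection idea using k-means (one of the algorithms named above), the following flags builds that fall in the minority cluster of per-build metrics as anomalous; the metric names and values are invented for illustration.

```python
# Minimal sketch: k-means anomaly detection on per-build CI metrics.
# Metric names and values are assumptions, not the study's data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-build metrics: duration (s), failed tests, scan findings.
builds = np.array([
    [320, 0, 1], [340, 1, 0], [310, 0, 0], [335, 0, 2],
    [900, 12, 9],  # an unusually slow, failure-heavy build
    [325, 1, 1], [315, 0, 0],
])

X = StandardScaler().fit_transform(builds)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# With one well-separated outlier, the minority cluster isolates it;
# flag builds in the minority cluster as anomalous.
counts = np.bincount(kmeans.labels_)
anomalous = np.where(kmeans.labels_ == counts.argmin())[0]
print("anomalous builds:", anomalous)
```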

