Mapping the risk terrain for crime using machine learning

2020 ◽  
Author(s):  
Andrew Palmer Wheeler ◽  
Wouter Steenbeek

Objectives: We illustrate how a machine learning algorithm, Random Forests, can provide accurate long-term predictions of crime at micro places relative to other popular techniques. We also show how recent advances in model summaries can help to open the ‘black box’ of Random Forests, considerably improving their interpretability. Methods: We generate long-term crime forecasts for robberies in Dallas at 200 by 200 feet grid cells that allow spatially varying associations of crime generators and demographic factors across the study area. We then show how interpretable model summaries facilitate understanding of the model’s inner workings. Results: We find that Random Forests greatly outperform Risk Terrain Models and Kernel Density Estimation in forecasting future crimes across different measures of predictive accuracy, but only slightly outperform using prior counts of crime. We find the different factors that predict crime are highly non-linear and vary over space. Conclusions: We show how black-box machine learning models can provide accurate micro place-based crime predictions, yet still be interpreted in a manner that fosters understanding of why a place is predicted to be risky. Data and code to replicate the results can be downloaded from https://www.dropbox.com/sh/b3n9a6z5xw14rd6/AAAjqnoMVKjzNQnWP9eu7M1ra?dl=0
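The forecast-then-interpret workflow described in this abstract can be sketched with synthetic data. Everything below is an illustrative assumption (the features, the grid-cell count, and the data-generating process are invented), with scikit-learn's permutation importance standing in for the model summaries used in the paper:

```python
# Hypothetical sketch: Random Forest crime-count forecasting per grid cell,
# with permutation importance as one way to "open the black box".
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_cells = 1000
# assumed covariates, e.g. distance to crime generators, demographic factors
X = rng.random((n_cells, 3))
# synthetic counts: feature 0 has a strong non-linear effect plus Poisson noise
y = np.rint(5 * X[:, 0] ** 2 + rng.poisson(1.0, n_cells)).astype(float)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(imp.importances_mean)  # the non-linear feature 0 should dominate
```

Permutation importance is only one of several model summaries; partial dependence plots would similarly expose the non-linear, spatially varying effects the paper reports.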

2021 ◽  
Vol 11 (3) ◽  
pp. 92
Author(s):  
Mehdi Berriri ◽  
Sofiane Djema ◽  
Gaëtan Rey ◽  
Christel Dartigues-Pallez

Today, many students are moving towards higher education courses that do not suit them and end up failing. The purpose of this study is to give counselors better knowledge so that they can offer future students courses corresponding to their profiles. The second objective is to allow the teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is possible thanks to a machine learning algorithm called Random Forest, which classifies students according to their results. We processed the data, generated models using our algorithm, and cross-referenced the results obtained to produce a better final prediction. We tested our method on different use cases, from two classes to five classes. These sets of classes represent different intervals of the average mark, which ranges from 0 to 20. An accuracy of 75% was achieved with a set of five classes, and up to 85% for sets of two and three classes.
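The simplest of the use cases above, the two-class split of the 0–20 average, can be sketched as follows. The per-subject marks, the sample size, and the pass/fail threshold at 10 are all invented for illustration:

```python
# Hypothetical sketch: classifying students into grade-interval classes
# (here two classes: average below or above 10 out of 20) with Random Forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
marks = rng.random((500, 4)) * 20                 # assumed per-subject results
avg = marks.mean(axis=1) + rng.normal(0, 1, 500)  # noisy final average
labels = np.digitize(avg, bins=[10.0])            # 0 = below 10, 1 = above

Xtr, Xte, ytr, yte = train_test_split(marks, labels, random_state=1)
clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(Xtr, ytr)
print(round(clf.score(Xte, yte), 2))              # held-out accuracy
```

Moving from two to five classes only changes the `bins` passed to `np.digitize`; as the abstract notes, accuracy drops as the intervals get finer.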


1993 ◽  
Vol 18 (2-4) ◽  
pp. 209-220
Author(s):  
Michael Hadjimichael ◽  
Anita Wasilewska

We present here an application of Rough Set formalism to Machine Learning. The resulting Inductive Learning algorithm is described, and its application to a set of real data is examined. The data consists of a survey of voter preferences taken during the 1988 presidential election in the U.S.A. Results include an analysis of the predictive accuracy of the generated rules, and an analysis of the semantic content of the rules.
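The core Rough Set operations behind such rule induction, lower and upper approximations over indiscernibility classes, can be shown in a few lines. The voter attributes and the target concept below are invented stand-ins, not the 1988 survey data:

```python
# Minimal sketch of Rough Set approximations: objects with identical attribute
# tuples form indiscernibility classes; a concept (a set of objects) is
# approximated from below (certainly in) and from above (possibly in).
from collections import defaultdict

# hypothetical survey rows: object id -> (age group, income)
data = {1: ("young", "low"), 2: ("young", "low"),
        3: ("old", "high"), 4: ("old", "low")}
concept = {1, 3}  # e.g. respondents preferring candidate A

classes = defaultdict(set)
for obj, attrs in data.items():
    classes[attrs].add(obj)

lower = {o for c in classes.values() if c <= concept for o in c}
upper = {o for c in classes.values() if c & concept for o in c}
print(lower, upper)  # {3} {1, 2, 3}
```

Objects 1 and 2 are indiscernible but disagree on the concept, so they fall only in the upper approximation; rules generated from the boundary region are the uncertain ones whose predictive accuracy the paper analyses.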


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Hanlin Liu ◽  
Linqiang Yang ◽  
Linchao Li

A variety of climate factors influence the precision of long-term Global Navigation Satellite System (GNSS) monitoring data. To precisely analyze the effect of different climate factors on long-term GNSS monitoring records, this study combines the extended seven-parameter Helmert transformation and a machine learning algorithm named Extreme Gradient Boosting (XGBoost) to establish a hybrid model. We established a local-scale reference frame called the stable Puerto Rico and Virgin Islands reference frame of 2019 (PRVI19) using ten continuously operating long-term GNSS sites located in the rigid portion of the Puerto Rico and Virgin Islands (PRVI) microplate. The stability of PRVI19 is approximately 0.4 mm/year and 0.5 mm/year in the horizontal and vertical directions, respectively. The stable reference frame PRVI19 can avoid the risk of bias due to long-term plate motions when studying localized ground deformation. Furthermore, we applied the XGBoost algorithm to the postprocessed long-term GNSS records and daily climate data to train the model. We quantitatively evaluated the importance of various daily climate factors on the GNSS time series. The results show that wind is the most influential factor, with a unit-less index of 0.013. Notably, we used the model with climate and GNSS records to predict the GNSS-derived displacements. The results show that the predicted displacements have a slightly lower root mean square error compared to the fitted results using the spline method (prediction: 0.22 versus fitted: 0.31). This indicates that the proposed model, which incorporates climate records, yields suitable predictions for long-term GNSS monitoring.
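The importance-ranking step described above can be sketched as follows. The climate columns and displacement series are synthetic, and scikit-learn's GradientBoostingRegressor is used here as a stand-in for the XGBoost library:

```python
# Hypothetical sketch: ranking daily climate factors by their influence on a
# GNSS displacement series via gradient-boosting feature importances.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n_days = 800
# assumed columns: wind, temperature, rainfall (all invented)
climate = rng.normal(size=(n_days, 3))
# synthetic displacement: driven mostly by the "wind" column
disp = 0.5 * climate[:, 0] + 0.1 * climate[:, 1] + rng.normal(0, 0.1, n_days)

gbr = GradientBoostingRegressor(random_state=2).fit(climate, disp)
print(gbr.feature_importances_)  # column 0 ("wind") should dominate
```

XGBoost itself exposes the same idea through `feature_importances_` and its gain/cover/weight importance types; the unit-less index reported in the abstract is such a normalized importance score.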


Author(s):  
Kacper Sokol ◽  
Peter Flach

Understanding data, models and predictions is important for machine learning applications. Due to the limitations of our spatial perception and intuition, analysing high-dimensional data is inherently difficult. Furthermore, black-box models achieving high predictive accuracy are widely used, yet the logic behind their predictions is often opaque. Use of textualisation -- a natural language narrative of selected phenomena -- can tackle these shortcomings. When extended with argumentation theory we could envisage machine learning models and predictions arguing persuasively for their choices.


Author(s):  
Benjamin A Goldstein ◽  
Eric C Polley ◽  
Farren B. S. Briggs

The Random Forests (RF) algorithm has become a commonly used machine learning algorithm for genetic association studies. It is well suited for genetic applications since it is both computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, there has been inconsistent and less than optimal use of RF in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners can apply it in their work. An emphasis is placed on showing how the various components contribute to bias and variance, as well as discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.
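A typical genetic-association use of RF, ranking SNPs by variable importance, can be sketched with a toy genotype matrix. The coding, sample size, and causal SNP below are invented for illustration:

```python
# Hypothetical sketch: RF variable importance on a toy SNP matrix
# (genotypes coded 0/1/2 minor-allele counts); one causal SNP drives the trait.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
snps = rng.integers(0, 3, size=(600, 10))  # 600 subjects, 10 SNPs
# binary phenotype driven by SNP 2 plus noise
pheno = (snps[:, 2] + rng.normal(0, 0.5, 600) > 1).astype(int)

rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=3)
rf.fit(snps, pheno)
print(rf.oob_score_, rf.feature_importances_.argmax())  # causal SNP recovered
```

The out-of-bag score illustrates the variance side of the review's bias/variance discussion (it is an internal estimate of generalization error), while `feature_importances_` is the impurity-based importance measure the review contrasts with permutation importance.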


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues with respect to the efficiency of banks and the safety of depositors' funds in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a hot issue, because stakeholders have to detect the underlying causes of inefficiencies within the banking industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks' efficiency and performance. Machine learning algorithms have also been viewed as a good tool to estimate various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision-Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning algorithm models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model, with 100% accuracy in predicting the 134-branch holdout sample dataset (30% of banks) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network (86.6% accuracy) with a P value of 0.66. The study concluded that banks in Ghana can use the result of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
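The evaluation design above, train a classifier on DEA-derived efficiency labels and score it on a 30% holdout, can be sketched as follows. The input ratios and the efficiency rule are invented, and a CART decision tree stands in for the paper's C5.0 algorithm:

```python
# Hypothetical sketch: a decision tree predicting a DEA efficiency class
# (efficient vs. inefficient) for bank branches from input/output ratios.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
n_dmus = 444                                # number of branches in the study
ratios = rng.random((n_dmus, 3))            # assumed cost/deposit/loan ratios
efficient = (ratios[:, 0] + ratios[:, 1] > 1.0).astype(int)  # stand-in label

Xtr, Xte, ytr, yte = train_test_split(
    ratios, efficient, test_size=0.3, random_state=4)  # 30% holdout, cf. paper
tree = DecisionTreeClassifier(random_state=4).fit(Xtr, ytr)
print(round(tree.score(Xte, yte), 3))       # holdout accuracy
```

In the actual study the labels come from the DEA stage rather than a synthetic rule, and the same holdout would also score the random forest and neural network for comparison.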


2014 ◽  
Vol 53 (11) ◽  
pp. 2457-2480 ◽  
Author(s):  
Meike Kühnlein ◽  
Tim Appelhans ◽  
Boris Thies ◽  
Thomas Nauß

Abstract: A new rainfall retrieval technique for determining rainfall rates in a continuous manner (day, twilight, and night) resulting in a 24-h estimation applicable to midlatitudes is presented. The approach is based on satellite-derived information on cloud-top height, cloud-top temperature, cloud phase, and cloud water path retrieved from Meteosat Second Generation (MSG) Spinning Enhanced Visible and Infrared Imager (SEVIRI) data and uses the random forests (RF) machine-learning algorithm. The technique is realized in three steps: (i) precipitating cloud areas are identified, (ii) the areas are separated into convective and advective-stratiform precipitating areas, and (iii) rainfall rates are assigned separately to the convective and advective-stratiform precipitating areas. Validation studies were carried out for each individual step as well as for the overall procedure using collocated ground-based radar data. Regarding each individual step, the models for rain area and convective precipitation detection produce good results. Both retrieval steps show a general tendency toward elevated prediction skill during summer months and daytime. The RF models for rainfall-rate assignment exhibit similar performance patterns, yet it is noteworthy how well the model is able to predict rainfall rates during nighttime and twilight. The performance of the overall procedure shows a very promising potential to estimate rainfall rates at high temporal and spatial resolutions in an automated manner. The near-real-time continuous applicability of the technique with acceptable prediction performances at 3–8-hourly intervals is particularly remarkable. This provides a very promising basis for future investigations into precipitation estimation based on machine-learning approaches and MSG SEVIRI data.
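The three-step cascade described above can be sketched as chained Random Forest models. The satellite features, labels, and rates below are all synthetic stand-ins for the SEVIRI-derived quantities:

```python
# Hypothetical sketch of the three-step retrieval chain: (i) rain-area
# classifier, (ii) convective/stratiform classifier, (iii) per-type regressors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(5)
n = 2000
# assumed stand-ins for cloud-top height/temperature, phase, water path
feats = rng.random((n, 4))
raining = (feats[:, 0] > 0.5).astype(int)
convective = ((feats[:, 1] > 0.6) & (raining == 1)).astype(int)
rate = raining * (1 + 9 * convective) * feats[:, 3]  # convective pixels rain harder

step1 = RandomForestClassifier(random_state=5).fit(feats, raining)
step2 = RandomForestClassifier(random_state=5).fit(
    feats[raining == 1], convective[raining == 1])
reg_c = RandomForestRegressor(random_state=5).fit(
    feats[convective == 1], rate[convective == 1])
reg_s = RandomForestRegressor(random_state=5).fit(
    feats[(raining == 1) & (convective == 0)],
    rate[(raining == 1) & (convective == 0)])

# chain the three steps for one new pixel
pix = feats[:1]
if step1.predict(pix)[0]:
    est = (reg_c if step2.predict(pix)[0] else reg_s).predict(pix)[0]
else:
    est = 0.0
print(round(est, 2))
```

Splitting the rate regression by precipitation type, rather than fitting one regressor for all raining pixels, mirrors step (iii) of the paper: convective and stratiform areas have very different rate distributions.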


2021 ◽  
Author(s):  
Young Chul Youn ◽  
Jung-Min Pyun ◽  
Hye Ryoun Kim ◽  
Sungmin Kang ◽  
Nayoung Ryoo ◽  
...  

Abstract Background: The Multimer Detection System-Oligomeric amyloid-β (MDS-OAβ) level is a valuable blood-based biomarker for Alzheimer’s disease (AD). We used machine learning algorithms trained on multi-center datasets to examine whether blood MDS-OAβ values can predict AD-associated changes in the brain. Methods: A logistic regression model using TensorFlow (ver. 2.3.0) was applied to data obtained from 163 participants (amyloid positron emission tomography [PET]-positive and -negative findings in 102 and 61 participants, respectively). Algorithms with various combinations of features (MDS-OAβ levels, age, gender, and anticoagulant type) were tested 50 times on each dataset. Results: The predictive accuracy, sensitivity, and specificity values of blood MDS-OAβ levels for amyloid PET positivity were 78.16±4.97%, 83.87±9.40%, and 70.00±13.13%, respectively. Conclusions: The findings from this multi-center machine learning-based study suggest that MDS-OAβ values may be used to predict amyloid PET positivity.
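The evaluation in this abstract, a logistic regression scored by sensitivity and specificity, can be sketched as follows. The biomarker distributions are invented, and scikit-learn is used here as a stand-in for the paper's TensorFlow model:

```python
# Hypothetical sketch: logistic regression predicting amyloid-PET positivity
# from an MDS-OAβ-like level plus age and sex, reporting sensitivity/specificity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(6)
n = 163                                         # study sample size
X = np.column_stack([rng.normal(1.0, 0.5, n),   # assumed MDS-OAβ level
                     rng.normal(70, 8, n),      # age
                     rng.integers(0, 2, n)])    # sex
pet_pos = (X[:, 0] + rng.normal(0, 0.4, n) > 1.0).astype(int)  # synthetic label

clf = LogisticRegression(max_iter=1000).fit(X, pet_pos)
tn, fp, fn, tp = confusion_matrix(pet_pos, clf.predict(X)).ravel()
sens, spec = tp / (tp + fn), tn / (tn + fp)
print(round(sens, 2), round(spec, 2))
```

Repeating such a fit 50 times over resampled splits, as the paper does, is what produces the mean±SD accuracy, sensitivity, and specificity figures quoted in the Results.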

