Grid-Based Crime Prediction Using Geographical Features

Machine learning is useful for grid-based crime prediction. Many previous studies have examined factors including time, space, and type of crime, but the geographic characteristics of the grid are rarely discussed, leaving prediction models unable to predict crime displacement. This study incorporates the concept of a criminal environment in grid-based crime prediction modeling, and establishes a range of spatial-temporal features based on 84 types of geographic information by applying the Google Places API to theft data for Taoyuan City, Taiwan. The best model was found to be a deep neural network, which outperforms the popular Random Decision Forest, Support Vector Machine, and K-Nearest Neighbor algorithms. After tuning, compared to our design’s baseline 11-month moving average, the F1 score improves by about 7% on 100-by-100 grids. Experiments demonstrate the importance of the geographic feature design for improving performance and explanatory ability. In addition, testing for crime displacement also shows that our model design outperforms the baseline.
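
The abstract compares its models against an 11-month moving-average baseline scored by F1. A minimal sketch of that kind of baseline is given below. The function names, the per-cell data layout, and the 0.5 hotspot threshold are illustrative assumptions, not details taken from the study.

```python
def moving_average_baseline(history, window=11, threshold=0.5):
    """Flag a grid cell as a predicted hotspot when the mean of its last
    `window` monthly counts reaches `threshold` (threshold is an assumption).

    history: dict mapping cell id -> list of monthly crime counts, oldest first.
    """
    predictions = {}
    for cell, counts in history.items():
        recent = counts[-window:]
        predictions[cell] = (sum(recent) / len(recent)) >= threshold
    return predictions


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall over grid cells."""
    tp = sum(1 for c in y_true if y_true[c] and y_pred[c])
    fp = sum(1 for c in y_true if not y_true[c] and y_pred[c])
    fn = sum(1 for c in y_true if y_true[c] and not y_pred[c])
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with three hypothetical grid cells.
history = {"a": [1] * 11, "b": [0] * 11, "c": [0] * 10 + [1]}
truth = {"a": True, "b": False, "c": True}
print(f1_score(truth, moving_average_baseline(history)))
```

Any learned model can be scored with the same `f1_score` function, which makes the comparison against the baseline direct.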

Traffic Prediction using Time-Space Diagram: A Convolutional Neural Network Approach

Transportation Research Record: Journal of the Transportation Research Board ◽

10.1177/0361198119841291 ◽

2019 ◽

Vol 2673 (7) ◽

pp. 425-435 ◽

Cited By ~ 5

Author(s):

Mohammadreza Khajeh Hosseini ◽

Alireza Talebpour

Keyword(s):

Data Analysis ◽

Traffic Management ◽

Moving Average ◽

Traffic Prediction ◽

Support Vector ◽

Learning Approaches ◽

Neural Network Approach ◽

Time Space ◽

Traffic Management System ◽

The Individual

Traffic prediction is a major component of any traffic management system. With the increase in data sources and advancement in connectivity, data analysis and machine learning approaches for traffic prediction have gained a lot of attention. Most of the existing data analysis approaches in traffic prediction rely on aggregated inputs such as flow and density, with limited studies using the individual vehicle-level data. The time-space diagram of the vehicles can be constructed from the connected vehicles’ data. This plot is comprehensive and contains all the information about traffic flow dynamics at both microscopic and macroscopic levels. Accordingly, this study introduces a deep learning-based methodology to directly predict the traffic state based on the time-space diagram with the use of convolutional neural networks (CNN). The time-space diagram is directly used as the input to the traffic prediction model using a CNN. The prediction capability of the proposed model is compared with multilayer perceptron, support vector regression, and autoregressive integrated moving average, and the results indicate a superior capability of CNN in predicting flow and density across all possible values of these parameters.
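
To illustrate how vehicle-level records could become an image-like CNN input, the sketch below rasterizes trajectory points into a time-space grid whose cells hold the mean observed speed. The encoding (mean speed per cell, 0.0 for empty cells), the function name, and the tuple layout are assumptions made for illustration; the abstract does not specify how the diagram is encoded.

```python
def rasterize_time_space(points, t_max, x_max, n_t, n_x):
    """Build an n_x-by-n_t grid from (time, position, speed) samples.

    Rows index position bins, columns index time bins. Each cell holds the
    mean speed of the samples that fall in it, or 0.0 when it is empty.
    """
    sums = [[0.0] * n_t for _ in range(n_x)]
    counts = [[0] * n_t for _ in range(n_x)]
    for t, x, v in points:
        if not (0 <= t < t_max and 0 <= x < x_max):
            continue  # ignore samples outside the diagram window
        ti = int(t / t_max * n_t)
        xi = int(x / x_max * n_x)
        sums[xi][ti] += v
        counts[xi][ti] += 1
    return [
        [sums[i][j] / counts[i][j] if counts[i][j] else 0.0 for j in range(n_t)]
        for i in range(n_x)
    ]


# Three hypothetical samples: (time in s, position in m, speed in m/s).
samples = [(0.5, 5.0, 10.0), (0.6, 6.0, 20.0), (3.5, 95.0, 30.0)]
grid = rasterize_time_space(samples, t_max=4.0, x_max=100.0, n_t=4, n_x=10)
print(grid[0][0], grid[9][3])
```

The resulting 2D array can be passed to a convolutional model as a single-channel image.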

Forecasting of Short-Term Metro Ridership with Support Vector Machine Online Model

Journal of Advanced Transportation ◽

10.1155/2018/3189238 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 9

Author(s):

Xuemei Wang ◽

Ning Zhang ◽

Yunlong Zhang ◽

Zhuangbin Shi

Keyword(s):

Support Vector Machine ◽

Prediction Models ◽

Moving Average ◽

Back Propagation ◽

Back Propagation Neural Network ◽

Support Vector ◽

Time Interval ◽

Short Term ◽

Nonlinear Characteristics ◽

Predicted Values

Forecasting short-term ridership is the foundation of metro operation and management. A prediction model is needed to capture the weekly periodicity and nonlinearity of short-term ridership in real time. First, this research captures the inherent periodicity of ridership via a seasonal autoregressive integrated moving average (SARIMA) model and proposes a support vector machine overall online model (SVMOOL), which incorporates the weekly periodic characteristics and is retrained on the updated data day by day. Then, this research captures the nonlinear characteristics of the ridership via successive ridership value inputs and proposes a support vector machine partial online model (SVMPOL), which incorporates the nonlinear characteristics and is retrained on the updated data of the predicted day by time interval (such as 5 min). Afterwards, to avoid the drawbacks and take advantage of the strengths of the two individual online models, this research takes the average of the two models’ predicted values as the final predicted values; the result is called the support vector machine combined online model (SVMCOL). Finally, this research uses the 5-min ridership at Zhujianglu and Sanshanjie Stations of Nanjing Metro to compare the SVMCOL model with three well-known prediction models: SARIMA, back-propagation neural network (BPNN), and SVM. The performance comparisons suggest that SARIMA is superior to the other models for stable weekday ridership, yet the SVMCOL model is the best performer for unstable weekend and holiday ridership. This shows that for metro operation managers who are geared toward timely responses to unstable and abnormal real-world situations, the SVMCOL may be a better tool than the three well-known models.
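
The combination step described in the abstract is a plain average of the two online models' outputs. A minimal sketch follows; the function name and the example ridership values are hypothetical, and the two input lists stand in for predictions that would come from the SVMOOL and SVMPOL models.

```python
def combine_predictions(overall_preds, partial_preds):
    """Average two aligned prediction series, one value per time interval."""
    if len(overall_preds) != len(partial_preds):
        raise ValueError("prediction series must have the same length")
    return [(a + b) / 2 for a, b in zip(overall_preds, partial_preds)]


# Hypothetical 5-min ridership predictions from the two online models.
svmool = [100, 200, 150]
svmpol = [120, 180, 160]
print(combine_predictions(svmool, svmpol))
```

Equal weighting is the simplest choice; it needs no extra fitting and limits the influence of either model's errors on the final forecast.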

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM) were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
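
The y-randomization test mentioned above checks that a model's fit does not arise by chance: the response values are shuffled, the model is refit, and the resulting R2 values should fall well below the R2 on the real data. The sketch below uses a one-variable least-squares line as a stand-in model, which is an assumption for illustration only; the study itself applies the test to its SVM model.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def y_randomization(xs, ys, n_rounds=20, seed=0):
    """Return (R2 on real data, list of R2 values on shuffled responses)."""
    rng = random.Random(seed)
    slope, intercept = fit_line(xs, ys)
    true_r2 = r_squared(ys, [slope * x + intercept for x in xs])
    shuffled_r2 = []
    for _ in range(n_rounds):
        ys_perm = list(ys)
        rng.shuffle(ys_perm)  # break the link between descriptors and response
        s, b = fit_line(xs, ys_perm)
        shuffled_r2.append(r_squared(ys_perm, [s * x + b for x in xs]))
    return true_r2, shuffled_r2


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
true_r2, shuffled = y_randomization(xs, ys)
print(true_r2, max(shuffled))
```

A large gap between the real R2 and the shuffled R2 values suggests the model has learned a genuine relationship.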

Integrating Second-order Moving Average and Over-sampling Algorithm to Predict Apoptosis Protein Subcellular Localization

Current Bioinformatics ◽

10.2174/1574893614666190902155811 ◽

2020 ◽

Vol 15 (6) ◽

pp. 517-527

Author(s):

Yunyun Liang ◽

Shengli Zhang

Keyword(s):

Subcellular Localization ◽

Moving Average ◽

Subcellular Location ◽

Second Order ◽

Test Method ◽

Support Vector ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Apoptosis Protein ◽

Leibler Divergence

Background: Apoptosis proteins have a key role in the development and the homeostasis of the organism, and are very important for understanding the mechanism of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict the apoptosis protein subcellular location by using the PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. This model is named SOMAPKLNMF-OS and is constructed on the ZD98, ZW225 and CL317 benchmark datasets. Then, the support vector machine is adopted as the classifier, and the bias-free jackknife test method is used to evaluate the accuracy. Results: Our prediction system achieves favorable and promising overall accuracy on the three datasets and also outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
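
The trade-off described above, higher sensitivity at the cost of lower specificity and precision, can be read directly from a confusion matrix. The following sketch computes the three metrics for binary bloom / no-bloom labels. The function name and the toy labels are illustrative assumptions and do not come from the study's data.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and precision for binary labels (1 = bloom)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed blooms
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Unbalanced toy data: 3 bloom events among 8 observations.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(confusion_metrics(y_true, y_pred))
```

On unbalanced data, precision is often the most informative of the three, since a small number of false alarms can outweigh the few true positives.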

Development of Machine Learning Models for Prediction of Smoking Cessation Outcome

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18052584 ◽

2021 ◽

Vol 18 (5) ◽

pp. 2584

Author(s):

Cheng-Chien Lai ◽

Wei-Hsin Huang ◽

Betty Chia-Chen Chang ◽

Lee-Ching Hwang

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Success Rate ◽

Prediction Models ◽

Smoking Status ◽

Medical Center ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Support Vector ◽

Smoking Cessation Outcome

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model, which reached slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction of smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also has the potential to support personalized and precision medicine in the treatment of smoking cessation.
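
The ROC value reported above has a simple interpretation: it is the probability that a randomly chosen successful quitter receives a higher predicted score than a randomly chosen unsuccessful one. The sketch below computes AUC from that pairwise definition. The function name and the toy scores are assumptions for illustration, and the quadratic loop is meant for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted success probabilities for four patients.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so an AUC of 0.660 indicates modest discrimination.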

Machine Learning-Based Prediction of Air Quality

Applied Sciences ◽

10.3390/app10249151 ◽

2020 ◽

Vol 10 (24) ◽

pp. 9151

Author(s):

Yun-Chia Liang ◽

Yona Maimury ◽

Angela Hsiang-Ling Chen ◽

Josue Rodolfo Cuevas Juarez

Keyword(s):

Machine Learning ◽

Air Quality ◽

Random Forest ◽

Prediction Models ◽

Superior Performance ◽

Support Vector ◽

Economic Activities ◽

Adaptive Boosting ◽

Series Of Experiments ◽

Artificial Neural Network Ann

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R2 and RMSE, while AdaBoost provides the best results for MAE.
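
Different models can lead on different metrics because RMSE penalizes large errors more heavily than MAE does. The sketch below computes the three metrics named above from their standard definitions. The function name and the toy values are illustrative and unrelated to the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, and MAE for two aligned sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return {"r2": r2, "rmse": rmse, "mae": mae}


# One large miss dominates RMSE but is diluted in MAE.
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 6]))
```

In this toy case a single error of 2 gives an MAE of 0.5 but an RMSE of 1.0, which shows how one outlier affects the two metrics differently.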

Ripeness Prediction of Postharvest Kiwifruit Using a MOS E-Nose Combined with Chemometrics

Sensors ◽

10.3390/s19020419 ◽

2019 ◽

Vol 19 (2) ◽

pp. 419 ◽

Cited By ~ 11

Author(s):

Dongdong Du ◽

Jun Wang ◽

Bo Wang ◽

Luyi Zhu ◽

Xuezhen Hong

Keyword(s):

Prediction Models ◽

Extraction Methods ◽

Oxide Semiconductor ◽

Soluble Solids ◽

Support Vector ◽

Least Squares Regression ◽

Accuracy Rate ◽

Linear Discriminant ◽

The Difference ◽

Ripe Stage

Postharvest kiwifruit continues to ripen for a period until it reaches the optimal “eating ripe” stage. Without damaging the fruit, it is very difficult to identify the ripeness of postharvest kiwifruit by conventional means. In this study, an electronic nose (E-nose) with 10 metal oxide semiconductor (MOS) gas sensors was used to predict the ripeness of postharvest kiwifruit. Three different feature extraction methods (the max/min values, the difference values and the 70th s values) were employed to discriminate kiwifruit at different ripening times by linear discriminant analysis (LDA), and results showed that the 70th s values method had the best performance in discriminating kiwifruit at different ripening stages, obtaining a 100% original accuracy rate and a 99.4% cross-validation accuracy rate. Partial least squares regression (PLSR), support vector machine (SVM) and random forest (RF) were employed to build prediction models for overall ripeness, soluble solids content (SSC) and firmness. The regression results showed that the RF algorithm had the best performance in predicting the ripeness indexes of postharvest kiwifruit compared with PLSR and SVM, which illustrated that the E-nose data had high correlations with overall ripeness (training: R2 = 0.9928; testing: R2 = 0.9928), SSC (training: R2 = 0.9749; testing: R2 = 0.9143) and firmness (training: R2 = 0.9814; testing: R2 = 0.9290). This study demonstrated that E-nose could be a comprehensive approach to predict the ripeness of postharvest kiwifruit through aroma volatiles.
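
The three feature extraction methods named above can be sketched for a single sensor's response curve as follows. Two points are assumptions rather than details from the abstract: that the "difference value" is the maximum minus the minimum of the curve, and that the curve is sampled at a fixed rate starting at 0 s (1 Hz by default). The function name is hypothetical.

```python
def extract_features(response, sample_rate_hz=1.0, t_seconds=70.0):
    """Extract simple features from one gas sensor's response curve.

    response: list of sensor readings sampled at a fixed rate from t = 0 s.
    Returns the max, min, their difference (assumed meaning of "difference
    value"), and the reading at `t_seconds`.
    """
    if not response:
        raise ValueError("response curve is empty")
    index = int(round(t_seconds * sample_rate_hz))
    if index >= len(response):
        raise ValueError("response curve is shorter than t_seconds")
    highest, lowest = max(response), min(response)
    return {
        "max": highest,
        "min": lowest,
        "difference": highest - lowest,
        "value_at_t": response[index],
    }


# Hypothetical 100 s response sampled once per second.
curve = [float(i) for i in range(100)]
print(extract_features(curve))
```

With 10 sensors, applying the function to each curve and concatenating one feature type across sensors yields a 10-element feature vector per sample, which can then be fed to LDA or a regression model.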

Assessment for Thermal Conductivity of Frozen Soil Based on Nonlinear Regression and Support Vector Regression Methods

Advances in Civil Engineering ◽

10.1155/2020/8898126 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Fu-Qing Cui ◽

Wei Zhang ◽

Zhi-Yun Liu ◽

Wei Wang ◽

Jian-bing Chen ◽

...

Keyword(s):

Thermal Conductivity ◽

Support Vector Regression ◽

Nonlinear Regression ◽

Prediction Accuracy ◽

Prediction Models ◽

Frozen Soil ◽

Support Vector ◽

Soil Thermal Conductivity ◽

Fine Grained ◽

Fine Grained Soil

A comprehensive understanding of how soil thermal conductivity varies is a prerequisite for the design and construction of engineering applications in permafrost regions. Compared with unfrozen soil, specimen preparation and experimental procedures for testing the thermal conductivity of frozen soil are more complex and challenging. In this work, considering that the thermal conductivity of unfrozen soil essentially reflects the multiphase and porous structural characteristics of the soil, prediction models of frozen soil thermal conductivity using nonlinear regression and Support Vector Regression (SVR) methods have been developed. The thermal conductivity of multiple types of soil samples taken from the Qinghai-Tibet Engineering Corridor (QTEC) is tested by the transient plane source (TPS) method. Correlations between the thermal conductivity of unfrozen and frozen soil have been analyzed and identified. Based on the measurement data of unfrozen soil thermal conductivity, prediction models of frozen soil thermal conductivity for 7 typical soils in the QTEC are proposed. To further facilitate engineering applications, prediction models for two soil categories (coarse-grained and fine-grained soil) have also been proposed. The results demonstrate that, compared with the unsatisfactory prediction accuracy obtained when water content and dry density are used as the fitting parameters, the ternary fitting model has a higher thermal conductivity prediction accuracy for the 7 types of frozen soils (the relative errors of more than 98% of the soil specimens are within 20%). The SVR model can further improve the prediction accuracy of frozen soil thermal conductivity, with the relative errors of more than 98% of the soil specimens within 15%. For the coarse-grained and fine-grained soil categories, the above two models still have reliable prediction accuracy, and the coefficient of determination (R2) ranges from 0.8 to 0.91, which validates their applicability to small-sample soils. This study provides feasible prediction models for frozen soil thermal conductivity and guidelines for the thermal design and freeze-thaw damage prevention of engineering structures in cold regions.
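
The accuracy criterion used above, the share of specimens whose relative error stays within a tolerance, can be computed as in the sketch below. The function name and the sample conductivity values are hypothetical and serve only to show the calculation.

```python
def fraction_within_relative_error(measured, predicted, tolerance):
    """Share of specimens with |predicted - measured| / |measured| <= tolerance."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    within = sum(
        1 for m, p in zip(measured, predicted) if abs(p - m) / abs(m) <= tolerance
    )
    return within / len(measured)


# Hypothetical thermal conductivities in W/(m·K).
measured = [1.0, 2.0, 4.0, 5.0]
predicted = [1.1, 2.5, 4.2, 5.0]
print(fraction_within_relative_error(measured, predicted, 0.20))
```

Evaluating the same predictions at tolerances of 0.20 and 0.15 reproduces the style of comparison reported for the fitting and SVR models.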

Dynamic Bus Travel Time Prediction Models on Road with Multiple Bus Routes

Computational Intelligence and Neuroscience ◽

10.1155/2015/432389 ◽

2015 ◽

Vol 2015 ◽

pp. 1-9 ◽

Cited By ~ 31

Author(s):

Cong Bai ◽

Zhong-Ren Peng ◽

Qing-Chang Lu ◽

Jian Sun

Keyword(s):

Travel Time ◽

Dynamic Model ◽

Kalman Filtering ◽

Prediction Models ◽

Support Vector ◽

Travel Times ◽

Travel Time Prediction ◽

Real World Data ◽

Time Prediction ◽

Bus Travel Time

Accurate and real-time travel time information for buses can help passengers better plan their trips and minimize waiting times. This paper proposes a dynamic travel time prediction model for buses on roads with multiple bus routes, based on support vector machines (SVMs) and a Kalman filtering-based algorithm. In the proposed model, the well-trained SVM model predicts the baseline bus travel times from the historical bus trip data; the Kalman filtering-based dynamic algorithm then adjusts bus travel times using the latest bus operation information and the estimated baseline travel times. The performance of the proposed dynamic model is validated with real-world data from roads with multiple bus routes in Shenzhen, China. The results show that the proposed dynamic model is feasible and applicable for bus travel time prediction and has the best prediction accuracy among the five models proposed in the study on roads with multiple bus routes.
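
The adjustment step can be illustrated with a scalar Kalman update, in which a baseline travel time is corrected by the most recent observed travel time in proportion to their relative uncertainties. This is a generic textbook update written under stated assumptions: the function name, the variances, and the example values are hypothetical, and the paper's actual state model is not given in the abstract.

```python
def kalman_update(estimate, variance, observation, obs_variance):
    """One scalar Kalman measurement update.

    estimate, variance: baseline travel time (e.g. from an SVM) and its variance.
    observation, obs_variance: latest observed travel time and its variance.
    Returns the corrected estimate and its reduced variance.
    """
    gain = variance / (variance + obs_variance)  # weight given to the new observation
    new_estimate = estimate + gain * (observation - estimate)
    new_variance = (1 - gain) * variance
    return new_estimate, new_variance


# Baseline of 300 s; the preceding bus just took 360 s on the same segment.
print(kalman_update(300.0, 100.0, 360.0, 100.0))
```

When the baseline and the observation are equally uncertain, the gain is 0.5 and the corrected estimate lies midway between them; a noisier observation pulls the estimate less.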
