Integrating Semantic Zoning Information with the Prediction of Road Link Speed Based on Taxi GPS Data

Road link speed is one of the important indicators of traffic state. To incorporate the spatiotemporal dynamics and correlation characteristics of road links into speed prediction, this paper proposes a method based on LDA and GCN. First, we construct a trajectory dataset from map-matched GPS location data of taxis. Then, we use the LDA algorithm to extract the semantic function vectors of urban zones and quantify the spatial dynamic characteristics of road links based on taxi trajectories. Finally, we add the semantic function vectors to the dataset and train a graph convolutional network to learn the spatial and temporal dependencies of road links. The learned model is used to predict the future speed of road links. The proposed method is compared with six baseline models on the same dataset, generated by GPS devices on taxis in Shenzhen, China, and the results show that our method has better prediction performance when semantic zoning information is added. Composite and single-valued semantic zoning information improve the performance of graph convolutional networks by 6.46% and 8.35%, respectively, while the baseline machine learning models benefit only from single-valued semantic zoning information on the experimental dataset.

Evaluation of Short-Term Freeway Speed Prediction Based on Periodic Analysis Using Statistical Models and Machine Learning Models

Journal of Advanced Transportation ◽

10.1155/2020/9628957 ◽

2020 ◽

Vol 2020 ◽

pp. 1-16 ◽

Cited By ~ 19

Author(s):

Xiaoxue Yang ◽

Yajie Zou ◽

Jinjun Tang ◽

Jian Liang ◽

Muhammad Ijaz

Keyword(s):

Machine Learning ◽

Statistical Models ◽

Prediction Performance ◽

Periodic Component ◽

Learning Approaches ◽

Learning Models ◽

Short Term ◽

Speed Prediction ◽

The Impact ◽

Machine Learning Models

Accurate prediction of traffic information (e.g., traffic flow, travel time, and traffic speed) is a key component of Intelligent Transportation Systems (ITS). Traffic speed is an important indicator for evaluating traffic efficiency. To date, although a few studies have considered the periodic feature in traffic prediction, very few have comprehensively evaluated the impact of the periodic component on statistical and machine learning prediction models. This paper selects several representative statistical models and machine learning models to analyze the influence of the periodic component on short-term speed prediction under different scenarios: (1) multi-horizon ahead prediction (5, 15, 30, and 60 minutes ahead), (2) with and without the periodic component, (3) two data aggregation levels (5-minute and 15-minute), and (4) peak hours and off-peak hours. Specifically, three statistical models (i.e., the space time (ST) model, the vector autoregressive (VAR) model, and the autoregressive integrated moving average (ARIMA) model) and three machine learning approaches (i.e., the support vector machines (SVM) model, the multi-layer perceptron (MLP) model, and the recurrent neural network (RNN) model) are developed and examined. Furthermore, the periodic features of the speed data are considered via a hybrid prediction method, which assumes that the data consist of two components: a periodic component and a residual component. The periodic component is described by a trigonometric regression function, and the residual component is modeled by the statistical models or the machine learning approaches. The important conclusions can be summarized as follows: (1) the multi-step ahead prediction accuracy improves when considering the periodic component of speed data for all three statistical models and all three machine learning models, especially in the peak hours; (2) when the periodic component is considered, the improvement in prediction performance gradually becomes larger for all models as the time step increases; (3) under the same prediction horizon, the prediction performance of all models for 15-minute speed data is generally better than that for 5-minute speed data. Overall, the findings in this paper suggest that the proposed hybrid prediction approach is effective for both statistical and machine learning models in short-term speed prediction.
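
The hybrid decomposition described in this abstract (a trigonometric regression for the periodic component plus a separate model for the residual) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the number of harmonics, and the use of a simple AR(1) model for the residual are all assumptions chosen for brevity, and the fit assumes evenly spaced samples covering a whole number of periods.

```python
import math

def fit_periodic(y, period, harmonics=2):
    """Least-squares trigonometric regression (hypothetical helper).

    Assumes evenly spaced samples covering a whole number of periods, so the
    sine/cosine terms are orthogonal and each coefficient is a projection.
    """
    n = len(y)
    if n % period != 0 or harmonics >= period / 2:
        raise ValueError("need whole periods and harmonics < period / 2")
    mean = sum(y) / n
    coeffs = []
    for k in range(1, harmonics + 1):
        w = 2 * math.pi * k / period
        a = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
        b = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
        coeffs.append((w, a, b))
    return mean, coeffs

def periodic_value(model, t):
    """Evaluate the fitted periodic component at time index t."""
    mean, coeffs = model
    return mean + sum(a * math.cos(w * t) + b * math.sin(w * t)
                      for w, a, b in coeffs)

def fit_ar1(residuals):
    """Least-squares AR(1) coefficient; an assumed stand-in for the residual model."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals[:-1])
    return num / den if den else 0.0

def hybrid_forecast(y, period, horizon, harmonics=2):
    """Forecast = periodic component at the target time + decayed last residual."""
    model = fit_periodic(y, period, harmonics)
    residuals = [v - periodic_value(model, t) for t, v in enumerate(y)]
    phi = fit_ar1(residuals)
    t_future = len(y) - 1 + horizon
    return periodic_value(model, t_future) + (phi ** horizon) * residuals[-1]

# Usage: two days of synthetic hourly speeds with a 24-step daily cycle.
speeds = [60 + 10 * math.cos(2 * math.pi * t / 24) for t in range(48)]
forecast = hybrid_forecast(speeds, period=24, horizon=3)
```

In the paper, the residual component is handled by the ST, VAR, ARIMA, SVM, MLP, or RNN models; the AR(1) step above only marks where such a model would plug in.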

A Comparative Assessment of Six Machine Learning Models for Prediction of Bending Force in Hot Strip Rolling Process

Metals ◽

10.3390/met10050685 ◽

2020 ◽

Vol 10 (5) ◽

pp. 685 ◽

Cited By ~ 2

Author(s):

Xu Li ◽

Feng Luan ◽

Yan Wu

Keyword(s):

Prediction Accuracy ◽

Computational Cost ◽

Regression Tree ◽

Prediction Performance ◽

Learning Models ◽

Hot Strip Rolling ◽

Strip Rolling ◽

Bending Force ◽

Hot Strip ◽

Machine Learning Models

In the hot strip rolling (HSR) process, accurate prediction of bending force can improve the control accuracy of the strip crown and flatness, and further improve the strip shape quality. In this paper, six machine learning models, including Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART), Bagging Regression Tree (BRT), Least Absolute Shrinkage and Selection Operator (LASSO), and Gaussian Process Regression (GPR), were applied to predict the bending force in the HSR process. A comparative experiment was carried out on a real-life dataset, and the prediction performance of the six models was analyzed in terms of prediction accuracy, stability, and computational cost. The prediction performance of the six models was assessed using three evaluation metrics: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). The results show that the GPR model is the optimal model for bending force prediction, with the best prediction accuracy, better stability, and acceptable computational cost. The prediction accuracy and stability of CART and ANN are slightly lower than those of GPR. Although BRT also shows a good combination of prediction accuracy and computational cost, its stability is the worst among the six models. SVM not only has poor prediction accuracy but also has the highest computational cost, while LASSO shows the worst prediction accuracy.
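
The three evaluation metrics named in this abstract have standard definitions, sketched below in plain Python. The function name and the sample values are illustrative assumptions, not data from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R2) for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Usage with made-up values: only the last prediction is off, by 2 units.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

RMSE penalizes large errors more heavily than MAE, which is one reason comparative studies like this one report both alongside R2.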

Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data

Water Research ◽

10.1016/j.watres.2019.115454 ◽

2020 ◽

Vol 171 ◽

pp. 115454 ◽

Cited By ~ 9

Author(s):

Kangyang Chen ◽

Hexia Chen ◽

Chuanlong Zhou ◽

Yichao Huang ◽

Xiangyang Qi ◽

...

Keyword(s):

Machine Learning ◽

Water Quality ◽

Big Data ◽

Surface Water Quality ◽

Prediction Performance ◽

Quality Prediction ◽

Learning Models ◽

Water Parameters ◽

Water Quality Prediction ◽

Machine Learning Models

Machine learning-based prediction model for responses of bDMARDs in patients with rheumatoid arthritis and ankylosing spondylitis

Arthritis Research & Therapy ◽

10.1186/s13075-021-02635-3 ◽

2021 ◽

Vol 23 (1) ◽

Author(s):

Seulkee Lee ◽

Seonyoung Kang ◽

Yeonghee Eun ◽

Hong-Hee Won ◽

Hyungjin Kim ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Logistic Regression ◽

Ankylosing Spondylitis ◽

Regression Model ◽

Logistic Regression Model ◽

Prediction Performance ◽

Learning Models ◽

Independent Test ◽

Machine Learning Models

Background: Few studies on rheumatoid arthritis (RA) have generated machine learning models to predict responses to biologic disease-modifying antirheumatic drugs (bDMARDs); moreover, these studies included insufficient analysis of important features. Machine learning is also yet to be used to predict bDMARD responses in ankylosing spondylitis (AS). Thus, in this study, machine learning was used to predict such responses in RA and AS patients. Methods: Data were retrieved from the Korean College of Rheumatology Biologics therapy (KOBIO) registry. The numbers of RA and AS patients in the training dataset were 625 and 611, respectively. We prepared independent test datasets that did not participate in any process of generating machine learning models. Baseline clinical characteristics were used as input features. Responders were defined as those who met the ACR 20% improvement response criteria (ACR20) and ASAS 20% improvement response criteria (ASAS20) in RA and AS, respectively, at the first follow-up. Multiple machine learning methods, including random forest (RF-method), were used to generate models to predict bDMARD responses, and we compared them with the logistic regression model. Results: The RF-method model had superior prediction performance to the logistic regression model (accuracy: 0.726 [95% confidence interval (CI): 0.725–0.730] vs. 0.689 [0.606–0.717], area under curve (AUC) of the receiver operating characteristic curve (ROC) 0.638 [0.576–0.658] vs. 0.565 [0.493–0.605], F1 score 0.841 [0.837–0.843] vs. 0.803 [0.732–0.828], AUC of the precision-recall curve 0.808 [0.763–0.829] vs. 0.754 [0.714–0.789]) with independent test datasets in patients with RA. However, machine learning and logistic regression exhibited similar prediction performance in AS patients. Furthermore, the patient self-reporting scales, namely patient global assessment of disease activity (PtGA) in RA and Bath Ankylosing Spondylitis Functional Index (BASFI) in AS, were revealed as the most important features in both diseases. Conclusions: The RF-method exhibited superior prediction performance for bDMARD responses compared with a conventional statistical method, i.e., logistic regression, in RA patients. In contrast, despite the comparable size of the dataset, machine learning did not outperform logistic regression in AS patients. According to feature importance analysis, the most important features in both diseases were patient self-reporting scales.

Predicting unstable software benchmarks using static source code features

Empirical Software Engineering ◽

10.1007/s10664-021-09996-y ◽

2021 ◽

Vol 26 (6) ◽

Author(s):

Christoph Laaber ◽

Mikael Basmaci ◽

Pasquale Salza

Keyword(s):

Machine Learning ◽

Source Code ◽

Prediction Performance ◽

Repeated Measurements ◽

Good Prediction ◽

Testing Time ◽

Learning Models ◽

Actual Performance ◽

Meta Information ◽

Machine Learning Models

Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark’s stability without having to execute it. Our approach relies on 58 statically-computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach’s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best with good prediction performance from 0.79 to 0.90, and 0.43 to 0.68, in terms of AUC and MCC, respectively. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta-information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions; and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively utilize machine learning models to predict whether a benchmark will be stable or not ahead of execution. This enables spending precious testing time on reliable benchmarks, supporting developers in identifying unstable benchmarks during development, allowing unstable benchmarks to be repeated more often, estimating stability in scenarios where repeated benchmark execution is infeasible or impossible, and warning developers if new benchmarks or existing benchmarks executed in new environments will be unstable.
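
The abstract reports results in terms of MCC (Matthews correlation coefficient), which for a binary classifier is computed from the four confusion-matrix counts. The sketch below shows the standard formula; the function name and the example counts are illustrative assumptions and are not taken from the study.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for the
    otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Usage with made-up counts: 40 unstable benchmarks correctly flagged,
# 45 stable ones correctly passed, 5 false alarms, 10 misses.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
```

Unlike accuracy, MCC stays informative when one class is much rarer than the other, because all four counts enter the formula.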

A Review of Publicly Available Automatic Brain Segmentation Methodologies, Machine Learning Models, Recent Advancements, and Their Comparison

Annals of Neurosciences ◽

10.1177/0972753121990175 ◽

2021 ◽

pp. 097275312199017

Author(s):

Mahender Kumar Singh ◽

Krishna Kumar Singh

Keyword(s):

Machine Learning ◽

Automatic Segmentation ◽

Three Dimensional ◽

Brain Structures ◽

Magnetic Resonance Images ◽

Learning Models ◽

Convolutional Network ◽

Research Perspective ◽

The Brain ◽

Machine Learning Models

Background: The noninvasive study of the structure and functions of the brain using neuroimaging techniques is increasingly being used for its clinical and research perspective. The morphological and volumetric changes in several regions and structures of the brain are associated with the prognosis of neurological disorders such as Alzheimer’s disease, epilepsy, and schizophrenia, and the early identification of such changes can have huge clinical significance. The accurate segmentation of three-dimensional brain magnetic resonance images into tissue types (i.e., grey matter, white matter, cerebrospinal fluid) and brain structures thus has huge importance, as these can act as early biomarkers. Manual segmentation, though considered the “gold standard,” is time-consuming, subjective, and not suitable for bigger neuroimaging studies. Several automatic segmentation tools and algorithms have been developed over the years; machine learning models, particularly those using deep convolutional neural network (CNN) architecture, are increasingly being applied to improve the accuracy of automatic methods. Purpose: The purpose of the study is to understand the current and emerging state of automatic segmentation tools, their comparison, machine learning models, their reliability, and shortcomings, with an intent to focus on the development of improved methods and algorithms. Methods: The study focuses on the review of publicly available neuroimaging tools, their comparison, and emerging machine learning models, particularly those based on CNN architecture, developed and published during the last five years. Conclusion: Several software tools developed by various research groups and made publicly available for automatic segmentation of the brain show variability in their results in several comparison studies and have not attained the level of reliability required for clinical studies. Machine learning models, particularly three-dimensional fully convolutional network models, can provide a robust and efficient alternative to publicly available tools but perform poorly on unseen datasets. The challenges related to training, computation cost, reproducibility, and validation across distinct scanning modalities for machine learning models need to be addressed.

Predicting which genes will respond to perturbations of a TF: TF-independent properties of genes are major determinants of their responsiveness

10.1101/2020.12.15.422864 ◽

2020 ◽

Author(s):

Yiming Kang ◽

Michael Brent

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Predictive Power ◽

Yeast Cells ◽

Expression Level ◽

Learning Models ◽

Histone Marks ◽

Expression Variation ◽

Location Data ◽

Machine Learning Models

Background: The ability to predict which genes will respond to perturbation of a TF's activity serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a given sample by using data from the same or similar conditions, including data on TF binding locations, histone marks, or DNA sequence. We report on a different challenge -- training machine learning models that can predict which genes will respond to perturbation of a TF without using any data from the perturbed cells. Results: Existing TF location data (ChIP-Seq) from human K562 cells have no detectable utility for predicting which genes will respond to perturbation of the TF, but data obtained by newer methods in yeast cells are useful. TF-independent features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to TF perturbations. This shows that some genes are poised to respond to TF perturbations and others are resistant, shedding significant light on why it has been so difficult to predict responses from binding locations. Certain histone marks (HMs), including H3K4me1 and H3K4me3, have some predictive power, especially when downstream of the transcription start site. In human, the predictive power of HMs is much less than that of gene expression level and variation. Code is available at https://github.com/yiming-kang/TFPertRespExplainer. Conclusions: Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct TF perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from TF binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation.

Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment

CATENA ◽

10.1016/j.catena.2019.104426 ◽

2020 ◽

Vol 188 ◽

pp. 104426 ◽

Cited By ~ 14

Author(s):

Dieu Tien Bui ◽

Paraskevas Tsangaratos ◽

Viet-Tien Nguyen ◽

Ngo Van Liem ◽

Phan Trong Trinh

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Prediction Performance ◽

Susceptibility Assessment ◽

Learning Models ◽

Landslide Susceptibility Assessment ◽

Conventional Machine ◽

Deep Learning Neural Network ◽

Machine Learning Models

Skin Lesion Classification Using Densely Connected Convolutional Networks with Attention Residual Learning

Sensors ◽

10.3390/s20247080 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7080

Author(s):

Jing Wu ◽

Wei Hu ◽

Yuan Wen ◽

Wenli Tu ◽

Xiaoming Liu

Keyword(s):

Skin Lesion ◽

State Of The Art ◽

Learning Models ◽

Residual Network ◽

Convolutional Network ◽

Connected Network ◽

Convolutional Networks ◽

Residual Learning ◽

Precise Diagnosis ◽

Lesion Classification

Skin lesion classification is an effective approach aided by computer vision for the diagnosis of skin cancer. Though deep learning models presented advantages over traditional methods and brought tremendous breakthroughs, a precise diagnosis is still challenging because of the intra-class variation and inter-class similarity caused by the diversity of imaging methods and clinicopathology. In this paper, we propose a densely connected convolutional network with an attention and residual learning (ARDT-DenseNet) method for skin lesion classification. Each ARDT block consists of dense blocks, transition blocks and attention and residual modules. Compared to a residual network with the same number of convolutional layers, the size of the parameters of the densely connected network proposed in this paper has been reduced by half, while the accuracy of skin lesion classification is preserved. Our improved densely connected network adds an attention mechanism and residual learning after each dense block and transition block without introducing additional parameters. We evaluate the ARDT-DenseNet model with the ISIC 2016 and ISIC 2017 datasets. Our method achieves an ACC of 85.7% and an AUC of 83.7% in skin lesion classification with ISIC 2016 and an average AUC of 91.8% in skin lesion classification with ISIC 2017. The experimental results show that the method proposed in this paper has achieved a significant improvement in skin lesion classification, which is superior to that of the state-of-the-art method.

Graph convolutional networks: a comprehensive review

Computational Social Networks ◽

10.1186/s40649-019-0069-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 27

Author(s):

Si Zhang ◽

Hanghang Tong ◽

Jiejun Xu ◽

Ross Maciejewski

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Network Models ◽

Representation Learning ◽

Superior Performance ◽

Learning Models ◽

Convolutional Network ◽

Comprehensive Review ◽

Convolutional Networks ◽

Graph Neural Networks

Graphs naturally appear in numerous application domains, ranging from social analysis and bioinformatics to computer vision. The unique capability of graphs enables capturing the structural relations among data, and thus allows more insights to be harvested compared to analyzing data in isolation. However, it is often very challenging to solve learning problems on graphs, because (1) many types of data are not originally structured as graphs, such as images and text data, and (2) for graph-structured data, the underlying connectivity patterns are often complex and diverse. On the other hand, representation learning has achieved great success in many areas. Thus, a potential solution is to learn the representation of graphs in a low-dimensional Euclidean space, such that the graph properties can be preserved. Although tremendous efforts have been made to address the graph representation learning problem, many of them still suffer from their shallow learning mechanisms. Deep learning models on graphs (e.g., graph neural networks) have recently emerged in machine learning and other related areas, and have demonstrated superior performance in various problems. In this survey, despite the numerous types of graph neural networks, we conduct a comprehensive review specifically on the emerging field of graph convolutional networks, which are among the most prominent graph deep learning models. First, we group the existing graph convolutional network models into two categories based on the types of convolutions and highlight some graph convolutional network models in detail. Then, we categorize different graph convolutional networks according to the areas of their applications. Finally, we present several open challenges in this area and discuss potential directions for future research.
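
As a concrete illustration of what a graph convolution computes, the sketch below implements one widely used layer form: neighbor features are averaged through a symmetrically normalized adjacency matrix with self-loops, multiplied by a weight matrix, and passed through ReLU. The abstract does not single out this formulation, so treat it as one assumed example among the models such a survey covers; the function names and the toy graph are hypothetical.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the degrees of both endpoints.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]

# Usage: two connected nodes, one feature each, identity weight.
hidden = gcn_layer(adj=[[0.0, 1.0], [1.0, 0.0]],
                   features=[[1.0], [3.0]],
                   weights=[[1.0]])
```

On this toy graph each node ends up with the average of the two input features, which shows how a single layer mixes information between neighbors.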
