A COMPARISON OF SCORING METRICS FOR PREDICTING THE NEXT NAVIGATION STEP WITH MARKOV MODEL-BASED SYSTEMS

2010 ◽  
Vol 09 (04) ◽  
pp. 547-573 ◽  
Author(s):  
JOSÉ BORGES ◽  
MARK LEVENE

The problem of predicting the next request during a user's navigation session has been extensively studied. In this context, higher-order Markov models have been widely used to model navigation sessions and to predict the next navigation step, while prediction accuracy has been mainly evaluated with the hit and miss score. We claim that this score, although useful, is not sufficient for evaluating next link prediction models with the aim of finding a sufficient order of the model, the size of a recommendation set, and assessing the impact of unexpected events on the prediction accuracy. Herein, we make use of a variable length Markov model to compare the usefulness of three alternatives to the hit and miss score: the Mean Absolute Error, the Ignorance Score, and the Brier score. We present an extensive evaluation of the methods on real data sets and a comprehensive comparison of the scoring methods.
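
As an illustration of the scores named above, the following is a minimal Python sketch using common textbook definitions of the hit and miss score, the Ignorance Score, and the Brier score for a single next-step prediction. It is an assumption-laden example, not the authors' implementation: the function names and the dictionary representation of a prediction are hypothetical, and the Mean Absolute Error is left out because the abstract does not say how it is defined for this task.

```python
import math

# Hypothetical representation: a prediction is a dict mapping each
# candidate next page to its predicted probability.

def hit_score(probs, actual, k=1):
    """1 if the observed page is among the k most probable candidates, else 0."""
    top_k = sorted(probs, key=probs.get, reverse=True)[:k]
    return 1 if actual in top_k else 0

def ignorance_score(probs, actual):
    """Negative base-2 log of the probability assigned to the observed page."""
    p = probs.get(actual, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)

def brier_score(probs, actual):
    """Sum of squared differences between the forecast and the 0/1 outcome."""
    outcomes = set(probs) | {actual}
    return sum(
        (probs.get(o, 0.0) - (1.0 if o == actual else 0.0)) ** 2
        for o in outcomes
    )

prediction = {"a": 0.5, "b": 0.25, "c": 0.25}
print(hit_score(prediction, "b"))        # miss with a recommendation set of size 1
print(hit_score(prediction, "b", k=2))   # hit with a recommendation set of size 2
print(ignorance_score(prediction, "b"))
print(brier_score(prediction, "b"))
```

Unlike the hit and miss score, the other two scores depend on the probability assigned to the observed page, so they can distinguish a confident wrong prediction from a near miss.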

2020 ◽  
Vol 12 (1) ◽  
pp. 626-636
Author(s):  
Wang Song ◽  
Zhao Yunlin ◽  
Xu Zhenggang ◽  
Yang Guiyan ◽  
Huang Tian ◽  
...  

Abstract: Understanding and modeling land use change is of great significance to environmental protection and land use planning. The cellular automata-Markov chain (CA-Markov) model is a powerful tool for predicting land use change, but its prediction accuracy is limited by many factors. To explore the impact of land use and socio-economic factors on the predictions of the CA-Markov model at the county scale, this paper uses the CA-Markov model to simulate the land use of Anren County in 2016, based on the land use of 1996 and 2006. Then, the correlation between the land use, socio-economic data and the prediction accuracy was analyzed. The results show that Shannon’s evenness index and population density have an important impact on the accuracy of model predictions and correlate negatively with the kappa coefficient. The research not only provides a reference for correct use of the model but also helps us to understand the driving mechanism of landscape changes.


2015 ◽  
Vol 2015 ◽  
pp. 1-23 ◽  
Author(s):  
Francesco Cartella ◽  
Jan Lemeire ◽  
Luca Dimiccoli ◽  
Hichem Sahli

Realistic predictive maintenance approaches are essential for condition monitoring and predictive maintenance of industrial machines. In this work, we propose Hidden Semi-Markov Models (HSMMs) with (i) no constraints on the state duration density function and (ii) applicability to continuous or discrete observations. To deal with this type of HSMM, we also propose modifications to the learning, inference, and prediction algorithms. Finally, automatic model selection has been made possible using the Akaike Information Criterion. This paper describes the theoretical formalization of the model as well as several experiments performed on simulated and real data to validate the methodology. In all performed experiments, the model is able to correctly estimate the current state and to effectively predict the time to a predefined event with a low overall average absolute error. As a consequence, its application to real-world settings can be beneficial, especially where the Remaining Useful Lifetime (RUL) of the machine is calculated in real time.


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0245357
Author(s):  
Daniel Silver ◽  
Thiago H. Silva

This paper seeks to advance neighbourhood change research and complexity theories of cities by developing and exploring a Markov model of socio-spatial neighbourhood evolution in Toronto, Canada. First, we classify Toronto neighbourhoods into distinct groups using established geodemographic segmentation techniques, a relatively novel application in this geographic setting. Extending previous studies, we pursue a hierarchical approach to classifying neighbourhoods that situates many neighbourhood types within the city’s broader structure. Our hierarchical approach is able to incorporate a richer set of types than most past research and allows us to study how neighbourhoods’ positions within this hierarchy shape their trajectories of change. Second, we use Markov models to identify generative processes that produce patterns of change in the city’s distribution of neighbourhood types. Moreover, we add a spatial component to the Markov process to uncover the extent to which change in one type of neighbourhood depends on the character of nearby neighbourhoods. In contrast to the few studies that have explored Markov models in this research tradition, we validate the model’s predictive power. Third, we demonstrate how to use such models in theoretical scenarios considering the impact on the city’s predicted evolutionary trajectory when existing probabilities of neighbourhood transitions or distributions of neighbourhood types would hypothetically change. Markov models of transition patterns prove to be highly accurate in predicting the final distribution of neighbourhood types. Counterfactual scenarios empirically demonstrate urban complexity: small initial changes reverberate throughout the system, and unfold differently depending on their initial geographic distribution. These scenarios show the value of complexity as a framework for interpreting data and guiding scenario-based planning exercises.


2020 ◽  
Author(s):  
Sagnik Palmal ◽  
Kaustubh Adhikari ◽  
Javier Mendoza-Revilla ◽  
Macarena Fuentes-Guajardo ◽  
Caio C. Silva de Cerqueira ◽  
...  

Abstract: We report an evaluation of prediction accuracy for eye, hair and skin pigmentation based on genomic and phenotypic data for over 6,500 admixed Latin Americans (the CANDELA dataset). We examined the impact on prediction accuracy of three main factors: (i) the methods of prediction, including classical statistical methods and machine learning approaches; (ii) the inclusion of non-genetic predictors, continental genetic ancestry and pigmentation SNPs in the prediction models; and (iii) the choice between two sets of pigmentation SNPs: the commonly used HIrisPlex-S set (developed in Europeans) and novel SNP sets we defined here based on genome-wide association results in the CANDELA sample. We find that Random Forest or regression are globally the best performing methods. Although continental genetic ancestry has substantial power for prediction of pigmentation in Latin Americans, the inclusion of pigmentation SNPs increases prediction accuracy considerably, particularly for skin color. For hair and eye color, HIrisPlex-S has a similar performance to the CANDELA-specific prediction SNP sets. However, for skin pigmentation the performance of HIrisPlex-S is markedly lower than that of the SNP set defined here, including for predictions in an independent dataset of Native American data. These results reflect the relatively high variation in hair and eye color among Europeans, for whom HIrisPlex-S was developed, whereas their variation in skin pigmentation is comparatively lower. Furthermore, we show that the dataset used in the training of prediction models strongly impacts the portability of these models across Europeans and Native Americans.


Author(s):  
Nicolas Rodrigue ◽  
Thibault Latrille ◽  
Nicolas Lartillot

Abstract: In recent years, codon substitution models based on the mutation–selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have either focused on detecting global signals of adaptive regimes (across the entire gene) or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method, and to gain a greater empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.


Author(s):  
Arvind Keprate ◽  
R. M. Chandima Ratnayake ◽  
Shankar Sankararaman

The main aim of this paper is to perform the validation of the adaptive Gaussian process regression model (AGPRM) developed by the authors for the Stress Intensity Factor (SIF) prediction of a crack propagating in topside piping. For validation purposes, the values of SIF obtained from experiments available in the literature are used. Sixty-six data points (consisting of L, a, c and SIF values obtained by experiments) are used to train the AGPRM, while four independent data sets are used for validation purposes. The experimental validation of the AGPRM also consists of the comparison of the prediction accuracy of AGPRM and Finite Element Method (FEM) relative to the experimentally derived SIF values. Four metrics, namely, Root Mean Square Error (RMSE), Average Absolute Error (AAE), Maximum Absolute Error (MAE), and Coefficient of Determination (R2), are used to compare the accuracy. A case study illustrating the development and experimental validation of the AGPRM is presented. Results indicate that the prediction accuracy of the AGPRM is comparable with and even higher than that of the FEM, provided the training points of the AGPRM are aptly chosen.
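
The four accuracy metrics listed above can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions of each metric; the function name and the sample values are hypothetical and are not taken from the paper. Following the abstract, MAE here denotes the Maximum (not Mean) Absolute Error.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, AAE, maximum absolute error and R^2 for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / n),             # Root Mean Square Error
        "AAE": sum(abs(e) for e in errors) / n,    # Average Absolute Error
        "MAE": max(abs(e) for e in errors),        # Maximum Absolute Error
        "R2": 1.0 - ss_res / ss_tot,               # Coefficient of Determination
    }

# Made-up reference values and predictions, for illustration only.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

RMSE and the maximum absolute error are both sensitive to a single large deviation, while AAE weights all points equally, which is one reason to report several metrics side by side.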


2020 ◽  
Vol 309 ◽  
pp. 05005
Author(s):  
Yonghong Chen ◽  
Ping Hu ◽  
Dong Zhang

Life cycle cost (LCC) is an important component of equipment integrated logistics support. Because the LCC covers the whole life cycle of equipment, from development, production, service and maintenance to retirement, it must be analyzed and predicted in order to manage and control it effectively and to better develop integrated logistics support. In this paper, the unbiased grey Markov model (UGMM) is introduced into LCC prediction. The posterior difference method (PDM) is used to check model accuracy, and the influence of the number of state intervals in the UGMM on prediction accuracy is analyzed. The results indicate that the UGMM can be used to predict the LCC and has the highest prediction accuracy compared with the unbiased grey model and the grey separating model. To ensure prediction accuracy, the state intervals should be divided according to the number of values in the sequence.


Energies ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 3415 ◽  
Author(s):  
Muzhou Hou ◽  
Tianle Zhang ◽  
Futian Weng ◽  
Mumtaz Ali ◽  
Nadhir Al-Ansari ◽  
...  

Accurate global solar radiation prediction is essential for research on renewable energy sources. The cost and measurement expertise required for global solar radiation make intelligent prediction models necessary. On the basis of long-term measured daily solar radiation data, this study uses a novel regularized online sequential extreme learning machine, integrated with a variable forgetting factor (FOS-ELM), to predict global solar radiation at Bur Dedougou, in the Burkina Faso region. The Bayesian Information Criterion (BIC) is applied to build seven input combinations based on wind speed (Wspeed), maximum and minimum temperature (Tmax and Tmin), maximum and minimum humidity (Hmax and Hmin), evaporation (Eo) and vapor pressure deficiency (VPD). Seven models, one per input combination, were developed and evaluated to find the optimal input combination. Various statistical indicators were computed to examine the prediction accuracy. The experimental results show that the FOS-ELM model achieves reliable prediction accuracy compared with the classical extreme learning machine (ELM) model for daily global solar radiation simulation. Compared to the classical ELM, the FOS-ELM model improved the root mean square error (RMSE) and mean absolute error (MAE) by 68.8–79.8%. In summary, the results clearly confirm the effectiveness of the FOS-ELM model, owing to the fixed internal tuning parameters.


2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Wei Ming ◽  
Yukun Bao ◽  
Zhongyi Hu ◽  
Tao Xiong

Hybrid ARIMA-SVMs prediction models have been established recently, which take advantage of the unique strengths of ARIMA and SVMs models in linear and nonlinear modeling, respectively. Building on these hybrid ARIMA-SVMs models, this study extends them to multistep-ahead prediction of air passenger traffic with the two most commonly used multistep-ahead prediction strategies, that is, the iterated strategy and the direct strategy. Additionally, the effectiveness of data preprocessing approaches, such as deseasonalization and detrending, is investigated and verified along with the two strategies. Real data sets comprising the monthly series of four selected airlines were collected to justify the effectiveness of the proposed approach. Empirical results demonstrate that the direct strategy performs better than the iterated one in long-term prediction, while the iterated one performs better in short-term prediction. Furthermore, both deseasonalization and detrending can significantly improve the prediction accuracy for both strategies, indicating the necessity of data preprocessing. As such, this study serves as a reference for planners in the air transportation industry on how to tackle multistep-ahead prediction tasks when implementing either prediction strategy.
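
The difference between the iterated and direct strategies can be sketched as follows. This is a deliberately simplified Python illustration, not the hybrid ARIMA-SVMs model from the paper: it assumes a stand-in one-parameter linear model without intercept, and all function names and sample data are hypothetical.

```python
def fit_slope(xs, ys):
    """Least-squares slope of y = a * x (no intercept), a stand-in model."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def iterated_forecast(series, horizon):
    """Fit one one-step-ahead model and feed its output back in `horizon` times."""
    a = fit_slope(series[:-1], series[1:])
    value = series[-1]
    for _ in range(horizon):
        value = a * value          # each prediction becomes the next input
    return value

def direct_forecast(series, horizon):
    """Fit a separate model that maps x[t] straight to x[t + horizon]."""
    a_h = fit_slope(series[:-horizon], series[horizon:])
    return a_h * series[-1]

# Made-up series that doubles at every step.
series = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(iterated_forecast(series, 3))
print(direct_forecast(series, 3))
```

On this noise-free series both strategies agree. On real data they generally differ: the iterated strategy reuses one model but can accumulate error across steps, while the direct strategy needs one model per horizon and has fewer training pairs for longer horizons.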


Author(s):  
ELHAM PAIKARI ◽  
MICHAEL M. RICHTER ◽  
GUENTHER RUHE

Software defect prediction is an acknowledged approach used to achieve better product quality and to better utilize resources needed for that purpose. One known method for predicting the number of defects is to apply case-based reasoning (CBR). In this paper, different attribute weighting techniques for CBR-based defect prediction are analyzed. One of the weighting techniques used in this work, Sensitivity Analysis based on Neural Networks (SANN), is based on sensitivity analysis of the impact of attributes as part of neural network analysis. Neural networks are applicable when there are non-linear and complicated relationships among the attributes. Since weighting plays a key role in the CBR model, using an efficient weight calculation method can change the results. The results of SANN are compared with applying uniform weights and weights gained from Multiple Linear Regression (MLR). Evaluation of the accuracy of the overall method for applying the three different weighting techniques is done over five data sets, comprising about 5000 modules from NASA. Two quality measures are applied: Average Absolute Error (AAE) and Average Relative Error (ARE). In addition to the variation of weighting techniques, the impact of varying the number of nearest neighbors is studied. The three main results of the empirical analysis are: (i) In the majority of cases, SANN achieves the most accurate results; (ii) uniform weighting performs better than the MLR-based weighting heuristic; and (iii) there is no significant preference pattern for defining the number of similar objects used for prediction in CBR.

