International tourism demand forecasting with machine learning models: The power of the number of lagged inputs

2020 ◽  
pp. 135481662097695
Author(s):  
Jian-Wu Bi ◽  
Tian-Yu Han ◽  
Hui Li

This study explores how to select the optimal number of lagged inputs (NLIs) in international tourism demand forecasting. Using international tourist arrivals in 10 European countries, the performance of eight machine learning models is evaluated under different NLIs. The results show that: (1) as the NLIs increase, the error of most machine learning models first decreases rapidly and then stabilizes (or fluctuates around a certain value) once the NLIs reach a certain cutoff point. The cutoff point is related to 12 and its multiples. This trend is not affected by the size of the test set; (2) for nonlinear and ensemble models, it is better to select one cycle of the data as the NLIs, while for linear models, multiple cycles are a better choice; (3) significantly different prediction results are obtained by different categories of models when the optimal NLIs are used.
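To make the idea of lagged inputs concrete, the following is a minimal sketch of how a univariate series can be turned into a supervised learning dataset for a given number of lags. The function name `make_lagged_dataset` and the toy series are illustrative assumptions, not taken from the study; with monthly data, `n_lags=12` would correspond to one yearly cycle.

```python
from typing import List, Sequence, Tuple


def make_lagged_dataset(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs where each row of X holds the n_lags values
    immediately preceding the target value in y.

    Hypothetical helper for illustration; not the authors' code.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")

    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the n_lags previous observations
        y.append(series[t])                   # the value to be predicted
    return X, y


# Toy example: five observations, two lagged inputs per sample.
X, y = make_lagged_dataset([1, 2, 3, 4, 5], n_lags=2)
```

Increasing `n_lags` gives each sample more history but leaves fewer training samples, which is one reason the choice of NLIs can matter.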

2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Dinda Thalia Andariesta ◽  
Meditya Wasesa

Purpose: This research presents machine learning models for predicting international tourist arrivals in Indonesia during the COVID-19 pandemic using multisource Internet data.

Design/methodology/approach: To develop the prediction models, this research utilizes multisource Internet data from TripAdvisor travel forum and Google Trends. Temporal factors, posts and comments, search queries index and previous tourist arrivals records are set as predictors. Four sets of predictors and three distinct data compositions were utilized for training the machine learning models, namely artificial neural networks (ANNs), support vector regression (SVR) and random forest (RF). To evaluate the models, this research uses three accuracy metrics, namely root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).

Findings: Prediction models trained using multisource Internet data predictors have better accuracy than those trained using single-source Internet data or other predictors. In addition, using more training sets that cover the phenomenon of interest, such as COVID-19, will enhance the prediction model's learning process and accuracy. The experiments show that the RF models have better prediction accuracy than the ANN and SVR models.

Originality/value: First, this study pioneers the practice of a multisource Internet data approach in predicting tourist arrivals amid the unprecedented COVID-19 pandemic. Second, the use of multisource Internet data to improve prediction performance is validated with real empirical data. Finally, this is one of the few papers to provide perspectives on the current dynamics of Indonesia's tourism demand.
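The three accuracy metrics named in this abstract have standard definitions, sketched below in plain Python. The function names and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: mean of the absolute differences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so that case raises an error.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Toy arrivals data (hypothetical values, for illustration only).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which is why studies often report all three together.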


2021 ◽  
Author(s):  
Siddharth Ghule ◽  
Sayan Bagchi ◽  
Kumar Vanka

Electricity generation is a major contributing factor for greenhouse gas emissions. Energy storage systems available today have a combined capacity to store less than 1% of the electricity being consumed worldwide. Redox Flow Batteries (RFBs) are promising candidates for green and efficient energy storage systems. RFBs are being used in renewable energy systems, but their widespread adoption is limited due to high production costs and toxicity associated with the transition-metal-based redox-active species. Therefore, cheaper and greener alternative organic redox-active species are being investigated. Recent reports have shown organic molecules based on phenazine are promising candidates for redox-active species in RFBs. However, the large number of available organic compounds makes the conventional experimental and DFT methods impractical to screen thousands of molecules in a reasonable amount of time. In contrast, machine-learning models have low development time, short prediction time, and high accuracy; thus, they are being heavily investigated for virtual screening applications. In this work, we developed machine-learning models to predict the redox potential of phenazine derivatives in DME solvent using a small dataset of 185 molecules. 2D, 3D, and Molecular Fingerprint features were computed using readily available and easy-to-use Python libraries, making our approach easily adaptable to similar work. Twenty linear and non-linear machine-learning models were investigated in this work. These models achieved excellent performance on the unseen data (i.e., R² > 0.98, MSE < 0.008 V² and MAE < 0.07 V). Model performance was assessed in a consistent manner using the training and evaluation pipeline developed in this work. We showed that 2D molecular features are most informative and achieve the best prediction accuracy among four feature sets. We also showed that often less preferred but relatively faster linear models could perform better than non-linear models when the feature set contains different types of features (i.e., 2D, 3D, and Molecular Fingerprints). Further investigations revealed that it is possible to reduce the training and inference time without sacrificing prediction accuracy by using a small subset of features. Moreover, models were able to predict the previously reported promising redox-active compounds with high accuracy. Also, significantly low prediction errors were observed for the functional groups. Although some functional groups had only one compound in the training set, best-performing models could achieve errors (MAPE) less than 10%. The major source of error was a lack of data near zero and in the positive region. Therefore, this work shows that it is possible to develop accurate machine-learning models that could potentially screen millions of compounds in a short amount of time with a small training set and a limited number of easy-to-compute features. Thus, results obtained in this report would help in the adoption of green energy by accelerating the field of materials discovery for energy storage applications.


2021 ◽  
Author(s):  
Scott Kulm ◽  
Lior Kofman ◽  
Jason Mezey ◽  
Olivier Elemento

A patient's risk for cancer is usually estimated through simple linear models that sum effect sizes of proven risk factors. In theory, more advanced machine learning models can be used for the same task. Using data from the UK Biobank, a large prospective health study, we have developed linear and machine learning models for the prediction of 12 different cancer diagnoses within a 10-year time span. We find that the top machine learning algorithm, XGBoost (XGB), trained on 707 features generated an average area under the receiver operating characteristic curve of 0.736 (with a range of 0.65-0.85). Linear models trained with only 10 features were found to be statistically indistinguishable from the machine learning performance. The linear models were significantly more accurate than the prominent QCancer models (p = 0.0019), which are trained on 45 million patient records and available to over 4,000 United Kingdom general practices. The increase in accuracy may be caused by the consideration of often omitted feature types, including survey answers, census records, and genetic information. This approach led to the discovery of significant novel risk features, including self-reported happiness with own health (relevant to 12 cancers), measured testosterone (relevant to 8 cancers), and ICD codes for rehabilitation procedures (relevant to 3 cancers). These ten-feature models can be easily implemented within the clinic, allowing for personalized screening schedules that may increase cancer survival within a population.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 285-285
Author(s):  
Vanessa Rotondo ◽  
Dan Tulpan ◽  
Katharine M Wood ◽  
Marlene Paibomesai ◽  
Vern R Osborne

The objective of this study is to investigate how linear body measurements relate to and can be used to predict calf body weight (BW) using linear and machine learning models. To meet this objective, a total of 103 Angus cross calves were enrolled in the study from week 2 to week 8. Calves were weighed and the following linear measurements were collected weekly: poll to nose, width across the eyes (WE), width across the right ear, neck length, wither height, heart girth (HG), midpiece height (MH), midpiece circumference, midpiece width (MW), midpiece depth (MD), hook height, hook width, pin height, top of pin bones width (PW), width across the ends of pin bones, nose to tail body length, the length between the withers and pins, forearm to hoof, and cannon bone to hoof. These measurements were taken using a commercial soft tape measure and calipers. To assess relationships between traits and to fit a model to predict BW, data were analyzed in the Weka software (The University of Waikato, New Zealand) using both linear regression (LR) and random forest (RF) machine learning models. The models were trained using a 10-fold cross-validation approach. The automatically derived LR model used 11 traits to fit the data to weekly BW (r² = 0.97), where the traits with the highest coefficients were HG, PW and WE. The RF model further improved the BW predictions (r² = 0.98). Additionally, sex differences were examined. Although the BW model continued to fit well (r² = 0.97), some of the top linear traits differed. The results of this study suggest that linear models built on linear measurements can accurately estimate body weight in beef calves, and that machine learning can further improve the model fit.
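As a rough illustration of the 10-fold cross-validation mentioned in this abstract, the sketch below splits sample indices into k contiguous folds, each used once as the held-out set. This is a generic, unshuffled splitter written for illustration; it is not Weka's implementation, and the sample count of 103 is used only as a convenient example size.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering all samples.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    Hypothetical helper; no shuffling or stratification is applied.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")

    base, remainder = divmod(n_samples, k)
    folds: List[Tuple[List[int], List[int]]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < remainder else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Example: 103 samples split into 10 folds.
folds = k_fold_indices(103, 10)
```

In each round a model would be fitted on the `train` indices and scored on the `test` indices, and the k scores averaged.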


2021 ◽  
Vol 12 (6) ◽  
pp. 1-24
Author(s):  
Shaojie Qiao ◽  
Nan Han ◽  
Jianbin Huang ◽  
Kun Yue ◽  
Rui Mao ◽  
...  

Bike-sharing systems are becoming popular and generate a large volume of trajectory data. In a bike-sharing system, users can borrow and return bikes at different stations. In particular, a bike-sharing system is affected by weather, the time period, and other dynamic factors, which makes the scheduling of shared bikes challenging. In this article, a new shared-bike demand forecasting model based on dynamic convolutional neural networks, called SDF, is proposed to predict the demand for shared bikes. SDF chooses the most relevant weather features from real weather data by using the Pearson correlation coefficient and transforms them into a two-dimensional dynamic feature matrix, taking into account the states of stations from historical data. The feature information in the matrix is extracted and learned by a newly proposed dynamic convolutional neural network to predict the demand for shared bikes in a dynamic and intelligent fashion. The parameter update phase is optimized from three aspects: the loss function, the optimization algorithm, and the learning rate. Then, an accurate shared-bike demand forecasting model is designed based on the basic idea of minimizing the loss value. Compared with classical machine learning models, the weight sharing strategy employed by SDF reduces the complexity of the network and allows a high prediction accuracy to be achieved within a relatively short period of time. Extensive experiments are conducted on real-world bike-sharing datasets to evaluate SDF. The results show that SDF significantly outperforms classical machine learning models in prediction accuracy and efficiency.
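The feature-selection step described in this abstract, choosing weather features by their Pearson correlation with demand, could look roughly like the sketch below. The threshold of 0.5, the feature names, and the toy values are assumptions made for illustration; the abstract does not state how SDF sets its cutoff.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def select_features(
    features: Dict[str, Sequence[float]],
    target: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[str]:
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical hourly demand and weather readings.
demand = [100.0, 150.0, 200.0, 250.0]
weather = {
    "temperature": [10.0, 15.0, 20.0, 25.0],
    "wind": [3.0, 1.0, 4.0, 1.0],
}
selected = select_features(weather, demand)
```

Using the absolute value keeps strongly negative relationships as well, since a feature that reliably lowers demand is as informative as one that raises it.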


Author(s):  
Christoph M. Kanzler ◽  
Ilse Lamers ◽  
Peter Feys ◽  
Roger Gassert ◽  
Olivier Lambercy

Background: A personalized prediction of upper limb neurorehabilitation outcomes in persons with multiple sclerosis (pwMS) promises to optimize the allocation of therapy and to stratify individuals for resource-demanding clinical trials. Previous research identified predictors on a population level through linear models and clinical data, including conventional assessments describing sensorimotor impairments. The objective of this work was to explore the feasibility of providing an individualized and more accurate prediction of rehabilitation outcomes in pwMS by leveraging non-linear machine learning models, clinical data, and digital health metrics characterizing sensorimotor impairments.

Methods: Clinical data and digital health metrics were recorded from eleven pwMS undergoing neurorehabilitation. Machine learning models were trained on data recorded pre-intervention. The dependent variables indicated whether a considerable improvement on the activity level was observed across the intervention or not (binary classification), as defined by the Action Research Arm Test (ARAT), Box and Block Test (BBT), or Nine Hole Peg Test (NHPT).

Results: In a cross-validation, considerable improvements in ARAT or BBT could be accurately predicted (94% balanced accuracy) by only relying on patient master data. Considerable improvements in NHPT could be accurately predicted (89% balanced accuracy), but required knowledge about sensorimotor impairments. Assessing these with digital health metrics instead of conventional scales increased the balanced accuracy by 17%. Non-linear machine-learning models improved the predictive accuracy for the NHPT by 25% compared to linear models.

Conclusions: This work demonstrates the feasibility of a personalized prediction of upper limb neurorehabilitation outcomes in pwMS using multi-modal data collected before neurorehabilitation and machine learning. Information from digital health metrics about sensorimotor impairment was necessary to predict changes in dexterous hand control, thereby underlining their potential to provide a more sensitive and fine-grained assessment than conventional scales. Non-linear models outperformed linear ones, suggesting that the commonly assumed linearity of neurorehabilitation is oversimplified.

clinicaltrials.gov registration number: NCT02688231
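Balanced accuracy, the metric reported in this abstract, is commonly defined for binary classification as the mean of sensitivity and specificity. The sketch below follows that standard definition; the function name and the toy labels are illustrative and not drawn from the study.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0).

    Labels are assumed to be 0 or 1, and both classes must appear in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


# Hypothetical labels: 1 = considerable improvement, 0 = no considerable improvement.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
score = balanced_accuracy(y_true, y_pred)
```

Unlike plain accuracy, this metric is not inflated when one class dominates, which matters for small, imbalanced cohorts such as the one described here.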


Diagnostics ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 2288
Author(s):  
Kaixiang Su ◽  
Jiao Wu ◽  
Dongxiao Gu ◽  
Shanlin Yang ◽  
Shuyuan Deng ◽  
...  

Increasingly, machine learning methods have been applied to aid in diagnosis with good results. However, some complex models can confuse physicians because they are difficult to understand, while data differences across diagnostic tasks and institutions can cause model performance fluctuations. To address this challenge, we combined the Deep Ensemble Model (DEM) and tree-structured Parzen Estimator (TPE) and proposed an adaptive deep ensemble learning method (TPE-DEM) for dynamically evolving diagnostic task scenarios. Different from previous research that focuses on achieving better performance with a fixed structure model, our proposed model uses TPE to efficiently aggregate simple models that are more easily understood by physicians and require less training data. In addition, our proposed model can choose the optimal number of layers for the model and the type and number of basic learners to achieve the best performance in different diagnostic task scenarios based on the data distribution and characteristics of the current diagnostic task. We tested our model on one dataset constructed with a partner hospital and five UCI public datasets with different characteristics and volumes based on various diagnostic tasks. Our performance evaluation results show that our proposed model outperforms other baseline models on different datasets. Our study provides a novel approach for simple and understandable machine learning models in tasks with variable datasets and feature sets, and the findings have important implications for the application of machine learning models in computer-aided diagnosis.



