Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Cai-Xia Lv ◽  
Shu-Yi An ◽  
Bao-Jun Qiao ◽  
Wei Wu

Abstract Background: Hemorrhagic fever with renal syndrome (HFRS) is still attracting public attention because of its outbreaks in various cities in China. Predicting future outbreaks or epidemics of the disease based on past incidence data can help health departments take targeted preventive measures in advance. In this study, we propose a multistep prediction strategy based on extreme gradient boosting (XGBoost) for HFRS as an extension of the one-step prediction model. Moreover, the fitting and prediction accuracy of the XGBoost model is compared with that of the autoregressive integrated moving average (ARIMA) model using different evaluation indicators. Methods: We collected HFRS incidence data for mainland China from 2004 to 2018. The data from 2004 to 2017 were used as the training set to establish the seasonal ARIMA model and the XGBoost model, while the 2018 data were used to test prediction performance. In the multistep XGBoost forecasting model, one-hot encoding was used to handle seasonal features. Furthermore, a series of evaluation indices were calculated to evaluate the accuracy of the multistep forecast XGBoost model. Results: There were 200,237 HFRS cases in China from 2004 to 2018. A long-term downward trend and bimodal seasonality were identified in the original time series. According to the minimum corrected Akaike information criterion (CAIC) value, the optimal ARIMA (3, 1, 0) × (1, 1, 0)₁₂ model was selected. The ME, MAE, MPE, MAPE, and MASE of the XGBoost model were higher than those of the ARIMA model in the fitting part, whereas the RMSE of the XGBoost model was lower. The prediction performance evaluation indicators (MAE, MPE, MAPE, RMSE and MASE) of the one-step and multistep prediction XGBoost models were all notably lower than those of the ARIMA model. Conclusions: The multistep XGBoost prediction model showed much better prediction accuracy and model stability than the multistep ARIMA prediction model. The XGBoost model performed better in predicting complicated and nonlinear data like HFRS. Additionally, multistep prediction models are more practical than one-step prediction models in forecasting infectious diseases.
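The evaluation indices named in the abstract can be illustrated with a minimal Python sketch. It uses common textbook definitions, which are an assumption here: the abstract does not give the formulas. The error sign (actual minus predicted), the function name `forecast_errors`, and the seasonal naive scaling used for MASE are all illustrative choices, not the authors' code.

```python
import math


def forecast_errors(actual, predicted, train, season=12):
    """Return ME, MAE, RMSE, MPE, MAPE and MASE for one set of forecasts.

    Assumptions: error = actual - predicted; no actual value is zero;
    MASE is scaled by the in-sample MAE of a seasonal naive forecast.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Percentage errors are taken relative to the observed value.
    mpe = 100 * sum(e / a for e, a in zip(errors, actual)) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # Seasonal naive forecast on the training data: value one season earlier.
    naive = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    scale = sum(naive) / len(naive)
    return {"ME": me, "MAE": mae, "RMSE": rmse,
            "MPE": mpe, "MAPE": mape, "MASE": mae / scale}


# Toy numbers (not HFRS data); season=1 reduces the scaling to a plain naive forecast.
scores = forecast_errors(actual=[100, 200], predicted=[90, 220],
                         train=[10, 20, 30, 40], season=1)
```

Lower values of MAE, RMSE, MAPE and MASE indicate a closer fit, which is the sense in which the abstract compares the two models.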

2020 ◽  
Author(s):  
Cai-Xia Lv ◽  
Shu-Yi An ◽  
Bao-Jun Qiao ◽  
Wei Wu

Abstract Background: Hemorrhagic fever with renal syndrome (HFRS) is still attracting public attention because of its outbreaks in various cities in China. Predicting the future peak incidence from past incidence data, and implementing targeted actions accordingly, is an effective preventive measure. In this study, we propose a multi-step prediction strategy based on XGBoost for HFRS as an extension of the one-step prediction model. Moreover, the fitting and prediction accuracy of the XGBoost model is compared with that of the seasonal ARIMA model using different evaluation indicators. Methods: We collected monthly HFRS incidence data from 2004 to 2018 in mainland China. The data from 2004 to 2017 were used as the training set to establish the seasonal ARIMA model and the XGBoost model. The remaining 2018 data were used to test the predictions. In the multi-step forecasting XGBoost model, one-hot encoding was used to handle seasonal features. Furthermore, a series of evaluation indices (MAE, MPE, MAPE, RMSE, MASE, ACF1, Theil’s U) were calculated to evaluate the accuracy of the multi-step forecast XGBoost model. Results: There were 200,237 HFRS cases in total in China from 2004 to 2018. A slight long-term downward trend and an obvious bimodal seasonal pattern were identified in the original time series. According to the minimum CAIC value, the optimal ARIMA (3,1,0) × (1,1,0)₁₂ model was selected. The ME, MAE, MPE, MAPE and MASE of XGBoost were higher than those of the ARIMA model in the fitting part, whereas the RMSE of XGBoost was lower. The evaluation indicators (MAE, MPE, MAPE, RMSE, MASE) of the one-step and multi-step prediction XGBoost models were all notably lower than those of the ARIMA model in prediction performance. Conclusions: The multi-step prediction XGBoost model showed much better prediction accuracy and model stability for HFRS. In general, compared with the seasonal ARIMA model, the XGBoost model performs better when predicting complicated and non-linear data like HFRS. Additionally, multi-step prediction models are more practical than one-step prediction models in forecasting infectious diseases.
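To make the one-hot handling of seasonal features concrete, here is a minimal Python sketch of how a monthly series might be turned into training rows. The abstract states only that one-hot encoding was used, so the use of three lagged values, the row layout, and the helper names `one_hot_month` and `build_features` are assumptions for illustration.

```python
def one_hot_month(month):
    """Encode a calendar month (1-12) as a 12-element 0/1 vector."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if m == month else 0 for m in range(1, 13)]


def build_features(series, start_month, n_lags=3):
    """Turn a monthly series (a list) into (rows, targets).

    Each row holds the previous n_lags observations followed by the
    one-hot encoding of the month being predicted (hypothetical layout).
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        month = (start_month - 1 + t) % 12 + 1  # calendar month of series[t]
        rows.append(series[t - n_lags:t] + one_hot_month(month))
        targets.append(series[t])
    return rows, targets


# Toy series starting in November; the first target therefore falls in February.
rows, targets = build_features([5, 6, 7, 8, 9], start_month=11, n_lags=3)
```

Rows of this shape could then be passed to a gradient boosting regressor. Encoding the month as twelve indicator columns lets a tree-based model split on individual months instead of treating month number as an ordered quantity.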


2021 ◽  
Vol 9 (2) ◽  
pp. 334-344
Author(s):  
Sapana Sharma ◽  
Sanju Karol

For many developed and developing countries, rising defense expenditure and its long-term sustainability are at the core of the security and peace agenda. The unremitting rise in defense expenditure puts pressure on governments to manage resources rationally so as to provide security and peace services in the most efficient, effective and equitable way. Forecasting defense expenditure in India is necessary to help policy makers carry out reforms that reduce the burden on these resources and to introduce appropriate planning strategies, based on rational decision making, for issues that may arise. The purpose of this study is to identify an appropriate model based on the Box–Jenkins methodology for forecasting defense expenditure in India. The study applies the one-step-ahead forecasting method to annual data over the period 1961 to 2020. The results show that the ARIMA (1,1,1) model with static forecasting is the most appropriate for forecasting India’s defense expenditure.


2018 ◽  
Vol 146 (13) ◽  
pp. 1680-1688 ◽  
Author(s):  
Ling Sun ◽  
Lu-Xi Zou

Abstract Hemorrhagic fever with renal syndrome (HFRS) caused by hantaviruses is a serious public health problem in China, which accounts for 90% of HFRS cases reported globally. In this study, we applied a geographical information system (GIS), spatial autocorrelation analyses and a seasonal autoregressive integrated moving average (SARIMA) model to describe and predict the HFRS epidemic, with the objective of monitoring and forecasting HFRS in mainland China. Chinese HFRS data from 2004 to 2016 were obtained from the National Infectious Diseases Reporting System (NIDRS) database and the Chinese Centre for Disease Control and Prevention (CDC). GIS maps were produced to show the spatial distribution of HFRS cases. Global Moran's I was used in the global spatial autocorrelation analysis to identify the overall spatiotemporal pattern of HFRS outbreaks, while local Moran's I_i was used to identify ‘hotspot’ regions of HFRS at the province level. The best-fitting SARIMA model, selected by the Akaike information criterion and the Ljung–Box test, was developed to forecast HFRS incidence in 2016. During 2004–2015, a total of 165 710 HFRS cases were reported, with the average annual incidence at the province level ranging from 0 to 13.05 per 100 000 persons. Global Moran's I analysis showed that HFRS outbreaks were spatially clustered, with the degree of clustering gradually decreasing from 2004 to 2009, after which the outbreaks became randomly distributed, with the lowest point reached in 2012. Local Moran's I_i identified four provinces in northeast China as forming a ‘high–high’ cluster, a traditional epidemic centre, and Shaanxi has been another HFRS ‘hotspot’ region since 2011. The monthly incidence of HFRS in mainland China decreased sharply from 2004 to 2009, increased markedly from 2010 to 2012, and has decreased again since 2013, with obvious seasonal fluctuations. The SARIMA ((0,1,3) × (1,0,1)₁₂) model was the best-fitting forecasting model for the HFRS dataset in mainland China. The spatiotemporal distribution of HFRS in mainland China has varied in recent years; together with the SARIMA forecasting model, this study provides several potential decision-support tools for HFRS control and risk-management planning in China.
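The global Moran's I statistic mentioned in this abstract can be sketched in a few lines of Python. The formula below is the standard one, I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The binary adjacency weights and the four-region toy layout are assumptions for illustration; the abstract does not say how the spatial weights were built.

```python
def morans_i(values, weights):
    """Global Moran's I for regional values and a square spatial weight matrix.

    weights[i][j] > 0 marks regions i and j as neighbours; the diagonal
    is expected to be zero. Values must not all be identical.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all pairs of regions.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


# Four hypothetical regions in a row, each adjacent to its immediate neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 2, 3, 4], chain)     # similar values sit side by side
alternating = morans_i([1, -1, 1, -1], chain)  # neighbours always differ
```

Positive values indicate that similar incidence levels cluster in neighbouring regions, values near zero suggest a random spatial pattern, and negative values indicate that neighbours tend to differ.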


2020 ◽  
Vol 10 (11) ◽  
pp. 4083-4102
Author(s):  
Abelardo Montesinos-López ◽  
Humberto Gutierrez-Pulido ◽  
Osval Antonio Montesinos-López ◽  
José Crossa

Due to the ever-increasing data collected in genomic breeding programs, there is a need for genomic prediction models that can deal better with big data. For this reason, here we propose a Maximum a posteriori Threshold Genomic Prediction (MAPT) model for ordinal traits that is more efficient than the conventional Bayesian Threshold Genomic Prediction model for ordinal traits. The MAPT performs the predictions of the Threshold Genomic Prediction model by using the maximum a posteriori estimation of the parameters, that is, the values of the parameters that maximize the joint posterior density. We compared the prediction performance of the proposed MAPT to the conventional Bayesian Threshold Genomic Prediction model, the multinomial Ridge regression and support vector machine on 8 real data sets. We found that the proposed MAPT was competitive with regard to the multinomial and support vector machine models in terms of prediction performance, and slightly better than the conventional Bayesian Threshold Genomic Prediction model. With regard to the implementation time, we found that in general the MAPT and the support vector machine were the best, while the slowest was the multinomial Ridge regression model. However, it is important to point out that the successful implementation of the proposed MAPT model depends on the informative priors used to avoid underestimation of variance components.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2983 ◽  
Author(s):  
Tiago Oliveira ◽  
Ana Silva ◽  
Ken Satoh ◽  
Vicente Julian ◽  
Pedro Leão ◽  
...  

Prediction in health care is closely related to the decision-making process. On the one hand, accurate survivability prediction can help physicians decide between palliative care and other practices for a patient. On the other hand, the notion of remaining lifetime can be an incentive for patients to live a fuller and more fulfilling life. This work presents a pipeline for the development of survivability prediction models and a system that provides survivability predictions for years one to five after the treatment of patients with colon or rectal cancer. The functionalities of the system are made available through a tool that balances the number of necessary inputs against prediction performance. It is mobile-friendly and gives health care professionals access to an instrument capable of enriching their practice and improving outcomes. The performance of the survivability models was compared with that of existing works in the literature and found to be an improvement over the current state of the art. The underlying system is capable of recalculating its prediction models upon the addition of new data, continuously evolving as time passes.


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248597
Author(s):  
Guo-hua Ye ◽  
Mirxat Alim ◽  
Peng Guan ◽  
De-sheng Huang ◽  
Bao-sen Zhou ◽  
...  

Objective: Hemorrhagic fever with renal syndrome (HFRS), one of the main public health concerns in mainland China, is a group of clinically similar diseases caused by hantaviruses. Statistical approaches have long been used to forecast the future incidence rates of certain infectious diseases in order to control their prevalence and outbreak potential effectively. Compared with the use of a single base model, model stacking can often produce better forecasting results. In this study, we fitted the monthly reported cases of HFRS in mainland China with a model stacking approach and compared its forecasting performance with those of five base models. Method: We fitted the monthly reported cases of HFRS in mainland China from January 2004 to June 2019 with five models: an autoregressive integrated moving average (ARIMA) model; the Holt-Winters (HW) method; seasonal decomposition of the time series by LOESS (STL); a neural network autoregressive (NNAR) model; and an exponential smoothing state space model with a Box-Cox transformation, ARMA errors, and trend and seasonal components (TBATS). We then combined the forecasting results with the inverse rank approach. The forecasting performance was assessed using several accuracy criteria for model prediction, including the mean absolute percentage error (MAPE), root-mean-squared error (RMSE) and mean absolute error (MAE). Result: There was a slight downward trend and obvious seasonal periodicity in the time series data for HFRS in mainland China. The model stacking method showed the best performance in terms of both fitting (RMSE 128.19, MAE 85.63, MAPE 8.18) and prediction (RMSE 151.86, MAE 118.28, MAPE 13.16). Conclusion: The results showed that model stacking using the optimal mean forecasting weight of the five abovementioned models achieved the best performance in predicting HFRS one year into the future. This study corroborates the conclusion that model stacking is an easy way to enhance prediction accuracy when modeling HFRS.
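One common reading of an inverse rank combination is sketched below in Python: each base model is ranked by an error measure, and its forecast receives a weight proportional to the reciprocal of its rank. Which error measure the authors ranked on, and how ties were handled, is not stated in the abstract, so those details and the function names `inverse_rank_weights` and `combine_forecasts` are assumptions.

```python
def inverse_rank_weights(errors):
    """Weights proportional to 1/rank, where rank 1 is the smallest error.

    Ties are broken by input order (an arbitrary choice for this sketch).
    """
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ranks = [0] * len(errors)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    inverse = [1 / r for r in ranks]
    total = sum(inverse)
    return [value / total for value in inverse]


def combine_forecasts(forecasts, weights):
    """Weighted average of several forecasts of equal horizon."""
    horizon = len(forecasts[0])
    return [sum(w * f[h] for w, f in zip(weights, forecasts))
            for h in range(horizon)]


# Three hypothetical base models with made-up validation errors and forecasts.
weights = inverse_rank_weights([20.0, 10.0, 30.0])
combined = combine_forecasts([[11, 11], [22, 22], [33, 33]], weights)
```

With three models the weights are 6/11, 3/11 and 2/11 for ranks 1, 2 and 3, so the best-ranked model dominates without the others being discarded.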


Author(s):  
Tania Dehesh ◽  
H.A. Mardani-Fard ◽  
Paria Dehesh

Abstract Background: The epidemic of a novel coronavirus illness (COVID-19) has become a global threat. The aim of this study is, first, to find the best prediction models for daily confirmed cases in countries with high numbers of confirmed cases and, second, to predict confirmed cases with these models so that healthcare systems can be better prepared. Methods: This study was based on daily confirmed cases of COVID-19 collected from the official website of Johns Hopkins University from January 22, 2020 to March 1, 2020. The autoregressive integrated moving average (ARIMA) model was used to predict the trend of confirmed cases. Stata version 12 was used. Results: The models used were ARIMA (2,1,0) for Mainland China, ARIMA (2,2,2) for Italy, ARIMA (1,0,0) for South Korea, ARIMA (2,3,0) for Iran, and ARIMA (3,1,0) for Thailand. Mainland China and Thailand had an almost stable trend. The trend in South Korea was decreasing and will become stable in the near future. Iran and Italy had unstable trends. Conclusions: Mainland China and Thailand were successful in halting the COVID-19 epidemic. Investigating their control protocols, such as quarantine, should be a priority in other countries’ programs.


Author(s):  
Vaibhavi Rajendran ◽  
G Bharadwaja Kumar

A speech synthesizer that sounds similar to a human voice is preferred over one with a robotic voice, so an effective prosody model is essential for increasing the naturalness of a speech synthesizer. This paper therefore focuses on developing a prosody prediction model using sentiment analysis for a Tamil speech synthesizer. Two variations of prosody prediction models using SentiWordNet are tested: one without a stemmer and the other with a stemmer. The prosody prediction model with a stemmer performs much better than the one without, because it handles the highly agglutinative and inflectional words of the Tamil language more effectively, as this paper illustrates. The prosody prediction model with a stemmer achieves a classification accuracy of 77% on the test set, compared with 57% for the model without a stemmer.


2018 ◽  
Vol 7 (1) ◽  
pp. 64
Author(s):  
NOVIAN ENDI GUNAWAN ◽  
I WAYAN SUMARJAYA ◽  
I GUSTI AYU MADE SRINADI

Forecasting is a way to predict future events. One forecasting model is the transfer function, which combines characteristics of the ARIMA model with some characteristics of regression analysis. Dengue Hemorrhagic Fever is a major problem in Bali. Bali Province is recorded as ranking fourth in the spread of dengue virus, and Denpasar City ranks first in the number of deaths from Dengue Hemorrhagic Fever. The purpose of this research is to determine the multivariate transfer function model and to predict the number of people with Dengue Hemorrhagic Fever in Denpasar City based on rainfall and humidity. The forecasts for January to June 2017 were 46, 51, 226, 625, 1064, 1001, and 580 people, with a percentage error for the transfer function model of 17.2%.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e14530-e14530
Author(s):  
Petri Bono ◽  
Jussi Ekström ◽  
Matti K Karvonen ◽  
Jami Mandelin ◽  
Jussi Koivunen

e14530 Background: Bexmarilimab, an investigational immunotherapeutic antibody targeting Clever-1, is currently being investigated in the phase I/II MATINS study (NCT03733990) for advanced solid tumors. Machine learning (ML) based models combining extensive data could be generated to predict treatment responses to this first-in-class macrophage checkpoint inhibitor. Methods: 58 baseline features from 30 patients in part 1 of the phase I/II MATINS trial were included in ML modelling. Seven patients were classified as benefitting from the therapy by RECIST 1.1 (PR or SD response in target or non-target lesions). Initial feature selection combined domain knowledge with removal of features with several missing values, resulting in 20 clinically relevant features from 25 patients. The remaining data were standardized, and feature selection was performed using analysis of variance (ANOVA) based on F-values between response and features. With this approach, the number of features could be further reduced as the prediction performance increased, until only the most important features were included in the model. Several prediction models were trained, and prediction performance was evaluated using leave-one-out cross-validation (LOOCV), with and without SMOTE oversampling of the positive class of the training data inside each LOOCV fold. In LOOCV the prediction model was trained 25 times. A stacked meta classifier with SMOTE oversampling, combining three classifiers (elastic-net logistic regression, random forest and extreme gradient boosting), was chosen as the best performing prediction model. Results: Seven baseline features were associated with bexmarilimab treatment benefit. Increasing bexmarilimab dose and high tumor FoxP3 cells showed a positive association with benefit. Conversely, high baseline blood neutrophils, CD4, T-cells, B-cells, and CXCL10 showed a negative relationship with treatment benefit. The ML model trained with these seven features performed well in LOOCV: 6/7 benefitting and 16/18 non-benefitting patients were classified correctly, and all classification performance metrics considered were good. In feature importance analysis, low baseline CXCL10 and neutrophils were identified as the most important predictors of treatment benefit, with values of 0.19 and 0.16. Conclusions: This study highlights the possibility of using ML models to predict treatment benefit for novel cancer drugs such as bexmarilimab and thereby boost clinical development. These findings are in line with the expected immune activation of bexmarilimab treatment. The generated ML models should be further validated in a larger patient cohort. Clinical trial information: NCT03733990.
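The reported LOOCV counts (6 of 7 benefitting and 16 of 18 non-benefitting patients classified correctly) can be turned into standard classification metrics with simple arithmetic. The Python sketch below does this; the resulting figures are derived here from the stated counts and are not values reported in the abstract, and the function name `loocv_summary` is illustrative.

```python
def loocv_summary(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and balanced accuracy from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# 6/7 benefitting correct -> 1 false negative; 16/18 non-benefitting -> 2 false positives.
summary = loocv_summary(tp=6, fn=1, tn=16, fp=2)
```

With only seven positive cases, each misclassified patient shifts sensitivity by about 14 percentage points, which is consistent with the abstract's call for validation in a larger cohort.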

