Enhancing COVID-19 Epidemics Forecasting Accuracy by Combining Real-time and Historical Data from Social Media, Online News Articles, and Search Queries (Preprint)

2021
Author(s):  
Jingwei Li ◽  
Wei Huang ◽  
Choon Ling Sia ◽  
Zhuo Chen ◽  
Tailai Wu ◽  
...  

BACKGROUND: The SARS-CoV-2 virus and its variants pose extraordinary challenges for public health worldwide. More timely and accurate forecasting of COVID-19 epidemics is key to implementing timely interventions and policies and allocating resources efficiently. Internet-based data sources have shown great potential to supplement traditional infectious disease surveillance, and combining different Internet-based data sources has shown greater power to enhance epidemic forecasting accuracy than using a single source. However, existing methods incorporating multiple Internet-based data sources used only real-time data from these sources as exogenous inputs and did not take all the historical data into account. Moreover, the predictive power of different Internet-based data sources in providing early warning of COVID-19 outbreaks has not been fully explored.

OBJECTIVE: The main aim of our study was to explore whether combining real-time and historical data from multiple Internet-based sources could improve COVID-19 forecasting accuracy over existing baseline models. A secondary aim was to explore COVID-19 forecasting timeliness based on different Internet-based data sources.

METHODS: We first used core-term and symptom-related keyword-based methods to extract COVID-19-related Internet-based data from December 21, 2019, to February 29, 2020. The Internet-based data we explored included 90,493,912 online news articles, 37,401,900 microblogs, and all Baidu search query data during that period. We then proposed an autoregressive model with exogenous inputs, incorporating the real-time and historical data from multiple Internet-based sources. Our proposed model was compared with baseline models, and all models were tested during the first wave of the COVID-19 epidemic in Hubei province and the rest of mainland China separately. We also used lagged Pearson correlations for the COVID-19 forecasting timeliness analysis (sketched below).

RESULTS: Our proposed model achieved the highest accuracy on all five accuracy measures, compared with all baseline models, in both Hubei province and the rest of mainland China. In mainland China except Hubei, the differences in COVID-19 forecasting accuracy between our proposed model (model i) and the baseline models were statistically significant (model 1, t=-8.722, P<.001; model 2, t=-5.000, P<.001; model 3, t=-1.882, P=.063; model 4, t=-4.644, P<.001; model 5, t=-4.488, P<.001). In Hubei province, our proposed model's forecasting accuracy improved significantly compared with the baseline model using historical COVID-19 new confirmed case counts only (model 1, t=-1.732, P=.086). Our results also showed that Internet-based sources could provide warning of COVID-19 outbreaks 2-6 days earlier.

CONCLUSIONS: Our approach incorporating real-time and historical data from multiple Internet-based sources could improve forecasting accuracy for COVID-19 epidemics and its variants, which may help public health agencies improve their interventions and resource allocation in mitigating and controlling new waves of COVID-19 or other epidemics.
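The timeliness analysis rests on lagged Pearson correlations between each Internet-based signal and daily new confirmed case counts. Below is a minimal sketch of that computation; the array names and the maximum lag are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.stats import pearsonr

def lagged_correlations(signal, cases, max_lag=10):
    """Correlate an Internet-based signal (e.g., daily news volume) with
    daily new confirmed cases at each lead time. The lag that maximizes
    the correlation suggests how many days the signal leads the outbreak."""
    signal, cases = np.asarray(signal, float), np.asarray(cases, float)
    corr_by_lag = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            r, _ = pearsonr(signal, cases)
        else:
            # signal at day t paired with cases at day t + lag
            r, _ = pearsonr(signal[:-lag], cases[lag:])
        corr_by_lag[lag] = r
    return corr_by_lag
```

A correlation peak at, say, lag 4 would be consistent with the reported 2-6 day early warning.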

Author(s):  
Nghiem Van Tinh

Over the past 25 years, numerous fuzzy time series forecasting models have been proposed to deal with complex and uncertain problems. The main factors that affect the forecasting results of these models are the partitioning of the universe of discourse, the creation of fuzzy relationship groups, and the defuzzification of forecast output values. This study therefore presents a hybrid fuzzy time series forecasting model combining particle swarm optimization (PSO) and fuzzy C-means (FCM) clustering to address these issues. FCM clustering is used to divide the historical data into initial intervals of unequal size (sketched below). After the intervals are generated, the historical data are fuzzified into fuzzy sets in order to establish fuzzy relationship groups in chronological order. The information obtained from the fuzzy relationship groups is then used to calculate forecast values based on a new defuzzification technique. In addition, to enhance forecasting accuracy, the PSO algorithm is used to find optimal interval lengths in the universe of discourse. The proposed model is applied to forecast three well-known numerical datasets (enrolment data of the University of Alabama, the Taiwan Futures Exchange (TAIFEX) data, and yearly deaths in car road accidents in Belgium). These datasets are also examined using several other forecasting models available in the literature, and the forecasting results obtained from the proposed model are compared to those produced by the other models. It is observed that the proposed model achieves higher forecasting accuracy than its counterparts for both first-order and high-order fuzzy logical relationships.
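To make the interval-generation step concrete, here is a plain-numpy sketch of one-dimensional fuzzy C-means that derives unequal-width intervals from the sorted cluster centers; the fuzzifier m, iteration count, and seed are illustrative choices, not the paper's.

```python
import numpy as np

def fuzzy_cmeans_1d(data, n_clusters, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-means on 1-D historical data; returns sorted cluster
    centers and interval boundaries at adjacent-center midpoints."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float).reshape(-1, 1)
    u = rng.random((len(x), n_clusters))
    u /= u.sum(axis=1, keepdims=True)           # random initial memberships
    for _ in range(n_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0).reshape(-1, 1)
        dist = np.abs(x - centers.T) + 1e-12    # avoid division by zero
        inv = dist ** (-2.0 / (m - 1))
        u = inv / inv.sum(axis=1, keepdims=True)
    centers = np.sort(centers.ravel())
    bounds = (centers[:-1] + centers[1:]) / 2   # unequal interval boundaries
    return centers, bounds
```

The midpoints between adjacent centers give the unequal-size intervals that the fuzzification step then works over.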


2018
Vol 7 (4.30)
pp. 281
Author(s):  
Nazirah Ramli ◽  
Siti Musleha Ab Mutalib ◽  
Daud Mohamad

This paper proposes an enhanced fuzzy time series (FTS) prediction model that can retain information under various levels of confidence throughout the forecasting procedure. The forecasting accuracy is computed from the similarity between the fuzzified historical data and the fuzzy forecast values; no defuzzification process is involved in the proposed method. The frequency density method is used to partition the intervals, and an area-and-height type of similarity measure is used to obtain the forecasting accuracy. The proposed model is applied to a numerical example of the unemployment rate in Malaysia. The results show that, on average, 96.9% of the forecast values are similar to the historical data, and the forecasting error based on the distance of the similarity measure is 0.031. The forecasting accuracy can be obtained directly from the forecast values in trapezoidal fuzzy number form without a defuzzification procedure.
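As a rough illustration of scoring accuracy by similarity instead of defuzzification, the sketch below compares two trapezoidal fuzzy numbers with a classic point-wise similarity measure; the paper's area-and-height measure is a refinement of this idea, and its exact formula is not reproduced here.

```python
def trapezoid_similarity(a, b):
    """Similarity between trapezoidal fuzzy numbers a = (a1, a2, a3, a4)
    and b = (b1, b2, b3, b4) on data normalized to [0, 1]:
    S(A, B) = 1 - mean absolute difference of corresponding points.
    A simple stand-in for the paper's area-and-height type measure."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / 4

# e.g. trapezoid_similarity((0.1, 0.2, 0.3, 0.4), (0.15, 0.25, 0.35, 0.45))
# -> 0.95, i.e., the fuzzy forecast is roughly "95% similar" to the data
```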


2021
pp. 096228022110619
Author(s):  
Yuanke Qu ◽  
Chun Yin Lee ◽  
KF Lam

Infectious diseases, such as the ongoing COVID-19 pandemic, pose a significant threat to public health globally. The fatality rate serves as a key indicator of the effectiveness of potential treatments or interventions. With limited time and understanding of novel emerging epidemics, real-time comparisons of fatality rates among different groups, divided by, say, treatment, age, or area, have an important role to play in informing public health strategies. We propose a statistical test for the null hypothesis of equal real-time fatality rates across multiple groups during an ongoing epidemic. An elegant property of the proposed test statistic is that it converges to a Brownian motion under the null hypothesis, which allows one to develop a sequential testing approach for rejecting the null hypothesis at the earliest possible time as statistical evidence accumulates. This property is particularly important because scientists and clinicians are competing with time to identify possible treatments or effective interventions against the emerging epidemic. The method is widely applicable as it requires only the cumulative numbers of confirmed cases, deaths, and recoveries. A large-scale simulation study shows that the finite-sample performance of the proposed test is highly satisfactory. The proposed test is applied to compare differences in disease severity among Wuhan, Hubei province (excluding Wuhan), and mainland China (excluding Hubei) from February to March 2020. The results suggest that disease severity was potentially associated with health care resource availability during the early phase of the COVID-19 pandemic in mainland China.
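The test requires only cumulative confirmed cases, deaths, and recoveries per group. The sketch below computes one common real-time fatality measure from those inputs, deaths over resolved cases; this is an assumed definition used for illustration, not the authors' Brownian-motion test statistic.

```python
import numpy as np

def realtime_fatality_rate(cum_deaths, cum_recoveries):
    """Real-time fatality rate D(t) / (D(t) + R(t)) computed from
    cumulative deaths and recoveries, defined as 0 before any case
    has resolved."""
    d = np.asarray(cum_deaths, float)
    r = np.asarray(cum_recoveries, float)
    resolved = d + r
    return np.divide(d, resolved, out=np.zeros_like(d), where=resolved > 0)
```

Comparing such trajectories across groups, with an appropriately normalized difference, is what the sequential test formalizes.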


2019
Author(s):  
Canelle Poirier ◽  
Yulin Hswen ◽  
Guillaume Bouzillé ◽  
Marc Cuggia ◽  
Audrey Lavenu ◽  
...  

Abstract: Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real time by one to three weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the 12 continental regions of France by leveraging multiple disparate data sources that include Google search activity, real-time and local weather information, flu-related Twitter micro-blogs, electronic health records data, and historical disease activity synchronicities across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles combining all data sources lead to accurate and timely predictions (a sketch of the ensemble idea follows this abstract).

Author summary: The role of public health is to protect the health of populations by providing the right intervention to the right population at the right time. In France and all around the world, influenza is a major public health problem. Traditional surveillance systems produce estimates of influenza-like illness (ILI) incidence rates, but with a one- to three-week delay. Accurate real-time monitoring systems for influenza outbreaks could be useful for public health decisions. By combining different data sources and different statistical models, we propose an accurate and timely forecasting platform to track the flu in France at a spatial resolution that, to our knowledge, has not been explored before.
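A minimal sketch of the ensemble idea, stacking per-source model predictions with a linear meta-learner; the component models, the choice of meta-learner, and the train/test split are all illustrative rather than the authors' configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ensemble_forecast(component_preds, ili, train_idx, test_idx):
    """Combine predictions from per-source models (e.g., Google searches,
    weather, Twitter, EHR, neighboring-region synchronicities) into one
    ILI estimate via a linear meta-learner."""
    X = np.column_stack(component_preds)   # one column per data-source model
    y = np.asarray(ili, float)
    meta = LinearRegression().fit(X[train_idx], y[train_idx])
    return meta.predict(X[test_idx])
```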


2021
Vol 18 (5)
pp. 907-921
Author(s):  
Jiamin Liu ◽  
Ze Chen ◽  
Yanyan Ouyang ◽  
Xu Guo ◽  
Wangli Xu

2019
Vol 35 (3)
pp. 267-292
Author(s):  
Nghiem Van Tinh ◽  
Nguyen Cong Dieu

The fuzzy time series (FTS) model is an effective tool for handling complex processes and uncertainty, and it is now widely used in many forecasting problems. However, three issues remain in FTS models: establishing effective fuzzy relationship groups, finding the proper length of each interval, and building the defuzzification rule. In this paper, a novel FTS forecasting model based on fuzzy C-means (FCM) clustering and particle swarm optimization (PSO) is therefore developed to enhance forecasting accuracy. First, FCM clustering is used to divide the historical data into intervals of different lengths. After the intervals are generated, the historical data are fuzzified into fuzzy sets. Next, fuzzy relationship groups are established based on the appearance history of the fuzzy sets on the right-hand side of the fuzzy logical relationships, and these groups serve to calculate the forecasting output. Finally, the proposed model is combined with the PSO algorithm to adjust interval lengths and find proper intervals in the universe of discourse, yielding the best forecasting accuracy (see the PSO sketch after this abstract). To verify the effectiveness of the forecasting model, three numerical datasets (enrolment data of the University of Alabama, the Taiwan Futures Exchange (TAIFEX) data, and yearly deaths in car road accidents in Belgium) are selected to illustrate the proposed model. The experimental results indicate that the proposed model outperforms the compared forecasting models in terms of forecasting accuracy for both first-order and high-order FTS.
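The interval-tuning step can be sketched as a generic PSO over interval boundaries; `error_fn` is a placeholder for the model's forecasting error given sorted boundaries, and all hyperparameters are illustrative defaults, not the paper's.

```python
import numpy as np

def pso_optimize_bounds(error_fn, lo, hi, n_bounds, n_particles=30,
                        n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO over interval boundaries in [lo, hi], minimizing a
    user-supplied forecasting-error function of the sorted boundaries."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, n_bounds))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_err = np.array([error_fn(np.sort(p)) for p in pos])
    gbest = pbest[pbest_err.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # inertia + cognitive pull toward pbest + social pull toward gbest
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        err = np.array([error_fn(np.sort(p)) for p in pos])
        improved = err < pbest_err
        pbest[improved], pbest_err[improved] = pos[improved], err[improved]
        gbest = pbest[pbest_err.argmin()].copy()
    return np.sort(gbest)
```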


2019
Vol 11 (1)
Author(s):  
Jyllisa Mabion

Objective: To improve Texas Syndromic Surveillance by integrating data from the Texas Poison Center and Emergency Medical Services for opioid overdose surveillance.

Introduction: In recent years, the number of deaths from illicit and prescription opioids has increased significantly, resulting in a national and local public health crisis. According to the Texas Center for Health Statistics, there were 1,340 opioid-related deaths in 2015 [1]; in 2005, by comparison, there were 913. Syndromic surveillance can be used to monitor overdose trends in near real-time and provide much-needed information to public health officials. Texas Syndromic Surveillance (TxS2) is the statewide syndromic surveillance system hosted by the Texas Department of State Health Services (DSHS). To enhance the capabilities of TxS2 and to better understand the opioid epidemic, DSHS is integrating both Texas Poison Center (TPC) data and Emergency Medical Services (EMS) data into the system. Much of the data collected at public health organizations can be several years old by the time it is released for public use. As a result, there have been major efforts to integrate more real-time data sources for a variety of surveillance needs and during emergency response activities.

Methods: Guided by the Oregon Public Health Division's successful integration of poison data into Oregon ESSENCE, DSHS has followed a similar path [2]. DSHS already receives TPC data from the Commission on State Emergency Communication (CSEC), so copying and routing those data into TxS2 requires a Memorandum of Understanding (MOU) with CSEC, which is charged with administering the implementation of the Texas Poison Control Network. EMS records are currently received by the DSHS Office of Injury Prevention (OIP) via file upload and extracted from web services as XML files. Regional and Local Health Operations, the division where the syndromic surveillance program is located, and OIP are both sections within DSHS, so a formal MOU is not necessary; both parties operate under the rules and regulations established for data under the Community Health Improvement Division. CSEC and EMS push data extracts to a DSHS SFTP folder for polling by Rhapsody in Amazon Web Services. The message data are extracted and transformed into the ESSENCE database format (a toy sketch of this flattening step follows the references), and data are received at least once every 24 hours.

Results: TxS2 will now include TPC and EMS data, giving system users the ability to analyze and overlay real-time data for opioid overdose surveillance in one application. The integration of these data sources in TxS2 can be used both for routine surveillance and for unexpected public health events. This effort has led to discussions on how different sections within DSHS can collaborate using syndromic surveillance data and has generated interest in incorporating additional data streams into TxS2 in the future.

Conclusions: While this venture is still a work in progress, it is anticipated that adding TPC and EMS data to TxS2 will be beneficial for surveilling not just opioid overdoses but other conditions and illnesses, as well as for capturing disaster-related injuries.

References:
1. Texas Health Data, Center for Health Statistics [Internet]. Austin (TX): Department of State Health Services. Available from: http://healthdata.dshs.texas.gov/Opioids/Deaths
2. Laing R, Powell M. Integrating Poison Center Data into Oregon ESSENCE using a Low-Cost Solution. OJPHI. 2017 May 1;9(1).
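To picture the transformation step in the pipeline above (SFTP drop, polling, conversion to the ESSENCE database format), here is a small sketch of flattening one EMS XML message; the tag names and output fields are hypothetical, and in practice the mapping runs in Rhapsody rather than Python.

```python
import xml.etree.ElementTree as ET

def parse_ems_message(xml_text):
    """Flatten one EMS XML message pulled from the SFTP drop into a
    row for an ESSENCE-style table; tag names here are illustrative."""
    root = ET.fromstring(xml_text)
    return {
        "incident_date": root.findtext("IncidentDate"),
        "chief_complaint": root.findtext("ChiefComplaint"),
        "dispatch_reason": root.findtext("DispatchReason"),
    }
```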


Author(s):  
Jingwei (Louis) Li ◽  
Choon Ling Sia ◽  
Zhuo Chen ◽  
Wei (Wayne) Huang

Real-time online data sources have contributed to timely and accurate forecasting of influenza activity, while also suffering from instability and linguistic noise. Few previous studies have focused on unofficial online news articles, which are abundant in number, rich in information, and relatively low in noise. This study examined whether monitoring both official and unofficial online news articles can improve influenza activity forecasting accuracy during influenza outbreaks. Data were retrieved from a Chinese commercial online platform and the website of the Chinese National Influenza Center. We modeled weekly fractions of influenza-related online news articles and compared them against weekly influenza-like illness (ILI) rates using autoregression analyses. We retrieved 153,958,695 and 149,822,871 online news articles focusing on the south and north of mainland China, respectively, from 6 October 2019 to 17 May 2020. Our model based on online news articles significantly improved forecasting accuracy compared with influenza surveillance models based on historical ILI rates alone (p = 0.002 in the south; p < 0.001 in the north) or with microblog data added as an exogenous input (p = 0.029 in the south; p < 0.001 in the north). Our findings also showed that influenza forecasting based on online news articles could be 1-2 weeks ahead of official ILI surveillance reports. The results reveal that monitoring online news articles could supplement traditional influenza surveillance systems, improve resource allocation, and offer models for the surveillance of other emerging diseases.
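A minimal sketch of an autoregression with the weekly news-article fraction as an exogenous input; the lag order and the library choice are illustrative, not the authors' exact specification.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

def fit_ar_with_news(ili_rates, news_fraction, lags=3):
    """AR(lags) model of weekly ILI rates with the weekly fraction of
    influenza-related online news articles as an exogenous regressor."""
    model = SARIMAX(np.asarray(ili_rates, float),
                    exog=np.asarray(news_fraction, float),
                    order=(lags, 0, 0))
    return model.fit(disp=False)
```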


Author(s):  
Munesh Chandra Trivedi ◽  
Virendra Kumar Yadav ◽  
Avadhesh Kumar Gupta

A data warehouse generally contains both historical and current data from various data sources. In computing, a data warehouse can be defined as a system created for the analysis and reporting of both types of data; the resulting analysis reports are then used by an organization to make decisions that support its growth. Constructing a data warehouse appears simple: collect data from the data sources into one place (after extraction, transformation, and loading). But construction involves several issues, such as inconsistent data, logic conflicts, user acceptance, cost, quality, security, stakeholder contradictions, and REST alignment. These issues need to be overcome, otherwise they will lead to unfortunate consequences affecting the organization's growth. The proposed model tries to solve these issues, such as REST alignment and stakeholder contradictions, by involving experts from various domains (technical, analytical, decision makers, management representatives, etc.) during the initialization phase to better understand the requirements, and by mapping these requirements to data sources during the design phase of the data warehouse.
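For orientation, a toy extract-transform-load pass showing where the reconciliation of inconsistent data would live; every interface here is illustrative, not part of the proposed model.

```python
def load_warehouse(sources, transform, warehouse_rows):
    """Minimal ETL loop over heterogeneous sources. `sources` yield raw
    records; `transform` reconciles schema and logic conflicts (the
    single place to resolve the inconsistent-data issues noted above);
    `warehouse_rows` is the target table as a plain list."""
    for extract in sources:
        for record in extract():
            row = transform(record)
            if row is not None:        # transform may reject bad records
                warehouse_rows.append(row)
```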


Sensors
2021
Vol 21 (21)
pp. 7001
Author(s):  
Miloš Simić ◽  
Goran Sladić ◽  
Miroslav Zarić ◽  
Branko Markoski

Edge computing offers cloud services closer to data sources and end-users, laying the foundation for novel applications. Infrastructure deployment is taking off, bringing new challenges: how can geo-distribution be used properly, and how can the advantages of having resources at a specific location be harnessed? New real-time applications require a multi-tier infrastructure, preferably doing data preprocessing locally but using the cloud for heavy workloads. We present a model able to organize geo-distributed nodes into micro clouds dynamically, allowing resource reorganization to best serve population needs. Such elasticity is achieved by relying on cloud organization principles adapted for a different environment. The desired state is specified descriptively, and the system handles the rest. As such, the infrastructure is abstracted to the software level, thus enabling "infrastructure as software" at the edge. We argue for blending the proposed model into existing tools, allowing cloud providers to offer future micro clouds as a service.
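The "desired state specified descriptively" idea can be pictured with a toy reconciliation loop in the style of cloud control planes; all interfaces here (the observe and apply_change callbacks, the node-to-micro-cloud mapping) are hypothetical, not the paper's API.

```python
import time

def reconcile(desired_state, observe, apply_change, interval=5):
    """Desired-state control loop: compare the declared node-to-micro-cloud
    layout with what is actually observed at the edge and apply the
    difference, then repeat."""
    while True:
        actual = observe()                  # current node -> micro-cloud map
        for node, target in desired_state.items():
            if actual.get(node) != target:
                apply_change(node, target)  # move node to its target micro cloud
        time.sleep(interval)
```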

