Real-Time Forecasting of the COVID-19 Outbreak in Chinese Provinces: Machine Learning Approach Using Novel Digital Data and Estimates From Mechanistic Models

10.2196/20285 ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. e20285
Author(s):  
Dianbo Liu ◽  
Leonardo Clemente ◽  
Canelle Poirier ◽  
Xiyu Ding ◽  
Matteo Chinazzi ◽  
...  

Background The inherent difficulty of identifying and monitoring emerging outbreaks caused by novel pathogens can lead to their rapid spread; and if left unchecked, they may become major public health threats to the planet. The ongoing coronavirus disease (COVID-19) outbreak, which has infected over 2,300,000 individuals and caused over 150,000 deaths, is an example of one of these catastrophic events. Objective We present a timely and novel methodology that combines disease estimates from mechanistic models and digital traces, via interpretable machine learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. Methods Our method uses the following as inputs: (a) official health reports, (b) COVID-19–related internet search activity, (c) news media activity, and (d) daily forecasts of COVID-19 activity from a metapopulation mechanistic model. Our machine learning methodology uses a clustering technique that enables the exploitation of geospatial synchronicities of COVID-19 activity across Chinese provinces and a data augmentation technique to deal with the small number of historical disease observations characteristic of emerging outbreaks. Results Our model is able to produce stable and accurate forecasts 2 days ahead of the current time and outperforms a collection of baseline models in 27 out of 32 Chinese provinces. Conclusions Our methodology could be easily extended to other geographies currently affected by COVID-19 to aid decision makers with monitoring and possibly prevention.
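The abstract does not say which clustering technique is used to exploit geospatial synchronicities. As a minimal sketch of the general idea, and not the authors' method, the example below groups regions whose daily activity curves are strongly correlated. The region names, the toy series, and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def cluster_by_synchronicity(series, threshold=0.9):
    """Group regions into connected components of the graph that links
    two regions when their correlation is at least `threshold`."""
    names = list(series)
    unassigned = set(names)
    clusters = []
    for name in names:
        if name not in unassigned:
            continue
        unassigned.discard(name)
        component, frontier = [name], [name]
        while frontier:
            current = frontier.pop()
            linked = [
                other for other in names
                if other in unassigned
                and pearson(series[current], series[other]) >= threshold
            ]
            for other in linked:
                unassigned.discard(other)
                component.append(other)
                frontier.append(other)
        clusters.append(sorted(component))
    return clusters


# Toy daily counts (illustrative only, not real surveillance data).
toy_series = {
    "region_a": [1, 2, 4, 8, 16],
    "region_b": [2, 4, 8, 16, 32],
    "region_c": [16, 8, 4, 2, 1],
}
clusters = cluster_by_synchronicity(toy_series)
```

Regions in the same cluster could then share training data, which is one plausible way to offset the short history available for each region early in an outbreak.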








10.2196/23996 ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. e23996
Author(s):  
Dianbo Liu ◽  
Leonardo Clemente ◽  
Canelle Poirier ◽  
Xiyu Ding ◽  
Matteo Chinazzi ◽  
...  



2021 ◽  
Author(s):  
Aurore Lafond ◽  
Maurice Ringer ◽  
Florian Le Blay ◽  
Jiaxu Liu ◽  
Ekaterina Millan ◽  
...  

Abstract Abnormal surface pressure is typically the first indicator of a number of problematic events, including kicks, losses, washouts, and stuck pipe. These events account for 60–70% of all drilling-related nonproductive time, so their early and accurate detection has the potential to save the industry billions of dollars. Detecting these events today requires an expert user watching multiple curves, which can be costly and subject to human error. The solution presented in this paper aims to augment traditional models with new machine learning techniques that detect these events automatically and support monitoring of the well while drilling. Today’s real-time monitoring systems employ complex physical models to estimate surface standpipe pressure while drilling. These require many inputs and are difficult to calibrate. Machine learning is an alternative method to predict pump pressure, but on its own it needs significant labelled training data, which is often lacking in the drilling world. The new system combines these approaches: a machine learning framework enables automated learning, while the physical models compensate for any gaps in the training data. The system uses only standard surface measurements, is fully automated, and is continuously retrained while drilling to ensure the most accurate pressure prediction. In addition, a stochastic (Bayesian) machine learning technique is used, which provides not only a prediction of the pressure but also the uncertainty of, and confidence in, that prediction. Last, the new system includes a data quality control workflow. It excludes periods of low data quality from the pressure anomaly detection and enables smarter real-time event analysis. The new system has been tested on historical wells using a new test and validation framework. The framework runs the system automatically on large volumes of both historical and simulated data, so that the results can be cross-referenced with observations. In this paper, we show the results of the automated test framework as well as the capabilities of the new system in two specific case studies, one on land and another offshore. Moreover, large-scale statistics demonstrate the reliability and efficiency of this new detection workflow. The new system builds on the trend in our industry to better capture and utilize digital data for optimizing drilling.
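To make the idea of a prediction with uncertainty concrete, here is a minimal sketch of Bayesian linear regression with one input. It is not the system described in the abstract. It assumes a single hypothetical surface measurement `x` as the predictor, a known noise precision `beta`, and a Gaussian prior with precision `alpha`. A measured pressure is flagged when it falls outside `k` predictive standard deviations.

```python
import math


def fit_bayesian_line(xs, ys, alpha=1e-6, beta=1.0):
    """Posterior over (intercept, slope) for y = w0 + w1 * x + noise.

    alpha: prior precision on the weights (assumed value).
    beta: noise precision, treated as known (assumed value).
    Returns the posterior mean (m0, m1) and covariance entries (s00, s01, s11).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision matrix A = alpha * I + beta * X^T X (2 x 2).
    a00 = alpha + beta * n
    a01 = beta * sum_x
    a11 = alpha + beta * sum_xx
    det = a00 * a11 - a01 * a01
    # Posterior covariance S = A^{-1}.
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det
    # Posterior mean m = beta * S X^T y.
    m0 = beta * (s00 * sum_y + s01 * sum_xy)
    m1 = beta * (s01 * sum_y + s11 * sum_xy)
    return (m0, m1), (s00, s01, s11)


def predict_with_uncertainty(x, mean, cov, beta=1.0):
    """Predictive mean and standard deviation at input x."""
    m0, m1 = mean
    s00, s01, s11 = cov
    mu = m0 + m1 * x
    # Noise variance plus the variance contributed by weight uncertainty.
    var = 1.0 / beta + s00 + 2.0 * s01 * x + s11 * x * x
    return mu, math.sqrt(var)


def is_anomalous(measured, x, mean, cov, beta=1.0, k=3.0):
    """Flag a measurement lying more than k predictive std devs from the mean."""
    mu, sd = predict_with_uncertainty(x, mean, cov, beta)
    return abs(measured - mu) > k * sd


# Toy, noise-free training data (illustrative only): y = 1 + 2 * x.
mean, cov = fit_bayesian_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```

Refitting on a sliding window of recent samples would be one simple way to imitate the continuous retraining the abstract mentions. The threshold `k` trades missed events against false alarms.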



2019 ◽  
Author(s):  
Marina Esteban ◽  
María Peña-Chilet ◽  
Carlos Loucera ◽  
Joaquín Dopazo

Abstract Background In spite of the abundance of genomic data, predictive models that describe phenotypes as a function of gene expression or mutations are difficult to obtain because they are affected by the curse of dimensionality, given the imbalance between samples and candidate genes. This is especially severe in scenarios in which samples are difficult to obtain, such as rare diseases. Results The application of multi-output regression machine learning methodologies to predict the potential effect of external proteins over the signaling circuits that trigger Fanconi anemia related cell functionalities, inferred with a mechanistic model, allowed us to detect over 20 potential therapeutic targets. Conclusions The use of artificial intelligence methods for the prediction of potentially causal relationships between proteins of interest and cell activities related to disease phenotypes opens promising avenues for the systematic search for new targets in rare diseases.
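The abstract does not name the multi-output regression method. As a generic illustration of the technique, the sketch below fits a multi-output ridge regression in closed form, so that one model maps a feature vector to several targets at once. The interpretation of the features and targets, the toy numbers, and the penalty `lam` are all assumptions.

```python
def solve(a, b):
    """Solve a @ x = b for square a and matrix b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_multi_output_ridge(X, Y, lam=1e-8):
    """Weights W = (X^T X + lam * I)^{-1} X^T Y, one column per output."""
    d, k = len(X[0]), len(Y[0])
    xtx = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
    xty = [
        [sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(k)]
        for i in range(d)
    ]
    return solve(xtx, xty)


def predict_outputs(W, x):
    """Predict all outputs for one feature vector x."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


# Toy data (illustrative only): two features (think: activity of two
# candidate proteins) and two outputs (think: activity of two circuits).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y = [[1.0, 2.0], [3.0, -1.0], [4.0, 1.0], [5.0, 3.0]]
W = fit_multi_output_ridge(X, Y)
```

In a setting with far more candidate features than samples, a larger `lam` or a sparsity-inducing model would be needed. The tiny penalty here only keeps the toy system well conditioned.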



10.2196/19348 ◽  
2020 ◽  
Vol 7 (9) ◽  
pp. e19348
Author(s):  
Michael Leo Birnbaum ◽  
Prathamesh "Param" Kulkarni ◽  
Anna Van Meter ◽  
Victor Chen ◽  
Asra F Rizvi ◽  
...  

Background Psychiatry is nearly entirely reliant on patient self-reporting, and there are few objective and reliable tests or sources of collateral information available to help diagnostic and assessment procedures. Technology offers opportunities to collect objective digital data to complement patient experience and facilitate more informed treatment decisions. Objective We aimed to develop computational algorithms based on internet search activity designed to support diagnostic procedures and relapse identification in individuals with schizophrenia spectrum disorders. Methods We extracted 32,733 time-stamped search queries across 42 participants with schizophrenia spectrum disorders and 74 healthy volunteers between the ages of 15 and 35 (mean 24.4 years, 44.0% male), and built machine-learning diagnostic and relapse classifiers utilizing the timing, frequency, and content of online search activity. Results Classifiers predicted a diagnosis of schizophrenia spectrum disorders with an area under the curve value of 0.74 and predicted a psychotic relapse in individuals with schizophrenia spectrum disorders with an area under the curve of 0.71. Compared with healthy participants, those with schizophrenia spectrum disorders made fewer searches and their searches consisted of fewer words. Prior to a relapse hospitalization, participants with schizophrenia spectrum disorders were more likely to use words related to hearing, perception, and anger, and were less likely to use words related to health. Conclusions Online search activity holds promise for gathering objective and easily accessed indicators of psychiatric symptoms. Utilizing search activity as collateral behavioral health information would represent a major advancement in efforts to capitalize on objective digital data to improve mental health monitoring.
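The reported values of 0.74 and 0.71 are areas under the ROC curve. As a short illustration of what that number measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as half. The labels and scores are toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (illustrative only): 1 = case, 0 = healthy volunteer.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so values near 0.7 indicate a useful but far from decisive signal.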









Geophysics ◽  
2021 ◽  
pp. 1-69
Author(s):  
Jorge Nustes Andrade ◽  
Mirko van der Baan

The spatiotemporal distribution of hydraulic fracturing microseismicity is complicated and depends on various mechanical and diffusional parameters. Hydraulic fracture modeling can aid in understanding fracture propagation and microseismicity. Nevertheless, the complex spatial and temporal interaction of several processes occurring within and around the fracture represents a challenge for developing real-time tools for microseismic prediction. Two approaches were developed to forecast the microseismic cloud size in real time. The first approach uses fracture propagation models to derive the cloud size directly from the microseismic observations. The second approach is based on a convolutional neural network (CNN) trained with the engineering parameters and past microseismic cloud size values. A rolling-forecasting strategy is employed to train consecutive CNN models in real time to make predictions at a specified time lag. A data augmentation technique known as double noise injection is used to ensure that the number of training examples available to the machine learning models at each time step is similar to or larger than the number of free parameters. Results show that the CNN produces higher-quality predictions than the physics-based models but with a reduced prediction capability. The physics-based approach can predict growth at any time but ignores the engineering parameters. In addition, the physics-based methods lead to real-time insights into the fracturing regime, revealing whether microseismicity is most likely generated in a leak-off-dominated or a storage-dominated regime. The CNN model can forecast the cloud size only at a single future time lag while using the engineering parameters and past cloud growth as input. However, this approach does not provide a physical interpretation of the fracture propagation regime. The prediction accuracy of both methodologies varies depending on the microseismic behavior. We postulate that the CNN forecasts could be improved by including more physical constraints in the predictive model.
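The abstract names double noise injection without defining it. The sketch below assumes it means adding random noise to both the inputs and the targets of each training example, and shows how many noisy copies would be needed for the example count to reach a given number of free parameters. The function names, noise levels, and toy values are hypothetical.

```python
import math
import random


def copies_needed(n_examples, n_parameters):
    """Noisy copies of the dataset required so that the total number of
    examples (originals plus copies) is at least n_parameters."""
    return max(0, math.ceil(n_parameters / n_examples) - 1)


def augment_with_noise(inputs, targets, copies, input_sigma, target_sigma, seed=0):
    """Return the original examples followed by `copies` noisy replicas.

    Gaussian noise is added to every input value and to every target
    (the assumed meaning of "double" noise injection).
    """
    rng = random.Random(seed)
    aug_inputs = [list(row) for row in inputs]
    aug_targets = list(targets)
    for _ in range(copies):
        for row, target in zip(inputs, targets):
            aug_inputs.append([v + rng.gauss(0.0, input_sigma) for v in row])
            aug_targets.append(target + rng.gauss(0.0, target_sigma))
    return aug_inputs, aug_targets


# Toy example (illustrative only): 3 observations, a model with 10 parameters.
toy_inputs = [[1.0, 0.2], [1.5, 0.3], [2.0, 0.5]]
toy_targets = [10.0, 14.0, 19.0]
n_copies = copies_needed(len(toy_inputs), 10)
aug_inputs, aug_targets = augment_with_noise(
    toy_inputs, toy_targets, n_copies, input_sigma=0.05, target_sigma=0.5
)
```

In a rolling-forecast setting, this augmentation would be repeated at each time step on the observations available so far, before the next model is trained.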


