Capacity estimation of batteries: Influence of training dataset size and diversity on data driven prognostic models

2021 ◽  
Vol 216 ◽  
pp. 108048
Author(s):  
Vijay Mohan Nagulapati ◽  
Hyunjun Lee ◽  
DaWoon Jung ◽  
Boris Brigljevic ◽  
Yunseok Choi ◽  
...  
Energy and AI ◽  
2021 ◽  
pp. 100089
Author(s):  
Vijay Mohan Nagulapati ◽  
Hyunjun Lee ◽  
DaWoon Jung ◽  
SalaiSargunan S. Paramanantham ◽  
Boris Brigljevic ◽  
...  

2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Kara-Louise Royle ◽  
David A. Cairns

Abstract
Background: The United Kingdom Myeloma Research Alliance (UK-MRA) Myeloma Risk Profile is a prognostic model for overall survival. It was trained and tested on clinical trial data, aiming to improve the stratification of transplant ineligible (TNE) patients with newly diagnosed multiple myeloma. Missing data is a common problem which affects the development and validation of prognostic models, where decisions on how to address missingness have implications on the choice of methodology.
Methods:
Model building: The training and test datasets were the TNE pathways from two large randomised, multicentre, phase III clinical trials. Potential prognostic factors were identified by expert opinion. Missing data in the training dataset was imputed using multiple imputation by chained equations. Univariate analysis fitted Cox proportional hazards models in each imputed dataset, with the estimates combined by Rubin’s rules. Multivariable analysis applied penalised Cox regression models, with a fixed penalty term across the imputed datasets. The estimates from each imputed dataset and bootstrap standard errors were combined by Rubin’s rules to define the prognostic model.
Model assessment: Calibration was assessed by visualising the observed and predicted probabilities across the imputed datasets. Discrimination was assessed by combining the prognostic separation D-statistic from each imputed dataset by Rubin’s rules.
Model validation: The D-statistic was applied in a bootstrap internal validation process in the training dataset and an external validation process in the test dataset, where acceptable performance was pre-specified.
Development of risk groups: Risk groups were defined using the tertiles of the combined prognostic index, obtained by combining the prognostic index from each imputed dataset by Rubin’s rules.
Results: The training dataset included 1852 patients, 1268 (68.47%) with complete case data. Ten imputed datasets were generated. Five hundred twenty patients were included in the test dataset. The D-statistic for the prognostic model was 0.840 (95% CI 0.716–0.964) in the training dataset and 0.654 (95% CI 0.497–0.811) in the test dataset, and the corrected D-statistic was 0.801.
Conclusion: The decision to impute missing covariate data in the training dataset influenced the methods implemented to train and test the model. To extend current literature and aid future researchers, we have presented a detailed example of one approach. Whilst our example is not without limitations, a benefit is that all of the patient information available in the training dataset was utilised to develop the model.
Trial registration: Both trials were registered; Myeloma IX-ISRCTN68454111, registered 21 September 2000. Myeloma XI-ISRCTN49407852, registered 24 June 2009.
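The pooling step described above, combining per-imputation estimates and their standard errors by Rubin's rules, can be sketched in a few lines of Python. The function and the toy inputs below are illustrative assumptions only; they do not come from the trial data.

```python
import numpy as np

def pool_rubins_rules(estimates, std_errors):
    """Pool per-imputation estimates and standard errors by Rubin's rules.

    estimates, std_errors: one value per imputed dataset (length m).
    Returns the pooled estimate and its pooled standard error.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    m = len(estimates)

    pooled_estimate = estimates.mean()                  # combined point estimate
    within_var = (std_errors ** 2).mean()               # mean within-imputation variance
    between_var = estimates.var(ddof=1)                 # between-imputation variance
    total_var = within_var + (1 + 1 / m) * between_var  # Rubin's total variance
    return pooled_estimate, np.sqrt(total_var)

# Hypothetical log hazard ratios for one covariate from m = 10 imputed datasets
log_hr = [0.42, 0.39, 0.45, 0.41, 0.44, 0.40, 0.43, 0.38, 0.46, 0.42]
se = [0.11, 0.12, 0.10, 0.11, 0.12, 0.11, 0.10, 0.12, 0.11, 0.11]
beta, se_pooled = pool_rubins_rules(log_hr, se)
print(f"pooled log HR = {beta:.3f}, SE = {se_pooled:.3f}")
```

The same pooling rule is applied in the abstract to the D-statistic and to the prognostic index across the ten imputed datasets.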


Author(s):  
Ramin Bostanabad ◽  
Yu-Chin Chan ◽  
Liwei Wang ◽  
Ping Zhu ◽  
Wei Chen

Abstract Our main contribution is to introduce a novel method for Gaussian process (GP) modeling of massive datasets. The key idea is to build an ensemble of independent GPs that use the same hyperparameters but distribute the entire training dataset among themselves. This is motivated by our observation that estimates of the GP hyperparameters change negligibly once the size of the training data exceeds a certain level, which can be found in a systematic way. For inference, the predictions from all GPs in the ensemble are pooled, so that the entire training dataset is exploited efficiently. We name our modeling approach globally approximate Gaussian process (GAGP), which, unlike most large-scale supervised learners such as neural networks and trees, is easy to fit and offers interpretable model behavior. These features make it particularly useful in engineering design with big data. We use analytical examples to demonstrate that GAGP achieves very high predictive power that matches or exceeds that of state-of-the-art machine learning methods. We illustrate the application of GAGP in engineering design with a problem on data-driven metamaterials design, where it is used to link reduced-dimension geometrical descriptors of unit cells to their properties. Searching for new unit cell designs with desired properties is then accomplished by employing GAGP in inverse optimization.
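As a rough illustration of the ensemble idea, estimating hyperparameters once on a subsample, freezing them, fitting independent GPs on disjoint partitions, and pooling their predictions, the sketch below uses scikit-learn. The subsample size, the partitioning, and the precision-weighted pooling rule are assumptions made for this example, not necessarily the paper's exact choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_gagp(X, y, n_hyper=1000, n_partitions=8, seed=0):
    """Sketch of a globally approximate GP: shared hyperparameters, partitioned data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))

    # 1) Estimate hyperparameters once on a subsample (they stabilise beyond a certain size).
    gp0 = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp0.fit(X[idx[:n_hyper]], y[idx[:n_hyper]])
    fixed_kernel = gp0.kernel_  # fitted kernel, hyperparameters now frozen

    # 2) Fit independent GPs, each on one partition of the full training set,
    #    reusing the frozen kernel (optimizer=None skips re-optimisation).
    ensemble = []
    for part in np.array_split(idx, n_partitions):
        gp = GaussianProcessRegressor(kernel=fixed_kernel, optimizer=None, normalize_y=True)
        gp.fit(X[part], y[part])
        ensemble.append(gp)
    return ensemble

def predict_gagp(ensemble, X_new):
    """Pool member predictions with a precision-weighted average (one possible rule)."""
    means, stds = zip(*(gp.predict(X_new, return_std=True) for gp in ensemble))
    means = np.array(means)
    precisions = 1.0 / np.maximum(np.array(stds), 1e-12) ** 2
    return (precisions * means).sum(axis=0) / precisions.sum(axis=0)

# Illustrative usage on synthetic data
X = np.random.rand(5000, 4)
y = np.sin(X).sum(axis=1) + 0.05 * np.random.randn(len(X))
ensemble = fit_gagp(X, y)
y_hat = predict_gagp(ensemble, X[:5])
```

Because each ensemble member only inverts a small covariance matrix, the cost grows roughly linearly with the number of partitions rather than cubically with the full dataset size.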


2020 ◽  
Author(s):  
Bora Shehu ◽  
Malkin Gerchow ◽  
Uwe Haberlandt

Short-term forecasting of rainfall intensities at fine temporal and spatial resolutions has always been challenging due to the unpredictable nature of rainfall. Commonly at such scales, radar data are employed to track rainfall storms and extrapolate them into the future. For very short lead times, Lagrangian persistence can produce reliable results up to about 20 min, whilst for longer lead times hybrid models are necessary in order to account for the birth, death and non-linear transformations of storms, which might increase the predictability of rainfall. Recently, data-driven techniques have been gaining popularity due to their high learning capacity, although their performance depends strongly on the size of the training dataset and they do not include any physical background. Thus, the aim of this study is to investigate whether data-driven techniques can increase the predictability of rainfall forecasts at very fine scales.

For this purpose, a deep convolutional artificial neural network (CNN) is employed to predict rainfall intensities at 5 min and 1 km² resolution for the Hannover radar range area at lead times from 5 min to 3 hours. The deep CNN is trained for each lead time based on a past window of 15 minutes. The training dataset consists of 93 events (convective, stratiform and mixed) from the period 2003-2012, and the validation dataset of 17 convective events from the period 2013-2018. The performance is assessed by computing the correlation and the root mean square error between the forecast fields and the observed radar fields, and is compared against the performance of an existing Lagrangian nowcast method, the Lucas-Kanade optical flow. Special attention is given to the quality of the radar input by using a product that merges radar and gauge data (100 recording stations are used) instead of the raw radar data.

The results of this study reveal that the deep CNN is able to learn complex relationships and improve the nowcast for short lead times. However, there is a limit that the CNN cannot surpass; for those lead times, blending the radar-based nowcast with numerical weather prediction (NWP) might be more desirable. Moreover, since most urban models are validated on gauge observations, forecasting on merged data yields more reliable results for urban flood forecasting, as the forecast agrees better with the gauge observations.
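A minimal sketch of the kind of model described, a CNN that maps a 15-minute window of three 5-minute radar frames onto the rainfall field at one lead time, together with the correlation and RMSE verification mentioned above, might look as follows in PyTorch. The architecture, layer sizes, and dummy data are illustrative assumptions, not the network used in the study.

```python
import torch
import torch.nn as nn

class NowcastCNN(nn.Module):
    """Illustrative CNN: three past 5-min radar frames in, one rainfall field out."""
    def __init__(self, in_frames=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),
            nn.ReLU(),  # rainfall intensities are non-negative
        )

    def forward(self, x):       # x: (batch, 3, H, W)
        return self.net(x)      # forecast: (batch, 1, H, W)

# One model would be trained per lead time, as in the study; training loop omitted.
model = NowcastCNN()
past_window = torch.rand(8, 3, 128, 128)      # dummy batch of radar frames
with torch.no_grad():
    forecast = model(past_window)

# Verification as in the abstract: correlation and RMSE against the observed field.
observed = torch.rand(8, 1, 128, 128)          # dummy observed radar field
rmse = torch.sqrt(torch.mean((forecast - observed) ** 2))
corr = torch.corrcoef(torch.stack([forecast.flatten(), observed.flatten()]))[0, 1]
print(f"RMSE = {rmse:.3f}, correlation = {corr:.3f}")
```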


2009 ◽  
Author(s):  
Thach Nguyen Huy ◽  
Sombut Foitong ◽  
Sornchai Udomthanapong ◽  
Ouen Pinngern ◽  
Sio-Iong Ao ◽  
...  

GigaScience ◽  
2021 ◽  
Vol 10 (11) ◽  
Author(s):  
Dominic Cushnan ◽  
Oscar Bennett ◽  
Rosalind Berka ◽  
Ottavia Bertolli ◽  
Ashwin Chopra ◽  
...  

Abstract
Background: The National COVID-19 Chest Imaging Database (NCCID) is a centralized database containing mainly chest X-rays and computed tomography scans from patients across the UK. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and the development of machine learning technologies that will improve care for patients hospitalized with a severe COVID-19 infection. This article introduces the training dataset, including a snapshot analysis covering the completeness of clinical data and the availability of image data for the various use-cases (diagnosis, prognosis, longitudinal risk). An additional cohort analysis measures how well the NCCID represents the wider COVID-19–affected UK population in terms of geographic, demographic, and temporal coverage.
Findings: The NCCID offers high-quality DICOM images acquired across a variety of imaging machinery; multiple time points, including historical images, are available for a subset of patients. This volume and variety make the database well suited to the development of diagnostic/prognostic models for COVID-associated respiratory conditions. Historical images and clinical data may aid long-term risk stratification, particularly as the availability of comorbidity data increases through linkage to other resources. The cohort analysis revealed good alignment to general UK COVID-19 statistics for some categories, e.g., sex, whilst identifying areas for improvements to data collection methods, particularly geographic coverage.
Conclusion: The NCCID is a growing resource that provides researchers with a large, high-quality database that can be leveraged both to support the response to the COVID-19 pandemic and as a test bed for building clinically viable medical imaging models.
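As a rough illustration of the kind of completeness snapshot described above, the following pandas sketch computes the fraction of non-missing values per clinical field. The field names and toy records are hypothetical; they are not the NCCID schema.

```python
import pandas as pd

# Hypothetical extract of clinical metadata; column names are illustrative only.
clinical = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3", "p4"],
    "sex":        ["F",  "M",  None, "F"],
    "age":        [67,   54,   71,   None],
    "pcr_result": ["positive", None, "positive", "negative"],
})

# Per-field completeness, analogous to the snapshot analysis described above.
completeness = clinical.notna().mean().sort_values(ascending=False)
print(completeness.round(2))
```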


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6856
Author(s):  
Muhammad Rafiqul Islam ◽  
Manoranjan Paul

Video analytics and computer vision applications face challenges when using video sequences with low visibility. The visibility of a video sequence is degraded when the sequence is affected by atmospheric interference such as rain. Many approaches have been proposed to remove rain streaks from video sequences. Some are based on physical features, and some are based on data-driven (i.e., deep-learning) models. Although the physical-features-based approaches offer better rain interpretability, the challenge lies in extracting the appropriate features and fusing them for meaningful rain removal, as rain streaks and moving objects have dynamic physical characteristics and are difficult to distinguish. Additionally, the outcome of data-driven models depends largely on the variations covered by the training dataset, and it is difficult to include datasets with all possible variations in model training. This paper addresses both issues and proposes a novel hybrid technique in which we extract physical features and data-driven features and then combine them into an effective rain-streak removal strategy. The performance of the proposed algorithm has been tested against several relevant and contemporary methods using benchmark datasets. The experimental results show that the proposed method outperforms the other methods in terms of subjective, objective, and object detection comparisons for both synthetic and real rain scenarios, removing rain streaks while retaining the moving objects more effectively.
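The "objective" comparison mentioned above is commonly reported with full-reference image quality metrics such as PSNR and SSIM on synthetic rain, where a rain-free ground truth exists. The abstract does not name the exact metrics, so the snippet below is only an assumed illustration using scikit-image, with random arrays standing in for real video frames.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Dummy frames: a rain-free ground truth and a slightly perturbed "derained" output.
ground_truth = np.random.rand(240, 320).astype(np.float32)
derained = np.clip(ground_truth + 0.01 * np.random.randn(240, 320), 0, 1).astype(np.float32)

psnr = peak_signal_noise_ratio(ground_truth, derained, data_range=1.0)
ssim = structural_similarity(ground_truth, derained, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
```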

