Data-driven components in a model of inner-shelf sorted bedforms: a new hybrid model

2014 ◽  
Vol 2 (1) ◽  
pp. 67-82 ◽  
Author(s):  
E. B. Goldstein ◽  
G. Coco ◽  
A. B. Murray ◽  
M. O. Green

Abstract. Numerical models rely on the parameterization of processes that often lack a deterministic description. In this contribution we demonstrate the applicability of using machine learning, a class of optimization tools from the discipline of computer science, to develop parameterizations when extensive data sets exist. We develop a new predictor for near-bed suspended sediment reference concentration under unbroken waves using genetic programming, a machine learning technique. We demonstrate that this newly developed parameterization performs as well or better than existing empirical predictors, depending on the chosen error metric. We add this new predictor into an established model for inner-shelf sorted bedforms. Additionally we incorporate a previously reported machine-learning-derived predictor for oscillatory flow ripples into the sorted bedform model. This new "hybrid" sorted bedform model, whereby machine learning components are integrated into a numerical model, demonstrates a method of incorporating observational data (filtered through a machine learning algorithm) directly into a numerical model. Results suggest that the new hybrid model is able to capture dynamics previously absent from the model – specifically, two observed pattern modes of sorted bedforms. Lastly we discuss the challenge of integrating data-driven components into morphodynamic models and the future of hybrid modeling.
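The metric dependence noted above (a predictor can rank best or worst depending on the chosen error metric) can be illustrated with a toy comparison; the two "predictors" and the data below are synthetic stand-ins, not the paper's reference-concentration predictors:

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.lognormal(mean=0.0, sigma=1.0, size=200)  # synthetic "observations"
pred_a = observed * rng.normal(1.0, 0.30, size=200)      # unbiased but scattered
pred_b = observed * 1.2                                  # biased but tight

def rmse(p, o):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((p - o) ** 2)))

def bias(p, o):
    """Mean signed error."""
    return float(np.mean(p - o))

# The ranking of the two predictors can differ between the two metrics,
# which is why "as well or better ... depending on the chosen error metric"
# is a meaningful qualification.
scores = {"A": {"rmse": rmse(pred_a, observed), "bias": bias(pred_a, observed)},
          "B": {"rmse": rmse(pred_b, observed), "bias": bias(pred_b, observed)}}
```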

2013 ◽  
Vol 1 (1) ◽  
pp. 531-569 ◽  
Author(s):  
E. B. Goldstein ◽  
G. Coco ◽  
A. B. Murray ◽  
M. O. Green

Abstract. Numerical models rely on the parameterization of processes that often lack a deterministic description. In this contribution we demonstrate the applicability of using machine learning, optimization tools from the discipline of computer science, to develop parameterizations when extensive data sets exist. We develop a new predictor for near-bed suspended sediment reference concentration under unbroken waves using genetic programming, a machine learning technique. This newly developed parameterization performs better than existing empirical predictors. We add this new predictor into an established model for inner-shelf sorted bedforms. Additionally we incorporate a previously reported machine-learning-derived predictor for oscillatory flow ripples into the sorted bedform model. This new "hybrid" sorted bedform model, whereby machine learning components are integrated into a numerical model, demonstrates a method of incorporating observational data (filtered through a machine learning algorithm) directly into a numerical model. Results suggest that the new hybrid model is able to capture dynamics previously absent from the model, specifically, the two observed pattern modes of sorted bedforms. However, caveats exist when data-driven components do not have parity with traditional theoretical components of morphodynamic models, and we discuss the challenges of integrating these disparate pieces and the future of this type of modeling.


2020 ◽  
Author(s):  
Julien Brajard ◽  
Alberto Carrassi ◽  
Marc Bocquet ◽  
Laurent Bertino

Can we build a machine learning parametrization in a numerical model using sparse and noisy observations?

In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, ML is trained by coarse-graining high-resolution simulations to provide a dense, noise-free target state (or even the tendency of the model).

Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrizations using direct data. Furthermore, we intentionally place ourselves in the realistic scenario of noisy and sparse observations.

The algorithm proposed in this work derives from the algorithm presented by the same authors in https://arxiv.org/abs/2001.01520. The principle is to first apply data assimilation (DA) techniques to estimate the full state of the system from a non-parametrized model, referred to hereafter as the physical model. The parametrization term to be estimated is viewed as a model error in the DA system. In a second step, ML is used to define the parametrization, i.e., a predictor of the model error given the state of the system. Finally, the ML system is incorporated within the physical model to produce a hybrid model, combining a physical core with an ML-based parametrization.

The approach is applied to dynamical systems of low to intermediate complexity. The DA component of the proposed approach relies on an ensemble Kalman filter/smoother, while the parametrization is represented by a convolutional neural network.

We show that the hybrid model yields better performance than the physical model in terms of both short-term (forecast skill) and long-term (power spectrum, Lyapunov exponents) properties. Sensitivity to the noise and density of observations is also assessed.
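The two-step procedure described above can be sketched in miniature. Everything here is a hypothetical stand-in: a toy physical core and unresolved term, a perturbed true state mimicking the DA estimate, and a quadratic polynomial fit standing in for the convolutional neural network:

```python
import numpy as np

rng = np.random.default_rng(1)

def physical(x):
    """Non-parametrized 'physical' core (hypothetical)."""
    return -0.5 * x

def unresolved(x):
    """The process the physical model is missing (hypothetical)."""
    return 0.1 * x ** 2

# Step 1: diagnose the model error, i.e. the mismatch between the true
# tendency and the physical core. In the paper this state/error estimate
# comes from an ensemble Kalman filter/smoother applied to sparse, noisy
# observations; here we simply perturb the true state to mimic it.
dt, x = 0.01, 1.0
states, errors = [], []
for _ in range(500):
    true_tendency = physical(x) + unresolved(x)
    x_est = x + rng.normal(0.0, 1e-3)          # DA-estimated state (mimicked)
    states.append(x_est)
    errors.append(true_tendency - physical(x_est))
    x += dt * true_tendency

# Step 2: fit the parametrization to the diagnosed error.
coeffs = np.polyfit(states, errors, deg=2)

def hybrid_tendency(x):
    """Physical core plus the learned parametrization."""
    return physical(x) + np.polyval(coeffs, x)
```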


Author(s):  
Tatsuya Kaneko ◽  
Ryota Wada ◽  
Masahiko Ozaki ◽  
Tomoya Inoue

Offshore drilling with a drill string over 10,000 m long poses many technical challenges. Among them, keeping the weight on bit (WOB) within a certain range is essential for the integrity of the drill pipes and the efficiency of the drilling operation. Since the WOB cannot be monitored directly during drilling, the tension at the top of the drill string is used as an indicator of the WOB. However, the WOB and the surface-measured tension are known to show different features. The deviation between the two is due to the dynamic longitudinal behavior of the drill string, which becomes stronger as the drill string gets longer and more elastic. One feature of the difference is related to the occurrence of high-frequency oscillation. We have analyzed the longitudinal behavior of the drill string with a lumped-mass model and captured the descriptive behavior of such phenomena. However, such physics-based models are not sufficient for real-time operation. There are many unknown parameters that need to be tuned to fit the actual operating conditions. In addition, the huge and complex drilling system will have non-linear behavior, especially near the drilling annulus. These features will only be captured in the data obtained during operation. The proposed hybrid model is a combination of physics-based models and data-driven models. The basic idea is to utilize data-driven techniques to integrate the data obtained during operation into the physics-based model. There are many options for how far the data-driven techniques are integrated into the physics-based model. For example, we have been successful in estimating the WOB from the surface-measured tension and the displacement of the drill string top with only recurrent neural networks (RNNs), provided we have enough WOB data. The lack of WOB measurements cannot be avoided, so the amount of data needs to be increased by utilizing results from physics-based numerical models.
The aim of this research is to find a good combination of the two models. In this paper, we discuss several hybrid model configurations and their performance.
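As a rough illustration of the data-driven WOB estimation idea (not the authors' RNN; here a windowed linear regression on synthetic tension and displacement signals stands in for it, consuming the same sliding-window information a recurrent network would see):

```python
import numpy as np

rng = np.random.default_rng(2)
n, window = 2000, 20
tension = np.cumsum(rng.normal(0.0, 1.0, n))   # synthetic surface tension record
disp = np.cumsum(rng.normal(0.0, 1.0, n))      # synthetic top displacement record
# Synthetic "true" WOB: a lagged combination of the two surface signals,
# mimicking the delayed longitudinal response of a long elastic string.
wob = 0.6 * np.roll(tension, 5) + 0.4 * np.roll(disp, 3)

# Each training sample sees the last `window` values of both signals.
X = np.stack([np.concatenate([tension[i - window:i], disp[i - window:i]])
              for i in range(window, n)])
y = wob[window:]

w, *_ = np.linalg.lstsq(X, y, rcond=None)      # train the linear surrogate
wob_hat = X @ w                                # estimated WOB
```

In practice the WOB labels needed for such supervised training are scarce, which is the motivation given above for augmenting the data with physics-based simulation results.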


2021 ◽  
Vol 22 (Supplement_2) ◽  
Author(s):  
F Ghanbari ◽  
T Joyce ◽  
S Kozerke ◽  
AI Guaricci ◽  
PG Masci ◽  
...  

Abstract Funding Acknowledgements Type of funding sources: Other. Main funding source(s): J. Schwitter receives research support from "Bayer Schweiz AG". C.N.C. received a grant from Siemens. Gianluca Pontone received institutional fees from General Electric, Bracco, Heartflow, Medtronic, and Bayer. U.J.S. received grants from Astellas, Bayer, and General Electric. This work was supported by the Italian Ministry of Health, Rome, Italy (RC 2017 R659/17-CCM698). This work was supported by Gyrotools, Zurich, Switzerland. Background  Late gadolinium enhancement (LGE) scar quantification is generally recognized as an accurate and reproducible technique, but it is observer-dependent and time-consuming. Machine learning (ML) potentially offers a solution to this problem.  Purpose  To develop and validate an ML algorithm for scar quantification that fully avoids observer variability, and to apply this algorithm to the prospective international multicentre Derivate cohort. Method  The Derivate Registry collected heart failure patients with LV ejection fraction <50% in 20 European and US centres. In the post-myocardial infarction patients (n = 689), the quality of the LGE short-axis breath-hold images was determined (good, acceptable, sufficient, borderline, poor, excluded) and ground truth (GT) was produced (endo-epicardial contours, 2 remote reference regions, artefact elimination) to determine the mass of non-infarcted myocardium and of dense (≥5 SD above mean-remote) and non-dense scar (>2 SD to <5 SD above mean-remote). Data were divided into the learning set (total n = 573; training: n = 289; testing: n = 284) and the validation set (n = 116). A Ternaus network (loss function = average of Dice and binary cross-entropy) produced 4 outputs (initial prediction, test-time augmentation (TTA), threshold-based prediction (TB), and TTA + TB) representing normal myocardium, non-dense, and dense scar (Figure 1). Outputs were evaluated by Dice metrics, Bland-Altman analysis, and correlations.
Results  In the validation and test data sets, both not used for training, the dense scar GT was 20.8 ± 9.6% and 21.9 ± 13.3% of LV mass, respectively. The TTA-network yielded the best results with small biases vs GT (-2.2 ± 6.1%, p < 0.02; -1.7 ± 6.0%, p < 0.003, respectively) and 95%CI vs GT in the range of inter-human comparisons, i.e. TTA yielded SD of the differences vs GT in the validation and test data of 6.1 and 6.0 percentage points (%p), respectively (Fig 2), which was comparable to the 7.7%p for the inter-observer comparison (n = 40). For non-dense scar, TTA performance was similar with small biases (-1.9 ± 8.6%, p < 0.0005, -1.4 ± 8.2%, p < 0.0001, in the validation and test sets, respectively, GT 39.2 ± 13.8% and 42.1 ± 14.2%) and acceptable 95%CI with SD of the differences of 8.6 and 8.2%p for TTA vs GT, respectively, and 9.3%p for inter-observer.  Conclusions  In the large Derivate cohort from 20 centres, performance of the presented ML-algorithm to quantify dense and non-dense scar fully automatically is comparable to that of experienced humans with small bias and acceptable 95%-CI. Such a tool could facilitate scar quantification in clinical routine as it eliminates human observer variability and can handle large data sets.
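The thresholding rules and the Dice metric quoted in this abstract can be written out directly; the arrays below are synthetic, and the ≥5 SD / >2 SD cut-offs relative to the mean of remote myocardium are taken from the Method section above:

```python
import numpy as np

def classify_scar(intensity, remote):
    """Dense scar: >=5 SD above mean-remote; non-dense: >2 SD and <5 SD."""
    mu, sd = remote.mean(), remote.std()
    dense = intensity >= mu + 5 * sd
    non_dense = (intensity > mu + 2 * sd) & (intensity < mu + 5 * sd)
    return dense, non_dense

def dice(a, b):
    """Dice overlap between two boolean masks (used to evaluate outputs)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())
```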


1980 ◽  
Vol 1 (17) ◽  
pp. 142
Author(s):  
D. Prandle ◽  
E.R. Funke ◽  
N.L. Crookshank ◽  
R. Renner

The use of array processors for the numerical modelling of estuarine systems is discussed here in the context of "hybrid modelling"; however, it is shown that array processors may be used to advantage in independent numerical simulations. Hybrid modelling of tidal estuaries was first introduced by Holz (1977) and later by Funke and Crookshank (1978). In a hybrid model, tidal propagation in an estuary is simulated by dynamically linking a hydraulic (or physical) scale model of part of the estuary to a numerical model of the remaining part in a manner such that a free interchange of flow occurs at the interface(s). Typically, the elevation of the water surface at the boundary of the scale model is measured and transmitted to the numerical model. In return, the flow computed at the boundary of the numerical model is fed directly into the scale model. This approach enables the extent of the scale model to be limited to the area of immediate interest (or to that area where flow conditions are such that they can be most accurately simulated by a scale model). In addition, since the region simulated by the numerical model can be extended almost indefinitely, the problems of spurious reflections from downstream boundaries can be eliminated. In normal use, numerical models are evaluated on the basis of computing requirements, cost and accuracy. The computer time required to simulate one tide cycle is, in itself, seldom of interest except in so far as it affects the above criteria. However, in hybrid modelling this parameter is often paramount since concurrent operation of the numerical and scale models requires that the former must keep pace with the latter. The earlier hybrid model of the St. Lawrence (Funke and Crookshank, 1978) involved a one-dimensional numerical model of the upstream regions of the river. However, future applications are likely to involve extensive two-dimensional numerical simulation.
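The exchange pattern described above (measured elevation into the numerical model, computed flow back into the scale model, in lock-step each time step) can be sketched schematically; both "models" below are trivial stand-ins, and only the coupling loop itself reflects the text:

```python
import math

def run_hybrid(n_steps, dt=1.0):
    """One coupling loop: the scale model's measured boundary elevation
    drives the numerical model, and the computed boundary flow is fed
    back into the scale model each step."""
    elevation = 0.0   # measured at the scale-model boundary
    flow = 0.0        # computed at the numerical-model boundary
    history = []
    for step in range(n_steps):
        # Numerical model advances one step with the measured elevation
        # as its boundary condition (stand-in: relaxed linear response).
        flow = 0.9 * flow + 0.1 * elevation
        # Scale model advances with the computed flow fed in at its
        # boundary (stand-in: a forced tide plus the flow contribution).
        elevation = math.sin(0.1 * step * dt) + 0.05 * flow
        history.append((elevation, flow))
    return history
```

The real-time constraint the abstract emphasizes appears here implicitly: each numerical-model step must complete within the scale model's physical time step, which is why time per simulated tide cycle becomes paramount.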


Author(s):  
Melih C. Yesilli ◽  
Firas A. Khasawneh

Abstract Data-driven model identification methods have grown increasingly popular due to enhancements in measuring devices and data mining. They provide a useful approach for comparing the performance of a device to the simplified model that was used in the design phase. One of the modern, popular methods for model identification is Sparse Identification of Nonlinear Dynamics (SINDy). Although this approach has been widely investigated in the literature, mostly using numerical models, its applicability and performance with physical systems is still a topic of current research. In this paper we extend SINDy to identify the mathematical model of a complicated physical experiment of a chaotic pendulum with a varying potential interaction. We also test the approach using a simulated model of a nonlinear, simple pendulum. The input to the approach is a time series and estimates of its derivatives. While the standard approach in SINDy is to use Total Variation Regularization (TVR) for derivative estimates, we show some caveats of using this route, and we benchmark the performance of TVR against other methods for derivative estimation. Our results show that the estimated model coefficients and their resulting fit are sensitive to the selection of the TVR parameters, and that most of the available derivative estimation methods are easier to tune than TVR. We also highlight other guidelines for utilizing SINDy to avoid overfitting, and we point out that the fitted model may not yield accurate results over long time scales. We test the performance of each method for noisy data sets and provide both experimental and simulation results. We also post the files needed to build and reproduce our experiment in a public repository.
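The core of SINDy, sequentially thresholded least squares over a library of candidate functions, can be sketched on a simulated simple pendulum like the one mentioned above. Note this sketch uses exact derivatives, whereas the paper's point is that derivative estimation (e.g. by TVR) is the sensitive step:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = rng.uniform(-1.5, 1.5, 400)      # sampled pendulum angles
omega = rng.uniform(-2.0, 2.0, 400)      # sampled angular velocities
omega_dot = -9.81 * np.sin(theta)        # exact derivative: w' = -(g/L) sin(th), L = 1

# Candidate function library evaluated on the data: [1, th, w, sin th, cos th]
library = np.column_stack([np.ones_like(theta), theta, omega,
                           np.sin(theta), np.cos(theta)])

def stls(lib, dxdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares, the sparsification core of SINDy."""
    xi, *_ = np.linalg.lstsq(lib, dxdt, rcond=None)
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                  # prune small coefficients
        if (~small).any():               # refit on the surviving terms
            xi[~small], *_ = np.linalg.lstsq(lib[:, ~small], dxdt, rcond=None)
    return xi

xi = stls(library, omega_dot)   # sparse coefficients; only sin(theta) should survive
```

With noisy data and estimated derivatives, the recovered coefficients degrade, which is exactly the sensitivity to TVR parameters the abstract reports.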


2020 ◽  
Vol 12 (1) ◽  
pp. 8
Author(s):  
Brandon Hansen ◽  
Cody Coleman ◽  
Yi Zhang ◽  
Maria Seale

The manner in which a prognostics problem is framed is critical for enabling its solution by the proper method. Recently, data-driven prognostics techniques have demonstrated enormous potential when used alone, or as part of a hybrid solution in conjunction with physics-based models. Historical maintenance data constitutes a critical element for the use of a data-driven approach to prognostics, such as supervised machine learning. The historical data is used to create training and testing data sets to develop the machine learning model. Categorical classes for prediction are required for machine learning methods; however, faults of interest in US Army Ground Vehicle Maintenance Records appear as natural language text descriptions rather than a finite set of discrete labels. Transforming linguistically complex data into a set of prognostics classes is necessary for utilizing supervised machine learning approaches for prognostics. Manually labeling fault description instances is effective, but extremely time-consuming; thus, an automated approach to labeling is preferred. The approach described in this paper examines key aspects of the fault text relevant to enabling automatic labeling. A method was developed based on the hypothesis that a given fault description could be generalized into a category. This method uses various natural language processing (NLP) techniques and a priori knowledge of ground vehicle faults to assign classes to the maintenance fault descriptions. The core component of the method used in this paper is a Word2Vec word-embedding model. Word embeddings are used in conjunction with a token-oriented rule-based data structure for document classification. This methodology tags text with user-provided classes using a corpus of similar text fields as its training set.
With classes of faults reliably assigned to a given description, supervised machine learning with these classes can be applied using related maintenance information that preceded the fault. This method was developed for labeling US Army Ground Vehicle Maintenance Records, but is general enough to be applied to any natural language data set accompanied by a priori knowledge of its contents for consistent labeling. In addition to applications in machine learning, the generated labels are also conducive to general summarization and case-by-case analysis of faults. The maintenance components of interest for this current application are alternators and gaskets, with future development directed towards determining the remaining useful life (RUL) of these components based on the labeled data.
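The embedding-plus-rules labeling idea can be sketched with toy vectors. The vocabulary and embeddings below are made up for illustration, not a trained Word2Vec model; in the paper the vectors come from a model trained on a corpus of similar maintenance text:

```python
import numpy as np

# Stand-in word embeddings (hypothetical 2-D vectors).
emb = {"alternator": np.array([1.0, 0.1]), "charging": np.array([0.9, 0.2]),
       "gasket": np.array([0.1, 1.0]), "leak": np.array([0.2, 0.9]),
       "replace": np.array([0.5, 0.5])}
# User-provided classes, each anchored by a keyword vector.
classes = {"alternator": emb["alternator"], "gasket": emb["gasket"]}

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def label(description):
    """Average the embeddings of known tokens and pick the closest class."""
    vecs = [emb[t] for t in description.lower().split() if t in emb]
    if not vecs:
        return None
    doc = np.mean(vecs, axis=0)
    return max(classes, key=lambda c: cos(doc, classes[c]))
```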


2021 ◽  
Author(s):  
Diti Roy ◽  
Md. Ashiq Mahmood ◽  
Tamal Joyti Roy

Heart disease is the most dominant disease, causing a large number of deaths every year. A report from the WHO in 2016 showed that every year at least 17 million people die of heart disease. This number is gradually increasing day by day, and the WHO estimates that the death toll will reach 75 million by 2030. Despite modern technology and health care systems, predicting heart disease remains challenging. As machine learning algorithms are vital tools for making predictions from available data sets, we used a machine learning approach to predict heart disease. We collected data from the UCI repository. In our study, we used the Random Forest, ZeroR, Voted Perceptron, and K-Star classifiers. We obtained the best result with the Random Forest classifier, with an accuracy of 97.69%.
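A minimal sketch of the modeling setup described above, with synthetic features standing in for the UCI heart data (the feature names, labels, and accuracy here are not the study's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))               # stand-in clinical features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # stand-in disease label

# Hold out a test split and report accuracy, as the study does.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```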

