Operationalizing Heterogeneous Data-Driven Process Models for Various Industrial Sectors through Microservice-Oriented Cloud-Based Architecture

Industrial performance optimization increasingly makes the use of various analytical data-driven models. In this context, modern machine learning capabilities to predict future production quality outcomes, model predictive control to better account for complex multivariable environments of process industry, Bayesian Networks enabling improved decision support systems for diagnostics and fault detection are some of the main examples to be named. The key challenge is to integrate these highly heterogeneous models in a holistic system, which would also be suitable for applications from the most different industries. Core elements of the underlying solution architecture constitute highly decoupled model microservices, ensuring the creation of largely customizable model runtime environments. Deployment of isolated user-space instances, called containers, further extends the overall possibilities to integrate heterogeneous models. Strong requirements on high availability, scalability, and security are satisfied through the application of cloud-based services. Tieto successfully applied the outlined approach during the participation in FUture DIrections for Process industry Optimization (FUDIPO), a project funded by the European Commission under the H2020 program, SPIRE-02-2016.

Download Full-text

Data-driven versus self-similar parameterizations for Stochastic Advection by Lie Transport and Location Uncertainty

10.5194/npg-2019-54 ◽

2019 ◽

Cited By ~ 2

Author(s):

Valentin Resseguier ◽

Wei Pan ◽

Baylor Fox-Kemper

Keyword(s):

Fluid Dynamics ◽

Heterogeneous Data ◽

Data Driven ◽

Accurate Data ◽

High Quality ◽

Ensemble Forecasts ◽

Homogeneous Surface ◽

Location Uncertainty ◽

Physically Based ◽

Self Similar

Abstract. Stochastic subgrid parameterizations enable ensemble forecasts of fluid dynamics systems and ultimately accurate data assimilation. Stochastic Advection by Lie Transport (SALT) and models under Location Uncertainty (LU) are recent and similar physically-based stochastic schemes. SALT dynamics conserve helicity whereas LU models conserve kinetic energy. After highlighting general similarities between LU and SALT frameworks, this paper focuses on their common challenge: the parameterization choice. We compare uncertainty quantification skills of a stationary heterogeneous data-driven parameterization and a non-stationary homogeneous self-similar parameterization. For stationary, homogeneous Surface Quasi-Geostrophic (SQG) turbulence, both parameterizations lead to high quality ensemble forecasts. This paper also discusses a heterogeneous adaptation of the homogeneous parameterization targeted at better simulation of strong straight buoyancy fronts.

Download Full-text

Knowledge Discovery Process Models

Advances in Business Information Systems and Analytics - Business Intelligence and Agile Methodologies for Knowledge-Based Organizations ◽

10.4018/978-1-61350-050-7.ch004 ◽

2012 ◽

pp. 72-100 ◽

Cited By ~ 5

Author(s):

Mouhib Alnoukari ◽

Asim El Sheikh

Keyword(s):

Life Cycle ◽

Knowledge Discovery ◽

Process Model ◽

Common Factor ◽

Final Outcome ◽

Process Models ◽

Data Driven ◽

Discovery Process ◽

The Common ◽

Discovery Process Models

Knowledge Discovery (KD) process model was first discussed in 1989. Different models were suggested starting with Fayyad’s et al (1996) process model. The common factor of all data-driven discovery process is that knowledge is the final outcome of this process. In this chapter, the authors will analyze most of the KD process models suggested in the literature. The chapter will have a detailed discussion on the KD process models that have innovative life cycle steps. It will propose a categorization of the existing KD models. The chapter deeply analyzes the strengths and weaknesses of the leading KD process models, with the supported commercial systems and reported applications, and their matrix characteristics.

Download Full-text

Core Methodologies in Data Warehouse Design and Development

International Journal of Robotics Applications and Technologies ◽

10.4018/ijrat.2013010104 ◽

2013 ◽

Vol 1 (1) ◽

pp. 57-66

Author(s):

James Yao ◽

John Wang ◽

Qiyang Chen ◽

Ruben Xing

Keyword(s):

Data Warehouse ◽

System Development ◽

Hybrid Approach ◽

Heterogeneous Data ◽

Data Driven ◽

Decision Making Process ◽

Process Data ◽

Design And Development ◽

Heterogeneous Data Sources ◽

Warehouse Design

Data warehouse is a system which can integrate heterogeneous data sources to support the decision making process. Data warehouse design is a lengthy, time-consuming, and costly process. There has been a high failure in data warehouse development projects. Thus how to design and develop a data warehouse have become important issues for information systems designers and developers. This paper reviews and discusses some of the core data warehouse design and development methodologies in information system development. The paper presents in particular the most recent and much heated hybrid approach which is a combination of data-driven and requirement-driven approaches.

Download Full-text

Sepsis in the era of data-driven medicine: personalizing risks, diagnoses, treatments and prognoses

Briefings in Bioinformatics ◽

10.1093/bib/bbz059 ◽

2019 ◽

Vol 21 (4) ◽

pp. 1182-1195

Author(s):

Andrew C Liu ◽

Krishna Patel ◽

Ramya Dhatri Vunikili ◽

Kipp W Johnson ◽

Fahad Abdu ◽

...

Keyword(s):

At Risk ◽

Data Science ◽

Drug Repositioning ◽

Healthcare Delivery ◽

Heterogeneous Data ◽

Machine Intelligence ◽

Adverse Outcomes ◽

Data Driven ◽

Biomedical Data ◽

Prolonged Stay

Abstract Sepsis is a series of clinical syndromes caused by the immunological response to infection. The clinical evidence for sepsis could typically attribute to bacterial infection or bacterial endotoxins, but infections due to viruses, fungi or parasites could also lead to sepsis. Regardless of the etiology, rapid clinical deterioration, prolonged stay in intensive care units and high risk for mortality correlate with the incidence of sepsis. Despite its prevalence and morbidity, improvement in sepsis outcomes has remained limited. In this comprehensive review, we summarize the current landscape of risk estimation, diagnosis, treatment and prognosis strategies in the setting of sepsis and discuss future challenges. We argue that the advent of modern technologies such as in-depth molecular profiling, biomedical big data and machine intelligence methods will augment the treatment and prevention of sepsis. The volume, variety, veracity and velocity of heterogeneous data generated as part of healthcare delivery and recent advances in biotechnology-driven therapeutics and companion diagnostics may provide a new wave of approaches to identify the most at-risk sepsis patients and reduce the symptom burden in patients within shorter turnaround times. Developing novel therapies by leveraging modern drug discovery strategies including computational drug repositioning, cell and gene-therapy, clustered regularly interspaced short palindromic repeats -based genetic editing systems, immunotherapy, microbiome restoration, nanomaterial-based therapy and phage therapy may help to develop treatments to target sepsis. We also provide empirical evidence for potential new sepsis targets including FER and STARD3NL. Implementing data-driven methods that use real-time collection and analysis of clinical variables to trace, track and treat sepsis-related adverse outcomes will be key. Understanding the root and route of sepsis and its comorbid conditions that complicate treatment outcomes and lead to organ dysfunction may help to facilitate identification of most at-risk patients and prevent further deterioration. To conclude, leveraging the advances in precision medicine, biomedical data science and translational bioinformatics approaches may help to develop better strategies to diagnose and treat sepsis in the next decade.

Download Full-text

Rapid Multi-Dimensional Impact Assessment of Floods

Sustainability ◽

10.3390/su12104246 ◽

2020 ◽

Vol 12 (10) ◽

pp. 4246 ◽

Cited By ~ 1

Author(s):

David Pastor-Escuredo ◽

Yolanda Torres ◽

María Martínez-Torres ◽

Pedro J. Zufiria

Keyword(s):

Social Media ◽

Impact Assessment ◽

Rural Areas ◽

Disaster Response ◽

Heterogeneous Data ◽

Data Driven ◽

Sensor Data ◽

Mobile Phone Data ◽

Aggregated Data ◽

The Impact

Natural disasters affect hundreds of millions of people worldwide every year. The impact assessment of a disaster is key to improve the response and mitigate how a natural hazard turns into a social disaster. An actionable quantification of impact must be integratively multi-dimensional. We propose a rapid impact assessment framework that comprises detailed geographical and temporal landmarks as well as the potential socio-economic magnitude of the disaster based on heterogeneous data sources: Environment sensor data, social media, remote sensing, digital topography, and mobile phone data. As dynamics of floods greatly vary depending on their causes, the framework may support different phases of decision-making during the disaster management cycle. To evaluate its usability and scope, we explored four flooding cases with variable conditions. The results show that social media proxies provide a robust identification with daily granularity even when rainfall detectors fail. The detection also provides information of the magnitude of the flood, which is potentially useful for planning. Network analysis was applied to the social media to extract patterns of social effects after the flood. This analysis showed significant variability in the obtained proxies, which encourages the scaling of schemes to comparatively characterize patterns across many floods with different contexts and cultural factors. This framework is presented as a module of a larger data-driven system designed to be the basis for responsive and more resilient systems in urban and rural areas. The impact-driven approach presented may facilitate public–private collaboration and data sharing by providing real-time evidence with aggregated data to support the requests of private data with higher granularity, which is the current most important limitation in implementing fully data-driven systems for disaster response from both local and international actors.

Download Full-text

Semantic Knowledge Discovery and Data-Driven Logical Reasoning from Heterogeneous Data Sources

Uncertainty Reasoning for the Semantic Web III - Lecture Notes in Computer Science ◽

10.1007/978-3-319-13413-0_9 ◽

2014 ◽

pp. 163-183 ◽

Cited By ~ 1

Author(s):

Claudia d’Amato ◽

Volha Bryl ◽

Luciano Serafini

Keyword(s):

Knowledge Discovery ◽

Heterogeneous Data ◽

Semantic Knowledge ◽

Logical Reasoning ◽

Data Sources ◽

Data Driven ◽

Heterogeneous Data Sources

Download Full-text

Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00229 ◽

2013 ◽

Vol 1 ◽

pp. 301-314 ◽

Cited By ~ 2

Author(s):

Weiwei Sun ◽

Xiaojun Wan

Keyword(s):

Comparative Study ◽

State Of The Art ◽

Data Driven ◽

Dependency Parsing ◽

Transition Graph ◽

Final Model ◽

System Combination ◽

Pos Tagging ◽

Heterogeneous Models ◽

The Impact

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations. Compared to linguistic grammars learned from rich phrase-structure treebanks, well designed pseudo grammars achieve similar parsing accuracy and have equivalent contributions to parser ensemble. Moreover, pseudo grammars increase the diversity of base models; therefore, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, resulting in a significant improvement of the state of the art.

Download Full-text

New Metrics for Validation of Data-Driven Random Process Models in Uncertainty Quantification

Journal of Verification Validation and Uncertainty Quantification ◽

10.1115/1.4031813 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 4

Author(s):

Hongyi Xu ◽

Zhen Jiang ◽

Daniel W. Apley ◽

Wei Chen

Keyword(s):

Uncertainty Quantification ◽

Random Process ◽

Process Model ◽

Goodness Of Fit ◽

Experimental Tests ◽

Process Models ◽

Data Driven ◽

Gaussian Copula ◽

High Dimensional ◽

Marginal Distributions

Data-driven random process models have become increasingly important for uncertainty quantification (UQ) in science and engineering applications, due to their merit of capturing both the marginal distributions and the correlations of high-dimensional responses. However, the choice of a random process model is neither unique nor straightforward. To quantitatively validate the accuracy of random process UQ models, new metrics are needed to measure their capability in capturing the statistical information of high-dimensional data collected from simulations or experimental tests. In this work, two goodness-of-fit (GOF) metrics, namely, a statistical moment-based metric (SMM) and an M-margin U-pooling metric (MUPM), are proposed for comparing different stochastic models, taking into account their capabilities of capturing the marginal distributions and the correlations in spatial/temporal domains. This work demonstrates the effectiveness of the two proposed metrics by comparing the accuracies of four random process models (Gaussian process (GP), Gaussian copula, Hermite polynomial chaos expansion (PCE), and Karhunen–Loeve (K–L) expansion) in multiple numerical examples and an engineering example of stochastic analysis of microstructural materials properties. In addition to the new metrics, this paper provides insights into the pros and cons of various data-driven random process models in UQ.

Download Full-text

Data-Driven Adaptive Observer for Fault Diagnosis

Mathematical Problems in Engineering ◽

10.1155/2012/832836 ◽

2012 ◽

Vol 2012 ◽

pp. 1-21 ◽

Cited By ~ 90

Author(s):

Shen Yin ◽

Xuebo Yang ◽

Hamid Reza Karimi

Keyword(s):

Fault Diagnosis ◽

Process Model ◽

Large Scale ◽

Process Models ◽

Physical Models ◽

Adaptive Observer ◽

Data Driven ◽

Process Data ◽

Tank System ◽

Residual Generator

This paper presents an approach for data-driven design of fault diagnosis system. The proposed fault diagnosis scheme consists of an adaptive residual generator and a bank of isolation observers, whose parameters are directly identified from the process data without identification of complete process model. To deal with normal variations in the process, the parameters of residual generator are online updated by standard adaptive technique to achieve reliable fault detection performance. After a fault is successfully detected, the isolation scheme will be activated, in which each isolation observer serves as an indicator corresponding to occurrence of a particular type of fault in the process. The thresholds can be determined analytically or through estimating the probability density function of related variables. To illustrate the performance of proposed fault diagnosis approach, a laboratory-scale three-tank system is finally utilized. It shows that the proposed data-driven scheme is efficient to deal with applications, whose analytical process models are unavailable. Especially, for the large-scale plants, whose physical models are generally difficult to be established, the proposed approach may offer an effective alternative solution for process monitoring.

Download Full-text

A Novel HDF-Based Data Compression and Integration Approach to Support BIM-GIS Practical Applications

Advances in Civil Engineering ◽

10.1155/2020/8865107 ◽

2020 ◽

Vol 2020 ◽

pp. 1-22

Author(s):

Zeyu Pan ◽

Jianyong Shi ◽

Liu Jiang

Keyword(s):

High Efficiency ◽

Analytical Data ◽

Heterogeneous Data ◽

Information Modeling ◽

World City ◽

Integration Model ◽

Practical Applications ◽

Digital Twin ◽

Gis Integration ◽

Modeling Data

Constructing a fully mapped virtual world is the premise of establishing the digital twin system; thus, the integration of BIM (Building Information Modeling) and GIS (Geographic Information System) constitutes a fundamental part of the mature digital twin. Although quite a few theoretical unified data models of BIM and GIS integration have been presented in precedent research, the practical web-based applications and software support for real-world city information modeling are still difficult to achieve. One of the challenges is storage inefficiency. It thence leads to burdens in exchange and analysis, which hugely impede the realization of virtual-real fusion. To address that issue, in this research, we contribute to exploring an HDF- (Hierarchical Data Format-) based, innovative, and synthesized scheme with three significant technical processes. First, data reorganization trims the original data with efficient redundancy elimination and reordering for IFC-SPF (IFC STEP Physical File) and XML-encoded CityGML (City Geography Markup Language). Next, bidirectional transformation methods for BIM and GIS modeling data, images, and analytical data into HDF are proposed. They retain the entities and relationships in modeling data and can further improve the compactness and accessibility by ultimately 60%–80% which saves 497,612 KB from 565,152 KB in the test of ZhongChun transformation substation. Finally, data aggregation enhances the bond of the integrated model and heterogeneous data resources from the transformed HDF files. The case studies show that the approach proposed in this paper could reach high efficiency for practicability of the BIM + GIS integration model. This light-weight integration method can further improve the front-end service responsiveness in digital twin applications.

Download Full-text