Patient Cohort Identification on Time Series Data Using the OMOP Common Data Model

Abstract

Background: The identification of patient cohorts for recruiting patients into clinical trials requires an evaluation of study-specific inclusion and exclusion criteria. These criteria are defined in terms of corresponding clinical facts. Some of these facts may not be present in the clinical source systems and need to be calculated either in advance or at cohort query runtime (a so-called feasibility query).

Objectives: We use the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) as the repository for our clinical data. However, Atlas, the graphical user interface of OMOP, does not offer the functionality to perform calculations on facts data, so we searched for a different approach. The objective of this study is to investigate whether the Arden Syntax can be used for feasibility queries on the OMOP CDM to enable on-the-fly calculations at query runtime, eliminating the need to precalculate data elements involved in researchers' criteria specifications.

Methods: We implemented a service that reads the facts from the OMOP repository and provides them in a form that an Arden Syntax Medical Logic Module (MLM) can process. We then implemented an MLM that applies the eligibility criteria to every patient data set and outputs the list of eligible cases (i.e., performs the feasibility query).

Results: The study resulted in an MLM-based feasibility query that identifies cases of overventilation as an example of how an on-the-fly calculation can be realized. The algorithm is split into two MLMs to make the approach reusable.

Conclusion: We found that MLMs are a suitable technology for feasibility queries on the OMOP CDM. Our method of performing on-the-fly calculations can be employed with any OMOP instance without touching existing infrastructure such as the Extract, Transform and Load pipeline. We therefore consider it well suited to performing on-the-fly calculations on OMOP.
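To illustrate what an on-the-fly calculation at query runtime means, here is a minimal sketch in Python (not Arden Syntax, which the study actually uses). It assumes rows shaped like a subset of the OMOP MEASUREMENT table and a hypothetical overventilation criterion: a tidal volume above a fixed threshold per kg of the most recent preceding body weight. The concept IDs, the threshold and the criterion itself are illustrative assumptions; the abstract does not state how overventilation was defined.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical placeholder concept IDs; a real query would use the standard
# concept IDs from the OMOP vocabulary for tidal volume and body weight.
TIDAL_VOLUME_CONCEPT = 1001
BODY_WEIGHT_CONCEPT = 1002
# Hypothetical threshold in mL per kg, for illustration only.
MAX_ML_PER_KG = 8.0


@dataclass
class Measurement:
    """The subset of OMOP MEASUREMENT columns this sketch needs."""
    person_id: int
    measurement_concept_id: int
    measurement_datetime: datetime
    value_as_number: float


def overventilated_persons(rows: list[Measurement]) -> set[int]:
    """Return the person_ids with at least one tidal volume whose ratio to the
    most recent preceding body weight exceeds MAX_ML_PER_KG."""
    by_person: dict[int, list[Measurement]] = {}
    for row in rows:
        by_person.setdefault(row.person_id, []).append(row)

    eligible: set[int] = set()
    for person_id, person_rows in by_person.items():
        latest_weight = None
        # Walk each patient's measurements in time order, so the ratio is
        # computed at query time against the last weight seen so far.
        for row in sorted(person_rows, key=lambda r: r.measurement_datetime):
            if row.measurement_concept_id == BODY_WEIGHT_CONCEPT:
                latest_weight = row.value_as_number
            elif row.measurement_concept_id == TIDAL_VOLUME_CONCEPT and latest_weight:
                if row.value_as_number / latest_weight > MAX_ML_PER_KG:
                    eligible.add(person_id)
                    break
    return eligible


rows = [
    Measurement(1, BODY_WEIGHT_CONCEPT, datetime(2021, 1, 1, 8), 70.0),
    Measurement(1, TIDAL_VOLUME_CONCEPT, datetime(2021, 1, 1, 9), 630.0),  # 9 mL/kg
    Measurement(2, BODY_WEIGHT_CONCEPT, datetime(2021, 1, 1, 8), 80.0),
    Measurement(2, TIDAL_VOLUME_CONCEPT, datetime(2021, 1, 1, 9), 480.0),  # 6 mL/kg
]
print(overventilated_persons(rows))  # {1}
```

Nothing derived is stored back into the repository: the ratio exists only while the query runs, which is the property the abstract highlights for leaving the ETL pipeline untouched.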


From OpenEHR to FHIR and OMOP Data Model for Microbiology Findings

Studies in Health Technology and Informatics - Public Health and Informatics, 2021. DOI: 10.3233/shti210189

Author(s): Eugenia Rinaldi, Sylvia Thun

Keyword(s): Infection Control, Data Model, Data Exchange, Control Group, Common Data Model, University Hospitals, Data Set, Related Data, National Initiative, Data Elements

HiGHmed is a German consortium in which eight university hospitals have agreed to exchange data across institutions through novel medical informatics solutions. The HiGHmed Use Case Infection Control group has modelled a set of infection-related data in the openEHR format. To establish interoperability with the other German consortia belonging to the same national initiative, we mapped the openEHR information to the Fast Healthcare Interoperability Resources (FHIR) format recommended within the initiative. FHIR enables fast data exchange because information is organized into discrete, independent data elements. Furthermore, to maximize the analysis capabilities for our data set, we subsequently mapped the FHIR elements to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). The OMOP data model is designed to support research that identifies and evaluates associations between interventions and the outcomes they cause. Mapping across standards makes it possible to exploit the particular strengths of each while establishing or maintaining interoperability. This article provides an overview of our experience in mapping infection control data across three different standards: openEHR, FHIR and OMOP CDM.
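As a rough illustration of one FHIR-to-OMOP step, the sketch below maps a single FHIR Observation (as parsed JSON) to a row shaped like the OMOP MEASUREMENT table. This is a generic, simplified mapping, not the authors' mapping: microbiology findings may well land in other OMOP tables or use coded values, and the `CONCEPT_MAP` lookup and the `"example-code"` code are hypothetical placeholders for a proper vocabulary lookup.

```python
from datetime import datetime

# Hypothetical lookup from (code system, code) to an OMOP concept_id. A real
# pipeline would resolve codes through the OMOP vocabulary tables instead.
CONCEPT_MAP = {("http://loinc.org", "example-code"): 1001}


def observation_to_measurement(obs: dict, person_ids: dict[str, int]) -> dict:
    """Map one FHIR Observation (parsed JSON) to an OMOP MEASUREMENT-style row."""
    coding = obs["code"]["coding"][0]  # first coding only, for brevity
    quantity = obs.get("valueQuantity", {})
    return {
        "person_id": person_ids[obs["subject"]["reference"]],
        # 0 is the OMOP convention for "no matching concept".
        "measurement_concept_id": CONCEPT_MAP.get((coding["system"], coding["code"]), 0),
        "measurement_source_value": coding["code"],
        "measurement_datetime": datetime.fromisoformat(obs["effectiveDateTime"]),
        "value_as_number": quantity.get("value"),
        "unit_source_value": quantity.get("unit"),
    }


observation = {
    "resourceType": "Observation",
    "subject": {"reference": "Patient/123"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "example-code"}]},
    "effectiveDateTime": "2021-03-01T10:00:00+00:00",
    "valueQuantity": {"value": 7.5, "unit": "mg/L"},
}
row = observation_to_measurement(observation, {"Patient/123": 42})
print(row["person_id"], row["measurement_concept_id"], row["value_as_number"])  # 42 1001 7.5
```

Keeping the original code in `measurement_source_value` preserves provenance even when the concept lookup falls back to 0.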


Deep Learning Approach to Parse Eligibility Criteria in Dietary Supplements Clinical Trials Following OMOP Common Data Model

2020. DOI: 10.1101/2020.09.16.20196022

Author(s): Anusha Bompelli, Jianfu Li, Yiqi Xu, Nan Wang, Yanshan Wang, ...

Keyword(s): Clinical Trials, Deep Learning, Dietary Supplements, Data Model, Common Data Model, Recruitment Process, Eligibility Criteria, Cohort Identification, Timely Fashion, Deep Learning Model

Dietary supplements (DSs) are widely used in the U.S. and have been evaluated in clinical trials as potential interventions for various diseases. However, many clinical trials struggle to recruit enough eligible patients in a timely fashion, causing delays or even early termination. Using electronic health records to find patients who meet clinical trial eligibility criteria has been shown to be a promising way to assess recruitment feasibility and accelerate the recruitment process. In this study, we analyzed the eligibility criteria of 100 randomly selected DS clinical trials and identified both computable and non-computable criteria. We mapped the annotated entities to the OMOP Common Data Model (CDM) and added novel entities (e.g., DS). We also evaluated a deep learning model (Bi-LSTM-CRF) for extracting these entities on the CLAMP platform, obtaining an average F1 measure of 0.601. This study shows the feasibility of automatically parsing eligibility criteria following the OMOP CDM for future cohort identification.
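The reported F1 measure combines precision and recall of the extracted entities. The abstract does not say whether matching was strict or relaxed, or how the average was taken, so the sketch below assumes strict matching: an entity counts as correct only if its start offset, end offset and type all agree with the gold annotation. The spans and type labels are made up for illustration.

```python
def span_f1(gold: set[tuple[int, int, str]], predicted: set[tuple[int, int, str]]) -> float:
    """Strict-match F1: a predicted entity counts only if its start offset,
    end offset and entity type all equal those of a gold entity."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {(0, 5, "Condition"), (10, 18, "DS"), (20, 25, "Drug")}
predicted = {(0, 5, "Condition"), (10, 18, "Drug")}  # second span has the wrong type
print(round(span_f1(gold, predicted), 3))  # 0.4
```

Here precision is 1/2 and recall is 1/3, so F1 = 2 · (1/2) · (1/3) / (5/6) = 0.4. A relaxed (overlap-based) criterion would score the mistyped span differently.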


Some statistical and CI models to predict chaotic high-frequency financial data

Journal of Intelligent & Fuzzy Systems, 2020, Vol 39 (5), pp. 6419-6430. DOI: 10.3233/jifs-189107

Author(s): Dusan Marcek

Keyword(s): Time Series Data, Moving Average, Methodological Approach, Back Propagation, Large Data, Series Data, Data Set, Training Time, Optimal Population, Forecast Time

Two methodological frameworks for forecasting time series data are considered: statistical modelling and computational intelligence modelling. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models estimated by Maximum Likelihood (ML). As a competitor to the statistical forecasting models, we use the popular classic perceptron-type neural network (NN). To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are applied to a large data set. A comparative analysis of the selected learning methods is performed and evaluated. The experiments indicate that the optimal population size is likely 20, which gives the lowest training time of all NNs trained by the evolutionary algorithms, while the prediction accuracy is lower but still acceptable to managers.
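To make GA-based training concrete, here is a minimal sketch that evolves the weights of a single linear neuron (no hidden layer) to predict each value of a series from its two predecessors, with a population of 20 as in the abstract. The selection, crossover and mutation operators and all rates are assumptions chosen for brevity, not the paper's settings, and the micro-GA variant is not shown.

```python
import math
import random


def predict(weights, inputs):
    """One linear neuron: weights[0] is the bias, the rest weight the lagged inputs."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))


def mse(weights, samples):
    return sum((predict(weights, x) - y) ** 2 for x, y in samples) / len(samples)


def train_ga(samples, n_weights, population_size=20, generations=200, seed=0):
    """Evolve weight vectors with truncation selection, one-point crossover,
    Gaussian mutation and elitism (the better half survives unchanged)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda w: mse(w, samples))
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights) if n_weights > 1 else 0
            child = a[:cut] + b[cut:]
            child = [w + rng.gauss(0, 0.1) if rng.random() < 0.2 else w for w in child]
            children.append(child)
        population = parents + children
    return min(population, key=lambda w: mse(w, samples))


# Predict each value of a toy series from its two previous values.
series = [math.sin(t / 3) for t in range(60)]
samples = [(series[t - 2:t], series[t]) for t in range(2, len(series))]
best = train_ga(samples, n_weights=3)
print(round(mse(best, samples), 4))
```

Because the better half of each generation is carried over unchanged, the best training error can never increase from one generation to the next; unlike back-propagation, no gradient is needed.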


Does Tax Increment Financing Pass the “But-for” Test in Missouri?

Economic Development Quarterly, 2019, Vol 33 (3), pp. 187-202. DOI: 10.1177/0891242419859097

Author(s): Ahmed Rachid El-Khattabi, T. William Lester

Keyword(s): Economic Development, Kansas City, Time Series Data, Local Level, Conclusive Evidence, Series Data, Tax Increment Financing, Data Set, Development Indicators, The Impact

The use of tax increment financing (TIF) remains a popular, yet highly controversial, tool among policy makers in their efforts to promote economic development. This study conducts a comprehensive assessment of the effectiveness of Missouri's TIF program, specifically in Kansas City and St. Louis, in creating economic opportunities. We build a time-series data set from 1990 through 2012 of detailed employment levels, establishment counts, and sales at the census block-group level and use it to produce a set of difference-in-differences estimates with matching for the impact of TIF at the local level. Although we analyze the impact of TIF on a wide set of indicators and across various industry sectors, we find no conclusive evidence that the TIF program in either city has a causal impact on key economic development indicators.
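The core of a difference-in-differences estimate is simple arithmetic on group means. This sketch shows only the basic 2x2 version: no matching, covariates or standard errors, all of which the study's actual estimator involves. The numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences: the treated group's change in the mean
    outcome minus the control group's change over the same period."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical employment counts per block group, before and after TIF adoption.
print(did_estimate([100, 110], [120, 130], [90, 100], [105, 115]))  # 5.0
```

Treated block groups grow by 20 and controls by 15, so only the extra 5 is attributed to TIF. Subtracting the control change is what removes the common trend; matching is meant to make that control trend a credible counterfactual.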


Remaining Useful Life Prediction Using Temporal Convolution with Attention

AI, 2021, Vol 2 (1), pp. 48-70. DOI: 10.3390/ai2010005

Author(s): Wei Ming Tan, T. Hui Teo

Keyword(s): Neural Network, Time Series, Time Series Data, Remaining Useful Life, Sensor Data, Series Data, Multiple Time, Data Set, Form Complex, Useful Life

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or component. Such techniques often use sensor data that are periodically measured and recorded into a time series data set. Such multivariate data sets form complex, non-linear inter-dependencies across recorded time steps and between sensors. Many current prognostic algorithms have begun to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract abstract temporal patterns from the multiple time series, while the attention mechanism reviews the information across the time axis and selects the relevant parts. The results suggest that the proposed method not only produces superior RUL estimation accuracy but also trains many times faster than previously reported methods. The advantages of deploying the network are also demonstrated on a lightweight hardware platform, where it is not only more compact but also more efficient in a resource-restricted environment.
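The abstract does not give the exact attention formulation, so the following is a sketch of one common variant, dot-product attention pooling over the time axis. Each time step's feature vector (which, in the described architecture, would come from the convolution filters) is scored against a query vector, the scores are normalized with a softmax, and the features are summed with those weights. The query is learned in a real network but fixed here, and the CNN front end is omitted.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(features, query):
    """Dot-product attention over the time axis.

    features: one feature vector per time step (e.g. CNN outputs)
    query:    scoring vector; learned in a real model, fixed here
    Returns the per-step weights and the weighted sum of the feature vectors.
    """
    scores = [sum(f * q for f, q in zip(step, query)) for step in features]
    weights = softmax(scores)
    pooled = [sum(w * step[d] for w, step in zip(weights, features))
              for d in range(len(features[0]))]
    return weights, pooled


features = [[0.0, 1.0], [1.0, 0.0], [2.0, 0.0]]  # three time steps, two channels
weights, pooled = temporal_attention(features, query=[1.0, 0.0])
print([round(w, 3) for w in weights])  # [0.09, 0.245, 0.665]
```

The weights sum to 1, so the pooled vector is a convex combination of the time steps; steps that score higher against the query dominate, which is the "select the relevant information" behaviour the abstract describes.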


Studying monthly rainfall over Dibrugarh, Assam: Use of SARIMA approach

MAUSAM, 2021, Vol 68 (2), pp. 349-356. DOI: 10.54302/mausam.v68i2.637

Author(s): J. HAZARIKA, B. PATHAK, A. N. PATOWARY

Keyword(s): Time Series, Time Series Data, Moving Average, Demand Management, Arima Model, Monthly Rainfall, Series Data, Data Set, Modeling And Forecasting, Moving Average Model

Understanding the rainfall pattern is important for addressing several regional environmental problems of water resources management, with implications for agriculture, climate change, and natural calamities such as floods and droughts. Statistical computing, modeling and forecasting are key instruments for studying these patterns. Time series analysis and forecasting have become major tools in various hydrological and environmental applications. Among the most effective approaches for analyzing time series data is the ARIMA (Autoregressive Integrated Moving Average) model introduced by Box and Jenkins. In this study, the Box-Jenkins methodology is used to build an ARIMA model for monthly rainfall data from Dibrugarh for the period 1980 to 2014, with a total of 420 points. We found that the seasonal ARIMA(0, 0, 0)(0, 1, 1) model with a 12-month period is suitable for this data set. This model can therefore be used to forecast the monthly rainfall pattern for the coming years, which can help decision makers set priorities in agriculture, flood management, water demand management, etc.
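The selected model has no non-seasonal terms, one seasonal difference and one seasonal moving-average term with period 12. Written with the backshift operator $B$ ($B y_t = y_{t-1}$), it reduces to a simple recursion and forecast rule. The block below uses the plus-sign convention for the MA term (Box and Jenkins write it with a minus sign, which only flips the sign of $\Theta_1$) and assumes no constant; the abstract reports neither the estimated $\Theta_1$ nor whether a constant was included. As a consistency check, 35 years (1980 to 2014) × 12 months = 420 observations, matching the reported count.

```latex
\begin{aligned}
(1 - B^{12})\, y_t &= (1 + \Theta_1 B^{12})\, \varepsilon_t \\
\iff\quad y_t &= y_{t-12} + \varepsilon_t + \Theta_1\, \varepsilon_{t-12} \\
\hat{y}_n(h) &= y_{n+h-12} + \Theta_1\, \hat{\varepsilon}_{n+h-12}, \qquad 1 \le h \le 12 \\
\hat{y}_n(h) &= \hat{y}_n(h-12), \qquad h > 12
\end{aligned}
```

Here $\hat{y}_n(h)$ is the forecast made at origin $n$ for horizon $h$ and $\hat{\varepsilon}$ are the in-sample residuals. Beyond one year ahead, future shocks have expectation zero, so the forecast simply repeats the previous year's forecast for the same month.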


Economic growth, Industrialization, Trade, Electricity production and Carbon dioxide emissions: Evidence from Ghana

Journal of Economic Science Research, 2021, Vol 4 (1). DOI: 10.30564/jesr.v4i1.2716

Author(s): Kingsley Appiah, Rhoda Appah, Oware Kofi Mintah, Benjamin Yeboah

Keyword(s): Economic Growth, Carbon Dioxide, Carbon Dioxide Emissions, Time Series Data, Electricity Production, Series Data, Data Set, Development Indicators, Explanatory Variables, Distributed Lag

Abstract: The study scrutinized the correlation between electricity production, trade, economic growth, industrialization and carbon dioxide emissions in Ghana. The study disaggregated trade into exports and imports to identify each variable's distinct contribution to emissions in Ghana. It used a time-series data set of World Development Indicators from 1971 to 2014. Using the Autoregressive Distributed Lag (ARDL) cointegration technique, the study established that the variables are cointegrated and have a long-run equilibrium relationship. The long-run effects of the explanatory variables on carbon dioxide emissions indicate that a 1% increase in economic growth or industrialization increases emissions by 16.9% and 79%, respectively, whereas a 1% increase in electricity production, trade exports or trade imports decreases carbon dioxide emissions by 80.3%, 27.7% and 4.1%, respectively. To mitigate carbon emissions and achieve Sustainable Development Goal (SDG) 13, Ghana needs to increase electricity production and trade exports.
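The long-run effects quoted above come from the ARDL long-run multiplier: the sum of a regressor's distributed-lag coefficients divided by one minus the sum of the autoregressive coefficients. The sketch below computes it for a single regressor; when variables enter in logs, this multiplier is a long-run elasticity (percent change in the outcome per 1% change in the regressor). The coefficients are hypothetical, not the study's estimates, and the cointegration (bounds) test that precedes this step is not shown.

```python
def ardl_long_run(phi, beta):
    """Long-run multiplier of one regressor in an ARDL(p, q) model
    y_t = c + sum_i phi_i * y_{t-i} + sum_j beta_j * x_{t-j} + e_t.

    phi:  coefficients on y_{t-1}, ..., y_{t-p}
    beta: coefficients on x_t, x_{t-1}, ..., x_{t-q}
    """
    denominator = 1 - sum(phi)
    if denominator == 0:
        raise ValueError("autoregressive coefficients sum to 1: no finite long-run effect")
    return sum(beta) / denominator


# Hypothetical coefficients, not estimates from the study.
print(round(ardl_long_run(phi=[0.5], beta=[0.2, 0.1]), 3))  # 0.6
```

With several explanatory variables, the same formula is applied to each regressor's own lag coefficients, sharing the common autoregressive denominator.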


Mining the Relationships in the form of the Predisposing Factors and Co-Incident Factors among Numerical Dynamic Attributes in Time Series Data Set by Using the Combination of Some Existing Techniques

Enterprise Information Systems VI, 2006, pp. 135-142. DOI: 10.1007/1-4020-3675-2_16

Author(s): Suwimon Kooptiwoot, M. Abdus Salam

Keyword(s): Time Series, Time Series Data, Predisposing Factors, Series Data, Data Set, Dynamic Attributes


Exploratory Time Series Data Mining by Genetic Clustering

Mathematical Methods for Knowledge Discovery and Data Mining, 2011, pp. 157-178. DOI: 10.4018/978-1-59904-528-3.ch010

Author(s): T. Warren Liao

Keyword(s): Data Mining, Time Series, Time Series Data, Distance Measures, Series Data, Synthetic Control, Data Set, Univariate Time Series, Genetic Clustering, Data Objects

In this chapter, we present genetic algorithm (GA) based methods for clustering univariate time series of equal or unequal length as an exploratory data mining step. These methods essentially implement the k-medoids algorithm: each chromosome encodes in binary the data objects serving as the k medoids. To compare performance, both fixed-parameter and adaptive GAs were used. We first used the synthetic control chart data set to investigate the performance of three fitness functions, two distance measures, and other GA parameters such as population size, crossover rate, and mutation rate. Experiments were also run on two further sets of time series, with and without a known number of clusters: the cylinder-bell-funnel data and the novel battle simulation data. The clustering results are presented and discussed.
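The binary chromosome encoding can be sketched as follows: a chromosome has one bit per series, exactly k bits are set, and the set bits mark the medoids. Fitness is taken here as the total distance from every series to its nearest medoid (lower is better). This is a simplified, assumption-laden sketch rather than the chapter's method. It uses Euclidean distance (so it handles only equal-length series), a single fitness function, and a mutation-only GA whose swap operator keeps exactly k bits set. The chapter's three fitness functions, two distance measures and crossover settings are not reproduced.

```python
import math
import random


def euclidean(a, b):
    """Distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def decode(chromosome):
    """Indices of the series whose bit is 1, i.e. the medoids."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]


def total_cost(chromosome, series):
    """Sum of distances from every series to its nearest medoid (lower is fitter)."""
    medoids = decode(chromosome)
    return sum(min(euclidean(s, series[m]) for m in medoids) for s in series)


def random_chromosome(n, k, rng):
    bits = [0] * n
    for i in rng.sample(range(n), k):
        bits[i] = 1
    return bits


def swap_mutation(chromosome, rng):
    """Move one medoid to a non-medoid position, so exactly k bits stay set."""
    child = chromosome[:]
    ones = [i for i, bit in enumerate(child) if bit == 1]
    zeros = [i for i, bit in enumerate(child) if bit == 0]
    child[rng.choice(ones)] = 0
    child[rng.choice(zeros)] = 1
    return child


def ga_kmedoids(series, k, population_size=20, generations=50, seed=0):
    """Mutation-only GA with elitism; requires 1 <= k < len(series)."""
    rng = random.Random(seed)
    population = [random_chromosome(len(series), k, rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda c: total_cost(c, series))
        survivors = population[: population_size // 2]
        offspring = [swap_mutation(rng.choice(survivors), rng)
                     for _ in range(population_size - len(survivors))]
        population = survivors + offspring
    return decode(min(population, key=lambda c: total_cost(c, series)))


series = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
print(ga_kmedoids(series, k=2))
```

Encoding medoids as bits means every chromosome is directly a candidate clustering: each series joins its nearest medoid, so no separate assignment step has to be evolved.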
