AstroCatR: a mechanism and tool for efficient time series reconstruction of large-scale astronomical catalogues

Ce Yu; Kun Li; Shanjiang Tang; Chao Sun; Bin Ma; Qing Zhao

doi:10.1093/mnras/staa1413

Monthly Notices of the Royal Astronomical Society ◽ 10.1093/mnras/staa1413 ◽ 2020 ◽ Vol 496 (1) ◽ pp. 629-637

Author(s): Ce Yu ◽ Kun Li ◽ Shanjiang Tang ◽ Chao Sun ◽ Bin Ma ◽ ...

Keyword(s): Time Series ◽ High Performance ◽ Large Scale ◽ Extrasolar Planets ◽ Time Series Data ◽ Series Data ◽ Data Sets ◽ Observation Data ◽ Data Volume ◽ And Performance

Abstract: Time series data of celestial objects are commonly used in time domain astronomy to study valuable and unexpected objects such as extrasolar planets and supernovae. Due to the rapid growth of data volume, traditional manual methods are becoming infeasible for continuously analysing accumulated observation data. To meet such demands, we designed and implemented a special tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or data bases, match each item to determine which object it belongs to, and finally produce time series data sets. To support the high-performance parallel processing of large-scale data sets, AstroCatR uses the extract-transform-load (ETL) pre-processing module to create sky zone files and balance the workload. The matching module uses the overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or be transformed into other formats as needed. In addition, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from the three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting relevant parameters and configuration files. Furthermore, the tool is approximately 3× faster than methods using relational data base management systems at matching massive catalogues.
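The zone-and-match idea described in this abstract can be illustrated with a minimal Python sketch. It is not AstroCatR's code: the zone height, the match radius, the tuple layouts, and all function names are assumptions made for illustration. Reference objects are indexed by declination zone, objects close to a zone border are also registered in the neighbouring zone (one simple way to realise an overlapped index), and each detection is assigned to the nearest reference object within the match radius.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.5             # assumed zone height (hypothetical value)
MATCH_RADIUS_DEG = 2.0 / 3600.0   # assumed match radius of 2 arcsec (hypothetical value)


def zone_of(dec):
    """Index of the declination zone that contains dec (degrees)."""
    return int(math.floor((dec + 90.0) / ZONE_HEIGHT_DEG))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def build_reference_index(reference):
    """Index reference objects (obj_id, ra, dec) by zone.

    An object within one match radius of a zone border is also stored in the
    neighbouring zone, so a detection only has to look in its own zone.
    """
    index = defaultdict(list)
    for obj_id, ra, dec in reference:
        zones = {zone_of(dec),
                 zone_of(dec - MATCH_RADIUS_DEG),
                 zone_of(dec + MATCH_RADIUS_DEG)}
        for z in zones:
            index[z].append((obj_id, ra, dec))
    return index


def reconstruct_time_series(detections, index):
    """Group detections (time, ra, dec, mag) into per-object time series."""
    series = defaultdict(list)
    for t, ra, dec, mag in detections:
        best_id, best_sep = None, MATCH_RADIUS_DEG
        for obj_id, ref_ra, ref_dec in index.get(zone_of(dec), []):
            sep = angular_sep_deg(ra, dec, ref_ra, ref_dec)
            if sep <= best_sep:
                best_id, best_sep = obj_id, sep
        if best_id is not None:          # unmatched detections are dropped here
            series[best_id].append((t, mag))
    for observations in series.values():
        observations.sort()              # order each series by observation time
    return dict(series)
```

In this sketch the in-memory reference table is a plain dictionary of lists; a production tool would also need to handle right-ascension wrap-around at 0°/360° and per-zone parallelism, which are omitted here.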

An efficient visual assessment of cluster tendency tool for large-scale time series data sets

2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽ 10.1109/fuzz-ieee.2017.8015587 ◽ 2017 ◽ Cited By ~ 2

Author(s): Timothy B. Iredale ◽ Sarah M. Erfani ◽ Christopher Leckie

Keyword(s): Time Series ◽ Large Scale ◽ Time Series Data ◽ Visual Assessment ◽ Series Data ◽ Data Sets

Feature-aware forecasting of large-scale time series data sets

it - Information Technology ◽ 10.1515/itit-2019-0035 ◽ 2020 ◽ Vol 62 (3-4) ◽ pp. 157-168

Author(s): Claudio Hartmann ◽ Lars Kegel ◽ Wolfgang Lehner

Keyword(s): Time Series ◽ Large Scale ◽ Missing Values ◽ Time Series Data ◽ Series Data ◽ The Internet ◽ Data Sets ◽ Similar Time ◽ Forecast Time ◽ The Internet Of Things

Abstract: The Internet of Things (IoT) sparks a revolution in time series forecasting. Traditional techniques forecast time series individually, which becomes unfeasible when the focus changes to thousands of time series exhibiting anomalies like noise and missing values. This work presents CSAR, a technique forecasting a set of time series with only one model, and a feature-aware partitioning applying CSAR on subsets of similar time series. These techniques provide accurate forecasts a hundred times faster than traditional techniques, preparing forecasting for the arising challenges of the IoT era.

Towards Designing and Performance Analysis of Evolving Higher Order Neural Networks for Modeling and Forecasting Exchange Rate Time Series Data

Proceedings of ICETIT 2019 - Lecture Notes in Electrical Engineering ◽ 10.1007/978-3-030-30577-2_22 ◽ 2019 ◽ pp. 258-268

Author(s): Kishore Kumar Sahu ◽ Sarat Chandra Nayak ◽ Himansu Sekhar Behera

Keyword(s): Neural Networks ◽ Time Series ◽ Exchange Rate ◽ Performance Analysis ◽ Time Series Data ◽ Higher Order ◽ Series Data ◽ Modeling And Forecasting ◽ Higher Order Neural Networks ◽ And Performance

Evidence Graphs: Supporting Transparent and FAIR Computation, with Defeasible Reasoning on Data, Methods and Results

10.1101/2021.03.29.437561 ◽ 2021

Author(s): Sadnan Al Manir ◽ Justin Niestroy ◽ Maxwell Adam Levinson ◽ Timothy Clark

Keyword(s): Time Series ◽ Large Scale ◽ Time Series Data ◽ Predictive Analytics ◽ Defeasible Reasoning ◽ Series Data ◽ Inference Rules ◽ Deep Networks ◽ Evidence Graph ◽ Over Time

Introduction: Transparency of computation is a requirement for assessing the validity of computed results and research claims based upon them; and it is essential for access to, assessment, and reuse of computational components. These components may be subject to methodological or other challenges over time. While reference to archived software and/or data is increasingly common in publications, a single machine-interpretable, integrative representation of how results were derived, that supports defeasible reasoning, has been absent. Methods: We developed the Evidence Graph Ontology, EVI, in OWL 2, with a set of inference rules, to provide deep representations of supporting and challenging evidence for computations, services, software, data, and results, across arbitrarily deep networks of computations, in connected or fully distinct processes. EVI integrates FAIR practices on data and software, with important concepts from provenance models, and argumentation theory. It extends PROV for additional expressiveness, with support for defeasible reasoning. EVI treats any computational result or component of evidence as a defeasible assertion, supported by a DAG of the computations, software, data, and agents that produced it. Results: We have successfully deployed EVI for very-large-scale predictive analytics on clinical time-series data. Every result may reference its own evidence graph as metadata, which can be extended when subsequent computations are executed. Discussion: Evidence graphs support transparency and defeasible reasoning on results. They are first-class computational objects, and reference the datasets and software from which they are derived. They support fully transparent computation, with challenge and support propagation. The EVI approach may be extended to include instruments, animal models, and critical experimental reagents.

Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

10.1101/170027 ◽ 2017

Author(s): Anthony Szedlak ◽ Spencer Sims ◽ Nicholas Smith ◽ Giovanni Paternostro ◽ Carlo Piermarocchi

Keyword(s): Neural Network ◽ Gene Expression ◽ Cell Cycle ◽ Time Series ◽ Time Series Data ◽ Series Data ◽ Data Sets ◽ Expression Data ◽ Time Series Gene Expression ◽ Human Cervical Cancer

Abstract: Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.

Author Summary: Cell cycle – the process in which a parent cell replicates its DNA and divides into two daughter cells – is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the process of cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
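How a Hopfield-type network can store a cycle of states rather than fixed points can be shown with a small, noise-free Python sketch. It uses the textbook asymmetric Hebbian rule, in which each stored pattern is coupled to its successor; this is not necessarily the coupling the authors use, and the three eight-unit patterns below are hypothetical stand-ins for binarised expression states, not data from the paper.

```python
def build_cyclic_weights(patterns):
    """Asymmetric Hebbian couplings that map pattern mu onto pattern mu + 1.

    patterns: list of equal-length lists with entries +1 or -1.
    """
    n = len(patterns[0])
    p = len(patterns)
    w = [[0.0] * n for _ in range(n)]
    for mu in range(p):
        current = patterns[mu]
        successor = patterns[(mu + 1) % p]   # the last pattern wraps to the first
        for i in range(n):
            for j in range(n):
                w[i][j] += successor[i] * current[j] / n
    return w


def step(w, state):
    """One synchronous, deterministic update: s_i <- sign(sum_j w_ij * s_j)."""
    return [1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
            for row in w]


def run(w, state, steps):
    """Return the trajectory [s(0), s(1), ..., s(steps)]."""
    trajectory = [list(state)]
    for _ in range(steps):
        state = step(w, state)
        trajectory.append(state)
    return trajectory


# Three mutually orthogonal example patterns (hypothetical cell-cycle "phases").
PHASE_0 = [1, 1, 1, 1, -1, -1, -1, -1]
PHASE_1 = [1, 1, -1, -1, 1, 1, -1, -1]
PHASE_2 = [1, -1, 1, -1, 1, -1, 1, -1]

weights = build_cyclic_weights([PHASE_0, PHASE_1, PHASE_2])
cycle = run(weights, PHASE_0, 3)   # visits PHASE_0, PHASE_1, PHASE_2, PHASE_0
```

Because the example patterns are orthogonal, each update moves the state exactly one phase forward, and a state with a single flipped unit is pulled back onto the cycle in one step. Stochastic updates and external fields, which the abstract discusses, are left out of this sketch.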

Differentially Private Autocorrelation Time-Series Data Publishing Based on Sliding Window

Security and Communication Networks ◽ 10.1155/2021/6665984 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-10

Author(s): Jing Zhao ◽ Shubo Liu ◽ Xingxing Xiong ◽ Zhaohui Cai

Keyword(s): Time Series ◽ Privacy Protection ◽ Large Scale ◽ Differential Privacy ◽ Time Series Data ◽ Sliding Window ◽ Data Publishing ◽ Series Data ◽ Data Publication ◽ Autocorrelation Time

Privacy protection is one of the major obstacles to data sharing. Time-series data are characterized by autocorrelation, continuity, and large scale. Current research on time-series data publication largely ignores the correlation within time-series data and provides insufficient privacy protection. In this paper, we study the problem of correlated time-series data publication and propose a sliding window-based autocorrelation time-series data publication algorithm, called SW-ATS. Instead of using global sensitivity as in traditional differential privacy mechanisms, we propose periodic sensitivity to provide a stronger privacy guarantee. SW-ATS introduces a sliding window mechanism, with the correlation between the noise-adding sequence and the original time-series data guaranteed by sequence indistinguishability, to protect the privacy of the latest data. We prove that SW-ATS satisfies ε-differential privacy. Compared with the state-of-the-art algorithm, SW-ATS reduces the MAE error rate by about 25%, improves the utility of the data, and provides stronger privacy protection.
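For orientation, the baseline that this abstract contrasts with, the Laplace mechanism with a fixed global sensitivity applied to the most recent window of values, can be sketched in Python as follows. This is not SW-ATS: periodic sensitivity and correlated noise are not modelled, and the function name, the parameter names, and the even split of the privacy budget across the window are assumptions made for illustration.

```python
import random


def publish_latest_window(values, epsilon, sensitivity, window, rng):
    """Release the last `window` values with independent Laplace noise.

    The budget epsilon is split evenly over the points in the window, so by
    sequential composition the whole released window uses at most epsilon.
    `sensitivity` is the assumed global sensitivity of a single value.
    """
    latest = values[-window:]
    per_point_epsilon = epsilon / len(latest)
    scale = sensitivity / per_point_epsilon      # Laplace scale b = sensitivity / epsilon
    noisy = []
    for v in latest:
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and scale `scale`.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy.append(v + noise)
    return noisy


series = [20.1, 20.4, 20.9, 21.3, 21.0, 20.7]
released = publish_latest_window(series, epsilon=1.0, sensitivity=1.0,
                                 window=3, rng=random.Random(0))
```

A smaller `epsilon` or a longer `window` increases the noise scale, which is the utility cost that sensitivity definitions tighter than the global one aim to reduce.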

An Efficient Method for Forecasting Using Fuzzy Time Series

Emerging Research on Applied Fuzzy Sets and Intuitionistic Fuzzy Matrices - Advances in Computational Intelligence and Robotics ◽ 10.4018/978-1-5225-0914-1.ch013 ◽ 2017 ◽ pp. 287-304 ◽ Cited By ~ 3

Author(s): Pritpal Singh

Keyword(s): Time Series ◽ Time Series Data ◽ Weather Forecasting ◽ Small Error ◽ Fuzzy Time Series ◽ Series Data ◽ Data Sets ◽ Proposed Model ◽ Temperature Forecasting ◽ The University

Forecasting using fuzzy time series has been applied in several areas, including university enrollments, sales, road accidents, finance, and weather. Recently, many researchers have turned their attention to applying fuzzy time series to time series forecasting problems. In this paper, we present a new model to forecast enrollments at the University of Alabama and the daily average temperature in Taipei, based on one-factor fuzzy time series. In this model, a new frequency-based clustering technique is employed to partition the time series data sets into different intervals. Two new principles are also incorporated for the defuzzification function. For both enrollment and daily temperature forecasting, the proposed model exhibits a very small error rate.

How to Identify Varying Lead–Lag Effects in Time Series Data: Implementation, Validation, and Application of the Generalized Causality Algorithm

Algorithms ◽ 10.3390/a13040095 ◽ 2020 ◽ Vol 13 (4) ◽ pp. 95 ◽ Cited By ~ 1

Author(s): Johannes Stübinger ◽ Katharina Adler

Keyword(s): Time Series ◽ Large Scale ◽ Structural Breaks ◽ Time Series Data ◽ Consumer Price Index ◽ Real Data ◽ Linear Mapping ◽ Series Data ◽ Lag Effects ◽ Silver Metal

This paper develops the generalized causality algorithm and applies it to a multitude of data from the fields of economics and finance. Specifically, our parameter-free algorithm efficiently determines the optimal non-linear mapping and identifies varying lead–lag effects between two given time series. This procedure allows an elastic adjustment of the time axis to find similar but phase-shifted sequences; structural breaks in their relationship are also captured. A large-scale simulation study validates its outperformance in the vast majority of parameter constellations in terms of efficiency, robustness, and feasibility. Finally, the presented methodology is applied to real data from the areas of macroeconomics, finance, and metal. The pairs showing the highest similarity are gross domestic product and consumer price index (macroeconomics), S&P 500 index and Deutscher Aktienindex (finance), and gold and silver (metal). In addition, the algorithm makes full use of its flexibility and identifies both various structural breaks and regime patterns over time, which are (partly) well documented in the literature.

Similarity search and performance prediction of shield tunnels in operation through time series data mining

Automation in Construction ◽ 10.1016/j.autcon.2020.103178 ◽ 2020 ◽ Vol 114 ◽ pp. 103178

Author(s): Hehua Zhu ◽ Xin Wang ◽ Xueqin Chen ◽ Lianyang Zhang

Keyword(s): Data Mining ◽ Time Series ◽ Performance Prediction ◽ Similarity Search ◽ Time Series Data ◽ Series Data ◽ Time Series Data Mining ◽ And Performance

Applying multiple time series data mining to large-scale network traffic analysis

2008 IEEE Conference on Cybernetics and Intelligent Systems ◽ 10.1109/iccis.2008.4670844 ◽ 2008 ◽ Cited By ~ 1

Author(s): Weisong He ◽ Guangmin Hu ◽ Xingmiao Yao ◽ Guangyuan Kan ◽ Hong Wang ◽ ...

Keyword(s): Data Mining ◽ Time Series ◽ Large Scale ◽ Time Series Data ◽ Series Data ◽ Multiple Time ◽ Multiple Time Series ◽ Network Traffic Analysis ◽ Large Scale Network ◽ Scale Network
