FIKWaste: A Waste Generation Dataset from Three Restaurant Kitchens in Portugal

In the era of big data and artificial intelligence, public datasets are becoming increasingly important for researchers to build and evaluate their models. This paper presents the FIKWaste dataset, which contains time series data for the volume of waste produced in three restaurant kitchens in Portugal. Organic (undifferentiated) and inorganic (glass, paper, and plastic) waste bins were monitored for a consecutive period of four weeks. In addition to the time series measurements, the FIKWaste dataset contains labels for waste disposal events, i.e., when the waste bins are emptied, and technical and non-technical details of the monitored kitchens.

G-CNN and double-referenced thresholding for detecting time series anomalies

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200175 ◽

2020 ◽

pp. 1-12

Author(s):

Liping Li ◽

Zean Tian ◽

Kenli Li ◽

Cen Chen

Keyword(s):

Time Series ◽

Time Series Data ◽

Series Data ◽

Prediction Errors ◽

Feedback Information ◽

Public Datasets ◽

Gated Recurrent Units ◽

Control Feedback ◽

Intermediate Output ◽

Internet Company

Anomaly detection based on time series data is of great importance in many fields. Time series data produced by man-made systems usually include two parts: monitored data, which are the object of detection, and exogenous data, which carry the control/feedback information. In this paper, a so-called G-CNN architecture is proposed that combines gated recurrent units (GRU) with a convolutional neural network (CNN), which focus on the monitored and exogenous data, respectively. The most important contribution is a complementary double-referenced thresholding approach that processes prediction errors and calculates the threshold, balancing the minimization of false positives against that of false negatives. The outstanding performance and extensive applicability of the model are demonstrated by experiments on two public datasets from aerospace and a new server machine dataset from an Internet company. It is also found that the monitored data are closely associated with the exogenous data, if any, and the interpretability of the G-CNN is discussed by visualizing the intermediate output of the neural networks.

PhilDB - The time series database with built-in change logging

10.7287/peerj.preprints.1488v1 ◽

2015 ◽

Author(s):

Andrew MacDonald

Keyword(s):

Time Series ◽

Big Data ◽

Open Source ◽

High Performance ◽

Time Series Data ◽

Handling Time ◽

Series Data ◽

Meta Data ◽

Static Data ◽

Data Tracking

PhilDB is an open-source time series database. It supports storage of time series datasets that are dynamic; that is, it records updates to existing values in a log as they occur. Recent open-source systems, such as InfluxDB and OpenTSDB, have been developed to indefinitely store long-period, high-resolution time series data. Unfortunately, they require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that don’t take the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that changed. Unlike ‘big data’ solutions, PhilDB has been designed for single machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. PhilDB improves access to datasets in two ways. Firstly, it uses fast reads, which make it practical to select data for analysis. Secondly, it uses simple read methods to minimise the effort required to extract data. PhilDB takes a unique approach to meta-data tracking: optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances as a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. This paper describes the general approach, architecture, and philosophy of the PhilDB software.
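
The idea of logging changes to existing values during a write can be illustrated with a small in-memory sketch. This is not PhilDB's API or storage format: the `LoggedSeries` class and its `write` and `read` methods are hypothetical names chosen for illustration, and the sketch assumes a revision is logged only when a stored value actually changes.

```python
from datetime import date


class LoggedSeries:
    """Toy in-memory time series that logs every change to an existing value.

    Hypothetical illustration only; this is not how PhilDB is implemented.
    """

    def __init__(self):
        self.values = {}  # timestamp -> current value
        self.log = []     # (timestamp, old value, new value) per revision

    def write(self, records):
        """Insert new points; log a revision only when a stored value changes."""
        for timestamp, value in records:
            if timestamp in self.values and self.values[timestamp] != value:
                self.log.append((timestamp, self.values[timestamp], value))
            self.values[timestamp] = value

    def read(self):
        """Return the current values in time order."""
        return sorted(self.values.items())


series = LoggedSeries()
series.write([(date(2015, 1, 1), 1.0), (date(2015, 1, 2), 2.0)])
# The second load repeats one value unchanged and revises the other.
series.write([(date(2015, 1, 1), 1.0), (date(2015, 1, 2), 2.5)])
```

After the second load, `series.read()` returns the revised values, while `series.log` holds a single entry recording that the value for 2 January changed from 2.0 to 2.5. The caller never has to distinguish inserts from updates, which is the kind of update complexity the abstract says is hidden from the user.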

A forecasting of stock trading price using time series information based on big data

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i3.pp2548-2554 ◽

2021 ◽

Vol 11 (3) ◽

pp. 2548

Author(s):

Soo-Tai Nam ◽

Chan-Yong Jin ◽

Seong-Yoon Shin

Keyword(s):

Time Series ◽

Big Data ◽

Stock Price ◽

Euclidean Distance ◽

Time Series Data ◽

Series Data ◽

Analysis Tool ◽

Large Set ◽

Data Generation ◽

Management Tools

Big data is a large set of structured or unstructured data that is collected, stored, managed, and analyzed with existing database management tools; the term also covers the techniques for extracting value from these data and interpreting the results. Big data has three characteristics: the size of existing and other data (volume), the speed of data generation (velocity), and the variety of information forms (variety). Time series data are obtained by collecting and recording data as they are generated over time. Analyzing these time series data and identifying their characteristics helps in understanding them. Distance is the simplest and most obvious concept for dealing with similarity between objects, and the most commonly used and widely known measure of distance is the Euclidean distance. This study analyzes the similarity of stock price movements using 793,800 closing prices of 1,323 companies in Korea. The Euclidean distances were calculated using Visual Studio and Excel as analysis tools. We selected the company with code “000100” as the target domestic company and prepared its data for big data analysis. The analysis shows that the company with the shortest Euclidean distance to the target is code “143860”, with a calculated value of “11.147”. Based on these results, the limitations of the study and its theoretical implications are discussed.
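
The similarity search described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's Visual Studio and Excel workflow: the function names, the company codes "A", "B", and "C", and the prices are all made up, and the sketch assumes raw closing prices of equal length are compared directly (the abstract does not say whether the prices were normalized first).

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length price series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_series(target_code, prices):
    """Return (code, distance) of the series closest to the target series."""
    target = prices[target_code]
    candidates = (
        (code, euclidean_distance(target, series))
        for code, series in prices.items()
        if code != target_code
    )
    return min(candidates, key=lambda pair: pair[1])


# Made-up closing prices for illustration only (not the study's data).
prices = {
    "A": [10.0, 11.0, 12.0],
    "B": [10.0, 11.0, 14.0],
    "C": [20.0, 21.0, 22.0],
}
closest_code, closest_distance = nearest_series("A", prices)
```

With these toy values, series "B" is the nearest to "A" at a distance of 2.0. Because the distance is computed on raw prices, a company trading at a very different price level (such as "C") ends up far away even if its price moves identically, which is one reason scaling is often applied before this kind of comparison.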

A Long Short Term Memory with Peephole Connections and Generative Adversarial Network Based Collaborative Methodology to Identify Outliers in ECG Dataset

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9273 ◽

2020 ◽

Vol 17 (8) ◽

pp. 3798-3803

Author(s):

M. D. Anto Praveena ◽

B. Bharathi

Keyword(s):

Time Series ◽

Big Data ◽

Data Analytics ◽

Time Series Data ◽

Short Term Memory ◽

Big Data Analytics ◽

Data Preprocessing ◽

Heterogeneous Data ◽

Series Data ◽

Outlier Identification

Big data analytics has become a growing field, and it plays a pivotal role in healthcare and research practices. Big data analytics in healthcare covers the integration and analysis of vast amounts of dynamic, heterogeneous data. Patients’ medical records include several kinds of data, including medical conditions, medications, and test findings. One of the major challenges of analytics and prediction in healthcare is data preprocessing, and within data preprocessing, outlier identification and correction is an important challenge. Outliers are values that deviate from the other values of an attribute; they may simply be experimental errors or novelties. Outlier identification is the process of identifying data objects whose behavior differs from what is expected. Detecting outliers in time series data differs from detecting them in ordinary data. Time series data are data recorded over a series of time periods. Outliers in this kind of data are identified and removed to obtain a quality dataset. In this proposed work, a hybrid outlier detection algorithm, an extended LSTM-GAN, is used to recognize outliers in time series data. The proposed extended algorithm attained better performance in time series analysis on ECG dataset processing than traditional methodologies.

Artificial intelligence algorithm for optimal time series data model

IEEE Access ◽

10.1109/access.2020.2981488 ◽

2020 ◽

pp. 1-1

Author(s):

Kang Wang

Keyword(s):

Artificial Intelligence ◽

Time Series ◽

Data Model ◽

Time Series Data ◽

Series Data ◽

Optimal Time ◽

Intelligence Algorithm ◽

Artificial Intelligence Algorithm

Predicting Plant Growth from Time-Series Data Using Deep Learning

Remote Sensing ◽

10.3390/rs13030331 ◽

2021 ◽

Vol 13 (3) ◽

pp. 331

Author(s):

Robail Yasrab ◽

Jincheng Zhang ◽

Polina Smyth ◽

Michael P. Pound

Keyword(s):

Time Series ◽

Deep Learning ◽

Plant Growth ◽

Time Series Data ◽

Plant Traits ◽

Domain Adaptation ◽

Series Data ◽

Plant Phenotyping ◽

Research Issues ◽

Public Datasets

Phenotyping involves the quantitative assessment of the anatomical, biochemical, and physiological plant traits. Natural plant growth cycles can be extremely slow, hindering the experimental processes of phenotyping. Deep learning offers a great deal of support for automating and addressing key plant phenotyping research issues. Machine learning-based high-throughput phenotyping is a potential solution to the phenotyping bottleneck, promising to accelerate the experimental cycles within phenomic research. This research presents a study of deep networks’ potential to predict plants’ expected growth, by generating segmentation masks of root and shoot systems into the future. We adapt an existing generative adversarial predictive network into this new domain. The results show an efficient plant leaf and root segmentation network that provides predictive segmentation of what a leaf and root system will look like at a future time, based on time-series data of plant growth. We present benchmark results on two public datasets of Arabidopsis (A. thaliana) and Brassica rapa (Komatsuna) plants. The experimental results show strong performance and the capability of the proposed methods to match expert annotation. The proposed method is highly adaptable and trainable (via transfer learning/domain adaptation) on different plant species and mutations.

Multi-Channel Fusion Classification Method Based on Time-Series Data

Sensors ◽

10.3390/s21134391 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4391

Author(s):

Xue-Bo Jin ◽

Aiqiang Yang ◽

Tingli Su ◽

Jian-Lei Kong ◽

Yuting Bai

Keyword(s):

Time Series ◽

Time Series Data ◽

Short Term Memory ◽

Evidence Theory ◽

Recurrence Plot ◽

Series Data ◽

Public Datasets ◽

Application Fields ◽

Original Time

Time-series data exist in many application fields, and the classification of time-series data is one of the important research directions in time-series data mining. In this paper, univariate time-series data are taken as the research object, and deep learning and broad learning systems (BLSs) are the basic methods used to explore the classification of multi-modal time-series data features. Long short-term memory (LSTM), gated recurrent unit, and bidirectional LSTM networks are used to learn and test the original time-series data; a Gramian angular field and a recurrence plot are used to encode the time-series data as images; and a BLS is employed for image learning and testing. Finally, Dempster–Shafer evidence theory (D–S evidence theory) is used to fuse the probability outputs of the two categories and obtain the final classification results. In tests on public datasets, the method proposed in this paper obtains competitive results, compensating for the deficiencies of using only time-series data or only images for different types of datasets.
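
The fusion step can be illustrated with Dempster's rule of combination. The sketch below is a generic implementation of that rule, not the paper's code: the abstract does not state how classifier outputs are mapped to mass functions, so the example assumes each model's class probabilities are used directly as masses on single classes, and the names `dempster_combine`, `sequence_model`, and `image_model` and all numbers are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass that the two sources assign to contradictory hypotheses.
            conflict += mass_a * mass_b
    if not combined:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical class probabilities from a sequence model and an image model.
sequence_model = {frozenset({"class_0"}): 0.6, frozenset({"class_1"}): 0.4}
image_model = {frozenset({"class_0"}): 0.7, frozenset({"class_1"}): 0.3}

fused = dempster_combine(sequence_model, image_model)
predicted = max(fused, key=fused.get)
```

When all mass sits on single classes, as here, the rule reduces to a normalized product of the two probability vectors, so classes on which both models agree are reinforced. In this toy case the fused mass on "class_0" is 0.42 / 0.54, roughly 0.78, which is higher than either model's individual score.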

Real time interpretation and optimization of time series data stream in big data

2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) ◽

10.1109/icccbda.2018.8386520 ◽

2018 ◽

Cited By ~ 2

Author(s):

Zheyuan Jiang ◽

Ke Liu

Keyword(s):

Time Series ◽

Big Data ◽

Real Time ◽

Data Stream ◽

Time Series Data ◽

Series Data

PhilDB: the time series database with built-in change logging

PeerJ Computer Science ◽

10.7717/peerj-cs.52 ◽

2016 ◽

Vol 2 ◽

pp. e52 ◽

Cited By ~ 1

Author(s):

Andrew MacDonald

Keyword(s):

Time Series ◽

Big Data ◽

Open Source ◽

High Performance ◽

Time Series Data ◽

Handling Time ◽

Series Data ◽

Meta Data ◽

Static Data ◽

Data Tracking

PhilDB is an open-source time series database that supports storage of time series datasets that are dynamic; that is, it records updates to existing values in a log as they occur. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. It implements fast reads to make it practical to select data for analysis. Recent open-source systems have been developed to indefinitely store long-period high-resolution time series data without change logging. Unfortunately, such systems generally require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that avoid the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that change. Unlike ‘big data’ solutions, PhilDB has been designed for single machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB takes a unique approach to meta-data tracking: optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances when a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. While some existing systems come close to meeting the needs PhilDB addresses, none cover all the needs at once. PhilDB was written to fill this gap in existing solutions.
This paper explores existing time series database solutions, discusses the motivation for PhilDB, describes the architecture and philosophy of the PhilDB software, and performs an evaluation between InfluxDB, PhilDB, and SciDB.

Dynamic predict in-hospital mortality risk in intensive care unit with a new deep learning of artificial intelligence

10.21203/rs.3.rs-44310/v1 ◽

2020 ◽

Author(s):

Yu-wen Chen ◽

Yu-jie Li ◽

Zhi-yong Yang ◽

Kun-hua Zhong ◽

Li-ge Zhang ◽

...

Keyword(s):

Artificial Intelligence ◽

Time Series ◽

Deep Learning ◽

Mortality Risk ◽

Time Series Data ◽

Risk Model ◽

Series Data ◽

Physiological Variables ◽

Prediction Of Mortality ◽

Mimic Iii

Background: Dynamic prediction of patients’ mortality risk in the ICU with time series data is limited by high dimensionality, uncertainty in sampling intervals, and other issues. A new deep learning method, the temporal convolution network (TCN), makes it possible to deal with complex clinical time series data in the ICU. We aimed to develop and validate it to predict mortality risk using time series data from the MIMIC III dataset. Methods: A final set of 21139 records of ICU stays was analyzed, and a total of 17 physiological variables from the MIMIC III dataset were used to predict mortality risk. We then compared the performance of the attention-based TCN with that of traditional artificial intelligence (AI) methods. Results: The Area Under Receiver Operating Characteristic (AUCROC) and Area Under Precision-Recall curve (AUC-PR) of the attention-based TCN for predicting the mortality risk 48 h after ICU admission were 0.837 (0.824–0.850) and 0.454, respectively. The sensitivity and specificity of the attention-based TCN were 67.1% and 82.6%, whereas the traditional AI methods yielded low sensitivity (< 50%). Conclusions: The attention-based TCN model achieved better performance in predicting mortality risk with time series data than traditional AI methods and conventional score-based models. The attention-based TCN mortality risk model has the potential to help decision-making for critical patients.
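
For readers unfamiliar with the two reported metrics, the sketch below shows how sensitivity and specificity are computed from binary labels and predictions. It uses the standard definitions only; the function name and the toy labels are hypothetical and have no connection to the MIMIC III records or the study's model outputs.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = died."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # deaths correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # deaths missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # survivors correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # survivors wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels and predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

In this toy example two of the three deaths are flagged (sensitivity 2/3) and four of the five survivors are cleared (specificity 0.8). Both values depend on the decision threshold applied to the model's risk score, which is why they are usually reported alongside threshold-free measures such as the area under the ROC curve.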
