Deep Learning of Multisensory Streaming Data for Predictive Modelling with Applications in Finance, Ecology, Transport and Environment

Author(s):  
Nikola K. Kasabov
Author(s):  
S. Priya ◽  
R. Annie Uthra

Abstract: In present times, data science has become popular for supporting and improving decision-making processes. Owing to the wide range of data-streaming applications, class imbalance and concept drift have become crucial learning problems. The advent of deep learning (DL) models has proved useful for classifying drifting concepts in streaming data. This paper presents an effective class imbalance with concept drift detection (CIDD) approach using Adadelta-optimizer-based deep neural networks (ADODNN), named the CIDD-ADODNN model, for the classification of highly imbalanced streaming data. The presented model involves four processes, namely preprocessing, class-imbalance handling, concept drift detection, and classification. The proposed model uses the adaptive synthetic (ADASYN) technique for handling class-imbalanced data, which applies a weighted distribution over the minority-class examples according to their level of difficulty in learning. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect the existence of concept drift. The ADODNN model is then utilized for the classification process. To increase the classification performance of the DNN model, an ADO-based hyperparameter tuning process determines the optimal parameters of the DNN. The performance of the presented model is evaluated on three streaming datasets, namely the intrusion detection (NSL KDDCup) dataset, the Spam dataset, and the Chess dataset. A detailed comparative analysis verified the superior performance of the presented model, which obtained maximum accuracies of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively.
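A minimal sketch of the processing chain described above is given below, using off-the-shelf stand-ins (imbalanced-learn's ADASYN, scikit-multiflow's ADWIN, and a Keras DNN trained with Adadelta); the layer sizes, epochs, and chunk handling are illustrative assumptions, not the tuned CIDD-ADODNN configuration.

```python
# Illustrative sketch only: ADASYN for class-imbalance handling, ADWIN on the
# prediction-error stream for drift detection, and a small Adadelta-trained DNN.
# All sizes and hyperparameters below are assumptions, not the paper's values.
from imblearn.over_sampling import ADASYN
from skmultiflow.drift_detection import ADWIN
from tensorflow import keras


def build_dnn(n_features, n_classes):
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adadelta(learning_rate=1.0),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def process_chunk(X, y, model, adwin):
    """Handle one chunk of the stream: rebalance, check for drift, update."""
    # 1) Rebalance the minority classes in the current chunk with ADASYN.
    X_bal, y_bal = ADASYN().fit_resample(X, y)
    # 2) Feed the classifier's 0/1 error stream to ADWIN to watch for drift.
    preds = model.predict(X, verbose=0).argmax(axis=1)
    drift = False
    for err in (preds != y).astype(float):
        adwin.add_element(err)
        drift = drift or adwin.detected_change()
    # 3) Update the classifier on the rebalanced chunk; on drift, a fresh
    #    model could be trained from recent data instead.
    model.fit(X_bal, y_bal, epochs=5, batch_size=64, verbose=0)
    return drift
```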


2021 ◽  
Vol 23 (2) ◽  
pp. 359-370
Author(s):  
Michał Matuszczak ◽  
Mateusz Żbikowski ◽  
Andrzej Teodorczyk

The article proposes an approach based on deep and machine learning models to predict component failure, as an enhancement of the condition-based maintenance scheme of a turbofan engine, and reviews prognostics approaches currently used in the aviation industry. A component degradation scale representing life consumption is proposed, and the collected condition data are combined with engine sensor and environmental data. Using data manipulation techniques, a framework for model training is created, and the models' hyperparameters are obtained through Bayesian optimization. The models predict a continuous variable representing the condition based on the input. The best-performing model is identified by determining its score on the holdout set. The deep learning models achieved an MSE score of 0.71 (an ensemble meta-model of neural networks) and significantly outperformed the machine learning models, whose best score was 1.75. The deep learning models thus demonstrated their ability to predict the component condition to within less than one unit of error on the rank scale.
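As a hedged illustration of the Bayesian hyperparameter search scored on a holdout set, the sketch below uses Optuna's TPE sampler as a stand-in optimizer and a small Keras regressor; the search space, network layout, and feature set are placeholders, not those of the article.

```python
# Sketch of Bayesian hyperparameter tuning for a condition-regression model,
# scored by holdout MSE. Optuna's TPE sampler stands in for the optimizer;
# ranges and architecture are illustrative assumptions.
import optuna
from sklearn.metrics import mean_squared_error
from tensorflow import keras


def objective(trial, X_train, y_train, X_val, y_val):
    units = trial.suggest_int("units", 32, 256)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    model = keras.Sequential([
        keras.layers.Dense(units, activation="relu",
                           input_shape=(X_train.shape[1],)),
        keras.layers.Dense(1),  # continuous component-condition score
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss="mse")
    model.fit(X_train, y_train, epochs=20, batch_size=128, verbose=0)
    preds = model.predict(X_val, verbose=0).ravel()
    return mean_squared_error(y_val, preds)  # minimized by the study


# Usage (X_tr, y_tr, X_val, y_val are the prepared train/holdout splits):
# study = optuna.create_study(direction="minimize")
# study.optimize(lambda t: objective(t, X_tr, y_tr, X_val, y_val), n_trials=50)
```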


Sensors ◽  
2019 ◽  
Vol 19 (11) ◽  
pp. 2451 ◽  
Author(s):  
Mohsin Munir ◽  
Shoaib Ahmed Siddiqui ◽  
Muhammad Ali Chattha ◽  
Andreas Dengel ◽  
Sheraz Ahmed

The need for robust unsupervised anomaly detection in streaming data is increasing rapidly in the current era of smart devices, where enormous amounts of data are gathered from numerous sensors. These sensors record the internal state of a machine, the external environment, and the interaction of machines with other machines and humans. It is of prime importance to leverage this information in order to minimize machine downtime, or even avoid downtime completely through constant monitoring. Since each device generates a different type of streaming data, a specific kind of anomaly detection technique normally performs better than the others depending on the data type. For some types of data and use cases, statistical anomaly detection techniques work better, whereas for others, deep-learning-based techniques are preferred. In this paper, we present a novel anomaly detection technique, FuseAD, which takes advantage of both statistical and deep-learning-based approaches by fusing them together in a residual fashion. The obtained results show an increase in area under the curve (AUC) compared to state-of-the-art anomaly detection methods when FuseAD is tested on a publicly available dataset (the Yahoo Webscope benchmark). The results indicate that this fusion-based technique can obtain the best of both worlds by combining the strengths of the two approaches and complementing their weaknesses. We also perform an ablation study to quantify the contribution of the individual components of FuseAD, i.e., the statistical ARIMA model and the deep-learning-based convolutional neural network (CNN) model.
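The sketch below illustrates one way to realise such a residual fusion, with a statsmodels ARIMA forecaster and a small PyTorch CNN that learns a correction term; the window size, model orders, and architecture are assumptions, and the actual FuseAD pipeline differs in its details.

```python
# Residual fusion of a statistical forecaster and a CNN, in the spirit of the
# approach above: the CNN learns a correction to the ARIMA forecast, and the
# anomaly score is the deviation of the observation from the fused forecast.
# Orders, window size, and architecture are illustrative assumptions.
import torch
import torch.nn as nn
from statsmodels.tsa.arima.model import ARIMA


class CorrectionCNN(nn.Module):
    def __init__(self, window=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * window, 1),
        )

    def forward(self, x):  # x: (batch, 1, window)
        return self.net(x)


def fused_forecast(history, cnn, window=32):
    # Statistical part: one-step-ahead ARIMA forecast from the raw history.
    arima_pred = ARIMA(history, order=(2, 1, 2)).fit().forecast(steps=1)[0]
    # Deep part: the CNN predicts a residual correction from the last window.
    x = torch.tensor(history[-window:], dtype=torch.float32).view(1, 1, -1)
    with torch.no_grad():
        correction = cnn(x).item()
    return arima_pred + correction


def anomaly_score(observation, forecast):
    # Large deviations from the fused forecast indicate anomalies.
    return abs(observation - forecast)
```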


Author(s):  
Ioannis Prapas ◽  
Behrouz Derakhshan ◽  
Alireza Rezaei Mahdiraji ◽  
Volker Markl

Abstract: Deep Learning (DL) has consistently surpassed other Machine Learning methods and achieved state-of-the-art performance in multiple cases. Several modern applications, such as financial and recommender systems, require models that are constantly updated with fresh data. The prominent approach for keeping a DL model fresh is to trigger full retraining from scratch when enough new data are available. However, retraining large and complex DL models is time-consuming and compute-intensive, which makes full retraining costly, wasteful, and slow. In this paper, we present an approach to continuously train and deploy DL models. First, we enable continuous training through proactive training, which combines samples of historical data with new streaming data. Second, we enable continuous deployment through gradient sparsification, which allows us to send only a small percentage of the model updates per training iteration. Our experimental results with LeNet5 on MNIST and modern DL models on CIFAR-10 show that proactive training keeps models fresh with comparable, if not superior, performance to full retraining at a fraction of the time. Combined with gradient sparsification, sparse proactive training enables very fast updates of a deployed model with arbitrarily large sparsity, reducing communication per iteration by up to four orders of magnitude, with minimal, if any, losses in model quality. Sparse training, however, comes at a price: it incurs an overhead on training that depends on the size of the model and increases the training time by factors ranging from 1.25 to 3 in our experiments. This is arguably a small price to pay for enabling the continuous training and deployment of large DL models.
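The sketch below illustrates the two ingredients in a hedged, framework-agnostic way (PyTorch is used here): proactive batches that mix a sample of historical data with fresh streaming data, and top-k gradient sparsification that keeps only the largest-magnitude entries of each gradient tensor. The mixing ratio and sparsity level are illustrative assumptions, not the authors' settings.

```python
# Illustrative sketch (not the authors' implementation): proactive training
# batches and top-k gradient sparsification for cheap continuous deployment.
import random
import torch


def proactive_batch(historical, fresh, batch_size=64, fresh_fraction=0.5):
    # Mix new streaming samples with a random sample of historical data, so
    # the model stays fresh without forgetting the old distribution.
    # Assumes both lists hold (input, target) pairs and are large enough.
    n_fresh = min(int(batch_size * fresh_fraction), len(fresh))
    batch = random.sample(fresh, n_fresh)
    batch += random.sample(historical, batch_size - n_fresh)
    return batch


def sparsify_gradients(model, keep_ratio=1e-3):
    # Keep only the largest-magnitude gradient entries of each parameter
    # tensor; the resulting {name: (indices, values)} dict is the small
    # update that would be shipped to the deployed copy of the model.
    update = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        g = p.grad.detach().flatten()
        k = max(1, int(g.numel() * keep_ratio))
        _, idx = torch.topk(g.abs(), k)
        update[name] = (idx, g[idx])
    return update
```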


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1664
Author(s):  
Yoon-Ki Kim ◽  
Yongsung Kim

Recently, as the amount of real-time video streaming data has increased, distributed parallel processing systems have rapidly evolved to process large-scale data. In addition, with the increase in the scale of computing resources constituting a distributed parallel processing system, orchestration technology has become crucial for the proper management of computing resources, in terms of allocating computing resources, setting up the programming environment, and deploying user applications. In this paper, we present DiPLIP, a new distributed parallel processing platform for real-time large-scale image processing based on deep learning model inference. It provides a scheme for large-scale real-time image inference using a buffer layer, together with a parallel processing environment that scales according to the size of the image stream. It allows users to easily run trained deep learning models on real-time images in a distributed parallel processing environment at high speed, through the distribution of virtual machine containers.
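A generic sketch of the buffer-layer pattern between stream ingestion and parallel inference is shown below; the worker count, buffer size, and the `infer` callable are placeholders, and DiPLIP itself distributes this work across virtual machine containers rather than threads.

```python
# Buffer layer between frame ingestion and parallel inference workers.
# Thread-based sketch only; sizes and the `infer` callable are placeholders.
import queue
import threading

frame_buffer = queue.Queue(maxsize=1024)  # bounded buffer gives back-pressure


def producer(stream):
    # Push incoming frames; blocks when the buffer is full.
    for frame in stream:
        frame_buffer.put(frame)
    frame_buffer.put(None)  # poison pill to stop a worker when the stream ends


def worker(infer, results):
    while True:
        frame = frame_buffer.get()
        if frame is None:  # shutdown signal
            frame_buffer.task_done()
            break
        results.append(infer(frame))  # deep learning model inference
        frame_buffer.task_done()


def start_workers(infer, n_workers=4):
    results = []
    threads = [threading.Thread(target=worker, args=(infer, results), daemon=True)
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads, results
```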


2020 ◽  
Author(s):  
Martin Cimmino ◽  
Matteo Calabrese ◽  
Dimos Kapetis ◽  
Sture Lygren ◽  
Daniele Vanzan ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2430
Author(s):  
Saurabh Suradhaniwar ◽  
Soumyashree Kar ◽  
Surya S. Durbha ◽  
Adinarayana Jagarlapudi

High-frequency monitoring of agrometeorological parameters is quintessential in the domain of Precision Agriculture (PA), where the timeliness of collected observations and the ability to generate ahead-of-time predictions can substantially impact crop yield. In this context, state-of-the-art Internet-of-Things (IoT)-based sensing platforms are often employed to generate, pre-process and assimilate real-time data from heterogeneous sensors and streaming data sources. Simultaneously, Time-Series Forecasting Algorithms (TSFAs) are responsible for generating reliable forecasts with a pre-defined forecast horizon and confidence. These TSFAs often rely on modelling the correlation between endogenous variables, the impact of exogenous variables, and latent structural properties of the data such as autocorrelation, periodicity, trend, pattern, and causality in order to approximate the model parameters. Traditionally, TSFAs such as Holt–Winters (HW) and the autoregressive family of models (ARIMA) apply a linear, parametric approach to model approximation, whilst models such as Support Vector Regression (SVR) and Neural Networks (NNs) adhere to a non-linear, non-parametric approach to modelling the historical data. Recently, deep-learning-based TSFAs such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have gained popularity due to their ability to model long sequences of highly non-linear and stochastic data effectively. However, the evolution of TSFAs for predicting agrometeorological parameters has pivoted around one-step-ahead forecasting, which often overestimates the performance metrics defined for validating the forecast capabilities of candidate TSFAs. Hence, this paper evaluates and compares the performance of different machine learning (ML) and deep learning (DL) based TSFAs under one-step-ahead and multi-step-ahead forecast scenarios, thereby estimating the generalization capabilities of the TSFA models on unseen data. The data used in this study were collected from an Automatic Weather Station (AWS), sampled at 15-minute intervals, and span one month. Temperature (T) and Humidity (H) observations from the AWS are further converted into univariate, supervised time-series diurnal data profiles. Finally, walk-forward validation is used to evaluate recursive one-step-ahead forecasts until the desired prediction horizon is reached. The results show that the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) and SVR models outperform their DL-based counterparts in one-step-ahead and multi-step-ahead settings with a fixed forecast horizon. This work aims to provide a baseline comparison between different TSFAs to assist model selection and facilitate rapid ahead-of-time forecasting for end-user applications.
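A hedged sketch of the walk-forward, recursive one-step-ahead evaluation follows, using statsmodels' SARIMAX as the forecaster; the model orders, the seasonal period of 96 (15-minute sampling), and the per-step refitting are illustrative choices rather than the paper's exact setup.

```python
# Walk-forward validation with recursive one-step-ahead forecasting:
# each one-step prediction is appended to the history before the next step
# is forecast, until the desired horizon is reached. Orders and the seasonal
# period (96 samples/day for 15-min data) are illustrative assumptions.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX


def recursive_forecast(history, horizon, order=(1, 0, 1),
                       seasonal_order=(1, 0, 1, 96)):
    history = list(history)
    preds = []
    for _ in range(horizon):
        fit = SARIMAX(history, order=order,
                      seasonal_order=seasonal_order).fit(disp=False)
        step = fit.forecast(steps=1)[0]  # one-step-ahead forecast
        preds.append(step)
        history.append(step)             # feed the prediction back (recursive)
    return np.array(preds)


def walk_forward(series, train_size, horizon):
    # Slide the forecast origin over the series, refitting on all data seen
    # so far, and average the multi-step MSE across origins.
    errors = []
    for t in range(train_size, len(series) - horizon, horizon):
        preds = recursive_forecast(series[:t], horizon)
        truth = np.asarray(series[t:t + horizon], dtype=float)
        errors.append(np.mean((truth - preds) ** 2))
    return float(np.mean(errors))
```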

