Information-Theoretical Criteria for Characterizing the Earliness of Time-Series Data

Entropy ◽  
2019 ◽  
Vol 22 (1) ◽  
pp. 49
Author(s):  
Mariano Lemus ◽  
João P. Beirão ◽  
Nikola Paunković ◽  
Alexandra M. Carvalho ◽  
Paulo Mateus

Biomedical signals constitute time series that machine learning techniques can use for classification. These signals are complex, with measurements of several features over a potentially extended period. Characterizing whether the data can anticipate prediction is an essential task in time-series mining. The ability to obtain information in advance, by having early knowledge about a specific event, may be of great utility in many areas. Early classification arises as an extension of the time-series classification problem, given the need to obtain a reliable prediction as soon as possible. In this work, we propose an information-theoretic method, named Multivariate Correlations for Early Classification (MCEC), to characterize the early classification opportunity of a time series. Experimental validation is performed on synthetic and benchmark data, confirming the ability of the MCEC algorithm to perform a trade-off between accuracy and earliness in a wide spectrum of time-series data, such as those collected from sensors, images, spectrographs, and electrocardiograms.

Author(s):  
Qianguang Lin ◽  
Ni Li ◽  
Qi Qi ◽  
Jiabin Hu

Internet of Things (IoT) devices built on different processor architectures have increasingly become targets of adversarial attacks. In this paper, we propose an algorithm for the malware classification problem in the IoT domain to deal with increasingly severe IoT security threats. Application executions are represented by sequences of consecutive API calls. The time series data are analyzed and filtered based on an improved information gain. According to the experimental results, this filtering is more effective than chi-square statistics at reducing the sequence length of the input data while keeping the important information. We use a multi-layer convolutional neural network, which is suitable for processing time series data, to classify various types of malware. As the convolution window slides along the time sequence, it collects different sequence features and thereby captures the characteristics of the corresponding sequence position at a higher level. By comparing the iterative efficiency of different optimization algorithms in the model, we select an algorithm that can approximate the optimal solution in a small number of iterations, which speeds up the convergence of model training. The experimental results on real-world IoT malware samples show that the classification accuracy of this approach can exceed 98%. Overall, a comprehensive evaluation shows that our method is practically suitable for IoT malware classification, with high accuracy and low computational overhead.
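The filtering step in this abstract ranks features by information gain. The paper's "improved" variant is not specified here, so the following is only a minimal sketch of the standard definition: the drop in class-label entropy from knowing whether a feature is present. The function names and the toy labels are hypothetical, invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(present, labels):
    """Reduction in label entropy from knowing whether a feature is present.

    present: list of booleans (hypothetical: does the trace contain a given API call?)
    labels:  list of class labels, same length as `present`
    """
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for p, lab in zip(present, labels) if p == value]
        if subset:
            gain -= (len(subset) / total) * entropy(subset)
    return gain


# Toy data, invented for illustration only.
labels = ["malware", "malware", "benign", "benign"]
perfect = [True, True, False, False]  # separates the two classes exactly
useless = [True, False, True, False]  # carries no information about the class

gain_perfect = information_gain(perfect, labels)
gain_useless = information_gain(useless, labels)
```

A feature that separates the classes exactly recovers the full one bit of label entropy, while a feature independent of the class scores zero. Features with low scores are the natural candidates to drop when shortening input sequences.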


Modern Italy ◽  
2020 ◽  
Vol 25 (3) ◽  
pp. 279-297
Author(s):  
Bruno Bracalente ◽  
Davide Pellegrino ◽  
Antonio Forcina

Using an analysis of time series data over an extended period, this article describes the waning strength of the left-wing vote in Italy's ‘red regions’. By analysing changes to the provincial share of the vote for successive principal left-wing parties over the period 1953–2018, the degree of continuity in relation to the left's traditional territorial entrenchment is assessed. It becomes clear that after an extended period of minimal change, in more recent years there has been an increasing disruption of previous patterns. A thorough analysis of voter transitions during the 2001–19 period in Umbria, the first red region in which the left lost control of the regional government, shows that in this case the gradual weakening of the traditional left-wing ‘vote of belonging’ has experienced a dramatic acceleration during the more recent period. This has been expressed in a growing rate of abstention, vote-switching according to the type of electoral contest, and a marked propensity to vote for populist movements and parties on both the left and right.


Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2146
Author(s):  
Mikhail Zymbler ◽  
Elena Ivanova

Currently, big sensor data arise in a wide spectrum of Industry 4.0, Internet of Things, and Smart City applications. In such subject domains, sensors tend to have a high frequency and produce massive time series in a relatively short time interval. The data collected from the sensors are subject to mining in order to make strategic decisions. In the article, we consider the problem of choosing a Time Series Database Management System (TSDBMS) to provide efficient storing and mining of big sensor data. We overview InfluxDB, OpenTSDB, and TimescaleDB, which are among the most popular state-of-the-art TSDBMSs, and represent different categories of such systems, namely native, add-ons over NoSQL systems, and add-ons over relational DBMSs (RDBMSs), respectively. Our overview shows that, at present, TSDBMSs offer a modest built-in toolset to mine big sensor data. This leads to the use of third-party mining systems and unwanted overhead costs due to exporting data outside a TSDBMS, data conversion, and so on. We propose an approach to managing and mining sensor data inside RDBMSs that exploits the Matrix Profile concept. A Matrix Profile is a data structure that annotates a time series through the index of and the distance to the nearest neighbor of each subsequence of the time series and serves as a basis to discover motifs, anomalies, and other time-series data mining primitives. This approach is implemented as a PostgreSQL extension that allows an application programmer both to compute matrix profiles and mining primitives and to represent them as relational tables. Experimental case studies show that our approach surpasses the above-mentioned out-of-TSDBMS competitors in terms of performance since it assumes that sensor data are mined inside a TSDBMS at no significant overhead costs.
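The Matrix Profile definition in this abstract can be made concrete with a brute-force sketch. This is not the PostgreSQL extension described by the authors. It is a naive quadratic-time illustration that assumes z-normalized Euclidean distance and an exclusion zone of half the subsequence length, both common conventions that the abstract does not state. The function names and the toy series are hypothetical.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Brute-force matrix profile: O(n^2 * m), for illustration only.

    Returns (profile, index): for each length-m subsequence, the distance to
    its nearest non-trivial neighbor and the start position of that neighbor.
    """
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = max(1, m // 2)  # assumed exclusion zone; skips trivial self-matches
    profile, index = [], []
    for i, a in enumerate(windows):
        best_dist, best_j = math.inf, -1
        for j, b in enumerate(windows):
            if abs(i - j) < exclusion:
                continue
            dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if dist < best_dist:
                best_dist, best_j = dist, j
        profile.append(best_dist)
        index.append(best_j)
    return profile, index


# Toy series, invented for illustration: the shape 0, 1, 3, 1 starts at positions 0 and 6.
series = [0, 1, 3, 1, 0, 0, 0, 1, 3, 1, 0, 5, 0, 0]
profile, index = matrix_profile(series, m=4)
```

Low profile values mark repeated shapes (motifs), such as the pair at positions 0 and 6 here, and the largest profile value marks the subsequence least like any other, which is a candidate anomaly.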


2019 ◽  
Vol 3 (2) ◽  
pp. 282-287
Author(s):  
Ika Oktavianti ◽  
Ermatita Ermatita ◽  
Dian Palupi Rini

Licensing services are one of the forms of public service that are important in supporting increased investment in Indonesia, and they are currently carried out by the Investment and Licensing Services Department. The problems that generally occur are the length of time needed to process licenses, and one of the contributing factors is the limited number of licensing officers. Licensing data are time series data with monthly observations. An Artificial Neural Network (ANN) and Support Vector Regression (SVR) are used as machine learning techniques to predict licensing patterns based on time series data. For the two datasets used, dataset 1 and dataset 2, the data are split 70% for training and 30% for testing, on the consideration that the training data must outnumber the testing data. The results show that for dataset 1, the ANN-Multilayer Perceptron (MLP) performs better than SVR, with MSE, MAE, and RMSE values of 251.09, 11.45, and 15.84. For dataset 2, SVR-Linear performs better than MLP, with MSE, MAE, and RMSE values of 1839.93, 32.80, and 42.89. The dataset used to predict the number of permits is dataset 2. The study also used Simple Linear Regression (SLR) to examine the causal relationship between the number of licenses issued and the number of licensing service officers. The result is that this relationship is less significant, because other factors affect the number of licenses.
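The evaluation protocol in this abstract (a 70/30 split scored with MSE, MAE, and RMSE) can be sketched as follows. The abstract does not say how the split was drawn. This sketch assumes a chronological split, which is the usual choice for time series, and uses a naive last-value baseline in place of the ANN and SVR models. The monthly counts are invented and are not the study's data.

```python
import math


def chronological_split(series, train_fraction=0.7):
    """Split a time series without shuffling, keeping the earliest part for training."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def regression_errors(actual, predicted):
    """Return (MSE, MAE, RMSE) for paired observations and predictions."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / len(residuals)
    mae = sum(abs(r) for r in residuals) / len(residuals)
    return mse, mae, math.sqrt(mse)


# Invented monthly counts, for illustration only (not the paper's data).
monthly = [10, 12, 9, 14, 15, 13, 16, 18, 17, 20]
train, test = chronological_split(monthly)

# Naive baseline: predict every test month as the last training value.
baseline = [train[-1]] * len(test)
mse, mae, rmse = regression_errors(test, baseline)
```

RMSE is the square root of MSE, so it is expressed in the same units as the series (here, licenses per month), which makes it easier to interpret than MSE when comparing models on one dataset.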


Author(s):  
Ashish Gupta ◽  
Rajdeep Pal ◽  
Rahul Mishra ◽  
Hari Prabhat Gupta ◽  
Tanima Dutta ◽  
...  

Author(s):  
Daniela A. Gomez-Cravioto ◽  
Ramon E. Diaz-Ramos ◽  
Francisco J. Cantu-Ortiz ◽  
Hector G. Ceballos

To understand and approach the spread of the SARS-CoV-2 epidemic, machine learning offers fundamental tools. This study presents the use of machine learning techniques for projecting COVID-19 infections and deaths in Mexico. The research has three main objectives: first, to identify which function best fits the growth of the infected population in Mexico; second, to determine the feature importance of climate and mobility; third, to compare the results of a traditional time series statistical model with a modern machine learning approach. The motivation for this work is to support health care providers in their preparation and planning. The methods compared are linear, polynomial, and generalized logistic regression models to describe the growth of COVID-19 incidence in Mexico. Additionally, machine learning and time series techniques are used to identify feature importance and perform forecasting for daily cases and fatalities. The study uses the publicly available data sets from Johns Hopkins University in conjunction with the mobility rates obtained from Google’s Mobility Reports and climate variables acquired from the Weather Online API. The results suggest that the logistic growth model best fits the pandemic’s behavior, that there is enough correlation of climate and mobility variables with the disease numbers, and that the long short-term memory (LSTM) network can be exploited for predicting daily cases. Given this, we propose a model to predict daily cases and fatalities for SARS-CoV-2 using time series data, mobility, and weather variables.
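The generalized logistic model named in this abstract can be illustrated with one common parametrization, the Richards curve. The abstract does not give the exact form or the fitted values used in the study, so the function name, the parameter names, and the numbers below are assumptions chosen only to show the shape of the curve.

```python
import math


def generalized_logistic(t, K, r, t0, nu=1.0):
    """Richards curve: modeled cumulative cases at time t.

    K:  final size (upper asymptote)
    r:  growth rate
    t0: time shift (the midpoint of the curve when nu = 1)
    nu: asymmetry parameter (nu = 1 gives the standard logistic curve)
    """
    return K / (1.0 + nu * math.exp(-r * (t - t0))) ** (1.0 / nu)


# Hypothetical parameters, for illustration only.
K, r, t0 = 1000.0, 0.2, 30.0
curve = [generalized_logistic(day, K, r, t0) for day in range(0, 61, 10)]
midpoint = generalized_logistic(t0, K, r, t0)
```

With `nu = 1`, the curve passes through half the final size at `t0` and levels off toward `K`, which is the saturation behavior that distinguishes logistic growth from the linear and polynomial alternatives.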


2020 ◽  
Author(s):  
Pavan Kumar Jonnakuti ◽  
Udaya Bhaskar Tata Venkata Sai

Sea surface temperature (SST) is a key variable of the global ocean, which affects air-sea interaction processes. Forecasts based on statistics and machine learning techniques have not succeeded in capturing the spatial and temporal relationships of the time series data. Therefore, to achieve precision in SST prediction, we propose a deep learning-based model that produces a more realistic and accurate account of SST ‘behavior’ because it focuses on both space and time. Our hybrid CNN-LSTM model uses multiple processing layers to learn hierarchical representations, implementing 3D and 2D convolutional neural networks to better capture the spatial features, and additionally uses an LSTM to examine the temporal sequence of relations in SST time-series satellite data. Extensive studies in the Indian Ocean region, based on historical satellite datasets spanning from 1980 to the present, show that our proposed deep learning-based CNN-LSTM model is highly capable of accurate short- and mid-term daily SST prediction, judged exclusively on the error estimates (obtained from LSTM) of the forecasted data sets.

Keywords: Deep Learning, Sea Surface Temperature, CNN, LSTM, Prediction.


The stock market has been one of the primary revenue streams for many people for years. The stock market is often unpredictable and uncertain; predicting its ups and downs is therefore an uphill task even for financial experts, who have been trying to tackle it with little success. But it is now possible to predict stock markets, thanks to rapid improvements in technology that have led to faster processing and more accurate algorithms. It is necessary to abandon the misconception that stock market prediction is only for people with expertise in finance; an application can be developed to guide the user about the tempo of the stock market and the risk associated with it. The prediction of stock market prices is a complicated task, and various techniques are used to solve it. This paper investigates some of these techniques and compares the accuracy of each method. Forecasting time series data is an important topic in economics, statistics, finance, and business. Of the many techniques for forecasting time series data, such as the Autoregressive, Moving Average, and Autoregressive Integrated Moving Average (ARIMA) models, it is ARIMA that has higher accuracy and precision than the other methods. With recent advances in the computational power of processors and in the knowledge of machine learning and deep learning techniques, new algorithms can be developed to tackle the problem of predicting the stock market. This paper investigates one such machine learning algorithm for forecasting time series data, Long Short-Term Memory (LSTM). It is compared with traditional algorithms such as ARIMA to determine how much better LSTM performs than the traditional methods at predicting the stock market.

