Tourism and Big Data: Forecasting with Hierarchical and Sequential Cluster Analysis

A new Big Data cluster method was developed to forecast the hotel accommodation market. The simulation and training of time series data are from January 2008 to December 2019 for the Spanish case. Applying the Hierarchical and Sequential Clustering Analysis method represents an improvement in forecasting modelling of the Big Data literature. The model is presented to obtain better explanatory and forecasting capacity than models used by Google data sources. Furthermore, the model allows knowledge of the tourists’ search on the internet profiles before their hotel reservation. With the information obtained, stakeholders can make decisions efficiently. The Matrix U1 Theil was used to establish a dynamic forecasting comparison.

Download Full-text

Bernoulli Time Series Modelling with Application to Accommodation Tourism Demand

Engineering Proceedings ◽

10.3390/engproc2021005017 ◽

2021 ◽

Vol 5 (1) ◽

pp. 17

Author(s):

Miguel Ángel Ruiz Reina

Keyword(s):

Time Series ◽

Time Series Data ◽

Prognostic Models ◽

Series Data ◽

Economic Agents ◽

Time Series Modelling ◽

Simulation And Training ◽

Uncertainty Method ◽

And Training ◽

Hotel Accommodation

In this research, a new uncertainty method has been developed and applied to forecasting the hotel accommodation market. The simulation and training of Time Series data are from January 2001 to December 2018 in the Spanish case. The Log-log BeTSUF method estimated by GMM-HAC-Newey-West is considered as a contribution for measuring uncertainty vs. other prognostic models in the literature. The results of our model present better indicators of the RMSE and Ratio Theil’s for the predictive evaluation period of twelve months. Furthermore, the straightforward interpretation of the model and the high descriptive capacity of the model allow economic agents to make efficient decisions.

Download Full-text

Prediction of Children Diabetes by Autoregressive Integrated Moving Averages Model Using Big Data and Not Only SQL

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8315 ◽

2019 ◽

Vol 16 (8) ◽

pp. 3510-3513

Author(s):

A. Bazila Banu ◽

R. K. Priyadarshini ◽

Ponniah Thirumalaikolundusubramanian

Keyword(s):

Big Data ◽

Time Series Data ◽

Query Language ◽

Moving Average ◽

Arima Model ◽

Glucose Monitoring ◽

Series Data ◽

Monitoring Systems ◽

Glucose Levels ◽

Diabetes Monitoring

Enormous efforts have been made by the health care organizations to assess the frequency and occurrence of diabetes among children. The epidemiology of diabetes is estimated with different methods. However, to effectively manage and estimate the diabetes, monitoring systems like glucose meters and Continuous Glucose Monitoring Systems (CGM) can be used. CGM is a way to determine glucose levels right through the day and night. The data obtained from such systems can be utilized effectively to manage as well to predict the diabetes. As the glucose level of the patient is monitored throughout the day, it results in an enormous amount of data. It is difficult to analyze large datasets using SQL, therefore NoSQL is used for handling big data based prediction. One such NoSQL tool known as ArangoDB is used to process the dataset with Arango Query Language (AQL). Investigations relevant to selection of attributes required for the model are discussed. In this paper, ARIMA model has been implemented to predict the diabetes among children. The model is evaluated in terms of moving average of glucose value of a particular person on a specific day. The results show that ARIMA model is appropriate for predicting Time-Series data especially like data obtained by CGM systems.

Download Full-text

PhilDB - The time series database with built-in change logging

10.7287/peerj.preprints.1488v1 ◽

2015 ◽

Author(s):

Andrew MacDonald

Keyword(s):

Time Series ◽

Big Data ◽

Open Source ◽

High Performance ◽

Time Series Data ◽

Handling Time ◽

Series Data ◽

Meta Data ◽

Static Data ◽

Data Tracking

PhilDB is an open-source time series database. It supports storage of time series datasets that are dynamic, that is recording updates to existing values in a log as they occur. Recent open-source systems, such as InfluxDB and OpenTSDB, have been developed to indefinitely store long-period, high-resolution time series data. Unfortunately they require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that don’t take the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that changed. Unlike ‘big data’ solutions, PhilDB has been designed for single machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. PhilDB improves accessing datasets by two methods. Firstly, it uses fast reads which make it practical to select data for analysis. Secondly, it uses simple read methods to minimise effort required to extract data. PhilDB takes a unique approach to meta-data tracking; optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances as a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. This paper describes the general approach, architecture, and philosophy of the PhilDB software.

Download Full-text

Application of GARCH Forecasting Method in Predicting The Number of Rail Passengers (Thousands of People) in Jabodetabek Region

Jurnal Matematika Statistika dan Komputasi ◽

10.20956/j.v18i2.18382 ◽

2022 ◽

Vol 18 (2) ◽

pp. 198-223

Author(s):

Farin Cyntiya Garini ◽

Warosatul Anbiya

Keyword(s):

Time Series ◽

Public Transportation ◽

Time Series Data ◽

Series Data ◽

Analysis Method ◽

Short Term ◽

Seasonal Factors ◽

Forecasting Method ◽

Short Term Forecasting ◽

Over Time

PT. Kereta Api Indonesia and PT. KAI Commuter Jabodetabek records time series data in the form of the number of train passengers (thousand people) in Jabodetabek Region in 2011-2020. One of the time series methods that can be used to predict the number of train passengers (thousand people) in Jabodetabek area is ARIMA method. ARIMA or also known as Box-Jenkins time series analysis method is used for short-term forecasting and does not accommodate seasonal factors. If the assumption of residual homoscedasticity is violated, the ARCH / GARCH method can be used, which explicitly models changes in residual variety over time. This study aims to model and forecast the number of train passengers (thousand people) in Jabodetabek area in 2021. Based on data analysis and processing using ARIMA method, the best model is ARIMA (1,1,1) with an AIC value of 2,159.87 and with ARCH / GARCH method, the best model is GARCH (1,1) with an AIC value of 18.314. Forecasting results obtained based on the best model can be used as a reference for related parties in managing and providing public transportation facilities, especially trains.

Download Full-text

A forecasting of stock trading price using time series information based on big data

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i3.pp2548-2554 ◽

2021 ◽

Vol 11 (3) ◽

pp. 2548

Author(s):

Soo-Tai Nam ◽

Chan-Yong Jin ◽

Seong-Yoon Shin

Keyword(s):

Time Series ◽

Big Data ◽

Stock Price ◽

Euclidean Distance ◽

Time Series Data ◽

Series Data ◽

Analysis Tool ◽

Large Set ◽

Data Generation ◽

Management Tools

Big data is a large set of structured or unstructured data that can collect, store, manage, and analyze data with existing database management tools. And it means the technique of extracting value from these data and interpreting the results. Big data has three characteristics: The size of existing data and other data (volume), the speed of data generation (velocity), and the variety of information forms (variety). The time series data are obtained by collecting and recording the data generated in accordance with the flow of time. If the analysis of these time series data, found the characteristics of the data implies that feature helps to understand and analyze time series data. The concept of distance is the simplest and the most obvious in dealing with the similarities between objects. The commonly used and widely known method for measuring distance is the Euclidean distance. This study is the result of analyzing the similarity of stock price flow using 793,800 closing prices of 1,323 companies in Korea. Visual studio and Excel presented calculate the Euclidean distance using an analysis tool. We selected “000100” as a target domestic company and prepared for big data analysis. As a result of the analysis, the shortest Euclidean distance is the code “143860” company, and the calculated value is “11.147”. Therefore, based on the results of the analysis, the limitations of the study and theoretical implications are suggested.

Download Full-text

COMPARISON OF VILLAGE DEVELOPMENT BEFORE AND AFTER VILLAGE FUND IMPLEMENTATION IN NAGAN RAYA DISTRICT

JURNAL EKONOMI DAN BISNIS ISLAM ◽

10.32505/j-ebis.v6i1.2605 ◽

2021 ◽

pp. 18-35

Author(s):

Arroyyan Ramly

Keyword(s):

Time Series Data ◽

Random Effect ◽

Random Effect Model ◽

Secondary Data ◽

Primary Data ◽

Series Data ◽

P Value ◽

Analysis Method ◽

Effect Model ◽

The Village

This study aims to analyze and see the effectiveness of the distribution of the use of village funds in Kuala Subdistrict, Nagan Raya Regency and its relationship with poverty levels. The data used is in the form of time series data from 2015 to 2018 which is collected through primary and secondary data. Primary data were obtained by directly visiting villages in the Kuala sub-district. Meanwhile, secondary data were obtained from the website of the Central Statistics Agency (BPS), document review, articles related to the object of research. This study conducted observations of 10 villages as a sample of 17 villages in Kuala District. The analysis method uses panel data regression with the random effect model (REM) analysis method. From the regression results of the random effect model, it was found that the village fund variable had a positive and significant effect on poverty with a probability of 0.0000 = p-value α = 5%. Then the village fund allocation variable has a significant negative effect on poverty with a probability of 0.0000 = p-value α = 5%. This means that adding 1% of village funds or increasing village funds will reduce poverty in Kuala Subdistrict, Nagan Raya Regency.

Download Full-text

Research on inefficiency analysis method of building energy utilizing time series data

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/294/1/012052 ◽

2019 ◽

Vol 294 ◽

pp. 012052

Author(s):

Magori Bumpei ◽

Tomonari Yashiro

Keyword(s):

Time Series ◽

Time Series Data ◽

Building Energy ◽

Series Data ◽

Analysis Method

Download Full-text

The Swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code

Data Mining and Knowledge Discovery ◽

10.1007/s10618-019-00668-6 ◽

2020 ◽

Vol 34 (4) ◽

pp. 949-979

Author(s):

Yan Zhu ◽

Shaghayegh Gharghabi ◽

Diego Furtado Silva ◽

Hoang Anh Dau ◽

Chin-Chia Michael Yeh ◽

...

Keyword(s):

Data Mining ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Time Series Data Mining ◽

The Matrix

Download Full-text

A Long Short Term Memory with Peephole Connections and Generative Adversarial Network Based Collaborative Methodology to Identify Outliers in ECG Dataset

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9273 ◽

2020 ◽

Vol 17 (8) ◽

pp. 3798-3803

Author(s):

M. D. Anto Praveena ◽

B. Bharathi

Keyword(s):

Time Series ◽

Big Data ◽

Data Analytics ◽

Time Series Data ◽

Short Term Memory ◽

Big Data Analytics ◽

Data Preprocessing ◽

Heterogeneous Data ◽

Series Data ◽

Outlier Identification

Big Data analytics has become an upward field, and it plays a pivotal role in Healthcare and research practices. Big data analytics in healthcare cover vast numbers of dynamic heterogeneous data integration and analysis. Medical records of patients include several data including medical conditions, medications and test findings. One of the major challenges of analytics and prediction in healthcare is data preprocessing. In data preprocessing the outlier identification and correction is the important challenge. Outliers are exciting values that deviates from other values of the attribute; they may simply experimental errors or novelty. Outlier identification is the method of identifying data objects with somewhat different behaviors than expectations. Detecting outliers in time series data is different from normal data. Time series data are the data that are in a series of certain time periods. This kind of data are identified and cleared to bring the quality dataset. In this proposed work a hybrid outlier detection algorithm extended LSTM-GAN is helped to recognize the outliers in time series data. The outcome of the proposed extended algorithm attained better enactment in the time series analysis on ECG dataset processing compared with traditional methodologies.

Download Full-text

Research on Tourism Network Index Model Based on Baidu Index --- A Case Study of Sanya

MATEC Web of Conferences ◽

10.1051/matecconf/201822805017 ◽

2018 ◽

Vol 228 ◽

pp. 05017

Author(s):

Caixia Chen ◽

Chun Shi ◽

Jue Chen

Keyword(s):

Time Series Data ◽

Principal Component ◽

Series Data ◽

Analysis Method ◽

Index Model ◽

Network Search ◽

Weighted Analysis ◽

Development Level ◽

The Stability

Tourism index is a "barometer" to reflect the overall development level of tourism. The tourism index compiled by historical data can not reflect the real situation accurately with the increasing influence of network events on tourism In the Internet era. This study collects time series data of tourism network search by Baidu index tool and uses data mining method and principal component analysis method to detect and standardize the stability of the data. The spss system and the weighted analysis method are used to construct the tourism network index model. Finally, the model detection is carried out by comparing the actual tourism data in Sanya. This study is an important supplement to the existing tourism index.

Download Full-text