stream data Latest Research Papers

TinyLFU-Based Semi-Stream Cache Join for Near-Real-Time Data Warehousing

10.21203/rs.3.rs-944044/v1 ◽

2022 ◽

Author(s):

M. Asif Naeem ◽

Wasiullah Waqar ◽

Farhaan Mirza ◽

Ali Tahir

Keyword(s):

Real Time ◽

Data Warehousing ◽

Cost Model ◽

Research Problem ◽

Daily Basis ◽

Stream Data ◽

Time Data ◽

Business Decisions ◽

Real Time Data ◽

Modern Era

Abstract Semi-stream join is an emerging research problem in the domain of near-real-time data warehousing. A semi-stream join is a join between a fast stream (S) and a slow disk-based relation (R). In the modern era of technology, huge amounts of data are generated daily and need to be analyzed immediately to support successful business decisions. With this in mind, a well-known algorithm called CACHEJOIN (Cache Join) was proposed. The limitation of CACHEJOIN is that it does not deal efficiently with frequently changing trends in stream data. To overcome this limitation, in this paper we propose TinyLFU-CACHEJOIN, a modified version of the original CACHEJOIN algorithm designed to improve its performance. TinyLFU-CACHEJOIN employs an intelligent strategy that keeps in the cache only those records of R that have a high hit rate in S. This mechanism allows TinyLFU-CACHEJOIN to deal with sudden and abrupt trend changes in S. We developed a cost model for TinyLFU-CACHEJOIN and validated it empirically. We also compared the performance of TinyLFU-CACHEJOIN with that of the existing CACHEJOIN algorithm on a skewed synthetic dataset. The experiments showed that TinyLFU-CACHEJOIN significantly outperforms CACHEJOIN.

Exploring jump back behavior patterns and reasons in e-book system

Smart Learning Environments ◽

10.1186/s40561-021-00183-6 ◽

2022 ◽

Vol 9 (1) ◽

Author(s):

Boxuan Ma ◽

Min Lu ◽

Yuta Taniguchi ◽

Shin’ichi Konomi

Keyword(s):

Student Learning ◽

Activity Patterns ◽

Learning Performance ◽

Digital Learning ◽

Stream Data ◽

Learning Materials ◽

Reading Behaviors ◽

Log Data ◽

Learning Behaviors ◽

Event Stream

Abstract With the increasing use of digital learning materials in higher education, the accumulated operational log data provide a unique opportunity to analyze student learning behaviors and their effects on learning performance, and thus to understand how students learn with e-books. Among students’ reading behaviors when interacting with e-book systems, we find that jump-back is a frequent and informative behavior type. In this paper, we aim to understand students’ intentions behind a jump-back using learning log data on the e-book materials of a course at our university. We first formally define the “jump-back” behaviors that can be detected from the click event stream of slide reading and then systematically study these behaviors from different perspectives on the e-book event stream data. Finally, by sampling 22 learning materials, we identify six reading activity patterns that can explain jump-backs. Our analysis provides an approach to enriching the understanding of e-book learning behaviors and informs design implications for e-book systems.

A Scalable Big Data Framework for Real-Time Traffic Monitoring System

10.21203/rs.3.rs-1200646/v1 ◽

2022 ◽

Author(s):

Wilfried Yves Hamilton Adoni ◽

Tarik Nahhal ◽

Najib Ben Aoun ◽

Moez Krichen ◽

Mohammed Alzahrani

Keyword(s):

Big Data ◽

Real Time ◽

Public Transportation ◽

Large Scale ◽

Network Performance ◽

Intelligent Transportation System ◽

Traffic Monitoring ◽

Transportation System ◽

Stream Data ◽

Data Framework

Abstract In this paper, we present a scalable, real-time intelligent transportation system based on a big data framework. The proposed system uses existing data from road sensors to better understand traffic flow and traveler behavior and to increase road network performance. Our transportation system is designed to process large-scale stream data to analyze traffic events such as incidents, crashes, and congestion. The experiments performed on the public transportation modes of the city of Casablanca in Morocco reveal that the proposed system achieves a significant time gain, gathers large-scale data from many road sensors, and is inexpensive in terms of hardware resource consumption.

Estimating deflation representing people spreading in stream data and estimating a specific position

International Journal of Intelligent Information and Database Systems ◽

10.1504/ijiids.2022.10042489 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1

Author(s):

Shigeyoshi Ohno ◽

Takuo Kikuchi ◽

Masaki Endo ◽

Takuma Toyoshima ◽

Hiroshi Ishikawa

Keyword(s):

Stream Data ◽

Specific Position

Estimating deflation representing people spreading in stream data and estimating a specific position

International Journal of Intelligent Information and Database Systems ◽

10.1504/ijiids.2022.120150 ◽

2022 ◽

Vol 15 (1) ◽

pp. 104

Author(s):

Takuma Toyoshima ◽

Masaki Endo ◽

Takuo Kikuchi ◽

Shigeyoshi Ohno ◽

Hiroshi Ishikawa

Keyword(s):

Stream Data ◽

Specific Position

What Is Open Source Software (OSS) and What Is Big Data?

10.4018/978-1-6684-3662-2.ch005 ◽

2022 ◽

pp. 77-118

Author(s):

Richard S. Segall

Keyword(s):

Big Data ◽

Open Source ◽

Open Source Software ◽

Fog Computing ◽

Computer Software ◽

Data Sets ◽

Copyright Holder ◽

Stream Data ◽

Big Data Visualization ◽

Continuous Stream

This chapter discusses what Open Source Software is, its relationship to Big Data, how it differs from other types of software, and its software development cycle. Open source software (OSS) is a type of computer software whose source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Big Data are data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big Data can be discrete data or a continuous data stream and is accessible using many types of computing devices, ranging from supercomputers and personal workstations to mobile devices and tablets. The chapter discusses how fog computing can be used together with cloud computing for visualization of Big Data. It also presents a summary of additional web-based Big Data visualization software.

Stream Classification Algorithm Based on Decision Tree

Mobile Information Systems ◽

10.1155/2021/3103053 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jinlin Guo ◽

Haoran Wang ◽

Xinwei Li ◽

Li Zhang

Keyword(s):

Decision Tree ◽

Concept Drift ◽

Data Classification ◽

Classification Algorithm ◽

Current Data ◽

Classification Model ◽

Stream Data ◽

Integration Algorithm ◽

Stream Classification ◽

Model Classification

Due to the rise of many fields such as e-commerce platforms, a large amount of stream data has emerged. The incomplete labeling and concept drift problems of these data pose a huge challenge to existing stream data classification methods. To address them, a dynamic stream data classification algorithm is proposed. For the incomplete labeling problem, the method introduces randomization and an iterative strategy based on the Very Fast Decision Tree (VFDT) algorithm to design an iterative integration algorithm; the algorithm uses the previous model's classification result as the next model's input and implements a voting mechanism for classifying new data. At the same time, a window mechanism is used to store data and calculate the data distribution characteristics in the window; the calculated result is then combined with the predicted amount of data to adjust the size of the sliding window. Experiments show the superiority of the algorithm in classification accuracy. The aim of the study is to compare different algorithms to evaluate whether the classification model adapts to the current data environment.

Adaptive Interval Fuzzy Modeling from Stream Data and Application in Cryptocurrencies Forecasting

Advances in Intelligent Systems and Computing - Fuzzy Information Processing 2020 ◽

10.1007/978-3-030-81561-5_7 ◽

2021 ◽

pp. 69-81

Author(s):

Leandro Maciel ◽

Rosangela Ballini ◽

Fernando Gomide

Keyword(s):

Fuzzy Modeling ◽

Stream Data

A Clustering Algorithm in Stream Data Using Strong Coreset

Journal of Interconnection Networks ◽

10.1142/s0219265921430118 ◽

2021 ◽

Author(s):

Manmohan Singh ◽

Rajendra Pamula ◽

Alok Kumar

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Local Optimum ◽

Reduction Algorithm ◽

Stream Data ◽

Stream Data Mining ◽

Clustering Approach ◽

Approximation Guarantee ◽

Competitive Algorithms ◽

Learning Data

There are various applications of clustering in the fields of machine learning, data mining, and data compression, along with pattern recognition. Existing techniques such as Lloyd's algorithm (sometimes called k-means) suffer from convergence to a local optimum with no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. Coreset is a popular and fundamental concept for k-means clustering in stream data. In each step, reduction determines a coreset of inputs and represents the error, where P represents the number of input points according to the nested property of coresets. Hence, a bit reduction in error of final coreset gets n times more accurate. This motivated the author to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. Our algorithm outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).

Using Log Stream Data and Item Response Theory to Understand Proficiency in Collaboration

10.1007/978-3-030-86316-6_2 ◽

2021 ◽

pp. 23-40

Author(s):

Claire Scoular

Keyword(s):

Item Response Theory ◽

Item Response ◽

Stream Data ◽

Response Theory
