stream data
Recently Published Documents


TOTAL DOCUMENTS

618
(FIVE YEARS 196)

H-INDEX

23
(FIVE YEARS 4)

2022 ◽  
Author(s):  
M. Asif Naeem ◽  
Wasiullah Waqar ◽  
Farhaan Mirza ◽  
Ali Tahir

Abstract Semi-stream join is an emerging research problem in the domain of near-real-time data warehousing. A semi-stream join is a join between a fast stream (S) and a slow disk-based relation (R). Huge amounts of data are now generated every day and need to be analyzed immediately to support business decisions. With this in mind, a well-known algorithm called CACHEJOIN (Cache Join) was proposed. The limitation of CACHEJOIN is that it does not deal efficiently with frequently changing trends in stream data. To overcome this limitation, in this paper we propose TinyLFU-CACHEJOIN, a modified version of the original CACHEJOIN algorithm designed to improve its performance. TinyLFU-CACHEJOIN employs a strategy that keeps in the cache only those records of R that have a high hit rate in S. This mechanism allows it to deal with sudden and abrupt trend changes in S. We developed a cost model for TinyLFU-CACHEJOIN and validated it empirically. We also compared the performance of TinyLFU-CACHEJOIN with that of the existing CACHEJOIN algorithm on a skewed synthetic dataset. The experiments showed that TinyLFU-CACHEJOIN significantly outperforms CACHEJOIN.
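
The idea of keeping only frequently hit records of R in the cache can be illustrated with a minimal sketch. It is not the authors' TinyLFU-CACHEJOIN: the class `FrequencyAdmissionCache`, the function `semi_stream_join`, and the use of a dictionary in place of the disk-based relation are all assumptions made for illustration, and exact counts stand in for the approximate, aging frequency sketch that TinyLFU normally uses.

```python
from collections import Counter


class FrequencyAdmissionCache:
    """Bounded cache that admits a new key only if it has been requested
    more often than the least-requested key already cached (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = Counter()  # exact request counts (assumption: no sketch, no aging)
        self.store = {}

    def get(self, key):
        # Every lookup counts as one request for the key, hit or miss.
        self.freq[key] += 1
        return self.store.get(key)

    def offer(self, key, value):
        # Free space (or key already cached): store without any contest.
        if key in self.store or len(self.store) < self.capacity:
            self.store[key] = value
            return True
        # Otherwise the candidate must beat the least-requested cached key.
        victim = min(self.store, key=lambda k: self.freq[k])
        if self.freq[key] > self.freq[victim]:
            del self.store[victim]
            self.store[key] = value
            return True
        return False


def semi_stream_join(stream, relation, cache):
    """Join (key, payload) stream tuples with rows of `relation`.

    `relation` is a dict standing in for the slow disk-based relation R.
    Returns the joined tuples and the number of cache hits.
    """
    output, hits = [], 0
    for key, payload in stream:
        row = cache.get(key)
        if row is not None:
            hits += 1
        else:
            row = relation.get(key)  # stands in for a disk lookup
            if row is None:
                continue  # no matching record in R
            cache.offer(key, row)
        output.append((key, payload, row))
    return output, hits
```

With a cache of capacity 1 and a stream dominated by one key, the rarely seen keys never displace the frequent one, which is the behavior the admission test is meant to produce.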


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Boxuan Ma ◽  
Min Lu ◽  
Yuta Taniguchi ◽  
Shin’ichi Konomi

Abstract With the increasing use of digital learning materials in higher education, the accumulated operational log data provide a unique opportunity to analyze student learning behaviors and their effects on learning performance, and so to understand how students learn with e-books. Among students' reading behaviors when interacting with e-book systems, we find that jump-back is a frequent and informative behavior type. In this paper, we aim to understand the student's intention behind a jump-back using learning log data on the e-book materials of a course at our university. We first formally define the "jump-back" behaviors that can be detected from the click event stream of slide reading and then systematically study these behaviors from different perspectives on the e-book event stream data. Finally, by sampling 22 learning materials, we identify six reading activity patterns that can explain jump-backs. Our analysis provides an approach to enriching the understanding of e-book learning behaviors and informs design implications for e-book systems.
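
Detecting jump-backs in a page-view sequence can be sketched as follows. The abstract does not reproduce the paper's formal definition, so this sketch assumes one: a jump-back is a move from one page to an earlier page at least `min_distance` pages away. The function name `find_jump_backs` and the `min_distance` parameter are hypothetical.

```python
def find_jump_backs(pages, min_distance=2):
    """Return (event_index, from_page, to_page) for every backward move.

    `pages` is the sequence of page numbers viewed, in click order.
    Assumption: a backward move of at least `min_distance` pages counts
    as a jump-back; shorter moves are treated as ordinary page turns.
    """
    jumps = []
    for i in range(1, len(pages)):
        previous, current = pages[i - 1], pages[i]
        if previous - current >= min_distance:
            jumps.append((i, previous, current))
    return jumps
```

For the sequence `[1, 2, 3, 4, 2, 3, 4, 5, 4, 1]`, the moves 4 to 2 and 4 to 1 qualify under this assumed definition, while the single-page move 5 to 4 does not.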


2022 ◽  
Author(s):  
Wilfried Yves Hamilton Adoni ◽  
Tarik Nahhal ◽  
Najib Ben Aoun ◽  
Moez Krichen ◽  
Mohammed Alzahrani

Abstract In this paper, we present a scalable, real-time intelligent transportation system based on a big data framework. The proposed system uses existing data from road sensors to better understand traffic flow and traveler behavior and to increase road network performance. Our transportation system is designed to process large-scale stream data to analyze traffic events such as incidents, crashes, and congestion. Experiments performed on the public transportation modes of the city of Casablanca in Morocco reveal that the proposed system achieves a significant gain of time, gathers large-scale data from many road sensors, and is not expensive in terms of hardware resource consumption.


Author(s):  
Shigeyoshi Ohno ◽  
Takuo Kikuchi ◽  
Masaki Endo ◽  
Takuma Toyoshima ◽  
Hiroshi Ishikawa

Author(s):  
Takuma Toyoshima ◽  
Masaki Endo ◽  
Takuo Kikuchi ◽  
Shigeyoshi Ohno ◽  
Hiroshi Ishikawa

2022 ◽  
pp. 77-118
Author(s):  
Richard S. Segall

This chapter discusses what open source software is, its relationship to Big Data, how it differs from other types of software, and its software development cycle. Open source software (OSS) is a type of computer software whose source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Big Data are data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big Data can be discrete or continuous stream data and are accessible from many types of computing devices, ranging from supercomputers and personal workstations to mobile devices and tablets. The chapter discusses how fog computing can be combined with cloud computing for the visualization of Big Data. It also presents a summary of additional web-based Big Data visualization software.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jinlin Guo ◽  
Haoran Wang ◽  
Xinwei Li ◽  
Li Zhang

With the rise of fields such as e-commerce platforms, a large amount of stream data has emerged. The incomplete labeling problem and the concept drift problem of these data pose a huge challenge to existing stream data classification methods. To address this, a dynamic classification algorithm is proposed for stream data. For the incomplete labeling problem, the method introduces randomization and an iterative strategy based on the Very Fast Decision Tree (VFDT) algorithm to design an iterative ensemble algorithm; the algorithm uses the classification result of the previous model as the input of the next model and applies a voting mechanism to classify new data. At the same time, a window mechanism is used to store data and calculate the data distribution characteristics within the window; the calculated result is then combined with the predicted amount of data to adjust the size of the sliding window. Experiments show the superiority of the algorithm in classification accuracy. The aim of the study is to compare different algorithms to evaluate whether the classification model adapts to the current data environment.
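
The window mechanism can be pictured with a small sketch. This is not the paper's method: the abstract does not say which distribution characteristics are computed or how the predicted amount of data enters the adjustment, so the sketch assumes a simple rule in which the window shrinks when the means of its older and newer halves differ by more than a threshold and grows by one otherwise. The class `AdaptiveWindow` and all its parameters are hypothetical.

```python
from collections import deque
from statistics import mean


class AdaptiveWindow:
    """Sliding window whose size reacts to a change in the data mean
    (assumed drift test, for illustration only)."""

    def __init__(self, min_size=4, max_size=64, threshold=1.0):
        self.min_size = min_size
        self.max_size = max_size
        self.threshold = threshold
        self.size = min_size
        self.items = deque()

    def _trim(self):
        # Drop the oldest items until the window fits its current size.
        while len(self.items) > self.size:
            self.items.popleft()

    def add(self, value):
        """Add one observation; return True if drift was flagged."""
        self.items.append(value)
        self._trim()
        if len(self.items) < self.min_size:
            return False  # not enough data to compare halves yet
        values = list(self.items)
        half = len(values) // 2
        drift = abs(mean(values[:half]) - mean(values[half:])) > self.threshold
        if drift:
            # Shrink so that stale data leaves the window quickly.
            self.size = max(self.min_size, self.size // 2)
            self._trim()
        else:
            # Stable data: allow the window to hold more history.
            self.size = min(self.max_size, self.size + 1)
        return drift
```

Feeding six zeros followed by a 10 leaves the window growing while the data are stable and then halves it as soon as the newer half's mean moves away from the older half's.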


Author(s):  
Manmohan Singh ◽  
Rajendra Pamula ◽  
Alok Kumar

Clustering has various applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (sometimes called k-means) suffer from convergence to a local optimum with no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering in stream data. In each step, the reduction determines a coreset of the inputs and represents the error, where P represents the number of input points, according to the nested property of coresets. Hence, a small reduction in the error makes the final coreset n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. Our algorithm outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
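
The local-optimum issue attributed to Lloyd's algorithm can be seen in a minimal sketch on one-dimensional data. This illustrates plain Lloyd's iteration only, not the proposed coreset-reduction algorithm; the function names `lloyd_kmeans` and `kmeans_cost` and the example points are assumptions made for illustration.

```python
def lloyd_kmeans(points, centers, max_iter=100):
    """Run Lloyd's iteration on 1-D points from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers


def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)
```

On the points `[0, 1, 10, 11, 20, 21]` with k = 3, starting from centers `[0, 10, 20]` converges to `[0.5, 10.5, 20.5]` with cost 1.5, whereas starting from `[0, 1, 15]` converges to `[0, 1, 15.5]` with cost 101. Both are fixed points of the iteration, so the outcome depends entirely on initialization.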

