Big Data Directed Acyclic Graph Model for Real-time COVID-19 Twitter Stream Detection

2021 ◽  
pp. 108404
Author(s):  
Bakhtiar Amen ◽  
Syahirul Faiz ◽  
Thanh-Toan Do
Author(s):  
Jahwan Koo ◽  
Nawab Muhammad Faseeh Qureshi ◽  
Isma Farah Siddiqui ◽  
Asad Abbas ◽  
Ali Kashif Bashir

Abstract Real-time data streaming fetches live sensory segments of a dataset in a heterogeneous distributed computing environment. This process assembles data chunks at a rapid encapsulation rate through a streaming technique that bundles sensor segments into multiple micro-batches and extracts them into a repository. Recently, the acquisition process has been enhanced with the additional feature of exchanging IoT devices’ datasets, which comprise two components: (i) sensory data and (ii) metadata. The body of the sensory data holds record information, and the metadata part consists of logs, heterogeneous events, and routing path tables used to transmit micro-batch streams into the repository. The real-time acquisition procedure uses a Directed Acyclic Graph (DAG) to extract live query outcomes from in-place micro-batches through MapReduce stages and returns a result set. However, a few bottlenecks affect performance during execution: (i) formation of homogeneous micro-batches only, (ii) the complexity of dataset diversification, (iii) processing of heterogeneous data tuples, and (iv) a linear-only DAG workflow. As a result, the procedure incurs high processing latency and additional cost when extracting event-enabled IoT datasets. Thus, a Spark cluster, which processes Resilient Distributed Datasets (RDDs) at a fast pace using random access memory (RAM), falls short of the expected robustness in processing IoT streams in a distributed computing environment. This paper presents an IoT-enabled Directed Acyclic Graph (I-DAG) technique that labels micro-batches at the stage of building a stream event and arranges stream elements with event labels. In the next step, heterogeneous stream events are processed through the I-DAG workflow, which uses a non-linear DAG operation to extract query results in a Spark cluster. The performance evaluation shows that I-DAG resolves the homogeneous IoT-enabled stream event issues and provides an effective heterogeneous stream event solution for IoT-enabled datasets in Spark clusters.
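
The two ideas in this abstract, labeling micro-batches by event and running them through a non-linear DAG, can be illustrated with a minimal sketch. It is plain Python standing in for a Spark job, not the authors' I-DAG implementation: the function names `label_micro_batches` and `run_dag`, the record fields `event` and `value`, and the stage names are all hypothetical. It assumes Python 3.9+ for `graphlib`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def label_micro_batches(records, batch_size):
    """Group records by event label, then cut each group into micro-batches.

    Every resulting batch carries exactly one label (hypothetical layout).
    """
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["event"]].append(rec)
    batches = []
    for label, recs in by_label.items():
        for i in range(0, len(recs), batch_size):
            batches.append({"label": label, "records": recs[i:i + batch_size]})
    return batches


def run_dag(stages, deps, batch):
    """Run stages in dependency order.

    `deps` maps a stage name to the list of stages it depends on; each stage
    receives the batch and the outputs of its predecessors, in that order.
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, [])]
        results[name] = stages[name](batch, inputs)
    return results


# A diamond-shaped (non-linear) workflow: parse feeds two branches that
# are joined again in the final stage.
stages = {
    "parse": lambda batch, _ins: [r["value"] for r in batch["records"]],
    "total": lambda _batch, ins: sum(ins[0]),
    "count": lambda _batch, ins: len(ins[0]),
    "mean": lambda _batch, ins: ins[0] / ins[1],
}
deps = {
    "parse": [],
    "total": ["parse"],
    "count": ["parse"],
    "mean": ["total", "count"],
}
```

Calling `run_dag(stages, deps, batch)` once per labeled batch keeps heterogeneous events apart while the same workflow definition is reused for each label.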


2010 ◽  
Vol 2 (7) ◽  
pp. 469 ◽  
Author(s):  
Min Zhu ◽  
Wei Guo ◽  
Shilin Xiao ◽  
Anne Wei ◽  
Yaohui Jin ◽  
...  

Privacy is one of the biggest concerns that hinder most organizations from adopting Big Data technology. Some mechanisms and systems have been set up to handle huge databases. Nevertheless, the scalability requirements of Big Data are far beyond what conventional databases can handle. Therefore, it is not trivial to set up scalable privacy algorithms for conventional databases. Most data are stored in a single location, which means the records kept there are open and easily accessible to third parties. Centralized storage of this data makes it too easy for hackers to attack. As such, in this paper, we present the opportunities and challenges of implementing cryptography and blockchain for privacy preservation in Big Data, focusing on the healthcare domain. In addition, we also present some use cases of integrating a Directed Acyclic Graph (DAG) into a healthcare database framework for anchoring information security and privacy.
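
One way to picture how a DAG can anchor records is a hash-linked graph in which every record commits to the hashes of its parents, so that later tampering is detectable. The sketch below is an assumption-laden illustration, not a design taken from the paper: the class `HashDag`, the function `record_hash`, and the JSON payload layout are hypothetical, and it covers integrity only, not encryption or access control.

```python
import hashlib
import json


def record_hash(payload, parents):
    """SHA-256 over a canonical JSON encoding of the payload and parent hashes."""
    body = json.dumps(
        {"payload": payload, "parents": sorted(parents)}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class HashDag:
    """Append-only DAG: a record may only reference parents already stored,
    which rules out cycles by construction."""

    def __init__(self):
        self.nodes = {}  # hash -> (payload, parent hashes)

    def add(self, payload, parents=()):
        for p in parents:
            if p not in self.nodes:
                raise KeyError(f"unknown parent {p}")
        h = record_hash(payload, parents)
        self.nodes[h] = (payload, tuple(parents))
        return h

    def verify(self):
        """Recompute every hash; any edited payload or link makes this False."""
        return all(
            record_hash(payload, parents) == h
            for h, (payload, parents) in self.nodes.items()
        )
```

Because a child's hash depends on its parents' hashes, changing an old record invalidates the stored key for that record, and a verifier holding only the latest hashes can detect the change.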


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 234 ◽  
Author(s):  
Hyun Yoo ◽  
Soyoung Han ◽  
Kyungyong Chung

Recently, a massive amount of bioinformation big data has been collected by sensor-based IoT devices. The collected data are also classified into different types of health big data using various techniques. A personalized analysis technique is a basis for judging the risk factors of personal cardiovascular disorders in real time. The objective of this paper is to provide a model for personalized heart condition classification that combines a fast and effective preprocessing technique with a deep neural network in order to process biosensor input data accumulated in real time. The model can learn from input data and develop an approximation function, and it can help users recognize risk situations. To analyze the pulse frequency, a fast Fourier transform is applied during preprocessing. Data reduction is performed using the frequency-by-frequency ratio data of the extracted power spectrum. To analyze the meaning of the preprocessed data, a neural network algorithm is applied. In particular, a deep neural network is used to analyze and evaluate linear data. A deep neural network can stack multiple layers and can establish an operation model of nodes using gradient descent. The completed model was trained by classifying ECG signals collected in advance into normal, control, and noise groups. Thereafter, ECG signals input in real time were classified by the trained deep neural network system into normal, control, and noise. To evaluate the performance of the proposed model, this study used the ratio of data operation cost reduction and the F-measure. As a result, with the use of the fast Fourier transform and cumulative frequency percentage, the size of the ECG data was reduced to a ratio of 1:32. According to the F-measure analysis of the deep neural network, the model had 83.83% accuracy. Given the results, the modified deep neural network technique can reduce the size of big data in terms of computing work, and it is an effective system for reducing operation time.
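
The preprocessing described above (fast Fourier transform, per-frequency power ratios, reduction by cumulative percentage) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's procedure: the abstract does not give the exact reduction rule, so keeping the lowest-frequency bins until a cumulative share is reached, the 0.95 default, and the function names `power_ratios` and `reduce_by_cumulative_share` are all hypothetical. The sketch does not attempt to reproduce the reported 1:32 ratio.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_ratios(signal):
    """Share of total power per non-negative frequency bin, mean removed."""
    mean = sum(signal) / len(signal)
    spectrum = fft([s - mean for s in signal])
    power = [abs(c) ** 2 for c in spectrum[: len(signal) // 2]]
    total = sum(power)
    return [p / total for p in power] if total else power


def reduce_by_cumulative_share(ratios, share=0.95):
    """Keep bins from the lowest frequency upward until their cumulative
    share of power reaches `share` (assumed rule, see lead-in)."""
    kept, running = [], 0.0
    for r in ratios:
        kept.append(r)
        running += r
        if running >= share:
            break
    return kept
```

For a 64-sample sine wave with four cycles, almost all power falls in bin 4, so the reduced representation keeps only bins 0 through 4 instead of 64 raw samples. The kept ratios would then serve as the compact input vector for a classifier.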

