Applying Apache Spark on Streaming Big Data for Health Status Prediction

2022 ◽ Vol 70 (2) ◽ pp. 3511-3527
Author(s): Ahmed Ismail Ebada ◽ Ibrahim Elhenawy ◽ Chang-Won Jeong ◽ Yunyoung Nam ◽ Hazem Elbakry ◽ ...
Author(s): Muhammad Junaid ◽ Shiraz Ali Wagan ◽ Nawab Muhammad Faseeh Qureshi ◽ Choon Sung Nam ◽ Dong Ryeol Shin

2021 ◽ Vol 464 ◽ pp. 432-437
Author(s): Mario Juez-Gil ◽ Álvar Arnaiz-González ◽ Juan J. Rodríguez ◽ Carlos López-Nozal ◽ César García-Osorio
Keyword(s): Big Data

2018 ◽ Vol 14 (1) ◽ pp. 30-50
Author(s): William H. Money ◽ Stephen J. Cohen

This article analyzes the properties of unknown faults in knowledge management and Big Data systems that process data in real time. These faults introduce risks, threatening the knowledge pyramid and the decisions based on knowledge gleaned from volumes of complex data. The authors hypothesize that faults not yet encountered may require dedicated fault handling, an analytic model, and an architectural framework to assess and manage the faults, mitigate the risks of correlating or integrating otherwise uncorrelated Big Data, and ensure the source pedigree, quality, set integrity, freshness, and validity of the data. New architectures, methods, and tools for handling and analyzing Big Data systems functioning in real time will contribute to organizational knowledge and performance. System designs must mitigate faults arising from real-time streaming processes while addressing variables such as synchronization, redundancy, and latency. The article concludes that, with improved designs, real-time Big Data systems can continuously deliver the value of streaming Big Data.
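As a rough illustration only (the abstract prescribes no particular framework), two of the fault concerns named above, latency of late or out-of-order events and recovery after process failure, map directly onto watermarking and checkpointing in Spark Structured Streaming. The sketch below assumes a hypothetical Kafka topic, schema, and output paths; none of these come from the article.

```python
# Minimal sketch, assuming a Kafka source with JSON events carrying an
# event-time column "ts". Broker address, topic, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fault-tolerant-stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder address
    .option("subscribe", "events")                     # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"),
                        "ts TIMESTAMP, sensor STRING, reading DOUBLE").alias("e"))
    .select("e.*")
)

# Watermark: accept events up to 10 minutes late before dropping their state,
# bounding memory while tolerating out-of-order arrival (the latency concern).
counts = (
    events.withWatermark("ts", "10 minutes")
    .groupBy(F.window("ts", "5 minutes"), "sensor")
    .count()
)

# Checkpointing: after a failure the query restarts from durable offsets and
# state, mitigating faults in the streaming process (the recovery concern).
query = (
    counts.writeStream.outputMode("append")
    .format("parquet")
    .option("path", "/data/out")                 # placeholder output path
    .option("checkpointLocation", "/data/ckpt")  # placeholder checkpoint dir
    .start()
)
query.awaitTermination()
```

The design choice worth noting is that both mechanisms are declarative: the engine, not the application, replays offsets and restores state, which is what allows a real-time system to keep delivering value across faults.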


Author(s):  
J. Boehm ◽  
K. Liu ◽  
C. Alis

In the geospatial domain we have now reached the point where the data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore natural to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not natively supported by existing big data frameworks. Instead, such file formats are supported by software libraries that are restricted to single-CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of the cluster. We test the capability of the proposed method to load billions of points into a commodity-hardware compute cluster, and we discuss the implications for scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
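As a minimal sketch of the map-based idea (not the authors' implementation), the list of point cloud file paths can be parallelized so that each Spark executor runs an ordinary single-CPU reader on its own files. The use of the laspy LAS reader and all paths below are assumptions made for illustration.

```python
# Sketch of map-distributed point cloud ingestion, assuming the single-CPU
# Python LAS reader laspy; the authors' actual file format library and
# cluster layout are not specified in the abstract.
import numpy as np
import laspy  # conventional single-node LAS/LAZ reader

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pointcloud-ingest").getOrCreate()
sc = spark.sparkContext

# Hypothetical LAS tiles on shared storage visible to every cluster node.
paths = ["/data/tiles/tile_%04d.las" % i for i in range(256)]

def read_tile(path):
    """Run an ordinary single-CPU reader on one file; executed per task."""
    las = laspy.read(path)
    xs = np.asarray(las.x, dtype=float)  # scaled coordinates as arrays
    ys = np.asarray(las.y, dtype=float)
    zs = np.asarray(las.z, dtype=float)
    return [(float(x), float(y), float(z)) for x, y, z in zip(xs, ys, zs)]

# One partition per file, so each ingestion task handles exactly one tile;
# flatMap applies the reader across the cluster and flattens the points.
points = sc.parallelize(paths, len(paths)).flatMap(read_tile)

# Materialize as a DataFrame for downstream processing.
df = points.toDF(["x", "y", "z"])
print(df.count())  # total number of ingested points
```

Giving each file its own partition is the key move: the single-CPU restriction of the format library becomes irrelevant because each reader instance runs inside its own task, and the cluster scales out over the number of files.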

